#21120 closed enhancement (fixed)
Optimize get_user_by()
Reported by: | kurtpayne | Owned by: | |
---|---|---|---|
Milestone: | 3.5 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Performance | Keywords: | dev-feedback has-patch |
Focuses: | Cc: |
Description
The get_user_by()
function is used repeatedly on the "All Posts" page.
I used the theme unit test sample posts to test this, and there were only 4 unique calls to get_user_by()
but it was called 264 times to generate the first page of "All Posts" (20 posts of 25 total).
When viewing 26 posts, this function is called 264 times. Each time, a new WP_User object is created and WP_User::init()
is called.
Attachments (11)
Change History (51)
#1
@
12 years ago
To test this out, I put together a simple static cache for get_user_by()
and ran a profile for the "All Posts" page before and after. It shaved off a good amount of time and duplicate work.
#2
follow-ups:
↓ 3
↓ 4
@
12 years ago
We already have a caching system. We should not invent ad-hoc systems for each function.
What we could do is collect all the user ids and call get_users() with them. This will in turn call cache_users().
#3
in reply to:
↑ 2
@
12 years ago
Replying to scribu:
What we could do is collect all the user ids and call get_users() with them. This will in turn call cache_users().
Perhaps WP_Query should have a flag to do this. We already prime terms, meta, and optionally comments for a post. It then could be on by default in the admin.
#4
in reply to:
↑ 2
;
follow-ups:
↓ 5
↓ 6
@
12 years ago
- Keywords has-patch added
Replying to scribu:
We already have a caching system. We should not invent ad-hoc systems for each function.
I've uploaded a new patch that plays well with the caching system and the users group.
What we could do is collect all the user ids and call get_users() with them. This will in turn call cache_users().
Where do we draw the line on this? wp_is_large_network()
?
Replying to nacin:
Perhaps WP_Query should have a flag to do this.
I'm not sure where to put this. Hint? I'm also not sure if it's needed. Priming a user/cap cache on the "get all posts page" could net a positive performance benefit. Priming a user/cap cache on an "upload file" or "new post" page may be a waste.
#5
in reply to:
↑ 4
@
12 years ago
Replying to kurtpayne:
What we could do is collect all the user ids and call get_users() with them. This will in turn call cache_users().
Where do we draw the line on this?
wp_is_large_network()
?
We wouldn't fetch all the users, only the authors of the posts currently on display.
#6
in reply to:
↑ 4
@
12 years ago
Replying to nacin:
Perhaps WP_Query should have a flag to do this.
I'm not sure where to put this. Hint? I'm also not sure if it's needed. Priming a user/cap cache on the "get all posts page" could net a positive performance benefit. Priming a user/cap cache on an "upload file" or "new post" page may be a waste.
On the contrary: the flag would be off by default, so you can choose when you need the user cache primed.
For an example, take a look at the 'update_post_term_cache' query var.
#7
@
12 years ago
The 'all posts' page uses a list table which routes the requests through wp_edit_posts_query()
which sends the query through wp()
. There was no direct call to WP_Query
as with the example for 'update_post_term_cache'
Here's the best workaround I could come up with. Please let me know what you think.
#8
@
12 years ago
The workaround seems fine, but if you used get_users( array( 'include' => $user_ids ) )
you wouldn't need _prime_user_cache().
#9
@
12 years ago
Because WP_User_Query internally does exactly what _prime_user_cache() from your patch does.
#10
@
12 years ago
This is the important behavior from _prime_user_cache
that isn't replicated by WP_User_Query
$user->init( $userdata );
Right now the only place where WP_User::init
is called is from get_user_by
, so the added caching here is important. The author caching in the post query may be moot as a list of 20 posts that reference 4 authors will result in 4 cache misses either way.
#11
follow-up:
↓ 12
@
12 years ago
Actually, WP_User::init()
is also called in WP_User::__construct()
if you pass an id or a login.
#12
in reply to:
↑ 11
@
12 years ago
Replying to scribu:
Actually,
WP_User::init()
is also called inWP_User::__construct()
if you pass an id or a login.
You're right. In WP_User_Query::query
, though, the results of the query are cached, but not the results of WP_User::__construct
called for each user in the query.
Perhaps this should be an additional optimization?
Here's the code in question from WP_User_Query::query
if ( 'all_with_meta' == $this->query_vars['fields'] ) { cache_users( $this->results ); $r = array(); foreach ( $this->results as $userid ) $r[ $userid ] = new WP_User( $userid, '', $this->query_vars['blog_id'] ); $this->results = $r; }
#13
follow-up:
↓ 14
@
12 years ago
I'm not sure if caching the whole user object is a good idea, since it contains the capabilities for the current blog. Then, if you call for_blog() on it with a different ID, you end up in a weird state.
Besides, WP_User::init() doesn't seem that expensive.
#14
in reply to:
↑ 13
@
12 years ago
Replying to scribu:
Besides, WP_User::init() doesn't seem that expensive.
It compounds when it's called 260+ times when it's only needed 4 times.
On my local dev server, this saves ~30ms of wall time (~5%). The xdebug profiling times (get_user_by_grind.png) are a little inflated because of the overhead, but this optimization does have a net positive impact.
I vote for leaving it at 21120.2.patch and dropping the post_author caching.
#15
follow-up:
↓ 16
@
12 years ago
Here's the workflow with 21120.2.patch:
First call:
get_user_by() -> WP_User::get_data_by() -> wp_cache_get( $user_id, 'users' ) // gets the raw userdata -> wp_cache_get( $user_id, 'users' ) // gets the raw userdata -> wp_cache_set( $user_id, $full_user, 'users' )
Subsequent calls:
get_user_by() -> WP_User::get_data_by( ... ) -> wp_cache_get( $user_id, 'users' ) // gets the full userdata -> wp_cache_get( $user_id, 'users' ) // gets the full userdata
That doesn't seem sane to me. At a minimum, you could eliminate the additional wp_cache_get() check from get_user_by().
The real question is why is get_user_by() called so many times?
#16
in reply to:
↑ 15
@
12 years ago
Replying to scribu:
The real question is why is get_user_by() called so many times?
get_user_by: - called 2 times from wp_validate_auth_cookie - called 1 time from get_currentuserinfo - called 1 time from from auth_redirect - called 262 times from get_userdata - called 2 times from get_avatar - called 20 times from setup_postdata (from WP_Post_List_Table::single_row) - called 240 times from map_meta_cap
Here's some of the places I could find that call map_meta_cap
that are in a loop for showing posts:
- get_edit_post_link( $post->ID ); - current_user_can( $post_type_object->cap->edit_post, $post->ID ); - current_user_can( 'read_post', $post->ID ) - get_edit_post_link( $post->ID, true ) - current_user_can( $post_type_object->cap->delete_post, $post->ID ) - get_delete_post_link( $post->ID ) ... then in WP_Posts_List_Table::get_inline_data() ... - current_user_can($post_type_object->cap->edit_post, $post->ID)
#17
@
12 years ago
The map_meta_cap.patch patch makes the following assumptions:
- calls to map_meta_cap are deterministic - a user is either capable or incapable of doing something in a given invocation
- if #1 isn't true, then the call to
map_meta_cap
can use$cache=false
- the cache should only be valid for the life of the process; using the
wp_cache_*
methods for persistent caching would be inefficient and unnecessary
In xdebug profiles, this has decreased run time of the "all posts" page by 12%. Without xdebug (running faster without the profiling overhead), run time (measured with Chrome's net panel) has decreased by about 8%.
#18
follow-up:
↓ 19
@
12 years ago
map_meta_cap() receives a variable number of arguments:
$args = array_slice( func_get_args(), 2 );
You can't just add a $cache parameter to it.
#19
in reply to:
↑ 18
@
12 years ago
Replying to scribu:
map_meta_cap() receives a variable number of arguments:
$args = array_slice( func_get_args(), 2 );You can't just add a $cache parameter to it.
Thanks, I missed the forest for the trees.
The extra parameter could be taken care of with a filter.
But since $args
can be any object/scalar/array, the work it would take to generate a consistent key from this negate any cache savings.
Seems like caching in map_meta_cap
is out.
@scribu, @nacin, Any ideas?
#20
@
12 years ago
What we need to do is to avoid the get_user_by() calls, since we already have the full object for the current user.
#21
@
12 years ago
I don't think it's a simple matter of optimizing the calls to get_user_by(). Rather, we should consider (consider) a WP_User::get_instance(), which is essentially get_user_by() but it instead returns a reference to an already existing WP_User object, if it has spun one up before. This is somewhat similar to how WP_Screen works.
Alternatively, we can look to make WP_User instantiation faster. Looking at some profiling output, that doesn't look simple — the bulk of the time spent is on unserializing metadata, initing caps/roles, and pulling the user's data from cache. Lazy-loading roles/caps won't help us either, as the bulk of our WP_User instantiations are for cap checks.
What we could do is make map_meta_cap() a bit smarter as well. Rather than calling get_userdata(), check if we're dealing with the current user, and if so, call wp_get_current_user().
Additionally, is_super_admin() is almost always passed a user ID in the capabilities API, but it is also almost always referring to the current user. When we do get a user_id in is_super_admin(), we should check against get_current_user_id() to see if we can get away with wp_get_current_user().
Again, this is where WP_User::get_instance() could come in — for when you're not sure if you need a new WP_User object, or if you can use wp_get_current_user(). Even if it doesn't keep a reference of *all* WP_Users it has spun up, just short-circuiting and calling wp_get_current_user() could make a huge performance difference.
@
12 years ago
This ought to be API'd as suggested in the comment, but this does a number to the profiling output..
#22
@
12 years ago
With 21120.2.diff, our "API" could simply be get_userdata(). We'd then just convert most new WP_User calls to get_userdata().
#24
@
12 years ago
I've long wanted to deprecate get_userdata() since it is one of the pluggable functions that should not be. Now that we have deprecated-pluggable.php, why not either standardize on get_user_by() or a new get_user() function?
#25
@
12 years ago
I'm happy with a wp_get_user() (or get_user()) and deprecate get_userdata(). I agree, pluggable-ness is lame.
#26
@
12 years ago
The main decision to be made here is whether wp_get_user() should return WP_User on failure, or false on failure.
Currently, new WP_User
and wp_get_current_user()
will return a WP_User object. This object can then be checked with the exists()
method.
get_userdata()
returns false if the user does not exist. The same goes for get_user_by()
.
scribu and I had a long discussion in IRC, with ryan chiming in as well. We outlined the pros and cons of each approach, how we see it currently implemented throughout WordPress, and how we envision similar getters to operate in the future. See https://irclogs.wordpress.org/chanlog.php?channel=wordpress-dev&day=2012-07-19&sort=asc#m424509, starts at 15:36, ends at 16:14.
#27
@
12 years ago
Another decision is whether to deprecate wp_get_current_user() - because of it's pluggable-ness - or to let it be.
#28
@
12 years ago
I'm leaning towards always returning a WP_User object. For chaining ability, and for parity with some of the other objects/instantiation functions we've been making.
#36
@
12 years ago
21120.5.diff is an attempt at optimizing is_super_admin()
.
#37
@
12 years ago
21120.6.diff includes map_meta_cap()
.
Proof of concept static cache