Make WordPress Core

Opened 13 years ago

Closed 12 years ago

Last modified 8 years ago

#15458 closed enhancement (fixed)

Lazy-load usermeta into the user object

Reported by: nacin's profile nacin Owned by: nacin's profile nacin
Milestone: 3.3 Priority: normal
Severity: normal Version:
Component: Performance Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

Currently we load all usermeta keys into the user object. This is not ideal, and it doesn't scale when you wish to store a lot of data in usermeta. (Example, BuddyPress profiles.)

We can leverage a magic __get() method for WP_User and fetch data on demand, instead of our current situation.

Attachments (21)

15458.diff (2.8 KB) - added by scribu 13 years ago.
15458.2.diff (8.7 KB) - added by scribu 13 years ago.
fix deprecation warnings
set.15458.diff (484 bytes) - added by scribu 13 years ago.
first-last-name.15458.diff (661 bytes) - added by scribu 13 years ago.
lazy-load.diff (13.7 KB) - added by scribu 13 years ago.
lazy-load.2.diff (18.4 KB) - added by scribu 13 years ago.
lazy-load.3.diff (23.7 KB) - added by scribu 13 years ago.
15458.3.diff (25.3 KB) - added by ryan 13 years ago.
15458-unit.diff (2.1 KB) - added by ryan 13 years ago.
15458.4.diff (25.3 KB) - added by ryan 13 years ago.
s/isset_metadata/metadata_exists/, s/is_set/exists/
15458.5.diff (25.3 KB) - added by ryan 13 years ago.
s/exists/has_prop/
15458.userkeys.patch (552 bytes) - added by SergeyBiryukov 12 years ago.
15458.6.diff (647 bytes) - added by ryan 12 years ago.
wp_update_user() fix
wp_update_users.15458.diff (2.9 KB) - added by scribu 12 years ago.
wp_update_users.15458.2.diff (3.1 KB) - added by scribu 12 years ago.
wp_update_users.15458.3.diff (3.1 KB) - added by scribu 12 years ago.
15458.7.diff (2.3 KB) - added by nacin 12 years ago.
15458.8.diff (2.3 KB) - added by SergeyBiryukov 12 years ago.
15458-clean_user_cache.diff (1.8 KB) - added by ryan 12 years ago.
clean_user_cache() workaround for blogs of user cache cleaning
15458.9.diff (2.2 KB) - added by ryan 12 years ago.
Pull root blog check out of the loop.
15458.testcase-get_blogs_of_user.patch (956 bytes) - added by SergeyBiryukov 12 years ago.

Download all attachments as: .zip

Change History (107)

#1 @johnjamesjacoby
13 years ago

+1

WP will need specifically optimized queries to handle the polling of multiple users and specific pieces usermeta.

#2 @nacin
13 years ago

We'd want to continue to load a number of meta keys for any WP_User object, and also a few more for the logged-in user as well.

#3 @andy
13 years ago

Related: #12267

#4 @scribu
13 years ago

  • Cc scribu added

#5 @westi
13 years ago

  • Cc westi added

#6 follow-ups: @Denis-de-Bernardy
13 years ago

For the same reason as posts, making dozens of queries per user is more expensive than making a single one for all users.

Are we even loading these things for all queried users in one go? If not, agreeing wholly; if so, disagreeing based on the above.

#7 @Denis-de-Bernardy
13 years ago

s/making dozens of queries per user/making dozens of queries for as many users/

#8 in reply to: ↑ 6 @andy
13 years ago

Replying to Denis-de-Bernardy:

For the same reason as posts, making dozens of queries for as many users is more expensive than making a single one for all users.

Are we even loading these things for all queried users in one go? If not, agreeing wholly; if so, disagreeing based on the above.

A new bulk authors query is such a good idea that it deserves its own ticket. It dovetails with this ticket in that there is no need to load usermeta when all you need is the author's display name.

#9 in reply to: ↑ 6 @nacin
13 years ago

Replying to Denis-de-Bernardy:

For the same reason as posts, making dozens of queries per user is more expensive than making a single one for all users.

Are we even loading these things for all queried users in one go? If not, agreeing wholly; if so, disagreeing based on the above.

This is commonly seen when querying the current user, not necessarily all queried users. Multiple-user usermeta queries is not common.

I think a solution is to allow WP_User to be instantiated with only a whitelisted (and filterable) subset of usermeta being queried. This would be leveraged for querying the current user's user meta, which obviously is loaded into memory.

When a non-whitelisted usermeta key is requested, we should then load ALL usermeta, not just that key, to allow other keys to hit the cache. The whitelist should be large enough to cover core usage, and beyond that, it'll at least keep the memory footprint lower than it is now.

#10 @scribu
13 years ago

  • Keywords needs-patch added; 3.2-early removed
  • Milestone changed from Future Release to 3.2

#11 @scribu
13 years ago

I can take a stab at this if no one has already started.

#12 follow-up: @scribu
13 years ago

A more conservative approach would be to load everything by default, but have a blacklist, populated by a register_heavy_user_custom_field() function.

#13 @mikeschinkel
13 years ago

  • Cc mikeschinkel@… added

#14 in reply to: ↑ 12 @aaroncampbell
13 years ago

  • Cc aaroncampbell added

Replying to scribu:

A more conservative approach would be to load everything by default, but have a blacklist, populated by a register_heavy_user_custom_field() function.

My two cents: I prefer the whitelist idea.

#15 @ryan
13 years ago

  • Milestone changed from 3.2 to Future Release

@scribu
13 years ago

#16 @scribu
13 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 3.3

I'm not sure about the best way of handling large custom fields, but I do know that we can reduce the memory footprint by half:

Currently, each field is stored twice: once in WP_User->data and once on the WP_User object directly. We can avoid this by using magic methods. So that's what 15458.diff does.

Last edited 13 years ago by scribu (previous) (diff)

@scribu
13 years ago

fix deprecation warnings

#17 @ryan
13 years ago

This seems like a good first step that is in scope for 3.3. I added some unit tests:

http://unit-tests.trac.wordpress.org/changeset/390

#18 @ryan
13 years ago

In [18504]:

Add magic get/set/isset methods to WP_User to avoid data duplication. Standardize on WP_User::ID. Props scribu. see #15458

#19 @aaroncampbell
13 years ago

I haven't actually got to test it yet, but it looks like you could set user->id but you can only get user->ID. Should we add a deprecated parameter bit to the magic set as well?

@scribu
13 years ago

#20 @scribu
13 years ago

I guess we could. See set.15458.diff.

#21 @aaroncampbell
13 years ago

I think that makes sense. I think this would be an unexpected experience:

$user->id = 123;
if ( 123 == $user->id )
	echo "why this no work?!?";
else
	echo "This happens and it shouldn't!";

#22 @ryan
13 years ago

In [18506]:

Handle deprecation of id in set(). Props scribu. see #15458

#23 @ryan
13 years ago

Perhaps pulling meta out of the user object cache could be the next step? meta has its own cache.

#24 @scribu
13 years ago

Looking into that now. Will basically mean getting rid of _fill_user() and it's ilk, which is good.

Last edited 13 years ago by scribu (previous) (diff)

#25 @scribu
13 years ago

So, WP_User calls get_userdata(), which is a pluggable function.

Furthermore, get_userdata()'s docs say that it returns "User DB row object", but it actually includes metadata too.

We could make get_userdata() return a WP_User object instead and move the query into WP_User::__construct(), but I don't know what implication that would have for code that overwrites get_userdata().

I assume we can't just make get_userdata() not-pluggable anymore.

#26 @scribu
13 years ago

Same situation for get_user_by(), get_userdatabylogin() and get_user_by_email().

Last edited 13 years ago by scribu (previous) (diff)

#27 @scribu
13 years ago

first-last-name.15458.diff removes the first_name and last_name properties, since they interfere with the magic methods. Any notices that spring up (I didn't see any) should be dealt with another way.

Last edited 13 years ago by scribu (previous) (diff)

#28 @scribu
13 years ago

Related: #18333

#29 @scribu
13 years ago

Related: #8911

#30 @ryan
13 years ago

We might be able move those user functions out of pluggable.php. Let's sample some of the LDAP plugins and see what they are doing.

#31 @ryan
13 years ago

I sampled 6 LDAP and SSO plugins. One replaced get_user_by_email(), but simply to workaround a possible bug. I didn't see any other use of pluggable functions.

Last edited 13 years ago by ryan (previous) (diff)

#32 @ryan
13 years ago

In [18512]:

Remove first_name and last_name properties. Props scribu. see #15458

#33 @scribu
13 years ago

I'm in the process of writing a patch that makes WP_User::__get() call get_user_meta().

The advantage is that you don't have to duplicate all the metadata in WP_User->data anymore.

The drawback is that using $user->somefunkykey, where meta_key is actually some-funky-key won't work anymore and there's no way around it, since you can't guess where the dashes are.

This should be pretty rare, but let me know if it's not an acceptable compromise.

#34 @ryan
13 years ago

In [18515]:

Handle id back compat in isset magic method. see #15458

#35 @ryan
13 years ago

I think that will be okay, but we should audit core user meta for use of dashes. And instead of relying on the magic method all the time perhaps we should standardize on using a get() or get_meta() accessor so we can handle dashed keys.

#36 @ryan
13 years ago

I took a quick look through a usermeta table and saw dashes used only for admin state related core user meta (meta box prefs, settings cookie sync). These are never accessed as an object property.

#37 @ryan
13 years ago

Unit tests for id property back compat:

http://unit-tests.trac.wordpress.org/changeset/411

#38 @ryan
13 years ago

foreach ( (array) $user as $key => $value )

As seen in get_blogs_of_user() won't work as previously expected if $user is a WP_User object. Something to be wary of.

Last edited 13 years ago by ryan (previous) (diff)

@scribu
13 years ago

#39 @scribu
13 years ago

lazy-load.diff:

  • _fill_user(), _fill_single_user() and _fill_many_users() are gone.
  • get_user_metavalues() is deprecated
  • introduces a static method WP_User::get_data_by() which handles all fields, including 'id'
  • handles the case in get_blogs_of_user()

What it doesn't handle yet: wp_update_user(). This means that most fields in user-edit.php aren't updated anymore.

Last edited 13 years ago by scribu (previous) (diff)

#40 @scribu
13 years ago

perhaps we should standardize on using a get() or get_meta() accessor so we can handle dashed keys

Well, you can always do get_user_meta( $user->ID, 'some-key', true ) or just $user->{'some-key'}.

@scribu
13 years ago

#41 @scribu
13 years ago

  • Keywords needs-testing added

lazy-load.2.diff:

  • updated wp_insert_user() and wp_update_user() to use WP_User::get_data_by() instead of get_userdata()
  • sanitize_user_object() is deprecated. Sanitization is handled in __get()

Everything seems to work.

PS: The patch can be applied using patch -p1, instead of patch -p0.

#42 follow-up: @ryan
13 years ago

$this->caps = &$this->data->{$this->cap_key};
if ( ! is_array( $this->caps ) )
	$this->caps = array();

Those lines in WP_User::_init_caps() are somehow causing cache pollution. wp_cache_get($id, 'users'), as done in get_userdata(), will sometimes return an object with the wp_xxx_capabilities property set ( where xxx is the current blog id). The value is an empty array. This confuses get_blogs_of_user(), which sees that the property is set and mistakenly thinks the user belongs to the current blog. Since the cache clones objects on set and get this is rather unexpected. Removing the ref assignment avoids the problem.

Last edited 13 years ago by ryan (previous) (diff)

#43 in reply to: ↑ 42 @ryan
13 years ago

Those lines in WP_User::_init_caps() are somehow causing cache pollution. wp_cache_get($id, 'users'), as done in get_userdata(), will sometimes return an object with the wp_xxx_capabilities property set ( where xxx is the current blog id). The value is an empty array. This confuses get_blogs_of_user(), which sees that the property is set and mistakenly thinks the user belongs to the current blog. Since the cache clones objects on set and get this is rather unexpected. Removing the ref assignment avoids the problem.

Tracked this down to a bug in a particular memcached backend. The value was being properly cloned from the cache during get but then that clone was assigned right back into the cache. Dumb. Anyhow, not a core bug so disregard the previous comment.

#44 @scribu
13 years ago

In lazy-load.2.diff, that line is replaced with this:

$this->caps = get_user_meta( $this->ID, $this->cap_key, true );

#45 @ryan
13 years ago

SELECT * FROM wp_trunk_users WHERE user_login = ''
get_userdata, get_user_by, WP_User->__construct, WP_User->get_data_by

When WP_User instantiated with no args.

@scribu
13 years ago

#46 @scribu
13 years ago

lazy-load.3.diff:

  • prevent useless query in WP_User::get_data_by() when value is empty
  • remove clean_user_cache() call from *_metadata() functions (not necessary anymore)
  • use WP_User::get_data_by() in clean_user_cache() for efficiency
Last edited 13 years ago by scribu (previous) (diff)

#47 @ryan
13 years ago

get_user_option fails because the isset magic method doesn't consult meta. I think we should retain this for back compat. Since get_metadata() returns an empty string even if a meta key is not actually set, I introduced isset_metadata() and used it in the isset magic method. So that get_user_option() can once again support meta keys with dashes, I also added is_set() and get() methods to WP_User.

Unit tests are here.

http://unit-tests.trac.wordpress.org/browser/wp-testcase/test_user.php

We need more coverage of WP_User() and the functions that wrap it.

@ryan
13 years ago

#48 @scribu
13 years ago

For reference, the diff between lazy-load.3.diff and 15458.3.diff:

http://github.com/scribu/WordPress/commit/b6509efedd8d581874b85c6cd43d262bd5ae7cbb

#49 @scribu
13 years ago

  • Keywords needs-unit-tests added; needs-testing removed

@ryan
13 years ago

#50 @scribu
13 years ago

So, to not get lost in the discussion, I propose some renamings:

isset_metadata() -> metadata_exists()

WP_User::is_set() -> WP_User::has()

@ryan
13 years ago

s/isset_metadata/metadata_exists/, s/is_set/exists/

#51 @ryan
13 years ago

Discussed exists() vs. has() vs. has_prop() vs. purely magic method in IRC. Nothing decided yet.

#52 @azaozz
13 years ago

metadata_exists() sounds good, perhaps WP_User::has_prop() or WP_User::option_exists() is a bit better than WP_User::has() and WP_User::exists().

@ryan
13 years ago

s/exists/has_prop/

#53 @ryan
13 years ago

In [18597]:

Introduce metadata_exists(), WP_User::get_data_by(), WP_User::get(), WP_User::has_prop(). Don't fill user objects with meta. Eliminate data duplication in cache and memory. Props scribu. see #15458

#54 @ryan
13 years ago

The amount of data stored in the users group in memcached for a sample multisite user was reduced from 22.31K to 0.47K. Multiply that by millions of users for a very nice savings. On users.php with 17 users, the number of trips to the users cache went from ~422 to ~400. (Numbers include local cache fetches, not just memcached trips.) The number of trips to the user_meta cache increased sharply from ~13 to ~464. That is to be expected since we're not duplicating user meta in the users cache any more and must fetch meta separately. The overall cache query time is roughly the same, however. Gets from the users cache are many times faster now that the cache is much smaller.

#55 @scribu
13 years ago

Very cool stats. Anything else left to do here?

#56 @ryan
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

#57 @ryan
12 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

wp_update_user() is broken.

	// First, get all of the original fields
	$user = WP_User::get_data_by('id', $ID);

	// Escape data pulled from DB.
	$user = add_magic_quotes(get_object_vars($user));

That no longer has the meta values in it, so plugins that pass just the things they want to change in $userdata ends up stomping meta values.

#58 @ryan
12 years ago

In [18905]:

Avoid Warning: array_keys() expects parameter 1 to be array, boolean given. see #15458

#59 @ryan
12 years ago

Various themes, plugins, and even maybe a couple places in core are calling get_blogs_of_user() with a user ID of 0 or null, resulting in the warning noted in the commit message above. Checking for an empty id at the top of the function seems a good idea.

#60 @ryan
12 years ago

As for wp_update_user(), I resurrected _fill_user() and called it in wp_update_user() as a temp fix on wp.com. I'm sure we can think of a better permanent fix.

#61 @SergeyBiryukov
12 years ago

The second empty() check seems redundant, unless I'm missing something. Also, line 687 has excess whitespace (Trac doesn't display the difference in my patch, but it's visible in the changeset).

#62 @ryan
12 years ago

In [18906]:

Do only one empty check. Clean up whitespace. Props SergeyBiryukov. see #15458

#63 @ryan
12 years ago

Added a unit test to catch the wp_update_user() bug.

http://unit-tests.trac.wordpress.org/changeset/448

@ryan
12 years ago

wp_update_user() fix

#64 @ryan
12 years ago

Another option is to put that logic in a WP_User::get_vars() method. Or, have WP_User::get() return an associative array when the $key argument is empty.

#65 @ryan
12 years ago

Looks good. Passes the new unit test. I'll add some more tests and commit.

#67 @ryan
12 years ago

In [18909]:

Don't stomp meta fields in wp_update_user(). Props scribu. see #15458

#68 @ryan
12 years ago

There is still some debate on API, but I committed wp_update_users.15458.3.diff for now since the current code can lead to data loss.

#69 @ryan
12 years ago

Some sporadic reports of get_blogs_of_user() problems after putting these changes on wp.com.

There have been a few reports of users having difficulty adding other users to their blogs. It appears that in some cases, the issue resolves itself and the user is ultimately added. In some other cases, the user can be removed and re-added to the blog by an SA, which resolves the issue.

This is with the memcached backend. Investigating.

#70 @ryan
12 years ago

When we lost clean_user_cache() in the various meta.php functions, we lost clearing of the blogs_of_user cache. Restoring the clean fixes get_blogs_of_user( ) problems.

@nacin
12 years ago

#71 @nacin
12 years ago

15458.7.diff removes caching from get_blogs_of_user().

The caching appears pretty silly and causes more trouble than it's worth. It also looks like it was still checking for an older version of this cache, which can go. get_blog_details() is already cached, as is get_user_meta() (incl. when requesting all meta keys).

The code for looping through the keys was pretty hairy. New code hasn't been benchmarked yet, but it avoids that regular expression and optimizes string function usage, among other things. I'd be willing to bet this would perform much better than restoring clean_user_cache().

#72 follow-up: @nacin
12 years ago

We can also add in a defined('MULTISITE') check before (or after) `$key === $wpdb->base_prefix . 'capabilities', to make sure we're operating on the main blog of a 3.0+ multisite.

I'm open to suggestions on how to better order the various conditions in the loop to ensure that common and fast cases are treated higher up.

Ryan relays that blogs_of_user caching reduced load on WP.com when it was introduced, but it may have been just patching the symptoms. Fixing up the blog details caching in 3.1 and user objects in 3.2 might mean we can yank the caching here.

User objects (before this ticket) could get pretty large. Like 300k large. As long as the user's meta is in cache, though, we'll be loading it but only processing the keys. I doubt looping through even a few hundred meta keys would cause enough load to require caching. I dunno. Worth some serious testing.

#73 in reply to: ↑ 72 @SergeyBiryukov
12 years ago

Replying to nacin:

I'm open to suggestions on how to better order the various conditions in the loop to ensure that common and fast cases are treated higher up.

I've performed a quick test by iterating each condition 10,000,000 times, and the average results were:

6,479s for ( 0 !== strpos( $key, $wpdb->base_prefix ) )
6,123s for ( 'capabilities' !== substr( $key, -12 ) )
4,408s for ( $key === $wpdb->base_prefix . 'capabilities' )

Looks like we can move them bottom-up.

#74 @nacin
12 years ago

The final condition is rare -- it only applies for a single site (root), and not for all networks (non-MU). Maybe we should check for that key with an isset() or something beforehand, perhaps.

I'm honestly surprised substr() is faster than strpos().

#75 @ryan
12 years ago

#14379 is where we fixed the blogs of user cache invalidation. It used to have a 5 second timeout, meaning we were rebuilding it all the time. All I recall is that this supposedly helped a lot on .com Because of the admin bar, get_blogs_of_user() is called on pretty much every logged in page load. But as nacin suggests, this could have been fixed more so by all of the other cache work done around that time and since.

Last edited 12 years ago by ryan (previous) (diff)

@ryan
12 years ago

clean_user_cache() workaround for blogs of user cache cleaning

#76 @ryan
12 years ago

FWIW, there's the brute force clear I used to work around this for now.

#77 @ryan
12 years ago

I poked around .com, looking for get_blogs_of_user() calls that resulted in a cache miss on user meta. All calls were for the current user and the meta cache was always hit. A quick timer_start()/timer_stop() benchmark showed times of 0.004 for the current code and 0.008 for 15458.8.diff. get_blogs_of_user() ) is called 10 times on a typical .com page load for a logged in user. So, the caching does speed things up a bit, but we're not talking about huge times or the possibility of making extra db queries without the blogs_of_user cache.

Last edited 12 years ago by ryan (previous) (diff)

@ryan
12 years ago

Pull root blog check out of the loop.

#78 follow-up: @ryan
12 years ago

http://unit-tests.trac.wordpress.org/changeset/452

Includes a test case that catches the get_blogs_of_user() bug.

#79 in reply to: ↑ 78 @SergeyBiryukov
12 years ago

Replying to ryan:

Includes a test case that catches the get_blogs_of_user() bug.

Aside from caching, the test fails due to different sorting: $blog_ids is ordered, get_blogs_of_user() is not. 15458.testcase-get_blogs_of_user.patch fixes that. With 15458.9.diff applied, the test then passes correctly.

#81 @nacin
12 years ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from reopened to closed

In [18928]:

Remove caching from get_blogs_of_user(). Leave caching to usermeta and blog details. Speed up the loops. fixes #15458.

#82 @jk3us
12 years ago

  • Cc jk3us added

It looks like the patch for this has introduced some problems for installations that don't use a base_prefix. In WP 3.3, line 699 is causing two problems. First, a PHP warning is thrown by searching an empty string with strpos. Also, the "My sites" menu only contains the first site in that case, ignoring the rest. Checking the strlen of $wpdb->base_prefix before the strpos call fixes both of these, and appears to be what was done before this change was made.

See these three threads.

#83 @nacin
12 years ago

An empty base prefix is not supported. #16229

#84 @RMarks
12 years ago

  • Cc RMarks added

@nacin, I believe you're referencing the $table_prefix. My setting for this, per wp-config.php, is

$table_prefix  = '';

and I' am getting the error jk3us mentioned.

#85 @scribu
12 years ago

Related: #19595

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


8 years ago

Note: See TracTickets for help on using tickets.