Make WordPress Core

Opened 9 years ago

Last modified 22 months ago

#31746 reviewing defect (bug)

get_blogs_of_user() can be very slow when a user is a member of thousands of sites

Reported by: jtsternberg's profile jtsternberg Owned by: johnbillion's profile johnbillion
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.0
Component: Users Keywords: has-patch
Focuses: multisite, performance Cc:

Description

When a user belongs to thousands of sites, the array of $blogs can become rather large. Considering get_blogs_of_user is called in many places on every page-load when a user is logged in, Some optimization should happen here. Not counting the multisite admin pages, get_blogs_of_user() is called from the functions is_user_member_of_blog(), get_dashboard_url(), and WP_Admin_Bar::initialize(). I propose the results of the function to at least be stored to a static variable to prevent multiple lookups to the same information. I also propose that a pre-fetch filter (pre_get_blogs_of_user ?) be added so that a plugin can override that information and provide something more efficient if the need arises.

Attachments (3)

31746.diff (1.1 KB) - added by jtsternberg 9 years ago.
Adds pre_get_blogs_of_user filter and sets a static variable to keep from re-fetching data
31746.2.diff (4.0 KB) - added by jtsternberg 8 years ago.
31746.3.diff (4.7 KB) - added by jtsternberg 8 years ago.
same as 31746.2.diff, but with a supporting unit test

Download all attachments as: .zip

Change History (36)

@jtsternberg
9 years ago

Adds pre_get_blogs_of_user filter and sets a static variable to keep from re-fetching data

#1 @SergeyBiryukov
9 years ago

When a user belongs to thousands of sites, the array of $blogs can become rather large.

I encountered this issue as well. I help manage a network where WordPress is used as an LMS, and one of the super admins is also a trainer added to each student's blog.

get_blogs_of_user() was making hundreds of queries for that account on each page load in the admin, often leading to slow response times and out of memory errors.

A hacky workaround was to short-circuit the function for super admins, who have access to all the blogs anyway:

function wp31746_skip_get_blogs_of_user_for_super_admin( $null, $object_id, $meta_key, $single ) {
	global $wpdb;

	if ( $meta_key || ! is_super_admin() ) {
		return $null;
	}

	remove_filter( 'get_user_metadata', __FUNCTION__, 10, 4 );

	$keys = get_user_meta( $object_id );

	add_filter( 'get_user_metadata', __FUNCTION__, 10, 4 );

	foreach ( $keys as $key => $value ) {
		if ( 'capabilities' !== substr( $key, -12 ) ) {
			continue;
		}

		if ( $wpdb->base_prefix && 0 !== strpos( $key, $wpdb->base_prefix ) ) {
			continue;
		}

		$blog_id = str_replace( array( $wpdb->base_prefix, '_capabilities' ), '', $key );
		if ( ! is_numeric( $blog_id ) ) {
			continue;
		}

		unset( $keys[ $key ] );
	}

	return $keys;
}
add_filter( 'get_user_metadata', 'wp31746_skip_get_blogs_of_user_for_super_admin', 10, 4 );

#2 @MikeHansenMe
9 years ago

  • Keywords has-patch added

#3 @DrewAPicture
9 years ago

  • Version changed from trunk to 3.0

#4 @jtsternberg
9 years ago

I recognize this area will not get much attention as it is pretty much an edge-case, but the proposed filter is the least invasive method and will allow custom handling for this situation. Would love to get some more eyes/feedback on this ticket.

#5 @SergeyBiryukov
9 years ago

  • Milestone changed from Awaiting Review to Future Release

#6 @flynsarmy
9 years ago

I would also love to see this filter implemented as the default WP admin bar calls get_blogs_of_user() twice (once directly and once indirectly) AND admin-header.php calls it indirectly as well. That's a huge number of queries when a user has lots of sites.

#7 @johnbillion
8 years ago

  • Milestone changed from Future Release to 4.4
  • Owner set to johnbillion
  • Status changed from new to reviewing

#8 @johnbillion
8 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from 4.4 to Future Release

Couple of issues with this patch.

  • The use of the static means this isn't unit testable when the function is re-called with a different user ID, and thus causes the tests in Tests_Multisite_User to fail. The static could be an array keyed by user ID, maybe.
  • If the pre_get_blogs_of_user filter returns an empty array because the user doesn't have any blogs, the rest of the processing in the function still runs. The empty() check should probably be removed.

@jtsternberg
8 years ago

@jtsternberg
8 years ago

same as 31746.2.diff, but with a supporting unit test

#9 @jtsternberg
8 years ago

  • Keywords has-patch added; needs-patch removed

Thanks for the feedback John. 31746.2.diff has been modified to 1) keep a keyed array of user blogs and 2) defaults the original pre_get filter value to null (which follows a pretty common precedent among other pre_ filters), and only check if the result of the filter is an array. The tests in Tests_Multisite_User now all pass, and I've added an additional test for the pre_get filter.

#10 @SergeyBiryukov
8 years ago

#34448 was marked as a duplicate.

#11 @mnelson4
8 years ago

Would also like to see this patch make it to core as we're experiencing this issue as well.

#16853 is very similar.

#12 follow-up: @ericlewis
8 years ago

Can we use the object caching API here to store the result persistently?

#13 in reply to: ↑ 12 ; follow-up: @jtsternberg
8 years ago

Replying to ericlewis:

Can we use the object caching API here to store the result persistently?

What should/would trigger cache invalidation?

#14 @nerrad
8 years ago

I think the patch proposed here is a good interim measure, it at least allows for hooking in to modify the number of blogs returned and some temporary caching of results.

However long term, I think it'd be better to introduce paging to this. I realize there would be a number of implications to any ui that uses this function but that's something that needs improved anyways (for user's that legitimately _belong_ to a large number of blogs anyways).

The good thing about this patch though, is it helps makes it possible for a feature plugin to be built that tests out using paging and what that affects.

#15 in reply to: ↑ 13 @ericlewis
8 years ago

Replying to jtsternberg:

What should/would trigger cache invalidation?

Not sure off the top of my head. Someone would need to take a good look at the ways a user is added to and removed from a site. remove_user_from_blog() and add_user_to_blog() exist, but I'm not sure if there would be other places to tap into. cc @jeremyfelt

#16 @boonebgorges
8 years ago

What should/would trigger cache invalidation?

Blog membership is determined by caps, which is stored in wp_x_capabilities in usermeta. Invalidation could happen at a low-level, in WP_User->add_role() set_role() remove_role() etc, or even by intercepting update_option() calls if we really wanted to cover all possible points.

#17 @artilibere
8 years ago

could we just add a capability like "memberof_network" given at administrator level?

This ticket was mentioned in Slack in #core-multisite by richardtape. View the logs.


8 years ago

#19 follow-up: @ericlewis
8 years ago

Alternatively, we could try to avoid calling get_blogs_of_user() in the places @jtsternberg mentioned in the original post.

#20 in reply to: ↑ 19 @nerrad
8 years ago

Replying to ericlewis:

Alternatively, we could try to avoid calling get_blogs_of_user() in the places @jtsternberg mentioned in the original post.

There still would need to be a replacement for what its called for. Long term, there still needs to be something to allow get_blogs_of_user() to be more performant. The main reason why this function is so slow is because of this line:

$keys = get_user_meta( $user_id );. All info about the blogs a user belongs to is inferred from data used to store capabilities for a user on a blog AND thus ALL user meta has to be retrieved in order to grab that data. Then each meta key/value is LOOPED through and an additional query is done to get the blog details (querypalooza). This means (contrary to my earlier suggestion) efficient paging can't be done.

Long term, there needs to be a better way to indicate the blogs a user belongs to that can be

  1. easily queried.
  2. can be paged.

A couple ways (so far) that I can think of for doing this are:

Option 1: Track the blogs a user belongs to via its own user_meta key/value pair.

Each blog a user belongs to is a record with a common prefix (maybe user_blog => {$blog_id}). With that, the query for getting all the blogs a user belongs to can be simplified and pages (and even to some degree, joined with the blog table.

Option 2: Add a new table

Another way of accomplishing a more performant way to get the blogs a user belongs to is to rework how capabilities (gasp another core wp system) are done a bit so capabilities are tracked in a separate table called user_capabilities where we have:

user_cap_id | blog_id | user_id | capabilities

On single site installs blog_id will always be 1 and this table will have little information. However on multi-sites this table because immediately more useful because its more scalable. It serves as both a join table for user -> blog and tracks the capabilities the user has on that blog. Feasibly this could also be used to track user information that is specific to blog.

Wrap Up

Option one is definitely the quicker win because it requires less code changes elsewhere. However, option 2 gives us the benefits of less data stored in the database, more scalability, and clearer representation of the data. The tradeoff of course is the impact on the capability system (ugh). The ONLY reason I suggested using this for storing capabilities is because it doesn't make sense to create a brand new table just for joining blogs to users (although there's benefits to doing so to) and IF we create a new table, we should try to think of other problems this new table could help solve.

#21 @nerrad
8 years ago

Just thought of an Option 3 add a blog_id column to the user_meta column. Then instead of having to prefix any meta_key specific to a blog you just put the blog_id in its OWN column. Then any queries retrieving the blogs a user belongs to can just grab distinct blog ids.

#22 @jtsternberg
8 years ago

I guess I'm biased, but I really wish we would push the patch through. It's A) helping with performance by bringing those queries down to 1 per page load, and B) provides a filter whereby we can do alternate performance feasibility testing (new table, caching, etc). The patch provides an instant performance improvement on systems where users are part of many blogs, and provides an 'off' switch on systems where users belong to too many sites for it to be a useable UI.

Last edited 8 years ago by jtsternberg (previous) (diff)

#23 @nerrad
8 years ago

yah I agree as stated earlier, the short-term solution is to get a filter in.

#24 @mnelson4
8 years ago

+1 to short term fix at least; more long-term refactors etc should probably be discussed in a separate ticket

This ticket was mentioned in Slack in #core-multisite by websupporter. View the logs.


8 years ago

#26 @nerrad
8 years ago

#36707 was created for specifically getting the filter initially proposed put in, that way we're able to keep this ticket for continued discussion of a longer term solution as discussed in slack.

#27 @jeremyfelt
8 years ago

In 37326:

Multisite: Add the pre_get_blogs_of_user filter

This allows a plugin to short circuit get_blogs_of_user() in cases where the default behavior of the function is unnecessary or slow. (e.g. A user is a member of thousands of sites.)

Props jsternberg.
See #31746, Fixes #36707.

#28 @jeremyfelt
8 years ago

#36815 was marked as a duplicate.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


7 years ago

#30 @johnbillion
2 years ago

  • Milestone set to Awaiting Review

#32 @iandunn
22 months ago

  • Focuses performance added

#33 @iandunn
22 months ago

#55911 was marked as a duplicate.

Note: See TracTickets for help on using tickets.