WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 8 weeks ago

#38741 accepted enhancement

Introduce the concept of a large site in order to speed up the Users screen when there are many users

Reported by: johnbillion Owned by: johnbillion
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Users Keywords: early has-patch dev-feedback needs-testing 2nd-opinion
Focuses: administration, multisite, performance Cc:
PR Number:

Description

Given a single site install with a high number of users, the Users screen can become very slow due to the nasty query performed by count_users(). The result of this query is used to populate the count of users with each role in the role filtering links across the top.

SELECT
	COUNT(NULLIF(`meta_value` LIKE '%\"administrator\"%', false)),
	COUNT(NULLIF(`meta_value` LIKE '%\"editor\"%', false)),
	COUNT(NULLIF(`meta_value` LIKE '%\"author\"%', false)),
	COUNT(NULLIF(`meta_value` LIKE '%\"contributor\"%', false)),
	COUNT(NULLIF(`meta_value` LIKE '%\"subscriber\"%', false)),
	COUNT(NULLIF(`meta_value` = 'a:0:{}', false)), COUNT(*)
FROM
	wp_usermeta
WHERE
	meta_key = 'wp_capabilities'

On a site with tens of thousands of users, this SQL query can easily take tens of seconds.

Multisite has the concept of a large network (by default, one with > 10,000 users or > 10,000 sites), but single site doesn't have this concept.

The query performed by count_users() can be skipped entirely. If it is skipped, the list of roles will simply display all available roles and won't display a count next to each role. I think this is a completely acceptable trade-off for such a performance gain on sites with a high number of users.

Attachments (11)

38741.diff (3.9 KB) - added by johnbillion 3 years ago.
38741.2.diff (5.5 KB) - added by tharsheblows 3 years ago.
with wp_is_large_site()
38741.3.diff (6.3 KB) - added by tharsheblows 3 years ago.
with tests and cleaned up a bit
38741.4.diff (8.5 KB) - added by tharsheblows 3 years ago.
renamed functions and apply filter before ms one
38741.5.diff (8.9 KB) - added by johnbillion 2 years ago.
38741.6.diff (8.4 KB) - added by johnbillion 2 years ago.
38741.7.diff (9.1 KB) - added by johnbillion 2 years ago.
38741.8.diff (12.2 KB) - added by spacedmonkey 18 months ago.
38741.9.diff (15.6 KB) - added by spacedmonkey 17 months ago.
38741.10.diff (16.2 KB) - added by macbookandrew 16 months ago.
Adds another fix for bulk-editing of posts
38741.11.diff (16.9 KB) - added by macbookandrew 5 months ago.
patch file for 5.2.1

Download all attachments as: .zip

Change History (68)

@johnbillion
3 years ago

#1 @johnbillion
3 years ago

  • Keywords has-patch needs-testing added

38741.diff is a partial patch which implements the performance improvement on the Users screen, but only implements a no-oped version of a wp_is_large_site() function.

To test the patch, change the return value of wp_is_large_site() to true.

TODO: Implement logic in wp_is_large_site() which determines if the site has a large number of users or not. This will probably need to be converted to a cached value in the same way that Multisite uses a cached value for wp_is_large_network().

#2 @Mista-Flo
3 years ago

+1 on this, we should focus on this performance issue.

#4 @tharsheblows
3 years ago

There's also an instance in wp-includes/update.php so I've updated that one too. This makes the count a transient with a max expiration of 12 hours and takes the query down from about 2.5s to 50.3ms with user table of 184k.

I'm not sure if:

  1. the value should be a transient
  2. the expiration time should be filterable

Still needs testing and tests.

@tharsheblows
3 years ago

with wp_is_large_site()

@tharsheblows
3 years ago

with tests and cleaned up a bit

#5 @tharsheblows
3 years ago

With tests and cleaned up a bit

#6 @johnbillion
3 years ago

  • Keywords has-unit-tests added
  • Milestone changed from Awaiting Review to 4.8

#7 @tharsheblows
3 years ago

Updated function names and applies before the 'wp_is_large_network' too.

@tharsheblows
3 years ago

renamed functions and apply filter before ms one

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by obenland. View the logs.


2 years ago

#11 @obenland
2 years ago

  • Milestone changed from 4.8 to Future Release

#12 @johnbillion
2 years ago

Needs some more testing and review.

#13 @johnbillion
2 years ago

@tharsheblows I just remembered that we already have get_user_count() which stores a count of all the users on the network. This can be used in place of the new user count functions that you've added in your patches.

Do you want to update your patch? I'd like to see about getting this into 4.9.

#14 @tharsheblows
2 years ago

@johnbillion that's multisite only, isn't it? In ms-functions.php.

I'd love to get this in 4.9! :)

@johnbillion
2 years ago

@johnbillion
2 years ago

@johnbillion
2 years ago

#15 @miss_jwo
2 years ago

Looks good to me. Thansk for changing the function name to wp_is_large_user_count() as the original suggestion would have been misleading in many ways.

#16 @johnbillion
2 years ago

  • Keywords needs-testing removed
  • Milestone changed from Future Release to 4.9
  • Owner set to johnbillion
  • Status changed from new to reviewing

#17 @johnbillion
2 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 41613:

Users: Introduce the concept of a large site in order to speed up the Users screen when there are many users.

Calling the count_users() function is expensive, regardless of the counting strategy that's used, and it gets
slower the more users there are on a site. In order to speed up the Users screen in the admin area, calling
count_users() can be avoided entirely while still displaying the total count for users.

This introduces some new functions:

  • wp_is_large_user_count()
  • wp_get_active_user_count()
  • wp_update_active_user_count()

A corresponding wp_is_large_user_count filter is also introduced.

Props tharsheblows, johnbillion

Fixes #38741

#18 @johnbillion
2 years ago

In 41614:

Users: There is not, in fact, 12345 users on every WordPress installation.

Props spacedmonkey

See #38741

#19 @johnbillion
2 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch has-unit-tests removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Some issues have arisen:

  • The function names need improving as nothing here relates to whether a user is "active" or not.
  • COUNT(ID) should be used to better utilise the table index.
  • The tests are failing ¯\_(ツ)_/¯

#20 @johnbillion
2 years ago

In 41623:

Users: Remove the failing tests added in [41613] while they're investigated.

See #38741

#21 @jrchamp
2 years ago

If you don't want to worry about the right field to use to match the available indexes, COUNT(1) might be a simpler, consistent option that the database can optimize easily.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 years ago

#23 @johnbillion
2 years ago

  • Focuses multisite added
  • Status changed from reopened to accepted

I'm going back to the idea of moving get_user_count() out of the Multisite files and making it available for single site installations too. Otherwise we have separate functions (get_user_count() and the soon-to-be-renamed wp_get_active_user_count()) for essentially doing the same thing.

Patch coming up.

Adding the multisite focus even though this functionality doesn't actually affect Multisite.

#24 @johnbillion
2 years ago

  • Keywords needs-revert added; needs-patch needs-unit-tests removed

Regrettably I think this needs some more work and I don't want to rush it in today. I'm going to revert this and get it into 5.0 early.

#25 @johnbillion
2 years ago

In 41753:

Users: Revert [41613], [41614], and [41623] as this feature needs some more work.

See #38741

#26 @johnbillion
2 years ago

  • Keywords early needs-patch added; needs-revert removed
  • Milestone changed from 4.9 to Future Release

#27 @johnbillion
2 years ago

  • Milestone changed from Future Release to 5.0

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


2 years ago

#29 @tharsheblows
2 years ago

I'm currently using 38741.7.diff(ish — with 4.8.3) on a production site and will let you know if it all goes horribly wrong. (It hasn't yet.)

#30 @flixos90
19 months ago

@johnbillion I like the idea of tying in more with the existing multisite functionality, and I actually prefer the initial name wp_is_large_site() as it would align well with the existing wp_is_large_network(). It could have a parameter defaulting to users, which determines what criteria to use to determine whether the site is considered large. This would make it future-proof for other similar ideas (maybe wp_is_large_site( 'posts' ) could come in handy at some point too.
I don't like the wp_is_large_user_count() as it's unclear how that works (it counts all users, in multisite that's the entire setup). I also don't think we should need a separate additional filter to wp_is_large_network() just for that purpose. The new function should be there for a site. It should be available for a single site, but just as useful when having multisite enabled.

I propose the following:

  • Introduce wp_is_large_site( $using = 'users', $site_id = null ). The second parameter defaults to the current site, but allows to do the check for another site in multisite.
  • The option where the number is stored should be called user_count. I don't see why the 'active' is needed here at all.
  • The SQL query to count the users can remain as is for single site (single all users = site users here), but in multisite it should have a meta table join and do WHERE umeta.meta_key = 'wp_X_capabilities' (depending on the site checked).
  • wp_update_active_user_count() should be wp_update_site_user_counts( $site_id = null ) instead, to align with wp_update_network_user_counts( $network_id = null ).
  • I really like the idea of leveraging get_user_count() for sites as well, however I'm not sure how to sanely do so as it already has a $network_id as first parameter. We could create get_site_user_count() but that'd be kind of incoherent (unless we'd also have get_network_user_count() - but maybe we could consider refactoring this).

#31 @tharsheblows
19 months ago

@flixos90 It'd be useful for me to take a step back and clarify what wp_is_large_site('users') is meant to do – in my mind it's a kill switch for functions that are too expensive to run when a table is large, eg count_users() on a single site install. In this context only the size of the table matters. For example, if a site in a multisite network only has 10 users but the network has 20,000 then it's not a good idea to run a slow query on it.

My worry is that people will use wp_is_large_site( 'users' ) as a check before running expensive queries on both single site and multisite installs when this might not be what they wanted. One possible workaround is to use wp_is_large_network() as the multisite check for wp_is_large_site('users') rather than the query on usermeta when it's used on multisite and keep the rest the same because those are useful functions. Extending to posts and other tables could then be done on a site by site basis.

This ticket was mentioned in Slack in #core-multisite by tharsheblows. View the logs.


18 months ago

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


18 months ago

#34 @flixos90
18 months ago

We reviewed and discussed this in today's multisite office-hours (see https://wordpress.slack.com/archives/C03BVB47S/p1527609680000818 for the chat log). Here are the decisions made during the conversation:

  • Multisite's get_user_count() has historically contained the global user count already. While it's stored on every network, it is actually a global count. Let's make this function available for single site.
    • The function supports an optional $network_id parameter. This was just recently introduced (by myself, and I regret it given these circumstances), and it doesn't affect the outcome of the function, so we can deprecate it again. This should no longer be considered a network-specific function.
    • On single-site, the user_count should be stored as a regular option. When setting up multisite, it should be migrated to become a network setting.
  • wp_is_large_user_count() should be introduced as a globally available function to check the user count. It should call get_user_count() to get its value and use 10000 as a maximum border by default. A filter should be present to modify the result of this function.
  • The multisite function wp_is_large_network() should be updated to call wp_is_large_user_count() if called with 'users' as the metric parameter. It won't have any functional change, just get rid of some logic that would otherwise be duplicated.
  • wp_is_large_user_count() should be used in count_users() to short-circuit as needed.
Last edited 18 months ago by flixos90 (previous) (diff)

#35 @spacedmonkey
18 months ago

  • Keywords has-patch dev-feedback needs-testing added; needs-patch removed

38741.8.diff is a completely new patch based on the feedback from @flixos90 and @jeremyfelt in multisite chat.

What this does, it makes get_user_count (and related functions) available in single site. This also means that there is a scheduled event that runs on single site twice a day now to get this count and store in options. This is a better implementation than doing a count on every site on a multisite, as for large networks, running this on every site is expensive.

If the site is seen as large, it now doesn't include counts and all the registered roles are displayed. This is a functional change and may require some discussion.

#36 @tharsheblows
18 months ago

Moving get_user_count to wp-includes/functions.php so it's available for single sites might cause issues with plugins using the function name to do the same thing. I vaguely remember discussing this with @johnbillion at a contributor day which is why the initial patches didn't do that. Sorry that discussion wasn't included here!

#37 @spacedmonkey
18 months ago

Well I simple solution to that problem would be calling the function in wp_get_global_user_count, which is a better name, deprecate get_user_count and get it call wp_get_global_user_count inside it.

#38 @flixos90
18 months ago

Even more than with registered post types, metadata, options and anything else, declaring functions without a prefix is an absolute no-go unless you are a 100% sure you know what you're doing. I don't think we should change the name of get_user_count() just because someone might have declared it. We should only change the name if we actually think the name is not correct or precise enough.

This change will be documented as well, so if someone is in fact declaring get_user_count() on their own, they should at least be aware of what they're doing is critical and be responsible of following core development in that regard closely enough.

#39 @tharsheblows
18 months ago

:) Y'all are harsh.

Having "network" in the function names for single sites feels strange to me, it's contextless and a bit confusing. I would think it's probably worth doing what you suggested @spacedmonkey to have clearer names in all those functions.

I'm not sure that single site needs a scheduled twice daily user count update on single sites either, it's fine to run it on every registration and deletion, it's a quick query. For multisite, the spam and deleted columns aren't indexed so it probably makes sense there, I think that's how it currently works. (But if this function is to be used in a kill switch for expensive queries on the users table, then it needs to be the same query as single site, ie SELECT COUNT(ID) FROM $wpdb->users – it's the size of the table that matters then)

In general, I love that WordPress is sympathetic to new developers and those who might have done it wrong in the past and never fixed it. If avoiding a function name collision so that we don't brick their sites is easy, then why not do it? Using something like wp_get_user_count has the added benefit of fitting in with other WP function names.

#40 @spacedmonkey
18 months ago

I agree. Here is what I think we should do.

We should deprecate get_user_count as it currently allows for network_id to be passed. It is confusing in a single site context. We should remove all references (there is only 4 in code).

Introduce wp_get_user_count function, that doesn't have any params on it. This fixes the naming issues and is a valid reason to get rid of the get_user_count.

When I get a minute, I will update the patch.

#41 @flixos90
18 months ago

That is an argument I agree with. :)

Before we deprecate, I'd like to know though how often this function is used by plugins. We probably wanna be kind if tons of plugins use it.
And one aside, if we deprecate get_user_count(), we should as well consider deprecating get_blog_count() as this should then rather be named get_network_site_count() (contrary to get_user_count(), the current get_blog_count() actually works per network). This would not be part of this ticket, but something to be aware of and think about.

#42 @spacedmonkey
17 months ago

As of 38741.9.diff there are the following changes

  • Added wp_get_user_count function, with the logic contained in get_user_count but no network id param
  • Deprecated get_user_count
  • Deprecated $network_id in wp_is_large_network as unable to pass to get_user_count as wp_get_user_count now in use.
  • Removed all references to get_user_count in code.

#43 @macbookandrew
16 months ago

We need this for a site we’re currently working on. I manually applied 38741.9.diff to WP core and it seems to be working great.

Is there anything I can do to help with this ticket?

@macbookandrew
16 months ago

Adds another fix for bulk-editing of posts

#44 follow-up: @macbookandrew
16 months ago

Found another place where a large number of users is a performance hit: wp-admin/includes/class-wp-posts-list-table.php calls wp_dropdown_users when generating the list of authors for bulk edit mode.

38741.10.diff includes a fix for that.

Is it ok to include here or should I open a separate ticket instead?

This ticket was mentioned in Slack in #core by macbookandrew. View the logs.


16 months ago

#46 in reply to: ↑ 44 @spacedmonkey
16 months ago

Replying to macbookandrew:

Found another place where a large number of users is a performance hit: wp-admin/includes/class-wp-posts-list-table.php calls wp_dropdown_users when generating the list of authors for bulk edit mode.

38741.10.diff includes a fix for that.

Is it ok to include here or should I open a separate ticket instead?

Thanks for the input @macbookandrew . However, I believe that the issues with should addressed in another tickets. For a little history have a look at this ticket #40613 that would likely fix issues wp_dropdown_users.

#48 @johnbillion
13 months ago

  • Milestone changed from 5.0 to 5.1

#49 @pento
10 months ago

  • Milestone changed from 5.1 to 5.2

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


10 months ago

#51 @desrosj
8 months ago

  • Milestone changed from 5.2 to 5.3

This has not seen any action in several months, and is marked early. Since we are late in the 5.2 timeline, going to punt to 5.3.

@macbookandrew
5 months ago

patch file for 5.2.1

#52 @Mista-Flo
5 months ago

Patch looks good to me, great job @macbookandrew :)

#53 @johnjamesjacoby
2 months ago

Coming in blind to this issue, the proposed solutions seem somewhat compromised and inconsistent. Multisite is already a mess of function names, and these new ones don't provide much in the way of obviousness or clarity.

I agree with @flixos90, and do not like wp_is_large_user_count() as a function name - its intended purpose is not immediately clear without the additional context in the comments in this ticket, especially without corresponding wp_is_large_site_count() and wp_is_large_network_count() functions to complete the use-case coverage.

I'm also reluctant to deprecate the $network_id parameter in wp_is_large_network() because its intended purpose is to determine whether or not any specific network is large, either by number of sites or users.

(_network_ functions should be doing Network things, just like _site_ functions (usually) do Site things.)

I also do not agree that not passing a $network_id (or using it when in single-site) is wrong here. In either case, get_current_blog_id() and get_current_network_id() should be trusted to do their jobs. If they can't be, they should be updated to allow them to be.

I suggest we introduce a new is_large_install() function (to use for both single-site and multi-site) because that appears to be the metric we are trying to gauge here. That would also partner up well with is_subdomain_install() and friends.


Regarding how the counts themselves are updated, Comments sets a precedent (as do Term Relationships) with how they defer their counts using wp_defer_comment_counting(), wp_update_comment_count(), and wp_update_comment_count_now(), and I believe their approaches are as sound as any.

They include queries, caches, and persistent values in the database, all relative to their object hierarchy.

I suggest we go that route for all of these counts instead of reinventing this wheel.

#54 @jb510
2 months ago

I found this looking into why count_users() was causing many pages in the admin to drag (and in this case only 6000 users). It's not just a problem on wp-admin/users.php.

+1 on sane function naming, but I find @johnjamesjacoby's suggestion confusing as well. I'm not clear what is_large_install() would specifically check for? Everything? Why? Does it still have functions to check for individual things being large?

We have very large sites in terms of posts, in terms of comments, in terms of users, in terms of media assets and probably other things ... and they don't generally coincide.

It doesn't make sense to me to short circuit comment counts just because you have a lot of users, or vice versa, which I presume is_large_install would do.

#55 @johnjamesjacoby
2 months ago

@jb510 good point about my suggested function name. You’re right that it is neither more obvious nor clear. Oops. 🤣

Ideally, my second suggestion (related to using the Defer approach to counts) eliminates the slowness of functions like count_users() making “large thing” type functions wholly unnecessary (because the counts would all be inexpensive to retrieve from their now-persistent locations.)

Rather than introduce new “large thing” functions, maybe we can remove the one we have instead?

#56 @jb510
2 months ago

I love the idea of deferred counting and caching these types of things, but I've never seen "core" be receptive to that sort of approach.

Coming into this myself, fixing count_users() directly rather than adding an abstraction that results in inconsistent results (why do I see user counts on some sites and not on others?!?!) seems a better approach. It just may not be possible. Even deferred a query into a 250k row table is going to time out for a lot of folks.

#57 @davidbaumwald
8 weeks ago

  • Keywords 2nd-opinion added
  • Milestone changed from 5.3 to Future Release

This ticket needs some additional direction, a final patch, and testing. With version 5.3 Beta 1 tomorrow, this is being punted to Future Release. This the remaining issues can be resolved and a patch committed in time, please feel free to move it back.

Note: See TracTickets for help on using tickets.