WordPress.org

Make WordPress Core

Opened 5 weeks ago

Last modified 3 days ago

#42252 assigned enhancement

Use more granular caching for common queries in `WP_Site_Query`

Reported by: flixos90 Owned by: flixos90
Milestone: 5.0 Priority: normal
Severity: normal Version:
Component: Networks and Sites Keywords: needs-unit-tests has-patch
Focuses: multisite Cc:

Description

The new function get_site_by() that should serve as a modern replacement for get_blog_details() ended up not making it into 4.9, because it had several drawbacks of the latter related to caching, particularly caused by the caches used in WP_Site_Query being invalidated on any update to any site, which makes caching almost useless in a very big and active setup.

Better handling of caching specific queries that are common should be explored, particularly for queries which can be invalidated only by invalidating the caches for a specific site instead of relying on the aggressively invalidated last_changed. Let's look at how some of the old caches like blog-lookup or blog-id-cache work, and apply that to the modern behavior of WP_Site_Query.

While mentioning get_site_by() above, the changes should preferably be made to WP_Site_Query to benefit any other multisite area making such queries as well.

Attachments (5)

42252.diff (6.0 KB) - added by flixos90 5 weeks ago.
42252.2.diff (3.6 KB) - added by spacedmonkey 5 weeks ago.
42252.3.diff (6.2 KB) - added by spacedmonkey 3 weeks ago.
42252.4.diff (6.4 KB) - added by spacedmonkey 2 weeks ago.
42252.5.diff (6.3 KB) - added by flixos90 7 days ago.

Download all attachments as: .zip

Change History (18)

@flixos90
5 weeks ago

#1 @flixos90
5 weeks ago

42252.diff implements several caching improvements, some major, some minor. All of them are part of a new WP_Site_Query::get_cache_key( $query_args ) method. The goal of these changes is to bring the same granular handling that old caches such as blog-lookup use to WP_Site_Query in general and thus bring this solid foundation to any code that uses get_sites().

The most significant improvement and also kind of new concept introduces in the patch is that there is no longer just one last_changed key in the sites group. Instead there are several more last_changed keys, prefixed with md5-generated values from specific query arguments. There are currently four sets of arguments that are handled in a special way:

  • domain, path and network_id
  • domain and path
  • domain
  • path

If a query contains all arguments for one of the above bullet point without including any other "database field related" argument, a special last_changed key will be used which is generated by passing the concatenated values for the respective arguments (only the arguments of the respective bullet point above) through md5. The term "database field related" here means query arguments that deal with a specific database field, such as domain, path, ID, site__in, lang__in, public, to give only a few examples. Contrary, arguments that are not database field related would be for example number, offset, fields, no_found_rows, orderby. The latter of course do affect the outcome of a query - however they do not affect when the cache for a query needs to be invalidated (via a last_changed cache value).

While the logic is rather complex (and we may surely be able to improve it!), this change brings a significant improvement to these specific queries which are rather common. Notably, the technique of using more granular last_changed values provides much more flexibility than something like the existing blog-lookup cache, which only stores one result for a specific domain and path.

For example, both of the following sets would use md5( 'example.com' ) . '_last_changed' for their last changed key:

Set 1:

  • domain: example.com
  • number: 1

Set 2:

  • domain: example.com
  • number: 100
  • orderby: path
  • order: DESC

Another example, both of the following sets would use md5( 'example.com/foo/3' ) . '_last_changed' for their last changed key:

Set 1:

  • network_id: 3
  • domain: example.com
  • path: /foo/
  • number: 1

Set 2:

  • network_id: 3
  • domain: example.com
  • path: /foo/
  • number: 5
  • offset: 5
  • no_found_rows: false

These examples show the flexibility of this approach while not being subject to aggressive cache invalidation. To explain a bit further, the second example's cache would only be invalidated if a site with either network ID 3, domain example.com or path /foo/ was created/updated/deleted (see the change in clean_blog_cache() for that part specifically).

Concluding, I think this approach gives us the possibility to for example get get_site_by() merged as it currently is, as simply by making these improvements to WP_Site_Query it would have caching that is better than that in get_blog_details(). Furthermore, this will obviously benefit any query using get_sites(), as said before.

The code for this is admittedly rather complex, so don't hesitate to ask question or tear it apart. This is a new pattern for caching query results, so I hope we can get it to a solid state. Once we get there, WP_Network_Query (and possibly even non-multisite query classes) could be enhanced in the same way. Nicely enough, all existing unit tests passed for me with these changes. Of course we'll also need a bunch of tests for this behavior specifically, which I'll add later.

One last thing: Lines 707-724 of class-wp-site-query.php in the patch implement what we've discussed today during office-hours (plus I ignore update_site_cache completely, since it doesn't affect the query at all). I consider this a minor improvement and wanted to do it at the same time, but we could also take this out for now to make the patch better readable and focus on the more important things first.

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


5 weeks ago

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


5 weeks ago

@spacedmonkey
5 weeks ago

#4 @spacedmonkey
5 weeks ago

In 42252.2.diff I went a completely different direction with this.

So looking at the type of queries that WP_Site_Query is used for, these can be broken into 2, queries for a list of sites or query for a single site by domain / path / network_id. In my patch, it detects if query is for one site and uses different caching rules. If you are looking for a single site, then generate a cache key based on a domain/path/network_id. The cache lives forever, until it is invalidated in clean_blog_cache. This cache will have some massive improvements to profile of WordPress, but cahce invalidation will be the trick here.

I think if my patch in some form goes into core, it will have to go with #42284 to see the benefits.

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


4 weeks ago

@spacedmonkey
3 weeks ago

#6 @spacedmonkey
3 weeks ago

42252.3.diff is a improves on @flixos90 's original patch. Reviewing this again, I believe the he is on the right track. Improvements include.

  • Use get_current_network_id in clean_blog_cache to guess, the site_id of site object. It is a reasonable guess, you are cleaning the cache of current network.
  • Caches are generated for, following combos,
    • domain
    • path
    • network_id
    • domain + path
    • domain + path + network_id
    This should cover most types of queries.
  • Cache key salt uses arrays, to stop issues with site paths that are numbers
  • Check to see if multi value isset, to stop notice error.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


2 weeks ago

#8 @jeremyfelt
2 weeks ago

  • Milestone changed from Awaiting Review to 5.0

This is something we should focus on early in 5.0.

@spacedmonkey
2 weeks ago

#9 @spacedmonkey
2 weeks ago

42252.4.diff

Fixed wrong key used, in clean_blog_cache. site_id -> network_id.

#10 @flixos90
7 days ago

  • Keywords has-patch added; needs-patch removed
  • Owner set to flixos90
  • Status changed from new to assigned

I like the more dynamic approach for the creation of the $last_changed_prefix that @spacedmonkey suggested in his updates, and having support for $network_id as its own "group" totally makes sense.

In 42252.5.diff I made the following changes:

  • The logic for creating a special last_changed prefix is a bit more flexible: Now any kind of permutation of the domain, path and network_id arguments or subset (as sole arguments of course) will determine a special prefix. By iterating through the arguments in a predefined order, it is ensured that there is only one key for the same argument values.
  • The logic in clean_blog_cache() has been adjusted so that it clears all possibly relevant last_changed caches. The most complicated part to figure out here was how to reliably create all relevant subsets to clear, but eventually I found https://gist.github.com/christophervalles/1066801 to help me out.
  • Both the logic in WP_Site_Query::get_cache_key() and clean_blog_cache() ensure that all relevant values before serializing are parsed into strings (in case they aren't already). This is necessary to ensure that they can be reliably cleared (think about for example $network_id which may be a string or integer).
  • The $ignore_args array (containing arguments which are irrelevant for determining a last_changed value) is brought back, any particular concerns why you stripped that out in your patch @spacedmonkey?

With this current behavior, the only regarding the arguments we'd need to worry about in the future is to keep them in sync between WP_Site_Query::get_cache_key() and clean_blog_cache().

Still much needed are unit tests, which I will work on in the next week or two. Let's of course keep continuing to discuss the approach and patch in the meantime.

@flixos90
7 days ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


4 days ago

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


3 days ago

This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.


3 days ago

Note: See TracTickets for help on using tickets.