Opened 7 years ago
Last modified 6 years ago
#42252 assigned enhancement
Use more granular caching for common queries in `WP_Site_Query`
Reported by: | flixos90 | Owned by: | flixos90 |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Networks and Sites | Keywords: | has-patch has-unit-tests |
Focuses: | multisite | Cc: |
Description
The new function get_site_by()
that should serve as a modern replacement for get_blog_details()
ended up not making it into 4.9, because it had several drawbacks of the latter related to caching, particularly caused by the caches used in WP_Site_Query
being invalidated on any update to any site, which makes caching almost useless in a very big and active setup.
Better handling of caching specific queries that are common should be explored, particularly for queries which can be invalidated only by invalidating the caches for a specific site instead of relying on the aggressively invalidated last_changed
. Let's look at how some of the old caches like blog-lookup
or blog-id-cache
work, and apply that to the modern behavior of WP_Site_Query
.
While mentioning get_site_by()
above, the changes should preferably be made to WP_Site_Query
to benefit any other multisite area making such queries as well.
Attachments (8)
Change History (32)
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
7 years ago
This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.
7 years ago
#4
@
7 years ago
In 42252.2.diff I went a completely different direction with this.
So looking at the type of queries that WP_Site_Query is used for, these can be broken into 2, queries for a list of sites or query for a single site by domain / path / network_id. In my patch, it detects if query is for one site and uses different caching rules. If you are looking for a single site, then generate a cache key based on a domain/path/network_id. The cache lives forever, until it is invalidated in clean_blog_cache
. This cache will have some massive improvements to profile of WordPress, but cahce invalidation will be the trick here.
I think if my patch in some form goes into core, it will have to go with #42284 to see the benefits.
This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.
7 years ago
#6
@
7 years ago
42252.3.diff is a improves on @flixos90 's original patch. Reviewing this again, I believe the he is on the right track. Improvements include.
- Use
get_current_network_id
in clean_blog_cache to guess, the site_id of site object. It is a reasonable guess, you are cleaning the cache of current network. - Caches are generated for, following combos,
- domain
- path
- network_id
- domain + path
- domain + path + network_id
- Cache key salt uses arrays, to stop issues with site paths that are numbers
- Check to see if multi value isset, to stop notice error.
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
7 years ago
#8
@
7 years ago
- Milestone changed from Awaiting Review to 5.0
This is something we should focus on early in 5.0.
#10
@
7 years ago
- Keywords has-patch added; needs-patch removed
- Owner set to flixos90
- Status changed from new to assigned
I like the more dynamic approach for the creation of the $last_changed_prefix
that @spacedmonkey suggested in his updates, and having support for $network_id
as its own "group" totally makes sense.
In 42252.5.diff I made the following changes:
- The logic for creating a special
last_changed
prefix is a bit more flexible: Now any kind of permutation of thedomain
,path
andnetwork_id
arguments or subset (as sole arguments of course) will determine a special prefix. By iterating through the arguments in a predefined order, it is ensured that there is only one key for the same argument values. - The logic in
clean_blog_cache()
has been adjusted so that it clears all possibly relevantlast_changed
caches. The most complicated part to figure out here was how to reliably create all relevant subsets to clear, but eventually I found https://gist.github.com/christophervalles/1066801 to help me out. - Both the logic in
WP_Site_Query::get_cache_key()
andclean_blog_cache()
ensure that all relevant values before serializing are parsed into strings (in case they aren't already). This is necessary to ensure that they can be reliably cleared (think about for example$network_id
which may be a string or integer). - The
$ignore_args
array (containing arguments which are irrelevant for determining alast_changed
value) is brought back, any particular concerns why you stripped that out in your patch @spacedmonkey?
With this current behavior, the only regarding the arguments we'd need to worry about in the future is to keep them in sync between WP_Site_Query::get_cache_key()
and clean_blog_cache()
.
Still much needed are unit tests, which I will work on in the next week or two. Let's of course keep continuing to discuss the approach and patch in the meantime.
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
7 years ago
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
7 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
7 years ago
This ticket was mentioned in Slack in #core-multisite by flixos90. View the logs.
7 years ago
#15
@
7 years ago
42252.7.diff applies cleanly against trunk after the coding standards commit, and also fixes a bug in WP_Site_Query::get_cache_key()
where a temporary loop variable would accidentally override another variable used later on, previously causing two tests to fail.
#16
@
7 years ago
- Keywords has-unit-tests added; needs-unit-tests removed
42252.8.diff adds tests for the different caches.
It essentially adds only one test, however it uses systematically generated datasets for it, so that all cases of domain/path/network ID combinations are covered, making it sufficient for the change. In addition, each set is tested with both a random site field updated and then again with a field updated that is used as orderby
in the query, to ensure those fields cause a change accordingly.
You will notice that some existing tests have been modified. However this is only to account for the three new sites that have been added in wpSetUpBeforeClass()
and are used for the new test.
Missing are tests that ensure that when one of the special fields is changed on a site, the caches for both the new and old value are cleared. However those would currently fail anyway since WordPress doesn't have logic for this in place. #40364 implements that logic, therefore I suggest we merge that one first (when it's ready), and then proceed with this one here.
42252.diff implements several caching improvements, some major, some minor. All of them are part of a new
WP_Site_Query::get_cache_key( $query_args )
method. The goal of these changes is to bring the same granular handling that old caches such asblog-lookup
use toWP_Site_Query
in general and thus bring this solid foundation to any code that usesget_sites()
.The most significant improvement and also kind of new concept introduces in the patch is that there is no longer just one
last_changed
key in thesites
group. Instead there are several morelast_changed
keys, prefixed with md5-generated values from specific query arguments. There are currently four sets of arguments that are handled in a special way:domain
,path
andnetwork_id
domain
andpath
domain
path
If a query contains all arguments for one of the above bullet point without including any other "database field related" argument, a special
last_changed
key will be used which is generated by passing the concatenated values for the respective arguments (only the arguments of the respective bullet point above) through md5. The term "database field related" here means query arguments that deal with a specific database field, such asdomain
,path
,ID
,site__in
,lang__in
,public
, to give only a few examples. Contrary, arguments that are not database field related would be for examplenumber
,offset
,fields
,no_found_rows
,orderby
. The latter of course do affect the outcome of a query - however they do not affect when the cache for a query needs to be invalidated (via alast_changed
cache value).While the logic is rather complex (and we may surely be able to improve it!), this change brings a significant improvement to these specific queries which are rather common. Notably, the technique of using more granular
last_changed
values provides much more flexibility than something like the existingblog-lookup
cache, which only stores one result for a specific domain and path.For example, both of the following sets would use
md5( 'example.com' ) . '_last_changed'
for their last changed key:Set 1:
domain
:example.com
number
:1
Set 2:
domain
:example.com
number
:100
orderby
:path
order
:DESC
Another example, both of the following sets would use
md5( 'example.com/foo/3' ) . '_last_changed'
for their last changed key:Set 1:
network_id
:3
domain
:example.com
path
:/foo/
number
:1
Set 2:
network_id
:3
domain
:example.com
path
:/foo/
number
:5
offset
:5
no_found_rows
:false
These examples show the flexibility of this approach while not being subject to aggressive cache invalidation. To explain a bit further, the second example's cache would only be invalidated if a site with either network ID
3
, domainexample.com
or path/foo/
was created/updated/deleted (see the change inclean_blog_cache()
for that part specifically).Concluding, I think this approach gives us the possibility to for example get
get_site_by()
merged as it currently is, as simply by making these improvements toWP_Site_Query
it would have caching that is better than that inget_blog_details()
. Furthermore, this will obviously benefit any query usingget_sites()
, as said before.The code for this is admittedly rather complex, so don't hesitate to ask question or tear it apart. This is a new pattern for caching query results, so I hope we can get it to a solid state. Once we get there,
WP_Network_Query
(and possibly even non-multisite query classes) could be enhanced in the same way. Nicely enough, all existing unit tests passed for me with these changes. Of course we'll also need a bunch of tests for this behavior specifically, which I'll add later.One last thing: Lines 707-724 of
class-wp-site-query.php
in the patch implement what we've discussed today during office-hours (plus I ignoreupdate_site_cache
completely, since it doesn't affect the query at all). I consider this a minor improvement and wanted to do it at the same time, but we could also take this out for now to make the patch better readable and focus on the more important things first.