Opened 16 months ago
Last modified 62 minutes ago
#59516 accepted defect (bug)
Improve cache key generation in query classes
Reported by: | spacedmonkey | Owned by: | pbearne |
---|---|---|---|
Milestone: | 6.8 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Query | Keywords: | has-patch changes-requested early |
Focuses: | performance | Cc: |
Description
In query classes such as WP_Query, parameters like post_type enable the passing of an array of values. For instance, you can use post_type like this: ['post', 'page']. However, if a subsequent request is made with the array elements in a different order, like so: ['page', 'post'], it would lead to the generation of a different cache key, potentially causing the existing cache to be missed. Even though the cached results may be the same, this could result in a cache collision. It is advisable to reuse caches whenever possible to optimize performance.
This would effect the following cache keys.
- WP_Query
- WP_Term_Query
- WP_Network_Query
- WP_Site_Query
- WP_Comment_Query
Some sort of array sorting of the key values would mean that caches could be reused.
Change History (33)
This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.
16 months ago
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
16 months ago
#7
@
16 months ago
- Keywords needs-patch added
- Owner set to thekt12
- Status changed from new to assigned
Initial exploration into fixing this for the WP_Query
class was started in #59492. I've closed as a duplicate so we can keep discussion for all of the query classes in one place. I'm assigning this to @thekt12 for now, since he was already working on the initial implementation.
This ticket was mentioned in PR #5347 on WordPress/wordpress-develop by @thekt12.
16 months ago
#8
- Keywords has-patch has-unit-tests added; needs-patch removed
Trac ticket: https://core.trac.wordpress.org/ticket/59516
This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.
15 months ago
#10
@
15 months ago
Findings:
WP_Query
Inside WP_Query::get_posts(), cache key is generated using
generate_cache_key( $q, $new_request );
; where $q
is query args and $new_request
is the sql that is formed, (check code here)
$cache_key = $this->generate_cache_key( $q, $new_request );
Setting of cache for the $cache_key happens at 3 ( Case1, Case2, Case3 ) places based on different conditions, all of which consumes the same $cache_key generated above and there is no modification in between.
For Case1 and Case2, there is a slight modification to the output obtained from the query based on the value set in $q['fields']
. In current scenario $q['fields']=='ID'
and $q['fields']=='ids'
can have slightly different cache values as former doesn’t go through the following condition which has a array_map( 'intval', $this->posts );
(this seems to be a minor bug)
In both Case1 and Case2 there is no filter in between key generation and cache setting which can modify the output base on any other values from args/$q.
However, for Case3 there is a possibility that the logic may enter the split_the_query
logic ( if there are less than 500 posts and we split the query to get ids of those 500 posts and fetch them manually and prime them). The split_the_query
condition contains a filter posts_request_ids
that could modify the request post key generation, this is a bug that is addressed in (Ticket#59661) PR#5513. This PR introduces a new arg variable posts_request_ids_filters
which also affects what is cached.
What can be removed?
After Case3, the variable $cache_key is not in use anywhere indicating that the only args that must be used for cache key generation is $q['fields']
and $q['posts_request_ids_filters']
(introduced in PR#5513).
We can thereby remove all the query forming (except fields) and after cache decision making (except 'posts_request_ids_filters') ( which is pretty much everything mentioned here )
With respect to Third Parties, $args can only be used to cache the same values to different keys (which has no use IMO) or to skip most of the logic inside get_posts() like incase of VIP Enterprise Search (link), where a new es
is used to decide if the query must be sent to elastic search or not.
WP_Term_Query
For term_queries, $args['fields']
, $args['object_ids']
, $args['pad_counts']
, $args['hide_empty']
are the only fields that can affect caching. All other variables can be avoided from cache key generation. Like WP_Query there is no filter that third parties can use to modify what is being cached after cache key generation.
WP_Network_Query
The key is entirely based on args, however none of the args affect the outcome post query is fired. So we could modify the code to be more like WP_Query with a separate function for generate_key based on the request and args or just the args. (code)
WP_Site_Query
Same as WP_Network_Query. The code needs to be modified to use final query and not args. (code)
WP_Comment_Query
Same like WP_Network_Query and WP_Site_Query (code)
Proposal
Step 1: Standardize code
Standardize WP_Network_Query, WP_Site_Query, WP_Comment_Query to use generate_cache_key function the same way WP_Query does.
Note: There is one performance downgrade here. The way caching is implemented in all 3 class above it skips the request generation part entirely if cache is present.
However, the upside of implementing these change is, cache will be based on the final request making it better at handling similar queries generated via different args setup.
For e.g Caching in WP_Network_Query skips $network_ids = $this->get_network_ids(); which is logic intensive.
Step 2: Use only those key from
We should only pass the args (filter it before) to generate_cache_key()
for cache key generation that effects the way cache is done (Detailed in the Findings section above).
For filtering have a array of filed that needs to be considered in an array and only scrap out those args from the main args.
For e.g $key_to_consider = array( ‘fields’ )
for WP_Query, we can provide a filter which will allow third parties to add more keys that should be considered, extra care must be given not to unset key from the original $args
.
Step 3: Normalize and save back to main $args or (to the clone)
Basic normalization must be done before query execution and saved to original args->
Some examples->
‘post_name’ and other fields that contains that contain single array needs to be converted to string.
Step 4: Move generate_cache_key to utility function as it will be more like a repeated piece of code.
This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.
15 months ago
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
6 months ago
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
6 months ago
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
4 months ago
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
4 months ago
This ticket was mentioned in PR #7497 on WordPress/wordpress-develop by @peterwilsoncc.
4 months ago
#21
Lighter touch alternative to https://github.com/WordPress/wordpress-develop/pull/5347
Trac ticket: https://core.trac.wordpress.org/ticket/59516
#22
@
4 months ago
- Keywords early added
- Milestone changed from 6.7 to 6.8
As WP_Query
is somewhat fundamental to WordPress, I've added the early
keyword and moved this to the next release milestone.
I've created an experimental pull request that sorts various items that are passed as an array to increase cache hits. It will need additional tests.
This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.
6 weeks ago
#24
@
6 weeks ago
- Keywords has-unit-tests removed
Thanks for the alternative idea here, @peterwilsoncc. Can you provide guidance for any other contributors about what kinds of tests you would like to see added?
6 weeks ago
#25
@peterwilsoncc I was concerned that sorting might effect the order of the output too, looks like it doesn't. Can we also ensure, https://core.trac.wordpress.org/ticket/59492#comment:3 is handled (if not), please. If there is only one element in the array it should be converted into string. I remember getting that scenario from a live site, but I can't recall the details.
@peterwilsoncc commented on PR #7497:
6 weeks ago
#26
I was concerned that sorting might effect the order of the output too, looks like it doesn't.
The order should only have an effect for items that can be used for the orderby clause, for example post__in
. It's fine to sort for the where clause so the key differs only in the instances where the item is used for ordering.
Can we also ensure, https://core.trac.wordpress.org/ticket/59492#comment:3 is handled (if not), please. If there is only one element in the array it should be converted into string. I remember getting that scenario from a live site, but I can't recall the details.
My recollection is that in some instances the use of a string vs an array can result in a different where clause beyond just the use of =
vs IN
. At the very least, I know it is complicated so I'm trying to get something lighter touch in initially so it's possible to spend additional time comparing the effect of passing a string or an array.
This ticket was mentioned in Slack in #core by audrasjb. View the logs.
10 days ago
#28
@
8 days ago
With the initial feedback and after taking a brief look at the experimental PR https://github.com/WordPress/wordpress-develop/pull/7497, I think this is worth exploring further. Would you be up for expanding your PR beyond the draft stage @peterwilsoncc? I'm happy to review it once ready for review.
Note that with this marked as early
, we'll probably want to get this in soon, otherwise it might be safer to punt it to 6.9.
#29
@
6 days ago
In my mind, the things I was looking for to improve the cache key generation here.
- Sorting of array keys.
- Removing of null values that would not effect query.
- Values of array that can be a string with a comma or an array, all to be an array.
- Values that can be int or string, all to be one type, ie string.
- Sorting of values that are arrays, where it would not effect the query. Example, post types in WP_Query, but not postsin, as this value can be used to order the query.
Not sure if we should break this up input one ticket per query. The logic would be the same. We might even want some shared util functions for this.
#30
@
27 hours ago
I've marked the pull request as ready for review.
For arguments that can be passed as an array, it sorts them where possible.
For *__in
arguments that can be used for the orderby clause, it sorts the value for the purpose of generating the WHERE clause but not the ORDERBY clause.
In cases where the code base takes a different path depending on whether the argument is passed as an array or string, I've sorted the array version but have not converted the string to an array. This mainly effects the post type and status as they have some interesting code that I would rather not mess with.
The generated cache key is now stored as a private property on WP_Query
to allow for unit tests to compare the generated keys after normalization. Otherwise the method would need to normalize the data exclusively for the purpose of the unit tests.
#31
@
2 hours ago
@peterwilsoncc I noticed that your PR only updates WP_Query. This ticket is written to mentioned all query classes, such as WP_Term_Query. Will this be done as part of your PR / this ticket or follow up PRs or tickets?
#32
follow-up:
↓ 33
@
63 minutes ago
@spacedmonkey I think it better to improve each class individually on separate tickets. That will allow folks to commit improvements as the code is ready rather than having to wait for all the query classes to be covered.
#33
in reply to:
↑ 32
@
62 minutes ago
Replying to peterwilsoncc:
@spacedmonkey I think it better to improve each class individually on separate tickets. That will allow folks to commit improvements as the code is ready rather than having to wait for all the query classes to be covered.
Do you mind updating the description / title of this ticket?
Is this a duplicate of #59492? Can we merge the tickets? cc @thekt12