Make WordPress Core

Opened 16 months ago

Last modified 62 minutes ago

#59516 accepted defect (bug)

Improve cache key generation in query classes

Reported by: spacedmonkey's profile spacedmonkey Owned by: pbearne's profile pbearne
Milestone: 6.8 Priority: normal
Severity: normal Version:
Component: Query Keywords: has-patch changes-requested early
Focuses: performance Cc:

Description

In query classes such as WP_Query, parameters like post_type enable the passing of an array of values. For instance, you can use post_type like this: ['post', 'page']. However, if a subsequent request is made with the array elements in a different order, like so: ['page', 'post'], it would lead to the generation of a different cache key, potentially causing the existing cache to be missed. Even though the cached results may be the same, this could result in a cache collision. It is advisable to reuse caches whenever possible to optimize performance.

This would effect the following cache keys.

  • WP_Query
  • WP_Term_Query
  • WP_Network_Query
  • WP_Site_Query
  • WP_Comment_Query

Some sort of array sorting of the key values would mean that caches could be reused.

Change History (33)

#1 @swissspidy
16 months ago

Is this a duplicate of #59492? Can we merge the tickets? cc @thekt12

#2 @thekt12
16 months ago

@swissspidy Yes, it's a subset of this ticket.

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


16 months ago

#4 @spacedmonkey
16 months ago

#59593 was marked as a duplicate.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


16 months ago

#6 @joemcgill
16 months ago

#59492 was marked as a duplicate.

#7 @joemcgill
16 months ago

  • Keywords needs-patch added
  • Owner set to thekt12
  • Status changed from new to assigned

Initial exploration into fixing this for the WP_Query class was started in #59492. I've closed as a duplicate so we can keep discussion for all of the query classes in one place. I'm assigning this to @thekt12 for now, since he was already working on the initial implementation.

This ticket was mentioned in PR #5347 on WordPress/wordpress-develop by @thekt12.


16 months ago
#8

  • Keywords has-patch has-unit-tests added; needs-patch removed

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


15 months ago

#10 @thekt12
15 months ago

Findings:

WP_Query

Inside WP_Query::get_posts(), cache key is generated using
generate_cache_key( $q, $new_request );; where $q is query args and $new_request is the sql that is formed, (check code here)

$cache_key   = $this->generate_cache_key( $q, $new_request );

Setting of cache for the $cache_key happens at 3 ( Case1, Case2, Case3 ) places based on different conditions, all of which consumes the same $cache_key generated above and there is no modification in between.

For Case1 and Case2, there is a slight modification to the output obtained from the query based on the value set in $q['fields'] . In current scenario $q['fields']=='ID' and $q['fields']=='ids' can have slightly different cache values as former doesn’t go through the following condition which has a array_map( 'intval', $this->posts ); (this seems to be a minor bug)

In both Case1 and Case2 there is no filter in between key generation and cache setting which can modify the output base on any other values from args/$q.

However, for Case3 there is a possibility that the logic may enter the split_the_query logic ( if there are less than 500 posts and we split the query to get ids of those 500 posts and fetch them manually and prime them). The split_the_query condition contains a filter posts_request_ids that could modify the request post key generation, this is a bug that is addressed in (Ticket#59661) PR#5513. This PR introduces a new arg variable posts_request_ids_filters which also affects what is cached.

What can be removed?

After Case3, the variable $cache_key is not in use anywhere indicating that the only args that must be used for cache key generation is $q['fields'] and $q['posts_request_ids_filters'](introduced in PR#5513).

We can thereby remove all the query forming (except fields) and after cache decision making (except 'posts_request_ids_filters') ( which is pretty much everything mentioned here )

With respect to Third Parties, $args can only be used to cache the same values to different keys (which has no use IMO) or to skip most of the logic inside get_posts() like incase of VIP Enterprise Search (link), where a new es is used to decide if the query must be sent to elastic search or not.

WP_Term_Query

For term_queries, $args['fields'], $args['object_ids'], $args['pad_counts'], $args['hide_empty'] are the only fields that can affect caching. All other variables can be avoided from cache key generation. Like WP_Query there is no filter that third parties can use to modify what is being cached after cache key generation.

WP_Network_Query

The key is entirely based on args, however none of the args affect the outcome post query is fired. So we could modify the code to be more like WP_Query with a separate function for generate_key based on the request and args or just the args. (code)

WP_Site_Query

Same as WP_Network_Query. The code needs to be modified to use final query and not args. (code)

WP_Comment_Query

Same like WP_Network_Query and WP_Site_Query (code)

Proposal

Step 1: Standardize code
Standardize WP_Network_Query, WP_Site_Query, WP_Comment_Query to use generate_cache_key function the same way WP_Query does.
Note: There is one performance downgrade here. The way caching is implemented in all 3 class above it skips the request generation part entirely if cache is present.
However, the upside of implementing these change is, cache will be based on the final request making it better at handling similar queries generated via different args setup.
For e.g Caching in WP_Network_Query skips $network_ids = $this->get_network_ids(); which is logic intensive.

Step 2: Use only those key from
We should only pass the args (filter it before) to generate_cache_key() for cache key generation that effects the way cache is done (Detailed in the Findings section above).
For filtering have a array of filed that needs to be considered in an array and only scrap out those args from the main args.
For e.g $key_to_consider = array( ‘fields’ ) for WP_Query, we can provide a filter which will allow third parties to add more keys that should be considered, extra care must be given not to unset key from the original $args.

Step 3: Normalize and save back to main $args or (to the clone)
Basic normalization must be done before query execution and saved to original args->
Some examples->
‘post_name’ and other fields that contains that contain single array needs to be converted to string.

Step 4: Move generate_cache_key to utility function as it will be more like a repeated piece of code.

Version 4, edited 15 months ago by thekt12 (previous) (next) (diff)

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


15 months ago

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


6 months ago

#13 @spacedmonkey
6 months ago

  • Milestone changed from Future Release to 6.7

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


6 months ago

#15 @pbearne
6 months ago

  • Owner changed from thekt12 to pbearne
  • Status changed from assigned to accepted

#16 @pbearne
4 months ago

Will the patch we have do for this ticket, or do I need to create a new one?

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


4 months ago

#18 @mukesh27
4 months ago

  • Keywords changes-requested added

#19 @pbearne
4 months ago

I can confirm the patch still works

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


4 months ago

#22 @peterwilsoncc
4 months ago

  • Keywords early added
  • Milestone changed from 6.7 to 6.8

As WP_Query is somewhat fundamental to WordPress, I've added the early keyword and moved this to the next release milestone.

I've created an experimental pull request that sorts various items that are passed as an array to increase cache hits. It will need additional tests.

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


6 weeks ago

#24 @joemcgill
6 weeks ago

  • Keywords has-unit-tests removed

Thanks for the alternative idea here, @peterwilsoncc. Can you provide guidance for any other contributors about what kinds of tests you would like to see added?

@thekt12 commented on PR #7497:


6 weeks ago
#25

@peterwilsoncc I was concerned that sorting might effect the order of the output too, looks like it doesn't. Can we also ensure, https://core.trac.wordpress.org/ticket/59492#comment:3 is handled (if not), please. If there is only one element in the array it should be converted into string. I remember getting that scenario from a live site, but I can't recall the details.

@peterwilsoncc commented on PR #7497:


6 weeks ago
#26

I was concerned that sorting might effect the order of the output too, looks like it doesn't.

The order should only have an effect for items that can be used for the orderby clause, for example post__in. It's fine to sort for the where clause so the key differs only in the instances where the item is used for ordering.

Can we also ensure, https://core.trac.wordpress.org/ticket/59492#comment:3 is handled (if not), please. If there is only one element in the array it should be converted into string. I remember getting that scenario from a live site, but I can't recall the details.

My recollection is that in some instances the use of a string vs an array can result in a different where clause beyond just the use of = vs IN. At the very least, I know it is complicated so I'm trying to get something lighter touch in initially so it's possible to spend additional time comparing the effect of passing a string or an array.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


10 days ago

#28 @flixos90
8 days ago

With the initial feedback and after taking a brief look at the experimental PR https://github.com/WordPress/wordpress-develop/pull/7497, I think this is worth exploring further. Would you be up for expanding your PR beyond the draft stage @peterwilsoncc? I'm happy to review it once ready for review.

Note that with this marked as early, we'll probably want to get this in soon, otherwise it might be safer to punt it to 6.9.

#29 @spacedmonkey
6 days ago

In my mind, the things I was looking for to improve the cache key generation here.

  • Sorting of array keys.
  • Removing of null values that would not effect query.
  • Values of array that can be a string with a comma or an array, all to be an array.
  • Values that can be int or string, all to be one type, ie string.
  • Sorting of values that are arrays, where it would not effect the query. Example, post types in WP_Query, but not postsin, as this value can be used to order the query.

Not sure if we should break this up input one ticket per query. The logic would be the same. We might even want some shared util functions for this.

#30 @peterwilsoncc
27 hours ago

I've marked the pull request as ready for review.

For arguments that can be passed as an array, it sorts them where possible.

For *__in arguments that can be used for the orderby clause, it sorts the value for the purpose of generating the WHERE clause but not the ORDERBY clause.

In cases where the code base takes a different path depending on whether the argument is passed as an array or string, I've sorted the array version but have not converted the string to an array. This mainly effects the post type and status as they have some interesting code that I would rather not mess with.

The generated cache key is now stored as a private property on WP_Query to allow for unit tests to compare the generated keys after normalization. Otherwise the method would need to normalize the data exclusively for the purpose of the unit tests.

#31 @spacedmonkey
2 hours ago

@peterwilsoncc I noticed that your PR only updates WP_Query. This ticket is written to mentioned all query classes, such as WP_Term_Query. Will this be done as part of your PR / this ticket or follow up PRs or tickets?

#32 follow-up: @peterwilsoncc
63 minutes ago

@spacedmonkey I think it better to improve each class individually on separate tickets. That will allow folks to commit improvements as the code is ready rather than having to wait for all the query classes to be covered.

#33 in reply to: ↑ 32 @spacedmonkey
62 minutes ago

Replying to peterwilsoncc:

@spacedmonkey I think it better to improve each class individually on separate tickets. That will allow folks to commit improvements as the code is ready rather than having to wait for all the query classes to be covered.

Do you mind updating the description / title of this ticket?

Note: See TracTickets for help on using tickets.