Make WordPress Core

Opened 12 months ago

Last modified 7 days ago

#59516 accepted defect (bug)

Improve cache key generation in query classes

Reported by: spacedmonkey's profile spacedmonkey Owned by: pbearne's profile pbearne
Milestone: 6.7 Priority: normal
Severity: normal Version:
Component: Query Keywords: has-patch has-unit-tests changes-requested
Focuses: performance Cc:

Description

In query classes such as WP_Query, parameters like post_type enable the passing of an array of values. For instance, you can use post_type like this: ['post', 'page']. However, if a subsequent request is made with the array elements in a different order, like so: ['page', 'post'], it would lead to the generation of a different cache key, potentially causing the existing cache to be missed. Even though the cached results may be the same, this could result in a cache collision. It is advisable to reuse caches whenever possible to optimize performance.

This would effect the following cache keys.

  • WP_Query
  • WP_Term_Query
  • WP_Network_Query
  • WP_Site_Query
  • WP_Comment_Query

Some sort of array sorting of the key values would mean that caches could be reused.

Change History (19)

#1 @swissspidy
12 months ago

Is this a duplicate of #59492? Can we merge the tickets? cc @thekt12

#2 @thekt12
12 months ago

@swissspidy Yes, it's a subset of this ticket.

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


12 months ago

#4 @spacedmonkey
12 months ago

#59593 was marked as a duplicate.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


12 months ago

#6 @joemcgill
12 months ago

#59492 was marked as a duplicate.

#7 @joemcgill
12 months ago

  • Keywords needs-patch added
  • Owner set to thekt12
  • Status changed from new to assigned

Initial exploration into fixing this for the WP_Query class was started in #59492. I've closed as a duplicate so we can keep discussion for all of the query classes in one place. I'm assigning this to @thekt12 for now, since he was already working on the initial implementation.

This ticket was mentioned in PR #5347 on WordPress/wordpress-develop by @thekt12.


11 months ago
#8

  • Keywords has-patch has-unit-tests added; needs-patch removed

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


11 months ago

#10 @thekt12
11 months ago

Findings:

WP_Query

Inside WP_Query::get_posts(), cache key is generated using
generate_cache_key( $q, $new_request );; where $q is query args and $new_request is the sql that is formed, (check code here)

$cache_key   = $this->generate_cache_key( $q, $new_request );

Setting of cache for the $cache_key happens at 3 ( Case1, Case2, Case3 ) places based on different conditions, all of which consumes the same $cache_key generated above and there is no modification in between.

For Case1 and Case2, there is a slight modification to the output obtained from the query based on the value set in $q['fields'] . In current scenario $q['fields']=='ID' and $q['fields']=='ids' can have slightly different cache values as former doesn’t go through the following condition which has a array_map( 'intval', $this->posts ); (this seems to be a minor bug)

In both Case1 and Case2 there is no filter in between key generation and cache setting which can modify the output base on any other values from args/$q.

However, for Case3 there is a possibility that the logic may enter the split_the_query logic ( if there are less than 500 posts and we split the query to get ids of those 500 posts and fetch them manually and prime them). The split_the_query condition contains a filter posts_request_ids that could modify the request post key generation, this is a bug that is addressed in (Ticket#59661) PR#5513. This PR introduces a new arg variable posts_request_ids_filters which also affects what is cached.

What can be removed?

After Case3, the variable $cache_key is not in use anywhere indicating that the only args that must be used for cache key generation is $q['fields'] and $q['posts_request_ids_filters'](introduced in PR#5513).

We can thereby remove all the query forming (except fields) and after cache decision making (except 'posts_request_ids_filters') ( which is pretty much everything mentioned here )

With respect to Third Parties, $args can only be used to cache the same values to different keys (which has no use IMO) or to skip most of the logic inside get_posts() like incase of VIP Enterprise Search (link), where a new es is used to decide if the query must be sent to elastic search or not.

WP_Term_Query

For term_queries, $args['fields'], $args['object_ids'], $args['pad_counts'], $args['hide_empty'] are the only fields that can affect caching. All other variables can be avoided from cache key generation. Like WP_Query there is no filter that third parties can use to modify what is being cached after cache key generation.

WP_Network_Query

The key is build using args only. However none of the args is used to modify the output after query is fired and before caching.
So we could modify the code to be more like WP_Query with a separate function for generate_key based on the request and args or just the args. (code)

WP_Site_Query

Same as WP_Network_Query. The code needs to be modified to use final query and not args. (code)

WP_Comment_Query

Same like WP_Network_Query and WP_Site_Query (code)

Proposal

Step 1: Standardize code
Standardize WP_Network_Query, WP_Site_Query, WP_Comment_Query to use generate_cache_key function the same way WP_Query does.
Note: There is one performance downgrade here. The way caching is implemented in all 3 class above it skips the request generation part entirely if cache is present.
However, the upside of implementing these change is, cache will be based on the final request making it better at handling similar queries generated via different args setup.
For e.g Caching in WP_Network_Query skips $network_ids = $this->get_network_ids(); which is logic intensive.

Step 2: Use only those key from args that effects caching post query is fired
We should only pass the args (filter it before) to generate_cache_key() for cache key generation that effects the way cache is done (Detailed in the Findings section above).
For filtering, we can have an array keys that needs to be considered in an array and only scrap out those keys from the main args ($q).
For e.g $key_to_consider = array( ‘fields’ ) for WP_Query, we can provide a filter which will allow third parties to add more keys that should be considered, extra care must be given not to unset key from the original $args.

Step 3: Normalize and save back to main $args or (to the clone)
Basic normalization must be done before query execution and saved to original args->
Some examples->
‘post_name’ and other fields that contains that contain single array needs to be converted to string.

Step 4: Move generate_cache_key to utility function as it will be more like a repeated piece of code.

Last edited 11 months ago by thekt12 (previous) (diff)

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


11 months ago

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


7 weeks ago

#13 @spacedmonkey
7 weeks ago

  • Milestone changed from Future Release to 6.7

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


7 weeks ago

#15 @pbearne
7 weeks ago

  • Owner changed from thekt12 to pbearne
  • Status changed from assigned to accepted

#16 @pbearne
10 days ago

Will the patch we have do for this ticket, or do I need to create a new one?

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


9 days ago

#18 @mukesh27
9 days ago

  • Keywords changes-requested added

#19 @pbearne
7 days ago

I can confirm the patch still works

Note: See TracTickets for help on using tickets.