Opened 12 months ago
Last modified 7 days ago
#59516 accepted defect (bug)
Improve cache key generation in query classes
Reported by: | spacedmonkey | Owned by: | pbearne |
---|---|---|---|
Milestone: | 6.7 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Query | Keywords: | has-patch has-unit-tests changes-requested |
Focuses: | performance | Cc: |
Description
In query classes such as WP_Query, parameters like post_type enable the passing of an array of values. For instance, you can use post_type like this: ['post', 'page']. However, if a subsequent request is made with the array elements in a different order, like so: ['page', 'post'], it would lead to the generation of a different cache key, potentially causing the existing cache to be missed. Even though the cached results may be the same, this could result in a cache collision. It is advisable to reuse caches whenever possible to optimize performance.
This would effect the following cache keys.
- WP_Query
- WP_Term_Query
- WP_Network_Query
- WP_Site_Query
- WP_Comment_Query
Some sort of array sorting of the key values would mean that caches could be reused.
Change History (19)
This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.
12 months ago
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
12 months ago
#7
@
12 months ago
- Keywords needs-patch added
- Owner set to thekt12
- Status changed from new to assigned
Initial exploration into fixing this for the WP_Query
class was started in #59492. I've closed as a duplicate so we can keep discussion for all of the query classes in one place. I'm assigning this to @thekt12 for now, since he was already working on the initial implementation.
This ticket was mentioned in PR #5347 on WordPress/wordpress-develop by @thekt12.
11 months ago
#8
- Keywords has-patch has-unit-tests added; needs-patch removed
Trac ticket: https://core.trac.wordpress.org/ticket/59516
This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.
11 months ago
#10
@
11 months ago
Findings:
WP_Query
Inside WP_Query::get_posts(), cache key is generated using
generate_cache_key( $q, $new_request );
; where $q
is query args and $new_request
is the sql that is formed, (check code here)
$cache_key = $this->generate_cache_key( $q, $new_request );
Setting of cache for the $cache_key happens at 3 ( Case1, Case2, Case3 ) places based on different conditions, all of which consumes the same $cache_key generated above and there is no modification in between.
For Case1 and Case2, there is a slight modification to the output obtained from the query based on the value set in $q['fields']
. In current scenario $q['fields']=='ID'
and $q['fields']=='ids'
can have slightly different cache values as former doesn’t go through the following condition which has a array_map( 'intval', $this->posts );
(this seems to be a minor bug)
In both Case1 and Case2 there is no filter in between key generation and cache setting which can modify the output base on any other values from args/$q.
However, for Case3 there is a possibility that the logic may enter the split_the_query
logic ( if there are less than 500 posts and we split the query to get ids of those 500 posts and fetch them manually and prime them). The split_the_query
condition contains a filter posts_request_ids
that could modify the request post key generation, this is a bug that is addressed in (Ticket#59661) PR#5513. This PR introduces a new arg variable posts_request_ids_filters
which also affects what is cached.
What can be removed?
After Case3, the variable $cache_key is not in use anywhere indicating that the only args that must be used for cache key generation is $q['fields']
and $q['posts_request_ids_filters']
(introduced in PR#5513).
We can thereby remove all the query forming (except fields) and after cache decision making (except 'posts_request_ids_filters') ( which is pretty much everything mentioned here )
With respect to Third Parties, $args can only be used to cache the same values to different keys (which has no use IMO) or to skip most of the logic inside get_posts() like incase of VIP Enterprise Search (link), where a new es
is used to decide if the query must be sent to elastic search or not.
WP_Term_Query
For term_queries, $args['fields']
, $args['object_ids']
, $args['pad_counts']
, $args['hide_empty']
are the only fields that can affect caching. All other variables can be avoided from cache key generation. Like WP_Query there is no filter that third parties can use to modify what is being cached after cache key generation.
WP_Network_Query
The key is build using args only. However none of the args is used to modify the output after query is fired and before caching.
So we could modify the code to be more like WP_Query with a separate function for generate_key based on the request and args or just the args. (code)
WP_Site_Query
Same as WP_Network_Query. The code needs to be modified to use final query and not args. (code)
WP_Comment_Query
Same like WP_Network_Query and WP_Site_Query (code)
Proposal
Step 1: Standardize code
Standardize WP_Network_Query, WP_Site_Query, WP_Comment_Query to use generate_cache_key function the same way WP_Query does.
Note: There is one performance downgrade here. The way caching is implemented in all 3 class above it skips the request generation part entirely if cache is present.
However, the upside of implementing these change is, cache will be based on the final request making it better at handling similar queries generated via different args setup.
For e.g Caching in WP_Network_Query skips $network_ids = $this->get_network_ids(); which is logic intensive.
Step 2: Use only those key from args that effects caching post query is fired
We should only pass the args (filter it before) to generate_cache_key()
for cache key generation that effects the way cache is done (Detailed in the Findings section above).
For filtering, we can have an array keys that needs to be considered in an array and only scrap out those keys from the main args ($q).
For e.g $key_to_consider = array( ‘fields’ )
for WP_Query, we can provide a filter which will allow third parties to add more keys that should be considered, extra care must be given not to unset key from the original $args
.
Step 3: Normalize and save back to main $args or (to the clone)
Basic normalization must be done before query execution and saved to original args->
Some examples->
‘post_name’ and other fields that contains that contain single array needs to be converted to string.
Step 4: Move generate_cache_key to utility function as it will be more like a repeated piece of code.
Is this a duplicate of #59492? Can we merge the tickets? cc @thekt12