Make WordPress Core

#59492 closed defect (bug) (duplicate)

WP_Query::generate_cache_key can create different hash for same queries leading to cache miss.

Reported by: thekt12's profile thekt12 Owned by: thekt12's profile thekt12
Milestone: Priority: normal
Severity: normal Version:
Component: Database Keywords: has-patch
Focuses: performance Cc:

Description

This issue first came to light while investigating #59442. In #59442, two queries where found to have have a different cache key leading to a cache miss.

Another, example is shown below:

// add this to any test plugin
add_action( 'init', 'test_wp_query_changes' );
function test_wp_query_changes() {
	// post_type set to post.
	$query1 = new WP_Query(
		array(
			'cache_results' => true,
			'fields'        => 'ids',
			'post_type'     => [ 'post', 'page', 'attachment' ],
		)
	);
	print_r("SQL ->". $query1->request );
	echo "<br>";

	// post_type set to post.
	$query2 = new WP_Query(
		array(
				'cache_results' => true,
				'fields'        => 'ids',
				'post_type'     => 'any',
			)
	);
	print_r("SQL ->". $query2->request );
	die();
}

I also added the following line below (https://github.com/WordPress/wordpress-develop/blob/e3a612a97c4c45acd7613779a46d86860e76cc65/src/wp-includes/class-wp-query.php#L3175):

print_r("Cache key formed - {$cache_key} <br>");

Output was:

Cache key formed -wp_query:87aeb9c56ee3545b729fe38b1becabd9:0.38106200 1695924120

SQL -> SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND ((wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'page' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'attachment' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private'))) ORDER BY wp_posts.post_date DESC LIMIT 0, 10

Cache key formed -wp_query:1e07ead1e7cc64aee3597dc16472f3bb:0.38106200 1695924120

SQL -> SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND ((wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'page' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private')) OR (wp_posts.post_type = 'attachment' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private'))) ORDER BY wp_posts.post_date DESC LIMIT 0, 10

SQL Queries obtained for both the arguments were similar, however the cache_key obtained were different leading to cache miss.

The problem:

generate_cache_key( array $args, $sql ) needs args to replace wpdb placeholders with the string used in the database.

However, the problem arise when the same args is serialized and used to create key ->

$key = md5( serialize( $args ) . $sql );

Proposal

$args parameter in generate_cache_key( array $args, $sql ) be only used to replace placeholder and not be seralized for creating hash.
Hashing key is generated towards the end where the SQL cannot be further modified, so it is indeed reliable.

Creating hash only with $sql will help improve performace by increasing the changes of hit. Another benefit is it improves performance by not serializing large $args. generate_cache_key get called for almost WP_Query as caching is the default nature.

Same with WP_Term_Query::generate_cache_key( array $args, $sql )

Change History (9)

#2 @thekt12
14 months ago

Caching for WP_Query was introduced in https://core.trac.wordpress.org/ticket/22176

#3 @thekt12
14 months ago

Even slightest difference in args creates different cache key leading to cache miss

<?php
$query1 = new WP_Query(
                array(
                        'cache_results' => true,
                        'fields'        => 'ids',
                        'post_type'     => 'post',
                        'post_status'     => ['publish'],
                )
        );
// cache key generated `wp_query:2e1aef1c7c1aa0228d7a7ee3a823e58a:0.99192500 1695978192`

$query2 = new WP_Query(
                array(
                                'cache_results' => true,
                                'fields'        => 'ids',
                                'post_type'     => 'post',
                                'post_status'     => 'publish',
                        )
        );
// cache key generated `wp_query:cbed5d29eec33b81024765fa5fbdefe2:0.99192500 1695978192`



// SQL for both 
// SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_type = 'post' AND ((wp_posts.post_status = 'publish')) ORDER BY wp_posts.post_date DESC LIMIT 0, 10


This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


14 months ago

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


14 months ago

@peterwilsoncc commented on PR #5347:


14 months ago
#6

When generating the cache key, the SQL statement is normalized to use the same fields regardless of the processed SQL statement:

https://github.com/WordPress/wordpress-develop/blob/e486ac07458d36182ee75091eac06f9a99a5ae2a/src/wp-includes/class-wp-query.php#L3174-L3175

However, in the following lines the actual cached data an be altered depending on the arguments used in the query in the code immediately following the lines linked above.

You're right, it might be possible to optimize the cache key generation to increase the number of cache hits but the cache key will need to consider some of the options used. It may be worth considering:

  • normalizing arguments a little more, for example the post type argument 'post' and [ 'post' ] are equivalent; as are [ 'page', 'post' ] and [ 'post', 'page' ]
  • whether additional arguments can be removed from the cache key

I think we'd need a bunch of tests to make sure there aren't cache collisions too.

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


14 months ago

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


14 months ago

#9 @joemcgill
14 months ago

  • Milestone Awaiting Review deleted
  • Resolution set to duplicate
  • Status changed from assigned to closed

After discussing with @thekt12 and @spacedmonkey, I think it's best to handle normalizing cache keys for query classes all in one place to keep the context all together. I'm closing this as a duplicate of #59516.

@thekt12 could you update your PR to point to the other ticket when you have a chance?

Note: See TracTickets for help on using tickets.