Opened 8 months ago
Last modified 6 weeks ago
#64038 new defect (bug)
Cache miss for `WP_Term_Query`
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Taxonomy | Keywords: | has-patch has-unit-tests |
| Focuses: | performance | Cc: |
Description
On a WP fresh install (test made with WP 6.8.2).
- Activate Query Monitor
- Visit the posts list table
- Check that Query Monitor report a duplicate query coming from
WP_Term_Query::get_terms():
SELECT t.term_id
FROM wp_terms AS t
INNER JOIN wp_term_taxonomy AS tt
ON t.term_id = tt.term_id
WHERE tt.taxonomy IN ('category')
ORDER BY t.name ASC
One call is coming from wp_dropdown_categories()
One call is coming from wp_terms_check_list()
Both functions are not using the same arguments for the call to get_terms() but result in the same DB query.
In the past, the cache key used to be computed based on arguments passed to get_terms(). It's now a combination of these arguments and the SQL statement. Maybe using only the sql statement would be sufficient.
Attachments (1)
Change History (7)
#2
@
8 months ago
- Keywords reporter-feedback added
- Milestone changed from Awaiting Review to Future Release
I added some logging after the $key is generated:
<?php do_action( 'qm/debug', compact( 'key', 'cache_args', 'sql' ) );
This is the difference between the logging for the duplicate queries:
-
/tmp/
old new 1 1 Array 2 2 ( 3 [key] => 7e29979b0d775b6021a8f36e13aae8e53 [key] => 5a24f652b8a6b799e1e6d7d7b6afceca 4 4 [cache_args] => Array 5 5 ( 6 6 [taxonomy] => Array … … 42 42 ( 43 43 ) 44 44 45 [hierarchical] => 45 [hierarchical] => 1 46 46 [search] => 47 47 [name__like] => 48 48 [description__like] => 49 49 [pad_counts] => 50 [get] => all50 [get] => 51 51 [child_of] => 0 52 52 [parent] => 53 53 [childless] =>
My concern is that perhaps a different post-processed query result is being cached. However, by adding other logging when the cache is set, this doesn't seem to be the case:
-
/tmp/
old new 1 1 Array 2 2 ( 3 [cache_key] => get_terms: 7e29979b0d775b6021a8f36e13aae8e53 [cache_key] => get_terms:5a24f652b8a6b799e1e6d7d7b6afceca 4 4 [term_cache] => Array 5 5 ( 6 6 [0] => 1
The cache key was most recently modified to remove a redundant parameter component in r59028, and previously in r55083.
Even back r37572 it was:
<?php $key = md5( serialize( wp_array_slice_assoc( $args, array_keys( $defaults ) ) ) . serialize( $taxonomies ) . $query );
@Chouby When was the cache key just the SQL and not the args?
#3
@
8 months ago
- Keywords reporter-feedback removed
To my knowledge, the arguments have always been included. The SQL was introduced in WP 4.4 by r35120
Keeping only the SQL was a suggestion for the future that could solve this duplicate queries issue.
#4
@
8 months ago
- Keywords needs-patch added
@Chouby ok, I was confused because in the description you said the args used to not be included:
In the past, the cache key used to be computed based on arguments passed to
get_terms(). It's now a combination of these arguments and the SQL statement. Maybe using only the sql statement would be sufficient.
Maybe I misunderstood.
This ticket was mentioned in PR #11391 on WordPress/wordpress-develop by @sanket.parmar.
6 weeks ago
#5
- Keywords has-patch has-unit-tests added; needs-patch removed
## Summary
Fixes a duplicate database query that occurs on the Posts list table (and wherever wp_dropdown_categories() and wp_terms_checklist() are rendered on the same page) due to two semantically identical get_terms() calls producing different cache keys.
---
## Problem
WP_Term_Query::generate_cache_key() hashes the full set of sanitised query args together with the SQL:
$key = md5( serialize( $cache_args ) . $sql );
wp_dropdown_categories() passes hierarchical => 1 (int), while wp_terms_checklist() passes get => 'all' which parse_query() immediately normalises to hierarchical => false (bool). Both produce identical SQL, yet serialize() produces two different strings → two different cache keys → two identical DB queries on every page load.
---
## Solution
Replace the wide arg-serialisation approach with SQL + only the args that drive PHP-level post-processing (which is not captured by the SQL):
| Arg | Reason kept |
|---|---|
child_of | _get_term_children() runs in PHP; not always reflected in SQL
|
pad_counts | _pad_term_counts() runs in PHP and changes the cached data shape
|
prune_empty_terms | Combined (bool)($hierarchical && $hide_empty) — only the conjunction determines PHP pruning
|
number / offset (hierarchical only) | No LIMIT in SQL for hierarchical queries; PHP slices with array_slice()
|
fields | Preserved from existing logic; determines cache value shape |
---
## Changes
src/wp-includes/class-wp-term-query.php— Rewritesgenerate_cache_key()to key on SQL + a minimal set of PHP-post-processing args.tests/phpunit/tests/term/query.php— Adds four new@ticket 64038/@group cachetest methods covering: shared cache entry for equivalent queries, and separate keys for differingprune_empty_terms,child_of, andpad_counts.
---
## Testing
vendor/bin/phpunit tests/phpunit/tests/term/query.php --filter cache
---
Trac ticket: https://core.trac.wordpress.org/ticket/64038
## Use of AI Tools
AI assistance: Yes
Tool(s): GitHub Copilot (Claude Sonnet 4.6)
Used for: The problem analysis, approach selection, implementation, and test cases were reviewed and verified by the human author before submission. All code changes have been manually checked for correctness against the WordPress codebase.
#6
@
6 weeks ago
Root cause
generate_cache_key() currently hashes the full set of sanitised query args together with the SQL:
$key = md5( serialize( $cache_args ) . $sql );
wp_dropdown_categories() (called from the quick-filter on the Posts list table) passes hierarchical => 1 (int 1), while wp_terms_checklist() passes get => 'all' which parse_query() immediately normalises to hierarchical => false (bool false). Both values drive identical SQL — the query planner sees no difference — yet serialize() produces two different strings, so two different cache keys, so two identical DB queries.
Approaches considered
Option A — SQL-only key
Hashes only the SQL:
$key = md5( $sql );
This would resolve the duplicate-query issue cleanly. The problem is that WP_Term_Query applies several PHP-level post-processing steps after the query that change the data stored in and read back from the cache:
_get_term_children()forchild_of— runs entirely in PHP; whennumber = 0there is no LIMIT in SQL to differentiate the results._pad_term_counts()forpad_counts— writes{term_id, count}objects as cache values rather than plain term ID arrays.- The
$hierarchical && $args['hide_empty']pruning loop — rewrites the PHP result set without touching SQL. - When
hierarchical = true, SQL skips the LIMIT clause entirely, so PHP slices the result witharray_slice()using thenumber/offsetargs.
If two callers share the same SQL-only key but one has pad_counts = true, the first writer stores padded {term_id, count} objects and the second caller misreads them as plain IDs (or vice versa). Pure SQL-only keying is therefore not safe here without a deeper refactor.
Option B — Normalize boolean-like args before serializing (conservative)
Keep the existing serialize($cache_args) . $sql structure but normalise all truthy/falsy args to their canonical PHP types ((bool), (int)) and apply wp_recursive_ksort() before hashing, similar to wp_dropdown_query_hash() in general-template.php.
Downside: it addresses only the type-coercion symptom. Future callers that express the same intent through semantically equivalent but structurally different args would still miss the cache. It is also a wider change — every single query arg gets serialised even when most of them have no bearing on post-processing.
Option C — SQL + only the args that affect PHP post-processing (chosen)
This is the targeted fix: base the key on the SQL (which already encodes everything that drives the DB query) plus a small, explicit set of args that control PHP-level result shaping:
| Arg | Why it must be in the key |
|---|---|
child_of | _get_term_children() filters in PHP; when number=0 this is not reflected in SQL
|
pad_counts | _pad_term_counts() runs in PHP and changes stored cache shape
|
prune_empty_terms | Combined (bool)($hierarchical && $hide_empty) — only the conjunction matters, not the individual values
|
number / offset (when hierarchical) | No LIMIT in SQL for hierarchical queries; PHP slices with array_slice()
|
fields | Normalised to 'all' for non-count/non-object_id queries (existing logic preserved)
|
$php_cache_args = array(
'child_of' => (int) $args['child_of'],
'pad_counts' => (bool) $args['pad_counts'],
'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
);
if ( $args['hierarchical'] && $args['number'] ) {
$php_cache_args['number'] = (int) $args['number'];
$php_cache_args['offset'] = (int) $args['offset'];
}
if ( 'count' !== $args['fields'] && 'all_with_object_id' !== $args['fields'] ) {
$php_cache_args['fields'] = 'all';
} else {
$php_cache_args['fields'] = $args['fields'];
}
$key = md5( $sql . serialize( $php_cache_args ) );
For this ticket: both calls resolve prune_empty_terms = false (since hide_empty = 0), child_of = 0, pad_counts = false, and produce identical SQL → same cache key → single DB query.
This approach is proportionally safe because the only args retained in the key are those proven to alter the cached result set.
Tests included
Four new test methods in Tests_Term_Query (@ticket 64038, @group cache):
test_equivalent_queries_share_cache_entry— asserts that thewp_dropdown_categories-style andwp_terms_checklist-style calls produce no second DB query (the regression test for this ticket).test_queries_with_different_prune_empty_terms_get_separate_cache_entries— asserts thathierarchical=true && hide_empty=truevs.hide_empty=falseget distinct keys.test_queries_with_different_child_of_get_separate_cache_entries— asserts differentchild_ofvalues get distinct keys.test_queries_with_different_pad_counts_get_separate_cache_entries— asserts differentpad_countsvalues get distinct keys.
Duplicate query confirmed