Make WordPress Core

Opened 8 months ago

Last modified 6 weeks ago

#64038 new defect (bug)

Cache miss for `WP_Term_Query`

Reported by: chouby's profile Chouby Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Taxonomy Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description

On a WP fresh install (test made with WP 6.8.2).

  1. Activate Query Monitor
  2. Visit the posts list table
  3. Check that Query Monitor report a duplicate query coming from WP_Term_Query::get_terms():
SELECT t.term_id
FROM wp_terms AS t
INNER JOIN wp_term_taxonomy AS tt
ON t.term_id = tt.term_id
WHERE tt.taxonomy IN ('category')
ORDER BY t.name ASC

One call is coming from wp_dropdown_categories()
One call is coming from wp_terms_check_list()

Both functions are not using the same arguments for the call to get_terms() but result in the same DB query.

In the past, the cache key used to be computed based on arguments passed to get_terms(). It's now a combination of these arguments and the SQL statement. Maybe using only the sql statement would be sufficient.

Attachments (1)

Screenshot 2025-09-25 at 18.27.24.png (261.9 KB) - added by westonruter 8 months ago.
Duplicate query confirmed

Download all attachments as: .zip

Change History (7)

#1 @Chouby
8 months ago

  • Component changed from General to Taxonomy
  • Focuses performance added

@westonruter
8 months ago

Duplicate query confirmed

#2 @westonruter
8 months ago

  • Keywords reporter-feedback added
  • Milestone changed from Awaiting Review to Future Release

I added some logging after the $key is generated:

<?php
do_action( 'qm/debug', compact( 'key', 'cache_args', 'sql' ) );

This is the difference between the logging for the duplicate queries:

  • /tmp/

    old new  
    11Array
    22(
    3     [key] => 7e29979b0d775b6021a8f36e13aae8e5
     3    [key] => 5a24f652b8a6b799e1e6d7d7b6afceca
    44    [cache_args] => Array
    55        (
    66            [taxonomy] => Array
     
    4242                (
    4343                )
    4444
    45             [hierarchical] =>
     45            [hierarchical] => 1
    4646            [search] =>
    4747            [name__like] =>
    4848            [description__like] =>
    4949            [pad_counts] =>
    50             [get] => all
     50            [get] =>
    5151            [child_of] => 0
    5252            [parent] =>
    5353            [childless] =>

My concern is that perhaps a different post-processed query result is being cached. However, by adding other logging when the cache is set, this doesn't seem to be the case:

  • /tmp/

    old new  
    11Array
    22(
    3     [cache_key] => get_terms:7e29979b0d775b6021a8f36e13aae8e5
     3    [cache_key] => get_terms:5a24f652b8a6b799e1e6d7d7b6afceca
    44    [term_cache] => Array
    55        (
    66            [0] => 1

The cache key was most recently modified to remove a redundant parameter component in r59028, and previously in r55083.

Even back r37572 it was:

<?php
$key = md5( serialize( wp_array_slice_assoc( $args, array_keys( $defaults ) ) ) . serialize( $taxonomies ) . $query );

@Chouby When was the cache key just the SQL and not the args?

Last edited 8 months ago by SergeyBiryukov (previous) (diff)

#3 @Chouby
8 months ago

  • Keywords reporter-feedback removed

To my knowledge, the arguments have always been included. The SQL was introduced in WP 4.4 by r35120

Keeping only the SQL was a suggestion for the future that could solve this duplicate queries issue.

#4 @westonruter
8 months ago

  • Keywords needs-patch added

@Chouby ok, I was confused because in the description you said the args used to not be included:

In the past, the cache key used to be computed based on arguments passed to get_terms(). It's now a combination of these arguments and the SQL statement. Maybe using only the sql statement would be sufficient.

Maybe I misunderstood.

This ticket was mentioned in PR #11391 on WordPress/wordpress-develop by @sanket.parmar.


6 weeks ago
#5

  • Keywords has-patch has-unit-tests added; needs-patch removed

## Summary

Fixes a duplicate database query that occurs on the Posts list table (and wherever wp_dropdown_categories() and wp_terms_checklist() are rendered on the same page) due to two semantically identical get_terms() calls producing different cache keys.

---

## Problem

WP_Term_Query::generate_cache_key() hashes the full set of sanitised query args together with the SQL:

$key = md5( serialize( $cache_args ) . $sql );

wp_dropdown_categories() passes hierarchical => 1 (int), while wp_terms_checklist() passes get => 'all' which parse_query() immediately normalises to hierarchical => false (bool). Both produce identical SQL, yet serialize() produces two different strings → two different cache keys → two identical DB queries on every page load.

---

## Solution

Replace the wide arg-serialisation approach with SQL + only the args that drive PHP-level post-processing (which is not captured by the SQL):

Arg Reason kept
child_of _get_term_children() runs in PHP; not always reflected in SQL
pad_counts _pad_term_counts() runs in PHP and changes the cached data shape
prune_empty_terms Combined (bool)($hierarchical && $hide_empty) — only the conjunction determines PHP pruning
number / offset (hierarchical only) No LIMIT in SQL for hierarchical queries; PHP slices with array_slice()
fields Preserved from existing logic; determines cache value shape

---

## Changes

  • src/wp-includes/class-wp-term-query.php — Rewrites generate_cache_key() to key on SQL + a minimal set of PHP-post-processing args.
  • tests/phpunit/tests/term/query.php — Adds four new @ticket 64038 / @group cache test methods covering: shared cache entry for equivalent queries, and separate keys for differing prune_empty_terms, child_of, and pad_counts.

---

## Testing

vendor/bin/phpunit tests/phpunit/tests/term/query.php --filter cache

---

Trac ticket: https://core.trac.wordpress.org/ticket/64038

## Use of AI Tools

AI assistance: Yes
Tool(s): GitHub Copilot (Claude Sonnet 4.6)
Used for: The problem analysis, approach selection, implementation, and test cases were reviewed and verified by the human author before submission. All code changes have been manually checked for correctness against the WordPress codebase.

#6 @sanket.parmar
6 weeks ago

Root cause

generate_cache_key() currently hashes the full set of sanitised query args together with the SQL:

$key = md5( serialize( $cache_args ) . $sql );

wp_dropdown_categories() (called from the quick-filter on the Posts list table) passes hierarchical => 1 (int 1), while wp_terms_checklist() passes get => 'all' which parse_query() immediately normalises to hierarchical => false (bool false). Both values drive identical SQL — the query planner sees no difference — yet serialize() produces two different strings, so two different cache keys, so two identical DB queries.

Approaches considered

Option A — SQL-only key

Hashes only the SQL:

$key = md5( $sql );

This would resolve the duplicate-query issue cleanly. The problem is that WP_Term_Query applies several PHP-level post-processing steps after the query that change the data stored in and read back from the cache:

  • _get_term_children() for child_of — runs entirely in PHP; when number = 0 there is no LIMIT in SQL to differentiate the results.
  • _pad_term_counts() for pad_counts — writes {term_id, count} objects as cache values rather than plain term ID arrays.
  • The $hierarchical && $args['hide_empty'] pruning loop — rewrites the PHP result set without touching SQL.
  • When hierarchical = true, SQL skips the LIMIT clause entirely, so PHP slices the result with array_slice() using the number/offset args.

If two callers share the same SQL-only key but one has pad_counts = true, the first writer stores padded {term_id, count} objects and the second caller misreads them as plain IDs (or vice versa). Pure SQL-only keying is therefore not safe here without a deeper refactor.

Option B — Normalize boolean-like args before serializing (conservative)

Keep the existing serialize($cache_args) . $sql structure but normalise all truthy/falsy args to their canonical PHP types ((bool), (int)) and apply wp_recursive_ksort() before hashing, similar to wp_dropdown_query_hash() in general-template.php.

Downside: it addresses only the type-coercion symptom. Future callers that express the same intent through semantically equivalent but structurally different args would still miss the cache. It is also a wider change — every single query arg gets serialised even when most of them have no bearing on post-processing.

Option C — SQL + only the args that affect PHP post-processing (chosen)

This is the targeted fix: base the key on the SQL (which already encodes everything that drives the DB query) plus a small, explicit set of args that control PHP-level result shaping:

Arg Why it must be in the key
child_of _get_term_children() filters in PHP; when number=0 this is not reflected in SQL
pad_counts _pad_term_counts() runs in PHP and changes stored cache shape
prune_empty_terms Combined (bool)($hierarchical && $hide_empty) — only the conjunction matters, not the individual values
number / offset (when hierarchical) No LIMIT in SQL for hierarchical queries; PHP slices with array_slice()
fields Normalised to 'all' for non-count/non-object_id queries (existing logic preserved)
$php_cache_args = array(
    'child_of'          => (int)  $args['child_of'],
    'pad_counts'        => (bool) $args['pad_counts'],
    'prune_empty_terms' => (bool) ( $args['hierarchical'] && $args['hide_empty'] ),
);

if ( $args['hierarchical'] && $args['number'] ) {
    $php_cache_args['number'] = (int) $args['number'];
    $php_cache_args['offset'] = (int) $args['offset'];
}

if ( 'count' !== $args['fields'] && 'all_with_object_id' !== $args['fields'] ) {
    $php_cache_args['fields'] = 'all';
} else {
    $php_cache_args['fields'] = $args['fields'];
}

$key = md5( $sql . serialize( $php_cache_args ) );

For this ticket: both calls resolve prune_empty_terms = false (since hide_empty = 0), child_of = 0, pad_counts = false, and produce identical SQL → same cache key → single DB query.

This approach is proportionally safe because the only args retained in the key are those proven to alter the cached result set.

Tests included

Four new test methods in Tests_Term_Query (@ticket 64038, @group cache):

  1. test_equivalent_queries_share_cache_entry — asserts that the wp_dropdown_categories-style and wp_terms_checklist-style calls produce no second DB query (the regression test for this ticket).
  2. test_queries_with_different_prune_empty_terms_get_separate_cache_entries — asserts that hierarchical=true && hide_empty=true vs. hide_empty=false get distinct keys.
  3. test_queries_with_different_child_of_get_separate_cache_entries — asserts different child_of values get distinct keys.
  4. test_queries_with_different_pad_counts_get_separate_cache_entries — asserts different pad_counts values get distinct keys.
Note: See TracTickets for help on using tickets.