Make WordPress Core

Opened 14 months ago

Closed 4 weeks ago

#62126 closed enhancement (wontfix)

Cache duplicate calls to WP_Theme_JSON::sanitize

Reported by: josephscott's profile josephscott Owned by: pbearne's profile pbearne
Milestone: Priority: normal
Severity: normal Version: 5.8
Component: General Keywords: has-patch needs-refresh has-unit-tests reporter-feedback close
Focuses: performance Cc:

Description

A fresh out of the box WordPress site with the 2024 theme will make 53 calls to WP_Theme_JSON::sanitize() during a single request to the default home page. We can see the progress of duplicate $input calls with a simple addition to wp-includes/class-wp-theme-json.php:

    protected static function sanitize( $input, $valid_block_names, $valid_element_names, $valid_variations ) {

        static $inputs = array();
        $input_key = md5( serialize( $input ) );
        if ( ! isset( $inputs[ $input_key ] ) ) {
            $inputs[ $input_key ] = 1;
        } else {
            $inputs[ $input_key ]++;
        }
        error_log( print_r( $inputs, true ) );

        $output = array();

The final log entry:

Array
(
    [c81ffa6dbea2454497cd6bd1bb5328a0] => 2
    [c5ed022349e4a64a02171deb00a3d988] => 23
    [cb150b99ce54ad170f5d795b65f6a651] => 2
    [7aef88bb839b090be13dc01ab67a52e1] => 2
    [a277ebc729ba3b9e28c492d103043a2e] => 21
    [f8c99016466edd5a5377fdf68d6fa04d] => 1
    [4ec129bfa3c2182a1037990a0aad6232] => 1
    [6f5c486d28c1d96906e5237180daed22] => 1
)

Of the 53 calls, there are only 8 unique $input values. All but 3 have duplicates. With that ratio it we'd likely benefit from caching the results and skipping the work on cache hits. This would be a classic trade off of doing less work by using a bit more memory.

I'm including a patch to cache processed $inputs with a simple static variable. This allows for multiple calls in a single page view to benefit from the reduced amount of work. The exact implementation is not my main concern as much as reducing the amount of time spent processing WP_Theme_JSON::sanitize() calls.

Doing 1,000 sequential HTTP GET requests for the front page, here is the TTFB p75 before and after. While TTFB can vary because of any number of things taking longer, I'm including it to show even at that scale it can potentially move the needle. To show the difference in isolation for just this method call I'm including the exclusive processing time for a single home page view for WP_Theme_JSON::sanitize() as measured by php-spx.

Before patch

  • TTFB p75: 44.28ms
  • exclusive time: 1.74ms

After patch

  • TTFB p75: 43.601ms
  • exclusive time: 0.779ms

Change

  • TTFB p75: 0.679ms ( 1.5% )
  • exclusive time: 1.061ms ( 61% )

Setup:

  • WP 6.7-alpha-59092
  • PHP 8.3.10
  • Theme: Twenty Twenty-Four
  • No active plugins
  • Default home page with 2024 theme

Attachments (1)

cache-wp-theme-json-sanitize-input.diff (751 bytes) - added by josephscott 14 months ago.

Download all attachments as: .zip

Change History (51)

This ticket was mentioned in Slack in #core-performance by josephscott. View the logs.


13 months ago

This ticket was mentioned in PR #7529 on WordPress/wordpress-develop by @joemcgill.


13 months ago
#2

  • Keywords has-patch added

This applies the diff supplied by @josephscott in https://core.trac.wordpress.org/attachment/ticket/62126/cache-wp-theme-json-sanitize-input.diff to run CI against the proposed changes.

Trac ticket: https://core.trac.wordpress.org/ticket/62126

#3 @joemcgill
13 months ago

Thanks @josephscott. I've opened a PR from your diff in order to run CI against the proposed changes.

#4 @joemcgill
13 months ago

It looks like this is causing several unit tests to fail: https://github.com/WordPress/wordpress-develop/actions/runs/11220806461/job/31189817758?pr=7529

One thing to keep in mind is that WP_Theme_JSON data gets sanitized at different times during the application lifecycle so it's possible that code that modifies one of the origins (e.g., registered blocks) runs after the first time this method is called, which could cause these caches to become stale.

#5 @josephscott
13 months ago

Looks like I need to expand it to consider the other parameters as well.

If things can be altered after that fact then yeah, that is something that could be a problem as well.

#6 @joemcgill
13 months ago

Caching different aspects of the WP Theme JSON structure has been tricky for this very reason. If you're interested in a deep dive, you may find the discussion in #57789 useful.

#7 @joemcgill
13 months ago

  • Milestone changed from Awaiting Review to Future Release

#8 @joemcgill
13 months ago

#62187 was marked as a duplicate.

#9 @joemcgill
13 months ago

I've reverted setting #62187 as a duplicate since both issues seek to improve the performance of WP_Theme_JSON::sanitize in slightly different ways that are not mutually exclusive.

#10 @joemcgill
13 months ago

  • Keywords changes-requested added
  • Milestone changed from Future Release to 6.8
  • Owner set to joemcgill
  • Status changed from new to reviewing

Going to set this, along with #62187 as issues to review for the 6.8 milestone. In the meantime, @josephscott do you want to investigate the PHPUnit test failures?

@mukesh27 commented on PR #7529:


12 months ago
#11

I also find it worth for some of the keys that are common in requested input when i profile the WP release.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


12 months ago

#13 @flixos90
10 months ago

@joemcgill Are you mostly waiting for PR reviews at this point? Or anything else outstanding here? I'm happy to give this a review, if needed.

#14 @joemcgill
10 months ago

  • Status changed from reviewing to accepted

@flixos90 The PR here is just the applied patch from @josephscott cache-wp-theme-json-sanitize-input.diff.

Last update here is that it still needs to be updated. I'm going to remove the assignee myself so it's clear that someone else can pick this up. If there's not movement in the next couple of weeks, this is probably a candidate to punt to a future release.

#15 @joemcgill
10 months ago

  • Owner joemcgill deleted
  • Status changed from accepted to assigned

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


10 months ago

#17 @pbearne
10 months ago

  • Owner set to pbearne

#18 @pbearne
10 months ago

will update and add tests

This ticket was mentioned in PR #8213 on WordPress/wordpress-develop by @pbearne.


10 months ago
#19

…hin the WP_Theme_JSON class. Let's break down the changes and their impact.

Problem: The original sanitize method likely performed redundant computations when called multiple times with the same input. Sanitizing theme JSON data can be resource-intensive, especially with complex themes or frequent calls.

Solution: The added code addresses this inefficiency by implementing a simple caching mechanism. Before performing the sanitization, the method now checks if the input has already been processed.

How the Cache Works:

  1. Hashing the Input: The md5 function generates a unique hash of the input array using wp_json_encode. This hash serves as the key for the cache. Using a hash ensures fast lookups and avoids potential issues with complex array keys.
  1. Checking the Cache: The code checks if the generated hash key exists in the $sanitize_input_cache static property. If found, the cached sanitized output is returned directly, bypassing the sanitization process.
  1. Caching the Result: If the hash is not found in the cache, the sanitization proceeds as before. After the sanitization is complete, the result is stored in the $sanitize_input_cache, associating it with the calculated hash key.

Benefits of this Change:

  • Performance Improvement: By avoiding redundant calculations, this caching significantly improves performance, especially in scenarios where the same theme JSON data is sanitized repeatedly.
  • Reduced Resource Consumption: Less processing means reduced CPU load and memory usage.

Potential Considerations:

  • Cache Invalidation: The current implementation doesn't have a mechanism for cache invalidation. This means that if the underlying theme JSON data changes, the cached value might become stale. A more robust solution might consider ways to invalidate the cache when theme changes occur.
  • Memory Usage: While generally beneficial, caching can consume memory. If the number of unique theme JSON variations is extremely large, the cache could potentially grow quite large. Monitoring memory usage is advisable, especially in memory-constrained environments.

Overall, this change is a positive improvement that addresses performance concerns in theme JSON sanitization. The simple caching strategy effectively reduces redundant processing and enhances efficiency. However, it's important to be aware of the potential need for cache invalidation in a production environment.

#20 @pbearne
10 months ago

  • Keywords changes-requested removed

I have added a function to reset the cache and added a call to the wp_add_global_styles_for_blocks function

This clears the test errors

This ticket was mentioned in Slack in #core-performance by pbearne. View the logs.


10 months ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


9 months ago

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


9 months ago

#24 @audrasjb
9 months ago

As per today's bug scrub: It appears this ticket is still needs to be reviewed. As 6.8 beta 1 is very close, I'm moving it to 6.9. Feel free to move it back to 6.8 if it can be committed by Monday.

#25 @audrasjb
9 months ago

  • Milestone changed from 6.8 to 6.9

This ticket was mentioned in Slack in #core-performance by adamsilverstein. View the logs.


8 months ago

#27 @adamsilverstein
8 months ago

  • Keywords reporter-feedback added

@pbearne The PR overall looks good, I will leave a review in GitHub.

It would be great to see the before/after count for the sanitize calls - I assume it would be similar to the ticket description.

This ticket was mentioned in Slack in #core-performance by westonruter. View the logs.


2 months ago

@westonruter commented on PR #7529:


2 months ago
#29

I presume this should be closed in favor of #8213?

#30 @westonruter
2 months ago

  • Keywords needs-refresh added

PR #8213 need needs refresh and has a couple minor code review suggestions. The latest from trunk should be merged in as well.

#31 @westonruter
2 months ago

  • Version set to 5.8

#32 @westonruter
2 months ago

  • Keywords needs-unit-tests added

#33 @pbearne
8 weeks ago

  • Keywords needs-refresh needs-unit-tests removed

@josephscott commented on PR #7529:


8 weeks ago
#34

Yes, I am good with that.

#35 @josephscott
8 weeks ago

Trying out https://github.com/WordPress/wordpress-develop/pull/8213 from @pbearne I looked at just the time spent in WP_Theme_JSON::sanitize with SPX. This was on WP 6.9-alpha-60726 ( Twenty Twenty-five theme ) with PHP 8.4.11 on the logged out front page of the site. No other plugins, basically a near fresh WP site. Before collecting the data from SPX for each run I loaded the front page of the site 5 times, then loaded it one time with SPX enabled.

  • Before: 1.18ms
  • After: 0.803ms

That 0.377ms is a nearly 32% improvement. Granted we are talking small numbers here, but is a move in the right direction.

Before this change SPX listed these as the top 5 areas for time:

  • wpdb::_do_query
  • wp-settings.php
  • WP_Theme_JSON::sanitize
  • WP_Theme_JSON_Resolver::get_style_variations
  • WP_Theme_JSON::remove_keys_not_in_schema

After this change that shifted to:

  • wpdb::_do_query
  • wp-settings.php
  • WP_Theme_JSON_Resolver::get_style_variations
  • WP_Theme_JSON::sanitize
  • WP_Theme_JSON::merge

Pushing the slowest parts of the system to be a bit faster is a win.

Last edited 8 weeks ago by josephscott (previous) (diff)

#36 @josephscott
8 weeks ago

As another data point I ran https://github.com/WordPress/wordpress-develop/pull/8213 against a fresh WooCommerce install, and the WP_Theme_JSON::sanitize time ( via SPX ) dropped from 3.21ms to 0.832ms.

#37 @pbearne
8 weeks ago

What do we need to do to get this merged?

#38 @westonruter
8 weeks ago

  • Keywords needs-refresh has-unit-tests added; reporter-feedback removed

I just left a review on the PR.

#39 @westonruter
8 weeks ago

When I'm looking at the impact of this change, I see that in the Twenty Twenty-Five theme, there are 36 total calls to the sanitize() method. Of those 36, just a third (12) are able to use the cache whereas two thirds (24) do not. This seems like a pretty poor cache HIT rate. Is something not right?

@adamsilverstein commented on PR #8213:


8 weeks ago
#40

Code looks good to me, nice work here @pbearne!

#41 @adamsilverstein
8 weeks ago

Thanks for testing the performance improvements @josephscott, I am pleased to see some numbers backing up the change.

@westonruter that does sound like a poor hit rate, curious what @pbearne says. I assume you tested without any other plugins running?

#42 @westonruter
8 weeks ago

Yes, I tested with just Query Monitor active (which I used for the logging).

#43 @westonruter
8 weeks ago

It may be that including the additional arguments passed to sanitize in the cache key is the cause for the high MISS rate.

#44 @westonruter
8 weeks ago

Given the original logging code in the description added to the beginning of sanitize, I get the following result:

[19-Sep-2025 18:10:13 UTC] Array
(
    [5a711bfeb4c4d9cd4e6b419bf8d8e285] => 2
    [b3279a35e434aae0648de957b531454c] => 2
    [b8ef9e6b73ffc94a960b6685441def19] => 2
    [4d834e3e1efa8331ea6c9114ad5ea7f8] => 2
    [901119a5052414a7f95cea1b62cd6ae0] => 2
    [d5f4929e5ec13d3ad17377a50fab796b] => 2
    [cad6ab868772320aae2e4a13462653e5] => 2
    [fd6cec6288d517badd25cb3ed0cbbdbd] => 2
    [ff2793c30902232220d2e9d1e10bdba1] => 2
    [7f54185488bb2dde2b00131a529c1e43] => 2
    [c5ed022349e4a64a02171deb00a3d988] => 5
    [3ada4d1ae675481f9d5ac28d0e39287b] => 2
    [7aef88bb839b090be13dc01ab67a52e1] => 2
    [a277ebc729ba3b9e28c492d103043a2e] => 5
    [9d885aa254f4281c1bd719d1f80348d0] => 2
)

However, when I change the $input_key to include the other arguments passed to sanitize():

<?php
$input_key = md5( serialize( compact( 'input', 'valid_block_names', 'valid_element_names', 'valid_variations' ) ) );

I get the following:

[19-Sep-2025 18:10:52 UTC] Array
(
    [d912ce88a285afebdff6b2433de9b47a] => 1
    [18cdd5519815f9a4b3479562083fad6b] => 1
    [73ebc5d622a626b682728b2f5cd20236] => 1
    [9facf692dea88de9ecbc7cf333ab8609] => 1
    [93b2da2e8b6242fa3fe8cf772cf1dc5d] => 1
    [67dae318fde353ce133c13bd8f0d328a] => 1
    [b67dd5d6421ace6c4bd65218e5a11b16] => 1
    [b1df13f38cfb9b2bf1d3ad4ac8279964] => 1
    [4692b9d28608b3ac26a5482db312cde7] => 1
    [02a3f2f17a1780465f8490487f58cc7f] => 2
    [18973091d16f173235139a0dce8bd49a] => 1
    [5bafb6bca8a117cb831c9765a76c28ab] => 1
    [f5b2260df6ba91ddc4313d874f507423] => 1
    [047e1ffc5eb96a869f2c75c267fdac72] => 1
    [50083737ecefeb004c1e438debea9b83] => 1
    [84b11ebf68e083d91ffcdc7ee71fe309] => 1
    [882d4890a68c5ae3c0b01495683a5860] => 1
    [54e5e245a371e0d8d3eb4d3307d86fcb] => 1
    [44ea0804e97f7ad261cbd1dd313550fe] => 1
    [7169040703ede04b1013bcbb5e25420b] => 5
    [6a5f1b88747798ff256098346e3aabfd] => 2
    [0529abcf2315035f43d401e128efdaa2] => 2
    [b6116437f73bc028d5a9842a301d4faa] => 5
    [84002271b40977da4ef03b55b600898c] => 2
)

So you can see the HIT rate is much reduced.

Version 0, edited 8 weeks ago by westonruter (next)

#45 @westonruter
7 weeks ago

  • Keywords reporter-feedback added

#46 @adamsilverstein
7 weeks ago

So you can see the HIT rate is much reduced.

Ah, interesting!

Given the reduced cache hit rate, are you thinking this isn't worth doing?

Do you think there could be an approach with a better hit rate?

#47 @westonruter
7 weeks ago

I don't think it's worth doing if the HIT rate is so low, due to the overhead of creating the cache key. That is, unless updated benchmarks show a clear value in doing this still.

@westonruter commented on PR #8213:


7 weeks ago
#48

@pbearne How will changing the serialization method 09b327b improve the underlying issue with the low HIT rate?

#49 @westonruter
5 weeks ago

  • Keywords close added

Given the low hit rate after taking into account the additional function parameters in the cache key, I'm going to suggest closing this.

#50 @westonruter
4 weeks ago

  • Milestone 6.9 deleted
  • Resolution set to wontfix
  • Status changed from assigned to closed
Note: See TracTickets for help on using tickets.