Make WordPress Core

#57648 closed defect (bug) (wontfix)

Performance regression in `wp_enqueue_global_styles()` for classic themes

Reported by: flixos90's profile flixos90 Owned by:
Milestone: Priority: normal
Severity: normal Version: 6.2
Component: Editor Keywords:
Focuses: performance Cc:

Description

I just finished conducting a quick performance analysis for WordPress 6.2, which I did in preparation for Beta tomorrow, to see how things are looking performance-wise. The objective of the analysis was to measure Server Timing metrics in latest 6.2 trunk and compare them to the same metrics in the 6.1 branch.

While overall things were looking good, there was one major regression in template execution time for the classic theme tested (Twenty Twenty-One). After digging in a bit further, it became clear that this regression is caused by some logic triggered in wp_enqueue_global_styles(), with that function now taking 5 times longer than in WP 6.1.

Investigating the root cause for this regression and fixing it is critical, as it also means a significant regression of ~30% for WordPress's overall load time for classic themes.

Here is a spreadsheet with the test results from my analysis: https://docs.google.com/spreadsheets/d/1p8Bw8Y_53NlZ8rHRmXn-a_fJj3XBe8Ajf6w9eCHKecI/edit
All numbers in there are based on median values from 20 requests each, to account for any variance. It is safe to say that this is a real regression and not caused by variance between requests.

All other measurements look very solid, with improvements in comparison to 6.1, so we need to address this regression in order to ensure 6.2 is a win for performance.

Change History (19)

This ticket was mentioned in Slack in #core-editor by flixos90. View the logs.


16 months ago

#2 @oandregal
16 months ago

Hi Felix, would you help me understand what are the wp-* metrics, and how can I reproduce locally?

Coincidentally, I've published my own analysis using Time To First Byte, essentially, the time it takes the server to process the request. I've done 1000 runs for the last 7 major WordPress releases and 4 default themes. These are the numbers https://oandre.gal/evolution-of-wordpress-ttfb-5-6-to-6-2/ *

TLDR:

  • TwentyTwentyThree: improves 30.24% with respect to 6.1.
  • TwentyTwentyTwo: improves 30.71% with respect to 6.1.
  • TwentyTwentyOne: improves 7.77% with respect to 6.1.
  • TwentyTwenty: improves 0.97% with respect to 6.1.

This is using rev55171. I'll update the numbers tomorrow using the beta date as well.

* (sorry, I tried to attach the image directly, but I don't know how to do it: is there a way?)

#3 @oandregal
16 months ago

I ran the tests with the latest revision available (rev 55250) and all themes improve (it's now even better for classic ones):

  • TwentyTwentyThree: improves 28.38% with respect to 6.1.
  • TwentyTwentyTwo: improves 28.92% with respect to 6.1.
  • TwentyTwentyOne: improves 8.17% with respect to 6.1.
  • TwentyTwenty: improves 5.75% with respect to 6.1.

Updated the spreadsheet as well https://docs.google.com/spreadsheets/d/1gzA6zH0eY33p0jQHOvRJfmFNRxvW1qMh2MW16zHlFrc/edit?usp=sharing

#4 @johnbillion
16 months ago

  • Priority changed from normal to high
  • Version set to trunk

#5 @flixos90
16 months ago

  • Priority changed from high to normal
  • Version trunk deleted

@oandregal I have been using the approach from this Gist as a means to record Server-Timing metrics: https://gist.github.com/felixarntz/de5c697a1a16c2b892634b70216eb6c7

Specifically to measure e.g. the wp_enqueue_global_styles() function, I add something like https://gist.github.com/felixarntz/de5c697a1a16c2b892634b70216eb6c7#file-example-php-L5 to the top of that function and something like https://gist.github.com/felixarntz/de5c697a1a16c2b892634b70216eb6c7#file-example-php-L17 (just calling it e.g. enqueue-global-styles-* instead) to the bottom of that function.

I then used https://github.com/GoogleChromeLabs/wpp-research/tree/main/cli#benchmark-url to trigger 20 requests to my local WordPress development site's home page, which is configured with either latest trunk or latest 6.1 branch, plus the above tweak to include Server-Timing metrics.

Let me know if you have further questions on the approach.

#6 @flixos90
16 months ago

  • Priority changed from normal to high
  • Version set to trunk

(re-adding the accidentally modified workflow annotations here)

@oandregal Both of these approaches have their benefits. Measuring TTFB is probably closer to real-world impact in theory, but it is at a higher level and I believe therefore is subject to higher variance than specific server timing metrics.

Needless to say, our numbers are extremely different; for example I find your TTFB numbers overall surprisingly low, so I'd be curious if you could provide more detail in what environment you got those numbers, what tooling you used etc.

The average TTFB in the field is often worse than 800ms according to e.g. https://lookerstudio.google.com/c/u/0/reporting/55bc8fad-44c2-4280-aa0b-5f3f0cd3d2be/page/M6ZPC (800ms is the threshold above which the TTFB assessment is considered "failed"), and while it is expected that lab results are much better, seeing numbers far lower than 100ms in your data consistently makes me wonder what they are based on.

Anyway, the most important question here is probably why I am seeing this regression in my data and you are not. It's certainly possible something was somehow misconfigured in my environment, so I'll give this another test now that Beta 1 is available. Let's keep following up on this - of course it would be nice if my data was just wrong :) But if it's not, we need to profile further and find the underlying problem.

#7 follow-up: @oandregal
16 months ago

@flixos90 I'm still at a loss as to how measure the wp-* metrics 😅 I've used the server-timing script before but I'd need to know where do you measure wp-total, wp-template, etc. for me to replicate.

Perhaps this is an obvious thing, but if you had the git diff you apply to those branches, that'll be helpful. Or pointing at the places where you measure those things is good as well. I'm mostly interested in wp-total, though the others could be useful as well. If I have a way to replicate the wp-total, I can bisect the revisions and drive the investigation based on data, otherwise, it's a lot of guesswork.

#8 @oandregal
16 months ago

As for the TTFB numbers I've collected: the experiment is simply loading the homepage and find the TTFB in the network tab within devtools. To scale to thousands of runs, I substitute the manual process (me looking into the devtools) by using curl and extracting the numbers.

This is the whole step by step for TTFB anyone can reproduce (and, please, do!):

  • Go to the .env file in wordpress-develop and set the following env variables to false, so it's "production": WP_DEBUG, WP_DEBUG_LOG, WP_DEBUG_DISPLAY, SCRIPT_DEBUG.
  • Start the environment: npm install && npm run build:dev && npm run env:start && npm run env:install.
  • Run this command: seq 1000 | xargs -Iz curl -o /dev/null -H 'Cache-Control: no-cache' -s -w "%{time_starttransfer}\n" http://localhost:8889 | xclip -selection clipboard. It copies into the clipboard the result of loading 1000 times the homepage of the site. Paste those numbers to any spreadsheet. I've played with running 100 tests each time and waiting a bit (sometimes half a day) before doing the next bath, though it didn't affect much the numbers.

If you (or anyone!) could use this method to compare 6.2 beta 1 and the 6.1 branch, and then report what they have, it'd be much appreciated.

Regarding the exact values: I don't give them any credit, I'm only interested in the variation between them. As you mention, the lab is not the same as the field. They are going to change machine-to-machine (they'd change using the same machine under different workload as well), and I'm lucky to have a powerful computer. The local experiment also discounts all network traffic, which accounts for big parts of TTFB. However, something worth noting is that using curl, or the manual step, or the GitHub CI actions we push to codehealth from Gutenberg all land in the same ballpark, so it's not completely random either.

For full disclosure, the TTFB method I shared results in low numbers for classic themes. This makes it difficult to find sub-millisecond regressions as they are lower than the noise/statistical error. However, you are reporting a +30% regression! The TTFB approach should have reported a noticeable -4ms being the base value 14ms.

Last edited 16 months ago by oandregal (previous) (diff)

#9 @oandregal
16 months ago

Oh, by the way, know that my only interest is learning how I can help: if the tools I use don't report any regression and I don't know how to reproduce other mechanisms, I'm useless :)

On a general note: I'm committed to find the process/tools that helps us, one we can trust and reproduce. I'm fine if it's not TTFB or if wp-total proves to be better. I want to understand how or why they are telling us different stories, so we can agree on how to drive priorities and investigations, which has been the source of many conversations and frustrations in the past few months.

#10 in reply to: ↑ 7 @flixos90
16 months ago

Replying to oandregal:

@flixos90 I'm still at a loss as to how measure the wp-* metrics I've used the server-timing script before but I'd need to know where do you measure wp-total, wp-template, etc. for me to replicate.

Sorry about that, I still forgot to explain those specific metrics in my last comment. Hopefully I can clear things up here.

All metrics that I initially measured (wp-before-template, wp-template etc.) are based on the Performance Lab plugin's Server-Timing API implementation. See https://github.com/WordPress/performance/pull/553, the PR description mentions what these metrics are.

  • wp-before-template is essentially the time from WordPress's timer_start() call until the very end of the template_redirect action.
  • wp-template is the time after that, i.e. from the very end of the template_redirect action until the shutdown action.
  • The respective *-db-queries is the time the subset of that time that was consumed by database queries.
  • wp-total is simply the sum of wp-before-template plus wp-template.

If you activate the Performance Lab plugin, you'll get these Server-Timing metrics out of the box (except wp-total, but you can simply use the sum of the two other metrics it is comprised of per the above). (Make sure you deactivate all Performance Lab plugin modules under "Settings > Performance", since you don't want those to run when just testing core performance.)

You can use the Server-Timing API in the Performance Lab plugin to register your own metrics, e.g. in any specific function or action. This Gist has some example usages. You can also wrap hook callbacks with a convenience function perflab_wrap_server_timing() (see the above PR description for an example of that). I used the approach from that Gist for example to then test more specifically the timing of wp_head and wp_enqueue_scripts.

The other Gist that I shared in https://core.trac.wordpress.org/ticket/57648?replyto=7#comment:5 is simply a very lightweight version of what the Performance Lab plugin's Server-Timing feature provides a proper API for. So either can be used to get the same data.

If you want to manually inspect Server-Timing data from the response headers, you can access them in the browser's dev console via performance.getEntries()[0].serverTiming.

Last edited 16 months ago by flixos90 (previous) (diff)

This ticket was mentioned in Slack in #core by flixos90. View the logs.


16 months ago

#12 @flixos90
16 months ago

I spent some time conducting additional analyses. TLDR there appears to be a regression, but much smaller than my original analysis had suggested; performance analysis is hard 🙃

Apologies in advance for the upcoming wall of text. If you don't care about how I got to where I got, please skip to the "Summary" section at the end of this comment.

My first analysis had been using the regular WP core development repository, so there's a chance that something in that configuration skewed the data notably. Anyway, because of that I now focused on configurations using WordPress in the "regular" way, i.e. the builds that you also would get from wordpress.org.

The approach for all analyses was to use the Performance Lab plugin as well as the plugin from this Gist which I quickly assembled for that purpose (and other future testing).

I tested in the following configurations (each using 2 separate sites, one with 6.1.1, the other with 6.2-beta.1):

  1. 2 TasteWP sites with WP_DEBUG, 20 requests per scenario (results)
  2. 2 InstaWP sites without WP_DEBUG, 20 requests per scenario (results)
  3. 2 local sites (set up with Local) without WP_DEBUG, 20 requests per scenario (results)
  4. same as 3., but with 1000 requests instead of 20 requests (results)

For each test I made 20 requests per scenario and used the median values for each metric.

The data from the TasteWP and InstaWP tests showed vastly conflicting data, suggesting that making requests to external sites is not a great approach due to the high variance. It also is worth highlighting that the overall response time in each request differed notably from the wp-total metric that WordPress measured itself, which makes sense since that type of request is much more affected by external factors like network connection than in a local environment. This is primarily why I consider Server-Timing metrics more helpful than TTFB (although it's also fair to say that in a local environment TTFB is usually okay since the latency is so little). Anyway, the most important thing to see was that in none of the tests wp_enqueue_global_styles() was a big issue, which meant my first analysis had probably had some problem in its configuration.

Since the above data was so unconclusive though due to its high variance between requests, I went back to testing with a local environment, using 2 sites set up with Local. Those numbers ended up with a lower variance between requests, so are more reliable. They do suggest that there is still a regression in the performance of wp_enqueue_global_styles(), but by far not as severe, which is good.

When I ran the CLI script again locally, I got the impression that results were still somewhat different between runs, so I did yet another comparison with the same Local sites, now going with that far higher request count of 1000 per scenario that @oandregal had been using. Interestingly, while the data seemed even more consistent, at the end of the day it was very much the same data that I got with only the 20 requests per scenario. So making 1000 requests per scenario is probably unnecessary and excessive; 20 seems on the low end, who knows, maybe 100 is a reasonable number?

Anway, based on the above research, I think the last analysis (this sheet) has data that we can rely on. 🎉

Summary

  • WordPress load time so far appears to be 24% faster for the block theme (TT3) and 6% faster for the classic theme (TT1).
  • The regression of wp_enqueue_global_styles() for classic themes is by far not as severe as originally thought. It's still a 60% regression, but in absolute numbers that is only 2ms.
  • For block themes, wp_enqueue_global_styles() is actually 14% (2.7ms) faster than before.
  • One other thing worth highlighting is that the process of locating the template has notably improved for block themes by a significant 39% (23ms instead of 38ms).

#13 @flixos90
16 months ago

@oandregal For next steps: I would suggest to do some profiling on wp_enqueue_global_styles() and the code paths it runs, to see where specifically it has become slower than in 6.1.1. Let me know if you have capacity to do that, or if you need help.

Also, could you try to run the Server-Timing metrics approach on your system and share the results you get? One thing that's very odd is that, just in my local WordPress core development environment, I still keep getting that same very large regression of >400% from that function. I'm not sure why, maybe there is some expensive logic in there that somehow only triggers in that specific configuration? I would be curious how that plays out in your core dev environment.

I'm changing the severity here back to normal, since now that regression is actually not as alarming as originally thought. And honestly, other than that, performance-wise WP 6.2-beta.1 looks great!

Last edited 16 months ago by flixos90 (previous) (diff)

#14 @flixos90
16 months ago

  • Severity changed from major to normal

#15 @flixos90
16 months ago

  • Summary changed from Major performance regression in `wp_enqueue_global_styles()` for classic themes to Performance regression in `wp_enqueue_global_styles()` for classic themes

This ticket was mentioned in Slack in #core by abhanonstopnews. View the logs.


16 months ago

#17 @flixos90
16 months ago

@oandregal I have finally figured out why I have been seeing a regression in wp_enqueue_global_styles(): It is because that function calls wp_get_global_stylesheet() and that specifically was modified in a notable way in #56910 / [55148].

  • This commit intentionally got rid of the 1 minute transient.
  • Since transients are "persistent" caches beyond requests (not really persistent here, but at least "persistent" for 1 minute) and we now switched them to a non-persistent cache, that function usually has a cached result for WP 6.1.1, while it doesn't for the WP 6.2 Beta.
  • However, getting rid of the 1 minute transient was an intentional change, as that is a bad practice and was "hacking" around the problem that we haven't yet found a way to implement proper cache invalidation. In other words, this is basically a calculated / expected regression at this point, as we replaced a "short term persistent" cache with a non-persistent cache.
  • It should be highlighted than when I disable caching in the function for both 6.1.1 and 6.2 Beta, the code in 6.2 Beta is way faster than the code in 6.1.1, so that's still a positive observation. Essentially for the 6.2 code, it effectively doesn't make a difference, since the function is only called 1 time in a regular page load anyway, i.e. will never actually get a cache hit. Below are the Server-Timing metrics for the wp_enqueue_global_styles() function with vs without caching enabled in the wp_get_global_stylesheet() function that it calls:
    • wp_enqueue_global_styles() with caching (TT3): 16.83ms (6.2 Beta) vs 19.18ms (6.1)
    • wp_enqueue_global_styles() without caching (TT3): 16.56ms (6.2 Beta) vs 35.53ms (6.1)
    • wp_enqueue_global_styles() with caching (TT1): 5.41ms (6.2 Beta) vs 3.35ms (6.1)
    • wp_enqueue_global_styles() without caching (TT1): 5.33ms (6.2 Beta) vs 12.04ms (6.1)
  • Note that for block themes, despite effectively the lack of caching in wp_get_global_stylesheet(), the code in 6.2 is still faster than in 6.1, which is excellent. I'm almost sad we ever had caching here, since those ~20ms faster performance of a single function (when considered without caching) are amazing to see :)

Yet, we just have to acknowledge that in the field, we will now have this small regression in the wp_get_global_stylesheet() function for classic themes, for the time being. But I believe we are still in a better place than before, as the approach previously was error-prone, and overall WP 6.2 performance sees so many improvements over 6.1 that holistically this little regression will only cause a slight reduction in the overall load time performance wins.

Based on the above, my vote would be too close this ticket as maybelater (as we should still work towards figuring out how to properly use a persistent cache around this logic).

Aside: This spreadsheet has the latest WP 6.2 performance benchmarks I ran today, covering both server-side and client-side performance for 6.2 Beta 1 and Beta 2 in comparison with 6.1.1.

Version 0, edited 16 months ago by flixos90 (next)

#18 @oandregal
16 months ago

Thanks for investigating and sharing your observations, Felix.

I wanted to add a small note: the transient behavior was a bug. Users didn't see the changes they did in the site editor until 1 minute later because the transient wasn't invalidated properly. And, in some setups, the transient wasn't even flushed at all, causing issues with layout in 6.1 as per https://github.com/WordPress/gutenberg/issues/45713 It's good that it was removed.

All in all, the overall performance for 6.2 has been improved for all themes, so I agree we should close this ticket and focus on other things (investigate how to switch these caches to be persistent, fix more performance bottlenecks for all themes, etc.).

As a side note, we can probably see a similar effect in the wp_get_global_styles_svg_filters function, that was also updated from a 1-minute transient to a non-persistent object cache in https://github.com/WordPress/wordpress-develop/pull/3926

#19 @flixos90
16 months ago

  • Milestone 6.2 deleted
  • Priority changed from high to normal
  • Resolution set to wontfix
  • Status changed from new to closed

@oandregal Thank you for the additional context, I hadn't been aware of that.

Either way, the outcome here is fortunately that this is not an issue that at this point can or should be fixed, so I'm gonna close this ticket as wontfix (it'll technically be maybelater, but we'll have a separate ticket for the overaching work to make the caches persistent anyway, so no need to use this one anymore).

All this has been a very valuable exercise though, also in regards to comparing the different ways to assess performance!

Note: See TracTickets for help on using tickets.