Make WordPress Core

Opened 13 months ago

Closed 12 months ago

Last modified 12 months ago

#57687 closed task (blessed) (fixed)

Add automated performance testing CI workflow MVP

Reported by: adamsilverstein Owned by: mukesh27
Milestone: 6.2 Priority: normal
Severity: normal Version: 6.2
Component: Build/Test Tools Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description

This ticket is a follow-up to my Make Core blog post “Automated performance monitoring in WordPress core”, where we will work on and discuss the exact technical implementation of this feature.

Technical requirements

As outlined in the blog post, the goals of the initial implementation will be limited so we can land a first stable version quickly, then we can iterate on the approach and exact metrics we collect over time.

The initial implementation should satisfy these requirements:

  • Runs on each commit to trunk.
  • Uses the standard wp-env environment, taking measurements from the Twenty Twenty-One (classic) and Twenty Twenty-Three (block) theme home pages using the theme test data.
  • Records median values over 10 runs for the following metrics using the Server-Timing API: total WordPress load time, time the template takes to load, and time that WordPress initialization takes before the template.
  • Sends the recorded values to a new core performance dashboard which will sit alongside the existing Gutenberg dashboard.
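The Server-Timing values described above arrive as a standard HTTP response header, one `name;dur=value` entry per metric. As an illustrative sketch (the metric names and numbers below are hypothetical, not the ones the workflow actually emits), parsing the header and taking medians over repeated requests could look like:

```python
import statistics

def parse_server_timing(header):
    """Parse a Server-Timing header into a {metric: duration_ms} dict."""
    metrics = {}
    for entry in header.split(","):
        parts = [p.strip() for p in entry.strip().split(";")]
        name = parts[0]
        for param in parts[1:]:
            if param.startswith("dur="):
                metrics[name] = float(param[4:])
    return metrics

# Hypothetical headers from repeated requests to the same page.
headers = [
    "wp-total;dur=120.5, wp-before-template;dur=40.2, wp-template;dur=80.3",
    "wp-total;dur=118.0, wp-before-template;dur=39.8, wp-template;dur=78.2",
    "wp-total;dur=125.1, wp-before-template;dur=41.0, wp-template;dur=84.1",
]
runs = [parse_server_timing(h) for h in headers]
medians = {
    name: statistics.median(run[name] for run in runs)
    for name in runs[0]
}
print(medians["wp-total"])  # 120.5
```

The median (rather than the mean) is used because a single slow outlier run on a shared CI runner would otherwise skew the reported value.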

Change History (24)

#1 @oandregal
13 months ago

It's great to see progress on this.

For visibility and to create as much alignment as possible, I'd like to share that Gutenberg (and codehealth) is now tracking both Time To First Byte and Largest Contentful Paint. I consider this the v1 for tracking front-end metrics.

While tracking the time spent in the server is useful, it doesn’t directly correlate to user-perceived performance. We need to track client metrics as well. For example, WordPress processes the post to render, so it can identify the blocks in use. With that information, it enqueues only the CSS of those blocks instead of the CSS of the whole block library. This certainly takes more time in the server, though we expect it to improve the perceived performance by the user. If we only tracked server metrics, they’d report this behavior as a regression, and we’d lack tools to understand how it impacted the actual user-perceived performance.

So, I wonder if you'd consider those two metrics part of the MVP for core as well?

#2 @flixos90
13 months ago

@oandregal That is a great point, as we already discussed in Monday's hallway hangout.

We intentionally would like the MVP to have a reduced scope, in order to get that first version working quickly. But I'd say 100% we would want to have TTFB and LCP covered.

If that is trivial to add based on our setup, I think it would be totally reasonable to do that as part of the MVP already; otherwise, I'd say it should be the first thing we iterate on afterwards.

#3 @flixos90
13 months ago

  • Milestone changed from Awaiting Review to 6.3
  • Owner set to mukesh27
  • Status changed from new to assigned

Adding this to the 6.3 milestone as I believe this is a reasonable goal for getting the first MVP added to the WordPress core codebase.

The milestone doesn't matter all that much from an end user perspective since this feature won't affect end users, but obviously we still would like to eventually get this merged once we have assessed that it's working reliably.

Assigning this to @mukesh27 for now, as he is primarily working on this in the WordPress core fork project.

#4 follow-up: @oandregal
13 months ago

Another thought about meaningful things to track. In addition to: "server time" (TTFB) and "user time" (LCP), there is value for us to understand the time it takes the client to render the data it receives.

In https://github.com/WordPress/gutenberg/pull/48288 I'm proposing to add a new metric to Gutenberg's front-end performance suite: LCP-TTFB. I'm unaware of any industry metric that tracks the same thing, I'll be happy to adapt if there's any. The rationale is that TTFB takes so much of the LCP (50% for classic themes and 80% for block themes) that it may hide any meaningful difference in the client-side rendering. For example, changes that add blocking scripts/styles that impact LCP negatively may be hidden if we improve TTFB in big ways when comparing WordPress releases.

As a matter of fact, take a look at the latest numbers by Felix that include LCP:

Metric              WP 6.2 beta2
LCP-TTFB for TT3    103.1 ms
LCP-TTFB for TT1    131.85 ms

According to this data, block themes are roughly 1.28 times faster (131.85 / 103.1) than classic themes when it comes to client-side rendering.

Version 2, edited 13 months ago by oandregal

#5 in reply to: ↑ 4 @flixos90
13 months ago

Replying to oandregal:

In https://github.com/WordPress/gutenberg/pull/48288 I'm proposing to add a new metric to Gutenberg's front-end performance suite: LCP-TTFB. I'm unaware of any industry metric that tracks the same thing, I'll be happy to adapt if there's any.

+1 to using LCP-TTFB. I have been using this in several benchmarks before when it comes to assessing client-side changes specifically.

One thing to pay attention to is to get the median right: We need to compute LCP-TTFB for every run and then take the median of those values, not take the median LCP and median TTFB and use those.

I'm sure you mention my numbers mainly as an illustrative example for your calculation, but just to clarify: that calculation technically computes the median in the incorrect way (although at a high level I think we will still see the major client-side performance benefit of block themes either way, which is great to see).
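To make the distinction concrete, here is a small sketch with made-up numbers showing the two ways of computing the median of LCP-TTFB; the results genuinely differ:

```python
import statistics

# Hypothetical per-run measurements in milliseconds.
lcp  = [500.0, 520.0, 610.0, 480.0, 530.0]
ttfb = [400.0, 390.0, 450.0, 410.0, 395.0]

# Correct: compute LCP-TTFB for every run, then take the median of those.
per_run = [l - t for l, t in zip(lcp, ttfb)]
correct = statistics.median(per_run)

# Incorrect: subtract the median TTFB from the median LCP.
incorrect = statistics.median(lcp) - statistics.median(ttfb)

print(correct, incorrect)  # 130.0 120.0
```

The difference arises because the run with the median LCP is generally not the run with the median TTFB, so subtracting the two medians mixes values from different runs.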

This ticket was mentioned in PR #4139 on WordPress/wordpress-develop by joemcgill.


12 months ago
#6

  • Keywords has-patch has-unit-tests added; needs-patch removed

#7 follow-up: @joemcgill
12 months ago

  • Milestone changed from 6.3 to 6.2
  • Type changed from enhancement to task (blessed)

@mukesh27 and I have been working on an initial approach for this, inspired heavily from the performance workflow in the Gutenberg repo. I've opened an initial PR for this work above, which we've been testing against a clone of the https://www.codevitals.run/ dashboard that we set up for testing purposes.

A few things I'd specifically appreciate feedback from @desrosj (or someone else familiar with our GitHub workflow best practices) on are the following:

  1. The longest running step in the workflow is by far installing npm dependencies, assuming they're uncached, but I'm also not seeing these get cached between workflows and want to make sure we're not missing something.
  2. The publish step requires that we set up a secret token in the WordPress repo, and I'm unsure who has access to add the necessary token there.

Finally, I'm proposing to add this as a task (blessed) during the 6.2 milestone, since this change only affects our workflow tooling. I think it would benefit us to begin capturing performance metrics while we're still in the final weeks of this release cycle, and it will set us up for a full 6.3 cycle that is completely covered by comparative metrics.

Last edited 12 months ago by joemcgill

This ticket was mentioned in PR #4160 on WordPress/wordpress-develop by @desrosj.


12 months ago
#8

This adds the Performance testing workflow proposed in #4139 to the old branches workflow to ensure it runs semi-regularly in branches that support it (6.2 forward).

For this to be committed, #4139 should be committed to trunk AND the 6.2 branch should be created in WordPress SVN.

Trac ticket: https://core.trac.wordpress.org/ticket/57687.

#9 in reply to: ↑ 7 @desrosj
12 months ago

Replying to joemcgill:

Thanks so much for both your work on this!

  1. The longest running step in the workflow is by far installing npm dependencies, assuming they're uncached, but I'm also not seeing these get cached between workflows and want to make sure we're not missing something.

The way that caching works within the setup-node action is that the global package data (npm's cache) is cached, not the node_modules directory. We could add a separate step after it that uses actions/cache to cache the actual node_modules directory, but this is generally recommended against for a few reasons:

  • It can cause breaks when using multiple node versions (not currently a real issue for us in this workflow).
  • According to the npm-specific documentation in the actions/cache action, it can break when using npm ci (which we do use here).

There is a PR in the Gutenberg repository suggesting that we also cache the node_modules directory. I've been meaning to dive in and review that PR as well. The TL;DR of the thinking there: call npm ci when there is no cache hit, and npm install when there is one (npm ci deletes node_modules before installing, which would discard the restored cache).

For the time being, I think I'd prefer to match how the other wordpress-develop workflows are currently set up, with npm ci and no direct caching of the node_modules directory. The time for that step to complete is ~2 minutes (give or take ~30 seconds), which is in line with what we see in the other wordpress-develop workflows. And if the Gutenberg PR ends up being merged with good results and no problems, I'd like to bring that approach over here for all of our workflows at once.
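For context, the node_modules caching pattern discussed in the Gutenberg PR could be sketched roughly as follows; the step names and cache key here are illustrative assumptions, not the actual workflow contents:

```yaml
- name: Cache node_modules
  id: cache-node-modules
  uses: actions/cache@v3
  with:
    path: node_modules
    key: node-modules-${{ hashFiles('package-lock.json') }}

# npm ci deletes node_modules before installing, which would throw away
# a restored cache, so it is only run on a cache miss.
- name: Install dependencies (cache miss)
  if: steps.cache-node-modules.outputs.cache-hit != 'true'
  run: npm ci

- name: Install dependencies (cache hit)
  if: steps.cache-node-modules.outputs.cache-hit == 'true'
  run: npm install
```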

  2. The publish step requires that we set up a secret token in the WordPress repo, and I'm unsure who has access to add the necessary token there.

Just to publicly document this, @joemcgill and I discussed this within a Slack DM and he shared the repository secret that needed to be set. I've set that and workflows should now be able to access the CODEVITALS_PROJECT_TOKEN secret for reporting results on wordpress-develop.

One final thing that came up in discussion on the PR, and is worth documenting here: a decision was made to target the current version of WordPress being built in trunk (WP 6.2 once branched) and future branches for running this testing, similar to how PHP compatibility testing was introduced. A separate discussion will be needed to weigh the level of effort, the maintenance burden (any required changes to the workflow will need to be backported as necessary), and whether it's worth merging this to branches only receiving security updates (which are technically unsupported). I think there could also be an argument to stop running performance testing on unsupported branches as new ones are released. But that's a separate discussion for future us.

This ticket was mentioned in Slack in #core by costdev. View the logs.


12 months ago

@desrosj commented on PR #4160:


12 months ago
#11

On that note, do you mean "6.3 branch should be created" in your PR description?

Nope, 6.2 is accurate. If this change is merged into trunk prior to the 6.2 branch existing, it will result in an error because the workflow cannot dispatch a run of the performance testing workflow in a non-existent branch.

#12 @joemcgill
12 months ago

Thanks @desrosj, @flixos90, @costdev, @youknowriad, and @swissspidy for providing feedback on the PR. I believe that all of the feedback has been addressed and am looking to finalize and commit this new workflow before the 6.2 branch is created.

One important aspect of this workflow that I think is worthy of future discussion, but I don't see as a blocker to this being committed, is the strategy being used to normalize each measurement against a baseline version in order to reduce the variance that can be introduced by the GitHub runners themselves.

The basic idea here is that each time we take a set of measurements, we also measure the performance of a consistent version of WP to use as a baseline. Because that baseline version stays consistent, differences in its measurements can be attributed to the environment rather than to code changes, and that variance can be removed from the per-commit measurements in the reports generated on the https://codevitals.run dashboard. It should also be noted that none of that normalization happens as part of this workflow; we're just providing data to the reporting tool, which handles those calculations.

For now, we're using the most recent release (WP 6.1.1) as the baseline. The Gutenberg project uses the same strategy for its metrics, but updates the base version each time a new major version of WP is released. I don't think we will want to follow the same practice for core; we will instead maintain the same baseline over longer periods of time. But that's mainly a question for future consideration once we've had some time to monitor these metrics.
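As a rough illustration of the normalization idea (this is an assumption about how such a calculation could work, not the actual codevitals implementation):

```python
# Hypothetical numbers, in milliseconds.
# The baseline build (WP 6.1.1) is measured alongside every commit; if the
# runner is slow that day, both measurements inflate together.
baseline_reference = 100.0   # long-term expected time for the baseline build
baseline_today     = 110.0   # baseline build measured on today's runner
commit_today       = 132.0   # current commit measured on the same runner

# Scale the commit measurement by how far today's runner deviates
# from the baseline's expected performance.
normalized = commit_today * baseline_reference / baseline_today
print(normalized)  # 120.0
```

In this sketch, the runner was 10% slower than usual, so the raw 132 ms reading is scaled down to 120 ms before being compared against measurements from other days.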

I expect us to iterate over our collection and reporting processes over time, but this current workflow should give us a good starting foundation from which to learn and improve.

joemcgill commented on PR #4160:


12 months ago
#13

Thanks @desrosj. I'm not familiar with the "old branches" workflow. Can you explain why we would want to periodically run performance metrics on old branches aside from when changes are made? Is this mainly just to ensure that the workflows are still working and aren't in need of some maintenance, or is there another reason the old branches workflow exists?

#14 @flixos90
12 months ago

@joemcgill Agreed. I think we should for now go with not updating our base tag even as new WordPress versions get released. Gutenberg has to do it because it has a dependency on WordPress, and the newest Gutenberg trunk can't be used with too-old versions of WordPress trunk. That's not a limitation for core itself, so I think here we should aim to stick with the consistent base tag as long as it's reasonable. As noted, any time the base tag changes, metrics recorded before and after are no longer statistically comparable, which for Gutenberg means the tool is only useful for comparing performance within one WordPress release cycle.

As you already mentioned, there will be a few follow-up ideas to explore. I mentioned some in https://github.com/WordPress/wordpress-develop/pull/4139#pullrequestreview-1324456566; IMO the highest-priority follow-up would be to expand the recorded metrics to include client-side metrics (I would suggest adding load-time Web Vitals like LCP, FCP, and TTFB). But all of those are reasonable iterations; this PR is a solid starting point, and I think it's good to commit.

@desrosj commented on PR #4160:


12 months ago
#15

Is this mainly just to ensure that the workflows are still working and aren't in need of some maintenance

This is the primary reason.

There can be other situations where some workflows start failing unexpectedly, but I don't know that there are any for this specific workflow.

One that comes to mind is the PHPUnit testing workflow. Recently, specific versions of PHP started failing because the local Docker containers had not been updated for several months (GitHub stops running schedule events after ~3 months when a repository is completely inactive). Once this was fixed, a newer point release of PHP introduced some problems.

#16 @joemcgill
12 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 55459:

Build/Test Tools: Add a performance measurement workflow.

This adds a new GitHub Action workflow that measures a set of performance metrics on every commit, so we can track changes in the performance of WordPress over time and more easily identify changes that are responsible for significant performance improvements or regressions during development cycles.

The workflow measures the homepage of a classic theme (Twenty Twenty-One) and a block theme (Twenty Twenty-Three) set up with demo content from the Theme Test Data project. Using the e2e testing framework, it makes 20 requests and records the median value of the following Server Timing metrics, generated by an mu-plugin installed as part of this workflow:

  • Total server response time
  • Server time before templates are loaded
  • Server time during template rendering

In addition to measuring the performance metrics of the current commit, it also records performance metrics of a consistent version of WordPress (6.1.1) to be used as a baseline measurement in order to remove variance caused by the GitHub workers themselves from our reporting.

The measurements are collected and displayed at https://www.codevitals.run/project/wordpress.

Props adamsilverstein, mukesh27, flixos90, youknowriad, oandregal, desrosj, costdev, swissspidy.
Fixes #57687.

joemcgill commented on PR #4139:


12 months ago
#17

Committed in 55459

#18 @joemcgill
12 months ago

In 55479:

Build/Test Tools: Update the URL for logging performance workflows.

This is a follow up to r55459, which fixes the hostname used for logging performance metrics by adding the correct www subdomain.

See #57687.

This ticket was mentioned in Slack in #core by sergey. View the logs.


12 months ago

#20 @desrosj
12 months ago

In 55507:

Build/Test Tools: Test the new performance workflow regularly.

This adds the new performance measurement workflow introduced in [55459] to the one that tests old branches on a regular schedule.

This ensures it continues to work as expected over time, even if the branches with this workflow are not updated for a bit of time.

Props flixos90, joemcgill, desrosj.
See #57687.

#21 @desrosj
12 months ago

In 55508:

Build/Test Tools: Add the workflow_dispatch event to the performance workflow.

This allows workflow runs to be initiated through the GitHub API.

Follow up to [55507].

See #57687.
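For reference, a workflow_dispatch run is triggered through the GitHub REST API with a POST to the workflow's dispatches endpoint. A minimal sketch of building such a request (the workflow file name here is an illustrative assumption, and the request is constructed but deliberately not sent):

```python
import json
import urllib.request

# Illustrative values; the actual workflow file name may differ.
OWNER, REPO = "WordPress", "wordpress-develop"
WORKFLOW = "performance.yml"   # hypothetical file name
BRANCH = "trunk"

# Documented GitHub REST endpoint:
# POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches
url = (
    f"https://api.github.com/repos/{OWNER}/{REPO}"
    f"/actions/workflows/{WORKFLOW}/dispatches"
)
req = urllib.request.Request(
    url,
    data=json.dumps({"ref": BRANCH}).encode(),
    method="POST",
    headers={
        "Accept": "application/vnd.github+json",
        # "Authorization": "Bearer <token>",  # required in practice
    },
)
print(req.full_url)  # request built, not sent
```

Sending the request requires a token with workflow permissions; the `ref` in the body selects which branch's copy of the workflow runs, which is why the dispatch in [55508] depends on the target branch existing.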

#22 @desrosj
12 months ago

In 55509:

Build/Test Tools: Add the workflow_dispatch event to the performance workflow.

This allows workflow runs to be initiated through the GitHub API.

Follow up to [55507].

Merges [55508] to the 6.2 branch.
See #57687.

This ticket was mentioned in Slack in #core by sergey. View the logs.


12 months ago
