Make WordPress Core

Opened 6 weeks ago

Last modified 3 weeks ago

#58196 new enhancement

Improve performance of `_get_block_templates_paths`

Reported by: spacedmonkey's profile spacedmonkey Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.9
Component: Themes Keywords: has-patch reporter-feedback
Focuses: performance Cc:

Description

The _get_block_templates_paths function, is extremely expensive to run from a performance standard. Some profiles, mark it as much as 8% of the page load of a block theme page request. This is because of the recursive directory lookup.

Attachments (9)

Screenshot 2023-05-02 at 15.35.11.png (152.0 KB) - added by spacedmonkey 5 weeks ago.
RecursiveDirectoryIterator::hasChildren profile
Screenshot 2023-05-02 at 15.21.38.png (240.0 KB) - added by spacedmonkey 5 weeks ago.
Query monitor logging - Trunk
Screenshot 2023-05-02 at 15.20.06.png (235.7 KB) - added by spacedmonkey 5 weeks ago.
Query monitor - Logging PR.
Screenshot 2023-05-02 at 15.32.27.png (571.9 KB) - added by spacedmonkey 5 weeks ago.
Xhprof - Trunk
Screenshot 2023-05-02 at 15.31.56.png (584.5 KB) - added by spacedmonkey 5 weeks ago.
XHprof - PR
Screenshot 2023-05-02 at 15.47.09.png (20.7 KB) - added by spacedmonkey 5 weeks ago.
Blackfire - Trunk
Screenshot 2023-05-02 at 15.48.15.png (24.1 KB) - added by spacedmonkey 5 weeks ago.
Blackfire - PR
Screenshot 2023-05-02 at 18.59.04.png (259.6 KB) - added by spacedmonkey 5 weeks ago.
Query monitor - 6.3 - profiling
Screenshot 2023-05-02 at 18.57.09.png (259.4 KB) - added by spacedmonkey 5 weeks ago.
Query monitor - 6.3 + patch - profiling

Change History (25)

This ticket was mentioned in PR #4383 on WordPress/wordpress-develop by @spacedmonkey.


6 weeks ago
#1

  • Keywords has-patch added

#2 @spacedmonkey
6 weeks ago

Benchmark data from 2023 theme.

Trunk PR
Response Time (median)162.22 146.51
wp-before-template (median)91.07 76.1
wp-before-template-db-queries (median)6.81 6.76
wp-template (median)63.36 63.3
wp-total (median)154.86 140.44
wp-template-db-queries (median)3.43 3.43

#3 follow-up: @johnbillion
6 weeks ago

Do you know the technical reason that recursive globbing (nearly wrote blogging there) is faster than using the filesystem iterator classes? It would be good to know for a full understanding and for future reference.

#5 in reply to: ↑ 3 @spacedmonkey
6 weeks ago

Replying to johnbillion:

Do you know the technical reason that recursive globbing (nearly wrote blogging there) is faster than using the filesystem iterator classes? It would be good to know for a full understanding and for future reference.

It is a recursive lookup. The glob, looks for nested directory and if it is not found, then it returns early.

After a rethink here. A simpler approach would be to add a cache see #4397.

#6 @joemcgill
6 weeks ago

  • Keywords reporter-feedback added

Thanks @spacedmonkey. I'm not able to reproduce conditions where this function is profiling with the type of performance concerns you mentioned. Can you share more details about you've got your profiling setup configured to observe these results so I can try to reproduce?

I've done several profiling runs using the current trunk branch and am generally seeing that this function is called 5 times on the homepage of the twenty twenty-three theme with very little impact to overall performance (<1% inclusive wall time). Additionally, in some of my runs, this is only being called once, which would indicate that there is some upstream caching somewhere that sometimes avoids the codepath that makes the extra calls by avoiding the initial call to resolve_block_template() that is responsible for all but one call to this function (the other being triggered from build_template_part_block_instance_variations() eventually). I'd like to do some more digging to see why this is.

Regardless, the approach you're suggesting to save the results of the recursive lookup to a static variable makes sense to me, since this should only need to be calculated once per page run. I'd rather avoid trying to create a standalone helper function for this unless we're planning to use it in more places to do more performant recursive lookups like this.

#7 @flixos90
5 weeks ago

@spacedmonkey I just conducted another performance benchmark for your PR with the 2023 theme and my data is very different from the one you posted above: I don't see any notable difference in performance, all numbers are very close to each other, sometimes better or worse so that it's most likely only due to remaining variance in the results.

I used WordPress 6.2's production build for testing. See this spreadsheet for my numbers.

I don't know if using glob() has a benefit over the current approach, but based on my data and @joemcgill's feedback above, I'd question it at this point. Is there a chance that something specific in your testing environment causes this function to be unusually slow? Do you have WP_DEBUG disabled? Not sure that makes a difference, but just to point out one potential caveat.

@spacedmonkey
5 weeks ago

RecursiveDirectoryIterator::hasChildren profile

@spacedmonkey
5 weeks ago

Query monitor logging - Trunk

@spacedmonkey
5 weeks ago

Query monitor - Logging PR.

@spacedmonkey
5 weeks ago

Blackfire - Trunk

#9 @spacedmonkey
5 weeks ago

@flixos90 @joemcgill @johnbillion @oandregal

Okay, so for some context. I was doing some profiling using blackfire. On block themes, I noticed that calls to RecursiveDirectoryIterator was counting for as much as 6-8% of page load. See screenshot. I track this down to _get_block_templates_paths. What _get_block_templates_paths does, it a recursive lookup for html files in a directory. However, as this function is not cached and is called 13 times in TT3, this results in this expensive call being called multiple times. In #4383, I tried replacing, RecursiveDirectoryIterator with glob. But as it still called 13 times, it is still expensive. In #4397, I add caching to this function and remove the regex ( as it not really needed ). The results are really improves.

I run it against benchmarks ( 1000 runs ).

Trunk PR
Response Time (median)152.35133.09
wp-before-template (median)92.774.13
wp-before-template-db-queries (median)6.716.62
wp-template (median)53.3652.46
wp-total (median)146.49127.32
wp-template-db-queries (median)3.353.26

Blackfire Before, After.

XHProf Before, After.

Query monitor Before, After.

The results speak for themselves and are well worth reviewing. Adding caching is worth doing.

#10 @spacedmonkey
5 weeks ago

When checking this change, please check against trunk and not 6.2. I believe this improves goes inline with #57756 / [55687].

@spacedmonkey
5 weeks ago

Query monitor - 6.3 - profiling

@spacedmonkey
5 weeks ago

Query monitor - 6.3 + patch - profiling

#11 @spacedmonkey
5 weeks ago

Running this against Local.wp gives me different numbers.

6.2 6.2 + patch 6.3-alpha-55700 6.3-alpha-55700 + patch
Response Time (median)109.71 108.44 97.33 96.49
wp-load-alloptions-query (median)0.35 0.35 0.34 0.35
wp-before-template (median)23.06 22.56 19.26 18.59
wp-template (median)11.13 11.04 11.31 11.2
wp-total (median)34.28 33.72 30.71 29.92

Doing some more investigations.

Doing some inline profiling

6.3-alpha-55700
https://core.trac.wordpress.org/raw-attachment/ticket/58196/Screenshot%202023-05-02%20at%2018.59.04.png

6.3-alpha-55700 + patch
https://core.trac.wordpress.org/raw-attachment/ticket/58196/Screenshot%202023-05-02%20at%2018.57.09.png

Time and memory usage are a lot less in Local WP. I don't understand why.

#12 @flixos90
5 weeks ago

@spacedmonkey I conducted additional benchmarks with PR https://github.com/WordPress/wordpress-develop/pull/4397 (note that my benchmark results from 7 was for your previous PR https://github.com/WordPress/wordpress-develop/pull/4383).

Unfortunately, both of those benchmarks produce similar results, showing only very small differences in performance. Overall load time (wp-total) so far is ~1% faster, which could be a legitimate small performance win, but also may be due to variance, so we would need to validate that with further benchmarks.

That said, it still makes the previous benchmarks seeing the large performance improvements questionable - maybe it is something setup-specific? Have you tried benchmarking this on trunk with less overhead (i.e. no profiler running, WP_DEBUG etc constants disabled, ...)?

#13 @spacedmonkey
5 weeks ago

So I tested against the following.

  • Docker env - Macbook pro, I9, other data provide is from this ENV.
  • Docker env - Mac Mini M1.
  • Docker env - Ubuntu - i9.
  • Local WP - Macbook pro, I9
  • WP Engine test site.

I was only about to replicate on the Docker env - Macbook pro, I9. After re-installing docker and re-downloading the docker images, this error went away on my local setup as well. Sorry, this seems to have been an issue with my setup. It doesn't mean that others would not have this issue. I am going to keep this ticket open, just in case others see this issue.

#14 @spacedmonkey
3 weeks ago

Even through the above turned out to be an issue with my local env. An improved has surfaced based on a recommendation from @swissspidy. Replace is_dir from the current file_exists makes an improvement.

Note: See TracTickets for help on using tickets.