Make WordPress Core

Opened 20 months ago

Closed 12 months ago

Last modified 12 months ago

#58196 closed enhancement (fixed)

Improve performance of `_get_block_templates_paths`

Reported by: spacedmonkey's profile spacedmonkey Owned by: thekt12's profile thekt12
Milestone: 6.5 Priority: normal
Severity: normal Version: 5.9
Component: Themes Keywords: has-patch reporter-feedback
Focuses: performance Cc:

Description

The _get_block_templates_paths function, is extremely expensive to run from a performance standard. Some profiles, mark it as much as 8% of the page load of a block theme page request. This is because of the recursive directory lookup.

Attachments (12)

Screenshot 2023-05-02 at 15.35.11.png (152.0 KB) - added by spacedmonkey 20 months ago.
RecursiveDirectoryIterator::hasChildren profile
Screenshot 2023-05-02 at 15.21.38.png (240.0 KB) - added by spacedmonkey 20 months ago.
Query monitor logging - Trunk
Screenshot 2023-05-02 at 15.20.06.png (235.7 KB) - added by spacedmonkey 20 months ago.
Query monitor - Logging PR.
Screenshot 2023-05-02 at 15.32.27.png (571.9 KB) - added by spacedmonkey 20 months ago.
Xhprof - Trunk
Screenshot 2023-05-02 at 15.31.56.png (584.5 KB) - added by spacedmonkey 20 months ago.
XHprof - PR
Screenshot 2023-05-02 at 15.47.09.png (20.7 KB) - added by spacedmonkey 20 months ago.
Blackfire - Trunk
Screenshot 2023-05-02 at 15.48.15.png (24.1 KB) - added by spacedmonkey 20 months ago.
Blackfire - PR
Screenshot 2023-05-02 at 18.59.04.png (259.6 KB) - added by spacedmonkey 20 months ago.
Query monitor - 6.3 - profiling
Screenshot 2023-05-02 at 18.57.09.png (259.4 KB) - added by spacedmonkey 20 months ago.
Query monitor - 6.3 + patch - profiling
Screen Shot 2023-09-21 at 1.49.41 PM.png (227.0 KB) - added by joemcgill 15 months ago.
XHProf data for _get_block_templates_paths()
Screenshot 2023-11-27 at 15.23.20.png (146.2 KB) - added by thekt12 13 months ago.
Screenshot 2023-11-27 at 15.56.55.png (127.5 KB) - added by thekt12 13 months ago.
Performance improvement for TT4 after static implementation

Change History (52)

This ticket was mentioned in PR #4383 on WordPress/wordpress-develop by @spacedmonkey.


20 months ago
#1

  • Keywords has-patch added

#2 @spacedmonkey
20 months ago

Benchmark data from 2023 theme.

Trunk PR
Response Time (median)162.22 146.51
wp-before-template (median)91.07 76.1
wp-before-template-db-queries (median)6.81 6.76
wp-template (median)63.36 63.3
wp-total (median)154.86 140.44
wp-template-db-queries (median)3.43 3.43

#3 follow-up: @johnbillion
20 months ago

Do you know the technical reason that recursive globbing (nearly wrote blogging there) is faster than using the filesystem iterator classes? It would be good to know for a full understanding and for future reference.

#5 in reply to: ↑ 3 @spacedmonkey
20 months ago

Replying to johnbillion:

Do you know the technical reason that recursive globbing (nearly wrote blogging there) is faster than using the filesystem iterator classes? It would be good to know for a full understanding and for future reference.

It is a recursive lookup. The glob, looks for nested directory and if it is not found, then it returns early.

After a rethink here. A simpler approach would be to add a cache see #4397.

#6 @joemcgill
20 months ago

  • Keywords reporter-feedback added

Thanks @spacedmonkey. I'm not able to reproduce conditions where this function is profiling with the type of performance concerns you mentioned. Can you share more details about you've got your profiling setup configured to observe these results so I can try to reproduce?

I've done several profiling runs using the current trunk branch and am generally seeing that this function is called 5 times on the homepage of the twenty twenty-three theme with very little impact to overall performance (<1% inclusive wall time). Additionally, in some of my runs, this is only being called once, which would indicate that there is some upstream caching somewhere that sometimes avoids the codepath that makes the extra calls by avoiding the initial call to resolve_block_template() that is responsible for all but one call to this function (the other being triggered from build_template_part_block_instance_variations() eventually). I'd like to do some more digging to see why this is.

Regardless, the approach you're suggesting to save the results of the recursive lookup to a static variable makes sense to me, since this should only need to be calculated once per page run. I'd rather avoid trying to create a standalone helper function for this unless we're planning to use it in more places to do more performant recursive lookups like this.

#7 @flixos90
20 months ago

@spacedmonkey I just conducted another performance benchmark for your PR with the 2023 theme and my data is very different from the one you posted above: I don't see any notable difference in performance, all numbers are very close to each other, sometimes better or worse so that it's most likely only due to remaining variance in the results.

I used WordPress 6.2's production build for testing. See this spreadsheet for my numbers.

I don't know if using glob() has a benefit over the current approach, but based on my data and @joemcgill's feedback above, I'd question it at this point. Is there a chance that something specific in your testing environment causes this function to be unusually slow? Do you have WP_DEBUG disabled? Not sure that makes a difference, but just to point out one potential caveat.

@spacedmonkey
20 months ago

RecursiveDirectoryIterator::hasChildren profile

@spacedmonkey
20 months ago

Query monitor logging - Trunk

@spacedmonkey
20 months ago

Query monitor - Logging PR.

@spacedmonkey
20 months ago

Blackfire - Trunk

#9 @spacedmonkey
20 months ago

@flixos90 @joemcgill @johnbillion @oandregal

Okay, so for some context. I was doing some profiling using blackfire. On block themes, I noticed that calls to RecursiveDirectoryIterator was counting for as much as 6-8% of page load. See screenshot. I track this down to _get_block_templates_paths. What _get_block_templates_paths does, it a recursive lookup for html files in a directory. However, as this function is not cached and is called 13 times in TT3, this results in this expensive call being called multiple times. In #4383, I tried replacing, RecursiveDirectoryIterator with glob. But as it still called 13 times, it is still expensive. In #4397, I add caching to this function and remove the regex ( as it not really needed ). The results are really improves.

I run it against benchmarks ( 1000 runs ).

Trunk PR
Response Time (median)152.35133.09
wp-before-template (median)92.774.13
wp-before-template-db-queries (median)6.716.62
wp-template (median)53.3652.46
wp-total (median)146.49127.32
wp-template-db-queries (median)3.353.26

Blackfire Before, After.

XHProf Before, After.

Query monitor Before, After.

The results speak for themselves and are well worth reviewing. Adding caching is worth doing.

#10 @spacedmonkey
20 months ago

When checking this change, please check against trunk and not 6.2. I believe this improves goes inline with #57756 / [55687].

@spacedmonkey
20 months ago

Query monitor - 6.3 - profiling

@spacedmonkey
20 months ago

Query monitor - 6.3 + patch - profiling

#11 @spacedmonkey
20 months ago

Running this against Local.wp gives me different numbers.

6.2 6.2 + patch 6.3-alpha-55700 6.3-alpha-55700 + patch
Response Time (median)109.71 108.44 97.33 96.49
wp-load-alloptions-query (median)0.35 0.35 0.34 0.35
wp-before-template (median)23.06 22.56 19.26 18.59
wp-template (median)11.13 11.04 11.31 11.2
wp-total (median)34.28 33.72 30.71 29.92

Doing some more investigations.

Doing some inline profiling

6.3-alpha-55700
https://core.trac.wordpress.org/raw-attachment/ticket/58196/Screenshot%202023-05-02%20at%2018.59.04.png

6.3-alpha-55700 + patch
https://core.trac.wordpress.org/raw-attachment/ticket/58196/Screenshot%202023-05-02%20at%2018.57.09.png

Time and memory usage are a lot less in Local WP. I don't understand why.

#12 @flixos90
20 months ago

@spacedmonkey I conducted additional benchmarks with PR https://github.com/WordPress/wordpress-develop/pull/4397 (note that my benchmark results from 7 was for your previous PR https://github.com/WordPress/wordpress-develop/pull/4383).

Unfortunately, both of those benchmarks produce similar results, showing only very small differences in performance. Overall load time (wp-total) so far is ~1% faster, which could be a legitimate small performance win, but also may be due to variance, so we would need to validate that with further benchmarks.

That said, it still makes the previous benchmarks seeing the large performance improvements questionable - maybe it is something setup-specific? Have you tried benchmarking this on trunk with less overhead (i.e. no profiler running, WP_DEBUG etc constants disabled, ...)?

#13 @spacedmonkey
20 months ago

So I tested against the following.

  • Docker env - Macbook pro, I9, other data provide is from this ENV.
  • Docker env - Mac Mini M1.
  • Docker env - Ubuntu - i9.
  • Local WP - Macbook pro, I9
  • WP Engine test site.

I was only about to replicate on the Docker env - Macbook pro, I9. After re-installing docker and re-downloading the docker images, this error went away on my local setup as well. Sorry, this seems to have been an issue with my setup. It doesn't mean that others would not have this issue. I am going to keep this ticket open, just in case others see this issue.

#14 @spacedmonkey
19 months ago

Even through the above turned out to be an issue with my local env. An improved has surfaced based on a recommendation from @swissspidy. Replace is_dir from the current file_exists makes an improvement.

#17 @spacedmonkey
18 months ago

  • Milestone changed from Awaiting Review to 6.4
  • Owner set to spacedmonkey
  • Status changed from new to assigned

This ticket was mentioned in Slack in #core by oglekler. View the logs.


16 months ago

#19 @spacedmonkey
15 months ago

  • Owner spacedmonkey deleted

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


15 months ago

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


15 months ago

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


15 months ago

#23 @joemcgill
15 months ago

  • Owner set to thekt12

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


15 months ago

#25 @spacedmonkey
15 months ago

  • Milestone changed from 6.4 to Future Release

After a chat with @thekt12, seems like _get_block_templates_paths is not used in core since [55687].

This means that this function should be deprecated. I am moving this ticket to future release and we can loop around on this ticket later, if needed.

#26 @joemcgill
15 months ago

@spacedmonkey and @thekt12, this function is still being used in _get_block_templates_files() (reference).

In a recent profiling run I conducted, I can see it getting called 3 times during a pageload, and while the parent function (_get_block_templates_files() has a significant impact on server performance, _get_block_template_paths() doesn't seem to have much impact at all, so I think it's fine to deprioritize this.

@joemcgill
15 months ago

XHProf data for _get_block_templates_paths()

#28 @joemcgill
14 months ago

  • Milestone changed from Future Release to 6.5

Moving to 6.5 for consideration. I think the next steps here are to verify whether this function is, in fact, still in use, and whether the previously reported performance concerns still exist or have been resolved via a previous refactor or underlying change to the execution path.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


14 months ago

#30 @thekt12
13 months ago

Profiled TT4 home page and found that this function _get_block_templates_paths() is being called 8 times. The function is called from another expensive function _get_block_templates_files https://core.trac.wordpress.org/attachment/ticket/58196/Screenshot%202023-11-27%20at%2015.23.20.png

_get_block_templates_files(~102 ms) is expensive mainly due to it's three child ->
_add_block_template_part_area_info (49 ms called 6 times), _add_block_template_info (~32ms called 7 times ) and _get_block_templates_paths (~16ms called 8 times).

Out of the 8 calls to _get_block_templates_paths 7 calls are for the same directory. There is very little scope to improve core logic as it mostly uses PHP's core function to traverse files, however, introducing static caching (PR 5281) or persistent caching could eliminate the repetitive call to _get_block_templates_paths.

Last edited 13 months ago by thekt12 (previous) (diff)

@thekt12
13 months ago

Performance improvement for TT4 after static implementation

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


13 months ago

@thekt12 commented on PR #5281:


13 months ago
#32

TT4 Homepage performance metrics. Conducted 500 rounds 3 times and got the mean. This is a bit contradiction to what was observed in Blackfire.

Title PR Mean Trunk Mean
Response Time (mean) 412.69 411.71
wp-load-alloptions-query (mean) 3.36 3.37
wp-before-template (mean) 161.00 160.86
wp-before-template-db-queries (mean) 0.00 0.00

This ticket was mentioned in Slack in #core-performance by thekt12. View the logs.


13 months ago

@joemcgill commented on PR #5281:


13 months ago
#34

360effb Fixes the test failures and I've run some benchmarks locally which do show some small potential improvement, but not huge. Likely due to most of the performance impact being incurred the first time the function is called.

However, there is a decent memory improvement by avoiding redundant calls to the Iterator class methods. Not opposed to committing this enhancement, but I do think focusing on the parent function, _get_block_templates_files would have a bigger impact than this change.

Trunk
https://i0.wp.com/github.com/WordPress/wordpress-develop/assets/801097/c7eaf9f3-ca72-4c92-8121-7847bae0f12a

w/ PR applied
https://i0.wp.com/github.com/WordPress/wordpress-develop/assets/801097/3ba2f2c1-944a-43cb-9ac6-02e34dc93cad

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


12 months ago

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


12 months ago

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


12 months ago

#38 @joemcgill
12 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 57215:

Themes: Improve the performance of _get_block_templates_paths.

This avoids redundant recursive lookups for block template paths in the same base directory by implementing a static cache. It also replaces an potentially expensive file_exists call in favor of doing recursive iteration of files inside a try/catch block.

Props thekt12, spacedmonkey, flixos90, mukesh27, joemcgill.
Fixes #58196.

Note: See TracTickets for help on using tickets.