Opened 7 weeks ago
Closed 7 days ago
#64538 closed defect (bug) (fixed)
memoize wp_normalize_path
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | 7.0 | Priority: | normal |
| Severity: | normal | Version: | 3.9 |
| Component: | General | Keywords: | has-patch has-unit-tests commit |
| Focuses: | performance | Cc: |
Description
I noticed that on one site wp_normalize_path() was being called about 4,000 times per request. This looks like a good place to add a simple static cache variable. I have put together a small patch to make this happen.
This has the added benefit of also dropping the number of calls to wp_is_stream(). On the test request I was looking at it went from 4,190 to 1,460. While wp_is_stream() doesn't do much, when we are talking about ~4,000 calls even tiny amounts add up quick. This also showed that adding the cache to wp_normalize_path() provides a pretty good hit rate ( hovering around 66% ).
Extracting just those to functions and comparing the uncached vs. cached wp_normalize_path(), at 4,000 calls the time went from 1.4ms to 0.4ms ( PHP 8.4.7 on M3 laptop ).
Change History (26)
This ticket was mentioned in PR #10770 on WordPress/wordpress-develop by @josephscott.
7 weeks ago
#1
- Keywords has-patch has-unit-tests added
#3
@
7 weeks ago
- Focuses performance removed
This is a classic "use more memory to reduce I/O and CPU" trade off. At the ~4,000 calls to wp_normalize_path() I was seeing about 240KB additional memory usage. This is of course going to vary depending on exactly what paths you end up normalizing.
#4
@
7 weeks ago
@josephscott out of curiosity, did you test any variation with a limited cache size to purge out old entries? 240 KB isn’t much, but it’s also not nothing when thinking about requests that can be fulfilled in 16–19 MB total. That’s potentially around a 1% bloat I think.
#5
@
7 weeks ago
I did not. I'd consider the 240KB for ~4,000 calls to be on the high side of things. In a quick benchmark generated by Opus it had the memory usage coming in about 105KB for 2,000 requests.
#6
@
7 weeks ago
@josephscott I also noticed that wp_is_stream() starts with strpos( $path, '://' ) and yet it seems like we should have some constraints to limit this, making the worst-case of inputs needlessly inefficient here.
in fact, it looks like there could be significant improvement in that function and I wonder how much of an impact it would have if you applied some optimizations there in your test code.
<?php function wp_is_stream( $path ) { if ( ! is_string( $path ) || '' === $path ) { return false; } // `php`, `file`, `http`, `https`? will always be available, or else things would break… if ( 1 === strspn( $path, 'hfp', 0, 1 ) && ( str_starts_with( $path, 'http://' ) || str_starts_with( $path, 'https://' ) || str_starts_with( $path, 'file://' ) || str_starts_with( $path, 'php://' ) ) ) { return true; } // Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only. $protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' ); if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) { return false; } return in_array( substr( $path, 0, $protocol_length ), stream_get_wrappers(), true ); }
on the other hand, I don’t know how often we expect stream wrappers to change. the only place in Core I one in use was in a test file, and the plugin directory mostly only shows plugins adding `guzzle` or `sftp`. would it make sense to cache stream_get_wrappers() instead of the paths? I’m not sure if we should generally be passing around those paths anyway or if they are limited internally within the vendorred code.
<?php function wp_is_stream( $path ) { static $known_schemes = null; if ( null === $known_schemes ) { $known_schemes = ' '; foreach ( stream_get_wrappers() as $scheme ) { $known_schemes .= "{$scheme} "; } } // Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only. $protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' ); if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) { return false; } $scheme = substr( $path, 0, $protocol_length ); return str_contains( $known_schemes, " {$scheme} " ); }
By caching the stream wrappers and eliminating the array code we should up with an extremely fast lookup that stays within a few 32-byte cache lines on most systems.
To summarize:
- how much are we losing performance-wise by looking for the scheme separator at any point in the string vs. anchoring it at the front?
- how much loss comes in through the array functions?
- how much of the overhead is calling
stream_get_wrappers()repeatedly, which, if cached, would not be more stale than caching the$pathresults but would involve considerably less memory cost. (PHP 8.5.2 on my laptop shows 12 schemes of which there are a total of 65 characters).
#7
@
7 weeks ago
Thanks to some additional ideas from Matthew Reishus I experimented with various approaches to caching here. After a number of iterations I settled on a segmented cache approach. This avoids bringing in a whole new class to do LRU, while still keeping a cap on the size & memory.
This segmented approach still got a 66% cache hit ratio on my large scale test site, with a cap of 100 entries for each segment, but the memory usage was only ~35KB. Seems like a good trade off for an additional two dozen lines or so.
I have updated https://github.com/WordPress/wordpress-develop/pull/10770/changes to use this segmented approach.
#8
@
7 weeks ago
@dmsnell I tested this version:
function wp_is_stream( $path ) {
static $known_schemes = null;
if ( null === $known_schemes ) {
$known_schemes = ' ';
foreach ( stream_get_wrappers() as $scheme ) {
$known_schemes .= "{$scheme} ";
}
}
// Valid protocol names must contain alphanumerics, dots (.), plusses (+), or hyphens (-) only.
$protocol_length = strspn( $path, '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.+-' );
if ( 0 === $protocol_length || 0 !== substr_compare( $path, '://', $protocol_length, 3 ) ) {
return false;
}
$scheme = substr( $path, 0, $protocol_length );
return str_contains( $known_schemes, " {$scheme} " );
}
I didn't see any difference in the timing for wp_normalize_path() from the current code.
#9
@
7 weeks ago
Thanks for testing that out @josephscott — it’s really interesting that the call to stream_get_wrappers() wasn’t the bottleneck, if I understand your results correctly.
Do you think this implies the cost is in the preg_replace() inside of wp_normalize_path()? I guess we are discussing a fairly trivial aspect of performance, even though it’s at startup.
#10
@
7 weeks ago
My current theory is that preg_replace() is responsible for the bulk of the work. I haven't bothered to narrow it down though, since the simple inline segmented caching seems to provide a nice win.
#11
@
7 weeks ago
- Focuses performance added
@josephscott great work. this seems a bit non-controversial now, especially with the limited impact of the cache.
linking here for posterity’s sake, I have proposed an update to wp_normalize_path() that removes the PCRE calls and eliminates the call to stream_get_wrappers().
In my little investigation is seems almost like a defect that we are using that function to determine if a path specifies a stream or not, because paths with valid stream schemes/protocols should definitely be retained even if the running PHP instance doesn’t happen to be aware of them.
if a path is provided as git+ssh://github.com/wordpress/wordpress-develop.git, for example, it should not be transformed into git+ssh:/github.com/..., which is what the legacy code does. I think it would be good to consider eliminating the call to wp_is_stream() entirely regardless of what we do.
This ticket was mentioned in Slack in #core-performance by josephscott. View the logs.
7 weeks ago
This ticket was mentioned in Slack in #core-performance by dmsnell. View the logs.
6 weeks ago
This ticket was mentioned in PR #10781 on WordPress/wordpress-develop by @dmsnell.
6 weeks ago
#14
Trac ticket: Core-64538
In addition to optimizing, two changes were introduced:
- The scheme separators for non-registered stream wrappers will be presered. Previously, these were combined into a single slash.
- Windows drive letters which are not found at the start of the string will be upper-cases if they are not. Previously, if a drive letter were found after the first character and was lower-case, it would remain lower-case.
@josephscott commented on PR #10770:
6 weeks ago
#15
I worked with Opus to address the issues brought up by Copilot.
#16
@
6 weeks ago
I was looking at https://github.com/WordPress/wordpress-develop/pull/10781 to see what it would do performance wise. After a bit I realized that I was juggling a fair number of variations of this work on wp_normalize_path() and was testing it in a environment with a huge number of moving parts. It felt like a good time to pull back and really re-evaluate what is going on, and to do it in a stand alone way.
Working with Claude I put together a stand alone benchmark script that tries the various approaches and compares them - https://github.com/josephscott/wp-normalize-path - and this turned out to be really helpful.
In these isolated tests the segmented cache did not perform as well. That ultimately led me to have Claude automatically try out variations it could think of and use the benchmark script to see how well they did. That is why the script has other variations in it that are not included in the output.
The original simple cache ended up being a better option than the segmented, original, and the variation @dmsnell has in https://github.com/WordPress/wordpress-develop/pull/10781. But Claude made a few tweaks to the simple approach that gave up a bit of the performance win for less memory usage. It labeled that variation "best".
A few other changes I made - the benchmark doesn't include Windows and UNC paths. I expect the vast majority of WP installs are only dealing with Unix like environments. So I simplified the paths to be tested to focus on that.
Another data point that I found interesting. My practically fresh WP install from wordpress-develop does about 800 calls to wp_normalize_path(). Meaning out of the box this would add less than 60KB to PHP memory and reduce the time spent by ~0.2 seconds ( in my local Apple Silicon M3 test results ).
I'm hoping that by making this public and easy for anyone to use that we can zero in on a final solution. And of course if you find any bugs, or have other variations that should be tested, this is easy to update.
@dmsnell commented on PR #10781:
6 weeks ago
#17
Measured performance data suggests two slowdowns:
- the
//replacement loop - finding the stream protocol length, which is faster when doing
strcspn( $path, ':' )instead of the positive list.
it does seem that the string-processing is itself the bottleneck.
#18
@
6 weeks ago
@josephscott this is wonderful — thank you for sharing your benchmark. with it, I was able to investigate a few more things.
- as noted in the comment above, the
while ()loop replacing//seemed to be the heaviest bottleneck. - it seems like all the string processing is indeed the main source of delay. skipping that, via the cache, is exactly what’s lifting the overall performance here.
- with a warm JIT, the cacheless optimization with my two changes above brought it within a little over 2x the runtime of the cached version. the overall performance of the non-cache version was between 25% and 45% faster than the original.
I even wondered about the odd generate_paths() functions and if we could have been skewing the results due to pre-computing the string hashes in generate_paths() outside of the timing function. That is, array lookup is much cheaper if that hash has already been computed. The cached version was still faster, though the gap was reduced when I moved this into the benchmark. In order to facilitate this I also precomputed a large list of paths as a newline-delimited string and had generate_paths() randomly point into that string to grab a line.
you have exhausted all of the cases I can think of, and I wonder how generally this approach could apply to other Core functions which see the same strings frequently. there are probably many out there that do, where a limited cache would serve us better than uncapped or large caches.
the one thing I think is worth reflecting on are the cases where a given path loads before a stream wrapper is registered. there seemed to mostly be a couple stream wrappers in the plugin directory, Guzzle being one of them. I foresee the failure scenario being the case that we normalize a path before the plugin code is activated, and then that path stays in memory.
it does seem low-risk, as one might expect plugin code to run before processing the rest of the request, so as long as those require or require_once statements don’t appear dynamically and on-demand, this should not misreport the paths. regardless, this is what I believe to be a bug anyway, so that point will be moot if we fix the bug, as the actual cached value will then only rely on the normalized path and not on stream_get_wrappers()
#19
@
9 days ago
- Keywords changes-requested added
- Milestone changed from 7.0 to Future Release
Hi there!
Moving this to Future Release as there has been no progress on the ticket for the past few weeks. Feel free to move it back to the milestone once it’s ready for merge.
#20
@
7 days ago
Apologies for the delay and the back and forth on the approach. With all of the tests pointing to the simple static cache as the best performance option at a variety of path counts, I've updated the PR to use the simple static cache method.
As noted by @mukesh27 this clearly isn't going to make it for 7.0, so in the code comments I've aimed for 7.1.0.
#23
@
7 days ago
@westonruter @jonsurrell I think that this ticket can probably go in to 7.0, as it wasn’t delayed for reasons that I think are relevant. if we had been more able to keep up with everything, I think we would have merged it weeks ago.
I’ve added my approval to the PR. Is it still something we could get in, given the history here, or is there a compelling reason to push it back?
I’ve been so distracted by working on fixing what the build change broke that a lot of things have fallen to the side, and I just don’t feel like it’s essential to push this back.
https://core.trac.wordpress.org/ticket/64538