Opened 3 months ago
Last modified 3 months ago
#64180 new enhancement
Simplify and improve performance in `WP_Hook::apply_filters()`
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | minor | Version: | 6.8.3 |
| Component: | Plugins | Keywords: | |
| Focuses: | performance | Cc: |
Description
Hi All!
I am thinking about filters and actions a lot, because it is something that every site uses and every page load goes trough thousands of, so any small improvements here can have a multiplied effect globally.
So as I'm looking at code responsible for the execution of filters and actions I noticed this part of the WP_Hook::apply_filters() method:
<?php // Avoid the array_slice() if possible. if ( 0 === $the_['accepted_args'] ) { $value = call_user_func( $the_['function'] ); } elseif ( $the_['accepted_args'] >= $num_args ) { $value = call_user_func_array( $the_['function'], $args ); } else { $value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) ); }
I guess "Avoid the array_slice() if possible." was added for performance reasons, although I could not find a specific commit for it as it was added as part of the class.
I did some tests to see the performance impact of that array_slice to see if it is better to register hooks with all known arguments, even if those arguments are not used just to avoid the array_slice call, or is it better to use as few arguments as possible.
My tests were not conclusive, probably new versions of PHP have optimisations in place that make the difference negligible, although it may have been important at some point.
Anyway, while looking at it I figured that the best way to "Avoid the array_slice()" is to not use it at all. PHP supports passing more arguments to functions than what they declare, it is perfectly valid. So this code in fact works just the same:
<?php // Avoid the array_slice() if possible. f ( 0 === $the_['accepted_args'] ) { $value = call_user_func( $the_['function'] ); } else { $value = call_user_func_array( $the_['function'], $args ); }
Now the question is this: Can the removal of array_slice() and the additional if branch save more CPU than what the passing of the extra arguments take?
Moreover, passing arguments in case of no arguments accepted by the called function is also fine, so the code below works also just identical:
<?php $value = call_user_func_array( $the_['function'], $args );
With this one the additional question is if the removal of the branching saves more than the loss on using the known-to-be-slower call_user_func_array() instead of the call_user_func().
I did tests comparing these cases. In my test case I passed 5 arguments to apply_filters() calls, a random number (to prevent the engine optimising away the calls), a static number, a static string, the global $wp_query as an example of an object, and a large array with 1000 items, each consisting of 1000 bytes (1M bytes + overhead).
Here are my results over 1.000.000 apply_filters calls:
| Apply Filter call with argument count | Current implementation | No array_slice() | No branching |
apply_filter(...,0); | 2.038458s (100%) | 2.062812s (101.19%) | 2.062165s (101.16%) |
apply_filter(...,1); | 2.129430s (100%) | 2.090761s ( 98.18%) | 2.085732s ( 97.95%) |
apply_filter(...,2); | 2.078453s (100%) | 2.085459s (100.34%) | 2.049746s ( 98.62%) |
apply_filter(...,3); | 2.096780s (100%) | 2.097050s (100.01%) | 2.047824s ( 97.67%) |
apply_filter(...,4); | 2.106713s (100%) | 2.088943s ( 99.16%) | 2.073354s ( 98.42%) |
apply_filter(...,5); | 2.086291s (100%) | 2.091264s (100.24%) | 2.045195s ( 98.03%) |
As you can see the efforts around avoiding the array_slice() are kind of futile. In some cases it improves a bit, in some cases it is worse, it is between ~±2%.
But as you can also see the one liner clearly dominates the other two solutions, except for the 0 argument case. However I'd argue that since the default value for $accepted_args is 1 for both add_action() and add_filter(), and also a filter without any argument is probably extremely rare that first row is most likely an extremely rare occurrence on any site.
So at the end of the day the rare and little saving that call_user_func() provides is eaten up by the if branching and using a single line of call_user_func_array() for all 3 branches differentiated currently seems to be preferable.
I made a counting on an example site of mine, how many times a call_user_func* is fired in WP_Hook::apply_filters(), and it really seems 0 argument cases are extremely rare:
| Arguments accepted | Admin | Front page |
| 0 | 62 | 43 |
| 1 | 2128 | 1317 |
| 2 | 1164 | 976 |
| 3 | 742 | 287 |
| 4 | 3147 | 1927 |
| 5 | 3 | 2 |
| 6+ | 0 | 0 |
I know one example site is not conclusive at all, but still I think 0 argument calls are extremely rare in the whole ecosystem.
Now I know the differences are small, few percents only, and these are not at all going to convert to any meaningful speed-up of site loads, because the time spent by the calling of the hook functions on a real site I expect to be in the range of 10s of milliseconds (e.g. 7000 hook call * 2us = 14ms), and the savings we can expect is only about 2% of that, that is a few hundreds of microseconds per page load, but if someone originally thought it was worth (trying to) optimising the overhead of call_user_func_array() vs call_user_func() away, it may still worth to revisit the issue and switch as it seems on more modern PHP engines this optimisation no longer valid. (Here I assume it might have been valid in Sept 2016 (https://github.com/WordPress/wordpress-develop/commit/61abf68e6d6f7ba496928e70a77a3a7c538ac4f0) when the latest PHP version was v7.0).
I've done this benchmark on PHP v8.3.12. I'd be surprised if v8.4 or v8.5 would flip the conclusions, but I can imagine that e.g. v7.x could be different.
Change History (3)
#2
follow-up:
↓ 3
@
3 months ago
Hello!
Anyway, while looking at it I figured that the best way to "Avoid the array_slice()" is to not use it at all. PHP supports passing more arguments to functions than what they declare, it is perfectly valid. So this code in fact works just the same:
<?php $value = call_user_func_array( $the_['function'], $args );
There's a possibility though that this could cause a fatal error, although perhaps unlikely. Consider the case where someone tries to use intval as the callback for a filter that passes more than one argument. This could cause the additional filter parameter to be interpreted as being intended for the second integer $base parameter.
It's the problem outlined here: https://jakearchibald.com/2021/function-callback-risks/
#3
in reply to:
↑ 2
@
3 months ago
Oh, wow, I would have never thought of that. Thanks for the insight.
In that case we can still achieve similar improvements just by removing the if ( 0 === $the_['accepted_args'] ) { branch. This is because as pointed out this branch is only triggered in about 1% of the total calls, but is evaluated as the first condition in all cases. Apparently the additional overhead added in 99% of the cases is not justified by the small gain achieved in 1%.
So if we look at two more possible solutions:
Reorder branches:
<?php // Avoid the array_slice() if possible. if ( $the_['accepted_args'] >= $num_args ) { $value = call_user_func_array( $the_['function'], $args ); } else if ( 0 === $the_['accepted_args'] ) { $value = call_user_func( $the_['function'] ); } else { $value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) ); }
Here we use the knowledge that 0 accepted arguments only happens around 1% of the cases whereas accepting all arguments is roughly 80% of all filter calls on the front end and around 75% on the admin, so checking for the most likely branch first saves additional checks.
Simplified branches:
<?php if ( $the_['accepted_args'] >= $num_args ) { $value = call_user_func_array( $the_['function'], $args ); } else { $value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) ); }
And here we go one step further assuming that the additional check for 0 arguments what is still going to fail in about 95% of the remaining cases actually costs more than calling array_slice and slicing an array to an empty array and using call_user_func_array() instead of call_user_func() in ~5% of the remaining calls or ~1% of the total calls.
So I did my tests again:
(Note that the 100% values are different, because my computer is in a different stat since opening the ticket, so comparing to those would not be fair.)
| Apply Filter call with argument count | Current implementation | Reordered branches | Simplified branches |
apply_filter(...,0); | 1.984460s (100%) | 2.017885s (101.68%) | 1.993416s (100.45%) |
apply_filter(...,1); | 2.084825s (100%) | 2.038089s ( 97.76%) | 2.026259s ( 97.19%) |
apply_filter(...,2); | 2.034263s (100%) | 2.041381s (100.35%) | 1.996706s ( 98.15%) |
apply_filter(...,3); | 2.070492s (100%) | 2.057335s ( 99.36%) | 2.005684s ( 96.87%) |
apply_filter(...,4); | 2.031988s (100%) | 2.054009s (101.08%) | 2.020054s ( 99.41%) |
apply_filter(...,5); | 2.057378s (100%) | 2.010370s ( 97.72%) | 2.000880s ( 97.25%) |
The first one seems kind of similar (with only noise making differences), but the total removal of one of the three branches is seems better. Again, in the 0 argument case indeed it is slightly slower, but that being only 1% of the real world cases it is still better to only differentiate 2 cases.
I tried one more thing, because of unrelated reason that I had a negative experience a good few times when using a step debugger that these call_user_func* lines can sometimes be where I am setting breakpoints in order to identify what function/method is hooked into a given hook name, so I have to set 3 breakpoints or do a lots of manual stepping or write complex breakpoint criteria which is kind-of annoying.
So I tried out of curiosity more solutions:
Ternary conditional operator:
<?php $value = call_user_func_array( $the_['function'], $the_['accepted_args'] <= $num_args ? array_slice( $args, 0, $the_['accepted_args'] ) : $args );
This equivalent of the Simplified branches solution in behaviour.
Always slice:
<?php $value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) );
Just because it definitely looks like any kind of branching behaviour slows the loops down and also I think slicing an array to a length equal or above its element count should be kind of a no-op internally, so might be faster than an inline ternary.
| Apply Filter call with argument count | Current implementation | Ternary conditional operator | Always slice |
apply_filter(...,0); | 1.984460s (100%) | 2.026094s (102.10%) | 1.958154s (98.67%) |
apply_filter(...,1); | 2.084825s (100%) | 2.040432s ( 97.87%) | 1.954197s (93.73%) |
apply_filter(...,2); | 2.034263s (100%) | 1.998606s ( 98.25%) | 1.964470s (96.57%) |
apply_filter(...,3); | 2.070492s (100%) | 2.001361s ( 96.66%) | 1.974343s (95.36%) |
apply_filter(...,4); | 2.031988s (100%) | 2.005720s ( 98.71%) | 1.970653s (96.98%) |
apply_filter(...,5); | 2.057378s (100%) | 2.097180s (101.93%) | 1.981899s (96.33%) |
At the end of the day it looks like the PHP engine can just way better optimise if the same line is executed in all cases. And it has the added benefit of being much simpler code.
Also the accepting 1 argument case seems to have most gain (I guess because popping of the first element of an array is much simpler than slicing to 2 or more elements as it does not involve iterating to), which is a nice addition for real world benefits, where the default value for accepted arguments is 1, and in my checked example ~30% of all calls were accepting 1 arguments.
I acknowledge that the data presented here suggest a rather hard to explain conclusion:
Apparently using the more complex line of
$value = call_user_func_array( $the_['function'], array_slice( $args, 0, $the_['accepted_args'] ) );
is faster than using the simpler
$value = call_user_func_array( $the_['function'], $args );
which I find surprising. But I can imagine not needing to put large object onto the call stack or something like that, even if it needs slicing before is faster, but than still I cannot explain why the last line is faster still, where no slicing happens, because we pass on 5 out of 5 arguments. Still it looks like the simpler solution only achieved 98.03%, while the more complex one 96.33%. It may be a fluke, although I ran each measurements multiple times, always ignoring the first one (to let Opcache initialize) and I tried to choose a representative outcome of many runs, and not outliers. But I can admit I more scientific methodology could have been used, but I only started to look into this out of curiosity and didn't plan to really do controlled science experiments.
So long story short:
- as @westonruter pointed out
array_slicemust be supported to avoid issues in some cases - the additional effort to use the faster
call_user_func()instead ofcall_user_func_array()seem less relevant considering it only affects ~1% of the use-cases, but requires additional conditional check, and I assume creates a loop that is harder to optimise by the engine - apparently
array_slice()is only a performance penalty, if the sliced array is smaller than the input array, so the additional branching again only adds overhead. - my numbers may include a fair amount of noise, but I did not cherry pick results and I think the performance gains are real, albeit the amounts may be different in different circumstances.
Thinking about optimisation, these results may vary depending on CPU architecture as well (e.g. depending speculative execution solutions), but I think the simpler the loop is, the better it can be optimised.
Simpler code, simpler debugging, better performance.
The avoiding of the
array_slice()is mentioned here: https://core.trac.wordpress.org/ticket/17817#comment:117 but the reasoning just points todo_action(), where the same was done, butdo_actiontoday callsapply_filtersso it came full circle.