Opened 12 years ago
Closed 12 years ago
#22286 closed enhancement (fixed)
Enhancing backslashit performance
Reported by: | jbutkus | Owned by: | nacin |
---|---|---|---|
Milestone: | 3.6 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch commit 3.6-early |
Focuses: | Cc: |
Description
The backslashit function is rather frequently used, especially with date_i18n.
While preg_replace, currently used, is bullet proof, it is not of the greatest efficiency.
Here I attach a patch (based on WP 3.4.2 and patched from unit tests trunk (https://unit-tests.svn.wordpress.org/trunk).
The performance rationale: in a normal page load I get some 200 hits to date_i18n (resulting in ~0.7 seconds). The backslashit takes nearly 35 %' of it's execution time (monitored using XHProf, mmv).
After applying this patch the impact of backslashit is down to 19 percent.
It may not seem as a big gain, but 0.1 second per 1000 calls to function, which is called some 1500 times per regular page load, and may vary drastically, could be of some meaning.
Attachments (3)
Change History (12)
#2
follow-up:
↓ 3
@
12 years ago
isset( $string{0} ) && $string{0} >= '0' && $string{0} <= '9'
Stylistically, [0] should be used in lieu of the older {0}.
I'm thinking about ways to break this. PHP is usually really bad about comparing things. In this case, it's a fun one. If $string[0] is "a", then $string[0] >= '0' will be true. But, $string[0] <= '9' will be false. So it does work, but it's fairly weird string-to-number behavior.
Is is_numeric() slower? Or maybe strpbrk?
#3
in reply to:
↑ 2
@
12 years ago
@nacin - thank you, for your comment. I have updated diff with array-style brackets for char-in-string accessor.
As for first character comparison - it is based on ASCII table, and compiles to C code directly.
We are saving method call (JMP, more likely - long one, in assembly), which would occur if is_numeric, or strpbrk would be used.
And yes - is_numeric is slower by order of magnitude. Mostly because is_numeric( '1.8e26' )
is true, but it has to evaluate rather large number, thus causing a call to mathematical co-processor, instead of relying on general CPU.
As for strpbrk - it maps to C equivalent directly (with sanity checks, only).
Although it is also slower, than direct char comparison using order mask.
Mostly we save the iteration, which occurs during char-list traversal in strpbrk.
And what you observe, when $string[0] <= '9'
is ommited is true not only in PHP, but in C as well:
#include <stdio.h> int main( int argc, const char* argv[] ) { unsigned char single_letter = 'a'; if ( argc > 1 ) { single_letter = *(*(argv+1)); } if ( single_letter >= '0' /*&& single_letter <= '9'*/ ) { // notice, that second check is commented out printf( "It is a digit: %c\n", single_letter ); } else { printf( "It is NOT a digit: %c\n", single_letter ); } return 0; }
If we compile the program, and run it as following:
./tester a
It would say It is a digit: a.
Although with {{./tester $}} it would say It is NOT a digit: $.
And if we un-comment the following part: && single_letter <= '9'
it would respond as expected: It is NOT a digit: a vs. It is a digit: 5.
#4
@
12 years ago
- Keywords commit 3.6-early added
- Milestone changed from Awaiting Review to Future Release
Thank you very much for this. We'll get this in trunk in a few weeks once 3.5 branches off.
In the future, please continue to upload new patches rather than replacing existing ones. It can get confusing for someone else reading this ticket later, as they're unable to follow along with what changed.
#5
@
12 years ago
// Do not use [A..z], as it would also add backslashes to intermediate // characters (namely: '[', '\', ']', '^', '_', '`'). $string = addcslashes( $string, 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' );
Seems that 'A..Za..z'
can be used instead of full enumeration without the mentioned side effect. 22286.diff changes that.
Moved the unit test into a separate patch: 22286.test.diff.
#7
@
12 years ago
In 1269/tests:
I have attached modified DIFF, where
/(a-z)/i
is escaped usingaddcslashes
instead ofpreg_replace
. Former is inteded for just this task, whilepreg_replace
is too generic, and adds unnecessary, at this time, overhead.