Make WordPress Core

Opened 11 years ago

Closed 11 years ago

#22286 closed enhancement (fixed)

Enhancing backslashit performance

Reported by: jbutkus's profile jbutkus Owned by: nacin's profile nacin
Milestone: 3.6 Priority: normal
Severity: normal Version:
Component: Formatting Keywords: has-patch commit 3.6-early
Focuses: Cc:

Description

The backslashit function is rather frequently used, especially with date_i18n.

While preg_replace, currently used, is bullet proof, it is not of the greatest efficiency.

Here I attach a patch (based on WP 3.4.2 and patched from unit tests trunk (https://unit-tests.svn.wordpress.org/trunk).

The performance rationale: in a normal page load I get some 200 hits to date_i18n (resulting in ~0.7 seconds). The backslashit takes nearly 35 %' of it's execution time (monitored using XHProf, mmv).

After applying this patch the impact of backslashit is down to 19 percent.

It may not seem as a big gain, but 0.1 second per 1000 calls to function, which is called some 1500 times per regular page load, and may vary drastically, could be of some meaning.

Attachments (3)

backslashit-with-array-access.diff (1.5 KB) - added by jbutkus 11 years ago.
Using addcslashes for [A..z] and ASCII char comparison for starting digit
22286.diff (816 bytes) - added by SergeyBiryukov 11 years ago.
22286.test.diff (535 bytes) - added by SergeyBiryukov 11 years ago.

Download all attachments as: .zip

Change History (12)

#1 @jbutkus
11 years ago

I have attached modified DIFF, where /(a-z)/i is escaped using addcslashes instead of preg_replace . Former is inteded for just this task, while preg_replace is too generic, and adds unnecessary, at this time, overhead.

#2 follow-up: @nacin
11 years ago

isset( $string{0} ) && $string{0} >= '0' && $string{0} <= '9'

Stylistically, [0] should be used in lieu of the older {0}.

I'm thinking about ways to break this. PHP is usually really bad about comparing things. In this case, it's a fun one. If $string[0] is "a", then $string[0] >= '0' will be true. But, $string[0] <= '9' will be false. So it does work, but it's fairly weird string-to-number behavior.

Is is_numeric() slower? Or maybe strpbrk?

@jbutkus
11 years ago

Using addcslashes for [A..z] and ASCII char comparison for starting digit

#3 in reply to: ↑ 2 @jbutkus
11 years ago

@nacin - thank you, for your comment. I have updated diff with array-style brackets for char-in-string accessor.

As for first character comparison - it is based on ASCII table, and compiles to C code directly.
We are saving method call (JMP, more likely - long one, in assembly), which would occur if is_numeric, or strpbrk would be used.

And yes - is_numeric is slower by order of magnitude. Mostly because is_numeric( '1.8e26' ) is true, but it has to evaluate rather large number, thus causing a call to mathematical co-processor, instead of relying on general CPU.

As for strpbrk - it maps to C equivalent directly (with sanity checks, only).
Although it is also slower, than direct char comparison using order mask.
Mostly we save the iteration, which occurs during char-list traversal in strpbrk.

And what you observe, when $string[0] <= '9' is ommited is true not only in PHP, but in C as well:

#include <stdio.h>

int main( int argc, const char* argv[] ) {
    unsigned char single_letter = 'a';

    if ( argc > 1 ) {
        single_letter = *(*(argv+1));
    }

    if ( single_letter >= '0' /*&& single_letter <= '9'*/ ) { // notice, that second check is commented out
        printf( "It is a digit: %c\n", single_letter );
    } else {
        printf( "It is NOT a digit: %c\n", single_letter );
    }

    return 0;
}

If we compile the program, and run it as following:
./tester a
It would say It is a digit: a.

Although with {{./tester $}} it would say It is NOT a digit: $.

And if we un-comment the following part: && single_letter <= '9' it would respond as expected: It is NOT a digit: a vs. It is a digit: 5.

#4 @nacin
11 years ago

  • Keywords commit 3.6-early added
  • Milestone changed from Awaiting Review to Future Release

Thank you very much for this. We'll get this in trunk in a few weeks once 3.5 branches off.

In the future, please continue to upload new patches rather than replacing existing ones. It can get confusing for someone else reading this ticket later, as they're unable to follow along with what changed.

#5 @SergeyBiryukov
11 years ago

// Do not use [A..z], as it would also add backslashes to intermediate
// characters (namely: '[', '\', ']', '^', '_', '`').
$string = addcslashes(
	$string,
	'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'
);

Seems that 'A..Za..z' can be used instead of full enumeration without the mentioned side effect. 22286.diff changes that.

Moved the unit test into a separate patch: 22286.test.diff.

#6 @SergeyBiryukov
11 years ago

  • Milestone changed from Future Release to 3.6

#7 @nacin
11 years ago

In 1269/tests:

Additional test case for backslashit(). props SergeyBiryukov. see #22286.

#8 @nacin
11 years ago

Oops, I did not realize that the original patch from jbutkus included the unit test.

#9 @nacin
11 years ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from new to closed

In 24051:

Improve the performance of backslashit() by avoiding regular expressions. backslashit() is used heavily in date_i18n().

props jbutkus.
fixes #22286.

Note: See TracTickets for help on using tickets.