#28724 closed enhancement (fixed)
Optimize Regexp Usage in wptexturize()
Reported by: | miqrogroove | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | 4.0 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Formatting | Keywords: | has-patch |
Focuses: | Cc: |
Description
The effort to consolidate patterns in #28623 didn't pan out.
I think the next logical step in reducing CPU time in wptexturize is to structure the various patterns into groups by target and run them only if the target char(s) exist within the subject string. As most of the patterns involve quotes or apostrophes, it should be faster to run strpos within the inner loop before running multiple regexp replacements. If those replacements will do nothing, then they do not need to run during that one pass through the loop.
Example:
This is a <i>test</i> post.
Currently, all of the patterns would run 3 times each for that post. With the proposed changes, we could run strpos about a dozen times and then exit wptexturize with the same results in less time.
Attachments (4)
Change History (20)
#3
@
10 years ago
- Keywords has-patch added
First draft tested better than expected. I'll continue brainstorming, meanwhile the patch can be used as-is.
#4
@
10 years ago
The next thing I'd like to study is the performance implication of having whitespace between HTML tags. For example, when I insert a table in the editor it tends to get unwrapped like
<table> <tbody> <tr> <td> ...
Does this mean texturize is looping through a non-empty string after each element whose content is nothing more than a line feed? This might be another situation where it is possible to avoid running regexp altogether.
#6
@
10 years ago
I got another 25% improvement just by stubbing this line after the "do texturize" comment:
if (empty(trim($curl))) continue;
This is definitely worth looking into.
#7
@
10 years ago
In miqro-28724-part2.patch:
- Optimize handling of whitespace in wptexturize().
What I found was that HTML-heavy posts tend to have a large amount of whitespace between elements, due to the formatting imposed by TinyMCE. Lists and tables in particular tend to have a newline after every element. These cases improve by ~25% less time spent texturizing whitespace.
A moderately long post with little HTML still tends to contain whitespace around commonly used shortcodes, the "more" tag, and the inline formatting tags. There is a minimal speed improvement in these cases.
Short posts such as "hello world" will have a fixed cost and tend not to benefit from optimizations.
Let's commit that and call it fixed. :)
#8
@
10 years ago
In a side-by-side comparison of v3.8 and my working copy, results of running wptexturize 10 times on a sample post:
v3.8 - 0.260 seconds
V4.0 - 0.061 seconds
This includes the impact of bug fixes, added patterns, HTML avoidance, calls to _wptexturize_pushpop_element(), and running the inner loop.
This ticket was mentioned in IRC in #wordpress-dev by miqrogroove. View the logs.
10 years ago
#10
follow-up:
↓ 11
@
10 years ago
miqro-28724-part2.patch gives me a fatal error:
Fatal error: Can't use function return value in write context in wp-includes/formatting.php on line 237
It can probably be replaced with:
if ( '' === trim( $curl ) ) { continue; }
Is there a reason not to make this check the first in the loop? Can the regex can be modified to avoid newlines instead?
#11
in reply to:
↑ 10
;
follow-up:
↓ 13
@
10 years ago
Replying to SergeyBiryukov:
Is there a reason not to make this check the first in the loop? Can the regex can be modified to avoid newlines instead?
The position was based on performance testing. This particular spot gives slightly better results depending on the type of input.
Regexp changes were considered but would make the stack logic very much more complex. The stack helper currently assumes the input begins and ends with the tag braces.
Can you tell me which version of PHP is giving a fatal error? I've never seen that before.
#13
in reply to:
↑ 11
@
10 years ago
Replying to miqrogroove:
Can you tell me which version of PHP is giving a fatal error? I've never seen that before.
A note from http://www.php.net/manual/en/function.empty.php:
Prior to PHP 5.5,
empty()
only supports variables; anything else will result in a parse error.
#14
@
10 years ago
- Owner set to SergeyBiryukov
- Resolution set to fixed
- Status changed from new to closed
In 29008:
3x Performance Boost