Opened 9 years ago
Closed 6 years ago
#35667 closed defect (bug) (fixed)
Maximum execution time error in WP_Text_Diff_Renderer_Table
Reported by: | Owned by:
---|---
Milestone: 5.0 | Priority: normal
Severity: normal | Version:
Component: Revisions | Keywords: has-patch fixed-5.0
Focuses: | Cc:
Description
Happens on w.org (#meta1345) and I could reproduce it locally.
Steps to reproduce:
0) Make sure to use the text editor
1) Create a post with the content of revision1.txt
2) Save the post
3) Update the post with the content of revision2.txt
4) Save the post
5) Click the Browse link in the publish metabox
Expected: Page with revisions loads
Actual: Page doesn't load: PHP Fatal error: Maximum execution time of 30 seconds exceeded in wp-includes/wp-diff.php on line 404
Attachments (17)
Change History (41)
#1
@
9 years ago
This seems to be an issue within WP_Text_Diff_Renderer_Table->interleave_changed_lines().
In array-matches.txt we have an array with 9,752 keys (number of keys in $orig * number of keys in $final, 92 * 106), the result of the match process.
In interleave_changed_lines.txt we have the output of the second-to-last foreach loop. As you can see, it tries to find the original and final position, and when there is a diff it grows the array. At the time of the timeout the size of $final_rows is already 13,000. Reminder: the number of final rows was only 106.
IMO the huge position diffs are wrong somehow, for example:
$orig_matches[$orig_row]: 51 $final_pos: 52 $orig_pos: 7
Since only the spaces are removed, the final position should be the same as the original one, maybe ±2, but not +45.
I have a feeling that the previous match process matches lines incorrectly as a result of the "order by string distance" in line 352. Removing that line prevents the timeout, but the final UI output is different.
#2
@
9 years ago
- Keywords has-patch needs-testing dev-feedback added; needs-patch removed
@ocean90 Thanks for the details. I'm able to replicate this issue too, though not in every environment: on a local server with generous memory and timeout settings it works, while on my standard local install with limited timeout settings it fails. I think the best fix for now would be to use set_time_limit() with a sensible value that works in all cases. This timeout fix won't introduce any side effects on our existing browse/diff implementation.
I'm attaching a patch that sets the PHP timeout limit to 300. We already use a fix like this in wp-admin/includes/update-core.php (the update-core function). We can do the same in interleave_changed_lines() for situations requiring a long timeout.
#3
@
9 years ago
- Keywords needs-patch added; has-patch needs-testing dev-feedback removed
Thanks for the patch @codex-m, but let's fix the real issue. Raising the limit will still end in a fatal error, just later. I assume there is a loop somewhere.
#5
@
9 years ago
- Keywords has-patch needs-testing added; needs-patch removed
I agree, @ocean90. I debugged this one and it seems there is an issue with our existing compute_string_distance() function. These string distances seem to be the basis for the rest of the matching process and for interleaving rows with blanks. In some instances (when the loop array is too large) the values appear to go wrong at some point in the loop, because of inaccurate string distances.
I replaced it with the native PHP levenshtein() function and it now works. I did some testing comparing different text revisions and it seems to work. Please confirm. Thanks :)
#6
@
9 years ago
- Keywords needs-patch added; has-patch needs-testing removed
@codex-m levenshtein() has only one issue, from the docs:
This function returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters.
The MediaWiki extension Translate has a wrapper for levenshtein() which supports more characters, see https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/ttmserver/TTMServer.php#L79-L125.
The same is used in GlotPress, see https://github.com/GlotPress/GlotPress-WP/blob/679d500e5ffbdfed6ac124481780205c3b7854d9/gp-includes/strings.php#L114-L175.
Maybe this implementation works here too?
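For context, the wrappers linked above work roughly like this: use the fast built-in when both strings are under the 255-byte limit, and fall back to a manual multibyte-aware dynamic-programming implementation above it. This is a simplified sketch in that spirit, not the actual Translate/GlotPress code, and the function name is illustrative:

```php
// Sketch of a levenshtein() wrapper along the lines of the MediaWiki
// Translate / GlotPress helpers. Function name is hypothetical.
function wp_levenshtein( $str1, $str2 ) {
	// The built-in returns -1 for strings longer than 255 bytes,
	// so only use it inside that limit.
	if ( strlen( $str1 ) <= 255 && strlen( $str2 ) <= 255 ) {
		return levenshtein( $str1, $str2 );
	}

	// Manual dynamic-programming fallback, multibyte-aware.
	$len1 = mb_strlen( $str1 );
	$len2 = mb_strlen( $str2 );
	$prev = range( 0, $len2 );
	for ( $i = 1; $i <= $len1; $i++ ) {
		$current = array( $i );
		$c1      = mb_substr( $str1, $i - 1, 1 );
		for ( $j = 1; $j <= $len2; $j++ ) {
			$c2            = mb_substr( $str2, $j - 1, 1 );
			$cost          = ( $c1 === $c2 ) ? 0 : 1;
			$current[ $j ] = min(
				$prev[ $j ] + 1,        // deletion
				$current[ $j - 1 ] + 1, // insertion
				$prev[ $j - 1 ] + $cost // substitution
			);
		}
		$prev = $current;
	}
	return $prev[ $len2 ];
}
```

The quadratic fallback is much slower than the C implementation, which matters here since the whole ticket is about execution time.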
@
9 years ago
Implemented custom levenshtein() from https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/ttmserver/TTMServer.php#L79-L125
#7
@
9 years ago
- Keywords has-patch added; needs-patch removed
I did some quick tests with different revisions and it seems to be working. Please check. Thanks!
#8
@
9 years ago
- Keywords needs-testing added
@ocean90 Yep, it will work.
@subharanjan We have the same thoughts on a custom implementation of levenshtein(). The only catch is that mbstring is a non-default extension: http://php.net/manual/en/mbstring.installation.php
We should check whether the server supports it and mb_strlen() is available, and if not, add a compatible workaround. I followed concepts similar to GlotPress here: https://github.com/GlotPress/GlotPress-WP/blob/679d500e5ffbdfed6ac124481780205c3b7854d9/gp-includes/strings.php#L28
Extending your custom implementation, I've added 35667.3.patch. I tested this locally and it works as well. :)
#9
@
9 years ago
No need to check for the existence of mb_strlen(), WordPress includes a compat version of it.
#10
@
9 years ago
Thanks @dd32 & @codex-m ! Are we good with https://core.trac.wordpress.org/attachment/ticket/35667/35667.2.patch then?
#11
@
9 years ago
@dd32 Thanks, I didn't know that :)
@subharanjan I'm still seeing some timeout errors using the new test files (attached: longrevision1.txt, longrevision2.txt).
Surprisingly, 35667.1.diff works :) although it does not have the workaround for the levenshtein() limitation.
#13
@
9 years ago
- Keywords has-patch added; needs-patch removed
OK, I have found a reliable custom PHP equivalent of levenshtein() that will process more than 255 characters. I also find it a bit faster than the previous examples. In my tests there is no more maximum-execution-timeout error on reasonably long strings or text. This customized example is based on http://php.net/manual/en/function.levenshtein.php#85387
I attached 35667.4.patch, adjusted to the WordPress implementation :)
#14
@
9 years ago
- Milestone changed from 4.5 to Future Release
Tested 35667.4.patch on the post mentioned in #meta1345 and it makes things worse. I'll try to get an export of this post with its revisions so we can use it for further testing.
I think we have to go a step higher in the stack and check why compute_string_distance() is even needed, and/or whether the "old MediaWiki Diff" on which WP_Text_Diff_Renderer_Table seems to be built has received updates we can use.
#15
@
7 years ago
I've run into this on some other posts, so I did some performance work on it in 35667.4.diff:
As has been noted on this ticket, compute_string_distance() is slow, which is caused by a combination of the count_chars() calls (mostly on very long lines) and the difference calculation, for all lines. A simple cache of these values reduces the total time spent in compute_string_distance() by 50%, more if there are any repeated lines in either of the posts.
Using md5() to generate the cache keys gives us fixed-length keys, which makes future lookups much faster and memory usage lower, provided the average line length is < 32 chars.
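A minimal sketch of the memoization described above (not the actual 35667.4.diff — the cache array, function shape, and distance formula are illustrative, though the count_chars() signature comparison mirrors what wp-diff.php does):

```php
// Sketch: cache string-distance results keyed by md5() so repeated
// lines are only computed once, with fixed 32-character cache keys.
function cached_string_distance( $string1, $string2, array &$cache ) {
	$key = md5( $string1 . ':' . $string2 );
	if ( isset( $cache[ $key ] ) ) {
		return $cache[ $key ];
	}

	// Same idea as core: compare per-byte character-count signatures
	// instead of running a full edit-distance algorithm on long lines.
	$chars1     = count_chars( $string1 );
	$chars2     = count_chars( $string2 );
	$difference = array_sum(
		array_map(
			function ( $a, $b ) {
				return abs( $a - $b );
			},
			$chars1,
			$chars2
		)
	);

	return $cache[ $key ] = $difference;
}
```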
The second slow part is in interleave_changed_lines(), when adding blank lines for padding. array_splice() is an expensive function call, and calling it in a loop causes a significant slowdown. Instead, we can call it once with an array of blank lines to insert.
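The change can be sketched like this ($final_rows, $pos, and $diff are illustrative variable names; -1 stands for a blank padding row):

```php
// Before: one array_splice() per blank row. Each call reindexes the
// whole array, so inserting thousands of rows degrades badly.
// for ( $i = 0; $i < $diff; $i++ ) {
//     array_splice( $final_rows, $pos, 0, -1 );
// }

// After: build the padding once and insert it with a single splice.
$padding = array_fill( 0, $diff, -1 ); // -1 marks a blank line
array_splice( $final_rows, $pos, 0, $padding );
```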
Running on revision1.txt / revision2.txt, it reduces a call to wp_text_diff() from 1 second to 0.1 seconds. It does cause an increase in memory usage, approximately 3 MB for this test case.
#16
@
6 years ago
We are running into this a lot now that we are running Gutenberg, so we have had to disable revisions. I believe that because post content now contains much more markup in the form of HTML comments for blocks, the comparison is timing out.
This issue directly affects Gutenberg and should be moved to the 5.0.0 milestone. cc @pento
#17
@
6 years ago
- Milestone changed from Future Release to 5.0
Thanks for the ping, @spacedmonkey. We can investigate this some more.
#18
@
6 years ago
35667.5.diff refreshes the patch to apply cleanly against the 5.0 branch. It still gives the same performance improvements I saw previously.
@spacedmonkey: Do you have some example revisions I could test with?
#19
@
6 years ago
I know that @trepmal was looking into this issue and had done some benchmarking on it.
I've done some preliminary benchmarking diffing some large (50-100 paragraph) text strings:
Using patch: 0.2024610042572 seconds
Using core: 92.410513162613 seconds
@pento It is hard to write unit tests for performance issues. Any thoughts on testing this other than manual testing?
#20
@
6 years ago
- Keywords needs-testing removed
@trepmal was kind enough to send me the test data she's been working with, that was super helpful! (lorem1.txt and lorem2.txt, for anyone interested.)
I did some more cachegrind analysis based on these, but all the further wins I found had drastically reduced benefit: maybe a total of 30ms improvement over the ~260ms that 35667.5.diff runs in on my computer.
As far as testing goes, I'm happy to go with manual testing for now. Setting up the infrastructure for performance regression testing is outside the scope of this issue. 😉
https://core.trac.wordpress.org/browser/trunk/src/wp-includes/wp-diff.php?rev=34348#L352