WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 9 months ago

#35667 closed defect (bug) (fixed)

Maximum execution time error in WP_Text_Diff_Renderer_Table

Reported by: ocean90 Owned by: pento
Milestone: 5.0 Priority: normal
Severity: normal Version:
Component: Revisions Keywords: has-patch fixed-5.0
Focuses: Cc:

Description

Happens on w.org (#meta1345) and I could reproduce it locally.

Steps to reproduce:

0) Make sure to use the text editor
1) Create a post with the content of revision1.txt
2) Save the post
3) Update the post with the content of revision2.txt
4) Save the post
5) Click the Browse link in the publish metabox

Expected: Page with revisions loads
Actual: Page doesn't load: PHP Fatal error: Maximum execution time of 30 seconds exceeded in wp-includes/wp-diff.php on line 404

Attachments (17)

revision1.txt (4.2 KB) - added by ocean90 4 years ago.
revision2.txt (3.9 KB) - added by ocean90 4 years ago.
array-matches.txt (333.6 KB) - added by ocean90 4 years ago.
https://core.trac.wordpress.org/browser/trunk/src/wp-includes/wp-diff.php?rev=34348#L352
interleave_changed_lines.txt (342.6 KB) - added by ocean90 4 years ago.
Output of WP_Text_Diff_Renderer_Table->interleave_changed_lines()
35667.diff (494 bytes) - added by codex-m 4 years ago.
Patch
35667.1.diff (1003 bytes) - added by codex-m 4 years ago.
Updated patch
35667.2.patch (2.5 KB) - added by subharanjan 4 years ago.
Implemented custom levenshtein() from https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/ttmserver/TTMServer.php#L79-L125
35667.3.patch (2.8 KB) - added by codex-m 4 years ago.
Add checks if mb_strlen exists
longrevision1.txt (18.6 KB) - added by codex-m 4 years ago.
Long Revision 1
longrevision2.txt (19.8 KB) - added by codex-m 4 years ago.
Long revision 2
35667.4.patch (2.5 KB) - added by codex-m 4 years ago.
Patch version 4
35667.2.diff (2.8 KB) - added by pento 15 months ago.
35667.3.diff (2.8 KB) - added by pento 15 months ago.
35667.4.diff (3.0 KB) - added by pento 15 months ago.
35667.5.diff (3.3 KB) - added by pento 11 months ago.
lorem1.txt (19.7 KB) - added by pento 11 months ago.
lorem2.txt (43.7 KB) - added by pento 11 months ago.

Download all attachments as: .zip

Change History (41)

@ocean90
4 years ago

@ocean90
4 years ago

@ocean90
4 years ago

Output of WP_Text_Diff_Renderer_Table->interleave_changed_lines()

#1 @ocean90
4 years ago

This seems to be an issue within WP_Text_Diff_Renderer_Table->interleave_changed_lines().

In array-matches.txt we have an array with 9752 keys (Number of keys in $orig * number of keys in $final, 92*106), the result of the match process.

In interleave_changed_lines.txt we have the output of the second last foreach loop. As you can see it tries to find the original and final position and when there is a diff it increases the array. At the time of the timeout the size of $final_rows is at 13000. Reminder: The number of final rows was only 106.

IMO the huge position diffs are wrong somehow, for example

$orig_matches[$orig_row]: 51
$final_pos: 52
$orig_pos: 7

Since only the spaces are removed the final position should be the same as $orig, maybe +-2, but not +45.
I've the feeling that the previous match process matches lines wrong which is a result of the "order by string distance" in line 352. Removing that line prevents the timeout but the final UI output is different.

@codex-m
4 years ago

Patch

#2 @codex-m
4 years ago

  • Keywords has-patch needs-testing dev-feedback added; needs-patch removed

@ocean90 Thanks for the details. I'm able to replicate this issue too. However I'm not been able to replicate this in most cases. I have a local server with lots of memory/timeout settings and it works there.

Meanwhile, on my standard local with limited timeout settings, it fails. I think that the best fix for now would be to use set_time_limit to a sensible amount which would work in all cases. This timeout fix won't introduce any side effects on our existing browse/diff implementation.

I'm attaching a patch with setting the PHP timeout limit to 300. We are already using a fix like this in our wp-admin/includes/update-core.php (update-core function). We can also use this in the interleave_changed_lines functions where it can be used in situations requiring a long timeout.

#3 @ocean90
4 years ago

  • Keywords needs-patch added; has-patch needs-testing dev-feedback removed

Thanks for the patch @codex-m, but let's fix the real issue. Setting a limit will result in a fatal error. I assume that there is a loop somewhere.

#4 @jorbin
4 years ago

  • Owner set to ocean90
  • Status changed from new to assigned

@codex-m
4 years ago

Updated patch

#5 @codex-m
4 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

I agree @ocean90 , I debugged this one and it seems there is an issue with our existing compute_string_distance() function. These string distances seems to be basis for the rest of the matching processes and interleaving rows with blanks.In some instances (loop array too large, it seems to jumps off its values at some point in the loop). This is because of inaccurate string distances.

I replaced it with a native PHP levenshtein() function and it now works. I did some testing for different text revision comparison and seems to work. Please confirm. Thanks :)

#6 @ocean90
4 years ago

  • Keywords needs-patch added; has-patch needs-testing removed

@codex-m levenshtein() has only one issue, from the docs:

This function returns the Levenshtein-Distance between the two argument strings or -1, if one of the argument strings is longer than the limit of 255 characters.

The MediaWiki extension Translate has a wrapper for levenshtein() which supports more characters, see
https://github.com/wikimedia/mediawiki-extensions-Translate/blob/master/ttmserver/TTMServer.php#L79-L125.

The same is used in GlotPress, see https://github.com/GlotPress/GlotPress-WP/blob/679d500e5ffbdfed6ac124481780205c3b7854d9/gp-includes/strings.php#L114-L175.

Maybe this implementation works here too?

#7 @subharanjan
4 years ago

  • Keywords has-patch added; needs-patch removed

I did some quick test with different revisions and it seems to be working. Please check. Thanks !

@codex-m
4 years ago

Add checks if mb_strlen exists

#8 @codex-m
4 years ago

  • Keywords needs-testing added

@ocean90 Yep it will work..

@subharanjan We have same thoughts on custom implementation for levenshtein(). Only that mbstring is a non-default extension: http://php.net/manual/en/mbstring.installation.php

We should check if the server supports and that mb_strlen is available. If not add a compatible workaround. I followed similar concepts to GlotPress here: https://github.com/GlotPress/GlotPress-WP/blob/679d500e5ffbdfed6ac124481780205c3b7854d9/gp-includes/strings.php#L28

Extending your custom implementation, I added the 35667.3.patch

I tested this locally and it works also. :)

#9 @dd32
4 years ago

No need to check for the existence of mb_strlen(), WordPress includes a compat version of it.

@codex-m
4 years ago

Long Revision 1

@codex-m
4 years ago

Long revision 2

#11 @codex-m
4 years ago

@dd32 Thanks, I didn't know that :)
@subharanjan I'm still seeing some timeout errors using the new test file (attached, longrevision1.txt , longrevision2.txt)

Surprisingly 35667.1.diff works :) Although it does not have the workaround on levenshtein limitation.

#12 @codex-m
4 years ago

  • Keywords needs-patch added; has-patch removed

@codex-m
4 years ago

Patch version 4

#13 @codex-m
4 years ago

  • Keywords has-patch added; needs-patch removed

OK I have found a reliable PHP custom equivalent of Levenshtein that will process more than 255 characters. Also I find a bit faster than the previous examples. In my test, there is no more maximum execution timeout error on sensible long strings or text. This is customized example is based here: http://php.net/manual/en/function.levenshtein.php#85387

I attached 35667.4.patch adjusted to WordPress implementation :)

#14 @ocean90
4 years ago

  • Milestone changed from 4.5 to Future Release

Tested 35667.4.patch on the post mentioned in #meta135 and it makes it worse. I'll try to get an export of this post with its revisions so we can use it for further testing.

I think we have go a step higher in the stack and check why compute_string_distance() is even needed and/or if the "old MediaWiki Diff" on which WP_Text_Diff_Renderer_Table seems to be build on got some updates which we can use.

@pento
15 months ago

@pento
15 months ago

@pento
15 months ago

#15 @pento
15 months ago

I've run into this on some other posts, so I did some performance work on it, in 35667.4.diff:

As has been noted on this ticket, compute_string_distance() is slow, which is caused by a combination of the count_chars() calls (mostly on very long lines), and the difference calculation, for all lines. A simple cache of these values reduces the total time spent in compute_string_distance() by 50%, more if there are any repeated lines in either of the posts.

Using md5() to generate the cache keys gives us fixed length keys, which causes future lookups to be much faster, and memory usage to be lower, provided the average line length is < 32 chars.

The second slow part is in interleave_changed_lines(), when adding blank lines for padding. array_splice() is a super expensive function call, calling it in a loop causes a significant slowdown. Instead, we can call it once with an array of blank lines to insert.

Running on revision1.txt / revision1.txt, it reduces a call to wp_text_diff() from 1 second to 0.1 second. It does cause an increase in memory usage, approximately 3MB for this test case.

#16 @spacedmonkey
11 months ago

We are running into this a lot now we are running gutenberg so we have had to disable revisions. I believe that because post content has much more markup into in the form of html comment for blocks, it is making comparison timeout.

This issue directly effects gutenberg and should be moved to 5.0.0 milestone. cc @pento

#17 @pento
11 months ago

  • Milestone changed from Future Release to 5.0

Thanks for the ping, @spacedmonkey. We can investigate this some more.

@pento
11 months ago

#18 @pento
11 months ago

35667.5.diff refreshes the patch to apply cleanly against the 5.0 branch. It still gives the same performance improvements I saw previously.

@spacedmonkey: Do you have some example revisions I could test with?

#19 @spacedmonkey
11 months ago

I know that @trepmal was looking into this issue and had done some brenchmarking on it.

I've done some preliminary benchmarking diffing some large (50-100 paragraph) text strings:
Using patch: 0.2024610042572 seconds
Using core: 92.410513162613 seconds

@pento It is hard to write unit tests for performance issues. Any thoughts on testing this other than manual testing.

@pento
11 months ago

@pento
11 months ago

#20 @pento
11 months ago

  • Keywords needs-testing removed

@trepmal was kind enough to send me the test data she's been working with, that was super helpful! (lorem1.txt and lorem2.txt, for anyone interested.)

I did some more cachegrind analysis based on these, but all the further wins I found had drastically reduce benefit: maybe a total of 30ms improvement from the ~260ms 35667.5.diff runs in on my computer.

As far as testing goes, I'm happy to go with manual testing for now. Setting up the infrastructure for performance regression testing is outside the scope of this issue. 😉

#21 @pento
11 months ago

  • Owner changed from ocean90 to pento

#22 @pento
11 months ago

In 43775:

Revisions: Improve performance of WP_Text_Diff_Renderer_Table.

WP_Text_Diff_Renderer_Table is used to generate the diff view in revisions, but there were some cases that could cause it to take excessive amounts of time to run.

Some noteable cases include:

  • When a large number of new lines were inserted in the middle of the post from one revision to the next.
  • When both revisions contain >100 lines.
  • When either revision contains a lot of long lines.

In one extreme test case, the diff view took over a minute to generate. With this change, it now takes less than a second.

See #35667.

#23 @pento
11 months ago

  • Keywords fixed-5.0 added

#24 @pento
9 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 44129:

Revisions: Improve performance of WP_Text_Diff_Renderer_Table.

WP_Text_Diff_Renderer_Table is used to generate the diff view in revisions, but there were some cases that could cause it to take excessive amounts of time to run.

Some noteable cases include:

  • When a large number of new lines were inserted in the middle of the post from one revision to the next.
  • When both revisions contain >100 lines.
  • When either revision contains a lot of long lines.

In one extreme test case, the diff view took over a minute to generate. With this change, it now takes less than a second.

Merges [43775] from the 5.0 branch to trunk.

Fixes #35667.

Note: See TracTickets for help on using tickets.