WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 5 years ago

Last modified 5 years ago

#6877 closed defect (bug) (fixed)

preg_replace in wpautop deletes all the text in the post

Reported by: duncanmc Owned by: azaozz
Milestone: 2.8 Priority: highest omg bbq
Severity: critical Version: 2.7
Component: Administration Keywords: reporter-feedback, wpautop, has_patch
Focuses: Cc:

Description

I entered a preformatted HTML post, but the post was showing blank. I saw that it is because of the preg_replace in wpautop function in the file formatting.php. The problem code is

$pee = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "<p>$1</p>\n", $pee); // make paragraphs, including one at the end

It is completely deleting the post content in some cases. There are also other users having the same problem in the forum. See http://wordpress.org/support/topic/156804?replies=6 for details. Could you please fix it?

Attachments (2)

6877.diff (1.0 KB) - added by ryan 5 years ago.
Try split and concat
6877.2.diff (1.0 KB) - added by azaozz 5 years ago.
After chatting with Ryan, cleaned up the preg_split a bit. Makes it a little faster for very large posts.

Download all attachments as: .zip

Change History (25)

comment:1 azaozz6 years ago

  • Keywords reporter-feedback added; preg_replace removed

This regexp would not remove anything. In the worst case it wouldn't create paragraphs if PCRE_UNGREEDY is compiled as an option in php. However that's extremely rare and will affect all posts, not just some.

Can't reproduce the blank post problem. Can you check if the content has been saved in the database? If yes, it may be a timing or out of memory issue with very long posts with a lot of html tags, as there are several display filters run every time and this can be slow and memory consuming for very long posts, especially if many plugins are used.

comment:2 duncanmc6 years ago

Content is properly saved in the database. The problem occurs when displaying the content on the page.
It is not related to plugins. I wrote echo($pee) to see the content of pee variable just before that preg_replace and also just after doing that preg_replace. When the code is executed, first echo before that preg_replace shows that pee has all the content, properly. However right after executing the line $pee = preg_replace('/\n?(.+?)(?:\n\s*\n|\z)/s', "<p>$1</p>\n", $pee); the pee variable becomes empty! It does not occur on all long posts.

comment:3 duncanmc6 years ago

I did some debugging by using preg_last_error() command. It gave error number 2 that is PREG_BACKTRACK_LIMIT_ERROR. It is controlled by pcre.backtrack_limit option of php. I was using the default value of 100000. Increasing the number to 1000000 solved the problem. However, 1000000 looks too high. Is it possible to change the preg_replace, so that default backtrack_limit is enough?

comment:4 ryan6 years ago

Backtrack limit strikes again. We might be able to preg_split on "\n\s*\n", strip newlines and add paragraph tags to each result, and concatenate it all back together again. Failing that, we can check for an error from preg_last_error and rollback that preg_replace. preg_last_error() is PHP >= 5.2 so a function_exists check would be needed. Trying a preg_split first is worth a shot, though.

comment:5 azaozz6 years ago

Maybe we can try "Atomic Grouping" to reduce the backtracking:
http://www.regular-expressions.info/atomic.html

The regexp would look something like this:

$pee = preg_replace('/\n?(.+?)(?>\n\s*\n|\z)/s', "<p>$1</p>\n", $pee);

(the only change would be replacing : with >).

comment:6 ryan6 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

comment:7 follow-ups: duncanmc6 years ago

  • Keywords has_patch added
  • Milestone changed from 2.9 to 2.7

azaozz proposed a solution. Can you apply it?

comment:8 in reply to: ↑ 7 jacobsantos6 years ago

Replying to duncanmc:

azaozz proposed a solution. Can you apply it?

Did you test it first?

comment:9 ryan5 years ago

  • Owner changed from anonymous to azaozz

Andrew is testing this and getting some performance numbers.

comment:10 DD325 years ago

I closed a similar ticket: #7718 (as invalid, should've used worksforme actually)

comment:11 in reply to: ↑ 7 azaozz5 years ago

  • Status changed from new to assigned

Replying to duncanmc:

azaozz proposed a solution. Can you apply it?

Unfortunately this solution makes almost no difference (tested with 800KB post). Ryan's solution should work better, as splitting would virtually do no backtracking and could also be faster for large posts.

ryan5 years ago

Try split and concat

azaozz5 years ago

After chatting with Ryan, cleaned up the preg_split a bit. Makes it a little faster for very large posts.

comment:12 ryan5 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [9255]) Reduce backtracking in wpautop. fixes #6877

comment:13 Denis-de-Bernardy5 years ago

cross-referencing: #8553

comment:14 Denis-de-Bernardy5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

the test text in #8553 still causes problems

comment:15 Denis-de-Bernardy5 years ago

any odds we can get the patch in 5.7.1 as well?

comment:16 ryan5 years ago

2.7.1? This is already in 2.7.

comment:17 azaozz5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Seems that the problem with the test text in #8553 is unrelated. It doesn't load in the editor but still shows on the site, meaning autop (or this part of autop) isn't the cause.

comment:18 Denis-de-Bernardy5 years ago

  • Milestone changed from 2.7 to 2.7.1
  • Version changed from 2.5.1 to 2.7

sorry for re-opening, but... the test text in #8553 (http://www.misthaven.eu/test/text.txt) is related to this line in wpautop:

$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);

upon commenting it out, things work as expected.

comment:19 Denis-de-Bernardy5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:20 Denis-de-Bernardy5 years ago

as I understand that regexp, it's supposed to add missing closed paragraph tags before closing div, address and form tags.

above it, however, we have:

	foreach ( $pees as $tinkle )
		$pee .= '<p>' . trim($tinkle, "\n") . "</p>\n";

might it be that the div/address/form regexp is not useful at all?

comment:21 azaozz5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [10527]) Reduce backtracking in autop, fixes #6877, see #8553

comment:22 azaozz5 years ago

This seems to fix the problems with the sample text above without breaking anything else, but let's test it for a few weeks.

comment:23 azaozz5 years ago

  • Milestone changed from 2.7.1 to 2.8
Note: See TracTickets for help on using tickets.