WordPress.org

Make WordPress Core

Opened 6 years ago

Last modified 4 weeks ago

#27350 reopened defect (bug)

Invalid HTML Output

Reported by: miqrogroove Owned by: pento
Milestone: Future Release Priority: normal
Severity: normal Version: 3.8
Component: Formatting Keywords: wpautop has-patch has-unit-tests
Focuses: Cc:
PR Number:

Description

Paste into editor's Text tab:

<div>hello
<pre>test</pre>
world</div>

Preview output:

<div>hello</p>
<pre>test</pre>
<p>world</p></div>

Seen in 3.8 and in trunk. Doesn't seem to be an editor bug because switching tabs will improve the output. Something wrong with wpautop()?

Attachments (7)

Autop-27350.diff (752 bytes) - added by jond 5 years ago.
Autop.php unit test coverage for #27350
Autop-27350-fixed.diff (1.4 KB) - added by jond 5 years ago.
Updated tests for #18136, #20444, and #27350
formatting-27350.diff (3.4 KB) - added by jond 5 years ago.
Fixes for #27350. Also fixes #18136 and #20444
wpautop_profile_reports.zip (201.7 KB) - added by jond 5 years ago.
wpautop profiler reports
27350.diff (555 bytes) - added by jorbin 5 years ago.
Unit Tests from MikeHansenMe originally from #30142
formatting-27350-rev1.diff (3.4 KB) - added by jond 5 years ago.
Revision for accurate preg_match return values
27350.2.diff (4.5 KB) - added by pento 4 months ago.

Download all attachments as: .zip

Change History (41)

#1 @iseulde
6 years ago

  • Component changed from General to Formatting

#2 @miqrogroove
6 years ago

Here's another test case:

hello<div>test</div>

Preview output:

<p>hello
<div>test</div>

#3 @bcworkz
6 years ago

Yes, it is wpautop() wreaking havoc. FWIW, one way to prevent it is to switch to the visual tab, which inserts a huge amount of white space. The extra white space, while being offensive to those that hand code HTML, does allow wpautop() to work correctly.

I've traced the problem to the block of code starting at line 265 of wp-includes/formatting.php. The block starting with:

$pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace

I've no idea how to correct for the errors though without potentially messing up other HTML.

The function essentially adds opening and closing P tags at the front and end of every line, then removes those tags not adjacent to multiple newlines. However, there's a slew of regexp replacements to account for special situations which makes the whole thing very complex and prone to unanticipated errors like you've found..

#4 @SergeyBiryukov
6 years ago

  • Keywords wpautop needs-unit-tests added

Related: #18136, #20444, #21689, #25856.

@jond
5 years ago

Autop.php unit test coverage for #27350

#5 @jond
5 years ago

Submitted the unit test for both cases discussed here. I also worked out a solution, but it fails on the other tests. I'll spend more time to understand if it is failing because of hardcoded assumptions in the prior tests or if the fix is incomplete.

@jond
5 years ago

Updated tests for #18136, #20444, and #27350

@jond
5 years ago

Fixes for #27350. Also fixes #18136 and #20444

#6 @jond
5 years ago

I took a deeper look and found that my assumptions for the unit tests were faulty. I updated the test for #27350 along with the tests I added for #18136 and #20444. For clarity Autop-27350-fixed.diff includes the updated unit tests for all three tickets (#27350, #18136, and #20444).

I submitted the fix (formatting-27350.diff`) I developed for this ticket (#27350) which also fixes #18136 and #20444. In addition to passing the new unit tests, it passes all existing unit tests.

I found that the $allblocks preg fragment allowed tr in it to match <track> elements and the p in the preg fragment matched <param> elements. Appending \b outside of the subpattern matches the word boundary to prevent unintented elements from matching the pattern.

Adding the ltrim to the preg_split prevents an empty element in the $pees blocks which can cause some other odd cases (empty <p></p> blocks).

The primary fix to the unit tests adds preg_match inside the $pees loop. It avoids unnecessarily wrapping block elements in <p> tags and handles the special case of <blockquote> tags.

I imagine this will cause a performance penalty. I have not yet run benchmarks to understand if a signficant hit is going to occur. I will report back on this.

Understanding that all of this will work will be supplanted by #25856, it is still advantageous to clean up these edge cases now and add coverage to the unit tests.

@jond
5 years ago

wpautop profiler reports

#7 @jond
5 years ago

I ran performance tests using a 4K word test document (with <blockquote>, <video>, <table> and other block elements sprinkled throughout) that was run through wpautop 1000x. I used the exact same methodology for both tests (for loop calling wpautop)and captured results through xdebug profiling. The results show a modest slowdown as expected. Results:

Original:
Runtime: 15034 milliseconds
Function: wpautop
Invocation Count: 1000
Total Self Cost: 12.60
Total Inclusive Cost: 98.21

Modified:
Runtime: 18752 milliseconds
Function: wpautop
Invocation Count: 1000
Total Self Cost: 24.40
Total Inclusive Cost: 95.96

Test text is included in the wpautop_profiler_reports.zip

#8 @miqrogroove
5 years ago

I suggest checking the performance impact of each part of the patch. For example, the \b solution may not be optimal, but I'd bet the slowdown you're seeing is related to the multiple PCRE function calls added to the inner loop.

Also note the return value of preg_match() is never > 1.

#9 @jond
5 years ago

Yes, of course, simply confused the return with preg_match_all.

As for performance, it's not much in the way of slow down at all. Even scaled up at 1000x, we're ultimately looking at an average slow down of +3ms on ~4K-word texts. It's not up to me to decide if the fixes for 3 tickets is worth +3 ms.

#10 @clifgriffin
5 years ago

+1 for implementing this. Looks like a well thought out solution that fixes multiple issues.

@jorbin
5 years ago

Unit Tests from MikeHansenMe originally from #30142

#11 @jorbin
5 years ago

  • Keywords needs-unit-tests removed

Originally created by MikeHansenMe on #30142

@jond
5 years ago

Revision for accurate preg_match return values

#12 @jond
5 years ago

  • Keywords has-patch dev-feedback added

#13 @chriscct7
4 years ago

  • Keywords needs-unit-tests added

#14 @pento
4 months ago

#40135 was marked as a duplicate.

#15 @pento
4 months ago

#39377 was marked as a duplicate.

#16 @pento
4 months ago

#18136 was marked as a duplicate.

#17 @pento
4 months ago

#43387 was marked as a duplicate.

#18 @pento
4 months ago

#21689 was marked as a duplicate.

#19 @pento
4 months ago

#20444 was marked as a duplicate.

@pento
4 months ago

#20 @pento
4 months ago

  • Keywords has-unit-tests added; dev-feedback needs-unit-tests removed
  • Milestone set to 5.3
  • Owner set to pento
  • Status changed from new to assigned

27350.2.diff combines the test cases from all of the duplicate tickets. The solution takes inspiration from the patch on this ticket, as well as the attempted fixes on the other tickets.

#21 @pento
4 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 45585:

Clean up stray <p> tags added by wpautop() inside block level tags.

autop() can sometimes get confused and not clean up stray <p> or </p> tags inside block level elements, which produces sub-optimal HTML. While browsers can generally handle it, there's no need to make things harder for them if we don't have to.

Props pento, ayubi, pbearne, jond, azaozz, 1994rstefan, dionysous, MikeHansenMe, jorbin, miqrogroove, niallkennedy.
Fixes #27350.

#22 @pento
4 months ago

#40603 was marked as a duplicate.

#23 @pento
4 months ago

#41531 was marked as a duplicate.

#24 @pento
4 months ago

#42748 was marked as a duplicate.

#25 @pento
4 months ago

#42410 was marked as a duplicate.

#26 @pento
4 months ago

In 45587:

Formatting: Improve performance of wpautop() on large paragraphs.

Following [45585], older versions of PHP could segfault when attempting to autop paragraphs with 10,000+ characters.

Rather than having to negative lookahead for every character in the paragraph (which could run into recursion limits), we can quickly jump ahead to the next tag and start checking from there.

See #27350.

#27 @ocean90
4 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

We had to revert [45585] and [45587] on dotorg since it was causing trouble in the plugin directory. The content was blank before [45587] and then causing segfaults before [45585] was reverted.

This ticket was mentioned in Slack in #meta by dd32. View the logs.


4 months ago

#29 @pento
4 months ago

In 45589:

Formatting: Revert the changes to wpautop() in [45585,45587].

See #27350.

#30 @pento
4 months ago

Sadly, the fix in [45585] had severe performance issues on PHP 7.2 and earlier. I've reverted it for now, to give it some more time to percolate.

This ticket was mentioned in Slack in #meta by ipstenu. View the logs.


4 months ago

This ticket was mentioned in Slack in #meta by sergey. View the logs.


4 months ago

#33 @pento
3 months ago

#27530 was marked as a duplicate.

#34 @azaozz
4 weeks ago

  • Milestone changed from 5.3 to Future Release

This is one of the "peculiar ways" wpautop works... The "Text" tab in the classic editor has a visual component to it. It is not a "pure HTML" input, it converts line breaks to paragraphs and <br> tags.

In this case the error is that the first closing </p> should be a <br>. It may be possible to add another regex to search for this exact case and fix it, but chances are something else will break. The bad part about such breakage is that it gets saved to the DB and then introduces problems on every page load on the front-end. As wpautop use is on its way out, not sure it's worth the risks of fixing these rare edge cases.

Note: See TracTickets for help on using tickets.