WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 22 months ago

#27350 new defect (bug)

Invalid HTML Output

Reported by: miqrogroove Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 3.8
Component: Formatting Keywords: wpautop has-patch dev-feedback needs-unit-tests
Focuses: Cc:

Description

Paste into editor's Text tab:

<div>hello
<pre>test</pre>
world</div>

Preview output:

<div>hello</p>
<pre>test</pre>
<p>world</p></div>

Seen in 3.8 and in trunk. Doesn't seem to be an editor bug because switching tabs will improve the output. Something wrong with wpautop()?

Attachments (6)

Autop-27350.diff (752 bytes) - added by jond 3 years ago.
Autop.php unit test coverage for #27350
Autop-27350-fixed.diff (1.4 KB) - added by jond 3 years ago.
Updated tests for #18136, #20444, and #27350
formatting-27350.diff (3.4 KB) - added by jond 3 years ago.
Fixes for #27350. Also fixes #18136 and #20444
wpautop_profile_reports.zip (201.7 KB) - added by jond 3 years ago.
wpautop profiler reports
27350.diff (555 bytes) - added by jorbin 3 years ago.
Unit Tests from MikeHansenMe originally from #30142
formatting-27350-rev1.diff (3.4 KB) - added by jond 3 years ago.
Revision for accurate preg_match return values

Download all attachments as: .zip

Change History (19)

#1 @iseulde
4 years ago

  • Component changed from General to Formatting

#2 @miqrogroove
4 years ago

Here's another test case:

hello<div>test</div>

Preview output:

<p>hello
<div>test</div>

#3 @bcworkz
4 years ago

Yes, it is wpautop() wreaking havoc. FWIW, one way to prevent it is to switch to the visual tab, which inserts a huge amount of white space. The extra white space, while being offensive to those that hand code HTML, does allow wpautop() to work correctly.

I've traced the problem to the block of code starting at line 265 of wp-includes/formatting.php. The block starting with:

$pee = preg_replace('|<p>\s*</p>|', '', $pee); // under certain strange conditions it could create a P of entirely whitespace

I've no idea how to correct for the errors though without potentially messing up other HTML.

The function essentially adds opening and closing P tags at the front and end of every line, then removes those tags not adjacent to multiple newlines. However, there's a slew of regexp replacements to account for special situations which makes the whole thing very complex and prone to unanticipated errors like you've found..

#4 @SergeyBiryukov
4 years ago

  • Keywords wpautop needs-unit-tests added

Related: #18136, #20444, #21689, #25856.

@jond
3 years ago

Autop.php unit test coverage for #27350

#5 @jond
3 years ago

Submitted the unit test for both cases discussed here. I also worked out a solution, but it fails on the other tests. I'll spend more time to understand if it is failing because of hardcoded assumptions in the prior tests or if the fix is incomplete.

@jond
3 years ago

Updated tests for #18136, #20444, and #27350

@jond
3 years ago

Fixes for #27350. Also fixes #18136 and #20444

#6 @jond
3 years ago

I took a deeper look and found that my assumptions for the unit tests were faulty. I updated the test for #27350 along with the tests I added for #18136 and #20444. For clarity Autop-27350-fixed.diff includes the updated unit tests for all three tickets (#27350, #18136, and #20444).

I submitted the fix (formatting-27350.diff`) I developed for this ticket (#27350) which also fixes #18136 and #20444. In addition to passing the new unit tests, it passes all existing unit tests.

I found that the $allblocks preg fragment allowed tr in it to match <track> elements and the p in the preg fragment matched <param> elements. Appending \b outside of the subpattern matches the word boundary to prevent unintented elements from matching the pattern.

Adding the ltrim to the preg_split prevents an empty element in the $pees blocks which can cause some other odd cases (empty <p></p> blocks).

The primary fix to the unit tests adds preg_match inside the $pees loop. It avoids unnecessarily wrapping block elements in <p> tags and handles the special case of <blockquote> tags.

I imagine this will cause a performance penalty. I have not yet run benchmarks to understand if a signficant hit is going to occur. I will report back on this.

Understanding that all of this will work will be supplanted by #25856, it is still advantageous to clean up these edge cases now and add coverage to the unit tests.

@jond
3 years ago

wpautop profiler reports

#7 @jond
3 years ago

I ran performance tests using a 4K word test document (with <blockquote>, <video>, <table> and other block elements sprinkled throughout) that was run through wpautop 1000x. I used the exact same methodology for both tests (for loop calling wpautop)and captured results through xdebug profiling. The results show a modest slowdown as expected. Results:

Original:
Runtime: 15034 milliseconds
Function: wpautop
Invocation Count: 1000
Total Self Cost: 12.60
Total Inclusive Cost: 98.21

Modified:
Runtime: 18752 milliseconds
Function: wpautop
Invocation Count: 1000
Total Self Cost: 24.40
Total Inclusive Cost: 95.96

Test text is included in the wpautop_profiler_reports.zip

#8 @miqrogroove
3 years ago

I suggest checking the performance impact of each part of the patch. For example, the \b solution may not be optimal, but I'd bet the slowdown you're seeing is related to the multiple PCRE function calls added to the inner loop.

Also note the return value of preg_match() is never > 1.

#9 @jond
3 years ago

Yes, of course, simply confused the return with preg_match_all.

As for performance, it's not much in the way of slow down at all. Even scaled up at 1000x, we're ultimately looking at an average slow down of +3ms on ~4K-word texts. It's not up to me to decide if the fixes for 3 tickets is worth +3 ms.

#10 @clifgriffin
3 years ago

+1 for implementing this. Looks like a well thought out solution that fixes multiple issues.

@jorbin
3 years ago

Unit Tests from MikeHansenMe originally from #30142

#11 @jorbin
3 years ago

  • Keywords needs-unit-tests removed

Originally created by MikeHansenMe on #30142

@jond
3 years ago

Revision for accurate preg_match return values

#12 @jond
3 years ago

  • Keywords has-patch dev-feedback added

#13 @chriscct7
22 months ago

  • Keywords needs-unit-tests added
Note: See TracTickets for help on using tickets.