WordPress.org

Make WordPress Core

Opened 7 weeks ago

#47973 new defect (bug)

Invalid HTML output from image_caption + wpautop combination

Reported by: terokilkanen Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 5.2.2
Component: Formatting Keywords:
Focuses: coding-standards Cc:
PR Number:

Description

Hello,

We are using WordPress on our site https://www.bonus.ca, and the HTML output on the page contains extra </p> tags.

I have traced the problem to the wpautop() function, which leaves these extra </p> tags in the output.

An example:

Take a caption shortcode like this:

[caption id="attachment_4413" align="alignnone" width="800"]<img class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg" alt="top Online Gambling Bonuses" width="800" height="394" /> Internet has changed the way we gamble. Live dealer casinos are a new way to enjoy the thrills of Roulette and Blackjack from the comfort of your own home, even on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a href="/casino/live">live casino experience</a>. You can also get a rare live casino bonus at LeoVegas.[/caption]

After the shortcode filter has run on the block, the result is:

<div id="attachment_4413" style="width: 810px" class="wp-caption alignnone"><img aria-describedby="caption-attachment-4413" class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg" alt="top Online Gambling Bonuses" width="800" height="394" /><p id="caption-attachment-4413" class="wp-caption-text">Internet has changed the way we gamble. Live dealer casinos are a new way to enjoy the thrills of Roulette and Blackjack from the comfort of your own home, even on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a href="/casino/live">live casino experience</a>. You can also get a rare live casino bonus at LeoVegas.</p></div>

After running the wpautop filter, the output is:

<div id="attachment_4413" style="width: 810px" class="wp-caption alignnone"><img aria-describedby="caption-attachment-4413" class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg" alt="top Online Gambling Bonuses" width="800" height="394" /></p>
<p id="caption-attachment-4413" class="wp-caption-text">Internet has changed the way we gamble. Live dealer casinos are a new way to enjoy the thrills of Roulette and Blackjack from the comfort of your own home, even on your phone. Visit <a href="/leovegas">LeoVegas</a> for the best <a href="/casino/live">live casino experience</a>. You can also get a rare live casino bonus at LeoVegas.</p>

Here, there is an extra </p> tag after the <img> tag.

The following part of the wpautop() function is supposed to remove the closing tag, but it fails:

<?php
// If an opening or closing block element tag is followed by a closing <p> tag, remove it.
$pee = preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $pee );

The code doesn't remove the extra </p> because the regex only matches the block-level tags defined in $allblocks, and in this case the preceding tag is an <img /> tag, which is not one of them.
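The mismatch can be reproduced outside WordPress with a minimal sketch. The $allblocks value here is a shortened stand-in for the real list, strip_p_after_block() is a hypothetical helper that only wraps the preg_replace() call quoted above, and the input strings are trimmed-down illustrations rather than real wpautop() intermediate output:

```php
<?php
// Shortened stand-in for WordPress's $allblocks (assumption: the real list is
// longer, but like this one it contains no inline tags such as img).
$allblocks = '(?:div|p|blockquote|ul|ol|li|pre|table|h[1-6])';

// Hypothetical helper wrapping the cleanup step quoted from wpautop().
function strip_p_after_block( string $pee, string $allblocks ): string {
	return preg_replace( '!(</?' . $allblocks . '[^>]*>)\s*</p>!', '$1', $pee );
}

// A block tag sits directly before the stray </p>: the pattern matches
// and the </p> is removed.
$after_block = strip_p_after_block( '<div class="wp-caption"></p>', $allblocks );

// An inline <img /> sits directly before the stray </p>: the pattern does
// not match, so the string comes back unchanged.
$after_img = strip_p_after_block( '<div class="wp-caption"><img src="a.jpg" /></p>', $allblocks );

echo $after_block, "\n"; // <div class="wp-caption">
echo $after_img, "\n";   // <div class="wp-caption"><img src="a.jpg" /></p>
```

Adding img to the list would hide this one case, but any other inline tag in the same position would presumably hit the same gap.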

In general, I think using regular expressions to modify an HTML document is the wrong approach. One could write regular expressions that correctly match everything in HTML, but they would be huge.

The proper way to do this would be to parse the source into a DOM and make the adjustments on that.
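As a rough illustration of that idea, and not a proposal for how wpautop() itself should be rewritten, here is a sketch that assumes PHP's DOM extension is available. The caption markup is shortened from the example above, and the added is-checked class is a made-up adjustment, there only to show an edit being made on the tree:

```php
<?php
// Caption markup as produced by the shortcode filter (shortened for the example).
$html = '<div id="attachment_4413" class="wp-caption alignnone">'
	. '<img class="size-full wp-image-4413" src="/images/online-gambling-bonuses.jpg" alt="" />'
	. '<p id="caption-attachment-4413" class="wp-caption-text">Caption text.</p>'
	. '</div>';

$dom = new DOMDocument();
// Parse as a fragment: no implied <html>/<body> wrapper and no doctype.
$dom->loadHTML( $html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );

// Hypothetical adjustment made on the tree instead of on the string:
// append a class to every paragraph and record each paragraph's parent tag.
$parents = array();
foreach ( $dom->getElementsByTagName( 'p' ) as $p ) {
	$p->setAttribute( 'class', $p->getAttribute( 'class' ) . ' is-checked' );
	$parents[] = $p->parentNode->nodeName;
}

// Serialising the tree writes one closing tag per element node, so the
// output cannot contain an unmatched </p>.
$out = $dom->saveHTML();
echo $out;
```

Because the serializer emits closing tags from the tree rather than from pattern matches, a stray </p> like the one above cannot be produced at this step. How plain-text paragraphs mixed into the markup would be handled is left open here.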

Or it may be that the way WP mixes HTML and plain text is simply impossible to implement correctly...

Change History (0)
