WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 6 months ago

#37672 new defect (bug)

wpautop adds a closing p-tag without an opening p-tag

Reported by: TBarregren
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 4.5.3
Component: Formatting
Keywords: has-patch has-unit-tests needs-testing needs-refresh
Focuses:
Cc:
PR Number:

Description

The following code results in ill-formed HTML.

<?php
$pee = <<<EOT
<div>
This is a paragraph.

This is another paragraph.
</div>
EOT;

echo wpautop($pee);

This is the output:

<div>
This is a paragraph.</p>
<p>This is another paragraph.
</p></div>

As you can see, the first paragraph lacks an opening <p>.
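
To make the imbalance checkable rather than only visible, here is a minimal sketch of a tag-balance check run against the output reported above. The helper name p_tags_are_balanced() is hypothetical and not part of WordPress; it only counts <p> and </p> tags with a regular expression and makes no attempt to parse HTML properly.

```php
<?php
// Hypothetical helper (not part of WordPress): returns true when every </p>
// is preceded by a matching <p ...> and no <p> is left open at the end.
function p_tags_are_balanced(string $html): bool {
    // Match <p>, <p attr="..."> and </p>, but not <pre>, <picture>, etc.
    preg_match_all('#<(/?)p(?=[\s>])[^>]*>#i', $html, $matches, PREG_SET_ORDER);
    $depth = 0;
    foreach ($matches as $match) {
        if ($match[1] === '/') {
            if ($depth === 0) {
                return false; // closing tag without an opening tag
            }
            $depth--;
        } else {
            $depth++;
        }
    }
    return $depth === 0;
}

// Output reported in this ticket, copied verbatim.
$reported = "<div>\nThis is a paragraph.</p>\n<p>This is another paragraph.\n</p></div>";

// Well-formed variant, written by hand for comparison.
$well_formed = "<div>\n<p>This is a paragraph.</p>\n<p>This is another paragraph.</p>\n</div>";

var_dump(p_tags_are_balanced($reported));    // bool(false)
var_dump(p_tags_are_balanced($well_formed)); // bool(true)
```

A check like this could serve as a quick assertion around wpautop() output, although it says nothing about whether the paragraphs are nested validly.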

Attachments (1)

37672.diff (1.8 KB) - added by MattyRob 3 years ago.

Change History (6)

#1 @TBarregren
3 years ago

In case it is of any help, this is the code in my functions.php that works around this problem (and related ones).

<?php
add_filter('the_content', function ($html) {
  
  $html_pos = 0;
  $last_html_pos = strlen($html) - 1;

  $opening_pos = null;
  $text_pos = null;
  
  // See http://w3c.github.io/html/single-page.html#elementdef-p
  // and http://w3c.github.io/html/single-page.html#kinds-of-content-phrasing-content
  $self_closing_tags_allowed_in_p = explode(' ', 'area br embed img input link wbr');
  $not_self_closing_tags_allowed_in_p = explode(' ', 'a abbr audio b bdi bdo button canvas cite code data datalist del dfn em i iframe ins kbd label map mark math meter noscript object output picture progress q ruby s samp script select small span strong sub sup svg template textarea time u var video text');


  while (($html_pos = strpos($html, '<', $html_pos)) !== false) {
    if (substr($html, $html_pos, 4) === '<!--') {
      $html_pos = strpos($html, '-->', $html_pos + 4);
      if ($html_pos === false) return $html; // Too messy, don't do anything further.
      $html_pos += 3;
    } 
    elseif ($opening_pos === null) {
      if (substr($html, $html_pos, 4) === '<pre') {
        $html_pos = strpos($html, '</pre>', $html_pos + 6);
        if ($html_pos === false) return $html; // Too messy, don't do anything further.
        $html_pos += 6;
      }
      elseif (strtolower(substr($html, $html_pos, 2)) === '<p') {
        $opening_pos = $html_pos;
        $html_pos = strpos($html, '>', $html_pos + 2);
        if ($html_pos === false) return $html; // Too messy, don't do anything further.
        $text_pos = ++$html_pos;
      }
      elseif (strtolower(substr($html, $html_pos, 4)) === '</p>') {
        // See https://core.trac.wordpress.org/ticket/37672
        $html = substr($html, 0, $html_pos) . substr($html, $html_pos + 4);
        $last_html_pos -= 4;
      }
      else {
        $html_pos += 1;
      }
    }
    else {
      if (strtolower(substr($html, $html_pos, 4)) === '</p>') {
        if (trim(substr($html, $text_pos, $html_pos - $text_pos)) === '') {
          $html = substr($html, 0, $opening_pos) . substr($html, $html_pos + 4);
          $html_pos = $opening_pos;
          $last_html_pos = strlen($html) - 1;                
        }
        else {
          $html_pos += 4;
        }
        $opening_pos = null;
      }
      else {
        $tag_end_pos = $html_pos + 1;
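        // Use [] offsets ({} offsets were removed in PHP 8) and stop at the end of the string.
        while ($tag_end_pos < strlen($html) && $html[$tag_end_pos] !== '>' && !ctype_space($html[$tag_end_pos])) {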
          ++$tag_end_pos;
        }
        $tag = strtolower(substr($html, $html_pos + 1, $tag_end_pos - $html_pos - 1));
        if (in_array($tag, $self_closing_tags_allowed_in_p)) {
          $html_pos = strpos($html, '>', $tag_end_pos);
          if ($html_pos === false) return $html; // Too messy, don't do anything further.
          $html_pos += 1;
        }
        elseif (in_array($tag, $not_self_closing_tags_allowed_in_p)) {
          $html_pos = strpos($html, "</$tag", $tag_end_pos);
          if ($html_pos === false) return $html; // Too messy, don't do anything further.
          $html_pos += 1;
        }
        else {
          $tag_end_pos = strpos($html, "</$tag>", $tag_end_pos);
          if ($tag_end_pos === false) return $html; // Too messy, don't do anything further.
          $tag_end_pos += strlen("</$tag>");
          $html = substr($html, 0, $html_pos) . '</p>' . substr($html, $html_pos, $tag_end_pos - $html_pos) . '<p>' . substr($html, $tag_end_pos);
        }
      }
    }
  }
  if ($opening_pos !== null) {
    $html .= '</p>';
  }

  return $html;

});

#2 @Presskopp
3 years ago

  • Keywords needs-patch added

#3 @MattyRob
3 years ago

@TBarregren

Thanks for your report. I can reproduce this with a unit test, and I think I've also managed to create a fix in the 'wpautop()' function. Patch and unit test attached above.

#4 @MattyRob
3 years ago

  • Keywords has-patch has-unit-tests needs-testing added; needs-patch removed

#5 @desrosj
6 months ago

  • Keywords needs-refresh added

@MattyRob are you able to refresh 37672.diff to apply cleanly to trunk? Having a unit test demonstrating the issue may help others dive in.
