Make WordPress Core

Opened 18 years ago

Closed 19 months ago

#3833 closed defect (bug) (worksforme)

Extra </p> inside blockquote

Reported by: audwan's profile audwan Owned by: archibald-leaurees's profile Archibald Leaurees
Milestone: Priority: normal
Severity: normal Version: 2.7
Component: Formatting Keywords: wpautop needs-refresh close
Focuses: Cc:

Description (last modified by foolswisdom)

When using blockquote </p> is inserted directly in front of </blockquote>, making the code invalid XHTML.

Example:

<blockquote>This is a blockquote</blockquote>

Gives the following result:

<blockquote>This is a blockquote</p></blockquote>

Seems like this forum thread adresses the same issue in the support forum.

Attachments (6)

wpautopdebug.php (4.5 KB) - added by Archibald Leaurees 16 years ago.
Debuggin wpautop function php file
3833.diff (813 bytes) - added by Denis-de-Bernardy 15 years ago.
add an extra \n before block elts
3833.2.diff (988 bytes) - added by Denis-de-Bernardy 15 years ago.
better patch, that fixes #6041 as well
3833.3.diff (2.1 KB) - added by Denis-de-Bernardy 15 years ago.
fix <li>\n<p>foo bar</p>\n</li>
3833.4.diff (1.2 KB) - added by jorbin 10 years ago.
Originally created by MikeHansenMe on #30142
3833.5.diff (1.2 KB) - added by jared_smith 9 years ago.
Updated patch with unit tests

Download all attachments as: .zip

Change History (45)

#1 @foolswisdom
17 years ago

  • Milestone changed from 2.1.3 to 2.2

#2 @rob1n
17 years ago

  • Milestone 2.2 deleted
  • Resolution set to worksforme
  • Status changed from new to closed

Using <blockquote> inside <li> is invalid. Use <q> instead.

For me, <blockquote>Hello, world. Wonder what this will do.</blockquote> puts <p>...</p> inside the <blockquote> fine.

This is with r5142. Closing as worksforme.

#3 @Goingthewongway
17 years ago

  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from 2.1 to 2.3.1

On version 2.3.1 this is still an issue.

For example, a post with only <blockquote>some quote</blockquote> will result in <blockquote>some quote</p></blockquote>, which is invalid XHTML.

This seems to me to be a result of:

  1. formatting.php:78 $pee = str_replace('</blockquote></p>', '</p></blockquote>', $pee);
  2. not having/not adding an opening "<p>" tag at the beginning of the post (even though it does get added for other all-text entries)

#4 @foolswisdom
17 years ago

  • Description modified (diff)
  • Keywords wpautop autop added
  • Priority changed from low to normal

ENV: WP trunk r6301

Confirmed bug exists as described.

The bug won't reproduce if there are no blank lines before the blockquotes, or if there is a paragraph before the blockquote. That likely explains it working for rob1n .

Repro having a block quote by the only text, inserted using visual editor.

Removed that part about blockquote in a list from description.

#5 @foolswisdom
17 years ago

  • Milestone set to 2.4

@Archibald Leaurees
16 years ago

Debuggin wpautop function php file

#6 @Archibald Leaurees
16 years ago

  • Cc Archibald Leaurees added
  • Component changed from Administration to Validation
  • Owner changed from anonymous to Archibald Leaurees
  • Status changed from reopened to new
  • Version changed from 2.3.1 to 2.7

This bug is still there.
When wpautop deals with that content :

    text before
    <div class="whatever">
      <blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

wpautop returns the following output :

<p>    text before</p>
<div class="whatever">
<blockquote><p>
        text inside the blockquote tag
      </p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

Everything is ok. The output of wpautop is just like we want it to be.
But when we try the following input :

    text before    <div class="whatever">
      <blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

wpautop returns :

<p>    text before
<div class="whatever">
<blockquote><p>
        text inside the blockquote tag
      </p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

Obviously that's not a valid HTML. There's an openning 'p' tag just at the beginning, which one won't be closed.

We can reproduce that bug also using the following HTML code sample :

    text before    <div class="whatever"><blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

After passing that content to wpautop, we have, as output, the following content :

<p>    text before
<div class="whatever">
<blockquote>
        text inside the blockquote tag
      </p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

As you can see there's some orphan tags. The openning p-tag before the beginning text and a closing p-tag just before the closing blockquote-tag.

There's other examples. But the two given are clear enough.

I've spent some hours looking for a solution or a patch. Helping is better that complaining. I've extracted the function, put it in a separated file and give it the real output of 'get_shortcode_regex()', because wpautopo uses it. you can have a look on the page I used here http://www.flegme.fr/wpautopdebug.php. And here's the same page with my bug fix applied http://www.flegme.fr/wpautopdebugafter.php (the php deugging file is attached as wpautopdebug.php)

So the solution is rather simple. In the wpautop php code at line 37,38 you've got :

	$pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n$1", $pee);
	$pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);

I've noticed that those two lines add newlines '\n' before and after some special tags (including the 'div' tag and the 'blockquote' tag). But the first regular expression adds one '\n' only where the second one adds two ones.

When I modify the code that it inserts two newlines before and after the special tags, the bug disapears.

So those two lines become :

	$pee = preg_replace('!(<' . $allblocks . '[^>]*>)!', "\n\n$1", $pee);
	$pee = preg_replace('!(</' . $allblocks . '>)!', "$1\n\n", $pee);

Maybe i've missed something and my solution is not the good one... But that worked fine for me.

Regards

Archibald Leaurees

#7 @Denis-de-Bernardy
15 years ago

  • Component changed from Validation to Formatting

#9 @Denis-de-Bernardy
15 years ago

in 11256:

<blockquote>some quote</blockquote>

works fine

    text before
    <div class="whatever">
      <blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

clunky but works too

    text before    <div class="whatever">
      <blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

fails

    text before    <div class="whatever"><blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>

fails

and the patch fixes everything indeed.

@Denis-de-Bernardy
15 years ago

add an extra \n before block elts

#10 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch tested commit added; blockquote invalid list paragraph tags wpautop autop removed
  • Milestone changed from 2.9 to 2.8

@Denis-de-Bernardy
15 years ago

better patch, that fixes #6041 as well

#11 @Denis-de-Bernardy
15 years ago

just closed #6401 as dup. second patch fixes this:

<dd>paragraph 1

paragraph 2

paragraph 3</dd> 

#12 @Denis-de-Bernardy
15 years ago

second patch also fixes #7988:

<div><a href="xx"> <img src="yy" /> </a> <p>text</p> </div>

#13 @Denis-de-Bernardy
15 years ago

was meant #6041, even. it fixes #6376 as well:

<dl> <dt>term</dt> <dd>paragraph

paragraph2</dd> </dl> 

#14 @Denis-de-Bernardy
15 years ago

  • Keywords tested commit removed

there is one side effect, however. it turns:

<ul>
<li>foobar</li>
</ul>

into:

<ul>
<li>
<p>foobar</p>
</li>
</ul>

@Denis-de-Bernardy
15 years ago

fix <li>\n<p>foo bar</p>\n</li>

#15 @Denis-de-Bernardy
15 years ago

  • Keywords tested commit added

3rd patch keeps everything fixed and sanitizes list items as needed.

#16 @azaozz
15 years ago

Don't think it's a good idea to preg_split() on a \n followed by a white space. That would affect a lot of the internal workings of wpautop. It is affected by line breaks in the html formatting and the workaround is to keep that formatting to a minimum. It works when using the visual editor as all html formatting is removed there when saving.

IMHO to go ahead with this and all of the wpautop related patches, we will need comprehensive unit tests first.

#17 @Denis-de-Bernardy
15 years ago

  • Keywords needs-unit-tests added; commit removed

it's not too urgent, as these tickets have waited for years anyway.

the preg_split() is actually mostly equivalent. this:

$pee = preg_replace("/\n\n+/", "\n\n", $pee); // take care of duplicates 
$pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY); 

is more or less the same as this, in a less optimized/generalized manner:

$pees = preg_split('/\n(?:\s*\n)+/', $pee, -1, PREG_SPLIT_NO_EMPTY); 

semantically, it means the same: find double \n ignoring lines that have space chars. the suggested regex merely does a better job at it, and it's an optimization that can readily be removed if it's not wanted, since it has no effect on the ticket.

anyway, I'm not too familiar with the unit tests api, so moving this over to needs-unit-tests... I totally agree a few are needed on this front.

the patch is an improvement to the current behavior, however.

#18 @Denis-de-Bernardy
15 years ago

this is the one that needs unit-tests.

#19 @Denis-de-Bernardy
15 years ago

  • Keywords early added
  • Milestone changed from 2.8 to Future Release

punting pending unit tests

#20 @Denis-de-Bernardy
15 years ago

  • Milestone changed from Future Release to 2.9

#21 @hakre
15 years ago

Another long-player in the formatting bug league, +1 for getting a definition, unit-tests and merging with related ones.

#22 @azaozz
15 years ago

  • Milestone changed from 2.9 to 3.0

Agreed, we should get all related tickets for autop (PHP and JS) together and come up with good unit tests for them. Best time to do that is early in the dev. cycle, like couple of weeks after 2.9 is out.

#23 @dd32
14 years ago

  • Milestone changed from 3.0 to 3.1

No longer early in a dev cycle.. flicking to 3.1

Note, in trunk currently, the result of wpautop('<blockquote>This is a blockquote</blockquote>) is:

<blockquote><p>This is a blockquote</p></blockquote>

#24 @nacin
14 years ago

  • Keywords tested early removed
  • Milestone changed from Awaiting Triage to Future Release

#25 @MikeHansenMe
12 years ago

  • Cc MikeHansenMe added

UPDATE::
after looking into the problem further I was able to reproduce the above behavior.

wpautop('<blockquote>This is a blockquote</blockquote>');

returns

<blockquote><p>This is a blockquote</p></blockquote>

and
wpautop('<p><blockquote>This is a blockquote</blockquote></p>');

<p>
<blockquote>This is a blockquote</p></blockquote>
Last edited 12 years ago by MikeHansenMe (previous) (diff)

#26 @c3mdigital
11 years ago

  • Resolution set to invalid
  • Status changed from new to closed

The Last example in comment:25 now works:

<blockquote><p>This is a blockquote</p></blockquote>

In comment:9 The example listed as failing now works:

		<p>    text before
</p><div class="whatever">
<blockquote>
        text inside the blockquote tag
      <p></p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

Closing. If there are still issues with wpautop() please reopen or open a new ticket.

#27 @SergeyBiryukov
11 years ago

  • Resolution invalid deleted
  • Status changed from closed to reopened

The last example in comment:9 still fails for me in trunk:

wpautop('
    text before    <div class="whatever"><blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>
')

returns

<p>    text before
<div class="whatever">
<blockquote>
        text inside the blockquote tag
      </p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

The last example in comment:25 also fails, exactly as described.

#28 @nacin
11 years ago

  • Keywords wpautop added

#29 @DrewAPicture
10 years ago

#11202 was marked as a duplicate.

@jorbin
10 years ago

Originally created by MikeHansenMe on #30142

#30 @jorbin
10 years ago

MikeHansenMe wrote some unit tests on #30142 for this. The above patch was originally written by him.

#31 @jorbin
10 years ago

  • Keywords needs-unit-tests removed

#32 @wonderboymusic
9 years ago

  • Keywords needs-refresh added; has-patch removed

3833.4.diff cause another unit test to fail

#33 @jared_smith
9 years ago

  • Keywords needs-patch added; needs-refresh removed

Here is an update patch which applies cleanly. The unit test fails, however.

@jared_smith
9 years ago

Updated patch with unit tests

#34 @sebastienserre
20 months ago

  • Keywords needs-refresh added; needs-patch removed

Just tested and

wpautop('
    text before    <div class="whatever"><blockquote>
        text inside the blockquote tag
      </blockquote>
      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </div>
')

now WordPress 6.1.1 returns:

<p> text before    </p>
<div class="whatever">
<blockquote><p>
   text inside the blockquote tag
      </p></blockquote>
<p>      Some text inside the div embedding the blockquote and after the closing blockquote tag
    </p></div>

#35 follow-up: @Mte90
20 months ago

Investigating:

Looking now the problem seems fixed

        // If a <blockquote> is wrapped with a <p>, move it inside the <blockquote>.
        $text = preg_replace( '|<p><blockquote([^>]*)>|i', '<blockquote$1><p>', $text );
        $text = str_replace( '</blockquote></p>', '</p></blockquote>', $text );

Infact there is already a unit test for those cases https://github.com/WordPress/wordpress-develop/blob/0cb8475c0d07d23893b1d73d755eda5f12024585/tests/phpunit/tests/formatting/wpAutop.php#L381

I think that can be closed.

#36 @Mte90
20 months ago

  • Keywords close added

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


19 months ago

#38 in reply to: ↑ 35 ; follow-up: @SergeyBiryukov
19 months ago

Replying to Mte90:

Looking now the problem seems fixed

        // If a <blockquote> is wrapped with a <p>, move it inside the <blockquote>.
        $text = preg_replace( '|<p><blockquote([^>]*)>|i', '<blockquote$1><p>', $text );
        $text = str_replace( '</blockquote></p>', '</p></blockquote>', $text );

FWIW, these particular lines have been around since wpautop() was introduced in [13] for WordPress 0.71. They were moved around a bit in [106] and [333], and the comment was added in [31852].

So if the issue can no longer be reproduced, it seems like it was fixed elsewhere.

#39 in reply to: ↑ 38 @azaozz
19 months ago

  • Milestone Future Release deleted
  • Resolution set to worksforme
  • Status changed from reopened to closed

Replying to SergeyBiryukov:

it seems like it was fixed elsewhere.

Yep, think this was fixed many years ago, but this ticket was probably missed and never closed. Closing as "worksforme" seems right imho.

Note: See TracTickets for help on using tickets.