WordPress.org

Make WordPress Core

Opened 10 years ago

Closed 8 years ago

Last modified 7 years ago

#7988 closed defect (bug) (wontfix)

wpautop p (paragraph) bug with div

Reported by: filosofo Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.7
Component: Formatting Keywords: has-patch needs-testing needs-unit-tests
Focuses: Cc:

Description

As Jan Erik Moström pointed out on wp-hackers, wpautop mis-parses the following line:

<div><a href="xx"> <img src="yy" /> </a> <p>text</p> </div>

The problem is this line in wpautop:

$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);

This says that if a line has a <p> tag, some non-markup stuff, perhaps some space, followed by a closing tag of </div>, </address>, or </form>, then the <p> is orphaned and should add a closing tag (</p>) on the right of the non-markup stuff.

The mistake that it makes is allowing it to enclose empty space in <p></p>. It makes this mistake by allowing space characters to be considered as the non-markup stuff. My patch tells it not to do so by changing the above line to the following:

$pee = preg_replace('!<p>([^<\s]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);

Attachments (4)

wpautop.p.diff (861 bytes) - added by filosofo 10 years ago.
wpautop.p.2.diff (869 bytes) - added by filosofo 10 years ago.
wpautop.morespecific.diff (943 bytes) - added by filosofo 10 years ago.
wpautop.morespecific.7988.diff (885 bytes) - added by filosofo 9 years ago.

Download all attachments as: .zip

Change History (27)

@filosofo
10 years ago

#1 @filosofo
10 years ago

Oops, actually that patch needs fixing. One moment.

#2 @filosofo
10 years ago

Okay, I fixed the patch.

The problem with the example "fixed" line in OP,

$pee = preg_replace('!<p>([^<\s]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);

is that it doesn't work when the non-markup stuff between the <p> and the </div>, etc. has a space in it, e.g.:

<p>text </div> is fine but <p>text text </div> is not.

To fix that, the current patch uses this line instead:

$pee = preg_replace('#<p>\s*?(?!\s)([^<]+)\s*?(</(?:div|address|form)[^>]*>)#', "<p>$1</p>$2", $pee);

\s*? gobbles up any space after the <p>. Next, the negative lookaround, (?!\s), makes sure that the non-markup stuff does not begin with a space, which is what it would do if it were all space.

#3 @filosofo
10 years ago

In case you're wondering why not do (?!\s*) instead of \s*?(?!\s), it's because the Perl regex family (including PHP's preg_replace) does not support indefinite-length patterns in lookarounds.

#4 follow-up: @azaozz
10 years ago

I think there are more borderline problems with the original regexp. What it says is if there is a <p> followed by anything but less-than bracket (which may or may not be the opening bracket of a HTML tag), followed by zero or more space chars (not needed since the ([^<]+) will match all space chars), followed by </div>, </address> or </form>, the <p> is considered orphaned.

What is should say is: if a <p> tag is followed by at least one non-space char, followed by </div>, </address> or </form> and doesn't have </p> before the </div>, </address> or </form> (ignoring spaces/line breaks), the <p> is orphaned.

It probably should look for </div>, </address> or </form>, then look behind if the last tag before them was <p> and if yes, close the <p>.

Also, not sure what the next line in autop is for:

$pee = preg_replace( '|<p>|', "$1<p>", $pee );

Doesn't seem to do anything.

#5 in reply to: ↑ 4 @filosofo
10 years ago

Replying to azaozz:

followed by zero or more space chars (not needed since the ([^<]+) will match all space chars),

I had thought that maybe there was a reason for not capturing those space characters, but I don't see it in later lines.

It probably should look for </div>, </address> or </form>, then look behind if the last tag before them was <p> and if yes, close the <p>.

How about this, which I've included in patch wpautop.morespecific.diff:

$pee = preg_replace('#<p>\s*?(?!</div|</address|</form)(\S.*)(</(?:div|address|form)[^>]*>)#', "<p>$1</p>$2", $pee);

Look for a <p>, perhaps followed by space, followed by a string that's at least one non-space character that's not the beginning of a </div>, </address>, or </form>, and that string is followed by </div>, </address>, or </form>. It also has the advantage of being able to handle text like 2 < 3.

Also, not sure what the next line in autop is for:

$pee = preg_replace( '|<p>|', "$1<p>", $pee );

Doesn't seem to do anything.

Yes, you're right.

Here's the changeset where both these lines came in: [4565] The odd thing is that these lines don't seem to have anything to do with the ticket the changeset's supposed to solve, and they're not included in that ticket's attached patch.

#6 @filosofo
10 years ago

On second thought, that negative lookahead is overkill. There's no reason any closing tag should be following a <p> and space---not just address, div, and form.

$pee = preg_replace('#<p>\s*?(?!</)(\S.*)(</(?:address|div|form)[^>]*>)#', "<p>$1</p>$2", $pee); 

Should do the trick.

#7 @markjaquith
10 years ago

  • Milestone changed from 2.7 to 2.8

Let's look at this early in the 2.8 cycle.

#8 @ryan
9 years ago

  • Component changed from General to Formatting
  • Owner anonymous deleted

#9 @Denis-de-Bernardy
9 years ago

  • Milestone changed from 2.8 to Future Release

patch is br0ke

#10 @filosofo
9 years ago

  • Keywords needs-testing added
  • Milestone changed from Future Release to 2.8

I've freshened the patch.

wpautop really needs unit tests.

#11 @Denis-de-Bernardy
9 years ago

It sure does...

Do you think your patch might also fix the various other autop bugs that are in the Formatting section? I'm itching to create a master ticket or two and to close them all as dups. There probably are only a few real bugs in there.

#12 @Denis-de-Bernardy
9 years ago

cross-referencing a bit: #4298, #3833, #6041, #6984, #2833, #9437

#3833 in particular is of interest.

curious to know your thoughts on this idea:

$pee = preg_replace("/<\/?$blockelts\b[^>]+>/", "\n\n$0\n\n"0, $pee);

it might kill a few several birds in one go.

#13 @filosofo
9 years ago

I think separating block element tags from other text with a couple line breaks would fix a lot of things.

Also, I think the master ticket is a great idea. Ideally, it would list all of the issues the other tickets mention so that a patch would know exactly what to test against.

#14 @Denis-de-Bernardy
9 years ago

yeah. and escaping elements that should get ignored would fix a whole bunch of things too.

#15 @Denis-de-Bernardy
9 years ago

second patch I uploaded to #3833 fixes this as well.

#16 @Denis-de-Bernardy
9 years ago

  • Milestone changed from 2.8 to Future Release

punting pending unit tests

#17 follow-up: @Denis-de-Bernardy
9 years ago

  • Milestone changed from Future Release to 2.9

#18 @Denis-de-Bernardy
9 years ago

  • Keywords needs-unit-tests added

#19 in reply to: ↑ 17 @azaozz
9 years ago

  • Milestone changed from 2.9 to 3.0

We need to look at all of these autop edge cases at the same time.

#20 @dd32
8 years ago

  • Milestone changed from 3.0 to 3.1

Echo'ing azaozz's comment, this needs to be done at the same time as the rest. This has missed the boat for happening in 3.0

#21 @filosofo
8 years ago

  • Milestone Awaiting Triage deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Never gonna happen.

#22 @SergeyBiryukov
7 years ago

  • Keywords needs-unit-tests removed

#23 @SergeyBiryukov
7 years ago

  • Keywords needs-unit-tests added; wpautop p formatting removed
Note: See TracTickets for help on using tickets.