#7988 closed defect (bug) (wontfix)
wpautop p (paragraph) bug with div
Reported by: | filosofo | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 2.7 |
Component: | Formatting | Keywords: | has-patch needs-testing needs-unit-tests |
Focuses: | Cc: |
Description
As Jan Erik Moström pointed out on wp-hackers, wpautop mis-parses the following line:
<div><a href="xx"> <img src="yy" /> </a> <p>text</p> </div>
The problem is this line in wpautop:
$pee = preg_replace('!<p>([^<]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
This says that if a line has a <p>
tag, some non-markup stuff, perhaps some space, followed by a closing tag of </div>
, </address>
, or </form>
, then the <p>
is orphaned and should add a closing tag (</p>
) on the right of the non-markup stuff.
The mistake that it makes is allowing it to enclose empty space in <p></p>
. It makes this mistake by allowing space characters to be considered as the non-markup stuff. My patch tells it not to do so by changing the above line to the following:
$pee = preg_replace('!<p>([^<\s]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
Attachments (4)
Change History (27)
#2
@
16 years ago
Okay, I fixed the patch.
The problem with the example "fixed" line in OP,
$pee = preg_replace('!<p>([^<\s]+)\s*?(</(?:div|address|form)[^>]*>)!', "<p>$1</p>$2", $pee);
is that it doesn't work when the non-markup stuff between the <p>
and the </div>
, etc. has a space in it, e.g.:
<p>text </div>
is fine but <p>text text </div>
is not.
To fix that, the current patch uses this line instead:
$pee = preg_replace('#<p>\s*?(?!\s)([^<]+)\s*?(</(?:div|address|form)[^>]*>)#', "<p>$1</p>$2", $pee);
\s*?
gobbles up any space after the <p>
. Next, the negative lookaround, (?!\s)
, makes sure that the non-markup stuff does not begin with a space, which is what it would do if it were all space.
#3
@
16 years ago
In case you're wondering why not do (?!\s*)
instead of \s*?(?!\s)
, it's because the Perl regex family (including PHP's preg_replace
) does not support indefinite-length patterns in lookarounds.
#4
follow-up:
↓ 5
@
16 years ago
I think there are more borderline problems with the original regexp. What it says is if there is a <p> followed by anything but less-than bracket (which may or may not be the opening bracket of a HTML tag), followed by zero or more space chars (not needed since the ([^<]+)
will match all space chars), followed by </div>, </address> or </form>, the <p> is considered orphaned.
What is should say is: if a <p> tag is followed by at least one non-space char, followed by </div>, </address> or </form> and doesn't have </p> before the </div>, </address> or </form> (ignoring spaces/line breaks), the <p> is orphaned.
It probably should look for </div>, </address> or </form>, then look behind if the last tag before them was <p> and if yes, close the <p>.
Also, not sure what the next line in autop is for:
$pee = preg_replace( '|<p>|', "$1<p>", $pee );
Doesn't seem to do anything.
#5
in reply to:
↑ 4
@
16 years ago
Replying to azaozz:
followed by zero or more space chars (not needed since the
([^<]+)
will match all space chars),
I had thought that maybe there was a reason for not capturing those space characters, but I don't see it in later lines.
It probably should look for </div>, </address> or </form>, then look behind if the last tag before them was <p> and if yes, close the <p>.
How about this, which I've included in patch wpautop.morespecific.diff:
$pee = preg_replace('#<p>\s*?(?!</div|</address|</form)(\S.*)(</(?:div|address|form)[^>]*>)#', "<p>$1</p>$2", $pee);
Look for a <p>
, perhaps followed by space, followed by a string that's at least one non-space character that's not the beginning of a </div>
, </address>
, or </form>
, and that string is followed by </div>
, </address>
, or </form>
. It also has the advantage of being able to handle text like 2 < 3.
Also, not sure what the next line in autop is for:
$pee = preg_replace( '|<p>|', "$1<p>", $pee );Doesn't seem to do anything.
Yes, you're right.
Here's the changeset where both these lines came in: [4565]
The odd thing is that these lines don't seem to have anything to do with the ticket the changeset's supposed to solve, and they're not included in that ticket's attached patch.
#6
@
16 years ago
On second thought, that negative lookahead is overkill. There's no reason any closing tag should be following a <p>
and space---not just address, div, and form.
$pee = preg_replace('#<p>\s*?(?!</)(\S.*)(</(?:address|div|form)[^>]*>)#', "<p>$1</p>$2", $pee);
Should do the trick.
#10
@
15 years ago
- Keywords needs-testing added
- Milestone changed from Future Release to 2.8
I've freshened the patch.
wpautop really needs unit tests.
#11
@
15 years ago
It sure does...
Do you think your patch might also fix the various other autop bugs that are in the Formatting section? I'm itching to create a master ticket or two and to close them all as dups. There probably are only a few real bugs in there.
#13
@
15 years ago
I think separating block element tags from other text with a couple line breaks would fix a lot of things.
Also, I think the master ticket is a great idea. Ideally, it would list all of the issues the other tickets mention so that a patch would know exactly what to test against.
#14
@
15 years ago
yeah. and escaping elements that should get ignored would fix a whole bunch of things too.
#19
in reply to:
↑ 17
@
15 years ago
- Milestone changed from 2.9 to 3.0
We need to look at all of these autop edge cases at the same time.
#20
@
15 years ago
- Milestone changed from 3.0 to 3.1
Echo'ing azaozz's comment, this needs to be done at the same time as the rest. This has missed the boat for happening in 3.0
Oops, actually that patch needs fixing. One moment.