Opened 8 months ago

Last modified 8 months ago

#42340 new defect (bug)

Spurious Insertion of <p>aragraph Tags

Reported by: oeconomist
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Formatting
Keywords:
Focuses:
Cc:

Description

The code replaces “<br /><br />” in an entry with “</p>”-newline-“<p>”. I use “<br /><br />” because HTML has semantics as well as syntax; not all vertical spacing represents distinction amongst paragraphs. The replacements break my mark-up more generally, resulting in defective presentation.

The software should never behave in such an entitled manner without permission from the user.

Change History (8)

#1 follow-up: @SergeyBiryukov
8 months ago

  • Component changed from General to Formatting
  • Keywords pseudo-correction removed

The replacements break my mark-up more generally, resulting in defective presentation.

You can disable the wpautop() filter:
http://codex.wordpress.org/Function_Reference/wpautop#Disabling_the_filter
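
A minimal sketch of what disabling the filter can look like, assuming the code is placed in a theme's functions.php or a small plugin and that the default hooks `the_content` and `the_excerpt` are the ones in play. The `function_exists()` guard is only there so the file loads harmlessly outside WordPress.

```php
<?php
// Assumes WordPress is loaded; remove_filter() and wpautop() are WordPress functions.
if ( function_exists( 'remove_filter' ) ) {
    // Stop wpautop() from running on post content and excerpts.
    remove_filter( 'the_content', 'wpautop' );
    remove_filter( 'the_excerpt', 'wpautop' );
}
```

With the filter removed, no automatic paragraph tags are added at all, so every paragraph has to be marked up by hand.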

#2 in reply to: ↑ 1 ; follow-up: @oeconomist
8 months ago

Replying to SergeyBiryukov:

You can disable the wpautop() filter:
http://codex.wordpress.org/Function_Reference/wpautop#Disabling_the_filter

Thank you; that's quite useful.

But wpautop should not be replacing explicit mark-up in the first place. It's one thing for a user to hit the Enter key twice; something else for him or her to enter “<br /><br />”.

#3 in reply to: ↑ 2 @JPry
8 months ago

Replying to oeconomist:

But wpautop should not be replacing explicit mark-up in the first place. It's one thing for a user to hit the Enter key twice; something else for him or her to enter “<br /><br />”.

You're right that it's different when a user is entering explicit markup. The problem faced in WordPress Core is determining when a user is explicitly entering markup, vs. when they did a copy/paste from somewhere, or otherwise unintentionally included such markup. I would venture to say that most users are not entering explicit markup like you've described.

My opinion is that this particular situation isn't something that needs to be addressed by WordPress Core.

#4 follow-up: @oeconomist
8 months ago

When a user copies-and-pastes, the presumption should be that the source was marked-up as intended by whoever or whatever created it. Otherwise, if the source has inline elements, then the copy-and-paste may fail after conversion, in a way that is perfectly mysterious to users who aren't used to doing their own mark-up.

I've not dived into the code (nor do I intend to do so), but I would guess that at some point WordPress converts two consecutive newlines into two break elements, and that wpautop effectively assumes that all back-to-back instances of two break elements arose from such a conversion. Assuming both that my guess is correct and that there is a good reason for that first conversion, I suggest that it be replaced by a substitution such as “<br /><!-- wpautop convertible --><br />” and that wpautop act when it encounters this new substring, leaving alone two back-to-back break elements.
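
The suggestion above could be sketched roughly as follows. This is only an illustration of the proposed behavior, not anything WordPress does: the marker constant and the function name `sketch_marker_autop()` are hypothetical, and the sketch ignores everything else a real formatter would have to handle.

```php
<?php
// Hypothetical marker taken from the proposal; not part of WordPress.
const AUTOP_MARKER = '<br /><!-- wpautop convertible --><br />';

// Hypothetical helper: only the marked pair is treated as a paragraph boundary.
function sketch_marker_autop( string $text ): string {
    $out = '';
    foreach ( explode( AUTOP_MARKER, $text ) as $piece ) {
        if ( '' === $piece ) {
            continue; // Skip empty pieces, e.g. a marker at the very start or end.
        }
        // A bare "<br /><br />" inside $piece is left exactly as written.
        $out .= '<p>' . $piece . "</p>\n";
    }
    return $out;
}

echo sketch_marker_autop( 'a<br /><br />b' . AUTOP_MARKER . 'c' );
```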

#5 in reply to: ↑ 4 @JPry
8 months ago

Replying to oeconomist:

I would guess that at some point WordPress converts two consecutive newlines into two break elements, and that wpautop effectively assumes that all back-to-back instances of two break elements arose from such a conversion.

There are actually two portions of code within the function that trigger the behavior you're seeing. The first portion is where consecutive <br> tags with nothing but whitespace between them are converted into a double newline:

<?php
        // Change multiple <br>s into two line breaks, which will turn into paragraphs.
        $pee = preg_replace('|<br\s*/?>\s*<br\s*/?>|', "\n\n", $pee);

Later in the function, the content is divided at any set of double newlines:

<?php
        // Split up the contents into an array of strings, separated by double line breaks.
        $pees = preg_split('/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY);

Each of the resulting pieces is wrapped in <p> tags.
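
Put together, the two steps can be tried outside WordPress with a simplified sketch like the one below. It reuses the two regular expressions quoted above, but `sketch_autop()` is a hypothetical name and the real wpautop() does considerably more (block-level elements, trimming, remaining single newlines, and so on).

```php
<?php
// Simplified illustration of the two quoted steps; not the full wpautop().
function sketch_autop( string $pee ): string {
    // Step 1: consecutive <br> tags separated only by whitespace become a double newline.
    $pee = preg_replace( '|<br\s*/?>\s*<br\s*/?>|', "\n\n", $pee );

    // Step 2: split at double newlines, discarding empty pieces.
    $pees = preg_split( '/\n\s*\n/', $pee, -1, PREG_SPLIT_NO_EMPTY );

    // Wrap each piece in <p> tags.
    $out = '';
    foreach ( $pees as $piece ) {
        $out .= '<p>' . trim( $piece, "\n" ) . "</p>\n";
    }
    return $out;
}

echo sketch_autop( 'First line<br /><br />Second line' );
```

In this sketch the explicit double break is consumed and the two lines come out as separate paragraphs, which is the behavior the ticket describes. A single <br /> is left in place inside its paragraph.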

The phrase "with nothing but whitespace between them" above is key to the functionality. As the function currently exists, you can prevent your consecutive <br> tags from being replaced by including some non-whitespace text between them.

I suggest that it be replaced by a substitution such as “<br /><!-- wpautop convertible --><br />” and that wpautop act when it encounters this new substring, leaving alone two back-to-back break elements.

As the code currently works, you should be able to do something like this to prevent your consecutive breaks from being converted to newlines, and from there being matched as new paragraph dividers.
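
A quick way to check this against the pattern itself, as a small standalone sketch. The `<!-- keep -->` comment is just a hypothetical placeholder; any non-whitespace content between the two breaks should have the same effect on this particular regular expression.

```php
<?php
// The pattern used by the first step quoted earlier.
$pattern = '|<br\s*/?>\s*<br\s*/?>|';

$plain   = 'one<br /><br />two';               // back-to-back breaks
$guarded = 'one<br /><!-- keep --><br />two';  // hypothetical comment between the breaks

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$plain_matches   = preg_match( $pattern, $plain );
$guarded_matches = preg_match( $pattern, $guarded );

var_dump( $plain_matches, $guarded_matches );
```

Because the comment is not whitespace, the guarded string does not match, so the breaks are never turned into the double newline that the later split looks for.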

#6 follow-up: @oeconomist
8 months ago

Okay, yes. Now two work-arounds have been provided (as well as a direct look at the code). But they are just work-arounds, not patches of the bug.

It is one thing for the code to recognize that most users don't know or don't want to be bothered with mark-up, and therefore to interpret newlines as paragraph closures; it's another for the code to infer that a user didn't intend the mark-up that he or she entered. And, because breaks are inline elements, this conversion can break mark-up in ways that are mysterious to many users.

#7 in reply to: ↑ 6 @JPry
8 months ago

Replying to oeconomist:

It is one thing for the code to recognize that most users don't know or don't want to be bothered with mark-up, and therefore to interpret newlines as paragraph closures; it's another for the code to infer that a user didn't intend the mark-up that he or she entered.

The WordPress editor and wpautop() work the way they do because most people don't want to be bothered with markup. WordPress Core is designed for the majority, and the majority of users would want their markup to be cleaned up if they copied and pasted it from some other source. That means that for the majority of users, the current behavior is not a bug.

For those users, like yourself, who know exactly what markup they want, it is certainly understandable that the default behavior has aspects that don't work for you. However, that means that the best solution for you may be to disable the default behavior.

But they are just work-arounds, not patches of the bug.

One thing that hasn't been mentioned yet here is that even though this seems like a small change, it could have huge ramifications for the majority of users. They would suddenly have different generated output on their site. To them, if the change you're requesting were made, the new behavior would be a bug.

#8 @oeconomist
8 months ago

The point that most users don't want to be bothered with mark-up was already stipulated.

Entering “<br /><br />” is simply not the same act as that of twice hitting the Enter key. Whoever decided that “<br /><br />” were usually intended to start a new paragraph was guessing, and guessing strangely; it remains a conjecture and an odd one that many users have already done just this. “<p>” has been part of the HTML specification since the first RFC (albeit that Berners-Lee was incoherent about what it meant, which incoherence can cause “<p>” to need clean-up); whereas “<br>” was introduced later.

I'd like to see someone point to a case, anywhere, of a preëxisting 'blog entry that will actually render badly as a result of this pseudo-correction being ended. If such cases exist at all, then they will be very rare, and more rare than cases of style attributes being ignored because new blocks were spuriously created.

This bug should not be cemented as a feature.
