Make WordPress Core

Opened 18 years ago

Last modified 5 weeks ago

#2691 accepted defect (bug)

HTML comments in posts aren't handled properly.

Reported by: gord's profile gord Owned by: adamsilverstein's profile adamsilverstein
Milestone: Future Release Priority: normal
Severity: normal Version: 2.8.5
Component: Formatting Keywords: wpautop dev-feedback has-patch needs-unit-tests needs-refresh close
Focuses: Cc:


When an HTML comment is added in a post, autop adds paragraph ( <p> ) tags around the comment and for multi-line comments line breaks ( <br /> ) are added after every line. This should not happen in HTML comments.

This ticket is similar to #712 which was closed with wontfix. I would like to know why this isn't seen as an issue? It prevents the addition of RDF and other metadata, not to mention just plain old HTML comments in posts.

Attachments (9)

2691.diff (1.5 KB) - added by collinsinternet 9 years ago.
Unit tests for HTML comments.
2691.2.diff (3.0 KB) - added by collinsinternet 9 years ago.
Adjustments to formatting.php to accommodate HTML comments.
2691.3.diff (3.4 KB) - added by adamsilverstein 7 years ago.
2691.4.diff (4.2 KB) - added by adamsilverstein 7 years ago.
2691.5.diff (5.1 KB) - added by adamsilverstein 7 years ago.
2691.6.diff (5.6 KB) - added by adamsilverstein 5 years ago.
2691.7.diff (4.1 KB) - added by adamsilverstein 5 years ago.
2691.8.diff (4.7 KB) - added by adamsilverstein 5 years ago.
2691.9.diff (4.9 KB) - added by adamsilverstein 5 years ago.

Download all attachments as: .zip

Change History (48)

#1 @skeltoac
18 years ago

This is what is implied: nobody who has seen this has found it important enough to spend their time rewriting autop to work around comments. The applicable mantra is "edge case." You are welcome to work up a patch and submit it for testing and consideration.

#2 @random
18 years ago

Maybe a bandaid on it? (e.g.)

$pee = preg_replace('!<p><!--(.*?)--></p>!ise', " '<!--' .  stripslashes(clean_pre('$1'))  . '-->' ", $pee);

clean_pre() will ruin any comments which are supposed to have <p> and <br /> in them, but that's an even edgier case.

Gord, you could also try a plugin replacement for wpautop -- I know Markdown handles comments fine.

#3 @Nazgul
17 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

No traction in over a year, so closing as wontfix.

Feel free to reopen if you have patches/suggestions/...

#4 @vikingjs
14 years ago

  • Cc vikingjs added
  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Version changed from 2.0 to 2.8.5

I can't believe this got no traction - I use plugins that conscientiously mark where their added content begins and ends using HTML comments, and those comments are adding empty paragraphs in my posts, which in some cases affect layout and undermine the aesthetics of the page.

This issue affects anyone who uses plugins that modify the text of the blog post, e.g., SexyBookmarks.

I'll look at what it would take to patch it, but someone who knows how and where the unwanted behavior is occurring could do it in a fraction of the time.

#5 follow-up: @scribu
14 years ago

  • Keywords wpautop needs-patch added; autop removed
  • Milestone set to Future Release

The function responsible is called wpautop() and can be found in /wp-includes/formatting.php

#6 in reply to: ↑ 5 @vikingjs
14 years ago

Replying to scribu:

The function responsible is called wpautop() and can be found in /wp-includes/formatting.php

I took a look and I'm probably not the guy to be fixing this. As far as I can tell the band-aid advised by random above is a good fix. The reason I think I'm not the guy is because I don't know why stripslashes() and clean_pre() are needed for a comment, so I bow to the superior understanding of others.

random calls his patch a band-aid, but it would be just another in a long list of band-aids in the function. The philosophy seems to be to add paragraph tags and then go back and repair special cases where they broke things. This would be just another special case. Ideally it would be better simply not to apply the wpautop routines to comments (or some of the other cases like embed tags) in the first place.

#7 @mrclay
13 years ago

  • Cc mrclay added

I've created a more resilient autop implementation and a more rigorous testcase cobbled from various WP tickets. It avoids the dangerous regex operations by doing most work in a DOMDocument object.

This is undoubtedly slower than current wpautop (and may choke on extremely invalid HTML), but happily handles markup that the current wpautop breaks. Since there's no real spec for wpautop, it's hard to duplicate the style of markup wpautop outputs when it does work, but a lot of testing will surely be necessary.

#8 @nacin
10 years ago

  • Component changed from General to Formatting
  • Keywords needs-unit-tests added

9 years ago

Unit tests for HTML comments.

9 years ago

Adjustments to formatting.php to accommodate HTML comments.

#9 @collinsinternet
9 years ago

  • Keywords dev-feedback added

Hey All,

2691.2.diff allows my unit tests for HTML comments to pass, however it caused some others related to video and embed objects to fail. Can I get another set of eyes on this to make sure I'm not missing anything?

Last edited 9 years ago by collinsinternet (previous) (diff)

#10 @miqrogroove
9 years ago

I'm working on a new patch for this and want some fresh comments regarding @collinsinternet patch which attempts to avoid adding BR directly after an HTML comment.

Is the BR after the comment part of this bug? Or is it only the BR that gets added inside a comment?

Would it be a step in the right direction for wpautop to start avoiding HTML comments but still add a BR when the comment is followed by a single newline character?

What is the desired behavior when an HTML comment is surrounded by multiple newline characters?

#11 @adamsilverstein
7 years ago

In 2691.3.diff :

  • refactor unit tests using dataProvider
  • only remove the tailing </p> after an HTML comment when the paragraph starts as a comment

*adjust the regex for adding breaks after comments - exclude cases where a closing </p> tag is already present

all of the formatting group unit tests now pass. appreciate testing & feedback.

#12 @adamsilverstein
7 years ago

2691.4.diff updates unit tests for test_multiline_comment_with_embeds to pass after this change, removing some stray <br /> tags in the expected results.

#13 @adamsilverstein
7 years ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed

@nacin or @azaozz any feedback here? does 2691.4.diff look good?

This is an 11 year old ticket and we should get this fixed! This might affect stored block content for Gutenberg (which uses HTML comments to store blocks).

#14 @adamsilverstein
7 years ago

  • Owner set to adamsilverstein
  • Status changed from reopened to accepted

#15 @adamsilverstein
7 years ago

2691.5.diff adds some additional tests for the examples in #2833.

This ticket was mentioned in Slack in #core by adamsilverstein. View the logs.

7 years ago

This ticket was mentioned in Slack in #core by adamsilverstein. View the logs.

7 years ago

This ticket was mentioned in Slack in #core by pento. View the logs.

7 years ago

#19 @adamsilverstein
7 years ago

  • Milestone changed from Future Release to 4.9

#20 @adamsilverstein
7 years ago

  • Milestone changed from 4.9 to 5.0

Punting from 4.9. This needs more testing and feedback.

#21 @bobbingwide
6 years ago

I've just had to develop a workaround for this long term bug.
In my opinion wpautop shouldn't create any <br /> tags when dealing with new lines following comments.
It should produce the same results as if the comment were not present.

This ticket was mentioned in Slack in #core-js by adamsilverstein. View the logs.

6 years ago

#23 @pento
6 years ago

Who likes tests? I like tests. I'm going to use this ticket to test a tool I'm writing, you might see attachments and comments appear, I'll delete them as I go. 🙂

#24 @adamsilverstein
6 years ago

@pento I LOVE tests! Bring it on.

#25 @pento
6 years ago

  • Milestone changed from 5.0 to 5.1

Gutenberg doesn't need this, as autop is turned off when block comments exist in the post content.

#26 @GunGeekATX
6 years ago

Ran into this problem today, though it was my code at first, but found this ticket. WordPress 4.9.8 and Gutenberg 4.1.1. I created a dynamic block using only a title.

$ ./bin/ post get 75846 --field=post_content
<!-- wp:paragraph -->
<p><del>This is a paragraph.</del></p>
<!-- /wp:paragraph -->

<!-- wp:client/custom-block {"title":"This is the title"} /-->

<!-- wp:paragraph -->
<!-- /wp:paragraph -->

When I view source on the page:

<!-- wp:paragraph -->
<p><del>This is a paragraph.</del></p>
<!-- /wp:paragraph -->

<!-- wp:paragraph -->
<!-- /wp:paragraph -->

Changing the priority of the do_blocks call from 7 to 5 in the Gutenberg plugin fixes it. My assumption is autop is running before Gutenberg and stripping out the dynamic block.

#27 @adamsilverstein
5 years ago

  • Milestone changed from 5.1 to 5.2

Punting to 5.2 because we don't have complete test coverage yet. Would also like to test the last report from @GunGeekATX in 5.0.3/5.1.

#28 @adamsilverstein
5 years ago

2691.6.diff includes updated tests.

#29 @adamsilverstein
5 years ago

  • Milestone changed from 5.2 to 5.3

Punting to 5.3 and will try to land that very early in the cycle.

#30 @adamsilverstein
5 years ago

2691.8.diff refresh against trunk and correct a failing test, fixes for phpcs - all tests now pass -

#31 @adamsilverstein
5 years ago

  • Keywords commit added

#32 @pento
5 years ago

  • Keywords commit removed

Nice work on the patch, @adamsilverstein!

Is there any reason it's checking for <!- and ->, instead of <!-- and -->? This seems to cause issues sometimes. For example, adding this test to data_html_comments fails:

        "foo\n\n<!-- HTML comment ->\n\n still in the comment... -->",
        "<p>foo</p>\n<!-- HTML comment ->\n\n still in the comment... -->",

Also, can you explain the if on line 607 of formatting.php? It seems like it will only remove the trailing </p> if the entire content starts with <!-, rather than if the paragraph starts with it, as the comment suggests.

#33 @pento
5 years ago

Also, this regex is possibly wrong.

$pee = preg_replace( "|(\-\->.*[^</p>])\n|", "$1<br />\n", $pee );

I think it's intended to match everything that isn't a </p> tag, until it reaches a \n? It will actually match everything that isn't one of the characters <, /, p, or >, followed by a \n.

Given that this is a fairly simple check, I think it'd be better to do it in the $pees loop, instead. If a $tinkle starts and ends with HTML comment syntax, don't wrap it in <p> tags.

#34 @adamsilverstein
5 years ago

@pento Thanks for the feedback and regex deciphering :) I'll work on addressing your feedback and add an additional test case to cover the first issue.

#35 @adamsilverstein
5 years ago

  • Keywords needs-unit-tests added; has-unit-tests removed
  • Milestone changed from 5.3 to Future Release

This ticket could use a bunch of unit tests to verify the code works as expected. Milestoning this as 'future release' until there is some more progress.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.

16 months ago

#37 @audrasjb
16 months ago

  • Keywords needs-refresh added

This ticket was mentioned in today's Old Ticket Triage Session :)

To summarize, the ticket:

  • has a good approach in the most recent patches
  • needs a new, refreshed patch
  • needs refreshed unit tests + more test cases (like CDATA comments)
  • needs testing on both Classic and Block editors

Thanks @poena, @SergeyBiryukov, @maigret, @wongjn.

#38 @audrasjb
16 months ago

Also, from @poena:

Kind of related: it discusses how broken html “block markup” is handled.

#39 @azaozz
16 months ago

  • Keywords close added

Frankly I'd be quite reluctant to patch autop at this stage. It is "on its way out" and chances to introduce new bugs there are pretty high.

Looking at the patch, this seems problematic:

if ( strpos( $pee, '<!--' ) !== false ) {
    $pee = preg_replace( '|<p>\s*<\!--|', '<!--', $pee );

It will remove <p> from strings that start with a comment like <p><!-- 123 -->text goes here</p>.

If this must be patched, a preferable way would be to avoid wrapping lines that contain only HTML comments. I.e. something like (untested):

- $pee .= '<p>' . trim( $tinkle, "\n" ) . "</p>\n";
+ $part = trim( $tinkle, "\n" );

if (
    strpos( $part, '<!--' ) === 0 &&
    strpos( $part, '-->' ) === ( strlen( $part ) - 3 )
) {
    $pee .= "$part\n"; // Don't wrap stand-alone HTML comments in <p>.
} else {
    $pee .= "<p>$part</p>\n";

Support for multi-line HTML comments would have to be added similarly to how <pre> tags are treated, i.e. completely excluded from autop.

Still, the chance to introduce regressions here even by fixing the behavior are high. Perhaps better to leave this and similar cases as-is. If a fix is desirable, better to add in a plugin so the users can enable or disable it at will.

Last edited 16 months ago by azaozz (previous) (diff)
Note: See TracTickets for help on using tickets.