WordPress.org

Make WordPress Core

Opened 14 years ago

Last modified 3 months ago

#2691 accepted defect (bug)

HTML comments in posts aren't handled properly.

Reported by: gord Owned by: adamsilverstein
Milestone: Future Release Priority: normal
Severity: normal Version: 2.8.5
Component: Formatting Keywords: wpautop dev-feedback has-patch needs-unit-tests
Focuses: Cc:
PR Number:

Description

When an HTML comment is added in a post, autop adds paragraph ( <p> ) tags around the comment and for multi-line comments line breaks ( <br /> ) are added after every line. This should not happen in HTML comments.

This ticket is similar to #712 which was closed with wontfix. I would like to know why this isn't seen as an issue? It prevents the addition of RDF and other metadata, not to mention just plain old HTML comments in posts.

Attachments (9)

2691.diff (1.5 KB) - added by collinsinternet 5 years ago.
Unit tests for HTML comments.
2691.2.diff (3.0 KB) - added by collinsinternet 5 years ago.
Adjustments to formatting.php to accommodate HTML comments.
2691.3.diff (3.4 KB) - added by adamsilverstein 3 years ago.
2691.4.diff (4.2 KB) - added by adamsilverstein 2 years ago.
2691.5.diff (5.1 KB) - added by adamsilverstein 2 years ago.
2691.6.diff (5.6 KB) - added by adamsilverstein 8 months ago.
2691.7.diff (4.1 KB) - added by adamsilverstein 4 months ago.
2691.8.diff (4.7 KB) - added by adamsilverstein 4 months ago.
2691.9.diff (4.9 KB) - added by adamsilverstein 4 months ago.

Download all attachments as: .zip

Change History (44)

#1 @skeltoac
14 years ago

This is what is implied: nobody who has seen this has found it important enough to spend their time rewriting autop to work around comments. The applicable mantra is "edge case." You are welcome to work up a patch and submit it for testing and consideration.

#2 @random
14 years ago

Maybe a bandaid on it? (e.g.)

$pee = preg_replace('!<p><!--(.*?)--></p>!ise', " '<!--' .  stripslashes(clean_pre('$1'))  . '-->' ", $pee);

clean_pre() will ruin any comments which are supposed to have <p> and <br /> in them, but that's an even edgier case.

Gord, you could also try a plugin replacement for wpautop -- I know Markdown handles comments fine.

#3 @Nazgul
12 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

No traction in over a year, so closing as wontfix.

Feel free to reopen if you have patches/suggestions/...

#4 @vikingjs
10 years ago

  • Cc vikingjs added
  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Version changed from 2.0 to 2.8.5

I can't believe this got no traction - I use plugins that conscientiously mark where their added content begins and ends using HTML comments, and those comments are adding empty paragraphs in my posts, which in some cases affect layout and undermine the aesthetics of the page.

This issue affects anyone who uses plugins that modify the text of the blog post, e.g., SexyBookmarks.

I'll look at what it would take to patch it, but someone who knows how and where the unwanted behavior is occurring could do it in a fraction of the time.

#5 follow-up: @scribu
10 years ago

  • Keywords wpautop needs-patch added; autop removed
  • Milestone set to Future Release

The function responsible is called wpautop() and can be found in /wp-includes/formatting.php

#6 in reply to: ↑ 5 @vikingjs
10 years ago

Replying to scribu:

The function responsible is called wpautop() and can be found in /wp-includes/formatting.php

I took a look and I'm probably not the guy to be fixing this. As far as I can tell the band-aid advised by random above is a good fix. The reason I think I'm not the guy is because I don't know why stripslashes() and clean_pre() are needed for a comment, so I bow to the superior understanding of others.

random calls his patch a band-aid, but it would be just another in a long list of band-aids in the function. The philosophy seems to be to add paragraph tags and then go back and repair special cases where they broke things. This would be just another special case. Ideally it would be better simply not to apply the wpautop routines to comments (or some of the other cases like embed tags) in the first place.

#7 @mrclay
9 years ago

  • Cc mrclay added

I've created a more resilient autop implementation and a more rigorous testcase cobbled from various WP tickets. It avoids the dangerous regex operations by doing most work in a DOMDocument object.

This is undoubtedly slower than current wpautop (and may choke on extremely invalid HTML), but happily handles markup that the current wpautop breaks. Since there's no real spec for wpautop, it's hard to duplicate the style of markup wpautop outputs when it does work, but a lot of testing will surely be necessary.

#8 @nacin
6 years ago

  • Component changed from General to Formatting
  • Keywords needs-unit-tests added

@collinsinternet
5 years ago

Unit tests for HTML comments.

@collinsinternet
5 years ago

Adjustments to formatting.php to accommodate HTML comments.

#9 @collinsinternet
5 years ago

  • Keywords dev-feedback added

Hey All,

2691.2.diff allows my unit tests for HTML comments to pass, however it caused some others related to video and embed objects to fail. Can I get another set of eyes on this to make sure I'm not missing anything?

Last edited 5 years ago by collinsinternet (previous) (diff)

#10 @miqrogroove
4 years ago

I'm working on a new patch for this and want some fresh comments regarding @collinsinternet patch which attempts to avoid adding BR directly after an HTML comment.

Is the BR after the comment part of this bug? Or is it only the BR that gets added inside a comment?

Would it be a step in the right direction for wpautop to start avoiding HTML comments but still add a BR when the comment is followed by a single newline character?

What is the desired behavior when an HTML comment is surrounded by multiple newline characters?

#11 @adamsilverstein
3 years ago

In 2691.3.diff :

  • refactor unit tests using dataProvider
  • only remove the tailing </p> after an HTML comment when the paragraph starts as a comment

*adjust the regex for adding breaks after comments - exclude cases where a closing </p> tag is already present

all of the formatting group unit tests now pass. appreciate testing & feedback.

#12 @adamsilverstein
2 years ago

2691.4.diff updates unit tests for test_multiline_comment_with_embeds to pass after this change, removing some stray <br /> tags in the expected results.

#13 @adamsilverstein
2 years ago

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed

@nacin or @azaozz any feedback here? does 2691.4.diff look good?

This is an 11 year old ticket and we should get this fixed! This might affect stored block content for Gutenberg (which uses HTML comments to store blocks).

#14 @adamsilverstein
2 years ago

  • Owner set to adamsilverstein
  • Status changed from reopened to accepted

#15 @adamsilverstein
2 years ago

2691.5.diff adds some additional tests for the examples in #2833.

This ticket was mentioned in Slack in #core by adamsilverstein. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by adamsilverstein. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by pento. View the logs.


2 years ago

#19 @adamsilverstein
2 years ago

  • Milestone changed from Future Release to 4.9

#20 @adamsilverstein
2 years ago

  • Milestone changed from 4.9 to 5.0

Punting from 4.9. This needs more testing and feedback.

#21 @bobbingwide
23 months ago

I've just had to develop a workaround for this long term bug.
In my opinion wpautop shouldn't create any <br /> tags when dealing with new lines following comments.
It should produce the same results as if the comment were not present.

This ticket was mentioned in Slack in #core-js by adamsilverstein. View the logs.


22 months ago

#23 @pento
14 months ago

Who likes tests? I like tests. I'm going to use this ticket to test a tool I'm writing, you might see attachments and comments appear, I'll delete them as I go. 🙂

#24 @adamsilverstein
14 months ago

@pento I LOVE tests! Bring it on.

#25 @pento
12 months ago

  • Milestone changed from 5.0 to 5.1

Gutenberg doesn't need this, as autop is turned off when block comments exist in the post content.

#26 @GunGeekATX
12 months ago

Ran into this problem today, though it was my code at first, but found this ticket. WordPress 4.9.8 and Gutenberg 4.1.1. I created a dynamic block using only a title.

$ ./bin/wp.sh post get 75846 --field=post_content
<!-- wp:paragraph -->
<p><del>This is a paragraph.</del></p>
<!-- /wp:paragraph -->

<!-- wp:client/custom-block {"title":"This is the title"} /-->

<!-- wp:paragraph -->
<p></p>
<!-- /wp:paragraph -->

When I view source on the page:

<!-- wp:paragraph -->
<p><del>This is a paragraph.</del></p>
<!-- /wp:paragraph -->



<!-- wp:paragraph -->
<p></p>
<!-- /wp:paragraph -->

Changing the priority of the do_blocks call from 7 to 5 in the Gutenberg plugin fixes it. My assumption is autop is running before Gutenberg and stripping out the dynamic block.

#27 @adamsilverstein
10 months ago

  • Milestone changed from 5.1 to 5.2

Punting to 5.2 because we don't have complete test coverage yet. Would also like to test the last report from @GunGeekATX in 5.0.3/5.1.

#28 @adamsilverstein
7 months ago

2691.6.diff includes updated tests.

#29 @adamsilverstein
7 months ago

  • Milestone changed from 5.2 to 5.3

Punting to 5.3 and will try to land that very early in the cycle.

#30 @adamsilverstein
4 months ago

2691.8.diff refresh against trunk and correct a failing test, fixes for phpcs - all tests now pass - https://travis-ci.com/WordPress/wordpress-develop/builds/117891005

#31 @adamsilverstein
4 months ago

  • Keywords commit added

#32 @pento
4 months ago

  • Keywords commit removed

Nice work on the patch, @adamsilverstein!

Is there any reason it's checking for <!- and ->, instead of <!-- and -->? This seems to cause issues sometimes. For example, adding this test to data_html_comments fails:

array(
        "foo\n\n<!-- HTML comment ->\n\n still in the comment... -->",
        "<p>foo</p>\n<!-- HTML comment ->\n\n still in the comment... -->",
),

Also, can you explain the if on line 607 of formatting.php? It seems like it will only remove the trailing </p> if the entire content starts with <!-, rather than if the paragraph starts with it, as the comment suggests.

#33 @pento
4 months ago

Also, this regex is possibly wrong.

$pee = preg_replace( "|(\-\->.*[^</p>])\n|", "$1<br />\n", $pee );

I think it's intended to match everything that isn't a </p> tag, until it reaches a \n? It will actually match everything that isn't one of the characters <, /, p, or >, followed by a \n.

Given that this is a fairly simple check, I think it'd be better to do it in the $pees loop, instead. If a $tinkle starts and ends with HTML comment syntax, don't wrap it in <p> tags.

#34 @adamsilverstein
4 months ago

@pento Thanks for the feedback and regex deciphering :) I'll work on addressing your feedback and add an additional test case to cover the first issue.

#35 @adamsilverstein
3 months ago

  • Keywords needs-unit-tests added; has-unit-tests removed
  • Milestone changed from 5.3 to Future Release

This ticket could use a bunch of unit tests to verify the code works as expected. Milestoning this as 'future release' until there is some more progress.

Note: See TracTickets for help on using tickets.