WordPress.org

Make WordPress Core

Opened 6 years ago

Closed 11 months ago

Last modified 11 months ago

#8912 closed defect (bug) (fixed)

wptexturize malforms HTML comments that contain HTML tags

Reported by: Otto42 Owned by: wonderboymusic
Milestone: 4.0 Priority: normal
Severity: normal Version: 2.7
Component: Formatting Keywords: has-patch wptexturize
Focuses: Cc:

Description

Because it's replacing -- with #8211, a comment like <!-- whatever --> put into the HTML part of a post gets broken.

This makes it difficult for people writing special HTML in posts (like people putting in object tags, or javascript, or whatever) to do that sort of thing.

What is needed is to recognize --> as different from -- and not replace it with the en dash in that case.

Attachments (2)

8912.patch (589 bytes) - added by SergeyBiryukov 4 years ago.
The patch from #16060
8912.diff (2.2 KB) - added by nacin 15 months ago.

Download all attachments as: .zip

Change History (43)

comment:1 @Viper007Bond6 years ago

  • Resolution set to worksforme
  • Status changed from new to closed

Just tested with trunk. No problems.

comment:2 @Viper007Bond6 years ago

  • Milestone 2.8 deleted

comment:3 @Otto425 years ago

Note: Visual mode doesn't allow HTML comments, the <>'s get altered to lt; and gt; and so on.

But in HTML mode, comments works fine in 2.9.2, at least.

comment:4 @Otto425 years ago

  • Component changed from General to Formatting
  • Milestone set to 3.0
  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from 2.7 to 2.9.2

Have a confirmed failure.

This code fails:

<ul><li>Hello.</li><!--<li>Goodbye.</li>--></ul>

It gets converted to this:

<ul>
<li>Hello.</li>
<p><!--
<li>Goodbye.</li>
<p>&#8211;></ul>

Happens on a default installation.

comment:5 @Viper007Bond5 years ago

You're including extra filters in that, Otto.

Here's what wptexturize() returns for that example string:

`
<ul><li>Hello.</li><!--<li>Goodbye.</li>&#8211;></ul>
`

comment:6 @Viper007Bond5 years ago

Ugh.

<ul><li>Hello.</li><!--<li>Goodbye.</li>&#8211;></ul>

comment:7 @Viper007Bond5 years ago

  • Summary changed from wp_texturize breaks HTML comments in posts to wptexturize malforms HTML comments that contain HTML tags

BTW, it would have been better to open a new ticket on this as this is a separate issue. ;)

comment:8 follow-up: @otto425 years ago

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

comment:9 in reply to: ↑ 8 @Viper007Bond5 years ago

Replying to otto42:

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

<!-- foobar --> works though, right? That was the original issue.

The reason -- is being converted to &#8211; in your above example is because the HTML tags in the comment are breaking wptexturize()'s comment detection (it doesn't realize it's an HTML comment). It's totally a valid issue, but this is a bug in the HTML comment detection code rather than all HTML comments.

In short, I'm just nitpicking minor technicalities. Don't mind me. :)

comment:10 @rmccue5 years ago

Related: #10033

comment:11 @ryan5 years ago

  • Milestone changed from 3.0 to 3.1

comment:12 @nacin4 years ago

  • Milestone changed from Awaiting Triage to Future Release

No patch. Future.

comment:13 @norbertm4 years ago

  • Cc norbert@… added

Submitting patch & unit test to #4539.

comment:14 @azaozz4 years ago

On production sites HTML comments in the post_content seem to be extremely rare, commented HTML i.e. the example above, are virtually non existent. Currently wptexturize() supports simple HTML comments properly:

<ul><li>Hello.</li><!--Goodbye.--></ul>

works as expected.

The only user case for supporting commented HTML seems to be when a plugin developer wants to test the plugin's output. Don't think it's worth it adding some redundant regexp that will run on hundreds of millions of posts every day just for that. Perhaps better would be to add to the plugin developers part of the codex that commented HTML is not supported in the post content.

Suggesting: wontfix.

comment:15 @strider724 years ago

Responding to @azaozz:

I have a plugin that is affected by this in some cases. It's been an outstanding bug for a long time now, and I can't fix it within the plugin itself. See the Graceful Pull-Quotes plugin. Users can input an "alternate" text within an HTML comment, but with this bug they can't use tags at all. A similar (the same?) bug prevents HTML entities in comments -- they get malformed in the same way as tags. Non-English speakers contact me all the time with this.

comment:16 @strider724 years ago

I can also see situations where an author wants to temporarily "deactivate", but not delete, a chunk of text, and puts it in a comment. He better not have any tags in that text! So, yes I think this is a legitimate problem.

comment:17 @SergeyBiryukov4 years ago

Related/duplicate: #16060

comment:18 @SergeyBiryukov4 years ago

  • Keywords has-patch added

comment:19 @SergeyBiryukov4 years ago

  • Milestone changed from Future Release to 3.3

Moving to 3.3, as #16060 was marked 3.2-early but didn't make it into 3.2.

@SergeyBiryukov4 years ago

The patch from #16060

comment:20 @SergeyBiryukov4 years ago

  • Keywords needs-unit-tests added

comment:21 @SergeyBiryukov4 years ago

  • Keywords needs-refresh added

Since [17636], the result for my example from 16060 is a bit different:

<!-- Sample list
<ul>
	<li>Sample item</li>
</ul> -->

Output:

<!-- Sample list
<ul>
	<li>Sample item</li>
 -->

Comment closing tag is preserved, but <ul> tag is missing.

Version 0, edited 4 years ago by SergeyBiryukov (next)

comment:22 @SergeyBiryukov3 years ago

  • Milestone changed from 3.3 to Future Release

comment:23 @ericaentertainment3 years ago

  • Keywords needs-refresh needs-unit-tests removed

I have a workaround. Just before the close comment tag, put another open comment tag. That second open comment tag will be commented out, but it forces the close tag to be recognized and not be reformatted as an en-dash.

Example that fails:
<!-- <b> text </b> -->

Example that works:
<!-- <b> text </b> <!-- -->

In other words, if you have comments that enclose other tags, just make sure the last tag is another open comment tag.

comment:24 @SergeyBiryukov3 years ago

  • Keywords needs-refresh needs-unit-tests added

comment:25 @harrismw3 years ago

  • Cc harrismw added
  • Version changed from 2.9.2 to 3.3.1

Still existing in the latest version of WP (3.3.1) which I just updated to from 2.9.something.

The code which sets it off is:

<!-- <li><a href="/news-and-events">News &amp; Events</a>: A record of news and events related to philosophy of religion.</li>  -->

I think I must have edit core before to work around this bug. Which, funny thing, with the update ...

(Also, where's the "bloody annoying" severity level, I don't think it's "major", but it's certainly not "normal")

comment:26 @SergeyBiryukov3 years ago

  • Version changed from 3.3.1 to 2.7

Version number indicates when the bug was initially introduced/reported.

comment:27 follow-up: @cgrymala3 years ago

  • Cc curtiss@… added

I've come across a new wrinkle in this issue that I haven't seen mentioned yet. HTML comments around entire lines (or groups of lines) occasionally causes those lines to be deleted altogether (and the HTML structure to get messed up) when switching between the HTML editor and the Visual Editor.

Take the following code for example:

<ol>
	<li>List Item 1</li>
	<li>List Item 2</li>
	<li>List Item 3</li>
</ol>

Now, add an HTML comment like so:

<ol>
	<li>List Item 1</li>
<!--	<li>List Item 2</li>-->
	<li>List Item 3</li>
</ol>

Then, switch to the visual editor and switch back to the HTML editor. You're left with the following:

<ol>
<ol>
	<li>List Item 1</li>
</ol>
</ol>
&nbsp;
<ol>
	<li>List Item 3</li>
</ol>

As another example, if you take the following HTML:

List Item 1

<strong>List Item 2</strong>

List Item 3

Then, add an HTML comment like:

List Item 1

<!--<strong>List Item 2</strong>-->

List Item 3

Then, switch back to the visual editor and back to the HTML editor again, and you're left with the following code in the HTML editor:

List Item 1

&nbsp;

List Item 3

comment:28 in reply to: ↑ 27 @realj423 years ago

Replying to cgrymala:

I am also seeing this, furthermore, the comments are being completely removed, even if there are no tags inside. For example, the following text, set up for use with the 'Graceful Pull-Quotes' plugn

<span class="pullquote"><!--Joan Cuneos successes, against prominent male racers in the USA, led to women being banned--></span>

completely disappears after switcing to visual then back to html. This bug would appear to completely prohibit the use of hidden text for special formatting.

comment:30 @kraftbj2 years ago

  • Cc bk@… added

comment:31 @Cimmo22 months ago

Got a 'weird' bug on my plug-in "Cimy User Extra Fields", at the end is due to exactly this issue. It took me 1 hour of investigation to understand what was going on.

@ericaentertainment
Thanks for the workaround, works great, but may break in the future. A real fix would be nice from WP developers, I know I can avoid HTML comments, but you can also avoid to mess with them :)

Thank you.

comment:32 @Cimmo22 months ago

Actually my issue was even more subtle:
basically I have a html comment in my plugin with just the plugin's name, version and author, no html tags.

So how that triggered this bug?
When used on certain themes they were applying to the_content() first wpautop() function, that transformed all new lines into html break tags and then applied wptexturize().

So in that case, even if I had a totally harmless comment, but on multiple lines, the combination of the two functions made my "end of the html comment tag" changed and then all the page broke.

This the piece that is found in some themes like: "Modular" and "Emporium" for "WooCommerce":

# Format and append to content
$new_content .= wptexturize(wpautop($piece_intl));

To fix the problem I had to remove all "new lines" in my html comment.

Just to give the idea how a bug can be amplified by pure "luck" :)

comment:33 @nacin15 months ago

  • Keywords needs-unit-tests removed

We already "disable" wptexturize() when we experience certain shortcodes and tags. If we added support for HTML comments — disable texturize when we see <!--, re-enable it after --> — I think that'd solve our problems and shouldn't be too much of a nightmare to implement.

An alternative, cheap fix would be to not do the '--' em-dash replacement if the next character is >. We'd never want an em-dash there.

[17636] made this patch stale.

Removing needs-unit-tests as some exist, though more would of course be nice.

@nacin15 months ago

comment:34 follow-up: @nacin15 months ago

  • Keywords needs-refresh removed

8912.diff is a cheap patch. It simply does a lookahead for > before replacing -- with an en dash. So, the closing HTML comment is at least no longer broken. This only allows an assertion to pass, which I've broken off into a discrete test.

The bug still remains that inside HTML comments, things are still texturized. (As seen by the failed test.) Given the nature of HTML comments, this isn't a big deal, but it sure would be a nicer fix overall.

comment:35 @nacin15 months ago

#26286 was marked as a duplicate.

comment:36 @nacin15 months ago

  • Keywords wptexturize added

comment:37 in reply to: ↑ 34 @miqrogroove14 months ago

Replying to nacin:

It simply does a lookahead for > before replacing --

Does WP allow any space between those? I believe it's permitted in HTML.

comment:38 @miqrogroove13 months ago

8912.diff would break xn-- which is magical.

Let's visit this ticket only after closing #23185 to avoid confusion.

comment:39 @miqrogroove13 months ago

Alternatively, we could roll this up with #12690 because there is no reason to texturize inside HTML comments.

comment:40 @wonderboymusic11 months ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from reopened to closed

In 28727:

In wptexturize(), ensure that texturization does not corrupt contents of HTML elements, HTML comments, and smartcode attributes.

Adds a variety of unit tests/assertions.

Props miqrogroove.
Fixes #12690, #8912, #27602.

comment:41 @wonderboymusic11 months ago

  • Milestone changed from Future Release to 4.0
Note: See TracTickets for help on using tickets.