Make WordPress Core

Opened 15 years ago

Closed 10 years ago

Last modified 10 years ago

#8912 closed defect (bug) (fixed)

wptexturize malforms HTML comments that contain HTML tags

Reported by: otto42's profile Otto42 Owned by: wonderboymusic's profile wonderboymusic
Milestone: 4.0 Priority: normal
Severity: normal Version: 2.7
Component: Formatting Keywords: has-patch wptexturize
Focuses: Cc:


Because it's replacing -- with #8211, a comment like <!-- whatever --> put into the HTML part of a post gets broken.

This makes it difficult for people writing special HTML in posts (like people putting in object tags, or javascript, or whatever) to do that sort of thing.

What is needed is to recognize --> as different from -- and not replace it with the en dash in that case.

Attachments (2)

8912.patch (589 bytes) - added by SergeyBiryukov 13 years ago.
The patch from #16060
8912.diff (2.2 KB) - added by nacin 10 years ago.

Download all attachments as: .zip

Change History (43)

#1 @Viper007Bond
15 years ago

  • Resolution set to worksforme
  • Status changed from new to closed

Just tested with trunk. No problems.

#2 @Viper007Bond
15 years ago

  • Milestone 2.8 deleted

#3 @Otto42
14 years ago

Note: Visual mode doesn't allow HTML comments, the <>'s get altered to lt; and gt; and so on.

But in HTML mode, comments works fine in 2.9.2, at least.

#4 @Otto42
14 years ago

  • Component changed from General to Formatting
  • Milestone set to 3.0
  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from 2.7 to 2.9.2

Have a confirmed failure.

This code fails:


It gets converted to this:


Happens on a default installation.

#5 @Viper007Bond
14 years ago

You're including extra filters in that, Otto.

Here's what wptexturize() returns for that example string:


#6 @Viper007Bond
14 years ago



#7 @Viper007Bond
14 years ago

  • Summary changed from wp_texturize breaks HTML comments in posts to wptexturize malforms HTML comments that contain HTML tags

BTW, it would have been better to open a new ticket on this as this is a separate issue. ;)

#8 follow-up: @otto42
14 years ago

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

#9 in reply to: ↑ 8 @Viper007Bond
14 years ago

Replying to otto42:

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

<!-- foobar --> works though, right? That was the original issue.

The reason -- is being converted to &#8211; in your above example is because the HTML tags in the comment are breaking wptexturize()'s comment detection (it doesn't realize it's an HTML comment). It's totally a valid issue, but this is a bug in the HTML comment detection code rather than all HTML comments.

In short, I'm just nitpicking minor technicalities. Don't mind me. :)

#10 @rmccue
14 years ago

Related: #10033

#11 @ryan
14 years ago

  • Milestone changed from 3.0 to 3.1

#12 @nacin
13 years ago

  • Milestone changed from Awaiting Triage to Future Release

No patch. Future.

#13 @norbertm
13 years ago

  • Cc norbert@… added

Submitting patch & unit test to #4539.

#14 @azaozz
13 years ago

On production sites HTML comments in the post_content seem to be extremely rare, commented HTML i.e. the example above, are virtually non existent. Currently wptexturize() supports simple HTML comments properly:


works as expected.

The only user case for supporting commented HTML seems to be when a plugin developer wants to test the plugin's output. Don't think it's worth it adding some redundant regexp that will run on hundreds of millions of posts every day just for that. Perhaps better would be to add to the plugin developers part of the codex that commented HTML is not supported in the post content.

Suggesting: wontfix.

#15 @strider72
13 years ago

Responding to @azaozz:

I have a plugin that is affected by this in some cases. It's been an outstanding bug for a long time now, and I can't fix it within the plugin itself. See the Graceful Pull-Quotes plugin. Users can input an "alternate" text within an HTML comment, but with this bug they can't use tags at all. A similar (the same?) bug prevents HTML entities in comments -- they get malformed in the same way as tags. Non-English speakers contact me all the time with this.

#16 @strider72
13 years ago

I can also see situations where an author wants to temporarily "deactivate", but not delete, a chunk of text, and puts it in a comment. He better not have any tags in that text! So, yes I think this is a legitimate problem.

#17 @SergeyBiryukov
13 years ago

Related/duplicate: #16060

#18 @SergeyBiryukov
13 years ago

  • Keywords has-patch added

#19 @SergeyBiryukov
13 years ago

  • Milestone changed from Future Release to 3.3

Moving to 3.3, as #16060 was marked 3.2-early but didn't make it into 3.2.

13 years ago

The patch from #16060

#20 @SergeyBiryukov
13 years ago

  • Keywords needs-unit-tests added

#21 @SergeyBiryukov
13 years ago

  • Keywords needs-refresh added

Since [17636], the result for my example from #16060 is a bit different:

<!-- Sample list
	<li>Sample item</li>
</ul> -->


<!-- Sample list
	<li>Sample item</li>

Comment closing tag is preserved, but </ul> tag is missing.

Last edited 13 years ago by SergeyBiryukov (previous) (diff)

#22 @SergeyBiryukov
12 years ago

  • Milestone changed from 3.3 to Future Release

#23 @ericaentertainment
12 years ago

  • Keywords needs-refresh needs-unit-tests removed

I have a workaround. Just before the close comment tag, put another open comment tag. That second open comment tag will be commented out, but it forces the close tag to be recognized and not be reformatted as an en-dash.

Example that fails:
<!-- <b> text </b> -->

Example that works:
<!-- <b> text </b> <!-- -->

In other words, if you have comments that enclose other tags, just make sure the last tag is another open comment tag.

#24 @SergeyBiryukov
12 years ago

  • Keywords needs-refresh needs-unit-tests added

#25 @harrismw
12 years ago

  • Cc harrismw added
  • Version changed from 2.9.2 to 3.3.1

Still existing in the latest version of WP (3.3.1) which I just updated to from 2.9.something.

The code which sets it off is:

<!-- <li><a href="/news-and-events">News &amp; Events</a>: A record of news and events related to philosophy of religion.</li>  -->

I think I must have edit core before to work around this bug. Which, funny thing, with the update ...

(Also, where's the "bloody annoying" severity level, I don't think it's "major", but it's certainly not "normal")

#26 @SergeyBiryukov
12 years ago

  • Version changed from 3.3.1 to 2.7

Version number indicates when the bug was initially introduced/reported.

#27 follow-up: @cgrymala
12 years ago

  • Cc curtiss@… added

I've come across a new wrinkle in this issue that I haven't seen mentioned yet. HTML comments around entire lines (or groups of lines) occasionally causes those lines to be deleted altogether (and the HTML structure to get messed up) when switching between the HTML editor and the Visual Editor.

Take the following code for example:

	<li>List Item 1</li>
	<li>List Item 2</li>
	<li>List Item 3</li>

Now, add an HTML comment like so:

	<li>List Item 1</li>
<!--	<li>List Item 2</li>-->
	<li>List Item 3</li>

Then, switch to the visual editor and switch back to the HTML editor. You're left with the following:

	<li>List Item 1</li>
	<li>List Item 3</li>

As another example, if you take the following HTML:

List Item 1

<strong>List Item 2</strong>

List Item 3

Then, add an HTML comment like:

List Item 1

<!--<strong>List Item 2</strong>-->

List Item 3

Then, switch back to the visual editor and back to the HTML editor again, and you're left with the following code in the HTML editor:

List Item 1


List Item 3

#28 in reply to: ↑ 27 @realj42
12 years ago

Replying to cgrymala:

I am also seeing this, furthermore, the comments are being completely removed, even if there are no tags inside. For example, the following text, set up for use with the 'Graceful Pull-Quotes' plugn

<span class="pullquote"><!--Joan Cuneos successes, against prominent male racers in the USA, led to women being banned--></span>

completely disappears after switcing to visual then back to html. This bug would appear to completely prohibit the use of hidden text for special formatting.

#30 @kraftbj
11 years ago

  • Cc bk@… added

#31 @Cimmo
11 years ago

Got a 'weird' bug on my plug-in "Cimy User Extra Fields", at the end is due to exactly this issue. It took me 1 hour of investigation to understand what was going on.

Thanks for the workaround, works great, but may break in the future. A real fix would be nice from WP developers, I know I can avoid HTML comments, but you can also avoid to mess with them :)

Thank you.

#32 @Cimmo
11 years ago

Actually my issue was even more subtle:
basically I have a html comment in my plugin with just the plugin's name, version and author, no html tags.

So how that triggered this bug?
When used on certain themes they were applying to the_content() first wpautop() function, that transformed all new lines into html break tags and then applied wptexturize().

So in that case, even if I had a totally harmless comment, but on multiple lines, the combination of the two functions made my "end of the html comment tag" changed and then all the page broke.

This the piece that is found in some themes like: "Modular" and "Emporium" for "WooCommerce":

# Format and append to content
$new_content .= wptexturize(wpautop($piece_intl));

To fix the problem I had to remove all "new lines" in my html comment.

Just to give the idea how a bug can be amplified by pure "luck" :)

#33 @nacin
10 years ago

  • Keywords needs-unit-tests removed

We already "disable" wptexturize() when we experience certain shortcodes and tags. If we added support for HTML comments — disable texturize when we see <!--, re-enable it after --> — I think that'd solve our problems and shouldn't be too much of a nightmare to implement.

An alternative, cheap fix would be to not do the '--' em-dash replacement if the next character is >. We'd never want an em-dash there.

[17636] made this patch stale.

Removing needs-unit-tests as some exist, though more would of course be nice.

10 years ago

#34 follow-up: @nacin
10 years ago

  • Keywords needs-refresh removed

8912.diff is a cheap patch. It simply does a lookahead for > before replacing -- with an en dash. So, the closing HTML comment is at least no longer broken. This only allows an assertion to pass, which I've broken off into a discrete test.

The bug still remains that inside HTML comments, things are still texturized. (As seen by the failed test.) Given the nature of HTML comments, this isn't a big deal, but it sure would be a nicer fix overall.

#35 @nacin
10 years ago

#26286 was marked as a duplicate.

#36 @nacin
10 years ago

  • Keywords wptexturize added

#37 in reply to: ↑ 34 @miqrogroove
10 years ago

Replying to nacin:

It simply does a lookahead for > before replacing --

Does WP allow any space between those? I believe it's permitted in HTML.

#38 @miqrogroove
10 years ago

8912.diff would break xn-- which is magical.

Let's visit this ticket only after closing #23185 to avoid confusion.

#39 @miqrogroove
10 years ago

Alternatively, we could roll this up with #12690 because there is no reason to texturize inside HTML comments.

#40 @wonderboymusic
10 years ago

  • Owner set to wonderboymusic
  • Resolution set to fixed
  • Status changed from reopened to closed

In 28727:

In wptexturize(), ensure that texturization does not corrupt contents of HTML elements, HTML comments, and smartcode attributes.

Adds a variety of unit tests/assertions.

Props miqrogroove.
Fixes #12690, #8912, #27602.

#41 @wonderboymusic
10 years ago

  • Milestone changed from Future Release to 4.0
Note: See TracTickets for help on using tickets.