Opened 4 years ago

Last modified 2 months ago

#8912 reopened defect (bug)

wptexturize malforms HTML comments that contain HTML tags

Reported by: Otto42 Owned by: anonymous
Priority: normal Milestone: Future Release
Component: Formatting Version: 2.7
Severity: normal Keywords: has-patch needs-refresh needs-unit-tests
Cc: norbert@…, harrismw, curtiss@…, bk@…

Description

Because wptexturize replaces -- with &#8211; (an en dash), a comment like <!-- whatever --> put into the HTML part of a post gets broken.

This makes it difficult for people writing special HTML in posts (like object tags, JavaScript, or whatever) to do that sort of thing.

What is needed is to recognize --> as different from -- and not replace it with the en dash in that case.
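A minimal sketch of the approach proposed above. It assumes the dash conversion can be modeled as a plain regex substitution on text outside tags; `replace_dashes` is a hypothetical helper, not a WordPress function. A negative lookahead skips `--` when it is immediately followed by `>`:

```python
import re

# Hypothetical helper (not WordPress API): convert "--" to the &#8211;
# entity, except when the "--" is immediately followed by ">" as in "-->".
DASH_NOT_COMMENT_CLOSE = re.compile(r"--(?!>)")


def replace_dashes(text: str) -> str:
    """Replace -- with &#8211; unless it is part of a '-->' sequence."""
    return DASH_NOT_COMMENT_CLOSE.sub("&#8211;", text)


print(replace_dashes("pages 10--20"))      # pages 10&#8211;20
print(replace_dashes("Goodbye.</li>-->"))  # Goodbye.</li>-->
```

The sketch only looks one character ahead. It does not check that a matching `<!--` was opened earlier, so a literal `-->` in ordinary prose would also be left unconverted.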

Attachments (1)

8912.patch (589 bytes) - added by SergeyBiryukov 21 months ago.
The patch from #16060

Change History (31)

  • Resolution set to worksforme
  • Status changed from new to closed

Just tested with trunk. No problems.

  • Milestone 2.8 deleted

Note: Visual mode doesn't allow HTML comments; the < and > characters get converted to &lt; and &gt;, and so on.

But in HTML mode, comments work fine in 2.9.2, at least.

  • Component changed from General to Formatting
  • Milestone set to 3.0
  • Resolution worksforme deleted
  • Status changed from closed to reopened
  • Version changed from 2.7 to 2.9.2

Have a confirmed failure.

This code fails:

<ul><li>Hello.</li><!--<li>Goodbye.</li>--></ul>

It gets converted to this:

<ul>
<li>Hello.</li>
<p><!--
<li>Goodbye.</li>
<p>&#8211;></ul>

Happens on a default installation.

You're including extra filters in that, Otto.

Here's what wptexturize() returns for that example string:


<ul><li>Hello.</li><!--<li>Goodbye.</li>&#8211;></ul>


  • Summary changed from wp_texturize breaks HTML comments in posts to wptexturize malforms HTML comments that contain HTML tags

BTW, it would have been better to open a new ticket on this as this is a separate issue. ;)

comment:8 follow-up: ↓ 9   otto42, 3 years ago

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

comment:9 in reply to: ↑ 8   Viper007Bond, 3 years ago

Replying to otto42:

How is it a separate issue? It does exactly what I said it did last year. The comment got converted to the 8211 incorrectly. It's the same issue.

<!-- foobar --> works though, right? That was the original issue.

The reason -- is being converted to &#8211; in your example above is that the HTML tags in the comment break wptexturize()'s comment detection (it doesn't realize it's an HTML comment). It's totally a valid issue, but it's a bug in the HTML comment detection code rather than a problem with all HTML comments.
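To make that failure mode concrete, here is a simplified model. It is an assumption, not WordPress's actual code. The texturizer splits the content on a lazy `<.*?>` pattern, passes each match through as a tag, and converts `--` to `&#8211;` only in the text between tags. Because the lazy match ends a "tag" at the first `>`, `<!--<li>` is taken as one tag and the trailing `-->` lands in a text chunk. The second pattern is an illustrative variant that tries to match a whole `<!-- ... -->` comment before a plain tag:

```python
import re

# Simplified model (an assumption, not WordPress source): split on anything
# that looks like a tag, capturing the delimiters so they can be re-joined.
NAIVE_TOKENS = re.compile(r"(<.*?>)", re.S)
# Illustrative variant: consume a whole <!-- ... --> comment before a plain tag.
COMMENT_AWARE_TOKENS = re.compile(r"(<!--.*?-->|<.*?>)", re.S)


def texturize(text: str, tokens: re.Pattern) -> str:
    parts = tokens.split(text)
    out = []
    for i, part in enumerate(parts):
        # With one capturing group, re.split puts the matched "tags" at odd
        # indices and the plain text between them at even indices.
        if i % 2 == 1:
            out.append(part)  # tags pass through untouched
        else:
            out.append(part.replace("--", "&#8211;"))
    return "".join(out)


sample = "<ul><li>Hello.</li><!--<li>Goodbye.</li>--></ul>"
print(texturize(sample, NAIVE_TOKENS))
# <ul><li>Hello.</li><!--<li>Goodbye.</li>&#8211;></ul>
print(texturize(sample, COMMENT_AWARE_TOKENS))
# <ul><li>Hello.</li><!--<li>Goodbye.</li>--></ul>
```

The same model also explains why `<!-- foobar -->` survives. With no `>` inside the comment, the first `>` the lazy match reaches is the one in `-->`, so the whole comment is treated as a single tag.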

In short, I'm just nitpicking minor technicalities. Don't mind me. :)

Related: #10033

  • Milestone changed from 3.0 to 3.1
  • Milestone changed from Awaiting Triage to Future Release

No patch. Future.

  • Cc norbert@… added

Submitting patch & unit test to #4539.

On production sites, HTML comments in post_content seem to be extremely rare, and commented-out HTML (i.e. the example above) is virtually nonexistent. Currently wptexturize() handles simple HTML comments properly:

<ul><li>Hello.</li><!--Goodbye.--></ul>

works as expected.

The only use case for supporting commented-out HTML seems to be when a plugin developer wants to test the plugin's output. I don't think it's worth adding a redundant regexp that will run on hundreds of millions of posts every day just for that. It would perhaps be better to note in the plugin developers' part of the Codex that commented-out HTML is not supported in post content.

Suggesting: wontfix.

Responding to @azaozz:

I have a plugin that is affected by this in some cases. It's been an outstanding bug for a long time now, and I can't fix it within the plugin itself. See the Graceful Pull-Quotes plugin. Users can input an "alternate" text within an HTML comment, but with this bug they can't use tags at all. A similar (the same?) bug prevents HTML entities in comments -- they get malformed in the same way as tags. Non-English speakers contact me about this all the time.

I can also see situations where an author wants to temporarily "deactivate", but not delete, a chunk of text, and puts it in a comment. He had better not have any tags in that text! So yes, I think this is a legitimate problem.

Related/duplicate: #16060

  • Keywords has-patch added
  • Milestone changed from Future Release to 3.3

Moving to 3.3, as #16060 was marked 3.2-early but didn't make it into 3.2.

The patch from #16060

  • Keywords needs-unit-tests added
  • Keywords needs-refresh added

Since [17636], the result for my example from #16060 is a bit different:

<!-- Sample list
<ul>
	<li>Sample item</li>
</ul> -->

Output:

<!-- Sample list
<ul>
	<li>Sample item</li>
 -->

The comment's closing --> is preserved, but the </ul> tag is missing.

Version 0, edited 20 months ago by SergeyBiryukov
  • Milestone changed from 3.3 to Future Release
  • Keywords needs-refresh needs-unit-tests removed

I have a workaround. Just before the close comment tag, put another open comment tag. That second open comment tag will be commented out, but it forces the close tag to be recognized and not be reformatted as an en-dash.

Example that fails:
<!-- <b> text </b> -->

Example that works:
<!-- <b> text </b> <!-- -->

In other words, if you have comments that enclose other tags, just make sure the last tag is another open comment tag.
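The workaround can be checked against a simplified tokenizer model. This is an assumption, not WordPress's actual code: a lazy `<.*?>` split, with `--` converted only in the text between "tags". The extra `<!--` starts a new pseudo-tag whose first `>` is the one in the real `-->`, so the closer ends up inside a tag chunk instead of a text chunk:

```python
import re

# Assumed, simplified tokenizer: a lazy match ends a "tag" at the first ">".
TAG = re.compile(r"(<.*?>)", re.S)


def naive_texturize(text: str) -> str:
    parts = TAG.split(text)
    # Odd indices are captured tags (left alone); even indices are text.
    return "".join(
        part if i % 2 else part.replace("--", "&#8211;")
        for i, part in enumerate(parts)
    )


fails = "<!-- <b> text </b> -->"
works = "<!-- <b> text </b> <!-- -->"
print(naive_texturize(fails))  # <!-- <b> text </b> &#8211;>
print(naive_texturize(works))  # <!-- <b> text </b> <!-- -->
```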

  • Keywords needs-refresh needs-unit-tests added
  • Cc harrismw added
  • Version changed from 2.9.2 to 3.3.1

Still present in the latest version of WP (3.3.1), which I just updated to from 2.9.something.

The code which sets it off is:

<!-- <li><a href="/news-and-events">News &amp; Events</a>: A record of news and events related to philosophy of religion.</li>  -->

I think I must have edited core before to work around this bug. Which, funny thing, with the update ...

(Also, where's the "bloody annoying" severity level? I don't think it's "major", but it's certainly not "normal".)

  • Version changed from 3.3.1 to 2.7

Version number indicates when the bug was initially introduced/reported.

comment:27 follow-up: ↓ 28   cgrymala, 14 months ago

  • Cc curtiss@… added

I've come across a new wrinkle in this issue that I haven't seen mentioned yet. HTML comments around entire lines (or groups of lines) occasionally cause those lines to be deleted altogether (and the HTML structure to get messed up) when switching between the HTML editor and the Visual Editor.

Take the following code for example:

<ol>
	<li>List Item 1</li>
	<li>List Item 2</li>
	<li>List Item 3</li>
</ol>

Now, add an HTML comment like so:

<ol>
	<li>List Item 1</li>
<!--	<li>List Item 2</li>-->
	<li>List Item 3</li>
</ol>

Then, switch to the visual editor and switch back to the HTML editor. You're left with the following:

<ol>
<ol>
	<li>List Item 1</li>
</ol>
</ol>
&nbsp;
<ol>
	<li>List Item 3</li>
</ol>

As another example, if you take the following HTML:

List Item 1

<strong>List Item 2</strong>

List Item 3

Then, add an HTML comment like:

List Item 1

<!--<strong>List Item 2</strong>-->

List Item 3

Then, switch back to the visual editor and back to the HTML editor again, and you're left with the following code in the HTML editor:

List Item 1

&nbsp;

List Item 3

comment:28 in reply to: ↑ 27   realj4212 months ago

Replying to cgrymala:

I am also seeing this. Furthermore, the comments are being removed completely, even if there are no tags inside. For example, the following text, set up for use with the 'Graceful Pull-Quotes' plugin,

<span class="pullquote"><!--Joan Cuneos successes, against prominent male racers in the USA, led to women being banned--></span>

completely disappears after switching to Visual and then back to HTML. This bug appears to completely prohibit the use of hidden text for special formatting.

  • Cc bk@… added