WordPress.org

Make WordPress Core

Opened 7 years ago

Last modified 12 days ago

#3670 new defect (bug)

Removing CDATA close tag ( ]]> ) unbalances the CDATA block

Reported by: scenic Owned by: andy
Milestone: Future Release Priority: normal
Severity: minor Version: 2.1
Component: Posts, Post Types Keywords: has-patch needs-refresh needs-unit-tests
Focuses: template Cc:

Description

I'm
not sure if this is a bug per se, but it breaks included JS in my
posts that are escaped with a CDATA section. I've basically commented
out the following line in the_content() every time I upgrade:

$content = str_replace(']]>', ']]>', $content);

A user on the wp-testers list indicated that this was part of making sure that included CDATA blocks didn't break RSS feeds. I don't use CDATA sections directly in my RSS feeds, so I haven't tested this. In my particular case, the JS is used to embed Flash movies (e.g. YouTube videos) in an XHTML compliant way (without embed tags). I have a custom plugin I've written that strips out the JS and replaces it with the embed tag in the RSS feed.

Perhaps we should use a flag to activate this when the app is going through a feed. Ideally, though, it would simply be removed. Odds are that the CDATA block is being used for a JS block in a post body, and since most RSS aggregators don't allow JS by default, it would be safe to simply remove CDATA blogs if is_feed() is set.

Attachments (6)

3670.001.diff (1.6 KB) - added by AaronCampbell 6 years ago.
3670.diff (665 bytes) - added by Denis-de-Bernardy 4 years ago.
conditionally disable CDATA blocks
3670.2.diff (2.8 KB) - added by ryan 4 years ago.
Escape cdata close via filter on the_content
post-template.php.diff (544 bytes) - added by shidouhikari 4 years ago.
Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files
3670.3.diff (2.9 KB) - added by nacin 3 years ago.
Introduce escape_cdata_close()
3670-I.diff (2.9 KB) - added by andy 3 years ago.
Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.

Download all attachments as: .zip

Change History (73)

comment:1 scenic7 years ago

  • Version set to 2.1

comment:2 filosofo7 years ago

Couldn't we just make this a filter that is applied for RSS feeds but not e.g. admins, like kses?

comment:3 andy7 years ago

I was the one who pointed out the reason for that line. The JS-replacement plugin is probably saving your feeds from the problem that line was added to fix.

I think we can come up with a better solution for this. If there is a good reason to wrap some of your post content as CDATA, perhaps you would be better served by adding a filter to your plugin to wrap the_content as CDATA.

Has anyone else run into this?

comment:4 alexrabe7 years ago

I would also appreciate an update to make difference between feed / content, my plugin has exact the same problem, cause when I want the XHTML validation, I need the CDATA implementation, cause the "socalled" flashvars for a Flash video plugin are diveded by ampersand's. At the moment it's possible to use <!-- around CDATA to covers the CDATA problematic in the conent, but it seems to be deprecate from W3C.

So more and More Flash integration trap into this problematic

comment:5 scenic7 years ago

Andy, I agree that the filter is saving me from needing that line. But, embedded JS is a valid thing to do in the content. That replace function should only really happen when we are processing a feed. The global is_feed() (I think) method should just wrap that line at a minimum.

comment:6 andy7 years ago

  • Keywords dev-feedback added

Checking is_feed does not get to the root of the problem, which is that we are trying to nest CDATA sections where we should not. I don't know enough about XML but I think we ought to stop using CDATA markers in feed templates and use a function that can intelligently wrap any content in CDATA as needed, removing any nested markers.

Is there really any reason to encode them rather than simply remove them?

So what I propose, and what should be seen by somebody with XML chops, is this unintelligent function:

function wp_cdata($data) {

Maybe test for conditions necessitating
CDATA markers and return if no need

$data = str_replace('<![CDATA[', , $data);
$data = str_replace(']]>',
, $data);

Sometimes a trailing square bracket
can mess up XML parsers:
if ( substr($data, -1) == ']' )

$data .= ' ';

return "<![CDATA[$data]]>';

}

comment:7 andy7 years ago

function wp_cdata($data) {
 // Maybe test for conditions necessitating
 // CDATA markers and return if no need

 $data = str_replace('<![CDATA[', '', $data);
 $data = str_replace(']]>', '', $data);

 // Sometimes a trailing square bracket
 // can mess up XML parsers:
 if ( substr($data, -1) == ']' )
  $data .= ' ';

 return "<![CDATA[$data]]>';
}

comment:8 tombarta7 years ago

Does wordpress support PHP < 4.0.5? If not, you could use arrays in str_replace:

$data = str_replace(array('<![CDATA[', ']]>'), array('', ''), $data);

comment:9 Nazgul7 years ago

  • Milestone changed from 2.1.1 to 2.1.2

comment:10 foolswisdom7 years ago

  • Milestone changed from 2.1.3 to 2.3

comment:11 tellyworth7 years ago

Not sure how helpful this is, but it is kind of possible to nest CDATA sections. Technically I suppose it's double-encoding rather than nesting, but regardless, this is 100% valid and should work with any compliant XML parser:

<![CDATA foo <![CDATA bar ]]]><![CDATA[]> baz ]]>

It's actually two separate CDATA sections concatenated together.

When decoded by an XML parser, this becomes:

foo <![CDATA bar ]]> baz

Here's a PHP snippet to do it:

return '<![CDATA.str_replace(?]>', ']]]><![CDATA[]>', $str).']]>';

comment:12 follow-up: tellyworth7 years ago

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

comment:13 markjaquith7 years ago

  • Milestone changed from 2.3 to 2.4 (next)

Any updates, Andy?

comment:14 lloydbudd6 years ago

  • Milestone changed from 2.5 to 2.6

comment:15 Nazgul6 years ago

  • Owner changed from anonymous to andy

comment:16 AaronCampbell6 years ago

I'm still a little unclear as to the purpose of this line of code. I keep running into this problem, so I'd like to see a fix (I'd be happy to submit a patch), but I don't understand why we are keeping it at all. If it's just needed for feeds, lets limit it to affecting feeds. If someone could clearly explain to me why it's here (when it's needed), I'll put a patch together.

AaronCampbell6 years ago

comment:17 AaronCampbell6 years ago

  • Component changed from Administration to Template
  • Keywords has-patch needs-testing added
  • Milestone changed from 2.6 to 2.5.1
  • Priority changed from low to normal
  • Severity changed from normal to major

This ticket has sat here forever with nothing getting done, so I added a patch to get the ball rolling. The patch will fix two problems. It will fix one expressed above by only replacing ]]> with ]]&gt; when it is processing a feed.

There was another problem because we so often use:

//<![CDATA[
alert('test');
//]]>

and strip_tags() leaves the leading there, which looks ugly in the feeds as well. I added a simple statement in two places to handle that:
$text = preg_replace('|//\s*<!\[CDATA\[|', '<![CDATA[', $text);

comment:18 ryan6 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

comment:19 Viper007Bond6 years ago

Bumpy bump.

comment:20 Viper007Bond6 years ago

  • Milestone changed from 2.9 to 2.7

comment:21 jacobsantos6 years ago

Seems to be a big enough issue that this ticket should go in.

comment:22 ryan5 years ago

  • Milestone changed from 2.7 to 2.8

Postponing yet again. Patch looked like a good candidate for early 2.8 so we can get plenty of testing on it.

comment:23 lloydbudd5 years ago

  • Keywords early added

comment:24 Denis-de-Bernardy5 years ago

might be worth committing to give it a shot?

comment:25 follow-up: Denis-de-Bernardy5 years ago

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

comment:26 Denis-de-Bernardy5 years ago

  • Keywords dev-feedback removed

let's try to get this into 2.9 early

comment:27 Denis-de-Bernardy5 years ago

  • Milestone changed from 2.8 to 2.9

comment:28 Denis-de-Bernardy5 years ago

  • Keywords needs-unit-tests added
  • Severity changed from major to normal

comment:29 in reply to: ↑ 25 hakre5 years ago

Replying to Denis-de-Bernardy:

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

For shure, everything else is pretty stupid.

comment:30 follow-up: kirkpatrick4 years ago

  • Cc kirkpatrick added

The substitution of the right bracket in the CDATA closing,

$excerpt = str_replace(']]>', ']]&gt;', $excerpt);

occurs in a number of places in WordPress, all in wp-includes:

comment.php Line 1355


feed.php Line 191

formatting.php Line 1734

post-template.php Line 168

So this substitution seems to be desired for reasons other than "protecting feeds". Perhaps security against javascript-driven xml injection attacks? (It would be useful if the gods of WordPress would tell us the meaning of all this.)

Anyway, as it is, a page written by a plugin that creates javascript that includes strings of html will not validate as xhtml. Since we can (presumably) trust plugins, it should be ok to fix this: just reverse the order of lines 167 and 168 in post-template.php, so the filters (potentially set by plugins) are applied after the replacement:
Change

	$content = apply_filters('the_content', $content);
	$content = str_replace(']]>', ']]&gt;', $content);

to

	$content = str_replace(']]>', ']]&gt;', $content);
	$content = apply_filters('the_content', $content);

This way, any CDATA in the post (as stored in the database) will lose the CDATA, as seems to be desired, but the plugin can still operate properly and validate.

The same reversal should also be done (my application doesn't need these, but ...) in comment.php (line 1355 to above 1351) and formatting.php (exchange 1733 and 1734).

This is a minimal change, and I believe makes more sense than the present code. The post content is filtered, but then the plugin is allowed to do its work.

Of course, this fix does not meet the needs of the earlier change posters, who wish to post javascript hidden with CDATA. I don't think I support this for standard WordPress.

comment:31 in reply to: ↑ 30 Denis-de-Bernardy4 years ago

Replying to kirkpatrick:

So this substitution seems to be desired for reasons other than "protecting feeds"?

Not according to the changeset that introduced it (r1039).

Here's what the file looked like way back then:

http://core.trac.wordpress.org/browser/trunk/wp-includes/template-functions-post.php?rev=1039

Denis-de-Bernardy4 years ago

conditionally disable CDATA blocks

comment:32 Denis-de-Bernardy4 years ago

  • Cc josephscott added

I've added a patch, but I'm not 100% that it won't break the Atom API. Can anyone give it a shot?

along the same line of bugs, there is #9992.

comment:33 westi4 years ago

  • Cc westi added

Would it not be better to move the escaping of the closing CDATA into a function on the the_content filter which can then itself check is_feed() and also if necessary easily be removed by a plugin?

However as has been said before we don't want to change this too much close to release maybe we should just move the current code to a filter function for now but with no conditional check.


comment:34 markjaquith4 years ago

Moving it to a filter sounds like an acceptable solution this close to RC.

ryan4 years ago

Escape cdata close via filter on the_content

comment:35 ryan4 years ago

Patch moves escaping to a filter, although I'm tempted to punt once again and make the decision in 3.0 to finally fix this thing correctly. If we only need this for feeds, do it in get_the_content_feed() and remove it from everywhere else.

comment:36 markjaquith4 years ago

I'm okay with that too. We can wait until 3.0 and fix it once.

comment:37 ryan4 years ago

  • Milestone changed from 2.9 to 3.0

comment:38 hakre4 years ago

I just stumbeled over that line of code today and wondered a bit. Isn't there a proper way to escape the CDATA end sequence(here named close tag)? I think it is but I was not able to find the propper way and it's getting late.

comment:39 hakre4 years ago

Replacement with &gt; should work with useragents (since about HTML 2.0, this is SGML related; this is 1996), so strictly spoken this bug-report is technically bogus. I do not want to speak against providing a patch here but anyone having this problem should check why this is creating problems properly before adding a fix by removing an upcomming filter or uncommenting the current line in code. It looks to me like the feeds useragent is not proplery following the standards.

comment:40 hakre4 years ago

I suggest to close as invalid.

comment:41 hakre4 years ago

As in #9992 tendencies are to name it what it is: an invalid report. Any objections?

comment:42 Denis-de-Bernardy4 years ago

yes, one: for performance sake, is_feed() is faster than and uneeded call to str_replace(). :-)

comment:43 shidouhikari4 years ago

  • Cc shidouhikari added
  • Milestone changed from 3.0 to 2.9.2
  • Priority changed from normal to high
  • Severity changed from normal to major

Why call it invalid report?

Plugins should be allowed to add javascript into posts, and to do so they need to wrap the code in CDATA to keep XHTML valid.

If it is being done just for RSS sake, do it only when the content is feeded at all.

2 solutions were suggested, 3 years passed, and the code is still there without anybody defending it. If it must be there somebody should point why. If not, move it to a filter with high hook priority and let each website owner / plugin developer deal with it.

shidouhikari4 years ago

Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files

comment:44 shidouhikari4 years ago

With this patch, nothing changes if no filter is hooked, while anybody can take responsibility over his own sites for mystic consequences of leaving CDATA inside posts :)

If somebody decides to hook a filter that lets CDATA lives, and eventually faces a bug, then the cause will be the filter and not Wordpress core itself. Lets us have control of our code.

comment:45 nacin4 years ago

  • Milestone changed from 2.9.3 to 3.0

comment:46 follow-up: dd324 years ago

How about we move the Cdata striping to a function hooked on the filter in question?

We can set the priority to (for example) 100, any filters that require to run afterwards, hook later, all other hooks hook first and get their result affected later down.

Otherwise, due to lack of traction and clear solution here, and my lack of wanting to change the filter order (as it has been for a few releases) this late in the game.. this'd get pushed to the next milestone.

And now i read the higher comments, and see that rboren has suggested exactly this. rboren, what do you say, Commit or deal with in the next release?

comment:47 in reply to: ↑ 46 aaroncampbell4 years ago

Replying to dd32:
Unfortunately, it's been so long that I can barely remember the issue. Reading my previous posts here, it seems like moving the CDATA stripping to a filter would be enough to fix it. I'd be fine with that solution.

comment:48 technosailor4 years ago

Ryans patch still applies cleanly. Let's commit and close this ticket for once and for all and work down the 3.0 ticket load :)

comment:49 shidouhikari4 years ago

  • Keywords needs-testing needs-unit-tests removed

I also don't see how Ryan's patch could break something. It just passed the hardcoded code to a filter. If nobody touches that filter it will remain doing exactally the same.

If a plugin uses a filter with higher priority than 20, he can just remove and add the filter again. Anybody that doesn't want CDATA being escaped can remove the filter and be responsible to anything bad it may cause on his site.

It's a perfect solution that will satisfy everybody.

comment:50 shidouhikari4 years ago

I've been using Wordpress in 2 sites, in their production and in development environments, having this CDATA replacement totally removed. And in all of them I have a plugin that adds CDATA escaped JavaScript to the_content.

I never felt anything bad and don't miss it in anyway in my use.

comment:51 nacin4 years ago

  • Milestone changed from 3.0 to 3.1

I would add the early keyword, but it's been there for 18 months. Let's actually handle this early 3.1.

comment:52 shidouhikari4 years ago

(I'm not complaining, just asking to understand it.)

What blocks this ticket from being commited? Are the suggested solutions not secure? waiting for a metter one to emerge?

comment:53 shidouhikari4 years ago

I've been using 2 Wordpress sites without this ]]> replace line, and they've been working great. No XHTML 1.1 error and no problem in feeds.

Comments feeds work great, I even have a plugin that tweaks the document and tested it a lot while developing and no problem, feedburner also showing my posts fine.

I have a plugin that adds JS on every link in posts and comments and they are working fine too.

comment:54 nacin3 years ago

  • Milestone changed from Awaiting Triage to Future Release

comment:55 in reply to: ↑ 12 andy3 years ago

Replying to tellyworth:

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

This is correct. I have been running a high volume of posts through a strict XML parser and this is how I have been escaping CDATA sections. It splits the closing tag between two adjacent CDATA sections.

nacin3 years ago

Introduce escape_cdata_close()

comment:56 nacin3 years ago

  • Keywords early removed

Attached patch introduces escape_cdata_close() per comments by andy and tellyworth.

This could probably be moved to a filter as well, the way ryan's patch was implemented.

andy3 years ago

Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.

comment:57 andy3 years ago

  • Priority changed from high to normal
  • Severity changed from major to minor

CDATA wrappers should be removed from all templates and applied via filters using new functions. There are some complications.

A quick search for 'the_title_rss' (e.g.) finds that filter tag used in wp-admin/includes/export.php, which reminds that we use feed-related functions for more than just feeds. Any changes to feed-related filters must be tested with export/import.

The same search also finds the template tag the_title_rss() in feed-rss2.php and feed-atom.php, but only CDATA-wrapped in the latter. Universal use of CDATA tags to guarantee XML validity is of dubious merit. However, if we are going to do it we should be consistent.

3670-I.diff is the first half of the fix. It will break some feeds (rare) so it should not be applied by itself. The second half, installing new CDATA filters, touches a lot of code. Please volunteer.

comment:58 lgedeon2 years ago

  • Cc luke.gedeon@… added

comment:59 mpressly6 months ago

It's been 3 years since this bug was updated. On two separate websites, I am currently having to modify the core to allow inclusion of Javascript after each Wordpress upgrade due to this issue. Could the code in question safely be moved to a filter so that it can be overridden as needed without having to modify the core?

comment:60 yitwail4 months ago

I had to take the approach of filtering plugin output to remove javascript with embedded CDATA tags, and make it an external javascript file I'm enqueing. This creates extra code, an extra server request, and possible issues if the plugin is updated, but seems safer than updating the core. I sincerely hope someone sees fit to addressing this long-standing issue.

comment:61 nacin2 months ago

  • Component changed from Template to Posts, Post Types
  • Focuses template added

comment:62 in reply to: ↑ description awelzel2 months ago

Replying to scenic:

I also suggest to make this a FILTER for RSS!

It is just stupid to do this replacement without a filter. Why?

I will open a new ticket for 3.8.1 - hopefully someone will notice it then.

comment:63 SergeyBiryukov2 months ago

#27134 was marked as a duplicate.

comment:64 SergeyBiryukov2 months ago

  • Milestone changed from Future Release to 3.9

comment:65 nacin3 weeks ago

  • Keywords needs-refresh needs-unit-tests added
  • Milestone changed from 3.9 to Future Release

This needs to be moved again. I think 3670-I.diff is good, but we need to refresh it, write unit tests, and confirm it's the proper fix.

comment:66 alekv3 weeks ago

Hi

I very much would appreciate if you could sort out this bug. I've developed a plugin for WooCommerce and AdWords which needs to insert the AdWords tracking code into the /checkout/order-received/ page. The plugin is getting more and more popular. Unfortunately the AdWords script uses the CDATA tags and breaks because of this bug. I have a workaround in place but would prefer a proper solution.

http://wordpress.org/plugins/woocommerce-google-adwords-conversion-tracking-tag/

Thanks and kind regards
Aleks

comment:67 iapassas12 days ago

Hello,
I am facing the same problem trying to embed a google charts java script in a post.
Interesting though the > is not changed all the time.
1) While embedding the following code is working properly

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript"><!--//--><![CDATA[//><!--
     google.load('visualization', '1', {'packages': ['geochart']});
     google.setOnLoadCallback(drawRegionsMap);
      function drawRegionsMap() {
        var data = google.visualization.arrayToDataTable([
          ['Country', 'Popularity'],
          ['Germany', 200],
          ['United States', 300],
          ['Brazil', 400],
          ['Canada', 500],
          ['France', 600],
          ['RU', 700]
        ]);
        var options = {};
        var chart = new google.visualization.GeoChart(document.getElementById('chart_div'));
        chart.draw(data, options);
    };
 //--><!]]></script>
<div id="chart_div" style="width: 900px; height: 500px;"></div>

2)There is an issue in the following

<script type="text/javascript" src="//ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"><!--//--><![CDATA[//><!--
{"dataSourceUrl":"//docs.google.com/spreadsheet/tq?key=0Au97PknsuEKSdDAxWXktZTJKUUJZR2Fmci13UnFUMHc&transpose=1&headers=0&range=A1%3AC2&gid=1&pub=1","options":{"titleTextStyle":{"bold":true,"color":"#000","fontSize":"12"},"vAxes":[{"title":"Left vertical axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},{"useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null}],"pieHole":0,"title":"\u03a0\u03cc\u03c3\u03bf \u03c3\u03b7\u03bc\u03b1\u03bd\u03c4\u03b9\u03ba\u03cc \u03c1\u03cc\u03bb\u03bf \u03c0\u03b1\u03af\u03b6\u03b5\u03b9 \u03ba\u03b1\u03c4\u03ac \u03c4\u03b7 \u03b3\u03bd\u03ce\u03bc\u03b7 \u03c3\u03b1\u03c2, \u03b7 \u03c7\u03c1\u03ae\u03c3\u03b7 \u03c0\u03c1\u03c9\u03c4\u03bf\u03b3\u03b5\u03bd\u03ce\u03bd \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd \u03c9\u03c2 \u03ba\u03cd\u03c1\u03b9\u03b1 \u03c0\u03b7\u03b3\u03ae \u03c3\u03c4\u03b7 \u03b4\u03b7\u03bc\u03bf\u03c3\u03b9\u03bf\u03b3\u03c1\u03b1\u03c6\u03af\u03b1:","booleanRole":"certainty","height":371,"animation":{"duration":0},"colors":["#3366CC","#DC3912","#FF9900","#109618","#990099","#0099C6","#DD4477","#66AA00","#B82E2E","#316395","#994499","#22AA99","#AAAA11","#6633CC","#E67300","#8B0707","#651067","#329262","#5574A6","#3B3EAC","#B77322","#16D620","#B91383","#F4359E","#9C5935","#A9C413","#2A778D","#668D1C","#BEA413","#0C5922","#743411"],"width":600,"is3D":false,"hAxis":{"title":"Horizontal axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},"tooltip":{}},"state":{},"view":{},"isDefaultVisualization":false,"chartType":"PieChart","chartName":"Chart 2"}
//--><!]]></script>

I hope the above might be a help and resolve the issue.

Thank you for handling this issue

Note: See TracTickets for help on using tickets.