WordPress.org

Make WordPress Core

Opened 8 years ago

Last modified 3 months ago

#3670 new defect (bug)

Removing CDATA close tag ( ]]> ) unbalances the CDATA block

Reported by: scenic Owned by: andy
Milestone: Future Release Priority: normal
Severity: minor Version: 2.1
Component: Posts, Post Types Keywords: has-patch needs-unit-tests
Focuses: template Cc:

Description

I'm
not sure if this is a bug per se, but it breaks included JS in my
posts that are escaped with a CDATA section. I've basically commented
out the following line in the_content() every time I upgrade:

$content = str_replace(']]>', ']]>', $content);

A user on the wp-testers list indicated that this was part of making sure that included CDATA blocks didn't break RSS feeds. I don't use CDATA sections directly in my RSS feeds, so I haven't tested this. In my particular case, the JS is used to embed Flash movies (e.g. YouTube videos) in an XHTML compliant way (without embed tags). I have a custom plugin I've written that strips out the JS and replaces it with the embed tag in the RSS feed.

Perhaps we should use a flag to activate this when the app is going through a feed. Ideally, though, it would simply be removed. Odds are that the CDATA block is being used for a JS block in a post body, and since most RSS aggregators don't allow JS by default, it would be safe to simply remove CDATA blogs if is_feed() is set.

Attachments (7)

3670.001.diff (1.6 KB) - added by AaronCampbell 7 years ago.
3670.diff (665 bytes) - added by Denis-de-Bernardy 5 years ago.
conditionally disable CDATA blocks
3670.2.diff (2.8 KB) - added by ryan 5 years ago.
Escape cdata close via filter on the_content
post-template.php.diff (544 bytes) - added by shidouhikari 5 years ago.
Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files
3670.3.diff (2.9 KB) - added by nacin 4 years ago.
Introduce escape_cdata_close()
3670-I.diff (2.9 KB) - added by andy 4 years ago.
Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.
3670.4.diff (6.7 KB) - added by mdgl 3 months ago.
Refreshed and enhanced patch

Download all attachments as: .zip

Change History (94)

comment:1 @scenic8 years ago

  • Version set to 2.1

comment:2 @filosofo8 years ago

Couldn't we just make this a filter that is applied for RSS feeds but not e.g. admins, like kses?

comment:3 @andy8 years ago

I was the one who pointed out the reason for that line. The JS-replacement plugin is probably saving your feeds from the problem that line was added to fix.

I think we can come up with a better solution for this. If there is a good reason to wrap some of your post content as CDATA, perhaps you would be better served by adding a filter to your plugin to wrap the_content as CDATA.

Has anyone else run into this?

comment:4 @alexrabe8 years ago

I would also appreciate an update to make difference between feed / content, my plugin has exact the same problem, cause when I want the XHTML validation, I need the CDATA implementation, cause the "socalled" flashvars for a Flash video plugin are diveded by ampersand's. At the moment it's possible to use <!-- around CDATA to covers the CDATA problematic in the conent, but it seems to be deprecate from W3C.

So more and More Flash integration trap into this problematic

comment:5 @scenic8 years ago

Andy, I agree that the filter is saving me from needing that line. But, embedded JS is a valid thing to do in the content. That replace function should only really happen when we are processing a feed. The global is_feed() (I think) method should just wrap that line at a minimum.

comment:6 @andy8 years ago

  • Keywords dev-feedback added

Checking is_feed does not get to the root of the problem, which is that we are trying to nest CDATA sections where we should not. I don't know enough about XML but I think we ought to stop using CDATA markers in feed templates and use a function that can intelligently wrap any content in CDATA as needed, removing any nested markers.

Is there really any reason to encode them rather than simply remove them?

So what I propose, and what should be seen by somebody with XML chops, is this unintelligent function:

function wp_cdata($data) {

Maybe test for conditions necessitating
CDATA markers and return if no need

$data = str_replace('<![CDATA[', , $data);
$data = str_replace(']]>',
, $data);

Sometimes a trailing square bracket
can mess up XML parsers:
if ( substr($data, -1) == ']' )

$data .= ' ';

return "<![CDATA[$data]]>';

}

comment:7 @andy8 years ago

function wp_cdata($data) {
 // Maybe test for conditions necessitating
 // CDATA markers and return if no need

 $data = str_replace('<![CDATA[', '', $data);
 $data = str_replace(']]>', '', $data);

 // Sometimes a trailing square bracket
 // can mess up XML parsers:
 if ( substr($data, -1) == ']' )
  $data .= ' ';

 return "<![CDATA[$data]]>';
}

comment:8 @tombarta8 years ago

Does wordpress support PHP < 4.0.5? If not, you could use arrays in str_replace:

$data = str_replace(array('<![CDATA[', ']]>'), array('', ''), $data);

comment:9 @Nazgul8 years ago

  • Milestone changed from 2.1.1 to 2.1.2

comment:10 @foolswisdom8 years ago

  • Milestone changed from 2.1.3 to 2.3

comment:11 @tellyworth8 years ago

Not sure how helpful this is, but it is kind of possible to nest CDATA sections. Technically I suppose it's double-encoding rather than nesting, but regardless, this is 100% valid and should work with any compliant XML parser:

<![CDATA foo <![CDATA bar ]]]><![CDATA[]> baz ]]>

It's actually two separate CDATA sections concatenated together.

When decoded by an XML parser, this becomes:

foo <![CDATA bar ]]> baz

Here's a PHP snippet to do it:

return '<![CDATA.str_replace(?]>', ']]]><![CDATA[]>', $str).']]>';

comment:12 follow-up: @tellyworth8 years ago

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

comment:13 @markjaquith8 years ago

  • Milestone changed from 2.3 to 2.4 (next)

Any updates, Andy?

comment:14 @lloydbudd7 years ago

  • Milestone changed from 2.5 to 2.6

comment:15 @Nazgul7 years ago

  • Owner changed from anonymous to andy

comment:16 @AaronCampbell7 years ago

I'm still a little unclear as to the purpose of this line of code. I keep running into this problem, so I'd like to see a fix (I'd be happy to submit a patch), but I don't understand why we are keeping it at all. If it's just needed for feeds, lets limit it to affecting feeds. If someone could clearly explain to me why it's here (when it's needed), I'll put a patch together.

@AaronCampbell7 years ago

comment:17 @AaronCampbell7 years ago

  • Component changed from Administration to Template
  • Keywords has-patch needs-testing added
  • Milestone changed from 2.6 to 2.5.1
  • Priority changed from low to normal
  • Severity changed from normal to major

This ticket has sat here forever with nothing getting done, so I added a patch to get the ball rolling. The patch will fix two problems. It will fix one expressed above by only replacing ]]> with ]]&gt; when it is processing a feed.

There was another problem because we so often use:

//<![CDATA[
alert('test');
//]]>

and strip_tags() leaves the leading there, which looks ugly in the feeds as well. I added a simple statement in two places to handle that:
$text = preg_replace('|//\s*<!\[CDATA\[|', '<![CDATA[', $text);

comment:18 @ryan7 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

comment:19 @Viper007Bond7 years ago

Bumpy bump.

comment:20 @Viper007Bond7 years ago

  • Milestone changed from 2.9 to 2.7

comment:21 @jacobsantos7 years ago

Seems to be a big enough issue that this ticket should go in.

comment:22 @ryan6 years ago

  • Milestone changed from 2.7 to 2.8

Postponing yet again. Patch looked like a good candidate for early 2.8 so we can get plenty of testing on it.

comment:23 @lloydbudd6 years ago

  • Keywords early added

comment:24 @Denis-de-Bernardy6 years ago

might be worth committing to give it a shot?

comment:25 follow-up: @Denis-de-Bernardy6 years ago

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

comment:26 @Denis-de-Bernardy6 years ago

  • Keywords dev-feedback removed

let's try to get this into 2.9 early

comment:27 @Denis-de-Bernardy6 years ago

  • Milestone changed from 2.8 to 2.9

comment:28 @Denis-de-Bernardy6 years ago

  • Keywords needs-unit-tests added
  • Severity changed from major to normal

comment:29 in reply to: ↑ 25 @hakre6 years ago

Replying to Denis-de-Bernardy:

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

For shure, everything else is pretty stupid.

comment:30 follow-up: @kirkpatrick5 years ago

  • Cc kirkpatrick added

The substitution of the right bracket in the CDATA closing,

$excerpt = str_replace(']]>', ']]&gt;', $excerpt);

occurs in a number of places in WordPress, all in wp-includes:

comment.php Line 1355


feed.php Line 191

formatting.php Line 1734

post-template.php Line 168

So this substitution seems to be desired for reasons other than "protecting feeds". Perhaps security against javascript-driven xml injection attacks? (It would be useful if the gods of WordPress would tell us the meaning of all this.)

Anyway, as it is, a page written by a plugin that creates javascript that includes strings of html will not validate as xhtml. Since we can (presumably) trust plugins, it should be ok to fix this: just reverse the order of lines 167 and 168 in post-template.php, so the filters (potentially set by plugins) are applied after the replacement:
Change

	$content = apply_filters('the_content', $content);
	$content = str_replace(']]>', ']]&gt;', $content);

to

	$content = str_replace(']]>', ']]&gt;', $content);
	$content = apply_filters('the_content', $content);

This way, any CDATA in the post (as stored in the database) will lose the CDATA, as seems to be desired, but the plugin can still operate properly and validate.

The same reversal should also be done (my application doesn't need these, but ...) in comment.php (line 1355 to above 1351) and formatting.php (exchange 1733 and 1734).

This is a minimal change, and I believe makes more sense than the present code. The post content is filtered, but then the plugin is allowed to do its work.

Of course, this fix does not meet the needs of the earlier change posters, who wish to post javascript hidden with CDATA. I don't think I support this for standard WordPress.

comment:31 in reply to: ↑ 30 @Denis-de-Bernardy5 years ago

Replying to kirkpatrick:

So this substitution seems to be desired for reasons other than "protecting feeds"?

Not according to the changeset that introduced it (r1039).

Here's what the file looked like way back then:

http://core.trac.wordpress.org/browser/trunk/wp-includes/template-functions-post.php?rev=1039

@Denis-de-Bernardy5 years ago

conditionally disable CDATA blocks

comment:32 @Denis-de-Bernardy5 years ago

  • Cc josephscott added

I've added a patch, but I'm not 100% that it won't break the Atom API. Can anyone give it a shot?

along the same line of bugs, there is #9992.

comment:33 @westi5 years ago

  • Cc westi added

Would it not be better to move the escaping of the closing CDATA into a function on the the_content filter which can then itself check is_feed() and also if necessary easily be removed by a plugin?

However as has been said before we don't want to change this too much close to release maybe we should just move the current code to a filter function for now but with no conditional check.


comment:34 @markjaquith5 years ago

Moving it to a filter sounds like an acceptable solution this close to RC.

@ryan5 years ago

Escape cdata close via filter on the_content

comment:35 @ryan5 years ago

Patch moves escaping to a filter, although I'm tempted to punt once again and make the decision in 3.0 to finally fix this thing correctly. If we only need this for feeds, do it in get_the_content_feed() and remove it from everywhere else.

comment:36 @markjaquith5 years ago

I'm okay with that too. We can wait until 3.0 and fix it once.

comment:37 @ryan5 years ago

  • Milestone changed from 2.9 to 3.0

comment:38 @hakre5 years ago

I just stumbeled over that line of code today and wondered a bit. Isn't there a proper way to escape the CDATA end sequence(here named close tag)? I think it is but I was not able to find the propper way and it's getting late.

comment:39 @hakre5 years ago

Replacement with &gt; should work with useragents (since about HTML 2.0, this is SGML related; this is 1996), so strictly spoken this bug-report is technically bogus. I do not want to speak against providing a patch here but anyone having this problem should check why this is creating problems properly before adding a fix by removing an upcomming filter or uncommenting the current line in code. It looks to me like the feeds useragent is not proplery following the standards.

comment:40 @hakre5 years ago

I suggest to close as invalid.

comment:41 @hakre5 years ago

As in #9992 tendencies are to name it what it is: an invalid report. Any objections?

comment:42 @Denis-de-Bernardy5 years ago

yes, one: for performance sake, is_feed() is faster than and uneeded call to str_replace(). :-)

comment:43 @shidouhikari5 years ago

  • Cc shidouhikari added
  • Milestone changed from 3.0 to 2.9.2
  • Priority changed from normal to high
  • Severity changed from normal to major

Why call it invalid report?

Plugins should be allowed to add javascript into posts, and to do so they need to wrap the code in CDATA to keep XHTML valid.

If it is being done just for RSS sake, do it only when the content is feeded at all.

2 solutions were suggested, 3 years passed, and the code is still there without anybody defending it. If it must be there somebody should point why. If not, move it to a filter with high hook priority and let each website owner / plugin developer deal with it.

@shidouhikari5 years ago

Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files

comment:44 @shidouhikari5 years ago

With this patch, nothing changes if no filter is hooked, while anybody can take responsibility over his own sites for mystic consequences of leaving CDATA inside posts :)

If somebody decides to hook a filter that lets CDATA lives, and eventually faces a bug, then the cause will be the filter and not Wordpress core itself. Lets us have control of our code.

comment:45 @nacin5 years ago

  • Milestone changed from 2.9.3 to 3.0

comment:46 follow-up: @dd325 years ago

How about we move the Cdata striping to a function hooked on the filter in question?

We can set the priority to (for example) 100, any filters that require to run afterwards, hook later, all other hooks hook first and get their result affected later down.

Otherwise, due to lack of traction and clear solution here, and my lack of wanting to change the filter order (as it has been for a few releases) this late in the game.. this'd get pushed to the next milestone.

And now i read the higher comments, and see that rboren has suggested exactly this. rboren, what do you say, Commit or deal with in the next release?

comment:47 in reply to: ↑ 46 @aaroncampbell5 years ago

Replying to dd32:
Unfortunately, it's been so long that I can barely remember the issue. Reading my previous posts here, it seems like moving the CDATA stripping to a filter would be enough to fix it. I'd be fine with that solution.

comment:48 @technosailor5 years ago

Ryans patch still applies cleanly. Let's commit and close this ticket for once and for all and work down the 3.0 ticket load :)

comment:49 @shidouhikari5 years ago

  • Keywords needs-testing needs-unit-tests removed

I also don't see how Ryan's patch could break something. It just passed the hardcoded code to a filter. If nobody touches that filter it will remain doing exactally the same.

If a plugin uses a filter with higher priority than 20, he can just remove and add the filter again. Anybody that doesn't want CDATA being escaped can remove the filter and be responsible to anything bad it may cause on his site.

It's a perfect solution that will satisfy everybody.

comment:50 @shidouhikari5 years ago

I've been using Wordpress in 2 sites, in their production and in development environments, having this CDATA replacement totally removed. And in all of them I have a plugin that adds CDATA escaped JavaScript to the_content.

I never felt anything bad and don't miss it in anyway in my use.

comment:51 @nacin5 years ago

  • Milestone changed from 3.0 to 3.1

I would add the early keyword, but it's been there for 18 months. Let's actually handle this early 3.1.

comment:52 @shidouhikari5 years ago

(I'm not complaining, just asking to understand it.)

What blocks this ticket from being commited? Are the suggested solutions not secure? waiting for a metter one to emerge?

comment:53 @shidouhikari5 years ago

I've been using 2 Wordpress sites without this ]]> replace line, and they've been working great. No XHTML 1.1 error and no problem in feeds.

Comments feeds work great, I even have a plugin that tweaks the document and tested it a lot while developing and no problem, feedburner also showing my posts fine.

I have a plugin that adds JS on every link in posts and comments and they are working fine too.

comment:54 @nacin4 years ago

  • Milestone changed from Awaiting Triage to Future Release

comment:55 in reply to: ↑ 12 @andy4 years ago

Replying to tellyworth:

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

This is correct. I have been running a high volume of posts through a strict XML parser and this is how I have been escaping CDATA sections. It splits the closing tag between two adjacent CDATA sections.

@nacin4 years ago

Introduce escape_cdata_close()

comment:56 @nacin4 years ago

  • Keywords early removed

Attached patch introduces escape_cdata_close() per comments by andy and tellyworth.

This could probably be moved to a filter as well, the way ryan's patch was implemented.

@andy4 years ago

Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.

comment:57 @andy4 years ago

  • Priority changed from high to normal
  • Severity changed from major to minor

CDATA wrappers should be removed from all templates and applied via filters using new functions. There are some complications.

A quick search for 'the_title_rss' (e.g.) finds that filter tag used in wp-admin/includes/export.php, which reminds that we use feed-related functions for more than just feeds. Any changes to feed-related filters must be tested with export/import.

The same search also finds the template tag the_title_rss() in feed-rss2.php and feed-atom.php, but only CDATA-wrapped in the latter. Universal use of CDATA tags to guarantee XML validity is of dubious merit. However, if we are going to do it we should be consistent.

3670-I.diff is the first half of the fix. It will break some feeds (rare) so it should not be applied by itself. The second half, installing new CDATA filters, touches a lot of code. Please volunteer.

comment:58 @lgedeon3 years ago

  • Cc luke.gedeon@… added

comment:59 @mpressly18 months ago

It's been 3 years since this bug was updated. On two separate websites, I am currently having to modify the core to allow inclusion of Javascript after each Wordpress upgrade due to this issue. Could the code in question safely be moved to a filter so that it can be overridden as needed without having to modify the core?

comment:60 @yitwail17 months ago

I had to take the approach of filtering plugin output to remove javascript with embedded CDATA tags, and make it an external javascript file I'm enqueing. This creates extra code, an extra server request, and possible issues if the plugin is updated, but seems safer than updating the core. I sincerely hope someone sees fit to addressing this long-standing issue.

comment:61 @nacin15 months ago

  • Component changed from Template to Posts, Post Types
  • Focuses template added

comment:62 in reply to: ↑ description @awelzel15 months ago

Replying to scenic:

I also suggest to make this a FILTER for RSS!

It is just stupid to do this replacement without a filter. Why?

I will open a new ticket for 3.8.1 - hopefully someone will notice it then.

comment:63 @SergeyBiryukov15 months ago

#27134 was marked as a duplicate.

comment:64 @SergeyBiryukov15 months ago

  • Milestone changed from Future Release to 3.9

comment:65 @nacin13 months ago

  • Keywords needs-refresh needs-unit-tests added
  • Milestone changed from 3.9 to Future Release

This needs to be moved again. I think 3670-I.diff is good, but we need to refresh it, write unit tests, and confirm it's the proper fix.

comment:66 @alekv13 months ago

Hi

I very much would appreciate if you could sort out this bug. I've developed a plugin for WooCommerce and AdWords which needs to insert the AdWords tracking code into the /checkout/order-received/ page. The plugin is getting more and more popular. Unfortunately the AdWords script uses the CDATA tags and breaks because of this bug. I have a workaround in place but would prefer a proper solution.

http://wordpress.org/plugins/woocommerce-google-adwords-conversion-tracking-tag/

Thanks and kind regards
Aleks

comment:67 @iapassas13 months ago

Hello,
I am facing the same problem trying to embed a google charts java script in a post.
Interesting though the > is not changed all the time.
1) While embedding the following code is working properly

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript"><!--//--><![CDATA[//><!--
     google.load('visualization', '1', {'packages': ['geochart']});
     google.setOnLoadCallback(drawRegionsMap);
      function drawRegionsMap() {
        var data = google.visualization.arrayToDataTable([
          ['Country', 'Popularity'],
          ['Germany', 200],
          ['United States', 300],
          ['Brazil', 400],
          ['Canada', 500],
          ['France', 600],
          ['RU', 700]
        ]);
        var options = {};
        var chart = new google.visualization.GeoChart(document.getElementById('chart_div'));
        chart.draw(data, options);
    };
 //--><!]]></script>
<div id="chart_div" style="width: 900px; height: 500px;"></div>

2)There is an issue in the following

<script type="text/javascript" src="//ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"><!--//--><![CDATA[//><!--
{"dataSourceUrl":"//docs.google.com/spreadsheet/tq?key=0Au97PknsuEKSdDAxWXktZTJKUUJZR2Fmci13UnFUMHc&transpose=1&headers=0&range=A1%3AC2&gid=1&pub=1","options":{"titleTextStyle":{"bold":true,"color":"#000","fontSize":"12"},"vAxes":[{"title":"Left vertical axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},{"useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null}],"pieHole":0,"title":"\u03a0\u03cc\u03c3\u03bf \u03c3\u03b7\u03bc\u03b1\u03bd\u03c4\u03b9\u03ba\u03cc \u03c1\u03cc\u03bb\u03bf \u03c0\u03b1\u03af\u03b6\u03b5\u03b9 \u03ba\u03b1\u03c4\u03ac \u03c4\u03b7 \u03b3\u03bd\u03ce\u03bc\u03b7 \u03c3\u03b1\u03c2, \u03b7 \u03c7\u03c1\u03ae\u03c3\u03b7 \u03c0\u03c1\u03c9\u03c4\u03bf\u03b3\u03b5\u03bd\u03ce\u03bd \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd \u03c9\u03c2 \u03ba\u03cd\u03c1\u03b9\u03b1 \u03c0\u03b7\u03b3\u03ae \u03c3\u03c4\u03b7 \u03b4\u03b7\u03bc\u03bf\u03c3\u03b9\u03bf\u03b3\u03c1\u03b1\u03c6\u03af\u03b1:","booleanRole":"certainty","height":371,"animation":{"duration":0},"colors":["#3366CC","#DC3912","#FF9900","#109618","#990099","#0099C6","#DD4477","#66AA00","#B82E2E","#316395","#994499","#22AA99","#AAAA11","#6633CC","#E67300","#8B0707","#651067","#329262","#5574A6","#3B3EAC","#B77322","#16D620","#B91383","#F4359E","#9C5935","#A9C413","#2A778D","#668D1C","#BEA413","#0C5922","#743411"],"width":600,"is3D":false,"hAxis":{"title":"Horizontal axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},"tooltip":{}},"state":{},"view":{},"isDefaultVisualization":false,"chartType":"PieChart","chartName":"Chart 2"}
//--><!]]></script>

I hope the above might be a help and resolve the issue.

Thank you for handling this issue

comment:68 follow-up: @awelzel8 months ago

Well - now WordPress 4.0 is released... and still no fix for this annoying bug :-(

Any chance, that his will ever get fixed so users don't have to modify the core code with every update?

What needs to be done? Need any help?

comment:69 in reply to: ↑ 68 @helen8 months ago

Replying to awelzel:

What needs to be done? Need any help?

Sure! Per nacin:

I think 3670-I.diff​ is good, but we need to refresh it, write unit tests, and confirm it's the proper fix.

comment:70 @awelzel8 months ago

Seems I have to consult https://make.wordpress.org/core/handbook/ and then http://make.wordpress.org/core/handbook/automated-testing/#writing-tests first ;-)

But the diff alone is indeed not a fix, since wp_cdata() as a replacement for the str_replace() does not get used anywhere.

comment:71 @xoogu5 months ago

Why hasn't this been fixed yet? Just change it to a filter that is added to the_content, then the user can disable it if they don't want it.

comment:72 follow-ups: @awelzel5 months ago

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

comment:73 in reply to: ↑ 72 ; follow-up: @truebird5 months ago

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

:)

Replying to awelzel:

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

Last edited 5 months ago by truebird (previous) (diff)

comment:74 in reply to: ↑ 73 ; follow-up: @awelzel5 months ago

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

comment:75 in reply to: ↑ 74 ; follow-up: @truebird5 months ago

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

comment:76 in reply to: ↑ 75 ; follow-up: @alekv5 months ago

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

comment:77 in reply to: ↑ 72 @alekv5 months ago

Works. Thanks!

Replying to awelzel:

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

comment:78 in reply to: ↑ 76 ; follow-ups: @truebird5 months ago

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

Replying to alekv:

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

comment:79 in reply to: ↑ 78 @alekv5 months ago

@truebird: single site

Replying to truebird:

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

Replying to alekv:

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

comment:80 in reply to: ↑ 78 @awelzel4 months ago

Replying to truebird:

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

JFTR: I also run a multisite installation to handle multi language content.

comment:81 @truebird4 months ago

@awelzel and @alekv

Just reporting back...for me I can not get it to work so far. I will keep trying. I am sure it is some setting or something obvious I am missing if it is working for others.

I will come back here later after more messing around with it.

If you have any other tips or thoughts that would be great . . .

If not, don't worry about posting back a response (just letting you know my status on it) . . .

:)

comment:82 @awelzel4 months ago

@truebird: Most problems are caused by plugins which add additional filters or try to change the default filter chain of WordPress. I recommend setting up a plain WordPress setup in a separate test environment without any plugins at all. Then add the plugins you currently use - one by one - and check the results.

comment:83 @truebird4 months ago

@awelzel

Agreed. I am going to do just that...have a great one...thanks...

comment:84 @SergeyBiryukov4 months ago

#30926 was marked as a duplicate.

comment:85 follow-up: @truebird3 months ago

@awelzel @sergeyBiryukov @alekv and ALL others wanting a up to date current and clean resolution to this 8 year old issue with CDATA errors. I have been meaning to post this for a while now for everyone (just getting around to it) . . . smile.

I spoke extensively on the phone with Google developers (this took a lot of patience and time to get to the right person). I took him to this particular thread and asked if he could give a final current resolution to this that is as clean as possible.

He said he was aware of this issue with WordPress and that the full code that is generated at Google is NOT necessary for Wordpress installs. He explained to REMOVE the part of the code manually that Wordpress installs do not need and it would resolve the issue cleanly.

Here is what he explained to do with the code you get from Google for conversion tracking before placing into Wordpress (this worked perfectly):

First I will give you a sample of the full code for conversions one might get from Google and next I will explain the part you must keep and then what to add to it . . .

Sample original:

<script type="text/javascript">// < ![CDATA[
// < ![CDATA[
// < ![CDATA[
var google_conversion_id = 1111891111; var google_conversion_language = "en"; var google_conversion_format = "3"; var google_conversion_color = "ffffff"; var google_conversion_label = "TlwoCMCEoVgQy7m15pp"; var google_conversion_value = 67.00; var google_conversion_currency = "USD"; var google_remarketing_only = false;
// ]]></script>
<script src="//www.googleadservices.com/pagead/conversion.js" type="text/javascript">// < ![CDATA[
// < ![CDATA[
// < ![CDATA[

// ]]></script>
<noscript>

Sample of what to keep (other parts you delete) and what to add to it to FIX THIS ONCE AND FOR ALL and/or until it is fixed directly in the Wordpress core someday:

<div style="display: inline;"><img style="border-style: none;" src="https://www.googleadservices.com/pagead/conversion/11115891111/?value=67.00&amp;currency_code=USD&amp;label=TlwoCMCEoVgQy7m15pp&amp;guid=ON&amp;script=0" alt="" width="1" height="1" /></div>

*NOTE: There is only one thing to add and that is the "https:" BEFORE the

//

two slashes and www.googleadservices.com . . .

I really hope this help a lot of people with this issue!!!

:)

Greg

Last edited 3 months ago by truebird (previous) (diff)

comment:86 in reply to: ↑ 85 @truebird3 months ago

*First add https: and use it as a first choice for sure (but if you have issues with that then give http: as shot (see below)...

*Additional info: You may have to add http: instead of https: (one of the two will give you a "OK" or blue confirmation that the code is working properly with the Google Tag Assistant extension for Chrome so you can confirm it is working for you correctly. Here is the extension to use to check your work:

https://chrome.google.com/webstore/detail/tag-assistant-by-google/kejbdjndbnbjgmefkgdddjlbokphdefk?hl=en

NOTE: There is only one thing to add and that is the "http:" OR "https:" BEFORE the

//

two slashes and www.googleadservices.com . . .
I really hope this help a lot of people with this issue!!!

Last edited 3 months ago by truebird (previous) (diff)

@mdgl3 months ago

Refreshed and enhanced patch

comment:87 @mdgl3 months ago

  • Keywords needs-refresh removed

This is a long standing problem that appears to have caused much frustration. Let's take another look.

Problem Statement

When generating the XML output for feeds (RDF, RSS, RSS2, Atom), certain data fields are wrapped within CDATA blocks to make sure that they remain valid XML. This may be unnecessary in some cases but works fine unless the content of the data field happens to include the CDATA block terminator string ']]>'. In this situation, we need to "escape" that data to avoid an XML parsing error. WordPress presently does this in a "belt and braces" way by automatically substituting the string ']]>' by ']]&gt;' within the main post content data field. There are a number of problems with this approach:

  • the substitution is always applied, regardless of whether the output is being produced for an XML CDATA field. This effectively mangles the normal output of post content in many situations.
  • the substitution is applied only to the main post content data field (i.e. that returned by "the_content()"). Other data fields that are wrapped within CDATA blocks are not similarly protected.

The most common report is that this behaviour makes it difficult to include JavaScript code within post content whilst retaining XML/XHTML compliance. For example, the TinyMCE editor automatically wraps code within <script> tags in a CDATA block when you switch between the "Text" and "Visual" views, which will then be mangled on output. Although modern browsers generally seem to cope with the missing CDATA terminator (parsing as HTML), other XML processors and validators might not be so forgiving.

With the rise of HTML5, interest in XHTML is doubtless waning, but modifying post content in this way is pretty egregious behaviour and probably still worth fixing.

Solution Overview

According to Wikipedia (http://en.wikipedia.org/wiki/CDATA), the preferred way to encode CDATA sections that include the string ']]>' is to create multiple concatenated CDATA blocks by substituting the string by ']]]]><![CDATA[>'. That is, we forcibly end the first CDATA block after the ']]' and then include the missing '>' at the beginning of the next CDATA block.

This substitution should only occur if we are preparing the output for inclusion in an XML CDATA block. The substitution should, however be applied to whatever data fields we are processing.

Interestingly, the current WordPress export tool (which generates XML/RSS2 output) appears to implement this correctly, applying the function wxr_cdata() to (almost) all data fields, as follows:

function wxr_cdata( $str ) {
   if ( seems_utf8( $str ) == false )
      $str = utf8_encode( $str );
   $str = '<![CDATA[' . str_replace( ']]>', ']]]]><![CDATA[>', $str ) . ']]>';
   return $str;
}

The updated patch applies the same approach when generating the XML output for feeds. Filters could be applied to the functions used to generated feed output, but this elegant solution might introduce backwards-compatibility issues if developers/users have made use of these functions themselves. Instead, the feed generation code has been modified to encode the values explicitly.

The new encoding function has been applied to just those locations dealing with the main post content data field. This includes excerpts since these may be automatically derived from the post content and so a new function get_the_excerpt_rss() has been added to support the retrieval rather than display of excerpts.

The changes in the updated patch are sufficient to fix the bug that is the subject of this ticket, but future work could usefully apply the same or similar XML encoding technique to other data fields as well (see below).

Unlike the earlier and partial patch, it did not seem appropriate to modify deprecated function the_content_rss(), again for reasons of backwards compatibility.

Feed Reader Compability

The encoding in the updated patch has been tested and appears to work correctly with Internet Explorer 11, Opera 27 and Firefox 35 as well as with XMLSpy.

Future Work

The updated patch creates a new function _esc_xml_cdata() which as the name suggests is intended to form part of a future and more powerful esc_xml() abstraction supported by other functions for dealing with XML encoding. In general our handling of XML encoding is very inconsistent and could benefit from a more wholesale clean-up. There are a number of subtleties to XML encoding and is actually quite a tricky problem to address.

Last edited 3 months ago by mdgl (previous) (diff)
Note: See TracTickets for help on using tickets.