Make WordPress Core

Opened 17 years ago

Last modified 14 months ago

#3670 new defect (bug)

Removing CDATA close tag ( ]]> ) unbalances the CDATA block

Reported by: scenic's profile scenic Owned by: andy's profile andy
Milestone: Future Release Priority: normal
Severity: minor Version: 2.1
Component: Posts, Post Types Keywords: has-patch needs-unit-tests
Focuses: template Cc:

Description

I'm
not sure if this is a bug per se, but it breaks included JS in my
posts that are escaped with a CDATA section. I've basically commented
out the following line in the_content() every time I upgrade:

$content = str_replace(']]>', ']]>', $content);

A user on the wp-testers list indicated that this was part of making sure that included CDATA blocks didn't break RSS feeds. I don't use CDATA sections directly in my RSS feeds, so I haven't tested this. In my particular case, the JS is used to embed Flash movies (e.g. YouTube videos) in an XHTML compliant way (without embed tags). I have a custom plugin I've written that strips out the JS and replaces it with the embed tag in the RSS feed.

Perhaps we should use a flag to activate this when the app is going through a feed. Ideally, though, it would simply be removed. Odds are that the CDATA block is being used for a JS block in a post body, and since most RSS aggregators don't allow JS by default, it would be safe to simply remove CDATA blogs if is_feed() is set.

Attachments (7)

3670.001.diff (1.6 KB) - added by AaronCampbell 16 years ago.
3670.diff (665 bytes) - added by Denis-de-Bernardy 14 years ago.
conditionally disable CDATA blocks
3670.2.diff (2.8 KB) - added by ryan 14 years ago.
Escape cdata close via filter on the_content
post-template.php.diff (544 bytes) - added by shidouhikari 14 years ago.
Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files
3670.3.diff (2.9 KB) - added by nacin 13 years ago.
Introduce escape_cdata_close()
3670-I.diff (2.9 KB) - added by andy 13 years ago.
Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.
3670.4.diff (6.7 KB) - added by mdgl 9 years ago.
Refreshed and enhanced patch

Download all attachments as: .zip

Change History (105)

#1 @scenic
17 years ago

  • Version set to 2.1

#2 @filosofo
17 years ago

Couldn't we just make this a filter that is applied for RSS feeds but not e.g. admins, like kses?

#3 @andy
17 years ago

I was the one who pointed out the reason for that line. The JS-replacement plugin is probably saving your feeds from the problem that line was added to fix.

I think we can come up with a better solution for this. If there is a good reason to wrap some of your post content as CDATA, perhaps you would be better served by adding a filter to your plugin to wrap the_content as CDATA.

Has anyone else run into this?

#4 @alexrabe
17 years ago

I would also appreciate an update to make difference between feed / content, my plugin has exact the same problem, cause when I want the XHTML validation, I need the CDATA implementation, cause the "socalled" flashvars for a Flash video plugin are diveded by ampersand's. At the moment it's possible to use <!-- around CDATA to covers the CDATA problematic in the conent, but it seems to be deprecate from W3C.

So more and More Flash integration trap into this problematic

#5 @scenic
17 years ago

Andy, I agree that the filter is saving me from needing that line. But, embedded JS is a valid thing to do in the content. That replace function should only really happen when we are processing a feed. The global is_feed() (I think) method should just wrap that line at a minimum.

#6 @andy
17 years ago

  • Keywords dev-feedback added

Checking is_feed does not get to the root of the problem, which is that we are trying to nest CDATA sections where we should not. I don't know enough about XML but I think we ought to stop using CDATA markers in feed templates and use a function that can intelligently wrap any content in CDATA as needed, removing any nested markers.

Is there really any reason to encode them rather than simply remove them?

So what I propose, and what should be seen by somebody with XML chops, is this unintelligent function:

function wp_cdata($data) {

Maybe test for conditions necessitating
CDATA markers and return if no need

$data = str_replace('<![CDATA[', , $data);
$data = str_replace(']]>',
, $data);

Sometimes a trailing square bracket
can mess up XML parsers:
if ( substr($data, -1) == ']' )

$data .= ' ';

return "<![CDATA[$data]]>';

}

#7 @andy
17 years ago

function wp_cdata($data) {
 // Maybe test for conditions necessitating
 // CDATA markers and return if no need

 $data = str_replace('<![CDATA[', '', $data);
 $data = str_replace(']]>', '', $data);

 // Sometimes a trailing square bracket
 // can mess up XML parsers:
 if ( substr($data, -1) == ']' )
  $data .= ' ';

 return "<![CDATA[$data]]>';
}

#8 @tombarta
17 years ago

Does wordpress support PHP < 4.0.5? If not, you could use arrays in str_replace:

$data = str_replace(array('<![CDATA[', ']]>'), array('', ''), $data);

#9 @Nazgul
17 years ago

  • Milestone changed from 2.1.1 to 2.1.2

#10 @foolswisdom
17 years ago

  • Milestone changed from 2.1.3 to 2.3

#11 @tellyworth
17 years ago

Not sure how helpful this is, but it is kind of possible to nest CDATA sections. Technically I suppose it's double-encoding rather than nesting, but regardless, this is 100% valid and should work with any compliant XML parser:

<![CDATA foo <![CDATA bar ]]]><![CDATA[]> baz ]]>

It's actually two separate CDATA sections concatenated together.

When decoded by an XML parser, this becomes:

foo <![CDATA bar ]]> baz

Here's a PHP snippet to do it:

return '<![CDATA.str_replace(?]>', ']]]><![CDATA[]>', $str).']]>';

#12 follow-up: @tellyworth
17 years ago

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

#13 @markjaquith
17 years ago

  • Milestone changed from 2.3 to 2.4 (next)

Any updates, Andy?

#14 @lloydbudd
16 years ago

  • Milestone changed from 2.5 to 2.6

#15 @Nazgul
16 years ago

  • Owner changed from anonymous to andy

#16 @AaronCampbell
16 years ago

I'm still a little unclear as to the purpose of this line of code. I keep running into this problem, so I'd like to see a fix (I'd be happy to submit a patch), but I don't understand why we are keeping it at all. If it's just needed for feeds, lets limit it to affecting feeds. If someone could clearly explain to me why it's here (when it's needed), I'll put a patch together.

#17 @AaronCampbell
16 years ago

  • Component changed from Administration to Template
  • Keywords has-patch needs-testing added
  • Milestone changed from 2.6 to 2.5.1
  • Priority changed from low to normal
  • Severity changed from normal to major

This ticket has sat here forever with nothing getting done, so I added a patch to get the ball rolling. The patch will fix two problems. It will fix one expressed above by only replacing ]]> with ]]&gt; when it is processing a feed.

There was another problem because we so often use:

//<![CDATA[
alert('test');
//]]>

and strip_tags() leaves the leading there, which looks ugly in the feeds as well. I added a simple statement in two places to handle that:
$text = preg_replace('|//\s*<!\[CDATA\[|', '<![CDATA[', $text);

#18 @ryan
16 years ago

  • Milestone changed from 2.5.2 to 2.9

Milestone 2.5.2 deleted

#19 @Viper007Bond
16 years ago

Bumpy bump.

#20 @Viper007Bond
16 years ago

  • Milestone changed from 2.9 to 2.7

#21 @jacobsantos
15 years ago

Seems to be a big enough issue that this ticket should go in.

#22 @ryan
15 years ago

  • Milestone changed from 2.7 to 2.8

Postponing yet again. Patch looked like a good candidate for early 2.8 so we can get plenty of testing on it.

#23 @lloydbudd
15 years ago

  • Keywords early added

#24 @Denis-de-Bernardy
15 years ago

might be worth committing to give it a shot?

#25 follow-up: @Denis-de-Bernardy
15 years ago

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

#26 @Denis-de-Bernardy
15 years ago

  • Keywords dev-feedback removed

let's try to get this into 2.9 early

#27 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8 to 2.9

#28 @Denis-de-Bernardy
15 years ago

  • Keywords needs-unit-tests added
  • Severity changed from major to normal

#29 in reply to: ↑ 25 @hakre
15 years ago

Replying to Denis-de-Bernardy:

patch applies cleanly against today's trunk.

quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?

For shure, everything else is pretty stupid.

#30 follow-up: @kirkpatrick
14 years ago

  • Cc kirkpatrick added

The substitution of the right bracket in the CDATA closing,

$excerpt = str_replace(']]>', ']]&gt;', $excerpt);

occurs in a number of places in WordPress, all in wp-includes:

comment.php Line 1355


feed.php Line 191

formatting.php Line 1734

post-template.php Line 168

So this substitution seems to be desired for reasons other than "protecting feeds". Perhaps security against javascript-driven xml injection attacks? (It would be useful if the gods of WordPress would tell us the meaning of all this.)

Anyway, as it is, a page written by a plugin that creates javascript that includes strings of html will not validate as xhtml. Since we can (presumably) trust plugins, it should be ok to fix this: just reverse the order of lines 167 and 168 in post-template.php, so the filters (potentially set by plugins) are applied after the replacement:
Change

	$content = apply_filters('the_content', $content);
	$content = str_replace(']]>', ']]&gt;', $content);

to

	$content = str_replace(']]>', ']]&gt;', $content);
	$content = apply_filters('the_content', $content);

This way, any CDATA in the post (as stored in the database) will lose the CDATA, as seems to be desired, but the plugin can still operate properly and validate.

The same reversal should also be done (my application doesn't need these, but ...) in comment.php (line 1355 to above 1351) and formatting.php (exchange 1733 and 1734).

This is a minimal change, and I believe makes more sense than the present code. The post content is filtered, but then the plugin is allowed to do its work.

Of course, this fix does not meet the needs of the earlier change posters, who wish to post javascript hidden with CDATA. I don't think I support this for standard WordPress.

#31 in reply to: ↑ 30 @Denis-de-Bernardy
14 years ago

Replying to kirkpatrick:

So this substitution seems to be desired for reasons other than "protecting feeds"?

Not according to the changeset that introduced it (r1039).

Here's what the file looked like way back then:

http://core.trac.wordpress.org/browser/trunk/wp-includes/template-functions-post.php?rev=1039

@Denis-de-Bernardy
14 years ago

conditionally disable CDATA blocks

#32 @Denis-de-Bernardy
14 years ago

  • Cc josephscott added

I've added a patch, but I'm not 100% that it won't break the Atom API. Can anyone give it a shot?

along the same line of bugs, there is #9992.

#33 @westi
14 years ago

  • Cc westi added

Would it not be better to move the escaping of the closing CDATA into a function on the the_content filter which can then itself check is_feed() and also if necessary easily be removed by a plugin?

However as has been said before we don't want to change this too much close to release maybe we should just move the current code to a filter function for now but with no conditional check.


#34 @markjaquith
14 years ago

Moving it to a filter sounds like an acceptable solution this close to RC.

@ryan
14 years ago

Escape cdata close via filter on the_content

#35 @ryan
14 years ago

Patch moves escaping to a filter, although I'm tempted to punt once again and make the decision in 3.0 to finally fix this thing correctly. If we only need this for feeds, do it in get_the_content_feed() and remove it from everywhere else.

#36 @markjaquith
14 years ago

I'm okay with that too. We can wait until 3.0 and fix it once.

#37 @ryan
14 years ago

  • Milestone changed from 2.9 to 3.0

#38 @hakre
14 years ago

I just stumbeled over that line of code today and wondered a bit. Isn't there a proper way to escape the CDATA end sequence(here named close tag)? I think it is but I was not able to find the propper way and it's getting late.

#39 @hakre
14 years ago

Replacement with &gt; should work with useragents (since about HTML 2.0, this is SGML related; this is 1996), so strictly spoken this bug-report is technically bogus. I do not want to speak against providing a patch here but anyone having this problem should check why this is creating problems properly before adding a fix by removing an upcomming filter or uncommenting the current line in code. It looks to me like the feeds useragent is not proplery following the standards.

#40 @hakre
14 years ago

I suggest to close as invalid.

#41 @hakre
14 years ago

As in #9992 tendencies are to name it what it is: an invalid report. Any objections?

#42 @Denis-de-Bernardy
14 years ago

yes, one: for performance sake, is_feed() is faster than and uneeded call to str_replace(). :-)

#43 @shidouhikari
14 years ago

  • Cc shidouhikari added
  • Milestone changed from 3.0 to 2.9.2
  • Priority changed from normal to high
  • Severity changed from normal to major

Why call it invalid report?

Plugins should be allowed to add javascript into posts, and to do so they need to wrap the code in CDATA to keep XHTML valid.

If it is being done just for RSS sake, do it only when the content is feeded at all.

2 solutions were suggested, 3 years passed, and the code is still there without anybody defending it. If it must be there somebody should point why. If not, move it to a filter with high hook priority and let each website owner / plugin developer deal with it.

@shidouhikari
14 years ago

Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files

#44 @shidouhikari
14 years ago

With this patch, nothing changes if no filter is hooked, while anybody can take responsibility over his own sites for mystic consequences of leaving CDATA inside posts :)

If somebody decides to hook a filter that lets CDATA lives, and eventually faces a bug, then the cause will be the filter and not Wordpress core itself. Lets us have control of our code.

#45 @nacin
14 years ago

  • Milestone changed from 2.9.3 to 3.0

#46 follow-up: @dd32
14 years ago

How about we move the Cdata striping to a function hooked on the filter in question?

We can set the priority to (for example) 100, any filters that require to run afterwards, hook later, all other hooks hook first and get their result affected later down.

Otherwise, due to lack of traction and clear solution here, and my lack of wanting to change the filter order (as it has been for a few releases) this late in the game.. this'd get pushed to the next milestone.

And now i read the higher comments, and see that rboren has suggested exactly this. rboren, what do you say, Commit or deal with in the next release?

#47 in reply to: ↑ 46 @aaroncampbell
14 years ago

Replying to dd32:
Unfortunately, it's been so long that I can barely remember the issue. Reading my previous posts here, it seems like moving the CDATA stripping to a filter would be enough to fix it. I'd be fine with that solution.

#48 @technosailor
14 years ago

Ryans patch still applies cleanly. Let's commit and close this ticket for once and for all and work down the 3.0 ticket load :)

#49 @shidouhikari
14 years ago

  • Keywords needs-testing needs-unit-tests removed

I also don't see how Ryan's patch could break something. It just passed the hardcoded code to a filter. If nobody touches that filter it will remain doing exactally the same.

If a plugin uses a filter with higher priority than 20, he can just remove and add the filter again. Anybody that doesn't want CDATA being escaped can remove the filter and be responsible to anything bad it may cause on his site.

It's a perfect solution that will satisfy everybody.

#50 @shidouhikari
14 years ago

I've been using Wordpress in 2 sites, in their production and in development environments, having this CDATA replacement totally removed. And in all of them I have a plugin that adds CDATA escaped JavaScript to the_content.

I never felt anything bad and don't miss it in anyway in my use.

#51 @nacin
14 years ago

  • Milestone changed from 3.0 to 3.1

I would add the early keyword, but it's been there for 18 months. Let's actually handle this early 3.1.

#52 @shidouhikari
14 years ago

(I'm not complaining, just asking to understand it.)

What blocks this ticket from being commited? Are the suggested solutions not secure? waiting for a metter one to emerge?

#53 @shidouhikari
14 years ago

I've been using 2 Wordpress sites without this ]]> replace line, and they've been working great. No XHTML 1.1 error and no problem in feeds.

Comments feeds work great, I even have a plugin that tweaks the document and tested it a lot while developing and no problem, feedburner also showing my posts fine.

I have a plugin that adds JS on every link in posts and comments and they are working fine too.

#54 @nacin
13 years ago

  • Milestone changed from Awaiting Triage to Future Release

#55 in reply to: ↑ 12 @andy
13 years ago

Replying to tellyworth:

Whoops, here's that code again:

return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';

This is correct. I have been running a high volume of posts through a strict XML parser and this is how I have been escaping CDATA sections. It splits the closing tag between two adjacent CDATA sections.

@nacin
13 years ago

Introduce escape_cdata_close()

#56 @nacin
13 years ago

  • Keywords early removed

Attached patch introduces escape_cdata_close() per comments by andy and tellyworth.

This could probably be moved to a filter as well, the way ryan's patch was implemented.

@andy
13 years ago

Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.

#57 @andy
13 years ago

  • Priority changed from high to normal
  • Severity changed from major to minor

CDATA wrappers should be removed from all templates and applied via filters using new functions. There are some complications.

A quick search for 'the_title_rss' (e.g.) finds that filter tag used in wp-admin/includes/export.php, which reminds that we use feed-related functions for more than just feeds. Any changes to feed-related filters must be tested with export/import.

The same search also finds the template tag the_title_rss() in feed-rss2.php and feed-atom.php, but only CDATA-wrapped in the latter. Universal use of CDATA tags to guarantee XML validity is of dubious merit. However, if we are going to do it we should be consistent.

3670-I.diff is the first half of the fix. It will break some feeds (rare) so it should not be applied by itself. The second half, installing new CDATA filters, touches a lot of code. Please volunteer.

#58 @lgedeon
12 years ago

  • Cc luke.gedeon@… added

#59 @mpressly
10 years ago

It's been 3 years since this bug was updated. On two separate websites, I am currently having to modify the core to allow inclusion of Javascript after each Wordpress upgrade due to this issue. Could the code in question safely be moved to a filter so that it can be overridden as needed without having to modify the core?

#60 @yitwail
10 years ago

I had to take the approach of filtering plugin output to remove javascript with embedded CDATA tags, and make it an external javascript file I'm enqueing. This creates extra code, an extra server request, and possible issues if the plugin is updated, but seems safer than updating the core. I sincerely hope someone sees fit to addressing this long-standing issue.

#61 @nacin
10 years ago

  • Component changed from Template to Posts, Post Types
  • Focuses template added

#62 in reply to: ↑ description @awelzel
10 years ago

Replying to scenic:

I also suggest to make this a FILTER for RSS!

It is just stupid to do this replacement without a filter. Why?

I will open a new ticket for 3.8.1 - hopefully someone will notice it then.

#63 @SergeyBiryukov
10 years ago

#27134 was marked as a duplicate.

#64 @SergeyBiryukov
10 years ago

  • Milestone changed from Future Release to 3.9

#65 @nacin
10 years ago

  • Keywords needs-refresh needs-unit-tests added
  • Milestone changed from 3.9 to Future Release

This needs to be moved again. I think 3670-I.diff is good, but we need to refresh it, write unit tests, and confirm it's the proper fix.

#66 @alekv
10 years ago

Hi

I very much would appreciate if you could sort out this bug. I've developed a plugin for WooCommerce and AdWords which needs to insert the AdWords tracking code into the /checkout/order-received/ page. The plugin is getting more and more popular. Unfortunately the AdWords script uses the CDATA tags and breaks because of this bug. I have a workaround in place but would prefer a proper solution.

http://wordpress.org/plugins/woocommerce-google-adwords-conversion-tracking-tag/

Thanks and kind regards
Aleks

#67 @iapassas
10 years ago

Hello,
I am facing the same problem trying to embed a google charts java script in a post.
Interesting though the > is not changed all the time.
1) While embedding the following code is working properly

<script type="text/javascript" src="https://www.google.com/jsapi"></script>
<script type="text/javascript"><!--//--><![CDATA[//><!--
     google.load('visualization', '1', {'packages': ['geochart']});
     google.setOnLoadCallback(drawRegionsMap);
      function drawRegionsMap() {
        var data = google.visualization.arrayToDataTable([
          ['Country', 'Popularity'],
          ['Germany', 200],
          ['United States', 300],
          ['Brazil', 400],
          ['Canada', 500],
          ['France', 600],
          ['RU', 700]
        ]);
        var options = {};
        var chart = new google.visualization.GeoChart(document.getElementById('chart_div'));
        chart.draw(data, options);
    };
 //--><!]]></script>
<div id="chart_div" style="width: 900px; height: 500px;"></div>

2)There is an issue in the following

<script type="text/javascript" src="//ajax.googleapis.com/ajax/static/modules/gviz/1.0/chart.js"><!--//--><![CDATA[//><!--
{"dataSourceUrl":"//docs.google.com/spreadsheet/tq?key=0Au97PknsuEKSdDAxWXktZTJKUUJZR2Fmci13UnFUMHc&transpose=1&headers=0&range=A1%3AC2&gid=1&pub=1","options":{"titleTextStyle":{"bold":true,"color":"#000","fontSize":"12"},"vAxes":[{"title":"Left vertical axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},{"useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null}],"pieHole":0,"title":"\u03a0\u03cc\u03c3\u03bf \u03c3\u03b7\u03bc\u03b1\u03bd\u03c4\u03b9\u03ba\u03cc \u03c1\u03cc\u03bb\u03bf \u03c0\u03b1\u03af\u03b6\u03b5\u03b9 \u03ba\u03b1\u03c4\u03ac \u03c4\u03b7 \u03b3\u03bd\u03ce\u03bc\u03b7 \u03c3\u03b1\u03c2, \u03b7 \u03c7\u03c1\u03ae\u03c3\u03b7 \u03c0\u03c1\u03c9\u03c4\u03bf\u03b3\u03b5\u03bd\u03ce\u03bd \u03b4\u03b5\u03b4\u03bf\u03bc\u03ad\u03bd\u03c9\u03bd \u03c9\u03c2 \u03ba\u03cd\u03c1\u03b9\u03b1 \u03c0\u03b7\u03b3\u03ae \u03c3\u03c4\u03b7 \u03b4\u03b7\u03bc\u03bf\u03c3\u03b9\u03bf\u03b3\u03c1\u03b1\u03c6\u03af\u03b1:","booleanRole":"certainty","height":371,"animation":{"duration":0},"colors":["#3366CC","#DC3912","#FF9900","#109618","#990099","#0099C6","#DD4477","#66AA00","#B82E2E","#316395","#994499","#22AA99","#AAAA11","#6633CC","#E67300","#8B0707","#651067","#329262","#5574A6","#3B3EAC","#B77322","#16D620","#B91383","#F4359E","#9C5935","#A9C413","#2A778D","#668D1C","#BEA413","#0C5922","#743411"],"width":600,"is3D":false,"hAxis":{"title":"Horizontal axis title","useFormatFromData":true,"minValue":null,"viewWindow":{"min":null,"max":null},"maxValue":null},"tooltip":{}},"state":{},"view":{},"isDefaultVisualization":false,"chartType":"PieChart","chartName":"Chart 2"}
//--><!]]></script>

I hope the above might be a help and resolve the issue.

Thank you for handling this issue

#68 follow-up: @awelzel
10 years ago

Well - now WordPress 4.0 is released... and still no fix for this annoying bug :-(

Any chance, that his will ever get fixed so users don't have to modify the core code with every update?

What needs to be done? Need any help?

#69 in reply to: ↑ 68 @helen
10 years ago

Replying to awelzel:

What needs to be done? Need any help?

Sure! Per nacin:

I think 3670-I.diff​ is good, but we need to refresh it, write unit tests, and confirm it's the proper fix.

#70 @awelzel
10 years ago

Seems I have to consult https://make.wordpress.org/core/handbook/ and then http://make.wordpress.org/core/handbook/automated-testing/#writing-tests first ;-)

But the diff alone is indeed not a fix, since wp_cdata() as a replacement for the str_replace() does not get used anywhere.

#71 @xoogu
9 years ago

Why hasn't this been fixed yet? Just change it to a filter that is added to the_content, then the user can disable it if they don't want it.

#72 follow-ups: @awelzel
9 years ago

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

#73 in reply to: ↑ 72 ; follow-up: @truebird
9 years ago

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

:)

Replying to awelzel:

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

Last edited 9 years ago by truebird (previous) (diff)

#74 in reply to: ↑ 73 ; follow-up: @awelzel
9 years ago

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

#75 in reply to: ↑ 74 ; follow-up: @truebird
9 years ago

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

#76 in reply to: ↑ 75 ; follow-up: @alekv
9 years ago

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

#77 in reply to: ↑ 72 @alekv
9 years ago

Works. Thanks!

Replying to awelzel:

I have to admit that I didn't have time to get into WordPress core coding yet - too many other projects - sorry.

But anyway - until there is a solution - for those who don't want to fiddle around with the core code with every update: You can use the following workaround in the templates functions.php or a plugin.

If there is an intentional "]]&gt;" somewhere in a post this may lead to invalid HTML in rare cases - although a "]]>" on its own as part of a page or post is not invalid.

function cdata_markupfix($content) {
    $content = str_replace("]]&gt;", "]]>", $content);
    
    return $content;
}

function cdata_template_redirect( $content ) {
	ob_start('cdata_markupfix');
}

add_action('template_redirect','cdata_template_redirect',-1);

You can see this live on one of my pages where I embed external scripts (using my own plugin) to generate JavaScript stuff and which validates fine: http://arnowelzel.de/wp/en/tools/statistics

#78 in reply to: ↑ 76 ; follow-ups: @truebird
9 years ago

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

Replying to alekv:

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

#79 in reply to: ↑ 78 @alekv
9 years ago

@truebird: single site

Replying to truebird:

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

Replying to alekv:

Hi everyone

I can confirm that awelzel's fix works fine in my plugin.

Thanks awelzel !

Replying to truebird:

Hey @awelzel

I did try it (in its original form) and was getting no change. I thought my installation may have had a slight variety. Hence, the question.

I will try it again and let you know what happens . . . again thanks for your contribution.

:)

Replying to awelzel:

Replying to truebird:

@awelzel

If this would work it would be awesome. Thanks so much for posting it. Mine is slightly different:
$content = str_replace( ']]>', ']]&gt;', $content );

This is not only "slightly different" - it's in fact the opposite way and exactly the same, what WordPress already does.

Would I just have to change that line in your code to make it work (or anything else as well)? Also, I am on a multisite if that makes a difference (and the site in question is a subsite of course). Any adjustment for a multisite (or the same)?

I have to admit: I don't get it. To get it right: Your code already does, what WordPress does. You ask me, if changing my code I proposed to your code, that would "make it work"? Why?

In short: Just use my code as it is and try it out on your own. I don't know what other modifications you already have done in your themes and which plugins you use.

I have a child theme as well (I believe that is ok as well)...any adjustments in that regard?

Nope. As I understand child themes, all functions from the main theme are still active, but you can also add additional stiff in the child theme.

#80 in reply to: ↑ 78 @awelzel
9 years ago

Replying to truebird:

@alekv

Sweet. Great. Do you run a single WP installation or a multisite?

I will get back to this shortly (will report back on a multisite install).

:)

JFTR: I also run a multisite installation to handle multi language content.

#81 @truebird
9 years ago

@awelzel and @alekv

Just reporting back...for me I can not get it to work so far. I will keep trying. I am sure it is some setting or something obvious I am missing if it is working for others.

I will come back here later after more messing around with it.

If you have any other tips or thoughts that would be great . . .

If not, don't worry about posting back a response (just letting you know my status on it) . . .

:)

#82 @awelzel
9 years ago

@truebird: Most problems are caused by plugins which add additional filters or try to change the default filter chain of WordPress. I recommend setting up a plain WordPress setup in a separate test environment without any plugins at all. Then add the plugins you currently use - one by one - and check the results.

#83 @truebird
9 years ago

@awelzel

Agreed. I am going to do just that...have a great one...thanks...

#84 @SergeyBiryukov
9 years ago

#30926 was marked as a duplicate.

#85 follow-up: @truebird
9 years ago

@awelzel @sergeyBiryukov @alekv and ALL others wanting a up to date current and clean resolution to this 8 year old issue with CDATA errors. I have been meaning to post this for a while now for everyone (just getting around to it) . . . smile.

I spoke extensively on the phone with Google developers (this took a lot of patience and time to get to the right person). I took him to this particular thread and asked if he could give a final current resolution to this that is as clean as possible.

He said he was aware of this issue with WordPress and that the full code that is generated at Google is NOT necessary for Wordpress installs. He explained to REMOVE the part of the code manually that Wordpress installs do not need and it would resolve the issue cleanly.

Here is what he explained to do with the code you get from Google for conversion tracking before placing into Wordpress (this worked perfectly):

First I will give you a sample of the full code for conversions one might get from Google and next I will explain the part you must keep and then what to add to it . . .

Sample original:

<script type="text/javascript">// < ![CDATA[
// < ![CDATA[
// < ![CDATA[
var google_conversion_id = 1111891111; var google_conversion_language = "en"; var google_conversion_format = "3"; var google_conversion_color = "ffffff"; var google_conversion_label = "TlwoCMCEoVgQy7m15pp"; var google_conversion_value = 67.00; var google_conversion_currency = "USD"; var google_remarketing_only = false;
// ]]></script>
<script src="//www.googleadservices.com/pagead/conversion.js" type="text/javascript">// < ![CDATA[
// < ![CDATA[
// < ![CDATA[

// ]]></script>
<noscript>

Sample of what to keep (other parts you delete) and what to add to it to FIX THIS ONCE AND FOR ALL and/or until it is fixed directly in the Wordpress core someday:

<div style="display: inline;"><img style="border-style: none;" src="https://www.googleadservices.com/pagead/conversion/11115891111/?value=67.00&amp;currency_code=USD&amp;label=TlwoCMCEoVgQy7m15pp&amp;guid=ON&amp;script=0" alt="" width="1" height="1" /></div>

*NOTE: There is only one thing to add and that is the "https:" BEFORE the

//

two slashes and www.googleadservices.com . . .

I really hope this help a lot of people with this issue!!!

:)

Greg

Last edited 9 years ago by truebird (previous) (diff)

#86 in reply to: ↑ 85 @truebird
9 years ago

*First add https: and use it as a first choice for sure (but if you have issues with that then give http: as shot (see below)...

*Additional info: You may have to add http: instead of https: (one of the two will give you a "OK" or blue confirmation that the code is working properly with the Google Tag Assistant extension for Chrome so you can confirm it is working for you correctly. Here is the extension to use to check your work:

https://chrome.google.com/webstore/detail/tag-assistant-by-google/kejbdjndbnbjgmefkgdddjlbokphdefk?hl=en

NOTE: There is only one thing to add and that is the "http:" OR "https:" BEFORE the

//

two slashes and www.googleadservices.com . . .
I really hope this help a lot of people with this issue!!!

Last edited 9 years ago by truebird (previous) (diff)

@mdgl
9 years ago

Refreshed and enhanced patch

#87 @mdgl
9 years ago

  • Keywords needs-refresh removed

This is a long standing problem that appears to have caused much frustration. Let's take another look.

Problem Statement

When generating the XML output for feeds (RDF, RSS, RSS2, Atom), certain data fields are wrapped within CDATA blocks to make sure that they remain valid XML. This may be unnecessary in some cases but works fine unless the content of the data field happens to include the CDATA block terminator string ']]>'. In this situation, we need to "escape" that data to avoid an XML parsing error. WordPress presently does this in a "belt and braces" way by automatically substituting the string ']]>' by ']]&gt;' within the main post content data field. There are a number of problems with this approach:

  • the substitution is always applied, regardless of whether the output is being produced for an XML CDATA field. This effectively mangles the normal output of post content in many situations.
  • the substitution is applied only to the main post content data field (i.e. that returned by "the_content()"). Other data fields that are wrapped within CDATA blocks are not similarly protected.

The most common report is that this behaviour makes it difficult to include JavaScript code within post content whilst retaining XML/XHTML compliance. For example, the TinyMCE editor automatically wraps code within <script> tags in a CDATA block when you switch between the "Text" and "Visual" views, which will then be mangled on output. Although modern browsers generally seem to cope with the missing CDATA terminator (parsing as HTML), other XML processors and validators might not be so forgiving.

With the rise of HTML5, interest in XHTML is doubtless waning, but modifying post content in this way is pretty egregious behaviour and probably still worth fixing.

Solution Overview

According to Wikipedia (http://en.wikipedia.org/wiki/CDATA), the preferred way to encode CDATA sections that include the string ']]>' is to create multiple concatenated CDATA blocks by substituting the string by ']]]]><![CDATA[>'. That is, we forcibly end the first CDATA block after the ']]' and then include the missing '>' at the beginning of the next CDATA block.

This substitution should only occur if we are preparing the output for inclusion in an XML CDATA block. The substitution should, however be applied to whatever data fields we are processing.

Interestingly, the current WordPress export tool (which generates XML/RSS2 output) appears to implement this correctly, applying the function wxr_cdata() to (almost) all data fields, as follows:

function wxr_cdata( $str ) {
   if ( seems_utf8( $str ) == false )
      $str = utf8_encode( $str );
   $str = '<![CDATA[' . str_replace( ']]>', ']]]]><![CDATA[>', $str ) . ']]>';
   return $str;
}

The updated patch applies the same approach when generating the XML output for feeds. Filters could be applied to the functions used to generated feed output, but this elegant solution might introduce backwards-compatibility issues if developers/users have made use of these functions themselves. Instead, the feed generation code has been modified to encode the values explicitly.

The new encoding function has been applied to just those locations dealing with the main post content data field. This includes excerpts since these may be automatically derived from the post content and so a new function get_the_excerpt_rss() has been added to support the retrieval rather than display of excerpts.

Future work could apply the same XML encoding technique to other data fields (see below).

Unlike the earlier and partial patch, it did not seem appropriate to modify deprecated function the_content_rss(), again for reasons of backwards compatibility.

Feed Reader Compability

The encoding in the updated patch has been tested and appears to work correctly with Internet Explorer 11, Opera 27 and Firefox 35 as well as with XMLSpy.

Future Work

The updated patch creates a new function _esc_xml_cdata() which as the name suggests is intended to form part of a future and more powerful esc_xml() abstraction supported by other functions for dealing with XML encoding. In general our handling of XML encoding is very inconsistent and could benefit from a more wholesale clean-up. There are a number of subtleties to XML encoding and is actually quite a tricky problem to address.

Version 0, edited 9 years ago by mdgl (next)

This ticket was mentioned in Slack in #core by jorbin. View the logs.


9 years ago

#89 @jorbin
9 years ago

The XML builder proposed in #22435 might be something that could help here.

I do think this is going to require a lot of unit tests. We are dealing with 8+ year old code that is run on most page loads.

@stevenkword - What are your thoughts?

#90 @stevenkword
9 years ago

@jorbin I hate to see this be such a prolonged pain point for so many people. There appear to be two proposed ways forward that has caused the ticket to repeatedly stall.

1.) Hook the str_replace functionality by a filter
2.) Properly escape/concatenate CDATA blocks for inclusion in XML

I think there are merits to both approaches. The first solution has little risk and provides developers to properly work around the issues they are having in core. However, it has been pointed out that hooking in the problematic function doesn't solve the bigger picture perfectly. The second solution appears to be what we would like tackle, but not without thorough testing. It is a much bigger lift.

I propose we address this ticket with the filter approach from 3670.2.diff as to put a nail in the coffin of requiring plugin authors to hack at core. I'd then like to open a new ticket aimed at tackling the root of the problem as it relates to the way we build XML.

Last edited 9 years ago by stevenkword (previous) (diff)

This ticket was mentioned in Slack in #core by ocean90. View the logs.


9 years ago

#92 @swissspidy
4 years ago

In 48072:

Sitemaps: Add XML sitemaps functionality to WordPress.

While web crawlers are able to discover pages from links within the site and from other sites, XML sitemaps supplement this approach by allowing crawlers to quickly and comprehensively identify all URLs included in the sitemap and learn other signals about those URLs using the associated metadata.

See https://make.wordpress.org/core/2020/06/10/merge-announcement-extensible-core-sitemaps/ for more details.

This feature exposes the sitemap index via /wp-sitemap.xml and exposes a variety of new filters and hooks for developers to modify the behavior. Users can disable sitemaps completely by turning off search engine visibility in WordPress admin.

This change also introduces a new esc_xml() function to escape strings for output in XML, as well as XML support to wp_kses_normalize_entities().

Props Adrian McShane, afragen, adamsilverstein, casiepa, flixos90, garrett-eclipse, joemcgill, kburgoine, kraftbj, milana_cap, pacifika, pbiron, pfefferle, Ruxandra Gradina, swissspidy, szepeviktor, tangrufus, tweetythierry.
Fixes #50117.
See #3670. See #19998.

This ticket was mentioned in PR #4031 on WordPress/wordpress-develop by samruddhikhandale.


14 months ago
#93

Hi 👋

I am Samruddhi, a Software Engineer for GitHub, working with the GitHub Codespaces team, and a maintainer for the dev containers org 👩‍💻

Thank you @helen for doing all the ground work for setting up a dev container (ref: https://github.com/WordPress/wordpress-develop/pull/3670). This PR is greatly inspired from the contents of #3670 with few tweaks.

Thanks,
Samruddhi

samruddhikhandale commented on PR #4031:


14 months ago
#94

Once a contributor closes their VS Code browser window, how long does the Codespace remain active for?

By default, codespaces time out after 30 minutes of inactivity. When a codespace times out it is stopped and will no longer incur charges for compute usage.

Doc reference - here

@peterwilsoncc Let me know if this info would be helpful in the doc, happy to add that.

@peterwilsoncc commented on PR #4031:


14 months ago
#95

@samruddhikhandale Given the default spend limit is set to zero, I think it's fine to leave it out for now. It's easy enough to add something later if it becomes an issue.

@manfcarlo commented on PR #4031:


14 months ago
#96

Maybe some people who are unfamiliar with codespaces would be interested to know that I have been using a codespace to develop a plugin for the plugin directory, and the youth of the product absolutely shows. I have been hit by two major outages in little over a month, the latest one being ongoing for more than 48 hours and preventing me from releasing a new version to the plugin directory. The outage was reported as resolved, but either you are not testing closely enough, or deliberately misreporting to hide the high duration of downtime. The support ticket I submitted for the previous outage took 7 days to be answered, so I'm expecting 7 full days of downtime and 7 days delay to a plugin release that should already be completed by now. With service like this, there is no chance I will upgrade to the paid tier anytime soon. Not voting down this patch at all, but I wouldn't be encouraging the events team to jump into codespaces with both feet just yet, or it could result in an attendee's whole contributor day being wasted.

I actually think WordPress Playground would be a better long-term solution for this purpose, but it too obviously has a long way to go.

helen commented on PR #4031:


14 months ago
#97

Hi @carlomanf - totally understood that you've experienced frustrations. We use Codespaces for our own dev environments at GitHub so any outages you experience we do as well. I cannot speak to this product area as far as work goes and don't think this is the greatest venue to go back and forth about it, in any case.

As said, this is a way to get a sense of how this particular tool and cloud dev environments as a general concept might benefit the core contribution case. devcontainers as an open spec should mean that this can lay the groundwork for generally being usable across different options based on what works best for the contributor, which should always be our goal - to make it easier for folks to contribute given their differing needs and constraints. A Wasm option would also be great! Let's just not let ourselves get stuck in attempting to be perfect from the start, that's already happened to this effort a couple times.

Note: See TracTickets for help on using tickets.