Opened 6 years ago
Last modified 16 months ago
#3670 new defect (bug)
Removing CDATA close tag ( ]]> ) unbalances the CDATA block
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Priority: | normal | Milestone: | Future Release |
| Component: | Template | Version: | 2.1 |
| Severity: | minor | Keywords: | has-patch |
| Cc: | kirkpatrick, josephscott, westi, shidouhikari, luke.gedeon@… |
Description
I'm
not sure if this is a bug per se, but it breaks included JS in my
posts that are escaped with a CDATA section. I've basically commented
out the following line in the_content() every time I upgrade:
$content = str_replace(']]>', ']]>', $content);
A user on the wp-testers list indicated that this was part of making sure that included CDATA blocks didn't break RSS feeds. I don't use CDATA sections directly in my RSS feeds, so I haven't tested this. In my particular case, the JS is used to embed Flash movies (e.g. YouTube videos) in an XHTML compliant way (without embed tags). I have a custom plugin I've written that strips out the JS and replaces it with the embed tag in the RSS feed.
Perhaps we should use a flag to activate this when the app is going through a feed. Ideally, though, it would simply be removed. Odds are that the CDATA block is being used for a JS block in a post body, and since most RSS aggregators don't allow JS by default, it would be safe to simply remove CDATA blogs if is_feed() is set.
Attachments (6)
Change History (64)
I was the one who pointed out the reason for that line. The JS-replacement plugin is probably saving your feeds from the problem that line was added to fix.
I think we can come up with a better solution for this. If there is a good reason to wrap some of your post content as CDATA, perhaps you would be better served by adding a filter to your plugin to wrap the_content as CDATA.
Has anyone else run into this?
I would also appreciate an update to make difference between feed / content, my plugin has exact the same problem, cause when I want the XHTML validation, I need the CDATA implementation, cause the "socalled" flashvars for a Flash video plugin are diveded by ampersand's. At the moment it's possible to use <!-- around CDATA to covers the CDATA problematic in the conent, but it seems to be deprecate from W3C.
So more and More Flash integration trap into this problematic
Andy, I agree that the filter is saving me from needing that line. But, embedded JS is a valid thing to do in the content. That replace function should only really happen when we are processing a feed. The global is_feed() (I think) method should just wrap that line at a minimum.
- Keywords dev-feedback added
Checking is_feed does not get to the root of the problem, which is that we are trying to nest CDATA sections where we should not. I don't know enough about XML but I think we ought to stop using CDATA markers in feed templates and use a function that can intelligently wrap any content in CDATA as needed, removing any nested markers.
Is there really any reason to encode them rather than simply remove them?
So what I propose, and what should be seen by somebody with XML chops, is this unintelligent function:
function wp_cdata($data) {
Maybe test for conditions necessitating
CDATA markers and return if no need
$data = str_replace('<![CDATA[', , $data);
$data = str_replace(']]>', , $data);
Sometimes a trailing square bracket
can mess up XML parsers:
if ( substr($data, -1) == ']' )
$data .= ' ';
return "<![CDATA[$data]]>';
}
function wp_cdata($data) {
// Maybe test for conditions necessitating
// CDATA markers and return if no need
$data = str_replace('<![CDATA[', '', $data);
$data = str_replace(']]>', '', $data);
// Sometimes a trailing square bracket
// can mess up XML parsers:
if ( substr($data, -1) == ']' )
$data .= ' ';
return "<![CDATA[$data]]>';
}
Does wordpress support PHP < 4.0.5? If not, you could use arrays in str_replace:
$data = str_replace(array('<![CDATA[', ']]>'), array('', ''), $data);
comment:10
foolswisdom — 6 years ago
- Milestone changed from 2.1.3 to 2.3
comment:11
tellyworth — 6 years ago
Not sure how helpful this is, but it is kind of possible to nest CDATA sections. Technically I suppose it's double-encoding rather than nesting, but regardless, this is 100% valid and should work with any compliant XML parser:
<![CDATA foo <![CDATA bar ]]]><![CDATA[]> baz ]]>
It's actually two separate CDATA sections concatenated together.
When decoded by an XML parser, this becomes:
foo <![CDATA bar ]]> baz
Here's a PHP snippet to do it:
return '<![CDATA.str_replace(?]>', ']]]><![CDATA[]>', $str).']]>';
comment:12
follow-up:
↓ 55
tellyworth — 6 years ago
Whoops, here's that code again:
return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';
comment:14
lloydbudd — 5 years ago
- Milestone changed from 2.5 to 2.6
comment:15
Nazgul — 5 years ago
- Owner changed from anonymous to andy
comment:16
AaronCampbell — 5 years ago
I'm still a little unclear as to the purpose of this line of code. I keep running into this problem, so I'd like to see a fix (I'd be happy to submit a patch), but I don't understand why we are keeping it at all. If it's just needed for feeds, lets limit it to affecting feeds. If someone could clearly explain to me why it's here (when it's needed), I'll put a patch together.
AaronCampbell — 5 years ago
comment:17
AaronCampbell — 5 years ago
- Component changed from Administration to Template
- Keywords has-patch needs-testing added
- Milestone changed from 2.6 to 2.5.1
- Priority changed from low to normal
- Severity changed from normal to major
This ticket has sat here forever with nothing getting done, so I added a patch to get the ball rolling. The patch will fix two problems. It will fix one expressed above by only replacing ]]> with ]]> when it is processing a feed.
There was another problem because we so often use:
//<![CDATA[
alert('test');
//]]>
and strip_tags() leaves the leading there, which looks ugly in the feeds as well. I added a simple statement in two places to handle that:
$text = preg_replace('|//\s*<!\[CDATA\[|', '<![CDATA[', $text);
comment:19
Viper007Bond — 5 years ago
Bumpy bump.
comment:20
Viper007Bond — 5 years ago
- Milestone changed from 2.9 to 2.7
comment:21
jacobsantos — 5 years ago
Seems to be a big enough issue that this ticket should go in.
comment:22
ryan — 5 years ago
- Milestone changed from 2.7 to 2.8
Postponing yet again. Patch looked like a good candidate for early 2.8 so we can get plenty of testing on it.
comment:23
lloydbudd — 5 years ago
- Keywords early added
might be worth committing to give it a shot?
comment:25
follow-up:
↓ 29
Denis-de-Bernardy — 4 years ago
patch applies cleanly against today's trunk.
quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?
- Keywords dev-feedback removed
let's try to get this into 2.9 early
- Milestone changed from 2.8 to 2.9
- Keywords needs-unit-tests added
- Severity changed from major to normal
comment:29
in reply to:
↑ 25
hakre — 4 years ago
Replying to Denis-de-Bernardy:
patch applies cleanly against today's trunk.
quick question, though: if the closing CDATA gets escaped, shouldn't the opening CDATA get escaped too?
For shure, everything else is pretty stupid.
comment:30
follow-up:
↓ 31
kirkpatrick — 3 years ago
- Cc kirkpatrick added
The substitution of the right bracket in the CDATA closing,
$excerpt = str_replace(']]>', ']]>', $excerpt);
occurs in a number of places in WordPress, all in wp-includes:
comment.php Line 1355
feed.php Line 191
formatting.php Line 1734
post-template.php Line 168
So this substitution seems to be desired for reasons other than "protecting feeds". Perhaps security against javascript-driven xml injection attacks? (It would be useful if the gods of WordPress would tell us the meaning of all this.)
Anyway, as it is, a page written by a plugin that creates javascript that includes strings of html will not validate as xhtml. Since we can (presumably) trust plugins, it should be ok to fix this: just reverse the order of lines 167 and 168 in post-template.php, so the filters (potentially set by plugins) are applied after the replacement:
Change
$content = apply_filters('the_content', $content);
$content = str_replace(']]>', ']]>', $content);
to
$content = str_replace(']]>', ']]>', $content);
$content = apply_filters('the_content', $content);
This way, any CDATA in the post (as stored in the database) will lose the CDATA, as seems to be desired, but the plugin can still operate properly and validate.
The same reversal should also be done (my application doesn't need these, but ...) in comment.php (line 1355 to above 1351) and formatting.php (exchange 1733 and 1734).
This is a minimal change, and I believe makes more sense than the present code. The post content is filtered, but then the plugin is allowed to do its work.
Of course, this fix does not meet the needs of the earlier change posters, who wish to post javascript hidden with CDATA. I don't think I support this for standard WordPress.
comment:31
in reply to:
↑ 30
Denis-de-Bernardy — 3 years ago
Replying to kirkpatrick:
So this substitution seems to be desired for reasons other than "protecting feeds"?
Not according to the changeset that introduced it (r1039).
Here's what the file looked like way back then:
http://core.trac.wordpress.org/browser/trunk/wp-includes/template-functions-post.php?rev=1039
- Cc josephscott added
I've added a patch, but I'm not 100% that it won't break the Atom API. Can anyone give it a shot?
along the same line of bugs, there is #9992.
comment:33
westi — 3 years ago
- Cc westi added
Would it not be better to move the escaping of the closing CDATA into a function on the the_content filter which can then itself check is_feed() and also if necessary easily be removed by a plugin?
However as has been said before we don't want to change this too much close to release maybe we should just move the current code to a filter function for now but with no conditional check.
comment:34
markjaquith — 3 years ago
Moving it to a filter sounds like an acceptable solution this close to RC.
comment:35
ryan — 3 years ago
Patch moves escaping to a filter, although I'm tempted to punt once again and make the decision in 3.0 to finally fix this thing correctly. If we only need this for feeds, do it in get_the_content_feed() and remove it from everywhere else.
comment:36
markjaquith — 3 years ago
I'm okay with that too. We can wait until 3.0 and fix it once.
comment:37
ryan — 3 years ago
- Milestone changed from 2.9 to 3.0
comment:38
hakre — 3 years ago
I just stumbeled over that line of code today and wondered a bit. Isn't there a proper way to escape the CDATA end sequence(here named close tag)? I think it is but I was not able to find the propper way and it's getting late.
comment:39
hakre — 3 years ago
Replacement with > should work with useragents (since about HTML 2.0, this is SGML related; this is 1996), so strictly spoken this bug-report is technically bogus. I do not want to speak against providing a patch here but anyone having this problem should check why this is creating problems properly before adding a fix by removing an upcomming filter or uncommenting the current line in code. It looks to me like the feeds useragent is not proplery following the standards.
comment:40
hakre — 3 years ago
I suggest to close as invalid.
comment:41
hakre — 3 years ago
As in #9992 tendencies are to name it what it is: an invalid report. Any objections?
yes, one: for performance sake, is_feed() is faster than and uneeded call to str_replace(). :-)
comment:43
shidouhikari — 3 years ago
- Cc shidouhikari added
- Milestone changed from 3.0 to 2.9.2
- Priority changed from normal to high
- Severity changed from normal to major
Why call it invalid report?
Plugins should be allowed to add javascript into posts, and to do so they need to wrap the code in CDATA to keep XHTML valid.
If it is being done just for RSS sake, do it only when the content is feeded at all.
2 solutions were suggested, 3 years passed, and the code is still there without anybody defending it. If it must be there somebody should point why. If not, move it to a filter with high hook priority and let each website owner / plugin developer deal with it.
shidouhikari — 3 years ago
Moves filter to the end of the function, letting each Wordpress user deal with the situation without having to edit core files
comment:44
shidouhikari — 3 years ago
With this patch, nothing changes if no filter is hooked, while anybody can take responsibility over his own sites for mystic consequences of leaving CDATA inside posts :)
If somebody decides to hook a filter that lets CDATA lives, and eventually faces a bug, then the cause will be the filter and not Wordpress core itself. Lets us have control of our code.
comment:45
nacin — 3 years ago
- Milestone changed from 2.9.3 to 3.0
comment:46
follow-up:
↓ 47
dd32 — 3 years ago
How about we move the Cdata striping to a function hooked on the filter in question?
We can set the priority to (for example) 100, any filters that require to run afterwards, hook later, all other hooks hook first and get their result affected later down.
Otherwise, due to lack of traction and clear solution here, and my lack of wanting to change the filter order (as it has been for a few releases) this late in the game.. this'd get pushed to the next milestone.
And now i read the higher comments, and see that rboren has suggested exactly this. rboren, what do you say, Commit or deal with in the next release?
comment:47
in reply to:
↑ 46
aaroncampbell — 3 years ago
Replying to dd32:
Unfortunately, it's been so long that I can barely remember the issue. Reading my previous posts here, it seems like moving the CDATA stripping to a filter would be enough to fix it. I'd be fine with that solution.
comment:48
technosailor — 3 years ago
Ryans patch still applies cleanly. Let's commit and close this ticket for once and for all and work down the 3.0 ticket load :)
comment:49
shidouhikari — 3 years ago
- Keywords needs-testing needs-unit-tests removed
I also don't see how Ryan's patch could break something. It just passed the hardcoded code to a filter. If nobody touches that filter it will remain doing exactally the same.
If a plugin uses a filter with higher priority than 20, he can just remove and add the filter again. Anybody that doesn't want CDATA being escaped can remove the filter and be responsible to anything bad it may cause on his site.
It's a perfect solution that will satisfy everybody.
comment:50
shidouhikari — 3 years ago
I've been using Wordpress in 2 sites, in their production and in development environments, having this CDATA replacement totally removed. And in all of them I have a plugin that adds CDATA escaped JavaScript to the_content.
I never felt anything bad and don't miss it in anyway in my use.
comment:51
nacin — 3 years ago
- Milestone changed from 3.0 to 3.1
I would add the early keyword, but it's been there for 18 months. Let's actually handle this early 3.1.
comment:52
shidouhikari — 3 years ago
(I'm not complaining, just asking to understand it.)
What blocks this ticket from being commited? Are the suggested solutions not secure? waiting for a metter one to emerge?
comment:53
shidouhikari — 3 years ago
I've been using 2 Wordpress sites without this ]]> replace line, and they've been working great. No XHTML 1.1 error and no problem in feeds.
Comments feeds work great, I even have a plugin that tweaks the document and tested it a lot while developing and no problem, feedburner also showing my posts fine.
I have a plugin that adds JS on every link in posts and comments and they are working fine too.
comment:54
nacin — 3 years ago
- Milestone changed from Awaiting Triage to Future Release
comment:55
in reply to:
↑ 12
andy — 3 years ago
Replying to tellyworth:
Whoops, here's that code again:
return '<![CDATA['.str_replace(']]>', ']]]><![CDATA[]>', $str).']]>';
This is correct. I have been running a high volume of posts through a strict XML parser and this is how I have been escaping CDATA sections. It splits the closing tag between two adjacent CDATA sections.
comment:56
nacin — 3 years ago
- Keywords early removed
Attached patch introduces escape_cdata_close() per comments by andy and tellyworth.
This could probably be moved to a filter as well, the way ryan's patch was implemented.
Remove incorrect entity encoding of ]]> and introduce cdata functions for filters.
comment:57
andy — 3 years ago
- Priority changed from high to normal
- Severity changed from major to minor
CDATA wrappers should be removed from all templates and applied via filters using new functions. There are some complications.
A quick search for 'the_title_rss' (e.g.) finds that filter tag used in wp-admin/includes/export.php, which reminds that we use feed-related functions for more than just feeds. Any changes to feed-related filters must be tested with export/import.
The same search also finds the template tag the_title_rss() in feed-rss2.php and feed-atom.php, but only CDATA-wrapped in the latter. Universal use of CDATA tags to guarantee XML validity is of dubious merit. However, if we are going to do it we should be consistent.
3670-I.diff is the first half of the fix. It will break some feeds (rare) so it should not be applied by itself. The second half, installing new CDATA filters, touches a lot of code. Please volunteer.
comment:58
lgedeon — 16 months ago
- Cc luke.gedeon@… added

Couldn't we just make this a filter that is applied for RSS feeds but not e.g. admins, like kses?