Make WordPress Core

Opened 13 months ago

Last modified 13 months ago

#59082 new defect (bug)

Titles and descriptions in RSS feeds need to use CDATA to encode special characters.

Reported by: jsmoriss
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: major
Version: 6.3
Component: Feeds
Keywords:
Focuses:
Cc:

Description

Only some RSS XML tag values are enclosed in CDATA. Every value that can include special characters (UTF-8 characters, emoji, etc.) should be enclosed in CDATA so that those characters are carried safely and the feed validates (see https://validator.w3.org/feed/).

For example, the item description in feed-rss.php uses CDATA:

<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>

But the channel description does not:

<description><?php bloginfo_rss( 'description' ); ?></description>
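As a rough illustration of what wrapping a value in CDATA involves, here is a minimal standalone sketch. The helper name example_cdata_wrap() and the sample tagline are hypothetical and not part of WordPress. The one subtlety it handles is that a literal "]]>" inside the value would otherwise end the section early.

```php
<?php
// Hypothetical helper, not a WordPress function: wraps a string in a CDATA section.
function example_cdata_wrap( string $value ): string {
    // A literal "]]>" would close the section early, so split it across two sections.
    $safe = str_replace( ']]>', ']]]]><![CDATA[>', $value );

    return '<![CDATA[' . $safe . ']]>';
}

// Assumed sample tagline containing named HTML entities.
$channel_description = example_cdata_wrap( 'News &amp; notes &hellip;' );

echo '<description>' . $channel_description . '</description>' . "\n";
```

Inside a CDATA section the parser treats "&hellip;" as literal text, so whether a reader then displays it as an ellipsis depends on whether that reader interprets the element's content as HTML.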

Only some description tags use CDATA:

wordpress/wp-includes$ grep '<description>' feed*
feed-rdf.php:	<description><?php bloginfo_rss( 'description' ); ?></description>
feed-rdf.php:		<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>
feed-rdf.php:		<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>
feed-rss2-comments.php:	<description><?php bloginfo_rss( 'description' ); ?></description>
feed-rss2-comments.php:			<description><?php echo ent2ncr( __( 'Protected Comments: Please enter your password to view comments.' ) ); ?></description>
feed-rss2-comments.php:			<description><![CDATA[<?php comment_text_rss(); ?>]]></description>
feed-rss2.php:	<description><?php bloginfo_rss( 'description' ); ?></description>
feed-rss2.php:			<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>
feed-rss2.php:			<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>
feed-rss.php:	<description><?php bloginfo_rss( 'description' ); ?></description>
feed-rss.php:		<description><![CDATA[<?php the_excerpt_rss(); ?>]]></description>

And none of the title tags use CDATA:

wordpress/wp-includes$ grep '<title>' feed*
feed-atom-comments.php:		<title>
feed.php:	<title>' . $rss_title . '</title>
feed-rdf.php:	<title><?php wp_title_rss(); ?></title>
feed-rdf.php:	<title><?php the_title_rss(); ?></title>
feed-rss2-comments.php:	<title>
feed-rss2-comments.php:		<title>
feed-rss2.php:	<title><?php wp_title_rss(); ?></title>
feed-rss2.php:		<title><?php the_title_rss(); ?></title>
feed-rss.php:	<title><?php wp_title_rss(); ?></title>
feed-rss.php:		<title><?php the_title_rss(); ?></title>

I found this issue when a site's title contained an encoded special character: an RSS reader refused to parse the XML because the title value was not in a CDATA enclosure.
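The failure can be reproduced outside WordPress with a strict XML parser. The sketch below is only an assumption about what the RSS reader did internally: it uses PHP's SimpleXML extension (libxml) as a stand-in parser, and the function name and sample titles are made up for illustration.

```php
<?php
// Returns true when the fragment is well-formed XML according to libxml.
function example_is_well_formed( string $xml ): bool {
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors( true );
    $parsed   = simplexml_load_string( $xml );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    return false !== $parsed;
}

// Assumed sample titles; "&ndash;" is an HTML entity that plain XML does not define.
$candidates = array(
    'named entity'      => '<title>Site &ndash; Tagline</title>',
    'CDATA section'     => '<title><![CDATA[Site &ndash; Tagline]]></title>',
    'numeric reference' => '<title>Site &#8211; Tagline</title>',
);

foreach ( $candidates as $label => $xml ) {
    echo $label . ': ' . ( example_is_well_formed( $xml ) ? 'parses' : 'rejected' ) . "\n";
}
```

XML predefines only five named entities (amp, lt, gt, quot, apos), so a parser without an HTML DTD has no definition for names such as ndash and treats the document as not well-formed.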

js.

Change History (2)

#1 @jsmoriss
13 months ago

get_wp_title_rss() returns, and wp_title_rss() outputs, the filtered value of wp_get_document_title(), which often includes HTML and/or UTF-8 encoded characters, so the RSS XML title tags should use CDATA.

I would suggest that this is incorrect:

feed-rss2.php:	<title><?php wp_title_rss(); ?></title>
feed-rss.php:	<title><?php wp_title_rss(); ?></title>

js.

#2 @jsmoriss
13 months ago

An example HTML entity fix that passes the W3C validator without using CDATA is:

// Convert named HTML entities in the feed title to numeric character references.
add_filter( 'get_wp_title_rss', 'fix_document_title_for_rss' );

function fix_document_title_for_rss( $title ) {

    return ent2ncr( $title );
}

The get_wp_title_rss() function could be modified to apply ent2ncr() on all results from the 'get_wp_title_rss' filter - for example:

return ent2ncr( apply_filters( 'get_wp_title_rss', wp_get_document_title(), $deprecated ) );
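To show the effect of that conversion without loading WordPress, here is a simplified stand-in. example_ent2ncr() is hypothetical and carries only four table entries chosen for the example; the real ent2ncr() uses a far larger table.

```php
<?php
// Hypothetical stand-in for ent2ncr() with only a few table entries, for illustration.
function example_ent2ncr( string $text ): string {
    $to_ncr = array(
        '&amp;'    => '&#38;',
        '&nbsp;'   => '&#160;',
        '&ndash;'  => '&#8211;',
        '&hellip;' => '&#8230;',
    );

    // strtr() replaces each named entity with its numeric character reference.
    return strtr( $text, $to_ncr );
}

// Assumed sample title containing three named entities.
echo example_ent2ncr( 'Site &ndash; News &amp; notes&hellip;' ) . "\n";
```

Numeric character references need no entity declarations, which is why a title converted this way can pass an XML parser without a CDATA wrapper.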

js.
