Opened 2 years ago

Last modified 10 months ago

#19998 new defect (bug)

Feeds can contain characters that are not valid XML

Reported by: westi
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version: 3.3.1
Component: Feeds
Keywords: has-patch
Focuses:
Cc:

Description

It is possible for any of the feeds to contain control characters that are not valid in XML, e.g. U+001C (http://www.fileformat.info/info/unicode/char/1c/index.htm).

When outputting user-supplied content in an XML context, we should strip these control characters: they are unprintable and break feed parsers.

http://en.wikipedia.org/wiki/Valid_characters_in_XML has a good list of what is and isn't valid.

I guess we need a strip_for_xml() function or something.

Attachments (1)

19998b.patch (1.2 KB) - added by lgedeon 15 months ago.

Change History (6)

comment:1 solarissmoke 2 years ago

One approach could be to filter using the set of valid characters from the spec:

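function strip_for_xml( $utf8 ) {
  // Keep only the XML 1.0 Char ranges, including the supplementary planes (U+10000-U+10FFFF).
  return preg_replace( '/[^\x{0009}\x{000a}\x{000d}\x{0020}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u', ' ', $utf8 );
}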

This assumes that the feed is served as UTF-8. I've no idea what it would do to XML in other charsets.
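
For reference, a minimal sketch of such a filter that follows the full Char production from the XML 1.0 spec (including U+10000-U+10FFFF) and does not hand back NULL when the input is not valid UTF-8. The name strip_invalid_xml_chars() and the choice to return the input unchanged on failure are assumptions for illustration, not part of any patch on this ticket.

```php
<?php
/**
 * Hypothetical helper: replace each run of characters outside the XML 1.0
 * Char production with a single space. Assumes $text is UTF-8.
 */
function strip_invalid_xml_chars( $text ) {
	// Allowed: tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF.
	$pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';

	$clean = preg_replace( $pattern, ' ', $text );

	// With the /u modifier, preg_replace() returns null for malformed UTF-8.
	// Returning the input untouched is an assumption; a caller may prefer ''.
	return null === $clean ? $text : $clean;
}

echo strip_invalid_xml_chars( "Hello\x1CWorld" ), "\n"; // Hello World
```

Tab, line feed and carriage return pass through untouched, so multi-line content keeps its layout.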

comment:2 lgedeon 15 months ago

Should we remove the offending characters outright, or replace them with a space (or something else)?

How do we find out whether we are in UTF-8? Do we just skip this step if we are not, at least for now?
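
To make the first question concrete, here is a small sketch comparing the two options on a sample string. The function name replace_invalid_xml_chars() and its $replacement parameter are hypothetical. Replacing with a space preserves a word boundary that the control character may have been standing in for, while removing it can fuse neighbouring words.

```php
<?php
// Hypothetical helper; assumes UTF-8 input. $replacement selects the behaviour.
function replace_invalid_xml_chars( $text, $replacement = ' ' ) {
	// Anything outside the XML 1.0 Char production is treated as invalid.
	$pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';
	$result  = preg_replace( $pattern, $replacement, $text );

	// preg_replace() returns null on malformed UTF-8; fall back to the input.
	return null === $result ? $text : $result;
}

$sample = "feed\x1Ctitle";
echo replace_invalid_xml_chars( $sample, ' ' ), "\n"; // feed title
echo replace_invalid_xml_chars( $sample, '' ), "\n";  // feedtitle
```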

comment:3 lgedeon 15 months ago

  • Cc luke.gedeon@… added

I found an answer for the second question in /wp-includes/formatting.php:

$is_utf8 = in_array( get_option( 'blog_charset' ), array( 'utf8', 'utf-8', 'UTF8', 'UTF-8' ) );
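
Combining that check with the filter might look like the sketch below. It takes the charset as a parameter so that it runs outside WordPress; in core the value would presumably come from get_option( 'blog_charset' ). The name maybe_strip_for_xml() is hypothetical, and skipping non-UTF-8 charsets entirely is only the "for now" option, not a settled decision.

```php
<?php
// Hypothetical wrapper: filter only when the charset is UTF-8, otherwise skip.
function maybe_strip_for_xml( $text, $charset ) {
	// Same four spellings as the check in formatting.php.
	$is_utf8 = in_array( $charset, array( 'utf8', 'utf-8', 'UTF8', 'UTF-8' ), true );
	if ( ! $is_utf8 ) {
		return $text; // Assumption: leave other charsets untouched for now.
	}

	// Replace runs of characters outside the XML 1.0 Char production.
	$pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]+/u';
	$clean   = preg_replace( $pattern, ' ', $text );

	// preg_replace() returns null on malformed UTF-8; fall back to the input.
	return null === $clean ? $text : $clean;
}

echo maybe_strip_for_xml( "a\x1Cb", 'UTF-8' ), "\n"; // a b
```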

comment:4 lgedeon 15 months ago

  • Keywords has-patch added; needs-patch removed

Patch 19998b.patch takes the approach proposed by solarissmoke and uses the function name proposed by westi.

comment:5 SergeyBiryukov 10 months ago

#24701 was marked as a duplicate.