Make WordPress Core

Opened 10 years ago

Last modified 5 years ago

#4794 closed defect (bug)

xml-rpc should identify encoding — at Version 8

Reported by: redsweater
Owned by: josephscott
Milestone: 3.5
Priority: normal
Severity: normal
Version: 2.2.2
Component: XML-RPC
Keywords: has-patch dev-feedback
Focuses:
Cc:

Description (last modified by Denis-de-Bernardy)

WordPress provides users with a preference to identify the text encoding of the blog's content. But this encoding format is not used to identify the content expectations for (most) XML documents generated by xmlrpc.php.

Notice that when RSD support was added, the developer who wrote that code *did* include the blog's encoding in the document header. But for all other XML documents generated (i.e. replies to XML-RPC queries), the encoding is omitted.

When the encoding is omitted, as I understand it, the presumed encoding is UTF-8. In my limited experience with customers running non-UTF-8 blogs, they tend to use ISO-8859-1 encoding. When they use this encoding and also take advantage of some of the accented characters in that set, such as 0xE9 or 0xC9, the resulting document is not well-formed XML because it contains bytes that are not valid UTF-8 sequences.
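
To illustrate the byte-level problem described in the paragraph above, here is a minimal sketch. It is not WordPress code: `is_valid_utf8()` is a hypothetical helper name, and the sample string "café" is an assumption chosen only because it contains the 0xE9 byte mentioned above.

```php
<?php
// "café" as ISO-8859-1 bytes: the final byte 0xE9 is "é" in that charset.
$latin1 = "caf\xE9";

// The same word as UTF-8 bytes: "é" is the two-byte sequence 0xC3 0xA9.
$utf8 = "caf\xC3\xA9";

// Hypothetical helper: with the /u modifier, PCRE refuses to process
// a subject string that is not valid UTF-8, so preg_match() returns false.
function is_valid_utf8(string $bytes): bool {
    return preg_match('//u', $bytes) === 1;
}

$latin1_is_utf8 = is_valid_utf8($latin1); // a lone 0xE9 is an incomplete UTF-8 sequence
$utf8_is_utf8   = is_valid_utf8($utf8);   // a well-formed two-byte sequence
```

A parser that assumes UTF-8 because no encoding was declared would hit the same invalid byte in the first string.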

This failure to properly identify the encoding of XML documents can cause blog clients to fail to parse the XML, so XML-RPC more or less completely fails for a certain class of users.

I propose that xmlrpc.php be modified so that every XML document it generates to expose blog content declares the encoding specified by the user in Options -> Reading.

Change History (8)

#1 @foolswisdom
10 years ago

  • Summary changed from WordPress should identify XML document text encoding to xml-rpc should identify encoding

#2 @matt
10 years ago

Isn't the encoding sent in the Content-type header even in xmlrpc.php?

#3 @redsweater
10 years ago

If it were, I don't think that would be sufficient, because the idea is that the XML document should be legally parseable as-is, right?

But it doesn't advertise it in the Content-type header. Here are the relevant lines from a typical response:

Content-Length: 3714
Content-Type: text/xml

<?xml version="1.0"?>

#4 @redsweater
10 years ago

Perhaps there was some confusion as to what constitutes "encoding": yes, the Content-type header identifies the content as being text XML, but the XML in turn does not identify which character encoding it uses for its node contents.

#5 @Otto42
10 years ago

The problem is specifically in the IXR_Server class:

function output($xml) {
        $xml = '<?xml version="1.0"?>'."\n".$xml;

Edit that to include the charset and you should be good.
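
A rough sketch of what that edit could look like, written as standalone functions rather than as the real `IXR_Server::output()` method. The names `prepend_xml_declaration()` and `xml_content_type()` are hypothetical, and the fallback to UTF-8 for an unusable charset name is an assumption, not something decided in this ticket.

```php
<?php
// Hypothetical standalone helper, not the actual IXR_Server::output() method.
// Prepends an XML declaration that names the character encoding.
function prepend_xml_declaration(string $xml, string $charset = 'UTF-8'): string {
    // Accept only names that match the XML EncName grammar;
    // anything else falls back to UTF-8 (an assumption of this sketch).
    if (!preg_match('/^[A-Za-z][A-Za-z0-9._-]*$/', $charset)) {
        $charset = 'UTF-8';
    }
    return '<?xml version="1.0" encoding="' . $charset . '"?>' . "\n" . $xml;
}

// Hypothetical helper: builds a Content-Type header line carrying the same charset.
function xml_content_type(string $charset = 'UTF-8'): string {
    return 'Content-Type: text/xml; charset=' . $charset;
}

$doc = prepend_xml_declaration('<methodResponse/>', 'ISO-8859-1');
```

Inside WordPress the charset would presumably come from `get_option('blog_charset')`, the same setting the description refers to. Declaring it in both the XML declaration and the Content-Type header would cover the point raised in comment 2 as well.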

#6 @josephscott
10 years ago

  • Cc josephscott added

#7 @darkdragon
9 years ago

  • Component changed from Administration to XML-RPC

#8 @Denis-de-Bernardy
8 years ago

  • Description modified (diff)

This still seems valid.
