Opened 7 years ago

Closed 20 months ago

#4794 closed defect (bug) (fixed)

xml-rpc should identify encoding

Reported by: redsweater
Owned by: josephscott
Milestone: 3.5
Priority: normal
Severity: normal
Version: 2.2.2
Component: XML-RPC
Keywords: has-patch dev-feedback
Focuses:
Cc:

Description (last modified by Denis-de-Bernardy)

WordPress provides users with a preference to identify the text encoding of the blog's content. But this encoding is not declared in (most) XML documents generated by xmlrpc.php.

Notice that when RSD support was added, the developer who wrote that code *did* include the blog's encoding in the document header. But for all other generated XML documents (i.e. replies to XML-RPC queries), the encoding is omitted.

When the encoding is omitted, as I understand it, the presumed encoding is UTF-8. In my limited experience with customers running non-UTF-8 blogs, they tend to use ISO-8859-1 encoding. When they use this encoding and also take advantage of some of the accented characters in that set, such as 0xE9 or 0xC9, the resulting document is illegal XML because it contains byte sequences that are not valid in the presumed UTF-8 encoding.
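To make the byte-level problem concrete, here is a minimal sketch in PHP. It assumes the mbstring extension is available; the sample string is made up for illustration and does not come from any real blog.

```php
<?php
// "é" in ISO-8859-1 is the single byte 0xE9 (sample text, chosen for illustration).
$latin1 = "caf\xE9";

// In UTF-8, 0xE9 is a lead byte that must be followed by continuation bytes,
// so this string is not valid UTF-8 on its own.
$isValidUtf8 = mb_check_encoding($latin1, 'UTF-8');

// Converting to UTF-8 turns "é" into the two-byte sequence 0xC3 0xA9.
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
$hex = bin2hex($utf8);
```

Here `$isValidUtf8` is false for the ISO-8859-1 bytes, which is why a parser that presumes UTF-8 rejects them.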

This failure to properly identify the encoding of XML documents can lead blog clients to fail to parse the XML, and therefore cause XML-RPC to fail more or less completely for a certain class of users.
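The client-side failure can be sketched with PHP's DOM extension (assumed to be installed). The `<title>` payload is a made-up stand-in for a real XML-RPC response, and other parsers may behave differently from libxml.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical payload containing the ISO-8859-1 byte 0xE9.
$body = "<title>caf\xE9</title>";

// Without an encoding declaration the parser presumes UTF-8.
$undeclared = new DOMDocument();
$parsedUndeclared = $undeclared->loadXML('<?xml version="1.0"?>' . "\n" . $body);

// With the encoding declared, the same bytes can be decoded.
$declared = new DOMDocument();
$parsedDeclared = $declared->loadXML('<?xml version="1.0" encoding="ISO-8859-1"?>' . "\n" . $body);

// DOM exposes text as UTF-8 regardless of the source encoding.
$title = $parsedDeclared ? $declared->documentElement->textContent : null;
```

With libxml, the first parse should fail and the second should succeed, which matches the behaviour described for blog clients.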

I propose that xmlrpc.php be modified such that every XML document it generates for the purposes of exposing blog content be identified as being of the encoding specified by the user in Options -> Reading.

Attachments (5)

4794.diff (728 bytes) - added by solarissmoke 3 years ago.
Set charset/encoding in XML response - same as we do everywhere else in core
class-IXR.php (739 bytes) - added by sergey.s.betke@… 2 years ago.
patch for resolving XMLRPC encoding error
class-IXR.php.diff (739 bytes) - added by sergey.s.betke@… 2 years ago.
excuse me, correct file name extension
4794.2.diff (723 bytes) - added by SergeyBiryukov 21 months ago.
4794.3.diff (944 bytes) - added by SergeyBiryukov 20 months ago.

Change History (23)

comment:1 foolswisdom 7 years ago

  • Summary changed from WordPress should identify XML document text encoding to xml-rpc should identify encoding

comment:2 matt 7 years ago

Isn't the encoding sent in the Content-type header even in xmlrpc.php?

comment:3 redsweater 7 years ago

If it were, I don't think that would be sufficient, because the idea is that the XML document should be legally parseable as-is, right?

But it doesn't advertise it in the Content-type header. Here are the relevant lines from a typical response:

Content-Length: 3714
Content-Type: text/xml

<?xml version="1.0"?>
<methodResponse>

comment:4 redsweater 7 years ago

Perhaps there was some confusion as to what constitutes "encoding" - yes the Content-type header identifies the content as being text XML, but the XML then in turn does not identify which character encoding it uses for its node contents.

comment:5 Otto42 7 years ago

The problem is specifically in the IXR_Server class:

function output($xml) {
        $xml = '<?xml version="1.0"?>'."\n".$xml;
...

Edit that to include the charset and you should be good.
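As a rough sketch of that edit: the helper below is hypothetical (it is not part of IXR) and only illustrates adding the charset to the XML declaration. The charset value is passed in as a parameter here.

```php
<?php
// Hypothetical helper, not part of IXR: builds the XML declaration,
// adding the encoding attribute only when a charset is known.
function example_xml_declaration($charset = '') {
    if ($charset !== '') {
        return '<?xml version="1.0" encoding="' . $charset . '"?>';
    }
    return '<?xml version="1.0"?>';
}

// Prepend the declaration the same way output() prepends its fixed string.
$xml = example_xml_declaration('ISO-8859-1') . "\n" . '<methodResponse></methodResponse>';
```

In WordPress, the charset would presumably come from get_option('blog_charset').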

comment:6 josephscott 7 years ago

  • Cc josephscott added

comment:7 darkdragon 6 years ago

  • Component changed from Administration to XML-RPC

comment:8 Denis-de-Bernardy 5 years ago

  • Description modified (diff)

seems still valid.

comment:9 Denis-de-Bernardy 5 years ago

  • Keywords needs-patch added
  • Milestone changed from 2.9 to Future Release
  • Type changed from defect (bug) to enhancement

solarissmoke 3 years ago

Set charset/encoding in XML response - same as we do everywhere else in core

comment:10 solarissmoke 3 years ago

  • Keywords has-patch added; needs-patch removed

comment:11 sergey.s.betke@… 2 years ago

  • Cc sergey.s.betke@… added
  • Type changed from enhancement to defect (bug)

This bug still exists in WordPress 3.2.1. When php.ini does not set the default_charset option and the HTTP server uses a default charset different from get_option('blog_charset'), XML-RPC applications (for example, Microsoft Live Writer or Microsoft Word) do not recognize the content encoding when reading posts, tags, and categories.
Example - http://sergey-s-betke.blogs.novgaro.ru/wordpress-3-2-1-i-problema-s-utf-8-v-ajax-i-xmlrpc.

Patch tested - it's working properly.

sergey.s.betke@… 2 years ago

patch for resolving XMLRPC encoding error

sergey.s.betke@… 2 years ago

excuse me, correct file name extension

comment:12 sergey.s.betke@… 2 years ago

  • Keywords dev-feedback added

I have attached a patch (class-IXR.php.diff) against the current WordPress core version. I ask the developers to commit it.

comment:13 follow-up: SergeyBiryukov 22 months ago

Closed #20705 as a duplicate.

comment:14 in reply to: ↑ 13 aercolino 21 months ago

  • Cc cappuccino.e.cornetto@… added

Replying to SergeyBiryukov:

Closed #20705 as a duplicate.

Right, it's a duplicate. But is there any reason for not fixing this issue?
I've just checked 3.4.1 and still no fix...

SergeyBiryukov 21 months ago

comment:15 SergeyBiryukov 21 months ago

  • Milestone changed from Future Release to 3.5

Refreshed the patch (departed from the coding standards for consistency with the surrounding code).

Moving for review along with #19448 and #19454.

comment:16 ryan 20 months ago

This is technically a third-party library, but we've inherited it so we can tweak if needed. Just in case someone is trying to use this standalone, let's add a function_exists() for get_option().

SergeyBiryukov 20 months ago

comment:17 SergeyBiryukov 20 months ago

4794.3.diff only specifies encoding if get_option() exists.
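The guard can be sketched roughly as follows. This is an illustration under assumptions, not the contents of 4794.3.diff: both helper names are hypothetical, and the charset parameter on the Content-Type header is shown only as one way to advertise the same value.

```php
<?php
// Hypothetical helper: resolves the charset only when WordPress's
// get_option() is loaded, so the library still works standalone.
function example_resolve_charset() {
    if (function_exists('get_option')) {
        return (string) get_option('blog_charset');
    }
    return ''; // standalone use: no charset known
}

// Hypothetical helper: builds the Content-Type header value,
// adding the charset parameter only when one is known.
function example_content_type($charset) {
    return $charset !== '' ? 'text/xml; charset=' . $charset : 'text/xml';
}

$charset = example_resolve_charset();
$contentType = example_content_type($charset);
```

Run outside WordPress, get_option() is undefined, so the sketch falls back to the plain text/xml value with no charset.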

comment:18 ryan 20 months ago

  • Resolution set to fixed
  • Status changed from new to closed

In [21531]:

Specify the encoding in IXR_Server::output(). Props solarissmoke, sergey.s.betke@…, SergeyBiryukov. fixes #4794
