WordPress.org

Make WordPress Core

Opened 12 years ago

Closed 7 years ago

#4794 closed defect (bug) (fixed)

xml-rpc should identify encoding

Reported by: redsweater
Owned by: josephscott
Milestone: 3.5
Priority: normal
Severity: normal
Version: 2.2.2
Component: XML-RPC
Keywords: has-patch dev-feedback
Focuses:
Cc:
PR Number:

Description (last modified by Denis-de-Bernardy)

WordPress provides users with a preference to identify the text encoding of the blog's content. But this encoding is not declared in (most) XML documents generated by xmlrpc.php.

Notice that when RSD support was added, the developer who wrote that code *did* include the blog's encoding in the document header. But for all other XML documents generated (i.e. replies to XML-RPC queries), the encoding is omitted.

When the encoding is omitted, as I understand it, the presumed encoding is UTF-8. In my limited experience with customers running non-UTF-8 blogs, they tend to use ISO-8859-1 encoding. When they use this encoding and also take advantage of some of the accented characters in that set, such as 0xE9 (é) or 0xC9 (É), the resulting document is not well-formed XML, because those single bytes on their own are not valid UTF-8 sequences.

This failure to properly identify the encoding of XML documents can cause blog clients to fail to parse the XML, so XML-RPC fails more or less completely for a certain class of users.
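
The failure described above can be reproduced with a minimal sketch. It assumes PHP's DOM extension (libxml) stands in for the XML-RPC client; `parse_root_text()` and the sample payload are hypothetical and not part of WordPress.

```php
<?php
// Illustrative only: parse_root_text() is a hypothetical helper, not WordPress code.
// Assumes the DOM extension (libxml) is available.

/**
 * Parse an XML string and return the text of its root element,
 * or null if the parser rejects the document.
 */
function parse_root_text(string $xml): ?string
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $ok ? $doc->documentElement->textContent : null;
}

// "caf\xE9" is "café" encoded as ISO-8859-1: the accented letter is the single byte 0xE9.
$body = "<string>caf\xE9</string>";

// No encoding declaration: the parser presumes UTF-8 and should reject the lone 0xE9 byte.
$undeclared = parse_root_text('<?xml version="1.0"?>' . "\n" . $body);

// Same bytes, declared as ISO-8859-1: the parser can decode them.
$declared = parse_root_text('<?xml version="1.0" encoding="ISO-8859-1"?>' . "\n" . $body);

var_dump($undeclared === null);
var_dump($declared === "caf\u{E9}");
```

The only difference between the two documents is the `encoding` attribute in the XML declaration, which is what this ticket asks xmlrpc.php to emit.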

I propose that xmlrpc.php be modified such that every XML document it generates for the purposes of exposing blog content, be identified as being of the encoding specified by the user in Options -> Reading.

Attachments (5)

4794.diff (728 bytes) - added by solarissmoke 9 years ago.
Set charset/encoding in XML response - same as we do everywhere else in core
class-IXR.php (739 bytes) - added by sergey.s.betke@… 8 years ago.
patch for resolving XMLRPC encoding error
class-IXR.php.diff (739 bytes) - added by sergey.s.betke@… 8 years ago.
excuse me, correct file name extension
4794.2.diff (723 bytes) - added by SergeyBiryukov 7 years ago.
4794.3.diff (944 bytes) - added by SergeyBiryukov 7 years ago.

Change History (23)

#1 @foolswisdom
12 years ago

  • Summary changed from WordPress should identify XML document text encoding to xml-rpc should identify encoding

#2 @matt
12 years ago

Isn't the encoding sent in the Content-type header even in xmlrpc.php?

#3 @redsweater
12 years ago

If it was, I don't think that would be sufficient, because the idea is that the XML document should be legally parseable as-is, right?

But it doesn't advertise it in the Content-type header. Here are the relevant lines from a typical response:

Content-Length: 3714
Content-Type: text/xml

<?xml version="1.0"?>
<methodResponse>

#4 @redsweater
12 years ago

Perhaps there was some confusion as to what constitutes "encoding" - yes the Content-type header identifies the content as being text XML, but the XML then in turn does not identify which character encoding it uses for its node contents.

#5 @Otto42
12 years ago

The problem is specifically in the IXR_Server class:

function output($xml) {
        $xml = '<?xml version="1.0"?>'."\n".$xml;
...

Edit that to include the charset and you should be good.
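
A minimal sketch of that edit, written as a standalone function so it can run outside WordPress. `prepend_xml_declaration()` is a hypothetical name used only for illustration, and the charset is hard-coded here; in WordPress it would presumably come from `get_option('blog_charset')`.

```php
<?php
// Hypothetical helper for illustration; the real method is IXR_Server::output().

/**
 * Prepend an XML declaration that names $charset to an XML-RPC response body.
 */
function prepend_xml_declaration(string $xml, string $charset): string
{
    return '<?xml version="1.0" encoding="' . $charset . '"?>' . "\n" . $xml;
}

// Hard-coded charset so the sketch runs standalone (an assumption of this example).
$response = prepend_xml_declaration('<methodResponse></methodResponse>', 'ISO-8859-1');
echo $response, "\n";
```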

#6 @josephscott
12 years ago

  • Cc josephscott added

#7 @darkdragon
12 years ago

  • Component changed from Administration to XML-RPC

#8 @Denis-de-Bernardy
10 years ago

  • Description modified (diff)

seems still valid.

#9 @Denis-de-Bernardy
10 years ago

  • Keywords needs-patch added
  • Milestone changed from 2.9 to Future Release
  • Type changed from defect (bug) to enhancement

@solarissmoke
9 years ago

Set charset/encoding in XML response - same as we do everywhere else in core

#10 @solarissmoke
9 years ago

  • Keywords has-patch added; needs-patch removed

#11 @sergey.s.betke@…
8 years ago

  • Cc sergey.s.betke@… added
  • Type changed from enhancement to defect (bug)

This bug still exists in WordPress 3.2.1. When php.ini does not set the default_charset option and the HTTP server uses a default charset different from get_option('blog_charset'), XML-RPC applications (for example, Microsoft Live Writer, Microsoft Word) do not recognize the content encoding when reading posts, tags, and categories.
Example - http://sergey-s-betke.blogs.novgaro.ru/wordpress-3-2-1-i-problema-s-utf-8-v-ajax-i-xmlrpc.

Patch tested - it's working properly.

@sergey.s.betke@…
8 years ago

patch for resolving XMLRPC encoding error

@sergey.s.betke@…
8 years ago

excuse me, correct file name extension

#12 @sergey.s.betke@…
8 years ago

  • Keywords dev-feedback added

I added the patch (class-IXR.php.diff) against the current WordPress core version. I ask the developers to commit it.

#13 follow-up: @SergeyBiryukov
7 years ago

Closed #20705 as a duplicate.

#14 in reply to: ↑ 13 @aercolino
7 years ago

  • Cc cappuccino.e.cornetto@… added

Replying to SergeyBiryukov:

Closed #20705 as a duplicate.

Right, it's a duplicate. But is there any reason for not fixing this issue?
I've just checked 3.4.1 and still no fix...

#15 @SergeyBiryukov
7 years ago

  • Milestone changed from Future Release to 3.5

Refreshed the patch (departed from the coding standards for consistency with the surrounding code).

Moving for review along with #19448 and #19454.

#16 @ryan
7 years ago

This is technically a third-party library, but we've inherited it so we can tweak if needed. Just in case someone is trying to use this standalone, let's add a function_exists() for get_option().

#17 @SergeyBiryukov
7 years ago

4794.3.diff only specifies encoding if get_option() exists.
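
The guard discussed in the last two comments might look roughly like the sketch below. It is not the committed patch: `build_xml_document()` is a hypothetical name, and the fallback to a declaration without an encoding is an assumption based on the description of 4794.3.diff.

```php
<?php
// Hypothetical sketch of the function_exists() guard; not the committed code.

/**
 * Prepend an XML declaration to $xml, naming the blog charset only when
 * WordPress (and therefore get_option()) is loaded.
 */
function build_xml_document(string $xml): string
{
    // When the IXR library is used standalone, get_option() is undefined,
    // so no charset is available and the declaration omits the encoding.
    $charset = function_exists('get_option') ? get_option('blog_charset') : '';

    if ($charset) {
        return '<?xml version="1.0" encoding="' . $charset . '"?>' . "\n" . $xml;
    }

    return '<?xml version="1.0"?>' . "\n" . $xml;
}

// Run outside WordPress, this takes the fallback branch.
$standalone = build_xml_document('<methodResponse></methodResponse>');
echo $standalone, "\n";
```

Checking `function_exists()` at call time keeps the library usable without WordPress, which is the concern raised in comment 16.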

#18 @ryan
7 years ago

  • Resolution set to fixed
  • Status changed from new to closed

In [21531]:

Specify the encoding in IXR_Server::output(). Props solarissmoke, sergey.s.betke@…, SergeyBiryukov. fixes #4794
