Make WordPress Core

Opened 15 years ago

Closed 13 years ago

Last modified 13 years ago

#11034 closed defect (bug) (wontfix)

AJAX requests stored as UTF-8 even if a non-UTF-8 charset is in use

Reported by: iansealy Owned by: azaozz
Milestone: Priority: low
Severity: normal Version: 2.8.5
Component: Autosave Keywords: has-patch dev-feedback
Focuses: Cc:

Description

AJAX responses are always sent in UTF-8. If a blog is using a non-UTF-8 charset then the data will potentially be stored incorrectly and any text will often end up garbled.

For example, if your blog uses ISO-8859-1 and some text submitted via AJAX contains non-ASCII characters then they'll be stored in the database as UTF-8 double bytes rather than their ISO-8859-1 single byte equivalents. Since these characters will be displayed as if they're ISO-8859-1 you'll just end up seeing garbage.
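
To make the byte-level difference concrete, here is a minimal sketch in Python. The sample string "café" is an arbitrary illustration and does not come from the ticket; the sketch only shows how UTF-8 bytes look when they are read back as ISO-8859-1.

```python
# Illustrative only: "café" is an arbitrary sample string, not from the ticket.
original = "café"

# What the AJAX data carries: UTF-8 bytes (é becomes two bytes, 0xC3 0xA9).
utf8_bytes = original.encode("utf-8")

# What an ISO-8859-1 blog expects: one byte per character (é is 0xE9).
latin1_bytes = original.encode("iso-8859-1")

# Reading the UTF-8 bytes as ISO-8859-1 yields one character per byte.
garbled = utf8_bytes.decode("iso-8859-1")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
print(garbled)       # cafÃ©
```

The two-byte sequence for "é" shows up as the two unrelated characters "Ã©", which is the kind of garbage described above.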

The attached hack simply converts all AJAX responses from UTF-8 to the blog's charset using iconv.

I've put this as low priority since most blogs probably use UTF-8. I've also put this in the Autosave component, but obviously all AJAX responses are affected.

Attachments (1)

admin-ajax.php.diff (829 bytes) - added by iansealy 15 years ago.


Change History (12)

#1 @scribu
15 years ago

  • Keywords has-patch added; AJAX UTF-8 removed
  • Milestone changed from Unassigned to 2.9

#2 @azaozz
15 years ago

Do you mean "AJAX requests", not "AJAX responses"? As far as I can see the browser encodes any text in the AJAX requests with the current page encoding as set in the header.

#3 @iansealy
15 years ago

  • Cc iansealy added
  • Summary changed from AJAX responses stored as UTF-8 even if a non-UTF-8 charset is in use to AJAX requests stored as UTF-8 even if a non-UTF-8 charset is in use

Sorry, I did, of course, mean AJAX requests, rather than responses.

I see different encodings. So, for example, here's the Content-Type header from the wp-admin/post.php?action=edit response:

Content-Type: text/html; charset=ISO-8859-1

And here's the Content-Type header from the AJAX request sent to wp-admin/admin-ajax.php when an autosave happens:

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

You see ISO-8859-1 (or whatever non-UTF-8 encoding you chose for testing) in the AJAX request too?
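
A quick way to compare the two headers quoted above is to pull out their charset parameters and check whether they agree. This is only a sketch in Python using the standard library's email.message.Message; the helper name charset_of is made up for illustration and is not part of WordPress.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the lower-cased charset parameter of a Content-Type value."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# The two header values quoted in this comment.
page_charset = charset_of("text/html; charset=ISO-8859-1")
ajax_charset = charset_of("application/x-www-form-urlencoded; charset=UTF-8")

# True when the edit page and the autosave request disagree on encoding.
mismatch = page_charset != ajax_charset

print(page_charset, ajax_charset, mismatch)  # iso-8859-1 utf-8 True
```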

#4 @iansealy
15 years ago

Doh! The original patch didn't deal with arrays, so broke things like updating widgets. I've attached a new patch.
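
For readers without the attachment, the general idea of converting nested request data can be sketched as follows. This is not the attached patch (which is PHP and uses iconv); it is a Python illustration, and the function name convert_request and the sample field names are hypothetical.

```python
def convert_request(value, target_charset="iso-8859-1"):
    """Recursively re-encode UTF-8 byte strings into target_charset.

    Dicts and lists stand in for nested PHP arrays; other values pass through.
    """
    if isinstance(value, dict):
        return {k: convert_request(v, target_charset) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_request(v, target_charset) for v in value]
    if isinstance(value, bytes):
        # Raises UnicodeDecodeError if the input is not valid UTF-8.
        # Characters missing from the target charset become "?".
        return value.decode("utf-8").encode(target_charset, errors="replace")
    return value


# Hypothetical request data with a nested structure, as a widget update might send.
request = {
    "post_title": b"caf\xc3\xa9",
    "widget": {"tags": [b"na\xc3\xafve", b"plain"]},
}
converted = convert_request(request)
print(converted)
# {'post_title': b'caf\xe9', 'widget': {'tags': [b'na\xefve', b'plain']}}
```

A version that only handles top-level strings would skip the nested "widget" entry, which is the sort of breakage the first patch caused.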

#5 @markjaquith
15 years ago

  • Milestone changed from 2.9 to 3.0

Since this is a long-standing issue and isn't a blocker, I'm moving it to 3.0. @iansealy, ping this ticket in a couple weeks and we can work on it for 3.0.

#6 @iansealy
15 years ago

@markjaquith: Sorry, couple of weeks became a couple of months! Want to take a look at this?

#7 @dd32
15 years ago

  • Keywords early added
  • Milestone changed from 3.0 to 3.1

As we're nearing beta, and this will require extra testing, I'm bumping to 3.1 early.

#8 @nacin
14 years ago

  • Keywords early removed
  • Milestone changed from Awaiting Triage to Future Release

#9 @sergey.s.betke@…
13 years ago

  • Cc sergey.s.betke@… added
  • Keywords dev-feedback added

#10 @azaozz
13 years ago

  • Resolution set to wontfix
  • Status changed from new to closed

Using iconv to try to rectify the character encoding doesn't seem right:

  • The external app is usually not available.
  • When available, running the requests through it and presuming they are UTF-8 is wrong (and would break a lot of sites).
  • This is a JS problem and would be better to use JS to fix it.

By default JS is UTF-8. The most straightforward solution seems to be to JSON-encode any text passed through XHR when the HTML document's encoding is not UTF-8. Of course the "best" solution is still to use UTF-8 everywhere :)
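
One way to read the JSON suggestion: if the text is sent as JSON with every non-ASCII character escaped as \uXXXX, the payload is pure ASCII and no longer depends on the request charset. The sketch below illustrates that idea in Python; the function names encode_for_xhr and decode_on_server and the default charset are assumptions for illustration, not anything in WordPress.

```python
import json


def encode_for_xhr(text: str) -> bytes:
    """Client side: serialize text as ASCII-only JSON."""
    # ensure_ascii=True escapes every non-ASCII character as \uXXXX.
    return json.dumps(text, ensure_ascii=True).encode("ascii")


def decode_on_server(payload: bytes, blog_charset: str = "iso-8859-1") -> bytes:
    """Server side: parse the JSON, then encode into the blog's charset."""
    text = json.loads(payload.decode("ascii"))
    # Characters missing from the blog charset become "?".
    return text.encode(blog_charset, errors="replace")


payload = encode_for_xhr("café")
stored = decode_on_server(payload)

print(payload)  # b'"caf\\u00e9"'
print(stored)   # b'caf\xe9'
```

Note that in browser JavaScript, JSON.stringify leaves non-ASCII characters unescaped, so a client-side version of this would need an extra escaping step to get an ASCII-only payload.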

This problem has been around for so long that I'm starting to doubt it really needs fixing. Since WordPress has been using UTF-8 by default for several years, it seems there are very few people left who use different encodings, i.e. nearly all users have converted to UTF-8.

Closing as wontfix for now. If there's still a need to try and handle this in core, feel free to reopen with a patch.

#11 @ocean90
13 years ago

  • Milestone Future Release deleted