Make WordPress Core

Opened 6 years ago

Closed 6 years ago

Last modified 5 years ago

#30471 closed defect (bug) (fixed)

json_encode() fails on non-utf8 strings

Reported by: pento Owned by: pento
Milestone: 4.1 Priority: normal
Severity: normal Version: 4.1
Component: General Keywords: has-patch
Focuses: Cc:


In PHP 5.4 and earlier, when json_encode() is run on a string that contains non-UTF-8 characters, it will return the string 'null'. As of PHP 5.5, the string will be converted to UTF-8 before it's escaped for the JSON output.

Given this example:

$charsets = mb_detect_order();
if ( ! in_array( 'EUC-JP', $charsets ) ) {
	$charsets[] = 'EUC-JP';
	mb_detect_order( $charsets );
}

$eucjp = mb_convert_encoding( 'aあb', 'EUC-JP', 'UTF-8' );
$utf8 = mb_convert_encoding( $eucjp, 'UTF-8', 'EUC-JP' );

$json = wp_json_encode( $eucjp );

$json will be the string 'null' in PHP 5.4 and older, and the string '"a\u3042b"' in PHP 5.5+.

This becomes more complex as we try to encode arrays or objects.

$json = wp_json_encode( array( 'c', $eucjp ) );

Here, $json will be the string '["c",null]' in PHP 5.4 and older, and the string '["c","a\u3042b"]' in PHP 5.5+.

It's fairly simple to fix this in wp_json_encode() for strings, and horrible to try and fix it for arbitrary arrays/objects.
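
For illustration, the plain-string case could look roughly like the sketch below. This is not the code from the attached patches: example_to_utf8() is a hypothetical helper, the candidate encoding list is an assumption chosen to match the EUC-JP example above, and the mbstring extension is assumed to be available.

```php
<?php
// Hypothetical helper, not the code from 30471.diff: re-encode a string as
// UTF-8 when it is not already valid UTF-8. Requires the mbstring extension.
function example_to_utf8( $string ) {
	if ( mb_check_encoding( $string, 'UTF-8' ) ) {
		return $string; // Already valid UTF-8, leave untouched.
	}

	// Detect the source encoding from an explicit candidate list (strict mode).
	// The list is an assumption for this example.
	$encoding = mb_detect_encoding( $string, array( 'UTF-8', 'EUC-JP' ), true );
	if ( false === $encoding ) {
		return ''; // Unknown encoding: nothing safe to convert.
	}

	return mb_convert_encoding( $string, 'UTF-8', $encoding );
}

// Same input as the example in the ticket description.
$eucjp = mb_convert_encoding( 'aあb', 'EUC-JP', 'UTF-8' );
$json  = json_encode( example_to_utf8( $eucjp ) );
```

Converting before encoding means json_encode() only ever sees valid UTF-8, so the result no longer depends on the PHP version. It does nothing for strings nested inside arrays or objects.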

Attachments (2)

30471.diff (1.4 KB) - added by pento 6 years ago.
30471.2.diff (1.9 KB) - added by pento 6 years ago.


Change History (8)

#1 @pento
6 years ago

In 30534:

json_encode() returns different results for non UTF-8 strings in PHP 5.5+, versus earlier versions of PHP.

This fixes the unit tests that fail in earlier versions, see #30471 for fixing this globally in wp_json_encode().


#2 @pento
6 years ago

  • Keywords has-patch added

30471.diff fixes this for plain strings.

Walking through arbitrary arrays/objects to check for non UTF-8 characters isn't a good option.

Maybe we could do a lazy check for $json containing the string null, and version_compare( PHP_VERSION, '5.4', '<=' )? This will give false positives when $data contains an array or object with a null value, but it's a lot faster than an element-by-element check.
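
A minimal sketch of that lazy check, assuming a hypothetical helper named example_needs_recheck() (not taken from either patch). It compares against '5.5' with '<' rather than '5.4' with '<=', because patch releases such as 5.4.30 compare greater than '5.4' in version_compare(). The version is a parameter only so the logic can be exercised on any PHP version.

```php
<?php
// Hypothetical helper: decide whether encoded output deserves a second look.
function example_needs_recheck( $json, $php_version = PHP_VERSION ) {
	// Only PHP older than 5.5 silently emits null for non UTF-8 strings.
	if ( ! version_compare( $php_version, '5.5', '<' ) ) {
		return false;
	}

	// Cheap substring test. It also matches legitimate null values, so a
	// true result only means "worth checking", not "definitely broken".
	return false !== strpos( $json, 'null' );
}
```

For example, example_needs_recheck( '["c",null]', '5.4.30' ) returns true, while the same JSON checked against '5.5.0' returns false.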


#3 @pento
6 years ago

30471.2.diff will run whenever PHP < 5.5, and $json contains the string null.

Also, in case we decide to go with 30471.diff, it contains a bug - it returns null instead of 'null'.

#4 @pento
6 years ago

  • Owner set to pento
  • Resolution set to fixed
  • Status changed from new to closed

In 30561:

When json_encode() returns a JSON string containing 'null' in PHP 5.4 or earlier, wp_json_encode() will now sanity check the data, as older versions of PHP failed to encode non UTF-8 characters correctly, instead returning 'null'.

Fixes #30471.
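
As a rough illustration of what such a sanity check involves (a sketch only, not the code committed in 30561), the data can be walked recursively and every non UTF-8 string converted before encoding again. The function name example_sanity_check() and the fixed EUC-JP source encoding are assumptions made for this example; the mbstring extension is required.

```php
<?php
// Hypothetical sketch, not the committed WordPress code: recursively copy
// $data, converting every non UTF-8 string (keys included) to UTF-8.
// $from_encoding is an assumed source encoding for strings that fail the check.
function example_sanity_check( $data, $from_encoding = 'EUC-JP' ) {
	if ( is_string( $data ) ) {
		return mb_check_encoding( $data, 'UTF-8' )
			? $data
			: mb_convert_encoding( $data, 'UTF-8', $from_encoding );
	}

	if ( is_array( $data ) ) {
		$clean = array();
		foreach ( $data as $key => $value ) {
			$clean_key           = example_sanity_check( $key, $from_encoding );
			$clean[ $clean_key ] = example_sanity_check( $value, $from_encoding );
		}
		return $clean;
	}

	if ( is_object( $data ) ) {
		$clean = new stdClass();
		foreach ( get_object_vars( $data ) as $key => $value ) {
			$clean_key           = example_sanity_check( $key, $from_encoding );
			$clean->$clean_key   = example_sanity_check( $value, $from_encoding );
		}
		return $clean;
	}

	// Integers, floats, booleans and null need no conversion.
	return $data;
}

// Same input as the array example in the ticket description.
$eucjp = mb_convert_encoding( 'aあb', 'EUC-JP', 'UTF-8' );
$json  = json_encode( example_sanity_check( array( 'c', $eucjp ) ) );
```

The walk visits every element, which is why it is only worth running after the cheap check for 'null' has flagged the output.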

#5 @nacin
6 years ago

  • Milestone changed from Awaiting Review to 4.1

This ticket was mentioned in Slack in #core by binarykitten.

5 years ago
