WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 2 years ago

#28786 closed defect (bug) (fixed)

wp_send_json silently fails on non UTF-8 strings

Reported by: pento Owned by: pento
Milestone: 4.1 Priority: normal
Severity: normal Version: 3.5
Component: Formatting Keywords:
Focuses: Cc:

Description

When wp_send_json() is used to return a JSON-encoded lump of data, it silently fails if the data contains any non UTF-8 characters.

Attachments (18)

28786.diff (1.5 KB) - added by pento 3 years ago.
28786.2.diff (2.7 KB) - added by pento 3 years ago.
28786.3.diff (2.8 KB) - added by pento 3 years ago.
28786.4.diff (2.8 KB) - added by pento 3 years ago.
28786.5.diff (2.8 KB) - added by pento 3 years ago.
28786.6.diff (4.5 KB) - added by pento 3 years ago.
28786.7.diff (4.5 KB) - added by pento 3 years ago.
json_test.php (1.3 KB) - added by pento 3 years ago.
28786.b1.diff (443 bytes) - added by DJPaul 3 years ago.
28786.8.diff (6.8 KB) - added by pento 3 years ago.
28786.9.diff (6.9 KB) - added by pento 3 years ago.
28786.10.diff (17.7 KB) - added by pento 3 years ago.
28786.11.diff (18.6 KB) - added by pento 3 years ago.
28786.12.diff (18.7 KB) - added by pento 3 years ago.
28786.13.diff (668 bytes) - added by JustinSainton 3 years ago.
28786.docs.diff (4.0 KB) - added by TobiasBg 3 years ago.
28786.docs.2.diff (4.2 KB) - added by TobiasBg 3 years ago.
28786-unittests.diff (1.5 KB) - added by MikeHansenMe 3 years ago.
see #30284

Download all attachments as: .zip

Change History (64)

@pento
3 years ago

#1 @pento
3 years ago

  • Keywords has-patch needs-unit-tests added

@pento
3 years ago

@pento
3 years ago

@pento
3 years ago

#2 @pento
3 years ago

  • Keywords needs-unit-tests removed

attachment:28786.4.diff

  • Adds unit tests
  • Tries to use mb_convert_encoding(), if available
  • Removes some old code

#3 @pento
3 years ago

  • Keywords commit added

No objections? Cool. Lets commit it and see what happens.

#4 @ocean90
3 years ago

Related: #28807

@pento
3 years ago

#5 @pento
3 years ago

attachment:28786.5.diff falls back to treating the string as UTF-8 if it can't detect the encoding. This is the same behaviour as wp_check_invalid_utf8().

@pento
3 years ago

@pento
3 years ago

#6 @pento
3 years ago

attachment:28786.7.diff

  • Adds $options and $depth parameters, to match json_encode()
  • A handful of performance improvements for _wp_json_sanity_check()
  • Tidied up test_wp_json_encode()
  • test_wp_json_encode_depth() added to make sure we obey $depth

@pento
3 years ago

#7 @pento
3 years ago

attachment:json_test.php does some performance testing. Results show that, for data that works fine with json_encode(), the overhead of wp_json_encode() is minimal. For extremely large data sets that need to be converted, performance can be pretty bad, but I don't think we're going to encounter much in the way of 6D arrays with 1,000,000 items in them. :-)

$ php json_test.php 
json_encode: 0.051064968109131
wp_json_encode overhead: 0.052061080932617
Best case wp_json_encode: 2.9125101566315
Worst case wp_json_encode: 15.276411056519

@DJPaul
3 years ago

#8 @DJPaul
3 years ago

@pento's patch fixes the situation where an audio file with non UTF-8 in the meta description will cause the Media Library to not load. However, if that audio file is used in the playlist shortcode, it still breaks the player JS on the front-end. 28786.b1.diff patches the relevant part of media.php to use the new encode function.

Last edited 3 years ago by DJPaul (previous) (diff)

This ticket was mentioned in IRC in #wordpress-dev by ocean90. View the logs.


3 years ago

This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.


3 years ago

#11 @DrewAPicture
3 years ago

#28807 was marked as a duplicate.

#12 @DrewAPicture
3 years ago

  • Keywords needs-patch added; has-patch commit removed

Closed #28807 as a duplicate, since both tickets serve to handle non-UTF-8 failures with json_encode(). We'll need to make sure that the plugin API response calls referenced in #28807 are handled here as well.

This will need lead dev support to get in for 4.0.

@pento
3 years ago

#13 @pento
3 years ago

  • Keywords has-patch commit 4.1-early added; needs-patch removed
  • Milestone changed from 4.0 to Future Release

attachment:28786.8.diff

It's too late for this to go into 4.0, it can go in 4.1 early.

This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.


3 years ago

@pento
3 years ago

#15 @pento
3 years ago

  • Keywords 4.1-early removed
  • Milestone changed from Future Release to 4.1

28786.9.diff is a refresh to apply cleanly against trunk.

#16 @pento
3 years ago

#15827 was marked as a duplicate.

@pento
3 years ago

#17 @pento
3 years ago

28786.10.diff converts all json_encode() calls in WordPress to be wp_json_encode().

#18 @mdawaffe
3 years ago

I worry about mb_detect_encoding() but don't have any data or experience to validate or assuage that worry.

I know we're using this code on WordPress.com in places. In other places, we use code that explicitly converts from the blog_charset (no attempt at auto-detection) to UTF-8.

@pento
3 years ago

#19 @pento
3 years ago

28786.11.diff now also sanity checks array/object keys.

And because we all like things that go faster, unwinding unnecessary recursion gives us a bit of a boost when we need to sanity check the data.

$ php json_test.php 
json_encode: 0.055270910263062
wp_json_encode overhead: 0.058070182800293
Best case wp_json_encode: 0.98551392555237
Worst case wp_json_encode: 4.3197019100189

#20 @azaozz
3 years ago

Looking at 28786.11.diff, couple of nitpicks:

  • _wp_json_convert_string() can be called many times. Instead of doing function_exists( 'mb_convert_encoding' ) every time, can probably set a static after the first check.
  • A user note in the PHP manual mentions that mb_detect_encoding() should be run in strict mode (third arg set to true). Perhaps some "real life" testing for that is worth it.

Also wondering if we should use _wp_json_convert_string() when htmlspecialchars() fails (can probably rename it to something without 'json' and have an arg to convert to the 'blog_charset'). Still hear sometimes that wp_richedit_pre() and wp_htmledit_pre() may return an empty string especially for posts that were imported (so the charset is not the same as get_option( 'blog_charset' )).

Last edited 3 years ago by azaozz (previous) (diff)

#21 @DrewAPicture
3 years ago

  • Component changed from General to Formatting

@pento
3 years ago

#22 @pento
3 years ago

28786.12.diff uses a static var instead of function_exists(), and does strict mb_detect_encoding() checks.

I'm inclined to keep _wp_json_convert_string() as a function specifically for JSON. I'm working on a patch for WPDB that will provide more comprehensive character set conversion functionality.

This ticket was mentioned in Slack in #core by pento. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by nacin. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by pento. View the logs.


3 years ago

#26 @pento
3 years ago

  • Owner set to pento
  • Resolution set to fixed
  • Status changed from new to closed

In 30055:

Add wp_json_encode(), a wrapper for json_encode() that ensures everything is converted to UTF-8.

Change all core calls from json_encode() to wp_json_encode().

Fixes #28786.

This ticket was mentioned in Slack in #core by justinsainton. View the logs.


3 years ago

#28 @JustinSainton
3 years ago

Minor doc fix in 28786.13.diff

#29 @pento
3 years ago

In 30058:

Fix a PHPDoc typo for wp_json_encode().

Props JustinSainton.

See #28786.

#30 @TobiasBg
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

28786.docs.diff adds some documentation to the private functions, as per drew, fixes punctuation in some more docs, and changes else if to elseif, which seems to be preferred in the WP Coding Standards.

Last edited 3 years ago by TobiasBg (previous) (diff)

@TobiasBg
3 years ago

This ticket was mentioned in Slack in #core by tobiasbg. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by tobiasbg. View the logs.


3 years ago

#33 @markjaquith
3 years ago

In 30075:

Define JSON_PRETTY_PRINT so it can be used with wp_json_encode()

  • JSON_PRETTY_PRINT was introduced in PHP 5.4
  • Now you can use it with lower PHP versions, without a notice

fixes #30139
see #28786

#34 @DrewAPicture
3 years ago

28786.docs.2.diff looks pretty good to me. Nice work @TobiasBg!

This ticket was mentioned in Slack in #core by drew. View the logs.


3 years ago

#36 @markjaquith
3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 30078:

Docs and code standards cleanup for [30055] (wp_json_encode() & friends)

fixes #28786
props TobiasBg

#37 @wonderboymusic
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Currently, wp_json_encode() returns false in its try/catch block - most of the functions that it is used in or passed to expect string.

Also problematic, if someone sets a JS var equal to the output, their code will explode - see https://github.com/WordPress/WordPress/commit/007ec52958742ed768c74f37e21e6db2b75aa32e#diff-aa3b133f32e8e18853fde3a700fb9a70R274

This may not totally matter, but:
https://scrutinizer-ci.com/g/staylor/WordPress/inspections/c68e0913-d1dd-4dfd-bdcf-5b2dc68e0f3c/issues/files/wp-admin/includes/ajax-actions.php?status=new&orderField=path&order=asc

#38 follow-ups: @pento
3 years ago

wp_json_encode() returns false on failure, to match the behaviour of json_encode().

I'm not wild about changing wp_json_encode()'s behaviour (to return 0 or an empty string on failure, perhaps), though there's possibly value in having a function that always returns valid JSON, I think there being a clear "this thing failed to encode" return value is more valuable.

I think our better option is to fix the places that call wp_json_encode(), to ensure proper type checking.

#39 in reply to: ↑ 38 @azaozz
3 years ago

Replying to pento:

...(to return 0 or an empty string on failure, perhaps)

Right, also a "proper" JSON encoded value is 'false', '0', '""', etc. On the JS side JSON.parse() and jQuery.parseJSON() throw errors on empty string, int, etc.

We should probably handle the false returned from wp_json_encode() in wp_send_json_success() and wp_send_json_error().

Last edited 3 years ago by azaozz (previous) (diff)

#40 in reply to: ↑ 38 @TobiasBg
3 years ago

Replying to pento:

I think our better option is to fix the places that call wp_json_encode(), to ensure proper type checking.

I agree. This is much better than having wp_json_encode() not return false, which can be checked for to initiate error handling. If wp_json_encode() returns valid JSON (in terms of "empty" valid JSON) where json_encode() returns false, it would no longer be a drop-in replacement.
There would be no good way to find out if a wp_json_encode() return value like "" is the result of valid input (like an empty string), or the result of "wrong" input, that just could not be encoded.

#41 in reply to: ↑ 38 @mdawaffe
3 years ago

Replying to pento:

I think our better option is to fix the places that call wp_json_encode(), to ensure proper type checking.

I also agree :) json_encode() can also return false, so these places where a failing wp_json_encode() breaks things are already places where a failing json_encode() breaks things.

#42 @pento
3 years ago

In 30533:

Split the tests for wp_json_encode() into smaller chunks (let's all them "units"). Skip a couple of these tests when running on older versions of PHP that don't support the tested functionality.

See #28786.

This ticket was mentioned in Slack in #core by pento. View the logs.


2 years ago

#44 @pento
2 years ago

  • Keywords has-patch commit removed
  • Milestone changed from 4.1 to Future Release

How we handle the return value from wp_json_encode() isn't a regression from how we handled the return value from json_encode().

Bumping to 4.2, I'll put a patch up in the next few days.

#45 @helen
2 years ago

We may be better closing this as fixed on 4.1 and opening a new ticket for the return value handling.

#46 @pento
2 years ago

  • Milestone changed from Future Release to 4.1
  • Resolution set to fixed
  • Status changed from reopened to closed

I agree - we can continue this discussion in #30613.

Note: See TracTickets for help on using tickets.