Opened 11 years ago
Closed 10 years ago
#28786 closed defect (bug) (fixed)
wp_send_json silently fails on non UTF-8 strings
Reported by: | pento | Owned by: | pento |
---|---|---|---|
Milestone: | 4.1 | Priority: | normal |
Severity: | normal | Version: | 3.5 |
Component: | Formatting | Keywords: | |
Focuses: | Cc: |
Description
When wp_send_json()
is used to return a JSON-encoded lump of data, it silently fails if the data contains any non UTF-8 characters.
Attachments (18)
Change History (64)
#5
@
11 years ago
attachment:28786.5.diff falls back to treating the string as UTF-8 if it can't detect the encoding. This is the same behaviour as wp_check_invalid_utf8()
.
#6
@
11 years ago
- Adds
$options
and$depth
parameters, to matchjson_encode()
- A handful of performance improvements for
_wp_json_sanity_check()
- Tidied up
test_wp_json_encode()
test_wp_json_encode_depth()
added to make sure we obey$depth
#7
@
11 years ago
attachment:json_test.php does some performance testing. Results show that, for data that works fine with json_encode()
, the overhead of wp_json_encode()
is minimal. For extremely large data sets that need to be converted, performance can be pretty bad, but I don't think we're going to encounter much in the way of 6D arrays with 1,000,000 items in them. :-)
$ php json_test.php json_encode: 0.051064968109131 wp_json_encode overhead: 0.052061080932617 Best case wp_json_encode: 2.9125101566315 Worst case wp_json_encode: 15.276411056519
#8
@
11 years ago
@pento's patch fixes the situation where an audio file with non UTF-8 in the meta description will cause the Media Library to not load. However, if that audio file is used in the playlist shortcode, it still breaks the player JS on the front-end. 28786.b1.diff patches the relevant part of media.php to use the new encode function.
This ticket was mentioned in IRC in #wordpress-dev by ocean90. View the logs.
10 years ago
This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.
10 years ago
#13
@
10 years ago
- Keywords has-patch commit 4.1-early added; needs-patch removed
- Milestone changed from 4.0 to Future Release
- Updates the patch to apply cleanly against trunk
- Adds the fix from attachment:28786.b1.diff
- Fixes #28807
It's too late for this to go into 4.0, it can go in 4.1 early.
This ticket was mentioned in IRC in #wordpress-dev by helen. View the logs.
10 years ago
#15
@
10 years ago
- Keywords 4.1-early removed
- Milestone changed from Future Release to 4.1
28786.9.diff is a refresh to apply cleanly against trunk.
#17
@
10 years ago
28786.10.diff converts all json_encode()
calls in WordPress to be wp_json_encode()
.
#18
@
10 years ago
I worry about mb_detect_encoding()
but don't have any data or experience to validate or assuage that worry.
I know we're using this code on WordPress.com in places. In other places, we use code that explicitly converts from the blog_charset
(no attempt at auto-detection) to UTF-8.
#19
@
10 years ago
28786.11.diff now also sanity checks array/object keys.
And because we all like things that go faster, unwinding unnecessary recursion gives us a bit of a boost when we need to sanity check the data.
$ php json_test.php json_encode: 0.055270910263062 wp_json_encode overhead: 0.058070182800293 Best case wp_json_encode: 0.98551392555237 Worst case wp_json_encode: 4.3197019100189
#20
@
10 years ago
Looking at 28786.11.diff, couple of nitpicks:
_wp_json_convert_string()
can be called many times. Instead of doingfunction_exists( 'mb_convert_encoding' )
every time, can probably set astatic
after the first check.- A user note in the PHP manual mentions that
mb_detect_encoding()
should be run in strict mode (third arg set to true). Perhaps some "real life" testing for that is worth it.
Also wondering if we should use _wp_json_convert_string()
when htmlspecialchars()
fails (can probably rename it to something without 'json' and have an arg to convert to the 'blog_charset'). Still hear sometimes that wp_richedit_pre()
and wp_htmledit_pre()
may return an empty string especially for posts that were imported (so the charset is not the same as get_option( 'blog_charset' )
).
#22
@
10 years ago
28786.12.diff uses a static
var instead of function_exists()
, and does strict mb_detect_encoding()
checks.
I'm inclined to keep _wp_json_convert_string()
as a function specifically for JSON. I'm working on a patch for WPDB that will provide more comprehensive character set conversion functionality.
This ticket was mentioned in Slack in #core by pento. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by nacin. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by pento. View the logs.
10 years ago
#26
@
10 years ago
- Owner set to pento
- Resolution set to fixed
- Status changed from new to closed
In 30055:
This ticket was mentioned in Slack in #core by justinsainton. View the logs.
10 years ago
#30
@
10 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
28786.docs.diff adds some documentation to the private functions, as per drew, fixes punctuation in some more docs, and changes else if
to elseif
, which seems to be preferred in the WP Coding Standards.
This ticket was mentioned in Slack in #core by tobiasbg. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by tobiasbg. View the logs.
10 years ago
#34
@
10 years ago
28786.docs.2.diff looks pretty good to me. Nice work @TobiasBg!
This ticket was mentioned in Slack in #core by drew. View the logs.
10 years ago
#37
@
10 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Currently, wp_json_encode()
returns false in its try/catch
block - most of the functions that it is used in or passed to expect string.
Also problematic, if someone sets a JS var equal to the output, their code will explode - see https://github.com/WordPress/WordPress/commit/007ec52958742ed768c74f37e21e6db2b75aa32e#diff-aa3b133f32e8e18853fde3a700fb9a70R274
This may not totally matter, but:
https://scrutinizer-ci.com/g/staylor/WordPress/inspections/c68e0913-d1dd-4dfd-bdcf-5b2dc68e0f3c/issues/files/wp-admin/includes/ajax-actions.php?status=new&orderField=path&order=asc
#38
follow-ups:
↓ 39
↓ 40
↓ 41
@
10 years ago
wp_json_encode()
returns false
on failure, to match the behaviour of json_encode()
.
I'm not wild about changing wp_json_encode()
's behaviour (to return 0 or an empty string on failure, perhaps), though there's possibly value in having a function that always returns valid JSON, I think there being a clear "this thing failed to encode" return value is more valuable.
I think our better option is to fix the places that call wp_json_encode()
, to ensure proper type checking.
#39
in reply to:
↑ 38
@
10 years ago
Replying to pento:
...(to return 0 or an empty string on failure, perhaps)
Right, also a "proper" JSON encoded value is 'false'
, '0'
, '""'
, etc. On the JS side JSON.parse() and jQuery.parseJSON() throw errors on empty string, int, etc.
We should probably handle the false
returned from wp_json_encode()
in wp_send_json_success()
and wp_send_json_error()
.
#40
in reply to:
↑ 38
@
10 years ago
Replying to pento:
I think our better option is to fix the places that call
wp_json_encode()
, to ensure proper type checking.
I agree. This is much better than having wp_json_encode()
not return false, which can be checked for to initiate error handling. If wp_json_encode()
returns valid JSON (in terms of "empty" valid JSON) where json_encode()
returns false, it would no longer be a drop-in replacement.
There would be no good way to find out if a wp_json_encode()
return value like ""
is the result of valid input (like an empty string), or the result of "wrong" input, that just could not be encoded.
#41
in reply to:
↑ 38
@
10 years ago
Replying to pento:
I think our better option is to fix the places that call
wp_json_encode()
, to ensure proper type checking.
I also agree :) json_encode()
can also return false, so these places where a failing wp_json_encode()
breaks things are already places where a failing json_encode()
breaks things.
This ticket was mentioned in Slack in #core by pento. View the logs.
10 years ago
#44
@
10 years ago
- Keywords has-patch commit removed
- Milestone changed from 4.1 to Future Release
How we handle the return value from wp_json_encode()
isn't a regression from how we handled the return value from json_encode()
.
Bumping to 4.2, I'll put a patch up in the next few days.
attachment:28786.4.diff
mb_convert_encoding()
, if available