Make WordPress Core

Opened 10 months ago

Last modified 4 months ago

#59234 new enhancement

Introduce a `wp_json_decode()` function, including validation when available

Reported by: jrf's profile jrf Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 6.4
Component: General Keywords: php83 needs-patch dev-feedback
Focuses: Cc:

Description

From: https://core.trac.wordpress.org/ticket/59231:

New json_validate() function

This function is a high-performance way to validate json prior to decoding it. This function cannot be polyfilled without a performance hit.

However, due to the potential for using json for Denial-of-Service attack vectors (via a HUGE file/stream), I would strongly recommend for WP Core to start using this new function in all appropriate places wrapped within an if ( function_exists() ) {}.

The json_decode() function is used 44 times within src (excluding external dependencies).

We may want to consider introducing a wp_json_decode() function to ensure the use of json_validate() (when available).
This would then mirror the already existing `wp_json_encode()` function.

Attachments (1)

Perform_json_validate_before_json_decode_on_REST_API_request_data.patch (929 bytes) - added by nicomollet 5 months ago.
REST API controller json_validate before json_decode

Download all attachments as: .zip

Change History (9)

#1 follow-up: @ayeshrajans
10 months ago

Thanks for opening this ticket. I have a few thoughts on this:

  • If we were to introduce a wp_json_decode function (which I actually did, that lead me to these points), we will have to throw an exception to handle errors. This is because false itself is a valid return value, now we are overstepping the general use case of "decode" to "validate and decode".
  • If we were to run PHP 8.3 native json_validate _and_ json_decode inside the new wp_json_decode, wouldn't that be a performance degradation for valid data?
  • I think a more mild approach would be to polyfill json_validate function? That way, we are giving the choice to the programmer if we are dealing with potentially invalid JSON. For PHP 8.3, we don't have a performance penalty, but at least now the choice is explicit?
<?php

/**
 * Decodes a JSON string. It runs a more performance data validation if
 * json_decode function is available.
 *
 * @since 6.4.0
 *
 * @param string $json The json string being decoded.
 * @param bool $associative Optional. When true, JSON objects will be returned
 *     as associative arrays; when false, JSON objects will be returned as
 *     objects. Note that PHP >= 7.2 native json_decode() function has
 *     $associative as a nullable parameter. In wp_json_decode, it is not
 *     nullable to ensure PHP < 7.2 compatibility.
 * @param int $depth Optional. Maximum nesting depth of the structure being
 *     decoded. The value must be greater than 0, and less than or equal to
 *     2147483647.
 * @param int $flags Optional. Bitmask of JSON decode options. @see json_decode
 *     for available options.
 *
 * @return mixed Returns the value encoded in json as an appropriate PHP type.
 *    Unquoted values true, false and null are returned as true, false and null
 *    respectively. null is returned if the json cannot be decoded or if the
 *    encoded data is deeper than the nesting limit.
 *
 */
function wp_json_decode( string $json, bool $associative = FALSE, int $depth = 512, int $flags = 0 ) {
        if ( function_exists( 'json_validate' ) && json_validate( $json, $depth, $flags & JSON_INVALID_UTF8_IGNORE ) === FALSE ) {
                return FALSE;
        }

        return json_decode( $json, $associative, $depth, $flags );
}

#2 in reply to: ↑ 1 ; follow-up: @jrf
10 months ago

@ayeshrajans

Replying to ayeshrajans:

Thanks for opening this ticket. I have a few thoughts on this:

  • If we were to introduce a wp_json_decode function (which I actually did, that lead me to these points), we will have to throw an exception to handle errors. This is because false itself is a valid return value, now we are overstepping the general use case of "decode" to "validate and decode".

I imagine a WP_Error object could work as a "invalid" return value and be the only exception to what can be reliably decoded by the function ?

Having said that, I'd be happy for WP to start using Exceptions, though I suspect that needs a separate discussion.

Happy to have a think about the function name, maybe call it wp_json_validate_and_decode() ?
I do still think a function is better than having the duplicate code all over the place with a larger risk of people forgetting to validate.

  • If we were to run PHP 8.3 native json_validate _and_ json_decode inside the new wp_json_decode, wouldn't that be a performance degradation for valid data?

From what I read in the mailing list discussion and the RFC, the PHP native (C) implementation is lightning fast, so that should be unnoticeable.

  • I think a more mild approach would be to polyfill json_validate function? That way, we are giving the choice to the programmer if we are dealing with potentially invalid JSON. For PHP 8.3, we don't have a performance penalty, but at least now the choice is explicit?

There was a whole discussion about the (im)possibility of polyfilling this reliably and correctly without impacting performance on the mailinglist and I think some of it is also mentioned in the RFC.

The problem is not so much with small bits of json, but with the large files/streams and that is exactly the case we want to harden against. A polyfill will just not do in that case and have a heavy performance hit, while the native C implementation does not.

#3 @oglekler
9 months ago

  • Milestone changed from 6.4 to 6.5

There is no patch and no time before Beta 1 to make and test one, so, I am moving this ticket into the 6.5 milestone.

#4 in reply to: ↑ 2 @dalleyne
7 months ago

Replying to jrf:

Happy to have a think about the function name, maybe call it wp_json_validate_and_decode() ?

wp_safe_json_decode is a good alternative. It'll imply that it validates, and decodes.

  • If we were to run PHP 8.3 native json_validate _and_ json_decode inside the new wp_json_decode, wouldn't that be a performance degradation for valid data?

From what I read in the mailing list discussion and the RFC, the PHP native (C) implementation is lightning fast, so that should be unnoticeable.

  • I think a more mild approach would be to polyfill json_validate function? That way, we are giving the choice to the programmer if we are dealing with potentially invalid JSON. For PHP 8.3, we don't have a performance penalty, but at least now the choice is explicit?

There was a whole discussion about the (im)possibility of polyfilling this reliably and correctly without impacting performance on the mailinglist and I think some of it is also mentioned in the RFC.

The problem is not so much with small bits of json, but with the large files/streams and that is exactly the case we want to harden against. A polyfill will just not do in that case and have a heavy performance hit, while the native C implementation does not.

How about this implementation:

<?php
function wp_safe_json_decode($json, $assoc = false, $depth = 512, $options = 0) {
    // Perform validation if json_validate is available
    if (function_exists('json_validate') && json_validate( $json, $depth, $flags & JSON_INVALID_UTF8_IGNORE ) === FALSE ) {
        return new WP_Error('json_validation_error', 'JSON validation failed.');
    }

    // Decode the JSON string
    $result = json_decode($json, $assoc, $depth, $options);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Return a WP_Error with the appropriate error message
        return new WP_Error('json_decoding_error', 'JSON decoding failed: ' . json_last_error_msg());
    }

    return $result;
}

#5 follow-up: @TobiasBg
7 months ago

Just saw: The PHP docs for json_validate() discourage its use when json_encode() is used afterwards.

Caution
Calling json_validate() immediately before json_decode() will unnecessarily parse the string twice, as json_decode() implicitly performs validation during decoding.
json_validate() should therefore only be used if the decode JSON payload is not immediately used and knowing whether the string contains valid JSON is needed.

So, adding this before every json_encode() (via a wrapper function) is probably not needed, but could instead be done selectively in places where a JSON string is forwarded in unmodified form, or where that DOS attach vector exists (with user-supplied data, for example).

#6 @nicomollet
5 months ago

I checked every use of json_decode() and I only saw one that requires protection from DOS attack vector: the REST API controller.
In WP_REST_Request class, parse_json_params() method it parses the body for every request to the REST API, so well exposed to attacks.
I suggest adding a json_validate() just before.

Submitted a patch with it.

@nicomollet
5 months ago

REST API controller json_validate before json_decode

#7 in reply to: ↑ 5 @kkmuffme
4 months ago

As mentioned in https://core.trac.wordpress.org/ticket/59234#comment:5 the PHP docs say that what this ticket suggests should not be done.

I think this ticket can be marked as closed, bc there isn't really a way forward for it, is there?

#8 @swissspidy
4 months ago

  • Keywords dev-feedback added
  • Milestone changed from 6.5 to Future Release

Changing the milestone for now but yeah it sounds like a wontfix candidate.

Note: See TracTickets for help on using tickets.