WordPress.org

Make WordPress Core

Opened 14 months ago

Closed 8 weeks ago

#38583 closed enhancement (fixed)

Support for objects in schema validation and sanitization

Reported by: rachelbaker Owned by: joehoyle
Milestone: 4.9 Priority: high
Severity: major Version: 4.7
Component: REST API Keywords: has-unit-tests has-patch commit
Focuses: Cc:

Description

Follow-up on #38531 and support sanitizing and validating objects with our schema. This would allow our Meta and Settings endpoints to accept non-scalar values.
See https://github.com/WP-API/WP-API/issues/2859

Attachments (6)

38583.1.diff (13.0 KB) - added by TimothyBlynJacobs 4 months ago.
38583.2.diff (26.8 KB) - added by TimothyBlynJacobs 4 months ago.
38583.3.diff (8.7 KB) - added by joehoyle 2 months ago.
38583.4.diff (9.4 KB) - added by joehoyle 2 months ago.
Add support for stdClass
38583-settings.diff (10.9 KB) - added by joehoyle 2 months ago.
38583.5.diff (9.8 KB) - added by joehoyle 8 weeks ago.

Download all attachments as: .zip

Change History (56)

#2 @jason_the_adams
6 months ago

Looking forward to seeing this in a release. So this is waiting for sanitization support, currently?

This ticket was mentioned in Slack in #core-restapi by jasontheadams. View the logs.


6 months ago

#4 @TimothyBlynJacobs
4 months ago

  • Keywords has-patch has-unit-tests dev-feedback added; needs-patch removed

I've added two separate versions.

1) Simply adds object validation. For the properties like content or title which can accept either a structured object or a string, the validation_callback is set to null ( matching sanitization ).

2) Adds object validation and adds support for multiple type options. So content for instance becomes type => [ 'object', 'string'].

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


4 months ago

#6 @rmccue
4 months ago

  • Milestone changed from Future Release to 4.9
  • Owner set to rmccue
  • Status changed from new to reviewing
  • Type changed from defect (bug) to enhancement
  • Version set to 4.7

#7 @westonruter
3 months ago

  • Priority changed from normal to high

Bumping priority to high for visibility and alignment with 4.9 goals, and given proximity to beta 1 deadline.

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


3 months ago

#9 @kadamwhite
2 months ago

@joehoyle I think you'd be in the best position to review this, do you have bandwidth to do that today so we can consider this for 4.9?

#10 @joehoyle
2 months ago

38583.1.diff is looking good to me, I don't think it would be right to go with 38583.2.diff at this point. It could be considered after adding basic object support, but the two should be separate. Thanks for all the work on it though, that's a big patch!

For 38583.1.diff this looks mostly good. I think we shouldn't support additionalProperties for now, the point of the schema is to whitelist data and types into the REST API (and out), so I'm not sure I see a long term path for it; but again, let's get basic object support before putting a nail in that coffin.

I can clean up 38583.1.diff to remove those, test etc.

We also ideally need to look at register_meta / register_setting to make sure they also handle the extra types, else this is only useful to register_rest_field and custom controllers / endpoints.

@joehoyle
2 months ago

#11 @joehoyle
2 months ago

Added new patch working off 38583.1.diff:

  • Removed support for additionalProperties
  • Fixed sanitize -> validate in comments
  • Will now remove / ignore any properties that are not included in the registered properties. We follow this pattern mostly around the API where over-providing data is just silently ignored.

@joehoyle
2 months ago

Add support for stdClass

#12 @joehoyle
2 months ago

In 41727:

REST API: Support for objects in schema validation and sanitization.

When registering routes developers can now define their complex objects in the schema and benefit from the automatic validation and sanitization in the REST API. This also paves the way for support for complex object registration via register_meta and register_setting.

See #38583.
Props TimothyBlynJacobs5.

#13 @joehoyle
2 months ago

I'm going to keep this ticket up and do a follow-up patch to add support for register_meta and register_setting.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


2 months ago

#15 @joehoyle
2 months ago

Added a patch for enabling use of objects in register_setting. This is essentially whitelisting the added type in the settings controller, and also now passes properties from the arg registration into the validation functions.

#16 @kadamwhite
2 months ago

In 41758:

REST API: Support objects in settings schema.

Enables register_setting to accept an object as its schema value, allowing settings to accept non-scalar values through the REST API.
This whitelists the added type in the settings controller, and passes properties from argument registration into the validation functions.

Props joehoyle.
See #38583.

#17 @kadamwhite
2 months ago

Leaving this open in case we want to continue to use this ticket for register_meta, but defer to @joehoyle on whether that should at this point be broken out.

This ticket was mentioned in Slack in #core by kadamwhite. View the logs.


2 months ago

#19 @westonruter
2 months ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

Let's resolve this as fixed and open a new task/defect for any necessary change for register_meta().

#20 @mnelson4
2 months ago

This is a nice feature we will definetely use in our plugin!
Unfortunately, the only reason I learned about it was because it broke our plugin's unit tests! :)

Our custom REST API endpoints accept a query parameterof type "object" which has a pretty complex structure, so we were performing our validation on it later in the request. However, this change now has validation kicking in early and wiping the object right away. So, our previous valid query params are now being marked as invalid.

eg, this is the info in the WP REST API index page about the endpoint

"/ee/v4.8.36/datetimes": {
            "namespace": "ee/v4.8.36",
            "methods": [
                "GET",
                "POST"
            ],
            "endpoints": [
                {
                    "methods": [
                        "GET"
                    ],
                    "args": {
                        "where": {
                            "required": false,
                            "default": [],
                            "type": "object"
                        },
...

A request like mysite.com/wp-json/ee/v4.8.36/datetimes?where[DTT_EVT_start]=2017-01-01T00:00:00 used to only show the datetimes where their DTT_EVT_start property matched January 1st 2017, but this new validation removes that condition because it only considers an empty object to be a valid value for the where arg.

I'm not sure if other plugin developers or users with custom endpoints will experience the same issue.

--- I removed my code suggestion. I now think @TimothyBlynJacobs 's original approach, to have an additionalProperties argument, is best.

Last edited 2 months ago by mnelson4 (previous) (diff)

This ticket was mentioned in Slack in #core-restapi by mnelson4. View the logs.


2 months ago

#22 @mnelson4
2 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I'm sorry if this isn't the right way to do it, but I reopened this ticket. In slack @TimothyBlynJacobs mentioned

Yeah, the commit for that needs massaging for unlisted properties. The current version in core is the direct opposite of the spec and has the potential to break other code

I found this article that gives a fairly simple explanation of the spec. It says

The additionalProperties keyword is used to control the handling of extra stuff, that is, properties whose names are not listed in the properties keyword. By default any additional properties are allowed.

The additionalProperties keyword may be either a boolean or an object. If additionalProperties is a boolean and set to false, no additional properties will be allowed.

ie, the default should be to ALLOW additional properties, not exclude them.

@TimothyBlynJacobs's original diff had that behaviour (see lines 1071 and 1210) but @joehoyle decided to not include it saying

I think we shouldn't support additionalProperties for now, the point of the schema is to whitelist data and types into the REST API (and out), so I'm not sure I see a long term path for it; but again, let's get basic object support before putting a nail in that coffin.

I appreciate @joehoyle is trying to not add anything unnecessary on this commit. But if additionalProperties is not supported now, and non-whitelised properties are removed, then IMO it does put a nail in the coffin: it goes against the JSON schema specification and breaks backward compatibility. And if we were to later decide we do want to follow the JSON schema specification (namely, allow additional properties by default), we'd have to make another breaking change in order to do that.

So if we don't want to support the additionProperties argument yet, the default should at least be to default to ALLOW non-whitelisted properties.

Thoughts?

#23 @TimothyBlynJacobs
2 months ago

I strongly think that we should support additionalProperties. It, along with patternProperties allows for very useful functionality. I originally only included additionalProperties for the exact reason @mnelson4 mentioned, patternProperties is something that could easily be included later.

Additionally, if we want to whitelist object keys for use in things like meta or settings, then it would be far more preferable to merge the setting schema entry with defaults for the object data type that would contain 'additionalProperties' => false. This way custom endpoints would work properly that already have object validation and you'd have the increased "security" for settings or meta.

This ticket was mentioned in Slack in #core-restapi by mnelson4. View the logs.


2 months ago

This ticket was mentioned in Slack in #core by mnelson4. View the logs.


2 months ago

This ticket was mentioned in Slack in #core by mnelson4. View the logs.


2 months ago

#27 @westonruter
2 months ago

  • Keywords needs-patch added; has-patch dev-feedback removed
  • Severity changed from normal to major

I agree we need to retain this behavior:

ie, the default should be to ALLOW additional properties, not exclude them.

I've had a conversation previously about how adding additionalProperties: true to a schema is good just to be explicit, even though it is the default. If the behavior changes where this is no longer the default, things will break.

Increasing severity as it conflicts with JSON Schema spec and breaks back-compat.

#28 @joehoyle
2 months ago

I've pinged @rmccue for his thoughts here too.

I'm strongly against supporting additionalProperties if it means "store any extra data that's in the object even if it doesn't have a registered schema". We are only able to accept the data we do because of type information we get via the registered schema. Without this, we won't be able to cast types, and is also a forwards compatibility problem.

This patch is / was to _start_ supporting object so we can at-least now build up complex registered structures by combining array, object, integer, number, string and boolean. I don't see a case where we _would_ wholesale accept objects with properties of unknown data types and insert them into the database.

I agree we do need to fix the case outlined by @mnelson4 I just don't think this needs to mean allowing unregistered properties to get saved.

I appreciate @joehoyle is trying to not add anything unnecessary on this commit. But if additionalProperties is not supported now, and non-whitelised properties are removed, then IMO...

In this case we could just set additionalProperties: false to be spec-compliant.

@mnelson4 and @TimothyBlynJacobs:

I'm not 100% sure if you are just wanting an object to be validated by the schema, and respect additionalProperties, or are you _also_ wanting to use register_meta / register_rest_field et al to actually support that from an API client, whereby the server should accept the additional data, store it, and return it on fetching that object?

I can see how we could not strip additional properties in sanitization, _maybe_, but we'd just move the property whitelisting to the Meta / Settings / REST Field level, to have _those_ features not support additionalProperties.

#29 @TimothyBlynJacobs
2 months ago

I disagree, object validation has use cases beyond just meta or settings. It can be used for top-level schemas in core and especially for people developing on top of the REST API.

If we only want whitelisted properties to be in settings and meta when an object type is passed, we should set additionalProperties to false as a default argument and merge it with the provided schema. additionalProperties is not just a simple boolean, yes add properties, no, don't add properties. additionalProperties itself can be a schema. So if you want a meta value that is a list of arbitrarily keyed integers, for example...

{
  "type": "object",
  "additionalProperties": {
    "type": "integer"
  }
}

Just checking for an object type without any listed arguments to solve @mnelson4's case would not completely solve the backwards compatibility issues because the previous behavior would be allow any properties registered, not just ones specified.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


8 weeks ago

#31 @mnelson4
8 weeks ago

@joehoyle

We are only able to accept the data we do because of type information we get via the registered schema.

I think this is the crux of the issue: as I understand it, JSON schema places constraints to define what JSON is valid, not allowances on what's valid. E.g., the simplest JSON schema {} allows ANY JSON (https://spacetelescope.github.io/understanding-json-schema/basics.html), and then all additions to the schema narrow the range of what JSON is considered valid.

So if a registered schema is empty, eg {}, it should accept ANY JSON; whereas you'd like it to accept NONE.

Basically, JSON schema has more of a "blacklist" approach (everything's ok, unless the schema says it isn't), but you're wanting more of a "whitelist" approach (nothing is ok unless the schema says it is).

A "whitelist" approach is good for security and having well-defined data. The difficulty is that's not JSON schema. So we have two ideals at ends here: stick to the JSON spec vs whitelist data instead of blacklisting it.

Does that sound accurate?

I don't see a case where we _would_ wholesale accept objects with properties of unknown data types and insert them into the database.

I think @westonruter's link to using the WP API for the customizer is an application, isn't it? https://github.com/WP-API/wp-api-customize-endpoints/pull/10#pullrequestreview-43035491

Also, elsewhere in our plugin we have an endpoint for changelog entries related to data in our custom tables, where we can store log entry items in a PHP array. E.g.,
https://github.com/eventespresso/event-espresso-core/blob/master/docs/C--REST-API/ee4-rest-api-writing-data.md#serialized-data is a bit of documentation and http://dev2.eventespresso.com/wp-json/ee/v4.8.36/change_logs?_method=options has the route data, look for "LOG_message".

So it seems to me like there is application for accepting an object with undefined properties.

we could just set additionalProperties: false to be spec-compliant.

This might end up being the best option.

I'm not 100% sure if you are just wanting an object to be validated by the schema, and respect additionalProperties, or are you _also_ wanting to use register_meta / register_rest_field et al to actually support that from an API client, whereby the server should accept the additional data, store it, and return it on fetching that object?

In our plugin we aren't using register_rest_field- we're just using custom endpoints for retrieving data from custom database tables.

So I think I was originally just wanting the former, however I kinda think the latter would be good too. It would be nice if register_rest_field and register_meta just stuck to pure JSON schema (we wouldn't need to document any execptions etc). But again I understand it's probably safer to only accept data using a whitelist approach.

@TimothyBlynJacobs

If we only want whitelisted properties to be in settings and meta when an object type is passed, we should set additionalProperties to false as a default argument and merge it with the provided schema.

I think this approach has the best of both worlds: we're sticking to the spec, but settings and meta APIs can have more of a whitelist approach.

The main downside to fully respecting JSON schema's additionalProperties is that anywhere we define the schema for an object, and we don't accept any other properties other than those listed, we now need to add additionalProperties:false. And I think there's a lot of those, right?

#32 @TimothyBlynJacobs
8 weeks ago

We also currently allow non listed properties for the top-level object.

And I think there's a lot of those, right?

I don't think there are. The objects in core currently have the validation_callback disabled now because they accept both a simple string and a complex object.

#33 @joehoyle
8 weeks ago

Ok so I think we are on the same page in terms of what the options are.

We also currently allow non listed properties for the top-level object.

That's right, this was somewhat controversial and we chose to do it so it wasn't required to markup arguments / object properties. In this case we are talking about specifically registered fields. If I have code that does:

<?php
register_rest_route( ..., ..., [
    [
        'args' => [
             'an-object' => [
                 'type' => 'object',
                 'properties' => [
                     'value' => [ 'type' => 'string' ],
                  ],
              ],
         ],
     ],
] );

In my callback, I _should_ expect to be able to do:

<?php
function callback( WP_REST_Request $request ) {
    do_something_scary( $request['my-object'] );
}

The point of this whole exercise of the schema is, practically speaking, mainly benefiting the back-end code. If the above actually means I can't rely on $request['my-object'] not including lots of other data then I don't think it becomes all that useful. This is _much_ more labor intensive for nested objects, so I basically need to write a deep sanitizer myself then.

Forgetting the JSON Schema spec for now (we have already diverged from JSON Schema), I feel like whitelisting is the right approach as we are being conservative with the data we let through sanitized properties.

Being "insecure by default" unless a developer adds additionalProperties => false is, in my mind, asking for security issues. It's all well and good that JSON Schema has chosen to do that, but I don't think it's the right approach for the WP REST Api.

I think the final decision will need to come from either @kadamwhite or @rmccue, and I'd say if we don't get a decision in the next 24 hours we need to revert r41758 and r41727, as @mnelson4 mentioned this is effectively making the decision to "whitelist by default" which we need to have made a conscious decision on.

This ticket was mentioned in Slack in #core-restapi by joehoyle. View the logs.


8 weeks ago

#35 @joehoyle
8 weeks ago

Also I think we could lighten the sanitizing for backwards compat for cases like https://core.trac.wordpress.org/ticket/38583#comment:20 so we don't sanitize the object if there's no properties list. The typical case for the REST API sanization right now is: is it's something not understood, pass it through, else sanitize it

#36 @TimothyBlynJacobs
8 weeks ago

Forgetting the JSON Schema spec for now (we have already diverged from JSON Schema),

We haven't significantly diverged from JSON Schema. We are using the v3 format for required properties, but everything else is proper. I've also been looking into supporting both v3 and v4 of required properties transparently.

If we want to make additionalProperties => false the default everywhere, I'd be ok with that provided we allow a developer to pass their own additionalProperties value and it appears as additionalProperties => false in the schema.

If we agree on that, I can work up a patch to support that.

#37 @mnelson4
8 weeks ago

Ok so I think we are on the same page in terms of what the options are.

Cool!

The typical case for the REST API sanization right now is: is it's something not understood, pass it through, else sanitize it

This is probably a good pragmatic approach: I think it will preserve backward compatibility nicely (I doubt anyone declared a schema for an object with no properties and literally expected only an empty object to be valid), and when we do declare properties of an object, we're defaulting to a whitelisting approach.

I think, in the future, there will still be room for accepting additionalProperties too, it's just that it's default value is conditional: it's true unless properties is set, in which case it's false. That's a bit tricky, but if we document it I think that will be fine.

#38 @rmccue
8 weeks ago

I disagree with @joehoyle here, primarily for the reason of consistency. The root object is essentially the same as any nested object. The root object has always, and should always, allow additional properties (this allows us forwards-compatibility, pluggability, etc).

If the behaviour of additionalProperties doesn't match, then we can never do nested schemas; e.g. if I have a schema which references another schema, then the same data should be allowed for both types. If additional properties are not allowed, then the behaviour is much different, and forwards-compat/etc will be harder.

Is this potentially unsafe by default? Sure, in the same way that (array) $request is currently unsafe. Consistency matters more in this case.

That's not to say we shouldn't allow the opposite for developers that want that. A "strict mode" is something we've talked about for a while, but never really implemented.

For settings/meta registration, it's probably a good idea to set the default there to avoid saving arbitrary data, since the expectation will be that your setting/meta is strict. That is not the same expectation as request input data though.

#39 @joehoyle
8 weeks ago

Ok - thanks all for bearing with me! So, I think we should:

  • Default to not stripping un-registered properties unless additionalProperties is false.
  • The automated schema writing for Meta and Settings will set additionalProperties = false
  • We won't support other additionalProperties stuff just yet, but we are defaulting to pass through we it can be additive later.

This means all custom endpoints that don't bother to set additionalProperties will get all the data (registered or not) just like the root object behaves.

#40 @mnelson4
8 weeks ago

Thanks @rmccue for weighing in on REST API priorities and bringing up the consistency aspect.

That plan sounds good to me then, @joehoyle.

#41 @TimothyBlynJacobs
8 weeks ago

Sounds great to me!

#42 @westonruter
8 weeks ago

@joehoyle are you going to own the changes you proposed in 39. This should be part of beta4, being released in ~12 hours.

@joehoyle
8 weeks ago

#43 @joehoyle
8 weeks ago

Attached patch with the above items. With the slight exception of:

The automated schema writing for Meta and Settings will set additionalProperties = false

We don't actually have meta done yet, so it's only been added to settings.

This ticket was mentioned in Slack in #core-restapi by joehoyle. View the logs.


8 weeks ago

#45 @melchoyce
8 weeks ago

  • Keywords has-patch added; needs-patch removed

This ticket was mentioned in Slack in #core-restapi by mnelson4. View the logs.


8 weeks ago

#47 @mnelson4
8 weeks ago

for the record, the most recent test resolved our plugin's unit test failure

#48 @westonruter
8 weeks ago

  • Keywords commit added
  • Owner changed from rmccue to joehoyle
  • Status changed from reopened to assigned

#49 @joehoyle
8 weeks ago

In 42000:

REST API: Don’t remove unregistered properties from objects in schema.

In r41727 the ability to sanitise and validate objects from JSON schema was added, with a whitelist approach. It was decided we should pass through all non-registered properties to reflect the behaviour of the root object in register_rest_route. To prevent arbitrary extra data via setting objects, we force additionalProperties to false in the settings endpoint.

See #38583.

#50 @joehoyle
8 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

@mnelson4 confirmed this patch now fixed the issue for him, resolving as fixed once again.

Note: See TracTickets for help on using tickets.