Make WordPress Core

Opened 22 months ago

Last modified 10 months ago

#56494 new defect (bug)

REST API: Fix and test rest_default_additional_properties_to_false

Reported by: annabansaghi's profile anna.bansaghi Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 6.0.2
Component: REST API Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Currently the rest_default_additional_properties_to_false

  • does not support the "allOf", "anyOf", "oneOf" keywords from meta-schema v4, v6, v7
  • has no tests

The patch provides both of these.

Attachments (8)

tree-mr-tree.png (28.7 KB) - added by anna.bansaghi 21 months ago.
Tree vs MR Tree
4-test-cases.png (96.6 KB) - added by anna.bansaghi 21 months ago.
4 test cases
different-kind-of-properties.png (33.2 KB) - added by anna.bansaghi 21 months ago.
Different kind of properties
oneOf-local---oneOf-local-base.png (38.1 KB) - added by anna.bansaghi 21 months ago.
oneOf local type, oneOf local & base type
oneOf-base-object.png (19.5 KB) - added by anna.bansaghi 21 months ago.
oneOf base object type
oneOf-additional-local---oneOf-additional-base.png (48.1 KB) - added by anna.bansaghi 21 months ago.
oneOf additional local vs oneOf additional base
invalid-schema.png (10.3 KB) - added by anna.bansaghi 21 months ago.
Invalid schema
invalid-schema.2.png (3.8 KB) - added by anna.bansaghi 21 months ago.
Invalid schema

Download all attachments as: .zip

Change History (18)

This ticket was mentioned in PR #3170 on WordPress/wordpress-develop by annaghi.


22 months ago
#1

  • Keywords has-unit-tests added

This PR offers:

  • support for "allOf", "anyOf", "oneOf", "items" keywords
  • unit tests

Trac ticket: https://core.trac.wordpress.org/ticket/56494#ticket

TimothyBJacobs commented on PR #3170:


21 months ago
#2

Thanks for the PR @annaghi! This is looking great, thank you for the comprehensive tests.

I think we need to add some more documentation to the $types array discussing why it is sometimes passed and sometimes not.

Additionally, Core doesn't support the tuple form of items so we should remove it from this function.

This ticket was mentioned in PR #3374 on WordPress/wordpress-develop by annaghi.


21 months ago
#3

This PR is related to https://github.com/WordPress/wordpress-develop/pull/3170

It is a different version of the same function, still providing

  • support for "allOf", "anyOf", "oneOf", ~and "items"~ as schemaArray typed keywords
  • unit tests

Trac ticket: https://core.trac.wordpress.org/ticket/56494

#4 @anna.bansaghi
21 months ago

Hi @TimothyBJacobs, thanks for the review, and raising the need of discussion of passing the $types parameter.

I was thinking about it and experimenting with the function a lot, and finally I came up with two possible solutions.

It became a bit long write-up, but it helped me to organize the information around MR Trees for this function.

Insignificant changes

At the beginning I would like to mention the least important changes:

  • moved the array section before the object
  • that moved the "Task" at the very end of the function

Now the code can be split into 3 parts from "bigger" to "smaller" data structure:

  • recursive call on forest: "Forest part"
  • recursive call on tree: "Tree part"
  • performing some task on the current node: "Task"

Interesting changes

The schema traversal can be implemented in two ways (hence the two solutions), depending on how we define the schema being "object".

Object definition

The schema is an object if:

  • has type ⊃ 'object'

Or we can say more permissive that if:

  • has type ⊃ 'object', or
  • has properties keyword set, or
  • has patternProperties keyword set, or
  • has additionalProperties keyword set with array value

Similarly a schema is an array if:

  • has type ⊃ 'array', or
  • has items keyword set (not supporting the tuple form)

"Task" to be performed

In this sense the "Task" can be implemented in two ways:

<?php
if ( ! isset( $schema['additionalProperties'] ) && in_array( 'object', $types, true ) ) {

    $schema['additionalProperties'] = false;
}

or

<?php
if ( ! isset( $schema['additionalProperties'] ) &&
    ( in_array( 'object', $types, true ) ||
        isset( $schema['properties'] ) ||
        isset( $schema['patternProperties'] ) ) ) {

    $schema['additionalProperties'] = false;
}

Solution 1

It is more similar to the original function, uses the first object definition, and the first "Task" will be performed.
https://github.com/WordPress/wordpress-develop/pull/3170
The schema traversal might stop at some nodes, and leave sub-trees unvisited without performing the "Task" on them. This is because the nodes are checked against the type, and are discarded from further traversal if they do not have, or have wrong type.

Solution 2

It uses the more liberal second object definition, and the second "Task" will be performed.
https://github.com/WordPress/wordpress-develop/pull/3374
The schema will be fully traversed, the "Task" will be performed on much more nodes, because there are no guards in the "Forest part" nor in the "Tree part".

The test cases for these solutions have some minor differences, and they seem to have more sense in Solution 2.

Solution 1

This implementation is very similar to the original one, in which we assume that we work on valid schema.

Definition of valid schema

The original implementation ran on Trees, and the schema was valid if:

  • has type
  • has correct type (if it is array then it has items, and if it is object then it has some properties)

The proposed implementation runs on MR Trees, and the schema is valid if:

  • has type
  • has correct type (if it is array then it has items, and if it is object then it has some properties)
  • has correct base type - local type relation

Tree, MR Tree and the third rule will be discussed in the separate comment.

If the schema is invalid, then

1. Throw notice & discard schema from further traversal

In the original implementation if the schema had no type, then the schema was discarded from further traversal because of the first line $type = (array) $schema['type'];, and a run-time notice was thrown:
PHP Notice: Undefined index: type in /home/anna/html/wp60/wp-includes/rest-api.php on line 3041

In the proposed implementation the behavior remained the same, except the built-in notice was replaced with the _doing_it_wrong() notice.

2. Discard schema from further traversal

In the original implementation if the schema had no correct type, then the schema was discarded from further traversal because of these guards:

<?php
if ( in_array( 'object', $type, true ) ) {
    ...
}
<?php
if ( in_array( 'array', $type, true ) ) {
    ...
}

For example these wrong schemas were discarded:

<?php
[
    'type'       => 'array',
    'properties' => [...]
]

or the reverse:

<?php
[
    'type'  => 'object',
    'items' => [...]
]

This remained as is in the proposed implementation.

3. Skip performing "Task" on schema

In the original implementation if the schema had no type ⊃ 'object' then the "Task" was skipped. For example the "Task" was not performed on this wrong schema:

<?php
[
    'type'       => 'string',
    'properties' => [...]
]

The "Task" remained as is in the proposed implementation.

Solution 2

While in Solution 1 the schema might have sub-trees unvisited because of a missing or wrong type, in this solution the traversal is full, and the "Task" will be performed on much more nodes of the schema.

I came up with this solution because I think

Validation

Both solutions are silent about wrong type keyword, but they handle this situation differently (discarding vs. keeping nodes in the traversal in order to do not or to do perform the "Task" on them, respectively). This function in general should not be responsible for validation, it is done in another function where the value is validated against the schema, and notices will be thrown.

Sanitization

Looking at the rest-api.php code it seems to me that we do not sanitize the schema (well, besides this very function where we set the additionalProperties to false), and discarding nodes from performing the "Task" on them seems like a (silent to the user) anti-sanitization.

Completeness

It feels wrong for me discarding silently some nodes from performing the "Task" on them, so first I was experimenting with full featured sanitization. It went out of hands, so I thought: let's expand the object definition instead, and there is no reason to think about validation nor sanitization anymore in this function.

But I am not aware of all aspects of these changes, and would like to understand if the first solution is desired.


If the first solution is the correct one, then it might be worth to continue reading about how supporting allOf, anyOf, oneOf is related to the third rule in valid schema definition.

Now having base type on allOf, anyOf, oneOf schemas allows not having type on children. But we need to perform the "Task" on children too, so they have to have correct type. And that is when and why we pass the $types in the "Forest part": the children will use the base type incoming from their parent.

Last edited 21 months ago by anna.bansaghi (previous) (diff)

#5 @anna.bansaghi
21 months ago

Tree defined by mutual recursion (MR Tree)

This comment is related to the Solution 1, and because it discards invalid schemas from traversal, it is enough to analyze valid schemas.

Before supporting the keywords with schemaArray definition (allOf, anyOf, oneOf), the function ran on Trees.
After supporting the keywords with schemaArray definition, the function runs on MR Trees.

This is how I imagine these data structures:

Tree vs MR Tree

Tree

MR Tree

A Tree can have two kinds of nodes:

  • tree
  • leaf

An MR Tree can have three kinds of nodes:

  • forest
  • tree
  • leaf

With the short terms:

  • parent: I will refer only to "the parent of a forest". There is no special meaning for us for the parent of any other node.
  • roots: I will refer only to "the roots of trees which are inside a forest". There is no special meaning for us for the root of a tree outside a forest.

We can see on the second figure that

  • we can capture the essence of the forest with the horizontal line, which tries to mimic the notation of the indexed array [],
  • that horizontal line also can be imagined as a "fence" between the parent and the roots which has to be "jumped through", and
  • the trees in the forest are independent from each other, there is a possibility to run the algorithm concurrently on them (if ever needed).

Schema defined by MR

I have picked 4 schemas from the test cases, and they would look like this:

  • different kind of properties
  • oneOf local type
  • oneOf local & base type
  • oneOf base object type

Different kind of properties

oneOf local type, oneOf local & base type

oneOf base object type

Three kinds of nodes and they are all valid

  • filled circle: the "Task" has to be performed on this node, and the type keyword is set with correct value
  • empty circle: the "Task" has to be performed on this node, but the type keyword is missing (this case will be handled by passing the $types parameter from parent to roots)
  • dotted circle: there is no need to perform the "Task" on this node, and the type keyword is missing (this case is handled by defaulting to $types = array())

Well defined where these nodes can appear in a valid MR schema

  • filled: tree and leaf nodes have to be filled, with exceptions:
    • if the node is a tree and also parent, then it can be dotted too
    • if the node is a tree or leaf and also root, then it can be empty too
  • empty: only the roots can be empty
  • dotted: only the parent can be dotted

The parent of a forest can be

  • dotted: meaning the type is missing, so the roots have to have their own local type, or
  • filled: meaning it has type which is the base type in respect of the roots.

A root inside a forest can be

  • filled: in which case the root has its own local type, or
  • empty: in which case the parent has to have the base type.

Valid schema defined by MR

We can discover the pattern of which kinds of nodes exist together in the parent - root relation:

  • filled parent - filled root (root's type overwrites the (base) parent's type)
  • dotted parent - filled root
  • filled parent - empty root

Inserting two more test cases in order to illustrate the alternating situations:

  • dotted parent - filled root: oneOf additional properties, local type, :||
  • filled parent - empty root: oneOf additional properties, base type, :||

oneOf additional local vs oneOf additional base

Wrong base type - local type relation

The schema is invalid if the parent has no type, and a root also have no type.
Meaning there is no base type on parent, and there is no local type on the root, so the "Task" cannot be performed on the root.

  • Iteration 1 runs on the parent, and passes its base type in the $types parameter to the next iteration.
  • Iteration 2 runs on each root, and they have no local type, so the incoming $types should be used.

Invalid schema

Definition of valid MR schema

All the previous analysis leads us to extract the meaning of the third rule:

  • has type
  • has correct type (if it is array then it has items, and if it is object then it has some properties)
  • has correct base type - local type relation (the situation on the red figure above does not exist)

Which part of the function runs on which node

Current node is filled

  • if the node is the parent of a forest, then the "Forest part" will be executed passing the node's type to the roots
  • if the node is a tree (in- or outside a forest), then the "Tree part" will be executed
  • in both cases and also if the node is a leaf (in- or outside a forest), the "Task" will be performed

Current node is dotted

  • the node is the parent of a forest without type, so the "Forest part" will be executed passing the type = array() to the roots (and they must have type on their own)

Current node is empty

  • if the node is a tree (inside a forest), then the "Tree part" will be executed using the incoming base type
  • in the previous case and also if the node is a leaf (inside a forest), the "Task" will be performed using the incoming base type

Docs

The difference between standard and WordPress schema might lead to confusion for schema writers, and so might need to highlight this difference.

They have to be aware of that the WordPress schema mirrors the boolean additionalProperties values, and sets those to false by default. See the test case allOf base object type & properties which is the WordPress version of a standard JSON schema (URL provided in the test).

If someone reads the standard JSON schema docs, or finds schema on SO for example, then they need to mirror the schema to fit in WordPress.

Last edited 21 months ago by anna.bansaghi (previous) (diff)

@anna.bansaghi
21 months ago

Tree vs MR Tree

@anna.bansaghi
21 months ago

4 test cases

@anna.bansaghi
21 months ago

Different kind of properties

@anna.bansaghi
21 months ago

oneOf local type, oneOf local & base type

@anna.bansaghi
21 months ago

oneOf base object type

@anna.bansaghi
21 months ago

oneOf additional local vs oneOf additional base

@anna.bansaghi
21 months ago

Invalid schema

@anna.bansaghi
21 months ago

Invalid schema

annaghi commented on PR #3170:


21 months ago
#6

Hi @TimothyBJacobs , the PR is ready for review again, and I left detailed information in the Trac ticket.

annaghi commented on PR #3374:


21 months ago
#7

Hi @TimothyBJacobs , the PR is the counterpart of the other one: https://github.com/WordPress/wordpress-develop/pull/3170, and ready for review.
I left detailed information in the Trac ticket about the two PRs.

This ticket was mentioned in PR #3386 on WordPress/wordpress-develop by annaghi.


21 months ago
#8

This PR is one of the series:

  • Solution 1 closest to the original with notice + discard

https://github.com/WordPress/wordpress-develop/pull/3170

  • Solution 1.5 A variant of the Solution 1 without notice + discard :point_left:
  • Solution 2 extended object definition

https://github.com/WordPress/wordpress-develop/pull/3374

and offers:

  • support for "allOf", "anyOf", "oneOf", ~and "items"~ as schemaArray typed keywords
  • unit tests

Trac ticket: https://core.trac.wordpress.org/ticket/56494

#9 @anna.bansaghi
21 months ago

Solution 1.5

I have just realized, that we have a third solution too :)
https://github.com/WordPress/wordpress-develop/pull/3386

It is that version which you have already seen @TimothyBlynJacobs, a variant of the Solution 1.
The difference is that Solution 1.5 does not throw notice if the type is missing on the schema. Everything else is the same as for Solution 1.

Last edited 21 months ago by anna.bansaghi (previous) (diff)

#10 @leogermani
10 months ago

I landed here because of the Warning when type is not defined. If type is defined inside my anyOf criteria, there's no point in having a type in that level.

The solution here looks great, and kudos for all the explanation, but I wonder if this adds too much overhead and could have impact on performance. I was wondering if there's a better way to handle this, meaning expecting additionalProperties to be false if not declared.

Maybe this check could be done somewhere else, when we actually look for this property, instead of having to loop through all the schemas to add it.

For example here https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/rest-api.php#L2353 we could just set it to false on the fly if not set.

Same thing on https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/rest-api.php#L2795 and https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/rest-api.php#L3030 - and it seems that this is all that's needed.

Note: See TracTickets for help on using tickets.