WordPress.org

Make WordPress Core

Opened 6 months ago

Closed 6 weeks ago

Last modified 5 weeks ago

#50244 closed feature request (fixed)

Add bulk operation support to the Rest API

Reported by: andraganescu Owned by: TimothyBlynJacobs
Milestone: 5.6 Priority: high
Severity: normal Version: 4.4
Component: REST API Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description

This ticket is a continuation of the Slack discussions around implementing support for bulk operations on the REST endpoints.

Originally this was started as the response to a need found by the new navigation screen experiment. It then became clear that the better option is to implement systemic support, at the level of WP_REST_Server.

The goal is to obtain the ability to send multiple entities in one call for create, save and update, for example to create multiple posts or multiple menus or menu items with a single API request.

Change History (44)

#1 @TimothyBlynJacobs
6 months ago

  • Component changed from General to REST API
  • Focuses performance added
  • Milestone changed from Awaiting Review to 5.5
  • Owner set to TimothyBlynJacobs
  • Priority changed from normal to high
  • Status changed from new to assigned
  • Type changed from defect (bug) to feature request
  • Version set to 4.4

#2 @zieladam
6 months ago

One of the conclusions from the Slack chat was that the REST API already uses separate functions to validate and store the data, and so it should be fairly straightforward to reuse these functions to validate the entire batch input before persisting anything.

Today I have been looking into flagging any place in the menu items controller that's doing the validation outside of validate_callback. I found the validation and sanitization are implemented all at once in prepare_item_for_database and that validate_callback method is not used for any property at all.

This follows the pattern from WP_REST_Posts_Controller which does the exact same thing. The same is true for WP_REST_Comments_Controller, WP_REST_Attachments_Controller, and potentially a lot of open source classes inheriting from any of these.

It seems like the first step here would have to be refactoring these four classes to make them use sanitize_callback and validate_callback for each schema property. What do y'all think?

Last edited 6 months ago by zieladam (previous) (diff)

#3 @TimothyBlynJacobs
6 months ago

@zieladam Yep, but to clarify the bulk of the validation happens by providing a schema for the properties which validates and sanitizes according to the provided JSON schema. When the validation isn't possible to do just with a schema, the goal is to provide a validate_callback, see for instance check_template() in the posts controller. But we don't do this consistently, for instance handle_status_param() could be implemented as a validate_callback and sanitize_callback. The majority of those other checks could also be refactored to be callbacks as well.

The menus controller is separately doing validation against the schema which should be dropped, https://github.com/WordPress/gutenberg/blob/2881f84896cace4cc1828774c8e09252d9e67e39/lib/class-wp-rest-menu-items-controller.php#L368. And has some things that can be expressed in the schema, https://github.com/WordPress/gutenberg/blob/2881f84896cace4cc1828774c8e09252d9e67e39/lib/class-wp-rest-menu-items-controller.php#L516.

So I think starting with the menus controller, any validation checking that happens in prepare_item_for_database that can be refactored to have a validate_callback should be starting with the checks that happen on a single parameter. Something to note, is that applying a custom validate_callback skips the schema validating since that is implemented just as the default validate callback. So in that case, if the validation is supposed to be additive, then we should call rest_validate_request_arg() in the top of the new validate_callback and bail out if that returns a WP_Error instance.

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-restapi by chrisl27. View the logs.


6 months ago

This ticket was mentioned in PR #311 on WordPress/wordpress-develop by TimothyBJacobs.


6 months ago

  • Keywords has-patch has-unit-tests added

#7 @TimothyBlynJacobs
6 months ago

I've created a PR with a first pass at what this could look like. Things for thoughts:

  1. I don't really love this being in wp/v2, it isn't really part of the wp data namespace. I kinda would like it do just be wp-json/batch.
  2. Should routes have to opt-in to supporting batching? We could have a register_rest_route flag like allows_batch. While theoretically any properly coded route wouldn't have an issue being used in this manner, not every route is properly coded ( #50239 ).
  3. We could also support a validate_callback in the register_rest_route on the same level as callback and permission_callback that would take a WP_REST_Request object and do validation for it. This would be helpful for cases where validation has to be done with all the fields together as context, not just individual field validation. We'd do the parameter validation first, and then call validate_callback on the whole request.
  4. I'm thinking about registering the batch controller as part of WP_REST_Server itself like the namespace routes. That way we don't have to expose the match_request_to_handler method.
  5. In pre validation mode, validation will happen twice. Once when we check every request for passing validation, and then again in the individual dispatch. Adjusting this to only happen once makes me nervous, but there would be a performance benefit to it only happening once.
  6. The maxItems number is fairly arbitrary, and isn't currently implemented #48821.
  7. I've omitted support for GET requests entirely at the moment. The other requests will have their own permission checks, so it is more difficult for an unscrupulous person to create a lot of load. But if we allow GET requests this could easily get out of control with no permission checks, and the permission checks are non-obvious. I think we should also encourage using linking/embedding over a batch GET for the time being. So I'd like to punt GET batching to a later release.

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


6 months ago

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


6 months ago

#10 @zieladam
6 months ago

@TimothyBlynJacobs I read through that PR and it looks pretty good to me. Some thoughts:

Should routes have to opt-in to supporting batching? We could have a register_rest_route flag like allows_batch. While theoretically any properly coded route wouldn't have an issue being used in this manner, not every route is properly coded ( #50239 ).

I think it should be opt-in - as we discussed earlier, there are important routes that do not support pre-validation at the moment and sending batch requests with validation=pre will not work as expected. With opt-in approach, the batch processing endpoint could be merged first and then more and more routes would opt-in to batch processing as time passes.

We could also support a validate_callback in the register_rest_route on the same level as callback and permission_callback that would take a WP_REST_Request object and do validation for it. This would be helpful for cases where validation has to be done with all the fields together as context, not just individual field validation. We'd do the parameter validation first, and then call validate_callback on the whole request.

Great idea

In pre validation mode, validation will happen twice. Once when we check every request for passing validation, and then again in the individual dispatch. Adjusting this to only happen once makes me nervous, but there would be a performance benefit to it only happening once.

The only thing that could go wrong there is something unexpected happening between the batch validation and the handler invocation, e.g. requesting a batch of two requests, both trying to insert a row with the same value that's subject to an UNIQUE constraint - I think core should protect developers from doing that, but offer an option to opt out of this protection when more performance is required. Maybe by having two pre-validation modes like validation=pre-and-individual and validation=pre-only?

I've omitted support for GET requests entirely at the moment.

Sounds good and makes sense

Last edited 6 months ago by zieladam (previous) (diff)

This ticket was mentioned in Slack in #core by timothybjacobs. View the logs.


6 months ago

#12 @TimothyBlynJacobs
6 months ago

Thanks for the review @zieladam! I've pushed some changes to the PR.

I think it should be opt-in - as we discussed earlier, there are important routes that do not support pre-validation at the moment and sending batch requests with validation=pre will not work as expected. With opt-in approach, the batch processing endpoint could be merged first and then more and more routes would opt-in to batch processing as time passes.

Passing an allow_batch keyword is now required.

Great idea

Implemented as a validate_callback option.

The only thing that could go wrong there is something unexpected happening between the batch validation and the handler invocation, e.g. requesting a batch of two requests, both trying to insert a row with the same value that's subject to an UNIQUE constraint - I think core should protect developers from doing that, but offer an option to opt out of this protection when more performance is required. Maybe by having two pre-validation modes like validation=pre-and-individual and validation=pre-only?

What I've done right now is made it so it only validates once. Right now, that isn't an issue that would get caught by validation anyway. It'd fail when the wp_insert_* call happens.

Leaving the choice up to the caller feels weird to me. Maybe make it specific to the endpoint?

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


6 months ago

#14 @TimothyBlynJacobs
6 months ago

  • Milestone changed from 5.5 to 5.6

It looks like the navigation screen is not going to make it for 5.5. Punting this to 5.6 as its the main consumer of this feature.

#15 @zieladam
5 months ago

@TimothyBlynJacobs

Passing an allow_batch keyword is now required.

Awesome!

Implemented as a validate_callback option.

Thank you!

Leaving the choice up to the caller feels weird to me. Maybe make it specific to the endpoint?

Ah yes, it sounds like the best option indeed.

Also, I am working on a PR to make WP_REST_Menu_Items_Controller compatible with this change. I will post a link here as soon as it's ready to share.

#16 @TimothyBlynJacobs
5 months ago

Also, I am working on a PR to make WP_REST_Menu_Items_Controller compatible with this change. I will post a link here as soon as it's ready to share.

Fantastic! What I think I'm going to do is commit the validate_callback support for 5.5 so we can utilize that behavior in Gutenberg when doing that refactor.

#17 @zieladam
5 months ago

That sounds great @TimothyBlynJacobs ! In fact the PR I started ( https://github.com/WordPress/gutenberg/pull/23474 - not ready for a review yet) should really use a controller-level validate_callback and I was planning to "rebase" it on top of your PR, right now the field-level validation does not really cut it.

Last edited 5 months ago by zieladam (previous) (diff)

#18 @prbot
5 months ago

adamziel commented on PR #311:

Adding a support for sanitize_callback would make things even easier on the controller side. It could be done by adding the following code to sanitize_params method in class WP_REST_Request:

`php

if ( isset( $attributessanitize_callback? ) ) {

$sanitized = call_user_func( $attributessanitize_callback?, $this );

if ( is_wp_error( $sanitized ) ) {

return $sanitized;

}

}

`

#19 @zieladam
5 months ago

@TimothyBlynJacobs class-wp-rest-menu-items-controller.php PR is now ready for discussion and reviews: https://github.com/WordPress/gutenberg/pull/23474

Last edited 5 months ago by zieladam (previous) (diff)

#20 @prbot
5 months ago

TimothyBJacobs commented on PR #311:

Why sanitize? The sanitize callbacks aren’t supposed to return an error instance.

#21 @prbot
5 months ago

adamziel commented on PR #311:

Why sanitize?

@TimothyBJacobs How else would you approach classifying e.g. this part of the class-wp-rest-menu-items-controller.php?

https://github.com/WordPress/gutenberg/blob/b1ffaf373e6b2604fcaded300e8f8b445d4237df/lib/class-wp-rest-menu-items-controller.php#L516-L527

It performs validation based on:

  • Menu item stored in the database
  • Results of sanitization of menu-item-type earlier in the same function

It doesn't neatly fit into the validation callback, but it fits the sanitize callback, with the only caveat being returning an error.

The sanitize callbacks aren’t supposed to return an error instance.

Returning an error instance is supported for per-field validated callback so my thinking was that a per-request callback should also support it.

My bigger problem here is that the notion of validation and sanitization in the current API relates to request data, whereas with endpoints like class-wp-rest-menu-items-controller.php what we really want to validate/sanitize is a hydrated object.

#22 @prbot
5 months ago

TimothyBJacobs commented on PR #311:

How else would you approach classifying e.g. this part of the class-wp-rest-menu-items-controller.php?

It doesn't neatly fit into the validation callback, but it fits the sanitize callback, with the only caveat being returning an error.

Why doesn't it fit in validate? The purpose of sanitize is to transform a value, AFAICT it isn't transforming a value, but validating that it is valid when combined with other request data.

Returning an error instance is supported for per-field validated callback so my thinking was that a per-request callback should also support it.

Right, but I think that would be in the per-request validate_callback no?

#23 @prbot
5 months ago

adamziel commented on PR #311:

@TimothyBJacobs here's my reasoning:

Consider the following block of code:

https://github.com/WordPress/gutenberg/blob/2e80a87f2bedd22a4b30f1aa10421d7230ff1e82/lib/class-wp-rest-menu-items-controller.php#L393-L411

What it does is this: If menu-item-object is missing from $prepared_nav_item, then fill it based on menu-item-type and menu-item-object-id.

The logic depends on $prepared_nav_item defined earlier, based on both request and database data, let's call it a hydrated $prepared_nav_item.

Let's try to split it assuming we only have the following buckets to choose from:

  1. per-field validate_callback
  2. per-field sanitize_callback
  3. per-request validate_callback
  4. Route callback

Let's see how it fits each of these buckets:

  • per-field validate_callback: menu-item-object is not a request field so there is no related validate_callback. Let's consider reusing menu-item-type validate_callback. It would need to hydrate $prepared_nav_item first, and then validate conditionally on menu-item-object being falsy. That doesn't seem right.
  • per-field sanitize_callback: Same as above, plus there isn't a good way to account for this line: $prepared_nav_item['menu-item-object'] = get_term_field( 'taxonomy', $original );.
  • per-request validate_callback: Hydrating $prepared_nav_item and returning new WP_Error in the error case would work just fine. There is no good way to account for this line though: $prepared_nav_item['menu-item-object'] = get_term_field( 'taxonomy', $original );
  • Route callback: Provided returning an error was handled by the validate_callback, we could hydrate $prepared_nav_item again and duplicate most of the logic/ifs to assign $prepared_nav_item['menu-item-object'] = get_term_field( 'taxonomy', $original );. This means we need to hydrate twice and duplicate some of the validation logic in the controller to only set menu-item-object under specific conditions. It's okay-ish but not ideal.

There is also one additional problem. See the following code:

https://github.com/WordPress/gutenberg/blob/2e80a87f2bedd22a4b30f1aa10421d7230ff1e82/lib/class-wp-rest-menu-items-controller.php#L413-L420

It performs validation based on the value of resolved $prepared_nav_item['menu-item-object'] from the snippet above. That logic does not nicely follow a linear pattern like 1. Throw errors for invalid inputs 2. Transform the data to save-able form. It's more like an interdependent mixture of both. There are other parts ([like menu position logic](https://github.com/WordPress/gutenberg/blob/2e80a87f2bedd22a4b30f1aa10421d7230ff1e82/lib/class-wp-rest-menu-items-controller.php#L432)) that fit the above reasoning too.

Now let's see what would happen with per-request sanitize_callback (or hydrate_callback, transform_callback, prepare_item_for_database, or any other name really):

sanitize_callback hydrates $prepared_nav_item, performs a check and return an error if needed, assigns $prepared_nav_item[ "menu-item-object" ], and then makes the $prepared_nav_item available for the action handler to process without any further transformations (at least ones that have request-dependent error conditions).

It seems cleaner overall, let me know what you think or if you have any alternative ideas - I don't insist on this specific approach, just trying to figure out what would work best here :)

#24 @joehoyle
5 months ago

I think it should be opt-in - as we discussed earlier, there are important routes that do not support pre-validation at the moment and sending batch requests with validation=pre will not work as expected. With opt-in approach, the batch processing endpoint could be merged first and then more and more routes would opt-in to batch processing as time passes.

I'm not sure I agree with this rationale. Making this opt-in will make this feature be of quite limited value. I think in many cases people won't use Batch, so won't ever even add it to their route registration.

If a route doesn't support pre-validation (which I think is to say that validation is done directly in the get_items (etc) handler, why would it matter? Route callbacks can always create errors, so avoiding that I'm assuming is not the issue.

#25 @zieladam
5 months ago

If a route doesn't support pre-validation (which I think is to say that validation is done directly in the get_items (etc) handler, why would it matter? Route callbacks can always create errors, so avoiding that I'm assuming is not the issue.

This ticket was born to address the following use case:

  1. A batch of n, let's say 100 requests is sent over in a single request.
  2. The entire dataset is validated and sanitized sufficiently to guarantee there won't be any error caused by the input.
  3. All the request handlers are being executed, only erroring out in rare cases such as lost database connection, but never due to something being wrong with the input.

The goal being either processing all requests or no request, but almost never some of them. This would be even more reliable with transactions, but they're not trivial to implement (think caches) so the first version focuses just on the processing itself.

What you're saying, as I understand, is that there is value in having two modes of operating: 1. Pre-validation as described above when the route supports it 2. Simple batch processing for all routes - e.g. sending over 100 requests and expecting that e.g. 60 of them would succeed while 40 would fail. Can you think of some specific use cases where that would be handy? Also, we'd need to distinguish between:

  1. Batch requests with pre-validation vs no pre-validation (different URL? request parameter?)
  2. Endpoints supporting pre-validation vs endpoints not supporting pre-validation (route config?)
Version 1, edited 5 months ago by zieladam (previous) (next) (diff)

#26 @TimothyBlynJacobs
5 months ago

I'm not sure I agree with this rationale. Making this opt-in will make this feature be of quite limited value.

My rationale for the opt-in flag is more along these lines:

While theoretically any properly coded route wouldn't have an issue being used in this manner, not every route is properly coded

The REST API is of course designed to be reentrant, but I've seen a surprising number of instances where due to how people have implemented controllers it wouldn't be. In particular for non-get requests and things with permission callbacks.

If one of those routes are included in the batch, the results could be bizarre or completely broken. If we're fine with that, then I think we could switch it to an opt-out mechanism.

I think in many cases people won't use Batch, so won't ever even add it to their route registration.

So this would be for when a developer wants to make batch requests for another plugin's routes?

This ticket was mentioned in Slack in #core by zieladam. View the logs.


5 months ago

#28 @TimothyBlynJacobs
3 months ago

So my current thought is to commit the refactoring of WP_REST_Server::dispatch() by the end of this week. Then we can implement and iterate on the actual design of the batch endpoint in the Gutenberg repository. This will require making the match_request_to_handler and respond_to_request methods temporarily public. We can mark them as private and then remove the public visibility when the batch endpoint is merged before Beta 1.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


3 months ago

#30 @hellofromTonya
3 months ago

  • Keywords early added

Moving to early per conversation in Slack bug scrub.

#31 @TimothyBlynJacobs
3 months ago

In 48945:

REST API: Support a route-level validation callback.

Most request data is validated on a per-parameter basis. Often, however, additional validation is needed that operates on the entire request object. Currently, this is done in the route callback and often in the prepare_item_for_database method specifically.

#50244 aims to introduce batch processing in the REST API. An important feature is the ability to enforce that all requests have valid data before executing the route callbacks in "pre-validate" mode.

This patch introduces support for calling a validate_callback after all parameter validation has succeeded. That allows moving more validation outside of the route callback and into WP_REST_Request which will improve "pre-validate" support.

Props TimothyBlynJacobs, zieladam.
Fixes #51255.
See #50244.

#32 @prbot
3 months ago

TimothyBJacobs commented on PR #311:

I royally screwed up this rebase. Closing this out.

#34 @TimothyBlynJacobs
3 months ago

In 48947:

REST API: Refactor WP_REST_Server::dispatch() to make internal logic reusable.

#50244 aims to introduce batch processing in the REST API. An important feature is the ability to enforce that all requests have valid data before executing the route callbacks in "pre-validate" mode.

This necessitates splitting WP_REST_Server::dispatch() into two methods so the batch controller can determine the request handler to perform pre-validation and then respond to the requests.

The two new methods, match_request_to_handler and respond_to_request, have a public visibility, but are marked as @access private. This is to allow for iteration on the batch controller to happen in the Gutenberg repository. Developers should not rely upon these methods, their visibility may change in the future.

See #50244.
Props andraganescu, zieladam, TimothyBlynJacobs.

#35 @TimothyBlynJacobs
3 months ago

  • Keywords early removed

Removing the early keyword. The early portion of the ticket has been committed. The actual batch implementation will come with the Gutenberg code merge near Beta 1.

That PR is in progress here: https://github.com/WordPress/gutenberg/pull/25096

Last edited 3 months ago by TimothyBlynJacobs (previous) (diff)

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


3 months ago

#37 @hellofromTonya
7 weeks ago

Notes from conversation with Timothy earlier today in Slack:

it’ll end up being merge Monday most likely when @ noisysocks does the general 5.6 Gutenberg merge

#38 @TimothyBlynJacobs
7 weeks ago

Something we need to decide is whether this should ship as __experimental or as v1. Generally we've decided to ship APIs as versioned, not __experimental. If we need to, we can increment the version :) The benefit of __experimental is that we can make changes without maintaining technical debt.

Cc'ing:

@spacedmonkey
@andraganescu
@zieladam
@helen
@youknowriad

This ticket was mentioned in Slack in #core-restapi by hellofromtonya. View the logs.


6 weeks ago

#42 @TimothyBlynJacobs
6 weeks ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

In 49252:

REST API: Introduce support for batching API requests.

A new route is introduced, batch/v1, that accepts a list of API requests to run. Each request runs in sequence, and the responses are returned in the order they've been received.

Optionally, the require-all-validate validation mode can be used to first validate each request's parameters and only proceed with processing if each request validates successfully.

By default, the batch size is limited to 25 requests. This can be controlled using the rest_get_max_batch_size filter. Clients are strongly encouraged to discover the maximum batch size supported by the server by making an OPTIONS request to the batch/v1 endpoint and inspecting the described arguments.

Additionally, the two new methods, match_request_to_handler and respond_to_request introduced in [48947] now have a protected visibility as we don't want to expose the inner workings of the WP_REST_Server::dispatch API.

Batching is not currently supported for GET requests.

Fixes #50244.
Props andraganescu, zieladam, TimothyBlynJacobs.

#43 @prbot
6 weeks ago

TimothyBJacobs commented on PR #629:

Fixed in 9defd1fabcfc6e17736ff2c285d5f0ad5772533b.

This ticket was mentioned in Slack in #core by helen. View the logs.


5 weeks ago

Note: See TracTickets for help on using tickets.