Make WordPress Core

Opened 8 years ago

Closed 7 years ago

Last modified 5 years ago

#38131 closed enhancement (fixed)

REST API: Provide interface to include or exclude specific fields from response JSON

Reported by: kadamwhite's profile kadamwhite Owned by: kadamwhite's profile kadamwhite
Milestone: 4.9 Priority: normal
Severity: normal Version: 4.4
Component: REST API Keywords: commit has-unit-tests has-patch
Focuses: performance Cc:

Description

A frequent request in the API content routes plugin (see discussion in #446, #445) is to provide a mechanism to include or exclude specific fields from the response JSON, to simplify the response object and to limit the size of the object that must be transferred down to the requesting client.

This was de-scoped a few years back in #572 as plugin territory, because it was not felt to meet the 80/20 balance necessary to deal with it in a first-party way, but it's been one of the most consistently discussed issues since. Since field inclusion or exclusion relates to the core infrastructure of the API and not to the specific routes provided in the REST API plugin, the right place for this discussion to continue is here on Trac.

The ask, from API consumers: provide a query parameter argument that can be used to include or exclude fields from the response JSON. Variations of ?fields[]=title&fields[]=excerpt&fields[]=id have been discussed (and implemented in at least one plugin), while another plugin has been implemented that uses JMESPath to specify the fields to include.

Advantages:

  • Smaller JSON transferred over the wire; bandwidth is extremely precious (Ericsson Mobility Report, Feb 2016) and transferring as little data as is necessary for a given interface is a responsibility we should take on as application developers. If you're rendering out a generic archive list you likely do not need all of the post content, e.g., and you'll be saving people time and money by not including that in the download.
  • Minor developer convenience improvement to be able to request and work with only what you want

Disadvantages:

  • There is no impact to the underlying queries performed, so while limiting fields saves on bandwidth performance it does not impact server performance
  • Could prohibit effective server caching of responses in some cases, negatively impacting time to response

I move to continue the discussion around providing this functionality as a first-party capability of the API infrastructure, as I believe we should be doing whatever we can to provide means of reducing the bandwidth necessary to consume WordPress content through the API.

Attachments (4)

38131.diff (4.6 KB) - added by kadamwhite 8 years ago.
Initial patch based on PoC discussion w/ Ryan McCue
38131.2.diff (4.6 KB) - added by adamsilverstein 7 years ago.
38131.3.diff (15.8 KB) - added by TimothyBlynJacobs 7 years ago.
38131.4.diff (5.2 KB) - added by kadamwhite 7 years ago.
Revision of 38131.2.diff supporting array format on _fields param

Download all attachments as: .zip

Change History (45)

#1 @rmccue
8 years ago

My opinion is that we continue to pursue the idea, but in plugin territory as a feature plugin. So far, there hasn't been a compelling plugin to handle this (which is totally technically feasible), and we don't really want it in the core API infrastructure just yet.

I think it's valuable to continue tracking the issue, but with the resolution for now to implement in a feature plugin and explore the idea further there.

@kadamwhite
8 years ago

Initial patch based on PoC discussion w/ Ryan McCue

#2 @kadamwhite
8 years ago

After re-opening discussion of adding this parameter on the REST API plugin repository, I've uploaded initial patch based on a PoC @rmccue created, adapted for use within core instead of as a plugin. The implementation here follows the comma-separated fields=fieldname1,fieldname2 paradigm used with the .com API, among others, with the parameter key switched to _fields to match the other core API properties. (Although "context" is still un-underscored).

I have some open questions for discussion:

  • context is always specified in the accepted args for get collections in the API index (on /wp-json/); _embed and other underscore-prefixed properties are not. Should this property be specified in the schema anywhere?
  • This currently will remove _links on collection responses, and maintain them on single-item responses: the inconsistency is bad on its own, but it raises the question of whether _links should be excludable via this mechanism. I lean towards yes because since _fields as implemented here is opt-in only it is up to the consumer of the API to specify what they want, and if they do not want _links excluding it saves a lot of size (particularly in collection responses).
  • This code may be in the wrong place within core's codebase
  • Should this be supported for all types of request, or just GET? I lean towards supporting broadly
  • I've never written unit tests for core before, these may be over verbose or malformed in some way.

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


8 years ago

#4 follow-up: @rmccue
8 years ago

  • Milestone changed from Awaiting Review to Future Release

I want to avoid having this in core until we've fully explored all the edge cases around this. In particular, this has the potential to ruin both PHP and server caches. We're not really gaining any sort of performance benefit out of it, since the data has to be queried anyway, so it's purely about response size.

Until we've sorted this out and properly discussed the repercussions of having essentially dynamic responses, it needs to be in a plugin. I'd like to keep this ticket open though, because I do want to see that eventually resolved in core.

#5 @Bueltge
8 years ago

Maybe it is a better idea to track include, exclude fields in different solutions, track tickets?

Also as small complement this plugin (https://github.com/bueltge/wp-rest-api-filter-items), that currently in usage on customer sites to reduce to specific fields. Maybe it is a help to create a solid solution.

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


8 years ago

#7 in reply to: ↑ 4 @jnylen0
7 years ago

  • Version set to 4.4

Replying to rmccue:

In particular, this has the potential to ruin both PHP and server caches.

I'm not sure this is such a big issue. In practice, I wouldn't expect much variation in the sets of fields requested by clients - it will either be the full response or a whitelisted set of fields that is the same across different requests. Also, correctly implemented API caching already needs to respect other GET parameters anyway.

We're not really gaining any sort of performance benefit out of it, since the data has to be queried anyway, so it's purely about response size.

Performance benefits should be part of the task here. It's not hard to envision code that skips most of the parameter calculations/transformations entirely if they weren't requested as part of the response. A lot of the API is already written this way: instead of checking $schema['parameter'] we'd check $response_fields['parameter'].

Even if that's not a big performance benefit generally (since most of the data is already being queried), there are definitely specific cases where we can save a lot of time by excluding fields. The first example that comes to mind is content.rendered. WP's content filters are expensive, and can even include external API requests (see also https://github.com/WP-API/WP-API/issues/2589).

I'd like to keep this ticket open though, because I do want to see that eventually resolved in core.

+1

This ticket was mentioned in Slack in #core-restapi by jnylen. View the logs.


7 years ago

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


7 years ago

#10 @jnylen0
7 years ago

It would be nice if this also provided a way to specify keys returned for embeds - context=embed is a pretty limited set of data which is not always sufficient, as seen recently in #39805.

Last edited 7 years ago by jnylen0 (previous) (diff)

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


7 years ago

#12 @adamsilverstein
7 years ago

38131.2.diff is a refresh of .1 against trunk with the following changes:

  • trim the parameters so whitespace is ignored
  • bail early if the parameter passes no fields (?fields=)
  • slight docs update

@kadamwhite I agree we should support all request types. Also, I think should exclude _links when _fields are passed.

This ticket was mentioned in Slack in #core-restapi by adamsilverstein. View the logs.


7 years ago

#14 @adamsilverstein
7 years ago

  • Milestone changed from Future Release to 4.8

This ticket was mentioned in Slack in #core-restapi by adamsilverstein. View the logs.


7 years ago

#16 @joehoyle
7 years ago

I'm +1 on this. I think we should probably handle embeds and object fields as two issues. Choosing _which_ embeds could potentially be part of the "object fields", but once we get into whitelisting fields of specific embeds I think we are getting into GraphQL territory. While I think there may be merit in such a thing, I think a good first milestone would be whitelisting embeds and specific fields of the main requested object.

I think the patch by @adamsilverstein is a good start - we could roll with something like this for now, and then add internal optimizations in follow up patches / releases. As @jnylen0 pointed out we already do a good amount of schema checking. I think there might be potentially to actually have get_item_schema() take into account the requested fields; or do the "diff" early, so then we don't need to change any of the code that does isset( $schema['title'] ) etc, so we can just keep that pattern, and implement it on more controllers and fields over time.

#17 @adamsilverstein
7 years ago

We created a plugin for further iteration and testing on this ticket. Testing and PR's welcome: https://github.com/WP-API/rest-filter-response-fields

#19 @rmccue
7 years ago

  • Milestone changed from 4.8 to Future Release

Not ready for 4.8, bumping.

This ticket was mentioned in Slack in #core-restapi by adamsilverstein. View the logs.


7 years ago

This ticket was mentioned in Slack in #core-restapi by jnylen. View the logs.


7 years ago

This ticket was mentioned in Slack in #core by afercia. View the logs.


7 years ago

#23 @afercia
7 years ago

As mentioned in #38920 and #38922 excluding specific fields from the response would benefit in different scenarios. Especially the links suggestion search (#38920). The current PHP/AJAX search returns just a few fields and the response size is minimal.
In Gutenberg instead, when inserting a link the response returns all the fields, including the post content. For feature parity with the current editor, there would be the need of just the post ID, title, permalink, and date. Even better, Gutenberg currently displays just the post title and permalink, so it would need even less fields.
Instead, everything is returned. I understand that's the nature of an API that has been designed for a different data consuming model. However, for an efficient use in core and in Gutenberg, the response should really be tailored to the actual, specific need on a case by case basis.

Thinking at websites with very long post contents (great publishers... prolific bloggers...) the actual response size can be potentially huge, even some MB. With PHP/AJAX, in the worst case, it's a few KB. In this specific kind of scenario, I don't see why WordPress should use a tool that is slower and heavier compared to the current implementation.

Having the ability to exclude fields would solve the issue.

Worth noting the link suggestions in Gutenberg currently returns only posts, while the old search returns posts, pages, and CPTs. This is a separate issue though, see #39965.

#24 @jnylen0
7 years ago

I don't think the patches here are taking the correct approach. For one issue, this won't work with rest_do_request() in PHP-internal code; for another, it doesn't provide any way forward on performance (all fields will still be calculated, regardless of whether they are needed).

#25 @TimothyBlynJacobs
7 years ago

Attached is an alternate approach that handles the filtering of the responses in the controller itself. This matches how the context parameter works, but it isn't included by default.

This version also handles nested fields, specifying title.raw for just the raw title or title to retrieve both the raw and rendered title. This could also be used by controllers to only calculate the fields as necessary. I'm not sure whether this should be used to limit the fields queried ( I think that can be done in WP_User_Query but I'm not sure about the others ). I imagine some of the performance benefits here, besides response size, would be not filtering the_content if it wasn't a requested field, for instance.

#26 @adamsilverstein
7 years ago

@TimothyBlynJacobs Thanks for the updated patch here, it makes sense to include the filtering code in the controller itself (using the filter made it easier to release as a plugin). The nested field support is also nice - my only concern would be about the overhead.

In terms of limiting fields queried, there is a deeper issue here with cache hydration - the query is expected to hydrate the cache for subsequent queries (also the reason WP_QUERY only offers a few options for it's 'fields parameter). if we only queried for some fields, we couldn't hydrate the cache fully, and subsequent (on the same page even) queries for post data would fail (or need another query).

Last edited 7 years ago by adamsilverstein (previous) (diff)

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


7 years ago

This ticket was mentioned in Slack in #core-restapi by adamsilverstein. View the logs.


7 years ago

#29 @kadamwhite
7 years ago

  • Milestone changed from Future Release to 4.9
  • Owner set to kadamwhite
  • Status changed from new to accepted

After discussion with @rmccue, at this time (4.9) we plan to move forward with basic support for filtering out non-nested fields at the infrastructure level. 38131.2.diff is the best candidate at the moment.

Adding support for nesting can be handled as an enhancement later, whereas we couldn't easily back out from supporting nesting. I think we want to think through how we could leverage that to better avoid some of the work on the server before we go too far down that rabbit hole.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


7 years ago

@kadamwhite
7 years ago

Revision of 38131.2.diff supporting array format on _fields param

#31 @kadamwhite
7 years ago

Added 38131.4.diff, which updates @adamsilverstein's 38131.2.diff to support ?_fields[]=array&_fields[]=syntax field lists, using preg_split on @TimothyBlynJacobs' recommendation. Over to @joehoyle for review.

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


7 years ago

#33 @joehoyle
7 years ago

38131.4.diff looks good to me. Questions I asked @kadamwhite just for a log:

  • Does this work with _envelope? Yes, it does.
  • Is there an issue if you pass a field that doesn't exist? No.
  • Should we include _links regardless of requested data? Given this is opt-in behavior we want to allow people to get super tiny responses, _links can be bigger than the actual response.

#34 @kadamwhite
7 years ago

  • Keywords commit has-unit-tests has-patch added

#35 @adamsilverstein
7 years ago

+1 looks good. i'll test the .org plugin we released to make sure it defers to the core functionality.

#36 @kadamwhite
7 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 41744:

REST API: Add _fields parameter to selectively include fields in response JSON.

Allows REST API consumers to specify the specific fields needed in their application code, whitelisting those fields and omitting all others from the returned JSON response object.
This permits applications that only need for example the ID and title of posts to avoid having to transfer the entire rendered post content over the wire alongside the desired fields.
While this whitelisting has no affect on the queries run when preparing the response, it can yield significant reductions in the bandwidth required to transfer a response payload for simple applications.

Props adamsilverstein, TimothyBlynJacobs, svrooij.
Fixes #38131.

#37 @kadamwhite
7 years ago

@adamsilverstein Thanks for thinking of that!

I've broken out a new ticket #42094 to cover the field nesting; I realized that nesting is also now supported in @svrooij's rest-api-filter-fields plugin, as well as by the JMESPath plugin (albeit with a divergent syntax).

#38 follow-up: @svrooij
7 years ago

Implementing this would render my plugin obsolete, but I’m glad that there is some attention to this in the core. As a plugin developer it wasn’t easy to implement filtering fields that would be queried from the database.

But I still think that there could be a speed improvement if for instance the_content where not the be retrieved from the database (or properly rendered).
Is there anything I can help with? Or do you need some input on something?

#39 in reply to: ↑ 38 @danielbachhuber
6 years ago

Replying to svrooij:

But I still think that there could be a speed improvement if for instance the_content where not the be retrieved from the database (or properly rendered).
Is there anything I can help with? Or do you need some input on something?

I've created a new ticket for this: #43874

This ticket was mentioned in Slack in #core-restapi by timothybjacobs. View the logs.


5 years ago

This ticket was mentioned in Slack in #core-restapi by kadamwhite. View the logs.


5 years ago

Note: See TracTickets for help on using tickets.