Opened 16 months ago
Last modified 5 months ago
#62057 assigned defect (bug)
Addressing Autosave Memory Issues
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Editor | Keywords: | has-patch dev-reviewed reporter-feedback |
| Focuses: | Cc: |
Description
This is based on research put together by @tallulahhh. I am assembling this ticket as part of the WordCamp US Contributor day.
To put it in brief: Autosave handling loads too much unused stuff into PHP memory on editor load. This introduces risk of autosave-induced OOM at scale, and wastes memory and resources across the board. Thankfully, I’ve also found that it’s very simple to effectively address, and I don’t think it’s hyperbole to claim this has scope to save a significant amount of waste WP memory, transport and query complexity – platform-wide, and even globally.
Attachments (2)
Change History (27)
#2
@
16 months ago
The src/wp-content/themes/twentytwentyone/.npmrc change in the patch is unrelated.
#4
@
16 months ago
Sample Filter
<?php // Removes the autosaves route from editor // It can OOM PHP when autosave parallelism (count per-post) is massive // this sidesteps it and compels JS to fetch it from the /autosaves/ route directly function remove_autosaves_from_preload_paths( $preload_paths, $block_editor_context ) { // Loop through preload paths and remove any containing '/autosaves' foreach ( $preload_paths as $key => $path ) { // Handle paths that are arrays (like the 'OPTIONS' ones) if ( is_array( $path ) ) { if ( strpos( $path[0], '/autosaves' ) !== false ) { unset( $preload_paths[$key] ); } } // Handle string paths elseif ( strpos( $path, '/autosaves' ) !== false ) { unset( $preload_paths[$key] ); } } return $preload_paths; } add_filter( 'block_editor_rest_api_preload_paths', 'remove_autosaves_from_preload_paths', 10, 2 );
How to verify:
- make a post, save it, make an edit, and wait on that page with it open (so an autosave happens)
- open the same post in another tab, and get the yellow “there is an autosave” warning
- reload with browser dev tools > network open and filter for “autosave”
- with this filter active, you’ll see that Gutenberg fetches the autosave data from the rest route
- you can toggle the filter too: with the filter inactive, no rest request for autosaves is made because it’s preloaded
#5
@
16 months ago
Confirming that without the autosaves preload, JS loads the autosaves with context edit!
https://core.trac.wordpress.org/attachment/ticket/62057/62057.2.diff
#6
@
16 months ago
This probably needs someone from the editor team to know why we are preloading the autosaves to begin with. Does it cause weird layout shift or some other kind of jank?
#8
@
16 months ago
- Component changed from Autosave to Editor
- Focuses rest-api removed
- Milestone changed from Awaiting Review to Future Release
This preloading was added in [46110] pretty much exactly 5 years ago.
With the introduction of https://github.com/WordPress/gutenberg/pull/7945, the block editor requests autosave data when the editor is loaded.
Not clear if this sill the case, as this might have changed in the meantime.
#9
@
16 months ago
In tests, the editor behaves just the same when not loading the autosaves route, it just uses less PHP memory doing so. The "there is a new autosave" warning the editor pops when applicable still does so as expected. When the autosaves route isn't preloaded, the "newer autosave" message pops up when Gutenberg does its equivalent rest_request for it.
I tinkered with this, and found that limiting the fields output by the /autosaves/ routelet seems to have no adverse affect on the "there is a newer autosave", so there's definitely no need to load the large fields like excerpt and content just to check for presence of newer autosaves.
Here's the field-limiting filter I was using for testing:
function filter_autosaves_response( $response, $post, $request ) {
// Check if the request is for the autosaves endpoint
if ( strpos( $request->get_route(), '/wp/v2/posts/' ) !== false && strpos( $request->get_route(), '/autosaves' ) !== false ) {
// Limit the fields to id, dates, and parent
$limited_data = array(
'id' => $response->data['id'],
'date' => $response->data['date'],
'date_gmt' => $response->data['date_gmt'],
'parent' => $response->data['parent'],
);
$response->set_data( $limited_data );
}
return $response;
}
add_filter( 'rest_prepare_revision', 'filter_autosaves_response', 10, 3 );
Also useful for this, since the /autosaves/ route is private: here's a handy "force all rest route to public" filter that I was using to make it easier to directly examine the posts/123/autosaves route's response on my test rig:
function make_all_routes_public( $endpoints ) {
foreach ( $endpoints as $route => $handlers ) {
foreach ( $handlers as $key => $handler ) {
if ( isset( $endpoints[$route][$key]['permission_callback'] ) ) {
$endpoints[$route][$key]['permission_callback'] = '__return_true';
}
}
}
return $endpoints;
}
add_filter( 'rest_endpoints', 'make_all_routes_public' );
Additionally, the revision diff tool used to compare autosaves does its work with direct SQL queries, and doesn't touch the /autosaves/ route, so none of this affects that.
The conclusion I get from all this is that the autosaves route (and preloading it) is largely used for making POST requests to it (performing autosaves during editor use), and the editor's GET requests for it without a field specification arg is a waste of resources, loading fields that aren't used in a regular editor session.
So this could maybe be addressed by doing a fields arg on all GET requests for the route, something like this for the preload:
sprintf( '%s/autosaves?_fields=id,date,date_gmt,parent&context=edit', $rest_path ),
In Gutenberg, I think this would be a case of adjusting what fields are requested by getAutosaves and getAutosave (plural and singular) in packages/core-data/src/resolvers.js.
This ticket was mentioned in PR #7555 on WordPress/wordpress-develop by @adamsilverstein.
16 months ago
#11
Trac ticket: https://core.trac.wordpress.org/ticket/62057
#12
@
16 months ago
So this could maybe be addressed by doing a fields arg on all GET requests for the route, something like this for the preload:
I had hoped this smaller change would be enough to solve the issue, but I still saw the autosave fire after load.
I reviewed the [PR](https://github.com/WordPress/gutenberg/pull/7945) where this was added originally. It says:
Attempts to resolve an issue whereby the editor is unaware of the existence of
a more recent autosave, resulting in attempts to perform an autosave fail.
I was no longer able to reproduce this issue, so I think this is safe to remove.
I pushed this change to a PR to run tests, then this should be ready for commit.
#13
@
16 months ago
- Keywords dev-reviewed commit added; dev-feedback removed
It still raises the issue though that Gutenberg's JS is loading the full payload of all autosaves per-post for every post the editor loads, which can be suboptimal at scale...
This still needs addressing separately in Gutenberg directly - the current patch only delays the loading of all autosave data.
#14
@
14 months ago
- Milestone changed from Future Release to 6.8
- Owner set to adamsilverstein
- Status changed from new to assigned
#15
@
14 months ago
- Milestone changed from 6.8 to Future Release
- Owner adamsilverstein deleted
Reviewing this again, I'm not confident in making this change without additional evidence for its benefits. Rather than avoid the preload, avoiding loading the full content is probably a better route.
Reviewing just the proposed change here (removing the preload):
- The data is still loaded in full via the rest endpoint, the load is just shifted to a later point. Memory savings would be minimal overall - if that isn't the case, maye a test could show that?
- The data still needs to be loaded over the network, so I don't think there are savings there.
- It is not clear if the original reason for adding the preload is still relevant - eg. there is a risk of a regression here.
#16
@
9 months ago
Back again with a big discovery on this:
- Gutenberg only ever calls the autosaves route for one autosave at a time, be it GET or POST. There’s getAutosave singular and getAutosaves plural in Gutenberg – and only the single one is ever used.
This means the autosave route and all Gutenberg’s interactions with it work fine when it only returns one result – and the memory consumption issue stemming from it returning all autosaves per-post by default might just be an accident after all, since Gutenberg never needs any more than one autosave.
As such, the autosave route only ever needs return one autosave.
I figured out a way to do this with filters, and limiting the autosave route payload to one autosave only can un-break a site whose editor would otherwise OOM with a crit on a post with a huge autosaves response.
---
Replication
To make testing easy, I made a mu-plugin that autosaves frequently with a different userID each time. This makes it trivial to create a huge autosave payload quickly:
<?php function custom_autosave_dynamic_author( $data, $postarr ) { // Only apply to autosaves if (defined('DOING_AUTOSAVE') && DOING_AUTOSAVE) { $timestamp = time(); $pseudo_user_id = 100000 + ($timestamp % 1000); $data['post_author'] = $pseudo_user_id; } return $data; } add_filter('wp_insert_post_data', 'custom_autosave_dynamic_author', 10, 2); define( 'AUTOSAVE_INTERVAL', 1 );
With this active, you can start with a fresh test env, edit hello-world, then type (or paste stuff) into it to create lots of autosaves from different users at a one per sec rate. After a while of doing this (especially if pasting) the autosave route's response will be gigantic, and on editor (re)load, the load of the autosaves route as part of the preloads will kill the editor in a fatal OOM.
At this point, taking the autosaves route out of the preload_paths can resolve the OOM as previously discussed (as well as making them easy to inspect payload-wise when using the editor).
But the autosaves route payload route is still wasteful for GET requests, and will, at some point of adding autosaves, OOM all on its own - which has recently been observed in the wild for particularly massive posts with many editor touches.
---
Singular autosave filter
Unfortunately per_page doesn't work for the autosaves route, so rather than changing the request in gutenberg, I had to change the response instead.
Since I've been working on this from a "unbreak the site, don't hack core" perspective for live sites, I made the autosaves route singular return only for GET requests like this:
<?php // singular autosave route for wp-json add_action( 'rest_api_init', function() { register_rest_route( 'wp/v2', '/posts/(?P<id>d+)/autosave', array( 'methods' => WP_REST_Server::READABLE, 'callback' => 'get_latest_autosave', 'permission_callback' => 'autosave_permission_check', 'args' => array( 'id' => array( 'description' => __( 'Unique identifier for the post.' ), 'type' => 'integer', ), ), ) ); } ); function autosave_permission_check( WP_REST_Request $request ) { $post_id = (int) $request['id']; if ( ! current_user_can( 'edit_post', $post_id ) ) { return new WP_Error( 'rest_cannot_view', __( 'Sorry, you are not allowed to view autosaves for this post.' ), array( 'status' => rest_authorization_required_code() ) ); } return true; } function get_latest_autosave( WP_REST_Request $request ) { $post_id = (int) $request['id']; $post = get_post( $post_id ); if ( ! $post ) { return new WP_Error( 'rest_post_invalid', __( 'Invalid post ID.' ), array( 'status' => 404 ) ); } // wp_get_post_autosave() returns the latest autosave WP_Post object if one exists. $autosave = wp_get_post_autosave( $post_id ); if ( ! $autosave ) { // No autosave found; return an empty array. return rest_ensure_response( [] ); } // Use the existing autosaves controller to prepare the autosave data. $controller = new WP_REST_Autosaves_Controller( 'post' ); $response = $controller->prepare_item_for_response( $autosave, $request ); return rest_ensure_response( $response ); }
Applying this to the previous replication step un-breaks the editor for the affected post and allows normal operation.
This has been enough to un-break sites with big autosave payloads, and though it would obviously take a different shape in a core patch, I wanted to share how we've been doing it filter-style for affected envs.
Removing the route from preload_paths can also help avoid OOM for sites whose singular autosave payload is huge all on its own.
This filter and replication method should be enough to prove that Core needs the current autosaves response to GET requests to be singular-autosave only. Perhaps a second route or ?all arg could be used for Gutenberg's getAutosaves (plural) method.
I'll be away from this a while, but hopefully that helps move things along :)
#17
@
9 months ago
Thanks for the detailed analysis @tallulahhh - this is immensely helpful in understanding the scope of the issue.
Unfortunately per_page doesn't work for the autosaves route, so rather than changing the request in gutenberg, I had to change the response instead.
Hmm, I wonder why? Maybe that needs fixing first. If it did work correctly, would that solve the issue (for many revisions OOM anyway)?
We should make sure the fields key is supported so we can avoid requesting the autosave content if we don't need it.
To make testing easy, I made a mu-plugin that autosaves frequently with a different userID each time.
Clever, thanks for sharing that.
As such, the autosave route only ever needs return one autosave.
some use cases may want more than one though.
#18
@
8 months ago
@tallulahhh Thanks again for all the additional details and direction. I took a pass at adding support for the per_page parameter to the autosaves route and changing the preload to only load a single autosave in https://github.com/WordPress/wordpress-develop/pull/7555.
What do you think? Can you give that a test and see if it resolves the issue you were experiencing?
#20
@
7 months ago
- Keywords commit removed
As there is no consensus on landing a specific patch, I am removing the commit keyword
#21
@
6 months ago
@adamsilverstein This looks very promising at first glance!
I've just returned from sabbatical so I haven't yet had a chance to test https://github.com/WordPress/wordpress-develop/pull/7555 yet, but the method looks sound and I'll get into it soon.
I would still encourage the removal of the preload, for these reasons:
- loading the editor and a gigantic autosave response (even if it's only one autosave) makes it easier to hit PHP mem limit for that init and OOM, whereas avoiding the preload splits that mem usage over 2 inits instead and gives more php mem overhead for the big editor init
- easier to debug from the editor when you can see the autosave request/response activity in devtools
Customers with heroically large posts (and editor plugins etc eating php mem) can get a lot of relief from showstopping editor OOMs this way, regardless of the "too many autosaves in response" situation.
It is technically 2 issues - 'too many autosaves in response' and 'autosave preload carries oom risk for huge posts' - but I'm aware that removal of the preload here would require a companion PR for Gutenberg to make the requests do per-page=1, so while I encourage losing the preload, I'll defer to y'all as to how this should be handled.
I'll give the PR a test (in the conditions in which I first found and isolated the issue) soon. Eager to mitigate this excessive autosave payload waste ecosystem-wide! 👍
#24
@
5 months ago
- Milestone changed from 6.9 to Future Release
Thanks for the feedback @tallulahhh - I will work on updating the PR.
#25
@
5 months ago
Added a companion PR for Gutenberg to avoid loading multiple autosaves from the client side: https://github.com/WordPress/gutenberg/pull/71367
Here's some context around risk of Autosave Parallelism...
When loading a post, the WP editor preloads data from a bunch of rest routes in the
preload_pathsarray inwp-admin/edit-form-blocks.php, including one for the post's autosaves, a route that goes like this:wp-json/wp/v2/posts/123/autosavesSince the full autosaves route payload includes the_content of each autosave, and retains 1 autosave per user that's ever edited the post, this can become large quite quickly!
In a situation where a post has an extremely large number of autosaves or very large autosave content, this can create a PHP OOM when the (potentially massive, unchecked) payload from the autosaves route is loaded by PHP, making the post OOM and seem impossible to load/edit in the editor.
This has been observed in the wild, with OOMs, in sites with a variety of users editing longstanding posts with long content.
It can be verified by measuring PHP memory usage in the editor - posts with a large autosave preload payload will consume more PHP memory. With some multiple megabyte pastes of lorem ipsum in the editor, and 1min to allow the autosave/s to take place, the difference is noticeable.
The PHP OOM risk can be mitigated by simply not preloading them, by removing the autosaves route from
preload_paths- which instead makes JS load it directly from the rest route. Previously-affected large posts with large/many autosaves don't PHP OOM once the preload is prevented.Specifically, this PHP OOM risk can be avoided by removing this line from
wp-admin/edit-form-blocks.php:sprintf( '%s/autosaves?context=edit', $rest_path ),This method (or doing it in a filter to
block_editor_rest_api_preload_paths) is reliable for sites that skirt the upper limits of autosave parallelism.Steps to verify:
It still raises the issue though that Gutenberg's JS is loading the full payload of all autosaves per-post for every post the editor loads, which can be suboptimal at scale...
In theory, the editor technically only needs the dates of existing autosaves to see if a newer one exists when opening a post - which could be done using a fields arg for the initial request - at which point it could then (if needed) load an individual autosave from the rest route by ID to diff against the editor content. Avoiding loading the_content of multiple autosaves at once could therefore keep autosave-based memory consumption/waste down in the editor in general.