Opened 4 months ago
Last modified 8 weeks ago
#62057 assigned defect (bug)
Addressing Autosave Memory Issues
Reported by: | whyisjake | Owned by: | |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | |
Component: | Editor | Keywords: | has-patch dev-reviewed commit |
Focuses: | Cc: |
Description
This is based on research put together by @tallulahhh. I am assembling this ticket as part of the WordCamp US Contributor day.
To put it in brief: Autosave handling loads too much unused stuff into PHP memory on editor load. This introduces risk of autosave-induced OOM at scale, and wastes memory and resources across the board. Thankfully, I’ve also found that it’s very simple to effectively address, and I don’t think it’s hyperbole to claim this has scope to save a significant amount of waste WP memory, transport and query complexity – platform-wide, and even globally.
Attachments (2)
Change History (17)
#2
@
4 months ago
The src/wp-content/themes/twentytwentyone/.npmrc
change in the patch is unrelated.
#4
@
4 months ago
Sample Filter
<?php // Removes the autosaves route from editor // It can OOM PHP when autosave parallelism (count per-post) is massive // this sidesteps it and compels JS to fetch it from the /autosaves/ route directly function remove_autosaves_from_preload_paths( $preload_paths, $block_editor_context ) { // Loop through preload paths and remove any containing '/autosaves' foreach ( $preload_paths as $key => $path ) { // Handle paths that are arrays (like the 'OPTIONS' ones) if ( is_array( $path ) ) { if ( strpos( $path[0], '/autosaves' ) !== false ) { unset( $preload_paths[$key] ); } } // Handle string paths elseif ( strpos( $path, '/autosaves' ) !== false ) { unset( $preload_paths[$key] ); } } return $preload_paths; } add_filter( 'block_editor_rest_api_preload_paths', 'remove_autosaves_from_preload_paths', 10, 2 );
How to verify:
- make a post, save it, make an edit, and wait on that page with it open (so an autosave happens)
- open the same post in another tab, and get the yellow “there is an autosave” warning
- reload with browser dev tools > network open and filter for “autosave”
- with this filter active, you’ll see that Gutenberg fetches the autosave data from the rest route
- you can toggle the filter too: with the filter inactive, no rest request for autosaves is made because it’s preloaded
#5
@
4 months ago
Confirming that without the autosaves preload, JS loads the autosaves with context edit!
https://core.trac.wordpress.org/attachment/ticket/62057/62057.2.diff
#6
@
4 months ago
This probably needs someone from the editor team to know why we are preloading the autosaves to begin with. Does it cause weird layout shift or some other kind of jank?
#8
@
4 months ago
- Component changed from Autosave to Editor
- Focuses rest-api removed
- Milestone changed from Awaiting Review to Future Release
This preloading was added in [46110] pretty much exactly 5 years ago.
With the introduction of https://github.com/WordPress/gutenberg/pull/7945, the block editor requests autosave data when the editor is loaded.
Not clear if this sill the case, as this might have changed in the meantime.
#9
@
4 months ago
In tests, the editor behaves just the same when not loading the autosaves route, it just uses less PHP memory doing so. The "there is a new autosave" warning the editor pops when applicable still does so as expected. When the autosaves route isn't preloaded, the "newer autosave" message pops up when Gutenberg does its equivalent rest_request for it.
I tinkered with this, and found that limiting the fields output by the /autosaves/ routelet seems to have no adverse affect on the "there is a newer autosave", so there's definitely no need to load the large fields like excerpt and content just to check for presence of newer autosaves.
Here's the field-limiting filter I was using for testing:
function filter_autosaves_response( $response, $post, $request ) { // Check if the request is for the autosaves endpoint if ( strpos( $request->get_route(), '/wp/v2/posts/' ) !== false && strpos( $request->get_route(), '/autosaves' ) !== false ) { // Limit the fields to id, dates, and parent $limited_data = array( 'id' => $response->data['id'], 'date' => $response->data['date'], 'date_gmt' => $response->data['date_gmt'], 'parent' => $response->data['parent'], ); $response->set_data( $limited_data ); } return $response; } add_filter( 'rest_prepare_revision', 'filter_autosaves_response', 10, 3 );
Also useful for this, since the /autosaves/ route is private: here's a handy "force all rest route to public" filter that I was using to make it easier to directly examine the posts/123/autosaves route's response on my test rig:
function make_all_routes_public( $endpoints ) { foreach ( $endpoints as $route => $handlers ) { foreach ( $handlers as $key => $handler ) { if ( isset( $endpoints[$route][$key]['permission_callback'] ) ) { $endpoints[$route][$key]['permission_callback'] = '__return_true'; } } } return $endpoints; } add_filter( 'rest_endpoints', 'make_all_routes_public' );
Additionally, the revision diff tool used to compare autosaves does its work with direct SQL queries, and doesn't touch the /autosaves/ route, so none of this affects that.
The conclusion I get from all this is that the autosaves route (and preloading it) is largely used for making POST requests to it (performing autosaves during editor use), and the editor's GET requests for it without a field specification arg is a waste of resources, loading fields that aren't used in a regular editor session.
So this could maybe be addressed by doing a fields arg on all GET requests for the route, something like this for the preload:
sprintf( '%s/autosaves?_fields=id,date,date_gmt,parent&context=edit', $rest_path ),
In Gutenberg, I think this would be a case of adjusting what fields are requested by getAutosaves
and getAutosave
(plural and singular) in packages/core-data/src/resolvers.js
.
This ticket was mentioned in PR #7555 on WordPress/wordpress-develop by @adamsilverstein.
3 months ago
#11
Trac ticket: https://core.trac.wordpress.org/ticket/62057
#12
@
3 months ago
So this could maybe be addressed by doing a fields arg on all GET requests for the route, something like this for the preload:
I had hoped this smaller change would be enough to solve the issue, but I still saw the autosave fire after load.
I reviewed the [PR](https://github.com/WordPress/gutenberg/pull/7945) where this was added originally. It says:
Attempts to resolve an issue whereby the editor is unaware of the existence of
a more recent autosave, resulting in attempts to perform an autosave fail.
I was no longer able to reproduce this issue, so I think this is safe to remove.
I pushed this change to a PR to run tests, then this should be ready for commit.
#13
@
3 months ago
- Keywords dev-reviewed commit added; dev-feedback removed
It still raises the issue though that Gutenberg's JS is loading the full payload of all autosaves per-post for every post the editor loads, which can be suboptimal at scale...
This still needs addressing separately in Gutenberg directly - the current patch only delays the loading of all autosave data.
#14
@
2 months ago
- Milestone changed from Future Release to 6.8
- Owner set to adamsilverstein
- Status changed from new to assigned
#15
@
8 weeks ago
- Milestone changed from 6.8 to Future Release
- Owner adamsilverstein deleted
Reviewing this again, I'm not confident in making this change without additional evidence for its benefits. Rather than avoid the preload, avoiding loading the full content is probably a better route.
Reviewing just the proposed change here (removing the preload):
- The data is still loaded in full via the rest endpoint, the load is just shifted to a later point. Memory savings would be minimal overall - if that isn't the case, maye a test could show that?
- The data still needs to be loaded over the network, so I don't think there are savings there.
- It is not clear if the original reason for adding the preload is still relevant - eg. there is a risk of a regression here.
Here's some context around risk of Autosave Parallelism...
When loading a post, the WP editor preloads data from a bunch of rest routes in the
preload_paths
array inwp-admin/edit-form-blocks.php
, including one for the post's autosaves, a route that goes like this:wp-json/wp/v2/posts/123/autosaves
Since the full autosaves route payload includes the_content of each autosave, and retains 1 autosave per user that's ever edited the post, this can become large quite quickly!
In a situation where a post has an extremely large number of autosaves or very large autosave content, this can create a PHP OOM when the (potentially massive, unchecked) payload from the autosaves route is loaded by PHP, making the post OOM and seem impossible to load/edit in the editor.
This has been observed in the wild, with OOMs, in sites with a variety of users editing longstanding posts with long content.
It can be verified by measuring PHP memory usage in the editor - posts with a large autosave preload payload will consume more PHP memory. With some multiple megabyte pastes of lorem ipsum in the editor, and 1min to allow the autosave/s to take place, the difference is noticeable.
The PHP OOM risk can be mitigated by simply not preloading them, by removing the autosaves route from
preload_paths
- which instead makes JS load it directly from the rest route. Previously-affected large posts with large/many autosaves don't PHP OOM once the preload is prevented.Specifically, this PHP OOM risk can be avoided by removing this line from
wp-admin/edit-form-blocks.php
:sprintf( '%s/autosaves?context=edit', $rest_path ),
This method (or doing it in a filter to
block_editor_rest_api_preload_paths
) is reliable for sites that skirt the upper limits of autosave parallelism.Steps to verify:
It still raises the issue though that Gutenberg's JS is loading the full payload of all autosaves per-post for every post the editor loads, which can be suboptimal at scale...
In theory, the editor technically only needs the dates of existing autosaves to see if a newer one exists when opening a post - which could be done using a fields arg for the initial request - at which point it could then (if needed) load an individual autosave from the rest route by ID to diff against the editor content. Avoiding loading the_content of multiple autosaves at once could therefore keep autosave-based memory consumption/waste down in the editor in general.