Make WordPress Core

Opened 3 weeks ago

Last modified 11 days ago

#62057 new defect (bug)

Addressing Autosave Memory Issues

Reported by: whyisjake's profile whyisjake Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Editor Keywords: has-patch dev-feedback
Focuses: Cc:

Description

This is based on research put together by @tallulahhh. I am assembling this ticket as part of the WordCamp US Contributor day.

To put it in brief: Autosave handling loads too much unused stuff into PHP memory on editor load. This introduces risk of autosave-induced OOM at scale, and wastes memory and resources across the board. Thankfully, I’ve also found that it’s very simple to effectively address, and I don’t think it’s hyperbole to claim this has scope to save a significant amount of waste WP memory, transport and query complexity – platform-wide, and even globally.

Attachments (2)

62057.diff (907 bytes) - added by whyisjake 3 weeks ago.
62057.2.diff (619 bytes) - added by whyisjake 3 weeks ago.

Download all attachments as: .zip

Change History (12)

#1 @tallulahhh
3 weeks ago

Here's some context around risk of Autosave Parallelism...

When loading a post, the WP editor preloads data from a bunch of rest routes in the preload_paths array in wp-admin/edit-form-blocks.php, including one for the post's autosaves, a route that goes like this:
wp-json/wp/v2/posts/123/autosaves

Since the full autosaves route payload includes the_content of each autosave, and retains 1 autosave per user that's ever edited the post, this can become large quite quickly!

In a situation where a post has an extremely large number of autosaves or very large autosave content, this can create a PHP OOM when the (potentially massive, unchecked) payload from the autosaves route is loaded by PHP, making the post OOM and seem impossible to load/edit in the editor.

This has been observed in the wild, with OOMs, in sites with a variety of users editing longstanding posts with long content.

It can be verified by measuring PHP memory usage in the editor - posts with a large autosave preload payload will consume more PHP memory. With some multiple megabyte pastes of lorem ipsum in the editor, and 1min to allow the autosave/s to take place, the difference is noticeable.

The PHP OOM risk can be mitigated by simply not preloading them, by removing the autosaves route from preload_paths - which instead makes JS load it directly from the rest route. Previously-affected large posts with large/many autosaves don't PHP OOM once the preload is prevented.

Specifically, this PHP OOM risk can be avoided by removing this line from wp-admin/edit-form-blocks.php:
sprintf( '%s/autosaves?context=edit', $rest_path ),

This method (or doing it in a filter to block_editor_rest_api_preload_paths) is reliable for sites that skirt the upper limits of autosave parallelism.

Steps to verify:

  • Make a post with at least one autosave
  • load it in the WP editor with browser devtools looking for requests with 'autosave'
  • with the autosaves preload, PHP loads it and no rest route request is made
  • without the autosaves preload, JS loads it and the network request for it visibly fires on editor load
  • PHP memory profiling can further verify that loading the editor with preload consumes more PHP memory

It still raises the issue though that Gutenberg's JS is loading the full payload of all autosaves per-post for every post the editor loads, which can be suboptimal at scale...

In theory, the editor technically only needs the dates of existing autosaves to see if a newer one exists when opening a post - which could be done using a fields arg for the initial request - at which point it could then (if needed) load an individual autosave from the rest route by ID to diff against the editor content. Avoiding loading the_content of multiple autosaves at once could therefore keep autosave-based memory consumption/waste down in the editor in general.

Last edited 3 weeks ago by tallulahhh (previous) (diff)

@whyisjake
3 weeks ago

#2 @swissspidy
3 weeks ago

The src/wp-content/themes/twentytwentyone/.npmrc change in the patch is unrelated.

@whyisjake
3 weeks ago

#3 @whyisjake
3 weeks ago

Thanks @swissspidy.

#4 @whyisjake
3 weeks ago

Sample Filter

<?php

// Removes the autosaves route from editor
// It can OOM PHP when autosave parallelism (count per-post) is massive
// this sidesteps it and compels JS to fetch it from the /autosaves/ route directly

function remove_autosaves_from_preload_paths( $preload_paths, $block_editor_context ) {
 // Loop through preload paths and remove any containing '/autosaves'
 foreach ( $preload_paths as $key => $path ) {
     // Handle paths that are arrays (like the 'OPTIONS' ones)
     if ( is_array( $path ) ) {
         if ( strpos( $path[0], '/autosaves' ) !== false ) {
             unset( $preload_paths[$key] );
         }
     }
     // Handle string paths
     elseif ( strpos( $path, '/autosaves' ) !== false ) {
         unset( $preload_paths[$key] );
     }
 }

 return $preload_paths;
}
add_filter( 'block_editor_rest_api_preload_paths', 'remove_autosaves_from_preload_paths', 10, 2 );

How to verify:

  1. make a post, save it, make an edit, and wait on that page with it open (so an autosave happens)
  2. open the same post in another tab, and get the yellow “there is an autosave” warning
  3. reload with browser dev tools > network open and filter for “autosave”
  4. with this filter active, you’ll see that Gutenberg fetches the autosave data from the rest route
  5. you can toggle the filter too: with the filter inactive, no rest request for autosaves is made because it’s preloaded
Last edited 3 weeks ago by whyisjake (previous) (diff)

#5 @farhansabirdynasign
3 weeks ago

Confirming that without the autosaves preload, JS loads the autosaves with context edit!
https://core.trac.wordpress.org/attachment/ticket/62057/62057.2.diff

Last edited 3 weeks ago by farhansabirdynasign (previous) (diff)

#6 @TimothyBlynJacobs
3 weeks ago

This probably needs someone from the editor team to know why we are preloading the autosaves to begin with. Does it cause weird layout shift or some other kind of jank?

#7 @whyisjake
3 weeks ago

No, no jank or anything.

#8 @swissspidy
3 weeks ago

  • Component changed from Autosave to Editor
  • Focuses rest-api removed
  • Milestone changed from Awaiting Review to Future Release

This preloading was added in [46110] pretty much exactly 5 years ago.

With the introduction of https://github.com/WordPress/gutenberg/pull/7945, the block editor requests autosave data when the editor is loaded.

Not clear if this sill the case, as this might have changed in the meantime.

#9 @tallulahhh
3 weeks ago

In tests, the editor behaves just the same when not loading the autosaves route, it just uses less PHP memory doing so. The "there is a new autosave" warning the editor pops when applicable still does so as expected. When the autosaves route isn't preloaded, the "newer autosave" message pops up when Gutenberg does its equivalent rest_request for it.

I tinkered with this, and found that limiting the fields output by the /autosaves/ routelet seems to have no adverse affect on the "there is a newer autosave", so there's definitely no need to load the large fields like excerpt and content just to check for presence of newer autosaves.

Here's the field-limiting filter I was using for testing:

function filter_autosaves_response( $response, $post, $request ) {
    // Check if the request is for the autosaves endpoint
    if ( strpos( $request->get_route(), '/wp/v2/posts/' ) !== false && strpos( $request->get_route(), '/autosaves' ) !== false ) {
        // Limit the fields to id, dates, and parent
        $limited_data = array(
            'id' => $response->data['id'],
            'date' => $response->data['date'],
            'date_gmt' => $response->data['date_gmt'],
            'parent' => $response->data['parent'],
        );
        $response->set_data( $limited_data );
    }
    return $response;
}
add_filter( 'rest_prepare_revision', 'filter_autosaves_response', 10, 3 );

Also useful for this, since the /autosaves/ route is private: here's a handy "force all rest route to public" filter that I was using to make it easier to directly examine the posts/123/autosaves route's response on my test rig:

function make_all_routes_public( $endpoints ) {
    foreach ( $endpoints as $route => $handlers ) {
        foreach ( $handlers as $key => $handler ) {
            if ( isset( $endpoints[$route][$key]['permission_callback'] ) ) {
                $endpoints[$route][$key]['permission_callback'] = '__return_true';
            }
        }
    }
    return $endpoints;
}
add_filter( 'rest_endpoints', 'make_all_routes_public' );

Additionally, the revision diff tool used to compare autosaves does its work with direct SQL queries, and doesn't touch the /autosaves/ route, so none of this affects that.

The conclusion I get from all this is that the autosaves route (and preloading it) is largely used for making POST requests to it (performing autosaves during editor use), and the editor's GET requests for it without a field specification arg is a waste of resources, loading fields that aren't used in a regular editor session.

So this could maybe be addressed by doing a fields arg on all GET requests for the route, something like this for the preload:

sprintf( '%s/autosaves?_fields=id,date,date_gmt,parent&context=edit', $rest_path ),

In Gutenberg, I think this would be a case of adjusting what fields are requested by getAutosaves and getAutosave (plural and singular) in packages/core-data/src/resolvers.js.

Last edited 3 weeks ago by tallulahhh (previous) (diff)

#10 @whyisjake
11 days ago

  • Keywords has-patch dev-feedback added
Note: See TracTickets for help on using tickets.