WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 4 months ago

#45471 assigned enhancement

Allow caching of parse_blocks results

Reported by: joostdevalk Owned by: francina
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Cache API Keywords: early needs-refresh needs-unit-tests
Focuses: performance Cc:

Description

A *lot* of Gutenberg implementations are going to have to parse the blocks in a post. Core itself already parses the blocks on output, but also when trimming an excerpt in excerpt_remove_blocks. All this parsing is done with parse_blocks. Unfortunately, the only thing passed to parse_blocks right now is a string, with no way of caching it.

My suggestion would be to add a cache key to the parse_blocks function, which defaults to false. I've attached a proposed patch to the function, if we agree on this we could then look at how to implement this in core itself.

Attachments (1)

cache-key.patch (1.1 KB) - added by joostdevalk 3 years ago.
Cache key patch

Download all attachments as: .zip

Change History (11)

@joostdevalk
3 years ago

Cache key patch

This ticket was mentioned in Slack in #core-editor by joostdevalk. View the logs.


3 years ago

This ticket was mentioned in Slack in #core-editor by desrosj. View the logs.


3 years ago

#3 @dmsnell
3 years ago

This is a good idea, definitely valid for re-parses.

The cache key however appears to be based purely on the content of the document though and not on the parser, which is filterable. I think that in almost all cases this will be fine as the parser won't likely change within a cache session, but perhaps we could rewrite the patch to account for that.

For instance, we could probably just add the name from the parser class given in the filter to the cache key. On this point, why pass in the cache key and default to false at all? I'm probably missing something but what key would we use other than one we could autogenerate based on content and parser?

function parse_blocks( $content ) {
        $parser_class = apply_filters( 'block_parser_class', 'WP_Block_Parser' );
        $cache_key    = sprintf( 'block_%s_%s', md5( $content ), $parser_class );

        $cached = wp_cache_get( $cache_key, 'parsed_blocks' );
        if ( $cached ) {
                return $cached;
        }

        $parser = new $parser_class();
        $parsed = $parser->parse( $content );

        wp_cache_add( $cache_key, $parsed, 'parsed_blocks' );

        return $parsed;
}

Are we trying to only turn on caching when we think or know we will need it?

Also, at this point is there a reason not to create a PR in the Github repo?

Thanks! This is great having more help on this area.

#4 @desrosj
3 years ago

In the current state of the patch, a cached empty string would not be returned and would always trigger parsing. if ( false !== $cached ) should be used if an empty string should be returned when cached.

This ticket was mentioned in Slack in #hosting-community by joostdevalk. View the logs.


3 years ago

#6 @francina
8 months ago

  • Keywords early needs-refresh added; has-patch removed
  • Milestone changed from Awaiting Review to 5.8
  • Owner set to francina
  • Status changed from new to assigned

#7 @francina
4 months ago

  • Milestone changed from 5.8 to Future Release

This is not going to make it on time for the feature freeze cutoff date, May 25, 2021. Moving to future release.

#8 @desrosj
4 months ago

Was just thinking this through a bit, and I have a few concerns.

First, I think that if caching is added for block parsing, it should be on by default, and WordPress should manage cache keys automatically.

Second, I think we have to work under the assumption that the parser changes the output. I lean towards @dmsnell's example above, but it doesn't tell us if the parsing will change the block. One potential option around this is to use the content before parsing to create the md5() hash, and the content after parsing for the stored cache value. As long as the pre-parsed content is unchanged, and the parsing class remains the same, the output content should be the same.

I'm also wondering if it's better to delegate the caching responsibility to the parsing classes (WP_Block_Parser by default). Different parsing classes may want to cache blocks more or less aggressively. Are there any good examples of custom block parsing classes in the wild? I think it may help to review those.

#9 follow-up: @dmsnell
4 months ago

Second, I think we have to work under the assumption that the parser changes the output. I lean towards @dmsnell's example above, but it doesn't tell us if the parsing will change the block. One potential option around this is to use the content before parsing to create the md5() hash, and the content after parsing for the stored cache value. As long as the pre-parsed content is unchanged, and the parsing class remains the same, the output content should be the same.

I'm confused on what you are saying @desrosj - I think what you list as a "potential option" is what I proposed in my comment earlier, though "doesn't tell us if the parsing with change the block" is still unclear. Just to be clear, we're not looking at rendered output in this function but only a transformation from the serialized HTML into the block objects that are fed into the renderer. $content in this case is "the pre-parsed content."

a cached empty string would not be returned and would always trigger parsing

Good catch! Type errors strike again. Practically speaking though this wouldn't really matter since it probably takes about as much time to parse an empty document as it does to pull it from cache. Caching empty documents also seems wrong; we could early-abort if the input document is blank.

I'm also wondering if it's better to delegate the caching responsibility to the parsing classes

This is a great idea and one of the reasons we added in the filterable parser. That being said, caching the parse generally seems like a fair option to stick in as the default. This would break behaviors if a custom parser class introduced non-determinism in its parse, possibly through something like random ids or time-based parsing, or *gasp* through database lookups. This isn't the place to perform those side-effects but someone will probably do it.

If we want to demonstrate a custom caching parser class we could wrap the default behavior and it might serve as an example of how to wrap the parser.

<?php
class WP_Caching_Block_Parser {
    public function construct( $parser_class = 'WP_Block_Parser' ) {
       $this->parser_class = $parser_class;
    }

    public function parse( $content ) {
        if ( ! $content ) {
            return [];
        }

        $cache_key    = sprintf( 'block_%s_%s', md5( $content ), $this->parser_class );

        $cached = wp_cache_get( $cache_key, 'parsed_blocks' );
        if ( $cached ) {
                return $cached;
        }

        $parser = new ($this->parser_class)();
        $parsed = $parser->parse( $content );
    }
}

function parse_blocks( $content ) {
    $parser_class = apply_filters( 'block_parser_class', 'WP_Caching_Block_Parser' );
    $parser = new $parser_class();
    return $parser->parse( $content );
}

Well I don't like what I just wrote in a permanent sense but the idea at least is close to what I want - default behavior is to cache all posts and use the default parser. If you have special needs then you can replace the whole thing, and if you want to keep the caching behavior around you can wrap your own class in this one.

Are there any good examples of custom block parsing classes in the wild? I think it may help to review those.

I haven't come across any.

#10 in reply to: ↑ 9 @desrosj
4 months ago

  • Keywords needs-unit-tests added; dev-feedback removed

Replying to dmsnell:

I think what you list as a "potential option" is what I proposed in my comment earlier, though "doesn't tell us if the parsing with change the block" is still unclear.

Ah, yes. I just reread your example and I think I misread it originally. We are suggesting the same thing!

Note: See TracTickets for help on using tickets.