Make WordPress Core

Opened 5 years ago

Last modified 14 months ago

#45471 assigned enhancement

Allow caching of parse_blocks results

Reported by: joostdevalk's profile joostdevalk Owned by: francina's profile francina
Milestone: Future Release Priority: normal
Severity: normal Version: 5.0
Component: Cache API Keywords: dev-feedback
Focuses: performance Cc:

Description

A *lot* of Gutenberg implementations are going to have to parse the blocks in a post. Core itself already parses the blocks on output, but also when trimming an excerpt in excerpt_remove_blocks. All this parsing is done with parse_blocks. Unfortunately, the only thing passed to parse_blocks right now is a string, with no way of caching it.

My suggestion would be to add a cache key to the parse_blocks function, which defaults to false. I've attached a proposed patch to the function, if we agree on this we could then look at how to implement this in core itself.

Attachments (1)

cache-key.patch (1.1 KB) - added by joostdevalk 5 years ago.
Cache key patch

Download all attachments as: .zip

Change History (17)

@joostdevalk
5 years ago

Cache key patch

This ticket was mentioned in Slack in #core-editor by joostdevalk. View the logs.


5 years ago

This ticket was mentioned in Slack in #core-editor by desrosj. View the logs.


5 years ago

#3 @dmsnell
5 years ago

This is a good idea, definitely valid for re-parses.

The cache key however appears to be based purely on the content of the document though and not on the parser, which is filterable. I think that in almost all cases this will be fine as the parser won't likely change within a cache session, but perhaps we could rewrite the patch to account for that.

For instance, we could probably just add the name from the parser class given in the filter to the cache key. On this point, why pass in the cache key and default to false at all? I'm probably missing something but what key would we use other than one we could autogenerate based on content and parser?

function parse_blocks( $content ) {
        $parser_class = apply_filters( 'block_parser_class', 'WP_Block_Parser' );
        $cache_key    = sprintf( 'block_%s_%s', md5( $content ), $parser_class );

        $cached = wp_cache_get( $cache_key, 'parsed_blocks' );
        if ( $cached ) {
                return $cached;
        }

        $parser = new $parser_class();
        $parsed = $parser->parse( $content );

        wp_cache_add( $cache_key, $parsed, 'parsed_blocks' );

        return $parsed;
}

Are we trying to only turn on caching when we think or know we will need it?

Also, at this point is there a reason not to create a PR in the Github repo?

Thanks! This is great having more help on this area.

#4 @desrosj
5 years ago

In the current state of the patch, a cached empty string would not be returned and would always trigger parsing. if ( false !== $cached ) should be used if an empty string should be returned when cached.

This ticket was mentioned in Slack in #hosting-community by joostdevalk. View the logs.


5 years ago

#6 @francina
3 years ago

  • Keywords early needs-refresh added; has-patch removed
  • Milestone changed from Awaiting Review to 5.8
  • Owner set to francina
  • Status changed from new to assigned

#7 @francina
3 years ago

  • Milestone changed from 5.8 to Future Release

This is not going to make it on time for the feature freeze cutoff date, May 25, 2021. Moving to future release.

#8 @desrosj
3 years ago

Was just thinking this through a bit, and I have a few concerns.

First, I think that if caching is added for block parsing, it should be on by default, and WordPress should manage cache keys automatically.

Second, I think we have to work under the assumption that the parser changes the output. I lean towards @dmsnell's example above, but it doesn't tell us if the parsing will change the block. One potential option around this is to use the content before parsing to create the md5() hash, and the content after parsing for the stored cache value. As long as the pre-parsed content is unchanged, and the parsing class remains the same, the output content should be the same.

I'm also wondering if it's better to delegate the caching responsibility to the parsing classes (WP_Block_Parser by default). Different parsing classes may want to cache blocks more or less aggressively. Are there any good examples of custom block parsing classes in the wild? I think it may help to review those.

#9 follow-up: @dmsnell
3 years ago

Second, I think we have to work under the assumption that the parser changes the output. I lean towards @dmsnell's example above, but it doesn't tell us if the parsing will change the block. One potential option around this is to use the content before parsing to create the md5() hash, and the content after parsing for the stored cache value. As long as the pre-parsed content is unchanged, and the parsing class remains the same, the output content should be the same.

I'm confused on what you are saying @desrosj - I think what you list as a "potential option" is what I proposed in my comment earlier, though "doesn't tell us if the parsing with change the block" is still unclear. Just to be clear, we're not looking at rendered output in this function but only a transformation from the serialized HTML into the block objects that are fed into the renderer. $content in this case is "the pre-parsed content."

a cached empty string would not be returned and would always trigger parsing

Good catch! Type errors strike again. Practically speaking though this wouldn't really matter since it probably takes about as much time to parse an empty document as it does to pull it from cache. Caching empty documents also seems wrong; we could early-abort if the input document is blank.

I'm also wondering if it's better to delegate the caching responsibility to the parsing classes

This is a great idea and one of the reasons we added in the filterable parser. That being said, caching the parse generally seems like a fair option to stick in as the default. This would break behaviors if a custom parser class introduced non-determinism in its parse, possibly through something like random ids or time-based parsing, or *gasp* through database lookups. This isn't the place to perform those side-effects but someone will probably do it.

If we want to demonstrate a custom caching parser class we could wrap the default behavior and it might serve as an example of how to wrap the parser.

<?php
class WP_Caching_Block_Parser {
    public function construct( $parser_class = 'WP_Block_Parser' ) {
       $this->parser_class = $parser_class;
    }

    public function parse( $content ) {
        if ( ! $content ) {
            return [];
        }

        $cache_key    = sprintf( 'block_%s_%s', md5( $content ), $this->parser_class );

        $cached = wp_cache_get( $cache_key, 'parsed_blocks' );
        if ( $cached ) {
                return $cached;
        }

        $parser = new ($this->parser_class)();
        $parsed = $parser->parse( $content );
    }
}

function parse_blocks( $content ) {
    $parser_class = apply_filters( 'block_parser_class', 'WP_Caching_Block_Parser' );
    $parser = new $parser_class();
    return $parser->parse( $content );
}

Well I don't like what I just wrote in a permanent sense but the idea at least is close to what I want - default behavior is to cache all posts and use the default parser. If you have special needs then you can replace the whole thing, and if you want to keep the caching behavior around you can wrap your own class in this one.

Are there any good examples of custom block parsing classes in the wild? I think it may help to review those.

I haven't come across any.

#10 in reply to: ↑ 9 @desrosj
3 years ago

  • Keywords needs-unit-tests added; dev-feedback removed

Replying to dmsnell:

I think what you list as a "potential option" is what I proposed in my comment earlier, though "doesn't tell us if the parsing with change the block" is still unclear.

Ah, yes. I just reread your example and I think I misread it originally. We are suggesting the same thing!

#11 @mukesh27
2 years ago

  • Keywords needs-patch dev-feedback added; early needs-refresh removed
  • Milestone changed from Future Release to 6.1
  • Version set to 5.0

Hi there!

As Gutenberg grows we should consider this ticket and need to implement a cache function. @dmsnell can you please create a Patch for changes?

As per the last comment, @desrosj do you have any feedback on this? putting on the 6.1 milestone. It can always be moved off if it requires additional work/testing.

#12 @dmsnell
2 years ago

@mukesh27 it's been a while since looking at this. I'm wondering if we know if we need it. Do we have evidence that parsing the blocks is causing performance issues? For most posts the parsing stage is still extremely fast. I worry that introducing it preemptively in Core could have negative side-effects and possibly slow down typical page views.

The option to build one's own caching parser via a WordPress plugin remains and is a good option for handling special cases where for some reason the parsing is measurably slow.

#13 @mehulkaklotar
21 months ago

  • Keywords needs-unit-tests needs-patch removed

I ran some tests around caching the parsed blocks, results were actually not what I thought would be. In most of my results, cached blocks took more time compare to just parsing them straight with parse functions and return them. I would suggest to not cache it until we can do it more fast with core. The Block Parser class can override by developer if they want to cache their own block with separate class.

https://prnt.sc/Lpid4y2Ucq_c

Version 0, edited 21 months ago by mehulkaklotar (next)

#14 @mxbclang
20 months ago

  • Milestone changed from 6.1 to Future Release

Moving to Future Release as the Performance Team is happy to work on this but we're awaiting feedback on next steps.

#15 @tillkruess
14 months ago

@mxbclang What's the next step here? Benchmarking the impact of caching these? @spacedmonkey any thoughts on this?

#16 @spacedmonkey
14 months ago

I am not sure how this work. With dynamic blocks and a number of filters with blocks, I can't see how it would be cached. How would cache invalidation work? What if you have a block that access data from the database, like latest posts or something and that changes.

If we are going to explore this, I would recommend the following path.

Filter block_parser_class and override WP_Block_Parser class.
Create your own class that extends WP_Block_Parser and add caching there.

This way you can explore options in a feature plugin before anything gets to core. A feature plugin would need until tests that validate cache invalidation.

Note: See TracTickets for help on using tickets.