Make WordPress Core

Opened 20 months ago

Closed 8 weeks ago

Last modified 5 days ago

#61401 closed feature request (fixed)

Blocks: Efficiently find and traverse blocks in a document.

Reported by: dmsnell's profile dmsnell Owned by: dmsnell's profile dmsnell
Milestone: 6.9 Priority: normal
Severity: normal Version:
Component: Editor Keywords: has-patch has-unit-tests commit dev-reviewed
Focuses: Cc:

Description

The existing block parser reliably parses blocks, but it's also a heavy operation, involving a full parse of the entire document, the construction of a full block tree which includes multiple copies of different spans of the source content, and the JSON-parsing of every span of block attributes.

In many situations it's only necessary to find specific blocks within a document, or find where they start or end, or build a map of document structure without needing the full parse.

WordPress should provide a reliable and efficient way to traverse the Blocks in an HTML document in a streaming manner.

Change History (52)

This ticket was mentioned in PR #6760 on WordPress/wordpress-develop by @dmsnell.


20 months ago
#1

  • Keywords has-patch added

Trac ticket: Core-61401

In this patch two new functions are introduced for the purpose of returning a PCRE pattern that can be used to quickly and efficiently find blocks within an HTML document without having to parse the entire document and without building a full block tree.

These new functions enable more efficient processing for work that only needs to examine document structure or know a few things about a document without knowing everything, including but not limited to:

  • Finding the URL of the first image block in a document.
  • Inserting hooked blocks.
  • Analyzing block counts.

This ticket was mentioned in PR #6760 on WordPress/wordpress-develop by @dmsnell.


7 months ago
#2

Trac ticket: Core-61401.
Theme: _Everything could be streaming, single-pass, reentrant, and low-overhead._

## Todo

  • [ ] While this is developed here, any changes to the Block Parser need to coordinate with the package in the Gutenberg repository.

## Breaking changes

  • When encountering a block delimiter with a closing flag and also a void flag, the existing parser prefers returning as a void block, but this returns the block closer. This is an edge case when things are already erroneous, but it makes more sense to me when writing this that we should prefer closing to introducing a void, as the void flag is more likely to be a mistake, and because if we treat a closer as a void we could lead to deep chains of unclosed blocks. This is something I'd like to re-examine as a whole with the block parsing, taking lessons from HTML's stack machine, but not in this change (for example, treat it as a closer if there's an open block of the given name).

## Related

  • Core-45312 where whitespace-only blocks are surprising with parse_blocks()

## Summary

In this patch two new functions are introduced for the purpose of returning a PCRE pattern that can be used to quickly and efficiently find blocks within an HTML document without having to parse the entire document and without building a full block tree.

These new functions enable more efficient processing for work that only needs to examine document structure or know a few things about a document without knowing everything, including but not limited to:

  • Finding the URL of the first image block in a document.
  • Inserting hooked blocks.
  • Analyzing block counts.

Further, a new class is introduced to further manage the process of finding block comment delimiters, one based on a hand-crafted parser designed for high performance: WP_Parsed_Block_Delimiter_Info.

This class provides a number of conveniences:

  • It performs zero allocations beyond a static set of numeric indices.
  • It holds onto the reference of the text it scanned, but can be detached to release that text. When detaching, it creates a substring of the text containing the full delimiter match.
  • It can indicate if the delimiter is for a given block type without performing any allocations.
  • It returns a lazy JSON parser by default for the attributes (not implemented yet) for more efficient interaction with the block attributes.
  • Inasmuch as is possible, all costs are explicit and only paid when requested by the calling code.

https://github.com/WordPress/wordpress-develop/assets/5431237/aa7b122c-30ad-469e-83a1-c8b32a57bc1f

## Example

// Get the first image in a post with the PCRE pattern.
while ( 1 === preg_match( get_named_block_delimiter_regex( 'image' ), $post_content, $matches, null, $at ) ) {
        if ( '/' === $matches['closer'] ) {
                $at += strlen( $matches[0] );
                continue;
        }
        
        $attrs = json_parse( $matches['attrs'] );
        if ( isset( $attrs['url'] ) ) {
                return $attrs['url'];
        }
}

return null;
// Get the first image in a post with the utility class.
$image = null;
$at    = 0;
while ( ! isset( $image ) ) {
        $image = WP_Parsed_Block_Delimiter_Info::next_delimiter( $post_content, $at, $next_delimiter_at, $next_delimiter_length );
        if (
                'opener' === $image->get_delimiter_type() &&
                $image->is_block_type( 'core/image' )
        ) {
                break;
        }

        $image = null;
        $at    = $next_delimiter_at + $next_delimiter_length;
}

This ticket was mentioned in PR #9105 on WordPress/wordpress-develop by @dmsnell.


7 months ago
#3

  • Keywords has-unit-tests added

Replaces #6760

Trac ticket: Core-61401

The Block Scanner follows the HTML API in providing a streaming,
near-zero-overhead, lazy, re-entrant parser for traversing block
structure. This class provides an alternate interface to
parse_blocks() which is more amenable to a number of common
server-side operations on posts, such as:

  • Generating an excerpt from only the first N blocks in a post.
  • Determining which block types are present in a post.
  • Determining which posts contain a block of a given type.
  • Generating block supports content for a post.

#4 @dmsnell
7 months ago

With over a year in the meantime and significant learning along the way, I am introducing a new proposal for a WP_Block_Scanner class, which achieves the goal of this ticket.

#5 @Mamaduka
5 months ago

Just finished a quick read on unit tests, and I like what I'm seeing. Fantastic work, @dmsnell!

I really liked the examples for the initial PR. Maybe we should include similar examples for WP_Block_Scanner and showcase the real world applications for core/extenders.

This ticket was mentioned in PR #9841 on WordPress/wordpress-develop by @dmsnell.


4 months ago
#6

Trac ticket: Core-61401
See: #9105

Adds a higher-level sibling to the Block Scanner class, providing block semantics over block delimiter traversal.

This ticket was mentioned in PR #9842 on WordPress/wordpress-develop by @dmsnell.


4 months ago
#7

Trac ticket: Core-61401
See: #9105, #9841, (#9842), #9843

This ticket was mentioned in PR #9843 on WordPress/wordpress-develop by @dmsnell.


4 months ago
#8

Trac ticket: Core-61401
See: #9105, #9841, #9842, (#9843)

@dmsnell commented on PR #9841:


4 months ago
#9

Merged into a singular WP_Block_Scanner class in https://github.com/WordPress/wordpress-develop/pull/9105

@gziolo commented on PR #9105:


3 months ago
#10

Can you elaborate on your thinking process for splitting into WP_Block_Scanner and WP_Block_Processor? I'm not sure whether the latter contains anything unique at this point.

@dmsnell commented on PR #9105:


3 months ago
#11

thinking process for splitting into WP_Block_Scanner and WP_Block_Processor?

this is great feedback here because I don’t have a solid feel for splitting or combining them. the core of it is the question of intent: does calling code want to scan the delimiter's or does the calling code want to navigate through the blocks?

this is the same with the HTML API:

  • the Tag Processor only visits real textual tokens, and it visits tokens which may be ignored, ones which may be ignored contextually even, when parsed as HTML.
  • the HTML Processor visits tokens that aren’t in the text itself, implicit openers and closers defined by the parser. it also renames a tag and performs auto-closing.

now with blocks there is far less that the higher-level cousin needs to do:

  • auto-close blocks when parsing errors are present
  • separate innerHTML from freeform HTML content
  • create block structure as parse_blocks() would produce, which _could_ be built out of the class too

the performance cliff between these two is substantially smaller than with HTML tokenization and parsing. is it worth the separation or not? is the intent-cliff shallow enough? right now I think it’s at least working through the new questions that arise from a single class to see if it seems to fall into place once done, but I’m not ruling out separating them again.

@dmsnell commented on PR #9105:


3 months ago
#12

@gziolo something I was thinking about on my walk this morning was an idea I’ve ignored for a while, but maybe is worth it here: by default we skip freeform HTML content. the way we could signal to visit it is not with a boolean attribute, but using a special block type in next_delimiter().

// visit all explicit block delimiters
$scanner->next_delimiter();

// visit all block delimiters plus freeform blocks.
$scanner->next_delimiter( '*' );

// visit all freeform blocks: a possible special token meaning “no explicit block”
$scanner->next_delimiter( '#freeform' );

// visit all HTML spans, including `innerHTML`
$scanner->next_token();

this would collapse the difference between next_token() and next_delimiter() to whether they visit inner HTML. for any performance ideas about visiting freeform blocks, we could indicate that from within next_delimiter() and keep it internal.

I will try and explore this angle.

@gziolo commented on PR #9105:


3 months ago
#13

// visit all freeform blocks: a possible special token meaning “no explicit block”
$scanner->next_delimiter( '#freeform' );

We have this concept of freeform content handler in JavaScript which you can even modify with wp.blocks.setFreeformContentHandlerName and it defaults to core/freeform. I don't think we ever did anything like that on the server, so either #freeform or core/freeform resonates with me. Maybe folks could even be able to set this alias because I noticed that, for example, in the widgets editor, the freeform handler gets set to core/html. It probably is safer to go more meta with #freeform or ::freeform (like CSS pseudo-elements).

@dmsnell commented on PR #9105:


3 months ago
#14

We have this concept of freeform content handler in JavaScript which you can even modify with wp.blocks.setFreeformContentHandlerName and it defaults to core/freeform. I don't think we ever did anything like that on the server

That’s something I wish had been more hard-coded and less extensible, based on my work on the two-stage block parsing and loading inside the block editor, both in WordPress and in Tumblr. But I guess JS is different because we have to load _something_ into the editor, whereas on the server a _non-block_ can remain _not a block_.

Since yesterday I’ve been using * and deferring an approach which visits only the HTML spans. Something that makes me nervous about using #freeform is that we have a relatively short timeline if we want to get this in for 6.9 and properly choosing a name for those seems rushed. It would be an easy thing to add later, I would hope. Something I didn’t mention is why I proposed that syntax:

  • block names cannot start with or have the octothorp character anywhere
  • this mirrors the #text and other _token type_ descriptions in the HTML API

When I left off last night I was working on my extract_block() method, whose tests were failing. I think there is still some ambiguity in the code around the distinction between innerHTML and freeform blocks. What I plan on doing today is to try and do something I’ve been avoiding because of the increased internal state requirements: track HTML spans and pause when we match them. Currently this has all worked by recognizing HTML spans only after matching the next delimiter. With a special state value we can then _visit_ the HTML and move on without re-scanning. There are some complications doing this with incomplete inputs as well, so it’s probably good and time to make this change.

Hopefully that clears up all of the remaining issues.

@dmsnell commented on PR #9105:


3 months ago
#15

This has been updated significantly to clarify the interface and to merge the block scanner and block processor into one.

It would be really helpful to have review over the public methods, their docblocks, and the usage in the test suite.

I plan on going back to update the class/module docblock with more substantial general guidance for the class.

Also I plan on doing a separate type annotation pass, so please disregard any comments currently on type annotations.

@dmsnell commented on PR #9105:


3 months ago
#16

With significant haste in the eleventh hour I have finished making some major changes to this proposal:

  • There is now only a single WP_Block_Processor class (thanks @swissspidy for renaming the PR).
  • This singular class differentiates between freeform content and inner HTML, making it possible for calling code to inquire which is which.
    • A new block type wildcard * can be passed to the next_ functions to ensure that they visit top-level freeform content.
  • The docs have all been updated, names have been changed, and type annotations have been added.
  • The processor treats HTML spans like void blocks; previously it created a separate opener and closer.
  • Almost all references to static:: have been replaced by self::. After consideration, as a default first step I think it would be wise to treat most of a this class as final and discourage subclasses from reimplementing methods like next_token() the way subclasses do in the HTML API. Without _needing_ a higher-level subclass this makes it easier.
    • The exception are a few methods intended to be replaced by subclasses, including the missing lazy JSON parser for static::get_attributes().

Personally I’m quite happy with this revision and I thank you all for your input on this. I think it’s in a much more coherent and usable state now. At this point I believe it’s ready for final review and merge, though I think it would be best only to block merging now for design issues with the class, the methods, the things which must be maintained _ad infinitum_. For other issues, I would prefer if we can address them during the beta phase after merging.

Of course, I invite all feedback and will address things if I can in a timely manner, but right now I would love to see this out there in trunk to get it in use and testing.

#17 @dmsnell
3 months ago

  • Owner set to dmsnell
  • Resolution set to fixed
  • Status changed from new to closed

In 60939:

Blocks: Introduce WP_Block_Processor for efficiently parsing blocks.

The Block Processor follows the HTML API in providing a streaming, near-zero-overhead, lazy, re-entrant parser for traversing block structure. This class provides an alternate interface to parse_blocks() which is more amenable to a number of common server-side operations on posts, especially those needing to operate on only a part of a full post.

Developed in https://github.com/WordPress/wordpress-develop/pull/9105
Discussed in https://core.trac.wordpress.org/ticket/61401

Props dmsnell, gziolo, jonsurrell, soean, tjnowell, westonruter.
Fixes #61401.

This ticket was mentioned in PR #6760 on WordPress/wordpress-develop by @dmsnell.


3 months ago
#18

Trac ticket: Core-61401.
Theme: _Everything could be streaming, single-pass, reentrant, and low-overhead._

## Todo

  • [ ] While this is developed here, any changes to the Block Parser need to coordinate with the package in the Gutenberg repository.

## Breaking changes

  • When encountering a block delimiter with a closing flag and also a void flag, the existing parser prefers returning as a void block, but this returns the block closer. This is an edge case when things are already erroneous, but it makes more sense to me when writing this that we should prefer closing to introducing a void, as the void flag is more likely to be a mistake, and because if we treat a closer as a void we could lead to deep chains of unclosed blocks. This is something I'd like to re-examine as a whole with the block parsing, taking lessons from HTML's stack machine, but not in this change (for example, treat it as a closer if there's an open block of the given name).

## Related

  • Core-45312 where whitespace-only blocks are surprising with parse_blocks()

## Summary

In this patch two new functions are introduced for the purpose of returning a PCRE pattern that can be used to quickly and efficiently find blocks within an HTML document without having to parse the entire document and without building a full block tree.

These new functions enable more efficient processing for work that only needs to examine document structure or know a few things about a document without knowing everything, including but not limited to:

  • Finding the URL of the first image block in a document.
  • Inserting hooked blocks.
  • Analyzing block counts.

Further, a new class is introduced to further manage the process of finding block comment delimiters, one based on a hand-crafted parser designed for high performance: WP_Parsed_Block_Delimiter_Info.

This class provides a number of conveniences:

  • It performs zero allocations beyond a static set of numeric indices.
  • It holds onto the reference of the text it scanned, but can be detached to release that text. When detaching, it creates a substring of the text containing the full delimiter match.
  • It can indicate if the delimiter is for a given block type without performing any allocations.
  • It returns a lazy JSON parser by default for the attributes (not implemented yet) for more efficient interaction with the block attributes.
  • Inasmuch as is possible, all costs are explicit and only paid when requested by the calling code.

https://github.com/WordPress/wordpress-develop/assets/5431237/aa7b122c-30ad-469e-83a1-c8b32a57bc1f

## Example

// Get the first image in a post with the PCRE pattern.
while ( 1 === preg_match( get_named_block_delimiter_regex( 'image' ), $post_content, $matches, null, $at ) ) {
        if ( '/' === $matches['closer'] ) {
                $at += strlen( $matches[0] );
                continue;
        }
        
        $attrs = json_parse( $matches['attrs'] );
        if ( isset( $attrs['url'] ) ) {
                return $attrs['url'];
        }
}

return null;
// Get the first image in a post with the utility class.
$image = null;
$at    = 0;
while ( ! isset( $image ) ) {
        $image = WP_Parsed_Block_Delimiter_Info::next_delimiter( $post_content, $at, $next_delimiter_at, $next_delimiter_length );
        if (
                'opener' === $image->get_delimiter_type() &&
                $image->is_block_type( 'core/image' )
        ) {
                break;
        }

        $image = null;
        $at    = $next_delimiter_at + $next_delimiter_length;
}

This ticket was mentioned in PR #10291 on WordPress/wordpress-develop by @dmsnell.


3 months ago
#19

Trac ticket: Core-61401
See: ~#9105~, ~#9841~, (~#9842~), ~#9843~

This ticket was mentioned in PR #10292 on WordPress/wordpress-develop by @dmsnell.


3 months ago
#20

Trac ticket: Core-61401
See: ~#9105~, ~#9841~, (~#9842~), ~#9843~

#21 @dmsnell
3 months ago

  • Milestone changed from Awaiting Review to 6.9
  • Resolution fixed deleted
  • Status changed from closed to reopened

This ticket was mentioned in Slack in #core by jorbin. View the logs.


3 months ago

#23 @jorbin
3 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Since the main work for this was completed with [60939], i am remarking this as fixed. I think it could be worth a follow-up task ticket for the new PRs, but I'll defer to the 6.9 sqaud of if that is appropriate.

This ticket was mentioned in Slack in #core by dmsnell. View the logs.


2 months ago

#25 @dmsnell
2 months ago

@dlh raised a question about the naming of extract_block(), how it surprised him that it advanced. after some brief discussion in Slack we considered it might be rather urgent to address this before WordPress 6.9 ships, at which point it will be much more difficult to rename the method later.

One suggestion is extract_full_block_and_advance() as a way of communicating that the Block Processor steps forward entirely past the block being extracted.

If anyone has thoughts on this (including whether a change should be made right now or what name it should be changed to) please share ASAP. Thank you everyone!

This ticket was mentioned in Slack in #core by dmsnell. View the logs.


8 weeks ago

This ticket was mentioned in PR #10538 on WordPress/wordpress-develop by @dmsnell.


8 weeks ago
#27

Trac ticket: Core-61401

In testing during the release candidacy for WordPress 6.9 it was found that the extract_block() method may do more work than is expected based off of its name.

This change renames the method to extract_full_block_and_advance() to communicate that it does move the Block Processor forward and to hint at the fact that it also encompasses all inner blocks during that advance.

#28 @dmsnell
8 weeks ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This ticket was mentioned in Slack in #core by wildworks. View the logs.


8 weeks ago

#30 @wildworks
8 weeks ago

This ticket was featured in today's 6.9 Bug Scrub.

@dlh raised a question about the naming of extract_block(), how it surprised him that it advanced. after some brief discussion in Slack we considered it might be rather urgent to address this before WordPress 6.9 ships, at which point it will be much more difficult to rename the method later.

I think we need to know what was actually discussed in this regard, so I'd like to mention two Slack links:

@jonsurrell commented on PR #10538:


8 weeks ago
#31

I've approved this PR. I should note that this assumes it will be backported to 6.9.

If, for some reason, it's not included in 6.9 then the previous method would need to be restored as a deprecated method.

This ticket was mentioned in Slack in #core by welcher. View the logs.


8 weeks ago

#33 @welcher
8 weeks ago

  • Keywords commit added

This was discussed in the 6.9 bug scrub. It's ready for commit and there will need to be a backport.

#34 @dmsnell
8 weeks ago

In 61294:

Block Processor: Rename extract_block() method for clearer documentation.

In testing during the release candidacy for WordPress 6.9 it was found
that the extract_block() method may do more work than is expected
based off of its name.

This change renames the method to extract_full_block_and_advance() to
communicate that it does move the Block Processor forward and to hint at
the fact that it also encompasses all inner blocks during that advance.

Developed in https://github.com/WordPress/wordpress-develop/pull/10538
Discussed in https://core.trac.wordpress.org/ticket/61401

Follow-up to [60939].

Props dlh, dmsnell, jonsurrell, westonruter.

See #61401.

#35 @dmsnell
8 weeks ago

  • Keywords dev-feedback added; commit removed

Resetting for back porting [61294] into the 6.9 branch.

This ticket was mentioned in Slack in #core by dmsnell. View the logs.


8 weeks ago

#38 @desrosj
8 weeks ago

  • Keywords commit dev-reviewed added; dev-feedback removed

[61294] looks good to backport. Thanks everyone!

#39 @dmsnell
8 weeks ago

In 61296:

Block Processor: Rename extract_block() method for clearer documentation.

In testing during the release candidacy for WordPress 6.9 it was found
that the extract_block() method may do more work than is expected
based off of its name.

This change renames the method to extract_full_block_and_advance() to
communicate that it does move the Block Processor forward and to hint at
the fact that it also encompasses all inner blocks during that advance.

Developed in https://github.com/WordPress/wordpress-develop/pull/10538
Discussed in https://core.trac.wordpress.org/ticket/61401

Follow-up to [60939].

Reviewed by desrosj, jonsurrell.
Merges [61294] to the 6.9 branch.

Props dlh, dmsnell, jonsurrell, westonruter.

See #61401.

#40 @dmsnell
8 weeks ago

  • Resolution set to fixed
  • Status changed from reopened to closed

@dmsnell commented on PR #10291:


10 days ago
#41

from Slack, @carolinan writes,

The theme can’t rely on a class added in 6.9 without a version check, or there will be a fatal error.
The minimum WP version the theme needs to work in is 5.3.

@dmsnell commented on PR #10291:


10 days ago
#42

This is turning out to be something I’m not sure I can adequately test or carry over the finish line, but I think it’s ready to go otherwise. Please ping me if you would like to help with keeping it updated.

@dmsnell commented on PR #10292:


10 days ago
#43

@westonruter I’ve updated this against trunk and incorporated your feedback. I’m not sure how to adequately test it, but given that I think the logic is right I am inclined to merge this and get as much organic testing as can be done during the time before 7.0.

What do you think?

@sabernhardt commented on PR #10291:


10 days ago
#44

  1. If this moves forward, refactoring code in a theme would require creating a separate enhancement ticket in the Bundled Theme component (with "Twenty Twenty-One: " at the beginning of the summary).
  2. The theme versions do not have a third number; the next version of T21 will be 2.8.
  3. The minimum version of WordPress probably will _never_ change for these themes.
  4. Wouldn't it be simpler to reuse existing variable names within one function instead of creating two versions of the same function?
    function twenty_twenty_one_print_first_instance_of_block( $block_name, $content = null, $instances = 1 ) {
    	$instances_count = 0;
    	$blocks_content  = '';
    
    	if ( ! $content ) {
    		$content = get_the_content();
    	}
    
    	if ( class_exists( 'WP_Block_Processor' ) ) {
    		$processor = new WP_Block_Processor( $content );
    
    		// Scan for blocks whose block type matches the prefix.
    		$prefix      = rtrim( $block_name, '*' );
    		$match_fully = $prefix === $block_name;
    
    		// Loop over top-level blocks.
    		while (
    			$processor->next_block() &&
    			null !== ( $block = $processor->extract_full_block_and_advance() ) &&
    			$instances_count < $instances
    		) {
    			if (
    				/*
    				 * Prefix matches with a wildcard require printing the block name,
    				 * while full block-type matching can be delegated to the processor.
    				 * In each case, the condition only holds when the match is successful.
    				 */
    				$match_fully
    					? $processor->is_block_type( $block_name )
    					: str_starts_with( $processor->get_printable_block_type(), $prefix )
    			) {
    				$blocks_content .= render_block( $block );
    				++$instances_count;
    			}
    		}
    	} else { // original parsing for sites using WordPress < 6.9
    		// Parse blocks in the content.
    		$blocks = parse_blocks( $content );
    
    		// Loop blocks.
    		foreach ( $blocks as $block ) {
    
    			// Confidence check.
    			if ( ! isset( $block['blockName'] ) ) {
    				continue;
    			}
    
    			// Check if this the block matches the $block_name.
    			$is_matching_block = false;
    
    			// If the block ends with *, try to match the first portion.
    			if ( '*' === $block_name[-1] ) {
    				$is_matching_block = 0 === strpos( $block['blockName'], rtrim( $block_name, '*' ) );
    			} else {
    				$is_matching_block = $block_name === $block['blockName'];
    			}
    
    			if ( $is_matching_block ) {
    				// Increment count.
    				++$instances_count;
    
    				// Add the block HTML.
    				$blocks_content .= render_block( $block );
    
    				// Break the loop if the $instances count was reached.
    				if ( $instances_count >= $instances ) {
    					break;
    				}
    			}
    		}
    	}
    
    	if ( $blocks_content ) {
    		/** This filter is documented in wp-includes/post-template.php */
    		echo apply_filters( 'the_content', $blocks_content ); // phpcs:ignore WordPress.Security.EscapeOutput
    		return true;
    	}
    
    	return false;
    }
    

@dmsnell commented on PR #10291:


10 days ago
#45

thanks @sabernhardt!

creating a separate enhancement ticket in the Bundled Theme component (with "Twenty Twenty-One: " at the beginning of the summary).

This needs a new ticket anyway, as it’s currently linked to work from within 6.9. I’ll try and do this, but someone will want to review as I’ve not done it before.

The theme versions do not have a third number; the next version of T21 will be 2.8.

So many things I’m learning about themes 😄 — will update.

The minimum version of WordPress probably will never change for these themes.
Wouldn't it be simpler to reuse existing variable names within one function instead of creating two versions of the same function?

I’m not sure on which metric of simpler this would be anything but a subjective call. My instinct was to split the functions because once we know that the class exists, there’s never any reason to perform runtime checks.

What I think is potentially more ideal would be adding compatibility files which conditionally require the newer implementations, but I think generally folks don’t like that as much.

This idiom does not seem that common and has its drawbacks so I’m happy to undo it, but I think the simplicity of shared variable names is outweighed by the simplicity of having two focused versions of the function. Perhaps the best thing to do here is create two new functions, which I had considered, but decided against in order to avoid namespace pollution.

@dmsnell commented on PR #10291:


10 days ago
#46

@sabernhardt in 9d100de I made the following changes:

  • Unified the two functions, largely because of the mention that the old one will never go away via minimum-version-bump. These are less independently testable, but I think automated testing is already lost on this anyway 🤷‍♂️.
  • Rearranged some code from the legacy function to mirror the new code. This change could have been done independently but it makes more sense to me to do it for the symmetry with the new approach. I’m happy to remove this change if it’s unwanted (same goes for my little update to use array_shift(); that’s slightly more memory efficient because it frees up the parsed blocks as they are processed or rendered but not essential. it _can_ make a substantial impact for very large posts and for posts with very deep cascades of nested blocks though - see also #8999)

Further, I have created the Trac ticket in the bundled-themes component and linked it with this PR.

Thanks a million for your help!

@westonruter commented on PR #10292:


9 days ago
#47

Since you're touching this function, maybe it would be a good opportunity to improve typing for PHPStan level 8:

  • src/wp-includes/media.php

    diff --git a/src/wp-includes/media.php b/src/wp-includes/media.php
    index 436ee97d59..8e153f37e8 100644
    a b function get_media_embedded_in_content( $content, $types = null ) { 
    52515251 * @param int|WP_Post $post          Post ID or object.
    52525252 * @param bool        $html          Optional. Whether to return HTML or data in the array. Default true.
    52535253 * @param int         $max_galleries Optional. Only collect up to this many galleries. Default unlimited.
    5254  * @return array A list of arrays, each containing gallery data and srcs parsed
    5255  *               from the expanded shortcode.
     5254 * @return array<array<string, mixed>> A list of arrays, each containing gallery data and srcs parsed
     5255 *                                     from the expanded shortcode.
    52565256 */
    52575257function get_post_galleries( $post, $html = true, $max_galleries = PHP_INT_MAX ) {
    52585258        $post = get_post( $post );
    52595259
    5260         if ( ! $post ) {
     5260        if ( ! ( $post instanceof WP_Post ) ) {
    52615261                return array();
    52625262        }
    52635263
    function get_post_galleries( $post, $html = true, $max_galleries = PHP_INT_MAX ) 
    53175317                $top_chunks  = array();
    53185318                while ( $processor->next_token() && $processor->get_depth() > $gallery_depth ) {
    53195319                        if ( $processor->is_html() ) {
    5320                                 $chunk_span    = $processor->get_span();
     5320                                $chunk_span = $processor->get_span();
     5321                                if ( ! ( $chunk_span instanceof WP_HTML_Span ) ) {
     5322                                        continue;
     5323                                }
    53215324                                $chunk         = substr( $post->post_content, $chunk_span->start, $chunk_span->length );
    53225325                                $html_chunks[] = $chunk;
    53235326                                if ( $gallery_depth === $processor->get_depth() - 1 ) {
    function get_post_galleries( $post, $html = true, $max_galleries = PHP_INT_MAX ) 
    53355338                        // Get the rest of the innerHTML of the Gallery’s direct children.
    53365339                        while ( $processor->next_token() && $processor->get_depth() > $gallery_depth ) {
    53375340                                if ( $processor->is_html() && $processor->get_depth() === $gallery_depth + 2 ) {
    5338                                         $chunk_span    = $processor->get_span();
     5341                                        $chunk_span = $processor->get_span();
     5342                                        if ( ! ( $chunk_span instanceof WP_HTML_Span ) ) {
     5343                                                continue;
     5344                                        }
    53395345                                        $html_chunks[] = substr( $post->post_content, $chunk_span->start, $chunk_span->length );
    53405346                                }
    53415347                        }
    function get_post_galleries( $post, $html = true, $max_galleries = PHP_INT_MAX ) 
    54285434         *
    54295435         * @since 3.6.0
    54305436         *
    5431          * @param array  $galleries Associative array of all found post galleries.
    5432          * @param WP_Post $post      Post object.
     5437         * @param array<array<string, mixed>> $galleries Associative array of all found post galleries.
     5438         * @param WP_Post                     $post      Post object.
    54335439         */
    54345440        return apply_filters( 'get_post_galleries', $galleries, $post );
    54355441}

@dmsnell commented on PR #10292:


9 days ago
#48

array<array<string, mixed>>

@westonruter do we have typedefs available? or a good way to show that this isn’t just some generic array, but rather {ids?: string, src: string[]}?

@dmsnell commented on PR #10292:


9 days ago
#49

@westonruter the confusing test failures uncovered a bug in the WP_Block_Processor class! I’m fixing that in #10701 but incorporating it in this branch history so that we can verify test passage.

@westonruter commented on PR #10292:


9 days ago
#50

do we have typedefs available? or a good way to show that this isn’t just some generic array, but rather {ids?: string, src: string[]}?

Technically we won't until PHPStan is blessed, although there are cases of it being used:

ack  -Q 'array{' $(git ls-files | grep '.php' | grep -v wp-content | grep -v sodium | grep -v phpunit | grep -v SimplePie )
src/wp-includes/PHPMailer/PHPMailer.php
1341:     * @return array{name: string, email: string}

src/wp-includes/interactivity-api/class-wp-interactivity-api.php
124:     * @var array{attributes: array<string, string|bool>}|null
353:     * @return array{attributes: array<string, string|bool>}|null Current element.

src/wp-includes/interactivity-api/interactivity-api.php
136: * @return array{attributes: array<string, string|bool>}|null Current element.

src/wp-includes/l10n/class-wp-translation-controller.php
390:     * @return array{source: WP_Translation_File, entries: string[]}|false {

src/wp-includes/pomo/translations.php
300:             * @return array{0: int, 1: string}

I won't complain if you include them!!

@dmsnell commented on PR #10292:


9 days ago
#51

Now that the tests are passing, I’m going to next add some test cases verifying the $max_galleries parameter.

@westonruter I’ve rebased this, and will do so once again after merging the WP_Block_Processor fix; the history was getting large and I wanted to clear out the intermediate changes.

@jonsurrell commented on PR #10291:


5 days ago
#52

I'm not well versed in development practices around these Core themes.

I have reviewed the changes here and they seem sensible.
I did some manual testing on a personal site with this change on trunk and this change applied to the 6.8 branch. They appear to have the same behavior and no errors.

Note: See TracTickets for help on using tickets.