Opened 14 months ago
Last modified 3 weeks ago
#61401 new feature request
Blocks: Efficiently find and traverse blocks in a document.
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Awaiting Review | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Editor | Keywords: | has-patch has-unit-tests |
| Focuses: | | Cc: | |
Description
The existing block parser reliably parses blocks, but it's also a heavy operation, involving a full parse of the entire document, the construction of a full block tree which includes multiple copies of different spans of the source content, and the JSON-parsing of every span of block attributes.
In many situations it's only necessary to find specific blocks within a document, or find where they start or end, or build a map of document structure without needing the full parse.
WordPress should provide a reliable and efficient way to traverse the Blocks in an HTML document in a streaming manner.
Change History (4)
This ticket was mentioned in PR #6760 on WordPress/wordpress-develop by @dmsnell.
14 months ago
#1
- Keywords has-patch added
This ticket was mentioned in PR #6760 on WordPress/wordpress-develop by @dmsnell.
3 weeks ago
#2
Trac ticket: Core-61401.
Theme: _Everything could be streaming, single-pass, reentrant, and low-overhead._
## Todo
- [ ] While this is developed here, any changes to the Block Parser need to coordinate with the package in the Gutenberg repository.
## Breaking changes
- When a block delimiter carries both a closing flag and a void flag, the existing parser returns a void block, whereas this patch returns the block closer. This is an edge case where the input is already erroneous, but preferring the closer makes more sense to me: the void flag is more likely to be the mistake, and treating a closer as a void could lead to deep chains of unclosed blocks. I'd like to re-examine this as part of block parsing as a whole, taking lessons from HTML's stack machine (for example, treating the delimiter as a closer if there's an open block of the given name), but not in this change.
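
To make the edge case concrete, here is a minimal sketch of the "closer wins" rule. The function `sketch_classify_delimiter()` and its simplified pattern are hypothetical and written only for illustration; they are not the code in this patch, and the real delimiter grammar is stricter about attributes.

```php
<?php
// Hypothetical helper for illustration only; not part of the patch.
function sketch_classify_delimiter( string $delimiter ): ?string {
	// Simplified stand-in for the block delimiter grammar.
	$pattern = '~^<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?P<attrs>\{(?:(?!-->).)*?\}\s+)?(?P<void>/)?-->$~s';

	if ( 1 !== preg_match( $pattern, $delimiter, $m ) ) {
		return null; // Not a block delimiter at all.
	}

	$has_closer = '/' === ( $m['closer'] ?? '' );
	$has_void   = '/' === ( $m['void'] ?? '' );

	// When both flags are present, prefer the closer over the void.
	if ( $has_closer ) {
		return 'closer';
	}

	return $has_void ? 'void' : 'opener';
}

var_dump( sketch_classify_delimiter( '<!-- /wp:group /-->' ) );
```

Under this rule a stray `/` before `-->` on a closing delimiter still closes the open block instead of leaving it open.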
## Related
- Core-45312 where whitespace-only blocks are surprising with `parse_blocks()`
## Summary
In this patch two new functions are introduced for the purpose of returning a PCRE pattern that can be used to quickly and efficiently find blocks within an HTML document without having to parse the entire document and without building a full block tree.
These new functions enable more efficient processing for work that only needs to examine document structure or know a few things about a document without knowing everything, including but not limited to:
- Finding the URL of the first image block in a document.
- Inserting hooked blocks.
- Analyzing block counts.
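
As a rough illustration of the last use case, the sketch below counts blocks per type by scanning delimiters only, without building a block tree or decoding any attributes. The function `sketch_count_block_types()` and its pattern are assumptions made for this example; the pattern is a simplified stand-in, not the one returned by the functions in this patch.

```php
<?php
// Hypothetical helper for illustration only; not part of the patch.
function sketch_count_block_types( string $html ): array {
	// Simplified stand-in for the block delimiter grammar.
	$pattern = '~<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?:\{(?:(?!-->).)*?\}\s+)?(?P<void>/)?-->~s';
	$counts  = array();
	$at      = 0;

	while ( 1 === preg_match( $pattern, $html, $m, PREG_OFFSET_CAPTURE, $at ) ) {
		// Resume scanning right after the delimiter that was just found.
		$at = $m[0][1] + strlen( $m[0][0] );

		// Closers end a block that was already counted at its opener.
		if ( '/' === ( $m['closer'][0] ?? '' ) ) {
			continue;
		}

		// Names without a namespace belong to the implicit "core" namespace.
		$name = $m['name'][0];
		if ( false === strpos( $name, '/' ) ) {
			$name = 'core/' . $name;
		}

		$counts[ $name ] = ( $counts[ $name ] ?? 0 ) + 1;
	}

	return $counts;
}
```

The attribute JSON is skipped over by the pattern and never decoded, so the only strings created are the block names themselves.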
Further, a new class is introduced to manage the process of finding block comment delimiters, one based on a hand-crafted parser designed for high performance: `WP_Parsed_Block_Delimiter_Info`.
This class provides a number of conveniences:
- It performs zero allocations beyond a static set of numeric indices.
- It holds onto the reference of the text it scanned, but can be detached to release that text. When detaching, it creates a substring of the text containing the full delimiter match.
- It can indicate if the delimiter is for a given block type without performing any allocations.
- It returns a lazy JSON parser by default for the attributes (not implemented yet) for more efficient interaction with the block attributes.
- Inasmuch as is possible, all costs are explicit and only paid when requested by the calling code.
## Example
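// Get the first image in a post with the PCRE pattern.
$at = 0;
while ( 1 === preg_match( get_named_block_delimiter_regex( 'image' ), $post_content, $matches, PREG_OFFSET_CAPTURE, $at ) ) {
	// Always advance past the matched delimiter so the loop cannot stall.
	$at = $matches[0][1] + strlen( $matches[0][0] );
	// Skip closing delimiters; only openers carry attributes.
	if ( '/' === ( $matches['closer'][0] ?? '' ) ) {
		continue;
	}
	// Decode the attributes as an associative array.
	$attrs = json_decode( $matches['attrs'][0] ?? '', true );
	if ( isset( $attrs['url'] ) ) {
		return $attrs['url'];
	}
}
return null;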
// Get the first image in a post with the utility class.
$image = null;
$at    = 0;
while ( ! isset( $image ) ) {
	$image = WP_Parsed_Block_Delimiter_Info::next_delimiter( $post_content, $at, $next_delimiter_at, $next_delimiter_length );
	if ( ! isset( $image ) ) {
		// Assumption: next_delimiter() returns null once no delimiters remain.
		break;
	}
	if (
		'opener' === $image->get_delimiter_type() &&
		$image->is_block_type( 'core/image' )
	) {
		break;
	}
	$image = null;
	$at    = $next_delimiter_at + $next_delimiter_length;
}
This ticket was mentioned in PR #9105 on WordPress/wordpress-develop by @dmsnell.
3 weeks ago
#3
- Keywords has-unit-tests added
Replaces #6760
Trac ticket: Core-61401
The Block Scanner follows the HTML API in providing a streaming, near-zero-overhead, lazy, re-entrant parser for traversing block structure. This class provides an alternate interface to `parse_blocks()` which is more amenable to a number of common server-side operations on posts, such as:
- Generating an excerpt from only the first N blocks in a post.
- Determining which block types are present in a post.
- Determining which posts contain a block of a given type.
- Generating block supports content for a post.
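
The following sketch shows the general shape of such a streaming traversal using a PHP generator. It is not the Block Scanner from this PR: `sketch_scan_delimiters()`, `sketch_first_top_level_blocks()`, and the simplified pattern are all hypothetical names made up for this example, and freeform HTML between blocks is ignored. Because the consumer stops as soon as it has the first N top-level blocks, the rest of the document is never scanned.

```php
<?php
// Hypothetical generator for illustration only; not the Block Scanner from the PR.
function sketch_scan_delimiters( string $html ): Generator {
	// Simplified stand-in for the block delimiter grammar.
	$pattern = '~<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+(?:\{(?:(?!-->).)*?\}\s+)?(?P<void>/)?-->~s';
	$at      = 0;

	while ( 1 === preg_match( $pattern, $html, $m, PREG_OFFSET_CAPTURE, $at ) ) {
		$at = $m[0][1] + strlen( $m[0][0] );

		if ( '/' === ( $m['closer'][0] ?? '' ) ) {
			$type = 'closer';
		} elseif ( '/' === ( $m['void'][0] ?? '' ) ) {
			$type = 'void';
		} else {
			$type = 'opener';
		}

		// Yield one delimiter at a time; nothing past it has been scanned yet.
		yield array(
			'type'   => $type,
			'name'   => $m['name'][0],
			'offset' => $m[0][1],
		);
	}
}

// Hypothetical consumer: names of the first $limit top-level blocks.
function sketch_first_top_level_blocks( string $html, int $limit ): array {
	$names = array();
	$depth = 0;

	if ( $limit <= 0 ) {
		return $names;
	}

	foreach ( sketch_scan_delimiters( $html ) as $delimiter ) {
		if ( 'closer' === $delimiter['type'] ) {
			$depth = max( 0, $depth - 1 );
			continue;
		}

		if ( 0 === $depth ) {
			$names[] = $delimiter['name'];
		}

		if ( 'opener' === $delimiter['type'] ) {
			++$depth;
		}

		// Stop early; the generator never scans the rest of the document.
		if ( count( $names ) >= $limit ) {
			break;
		}
	}

	return $names;
}
```

Tracking only a depth counter, rather than a stack of block names, is enough here because the sketch needs to know whether a block is top-level, not which block encloses it.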
In this patch two new functions are introduced for the purpose of returning a PCRE pattern that can be used to quickly and efficiently find blocks within an HTML document without having to parse the entire document and without building a full block tree.
These new functions enable more efficient processing for work that only needs to examine document structure or know a few things about a document without knowing everything, including but not limited to: