WordPress.org

Make WordPress Core

Opened 4 months ago

Closed 5 weeks ago

#46133 closed defect (bug) (fixed)

Parse blocks to generate the excerpt

Reported by: gziolo Owned by: pento
Milestone: 5.2 Priority: normal
Severity: normal Version: 5.0
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Moved from GitHub: https://github.com/WordPress/gutenberg/pull/11704

Originally authored by @pento:

Now that https://github.com/WordPress/gutenberg/pull/11141 has landed, we can use the same logic when generating an excerpt, which allows us to deprecate get_dynamic_blocks_regex().

By using a list of allowed blocks, we can avoid the other issue that strip_dynamic_blocks() is intended to address, when a dynamic block can call get_the_excerpt(), causing an infinite loop.

Out of interest, https://github.com/WordPress/gutenberg/commit/198c82f0323a433eaea33f63a3fb94866be5ba32 is an alternate method, which just tweaks strip_dynamic_blocks() to use the parser, but I think this is the better method.

It should be recreated in core based on the referenced Pull Request.

Attachments (3)

46133.diff (2.8 KB) - added by desrosj 7 weeks ago.
46133.2.diff (3.2 KB) - added by pento 7 weeks ago.
46133.3.diff (5.6 KB) - added by azaozz 5 weeks ago.

Download all attachments as: .zip

Change History (15)

#1 @gziolo
4 months ago

  • Owner set to pento
  • Status changed from new to assigned

This comment from GitHub seems to be important:

Core uses the same code as in this PR, so both Gutenberg and core are currently wrong, in their own special way. 🙂

I think we can just implement something like https://github.com/WordPress/gutenberg/pull/10108, and modify this change to use that new filter. This can be done in core, we don't need to bother in Gutenberg.

#2 @pento
4 months ago

  • Keywords needs-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.2
  • Version set to 5.0

@desrosj
7 weeks ago

#3 @desrosj
7 weeks ago

  • Component changed from General to Formatting
  • Keywords has-patch has-unit-tests dev-feedback added; needs-patch needs-unit-tests removed

@gziolo @pento It's unclear to me if this requires further action. It seems that this issue was fixed in [43884], though no unit tests were added for this.

In my testing, the unit tests from the unmerged GB-11704 PR pass. Removing the [https://core.trac.wordpress.org/browser/tags/5.1/src/wp-includes/formatting.php#L3689 line that strips blocks for excerpts in core causes the tests fail.

If my assessment is true, let's add these tests and this can be closed.

@pento
7 weeks ago

#4 @pento
7 weeks ago

Thanks for the follow up, @desrosj.

The existing functionality is incomplete, it doesn't deal with dynamic blocks nested within whitelisted blocks.

46133.2.diff modifies the test to demonstrate the problem.

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


6 weeks ago

#6 @JeffPaul
6 weeks ago

  • Keywords dev-feedback removed

#7 @azaozz
5 weeks ago

What's the way forward here, can we skip blocks that have nested blocks? Or perhaps have a whitelist for what blocks can be nested?

Ideally we are looking for some text that is relevant to the post content. If no text is found, we can perhaps use an image. But some posts may not have either (a gallery post?). In these edge cases it would be acceptable to have an empty excerpt imho.

Another possible way might be to look at the block content (HTML) without rendering the blocks and "extract" the text?

Last edited 5 weeks ago by azaozz (previous) (diff)

@azaozz
5 weeks ago

#8 @azaozz
5 weeks ago

In 46133.3.diff:

  • Add a (non-filterable) $allowed_inner_blocks list. Same as the default $allowed_blocks except core/columns.
  • When rendering the allowed blocks, exclude blocks that have non-whitelisted or nested inner blocks.
  • Add a helper function to render whitelisted blocks from inside the columns block.
  • Fix the tests to match the output from get_the_excerpt() (has some "unusual" left over white space).

#9 @gziolo
5 weeks ago

Add a (non-filterable) $allowed_inner_blocks list. Same as the default $allowed_blocks except core/columns.

There is going to be core/group block added to Gutenberg after 5.2 release. We might want to include it in the whitelist as well together with core/columns.

#10 @gziolo
5 weeks ago

It looks like this diff covers the case where you have Columns block nested inside another Columns block. It would be good to add a test to ensure it doesn't regress in the future and to confirm my assumption :)

#11 @azaozz
5 weeks ago

@gziolo thanks for the review! :)

There is going to be core/group block added to Gutenberg after 5.2 release

Yeah, thinking we will need to keep enhancing the way we auto-generate excerpts for posts and include all blocks that may have relevant text.

The challenge is that we only want "static text" and need to exclude dynamic blocks as they may cause infinite loop if trying to generate an excerpt, and in most cases they are not particularly relevant to the rest of the post.

Ideally we should parse all "text containing blocks" regardless if they are wrapped in "container" blocks. The current patch only looks one level deep in all whitelisted blocks, and has an exception for core/columns where it looks inside each column. This can be enhanced and perhaps can be done better. Planning to look at it again for 5.3.

#12 @azaozz
5 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 45265:

Fix parsing of inner blocks when auto-generating an excerpt. Helps to prevent cases where dynamic inner blocks may cause an infinite loop if trying to auto-generate an excerpt.

Props desrosj, pento, gziolo, azaozz.
Fixes #46133.

Note: See TracTickets for help on using tickets.