Make WordPress Core

Opened 22 months ago

Last modified 5 weeks ago

#55996 assigned defect (bug)

the get_the_block_template_html call all the same functions as the the_conent filter so they are run twice

Reported by: pbearne's profile pbearne Owned by: flixos90's profile flixos90
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Formatting Keywords: 2nd-opinion
Focuses: performance Cc:

Description

In get_the_block_template_html we have this code

<?php
$content = $wp_embed->run_shortcode( $_wp_current_template_content );
$content = $wp_embed->autoembed( $content );
$content = do_blocks( $content );
$content = wptexturize( $content );
$content = convert_smilies( $content );
$content = shortcode_unautop( $content );
$content = wp_filter_content_tags( $content );
$content = do_shortcode( $content );
$content = str_replace( ']]>', ']]&gt;', $content );

These are direct calls to the same functions as used by the filter the_content

<?php
add_filter( 'the_content', 'do_blocks', 9 );
add_filter( 'the_content', 'wptexturize' );
add_filter( 'the_content', 'convert_smilies', 20 );
add_filter( 'the_content', 'wpautop' );
add_filter( 'the_content', 'shortcode_unautop' );
add_filter( 'the_content', 'prepend_attachment' );
add_filter( 'the_content', 'wp_filter_content_tags' );
add_filter( 'the_content', 'wp_replace_insecure_home_url' );

So they are been run twice for site content

Change History (54)

#2 @SergeyBiryukov
22 months ago

  • Focuses performance added

#3 @SergeyBiryukov
22 months ago

Hi there, thanks for the ticket!

As far as I can tell, get_the_block_template_html() is only used in wp-includes/template-canvas.php and is not run through the_content, so it looks like these functions are not run twice. Still, these direct calls seem inconsistent.

I think we should be able to just apply the_content filter here, like we do in other similar places in core:

$content = apply_filters( 'the_content', $content );

Has there been any discussion in Gutenberg on why that approach was not chosen here, and the functions are called directly instead? Is it because the_content may also include other callbacks that would not be expected here?

Last edited 22 months ago by SergeyBiryukov (previous) (diff)

#4 @manfcarlo
22 months ago

The functions are definitely run twice, as it was determined to be the cause of the bug with the chessboard plugin as reported in #54614. I'm sure the opener of this issue would not have opened it if they had not verified that the functions run twice.

Given the bugs it causes, using the_content filter is not an option.

Additionally, the query loop block allows the post content of one post to be embedded within the post content of another post.

The solution I propose is to move certain functions (wptexturize, do_shortcode and possibly others) out of the_content filter and into the block callback of certain blocks (paragraph block, classic block, heading block etc.) This will prevent them from ever being run twice on the same content.

#5 follow-up: @ocean90
21 months ago

I'm trying to debug why the lazy loading attribute isn't omitted for the first image (feature from #53675) in a block theme using template parts and als noticed the direct call to wp_filter_content_tags() in get_the_block_template_html() but also render_block_core_template_part().

It seems like false gets passed to wp_get_loading_attr_default() in these cases which is actually an undocumented value but happens when the function is not used as part of a filter thus current_filter() returns false.

Removing both wp_filter_content_tags() calls still adds the loading attribute so there's indeed something that still runs the the_content filter on the output.

cc: @flixos90 for awareness

#6 in reply to: ↑ 5 ; follow-up: @ocean90
21 months ago

Replying to ocean90:

Removing both wp_filter_content_tags() calls still adds the loading attribute so there's indeed something that still runs the the_content filter on the output.

That's actually because of the post content block which calls the filter in the render callback. At this point the loading attribute may not be added but due to the following wp_filter_content_tags() calls for rendering the template parts the attribute is added.

Related: #56295

#7 in reply to: ↑ 6 @needle
18 months ago

Replying to ocean90:

Removing both wp_filter_content_tags() calls still adds the loading attribute so there's indeed something that still runs the the_content filter on the output.

That's actually because of the post content block which calls the filter in the render callback.

Yes, this is exactly what I've just run into. It's easily verified that wptexturize() is called twice by Block Themes when rendering post/page content.

This becomes more than just a display problem when the content has text that should not be texturised (e.g. inline Javascript) even if it's added via the_content filter with a priority greater than 11, i.e. after all default core filters have run.

It seems to me that get_the_block_template_html() definitely shouldn't be calling the functions that are more usually hooked in to the_content (especially directly as it does) but that also implies that each block should be solely responsible for the parsing of its own content and I'm not sure that's the case.

An HTML block in post/page content, for example, shouldn't run its content through any filters at all, given that the content is "raw" and therefore the responsibility of someone knowledgeable to insert. Yet try adding one to a page in Twenty Twenty-Three and making its content:

<script>
function foo() {
  for( i=0; i < document.Main.elements.length; i++ ) {
    if (1 && 0) {
    }
  }
}
</script>

You'd expect this to output as-is in the page, but since it's run through wptexturize() by both render_block_core_post_content() and get_the_block_template_html(), the && ends up encoded as &#038;&#038; and the Javascript fails.

#8 follow-up: @manfcarlo
18 months ago

A patch for this is available at https://github.com/WordPress/gutenberg/pull/44995

@ocean90 If you have time to test, I would be interested to know if the patch fixes #56295.

@needle I tested your example and it appears that &#038;&#038; is produced even when wptexturize is only applied once. While I agree that it is unwanted, it looks to be outside the scope of this ticket. Perhaps you should open another ticket and/or search to see if it's already been reported in another ticket.

#9 in reply to: ↑ 8 ; follow-up: @needle
18 months ago

Replying to manfcarlo:

A patch for this is available at https://github.com/WordPress/gutenberg/pull/44995

@manfcarlo Unfortunately your patch doesn't solve the first issue I mentioned, so let me restate the problem...

Say you want to add something to the rendered post content that must not be run through the core default filters. In a plugin, this is traditionally done by hooking in to the_content with a priority greater than 20 (IOW after the final default filter convert_smilies has run).

However, because get_the_block_template_html() applies wptexturize() (plus the other functions mentioned in this ticket) directly to the complete rendered markup of the page template, this mean that whatever you've added via the_content still gets run through those functions.

Your patch removes those function callbacks from the_content filter in render_block_core_post_content(), but plugin and child theme authors would actually expect those filters to be applied to post content unless they had already explicitly removed them.

In my view, the issue here lies not in render_block_core_post_content() but in get_the_block_template_html() which, in principle, should do little (if anything) beyond running the template through do_blocks(). (I assume that render_block_core_template_part() should also delegate rendering to the blocks nested inside it and that it, too, should not call wptexturize() etc etc either.)

Any renderable content in a block theme template should presumably be wrapped in the appropriate block identifier code and its parsing should therefore be handled by the block itself. As it stands, get_the_block_template_html() breaks that contract.

The solution I propose is to move certain functions (wptexturize, do_shortcode and possibly others) out of the_content filter and into the block callback of certain blocks (paragraph block, classic block, heading block etc.) This will prevent them from ever being run twice on the same content.

This is (kind of) why I raised the example of the HTML block. My point was about encapsulation of rendering responsibilities within blocks themselves and the apparent conflict with get_the_block_template_html() applying those functions outside the scope of the blocks themselves.

Excluding the content of an HTML block from any rendering at all (as would be expected by the name of the block) seems problematic as long as get_the_block_template_html() applies the functions that it does.

The larger issue here is whether the_content filter is actually an appropriate filter to invoke in a block theme context at all.

I tested your example and it appears that &#038;&#038; is produced even when wptexturize is only applied once. While I agree that it is unwanted, it looks to be outside the scope of this ticket.

Thanks - yes, on review this does seem to be a bug in wptexturize() itself.

#10 in reply to: ↑ 9 ; follow-up: @manfcarlo
18 months ago

Replying to needle:

Replying to manfcarlo:

A patch for this is available at https://github.com/WordPress/gutenberg/pull/44995

@manfcarlo Unfortunately your patch doesn't solve the first issue I mentioned, so let me restate the problem...

Say you want to add something to the rendered post content that must not be run through the core default filters. In a plugin, this is traditionally done by hooking in to the_content with a priority greater than 20 (IOW after the final default filter convert_smilies has run).

However, because get_the_block_template_html() applies wptexturize() (plus the other functions mentioned in this ticket) directly to the complete rendered markup of the page template, this mean that whatever you've added via the_content still gets run through those functions.

The problem you describe is not mentioned in the ticket description at all. The ticket description only identifies the problem of the filters being run twice on the same content, which, to my knowledge, the patch solves.

In other words, with the patch, none of the filters identified in the ticket description should ever run more than once on the same content. Unwanted effects such as #54614 and #56295 are believed to be directly caused by the double filters, so the patch should solve them.

If the problem you describe still remains after eliminating the double filters, then it can be concluded that it's not caused by the double filters and is therefore outside the scope of this ticket.

Plenty of independent bugs will still remain after applying the patch, and they can be worked on later, but it should not delay #54614 and #56295 from being solved if there is a patch to solve them.

This is (kind of) why I raised the example of the HTML block. My point was about encapsulation of rendering responsibilities within blocks themselves and the apparent conflict with get_the_block_template_html() applying those functions outside the scope of the blocks themselves.

Excluding the content of an HTML block from any rendering at all (as would be expected by the name of the block) seems problematic as long as get_the_block_template_html() applies the functions that it does.

The argument here is rather fallacious and reminds me of people complaining that free speech is threatened because someone spoke out to disagree with them. If you want "encapsulation of rendering responsibilities within blocks themselves" then you must concede that any wrapper block that contains inner blocks can run something like:

<?php
return wptexturize( do_blocks( $inner_blocks ) );

It's not really fair to blame get_the_block_template_html for doing the same, simply because it's not a block. It could be argued that get_the_block_template_html actually does function much like a wrapper block, and there was an abandoned exploration to make the idea more explicit.

The only way to stop blocks like Custom HTML from being filtered by a core wrapper block is to delegate the filters to micro-level blocks such as Paragraph, which I would imagine is inefficient and not how the filters were originally intended to be used.

The larger issue here is whether the_content filter is actually an appropriate filter to invoke in a block theme context at all.

I'm not exactly sure of what you're hinting at here, but block themes have been available for 2–3 core release cycles now and I would not expect the core committers to agree to anything that breaks the pre-existing behaviour.

#11 in reply to: ↑ 10 @needle
18 months ago

Replying to manfcarlo:

The problem you describe is not mentioned in the ticket description at all. The ticket description only identifies the problem of the filters being run twice on the same content, which, to my knowledge, the patch solves.

The ticket doesn't actually mention a problem at all; it simply notes that there is duplication. I'm filling in the gaps, which are twofold:

  1. Some functions (e.g. wptexturize) that are usually applied via the_content filter are applied twice to the content.
  2. In non-block themes, it is possible to bypass these functions when they are applied via the_content filter. In block themes, as things stand, this is not possible.

Your patch removes the core callbacks from the_content filter, which addresses (1) but does not address (2).

In other words, with the patch, none of the filters identified in the ticket description should ever run more than once on the same content. Unwanted effects such as #54614 and #56295 are believed to be directly caused by the double filters, so the patch should solve them.

I agree with you on this but disagree with your choice of which calls to remove. In my opinion, it is the direct calls in get_the_block_template_html() that should be removed (because they break the existing functionality that some plugins rely on) rather than removing the expected callbacks in render_block_core_post_content(). The result would be the same in terms of solving (1), but would also offer a path to solving (2).

My knowledge of the knock-on effects of removing those function calls from get_the_block_template_html() is limited, but if they really do need to be called there, then I favour Weston Ruter's original suggestion that they should be called via a separate hook rather than directly. They could then be unhooked or overridden if need be, which is not the case at present.

The argument here is rather fallacious and reminds me of people complaining that free speech is threatened because someone spoke out to disagree with them.

Please don't be offensive. This is a technical discussion not a political one.

The only way to stop blocks like Custom HTML from being filtered by a core wrapper block is to delegate the filters to micro-level blocks such as Paragraph, which I would imagine is inefficient and not how the filters were originally intended to be used.

Yes, this is indeed a condundrum. The reason why I used the example of the HTML block is that it neatly illustrates this conundrum.

I would not expect the core committers to agree to anything that breaks the pre-existing behaviour.

I can't anticipate what core committers will do, but what I do know is that the direct function calls in get_the_block_template_html() are already breaking the pre-existing behaviour, so the question is how best to address that breakage.

#12 follow-up: @flixos90
18 months ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Awaiting Review to 6.2

I have opened a pull request to address this problem: https://github.com/WordPress/wordpress-develop/pull/3549

The TLDR of what it does is:

  • After parsing the blocks in the block template, replace post content within the block template with temporary placeholders.
  • Run the other block template processing logic, which now will no longer unnecessarily reprocess the post content again.
  • Inject the parsed post content back into the block template.

This change fixes the double execution of these functions on post content with minimal disruptions, and without attempting to significantly alter the approach of get_the_block_template_html() or to introduce new filters. Please have a look.

This PR already fixes problems like #56930 as well for example.

cc @pbearne @spacedmonkey @adamsilverstein

#13 @flixos90
18 months ago

  • Owner set to flixos90
  • Status changed from new to assigned

This ticket was mentioned in PR #3549 on WordPress/wordpress-develop by @flixos90.


18 months ago
#14

  • get_the_block_template_html() calls various (partially expensive) functions to parse the block template content.
  • A problem with that is that, within the block template, there usually are one or more instances of the post-content block, which itself parses post content using similar functions (which has historically been the established behavior).
  • This means that the post content blobs within the template content are now double-processed. Not only is this wasteful, more importantly it leads to various bugs (e.g. Trac ticket 56930) and is unintuitive for plugin developers (e.g. if you use the wp_content_img_tag filter to make a modification, you now have to consider that this filter may run twice on the same piece of content).
  • This PR addresses this problem by doing the following:
    • After fully parsing the blocks in the block template, the post content has already been fully parsed including all the typical functions.
    • To avoid double parsing, it is then extracted out of the block template temporarily and replaced with placeholders.
    • The similar functions can then process the rest of the block template, without touching the post content again.
    • Once that has happened, the post content is re-injected in the correct places, keeping the overall template content intact while fixing the double parsing.
  • Having benchmarked this with a basic post in Twenty Twenty-Two, there is no notable performance benefit to this (neither faster nor slower, since replacing the content is a similar amount of logic), but the critical part here is fixing the double execution and the bugs that are resulting from it.

Trac ticket: https://core.trac.wordpress.org/ticket/55996

@manfcarlo commented on PR #3549:


18 months ago
#15

I can see three major flaws with this implementation:

  1. If the `render_block` filter is used to change the string that was saved to the global variable, the string search will fail and the post content will not get replaced.
  2. If the same placeholder string was already present in the raw content, such as inside a Custom HTML block, it will get replaced unintentionally.
  3. If one of the shortcode callbacks called through do_shortcode corrupts the placeholder string, it will not get replaced and the post content will be missing.

There is an alternative implementation at https://github.com/WordPress/gutenberg/pull/44995 that avoids all of the above and is much cleaner.

#16 in reply to: ↑ 12 @needle
18 months ago

Replying to flixos90:

I have opened a pull request to address this problem: https://github.com/WordPress/wordpress-develop/pull/3549

I think this approach is preferable to removing callbacks to the_content since it solves both the "double execution problem" and the "avoid all execution problem" I detailed above. Thanks @flixos90 this looks promising.

I do, however, agree with @manfcarlo that the placeholder string approach may have weaknesses in certain circumstances, though I'm not sure what mitigations can be applied given the string format of the raw data. Tough call.

@flixos90 commented on PR #3549:


18 months ago
#17

@carlomanf Thanks for sharing these. Some replies:

  1. That's a great catch. We can address that though, by moving the logic of storing in the global variable to the latest possible point, where no external code can interfere anymore.
  2. I think this really is a _theoretical_ concern. We definitely should use a placeholder string that is "unique" enough so that nobody will _randomly_ use a similar placeholder. Of course one can still choose to use exactly the same placeholder, but then you're intentionally breaking this feature, and it goes without saying that if you have code access and _want_ to break WordPress, you already can in millions of ways. Definitely open to choosing another "more unique" placeholder string if that's preferable.
  3. Realistically, under which circumstances could that possibly occur? The template content would need to have a shortcode, which is the first thing I question. Shortcodes are mostly a thing of the past, and while they are still used a ton in post content, I have yet to see or hear about any usage of shortcodes in a block template. More importantly though, even if a shortcode is used in a template, it would need to be a shortcode that wraps the post-content block, which I think is even far more unlikely. So this one I as well consider a theoretical concern.

To be clear, I'm not saying the last 2 things are impossible to happen, but they are extreme edge cases to the degree that makes them irrelevant (<0.1% of sites).

@flixos90 commented on PR #3549:


18 months ago
#18

@carlomanf I have pushed additional commits which address your points 1. and 2. While I don't think 2. is a significant concern, it's still trivial to address so I think it's worth doing.

The fix to point 1. led me to another neat code cleanup here, as I've now replaced the global with a simple local variable that cannot be tampered with outside of the intended usage (when parsing the core/post-content block).

#19 @flixos90
18 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

@manfcarlo commented on PR #3549:


18 months ago
#20

Points 2 and 3 that I raised can also be combined, so one of the shortcode callbacks could add and it would then get replaced. I wouldn't say it's necessarily an intentional attempt to break this feature, it might just be a coincidence that they choose the same string.

Why not use a string that is generated uniquely at runtime, such as with `uniqid`? Then, in case any corruption errors occur, they would likely be solved by refreshing the page.

Also, this solution does not stop double filtering when one post content is embedded inside another post content, which is a far more common occurrence than the other scenarios and is handled easily by https://github.com/WordPress/gutenberg/pull/44995. How feasible do you think it would be to modify this solution to handle it?

@flixos90 commented on PR #3549:


18 months ago
#21

Providing a quick update here, I've found another simpler fix for the lazy-loading problem from https://core.trac.wordpress.org/ticket/56930 specifically and opened #3560. Therefore I've removed one unit test from here which is now part of that PR. This PR here remains valid for fixing https://core.trac.wordpress.org/ticket/55996.

@flixos90 commented on PR #3549:


18 months ago
#22

@carlomanf

Points 2 and 3 that I raised can also be combined, so one of the shortcode callbacks could add and it would then get replaced. I wouldn't say it's necessarily an intentional attempt to break this feature, it might just be a coincidence that they choose the same string.

My previous comment remains, if you want to break this feature, you can. Let's be realistic here, what are the chances that there is a shortcode in an FSE template that wraps a post-content block and then adds some placeholder in there (without replacing it with anything, which would probably be a bug by itself!) that is exactly the same placeholder as WP core uses? Obviously WordPress operates at a massive scale and edge-cases occur, but what we're talking about here is extremely far fetched to the degree that I can't see it happening without intentionally wanting to break this.

Why not use a string that is generated uniquely at runtime, such as with `uniqid`? Then, in case any corruption errors occur, they would likely be solved by refreshing the page.

We can potentially combine this with the current approach. Using uniqid() by itself would arguably worse than the current approach, since now it attempts to find a placeholder that isn't anywhere in the content. A string created by uniqid() could also be in the template, while the current code makes a collision with the parsed post content impossible.

Also, this solution does not stop double filtering when one post content is embedded inside another post content, which is a far more likely occurrence than the other scenarios and is handled easily by WordPress/gutenberg#44995. How feasible do you think it would be to modify this solution to handle it?

What is a realistic use-case to include a post-content block within another post-content block? To my knowledge, that is not even possible since this is a template-specific block.

@manfcarlo commented on PR #3549:


18 months ago
#23

Realistically, under which circumstances could that possibly occur? The template content would need to have a shortcode, which is the first thing I question. Shortcodes are mostly a thing of the past, and while they are still used a ton in post content, I have yet to see or hear about any usage of shortcodes in a block template. More importantly though, even if a shortcode is used in a template, it would need to be a shortcode that wraps the post-content block, which I think is even far more unlikely. So this one I as well consider a theoretical concern.

Imagine there is a shortcode that wraps some content, performs a word count on it, and prints out the word count appended to the raw content. Currently, if a post content block is added within the shortcode, the post content is included in the word count. After this patch is applied, the shortcode will begin to receive the placeholder and the post content will no longer be included in the word count.

I don't know of a particular shortcode that does the above or something like it, but I wouldn't underestimate the use of legacy features and I wouldn't at all be surprised if the bug gets reported at some point. For example, https://github.com/WordPress/gutenberg/issues/35676 is a separate but similar issue that shows how widely plugin/theme authors are using shortcodes together with blocks and discovering unexpected behaviour.

The issue with the hypothetical word count shortcode would actually be worse, because it would not be the unexpected behaviour of a new feature, but an unexpected change in behaviour after several release cycles given that shortcode support was added in 5.9.

The root cause of all of these bugs is that the block paradigm that was ostensibly intended to be a clean slate ended up receiving support for legacy features, some of which date back to the b2/cafelog days, and neither was ever designed to be compatible with the other. There is probably no way forward here that isn't going to break at least something, and quoting obviously fabricated statistics like "<0.1% of sites" is not going to be helpful, especially not after the bug reports inevitably roll in.

The main difference between https://github.com/WordPress/gutenberg/pull/44995 and this patch is as follows:

  • Both patches ensure that the filters never run more than once, and therefore solve any problems introduced by the double filtering, e.g. the lazy-load images.
  • This one does so by running the filters separately on different pieces of content and re-assembling them at the end.
  • The other patch does so by running the filters just once on the whole content, which might be better from a performance perspective.
  • Both patches are likely to break existing behaviour in different ways.
  • It will be practically impossible to quantify the scale of the breakage, especially before any patch is deployed.

@flixos90 commented on PR #3549:


18 months ago
#24

@carlomanf I agree with you in that there is a risk of potential breakage in this change. I do however still think that the risk for what you're pointing out to come up as a real-world bug is extremely low. It obviously does not make sense to argue about this because we can't know until it's out there. My take here specifically is that the chance of this causing a bug is so small that we can add it to core, basically "wait and see". If my assessment here shows to be wrong, and we get bug reports of this, I'll happily revert and then we have to look for alternatives. We also have beta period for that purpose (though even if this is in a stable release, we could revert it if it became a problem).

The implementation from https://github.com/WordPress/gutenberg/pull/44995 has other problems, specifically for the wp_filter_content_tags() function it introduces a backward compatibility break. The function is supposed to run specifically on the_content, as otherwise the lazy-loading logic will not apply correctly. It seems to me that the main difference between the two PRs is that this one keeps those functions running on the_content instead of the block template, while the other one runs them on the block template instead of the_content. The first is how WordPress has been behaving since forever basically, which is why I consider that a backward compatible approach.

#25 @mukesh27
16 months ago

  • Keywords needs-testing added

The PR need testing.

This ticket was mentioned in PR #3833 on WordPress/wordpress-develop by @spacedmonkey.


16 months ago
#26

Working on https://github.com/WordPress/wordpress-develop/pull/3549. This is a simpler solution. Just removing the filters. However, I am still seeing double processing in as wp_filter_content_tags is also called in render_block_core_template_part.

Trac ticket: https://core.trac.wordpress.org/ticket/55996

Did some performance testing.

### Before

https://i0.wp.com/user-images.githubusercontent.com/237508/211532443-df02cb06-fcc2-41c8-89e7-4ec824f1076c.png

### After
https://i0.wp.com/user-images.githubusercontent.com/237508/211532480-ebc7ede9-c44d-4fc6-bfd2-53ead1c07c1a.png

#27 @mukesh27
15 months ago

  • Keywords changes-requested added

Thanks @pbearne for the ticket and @flixos90 for PR!

This ticket was discussed in the recent performance bug scrub.

Some changes have been requested that must be reviewed and addressed.

Props to @joemcgill and @adamsilverstein

#28 @costdev
14 months ago

#57351 was marked as a duplicate.

#29 @costdev
14 months ago

#57351 was opened regarding shortcodes not being escaped when using the documented [[shortcode]] format.

I tracked this to the same issue this ticket reports: the_content filters were running twice, first to convert the [[shortcode]] to a literal [shortcode], then again to render the shortcode.

When I applied PR 3549 from this ticket, [[shortcode]] was correctly converted to a literal [shortcode]. While there may still be work left to do on the PR, it appears to resolve this particular issue - Nice!

@manfcarlo commented on PR #3549:


14 months ago
#30

Also, this solution does not stop double filtering when one post content is embedded inside another post content, which is a far more likely occurrence than the other scenarios and is handled easily by WordPress/gutenberg#44995. How feasible do you think it would be to modify this solution to handle it?

What is a realistic use-case to include a post-content block within another post-content block? To my knowledge, that is not even possible since this is a template-specific block.

It is possible, encouraged, and mentioned in this video at 5–6 minutes, 18 minutes and 32 minutes. It is also mentioned in this article and this article.

Handling post content within post content is essential for this issue to be solved, and currently, this patch does nothing to handle it.

#31 @manfcarlo
14 months ago

@costdev I see that you recently closed #57351 as a duplicate of this one, on the rationale that while it was not directly reporting the same effect, it was caused by the same issue.

On the same pattern, perhaps this ticket should be closed as a duplicate of: https://github.com/WordPress/gutenberg/issues/37754

It seems that more people have been finding that ticket recently and leaving thoughts over there rather than on this one. It was also opened earlier.

#32 follow-up: @flixos90
14 months ago

  • Milestone changed from 6.2 to 6.3

We are not really having any path forward here at this point. It seems that, between the two approaches of removing/re-adding filters (https://github.com/WordPress/gutenberg/pull/44995 and https://github.com/WordPress/wordpress-develop/pull/3833) and foundationally avoiding duplicate parsing of the content (https://github.com/WordPress/wordpress-develop/pull/3549), both approaches have problems:

  • The approach of removing filters and readding them does not actually solve the double execution problem.
  • The approach of avoiding double execution is a whole lot more complex to get right, and there are outstanding edge-cases that probably need to be addressed.

I think it's fair to say that, at least in the current stage they are in, neither of the approaches is a proper fix for the bug.

I personally think the approach of removing and re-adding filters is a hacky workaround, not a proper fix. What we should work towards is to actually avoid double content parsing, as in record content that was already parsed and ensure that it is not parsed again. This is obviously quite complex to get right, and so far my PR is not solving it reliably. Still, I think going with a workaround just because we can get it done quicker is not a good idea, we should continue working towards an actual fix for the underlying problem - which is the double parsing, not the execution of specific filter callbacks.

At this point, unfortunately we are too late in the cycle to get this done, so I'll move it to 6.3. Let's continue thinking through how we can properly address this bug.

#33 in reply to: ↑ 32 @manfcarlo
14 months ago

Replying to flixos90:

I personally think the approach of removing and re-adding filters is a hacky workaround, not a proper fix. What we should work towards is to actually avoid double content parsing, as in record content that was already parsed and ensure that it is not parsed again.

I agree with you on this. I think there were some promising ideas from your implementation, but the fatal flaw was doing the recording within the string itself instead of some external store.

What about patching the functions themselves (wptexturize, shortcode_unautop, wp_filter_content_tags, do_shortcode, convert_smilies) to skip the sections that have already been processed by post content blocks? This seems to fix all of the issues that have been raised so far, and in the event that it conflicts with a plugin for whatever reason, it would just fall back to the current behaviour instead of causing total disappearance of content.

#34 @nico23
13 months ago

Just want to note that this is not obvious at all that running the functions twice causes shortcode escaping not to work.

Just saying because I was redirected here from this GB issue to this so called duplicate that points here. This bug states its about performance and not mention the shortcode escaping issue at all, but that is actually a more serious and more important issue than just performance.

#35 @flixos90
13 months ago

Just a heads up here that I'm planning to work on a new approach for this problem later this month, based on the feedback seen on the existing PRs.

The idea is to implement a proper parsing class for this, which takes the idea from https://github.com/WordPress/wordpress-develop/pull/3549 but makes it more reliable including nesting support (e.g. post content block within another post content block) etc.

#36 @miyarakira
11 months ago

In WP 6.2.1, the call to do_shortcode inside get_the_block_template_html was removed to address a security vulnerability (Shortcode Execution in User Generated Content). This caused issues for numerous users, as described in the ticket #58333. In response, in WP 6.2.2, the call to do_shortcode was reinstated, but moved to *before* do_blocks. (Some users are still reporting issues because this changes behavior in subtle ways.)

In wp-includes/block-template.php:

<?php
$content = shortcode_unautop( $content );
$content = do_shortcode( $content );
$content = do_blocks( $content );
$content = wptexturize( $content );
$content = convert_smilies( $content );

In wp-includes/blocks/template-part.php:

<?php
$content = shortcode_unautop( $content );
$content = do_shortcode( $content );
..
$content = do_blocks( $content );
..
$content = wptexturize( $content );
$content = convert_smilies( $content );

Apparently putting do_shortcode before do_blocks maintains the security fix while supporting shortcodes in block templates. In the above mentioned ticket, I asked what this implies for the_content and widget_block_content filters, which apply do_blocks *then* do_shortcode.

In wp-includes/default-filters.php:

<?php
add_filter( 'the_content', 'do_blocks', 9 );
add_filter( 'the_content', 'wptexturize' );
add_filter( 'the_content', 'convert_smilies', 20 );
add_filter( 'the_content', 'shortcode_unautop' );
add_filter( 'the_content', 'do_shortcode', 11 );
..
add_filter( 'widget_block_content', 'do_blocks', 9 );
add_filter( 'widget_block_content', 'do_shortcode', 11 );

And I also mentioned the Post Content block, which applies the_content filter and can have nested Post Content blocks. However, the question received no response.

All this is related to the issue described in this ticket, where similar sets of content filters are being applied in multiple places, sometimes repeatedly on the same content. In WP 6.2.2, this is further complicated by an inconsistent order of filters. As a block author, I have certain blocks whose content should not be modified in any way, before or after rendering - but currently, it's impossible to opt-out of these filters.

These content filters are causing other related issues, because they can corrupt block content in unexpected ways.

What's clear from these is that functions like wptexturize and do_shortcode are unsuitable for processing an entire page of HTML including block content. They should only be applied selectively to certain parts of the page, where it makes sense to do so.

A proper solution to this is not simple, because it involves restructuring the above duplicated and inconsistent code for applying content filters, and designing a more sophisticated logic that allows for certain blocks to opt out of them, as well as preventing the filters from being applied multiple times to the same content.

Last edited 11 months ago by miyarakira (previous) (diff)

#37 @flixos90
10 months ago

  • Milestone changed from 6.3 to 6.4

While this is technically a bug, it is quite complex and unreasonable to fix this late in the cycle. Therefore punting this to 6.4, it should be a priority for that one, and I'd love to get to a solution in time for that release.

#38 @oglekler
8 months ago

  • Keywords needs-testing removed

I am removing needs-testing because it looks like there is nothing to test yet.

#39 @flixos90
7 months ago

Revisiting this ticket, I would like us to take a step back and rethink what we want to accomplish here.

While this ticket is explicitly focused on avoiding duplicate execution of certain functions on the same content, I am unsure whether that is what we really want, because:

  • There are valid reasons to re-process certain content even if it was already processed at another processing layer: For example, a plugin or even core itself could implement image loading optimization logic at the entire block template level (template context) to refine the attributes that core otherwise would add at the post content level (the_content context). Such a thing would not be possible if we were preventing already processed content to be re-processed.
  • While it seems undesirable from a performance perspective to process certain content multiple times, I don't think it would make a notable difference for performance since the processing logic would still need to be initialized just as before: For example, to parse blocks at the entire block template level, we still need to initialize all the block parsing logic even if the post content blocks were already parsed. And there wouldn't be any duplication either because already parsed blocks wouldn't be reparsed.

While based on that I lean towards changing course on this ticket, that is not to say we just leave it as is. There are certainly some problems that result from the double execution that we can solve. Here's a few that were brought up before that could be addressed separately:

  • Certain functions like wptexturize() or convert_smilies() may not be desirable for some sites to run, and right now there is no way to remove them from the block template level, even when they're being removed from the_content filter. What we could do here is replace the hard-coded function list in get_the_block_template_html() with a regular filter and callbacks just like in the other cases so that functions can be added/removed. See draft pull request
  • Certain extension points for plugins/themes, such as the wp_content_img_tag filter, are applied in different contexts and thus may process the same content twice which from an API perspective would be unexpected and can lead to problems such as an attribute being added twice. What we could do here is to use context specific filters so that developers wouldn't by accident make certain modifications twice. See draft pull request

This ticket was mentioned in PR #5188 on WordPress/wordpress-develop by @flixos90.


7 months ago
#40

See https://core.trac.wordpress.org/ticket/55996#comment:39 for context.

  • Introduce a template filter for the block template HTML.
    • The name is based on the context that is already being used for this purpose in wp_filter_content_tags(), to keep parity. It is also short and concise.
  • Move all hard-coded function calls to become filters, run in the same order as today.
    • The only exception is that convert_smilies() is now run after wp_filter_content_tags(), which functionally shouldn't matter since these two functions do not interfere with each other. The reason convert_smilies() now runs later is to get parity with how the same callback is added to the_content, at priority 20.
  • Move the logic to conditionally wrap the parsing of block template blocks in a singular main query loop (see https://core.trac.wordpress.org/ticket/58154) into do_blocks, only running it for the template context.
    • This keeps things as they are today, however directly implemented into the main function.
    • Having context available for parsing blocks may also come in handy in the future.

Trac ticket: https://core.trac.wordpress.org/ticket/55996

This ticket was mentioned in PR #5189 on WordPress/wordpress-develop by @flixos90.


7 months ago
#41

See https://core.trac.wordpress.org/ticket/55996#comment:39 for context.

  • Introduce filter wp_img_tag_{$context} as a replacement for the generic wp_content_img_tag filter. Can be used e.g. like wp_img_tag_the_content for post content images or wp_img_tag_template for block template images.
  • Deprecate wp_content_img_tag, recommending the context specific filters instead.
    • The current/old filter should no longer be used as it can lead to problems if the developer using it is unaware of the filter being applied in different contexts. Having specific filter names per context avoids that intuitively, so that if a developer wants to modify in-post-content images, they can do so safely without having to worry about that same content being parsed again in another context.

Trac ticket: https://core.trac.wordpress.org/ticket/55996

@spacedmonkey commented on PR #5189:


7 months ago
#42

I am not sure about this change. I think the filter is solid one and already has the context as part the filter. If a developer is using the filter, they should take ownership of ensure the output is doing what is expected. If someone is not working with their filter, that is their responity. I can get behind a new filter and I am unsure of the needed to deprecate the existing filter.

This ticket was mentioned in Slack in #core-performance by flixos90. View the logs.


7 months ago

@flixos90 commented on PR #5189:


7 months ago
#44

@spacedmonkey I partially agree with you, certainly a developer _could_ use this filter correctly. But nothing in the filter name or docs clarifies the caveat that it can be run multiple times on the same content, which is what's causing problems. I think an alternative filter that has the context name attached and thus does not allow duplicate execution on the same content prevents any accidents, so it's more intuitive to use, even for developers that don't read the filter docs in detail or understand WP core inside out.

In any case, I think we should do one of the following:

  • Either introduce the new dynamic filter and deprecate the other one, and thus prioritize being forgiving to developers that don't know about the caveat with the duplicate execution.
  • Or just leave things as is and close the PR, and thus prioritize existing usage and leave responsibility with the individual developer.

Having both kinds of filters is pointless IMO and doesn't satisfy either perspective.

#45 @flixos90
7 months ago

  • Milestone changed from 6.4 to 6.5

Despite this being a bug, it's definitely too late to address it now given the potential implications of any possible solution. And based on the above, more discussion is required here in the first place.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


5 months ago

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


5 months ago

#48 @flixos90
5 months ago

  • Keywords 2nd-opinion added; has-patch has-unit-tests changes-requested removed

Still looking for feedback here on the thoughts I shared in 39.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


2 months ago

#52 @swissspidy
2 months ago

  • Milestone changed from 6.5 to 6.6

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


5 weeks ago

#54 @mukesh27
5 weeks ago

  • Milestone changed from 6.6 to Future Release

This ticket was discussed during the Performance bug scrub.

Moving to Future release as Felix is still waiting for feedback per 39.

At any time, we can reassign it to a milestone.

Note: See TracTickets for help on using tickets.