Make WordPress Core

Changes between Initial Version and Version 2 of Ticket #64944


Timestamp:
03/25/2026 11:22:00 PM (3 months ago)
Author:
sabernhardt
Comment:

(This can happen with blocks since WordPress 5.0, and it was possible earlier when adding br tags manually within the Code/Text view of the classic editor.)

  • Ticket #64944

    • Property Keywords has-unit-tests added
    • Property Version changed from 6.9 to
    • Property Component changed from General to Formatting
  • Ticket #64944 – Description

    initial v2
     1     When using the verse block (or similarly paragraph block and "shft + enter" spacing), `<br>`s are added in the block's content between lines. When WordPress generates an excerpt (when no custom excerpt is set), these `<br>`s are stripped along with other html tags. This often creates excerpts with missing spaces between words.
        1  When using the Verse block (or similarly Paragraph block and `shift + enter` spacing), `<br>`s are added in the block's content between lines. When WordPress generates an excerpt (when no custom excerpt is set), these `<br>`s are stripped along with other HTML tags. This often creates excerpts with missing spaces between words.
     2  2
     3  3  Consider the common poetry formatting where multiple lines exist in a paragraph block to represent a stanza.
     4  4
     5  5  On the code side of the editor this looks like:
     6     ```
        6  {{{
     7  7  <!-- wp:verse -->
     8  8  <pre class="wp-block-verse">this is<br>a verse block<br>it has<br>the same issues</pre>
     9  9  <!-- /wp:verse -->
    10     ```
       10  }}}
    11 11  or
    12     ```<!-- wp:paragraph -->
       12  {{{
       13  <!-- wp:paragraph -->
    13 14  <p>This is a poem<br>using shift+space<br>Inside a paragraph block<br>for good stanza formatting</p>
    14     <!-- /wp:paragraph -->```
       15  <!-- /wp:paragraph -->
       16  }}}
    15 17
    16 18  When WP generates an excerpt based off the post content this ends up as:

    19 21  "This is a poemusing shift+spaceInside a paragraph blockfor good stanza formatting."
    20 22
    21     This shows up often in excerpts generated from content corresponding to poety, song lyrics, or other similar formats. When excerpts are used in any context (post previews, email subject descriptions, etc.) these missing white spaces obviously look horrible.
       23  This shows up often in excerpts generated from content corresponding to poetry, song lyrics, or other similar formats. When excerpts are used in any context (post previews, email subject descriptions, etc.) these missing white spaces obviously look horrible.
    22 24
    23     ### To Reproduce (recently tested in WP Playground on 6.9):
    24     * Create a new post using a paragraph of verse block. For the verse block, standard "enter" to add new lines will repro the issue. For the paragraph block, "shft + enter" to create new lines within the block.
       25  === To Reproduce (recently tested in WP Playground on 6.9):
       26
       27  * Create a new post using a Paragraph or Verse block. For the Verse block, standard `enter` to add new lines will repro the issue. For the Paragraph block, `shift + enter` to create new lines within the block.
    25 28  * Do not create a custom excerpt.
    26 29  * Publish the post.
    27     * Run get_the_excerpt for the post.
       30  * Run `get_the_excerpt` for the post.
    28 31  * Verify that there are no spaces between the last words of one line and first words of the next.
    29 32
    30 33
    31     ### How to fix?
       34  === How to fix?
       35
    32 36  I am uncertain on the best approach to resolve this. Some notes:
    33 37
    34 38  `wp_trim_excerpt` - calls `get_the_content` when no excerpt text is passed to it. Later calls `wp_trim_words`.
    35 39
    36     `wp_trim_words` - calls `wp_strip_all_tags` and later creates a `$words_array` using preg_split on the "/[\n\r\t ]+/" pattern.
       40  `wp_trim_words` - calls `wp_strip_all_tags` and later creates a `$words_array` using `preg_split` on the `"/[\n\r\t ]+/"` pattern.
    37 41
    38     `wp_strip_all_tags` - strips all the tags in a preg_replace. Later, if $remove_breaks is true, replaces '/[\r\n\t ]+/' patterns with spaces. In the current chain in this context $remove_breaks is false so this doesn't happen here, and the preg_split noted above in `wp_trim_words` will find these.
       42  `wp_strip_all_tags` - strips all the tags in a `preg_replace`. Later, if `$remove_breaks` is true, replaces `'/[\r\n\t ]+/'` patterns with spaces. In the current chain in this context `$remove_breaks` is false so this doesn't happen here, and the `preg_split` noted above in `wp_trim_words` will find these.
    39 43
    40     One thought, if `wp_strip_all_tags` similarly considered `<br>`s in the $remove_breaks block AND moved this handling before the preg_replace that strips tags, that seems like potentially a general improvement. If the goal is to replace breaks with spaces, then `<br>`s should be considered there. However, we don't call $remove breaks in our context coming from `wp_trim_words` and it may not make sense to add that there.
       44  One thought, if `wp_strip_all_tags` similarly considered `<br>`s in the `$remove_breaks` block AND moved this handling before the `preg_replace` that strips tags, that seems like potentially a general improvement. If the goal is to replace breaks with spaces, then `<br>`s should be considered there. However, we don't call `$remove_breaks` in our context coming from `wp_trim_words` and it may not make sense to add that there.
    41 45
    42     Another thought, would it make sense for `wp_trim_words` to replace `<br>`s with spaces before calling `wp_strip_all_tags` ? Those spaces would then be caught by the pattern in the preg_split creating the $words_array.
       46  Another thought, would it make sense for `wp_trim_words` to replace `<br>`s with spaces before calling `wp_strip_all_tags` ? Those spaces would then be caught by the pattern in the `preg_split` creating the `$words_array`.
    43 47
    44 48  I am attaching a diff for the latter. `<br>` tags are stripped without preserving spacing, causing words to concatenate (e.g., ‘thisexample’). This replaces `<br>` with a space before tag stripping to preserve word boundaries.
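
As a rough illustration of the second idea in the description (replace `<br>` with a space before tags are stripped), the standalone sketch below mimics the strip-then-split steps with plain PHP functions. The `sketch_*` helpers are hypothetical stand-ins and not WordPress core functions, and the `<br>` pattern is an assumption rather than the one used in the attached diff.

```php
<?php
// Hypothetical stand-in for the tag-stripping step (not wp_strip_all_tags itself).
function sketch_strip_tags( string $text ): string {
    return trim( strip_tags( $text ) );
}

// Hypothetical stand-in for the word split, using the pattern quoted in the ticket.
function sketch_split_words( string $text ): array {
    return preg_split( "/[\n\r\t ]+/", $text, -1, PREG_SPLIT_NO_EMPTY );
}

// Assumed pattern: matches <br>, <br/> and <br /> (case-insensitive),
// but not <br> tags that carry attributes.
function sketch_br_to_space( string $text ): string {
    return preg_replace( '/<br\s*\/?>/i', ' ', $text );
}

$content = '<p>This is a poem<br>using shift+enter<br>Inside a paragraph block<br>for good stanza formatting</p>';

// Current behaviour: tags are removed with nothing left in their place.
$before = implode( ' ', sketch_split_words( sketch_strip_tags( $content ) ) );

// Sketched fix: <br> becomes a space first, so the split sees word boundaries.
$after = implode( ' ', sketch_split_words( sketch_strip_tags( sketch_br_to_space( $content ) ) ) );

echo $before, PHP_EOL; // This is a poemusing shift+enterInside a paragraph blockfor good stanza formatting
echo $after, PHP_EOL;  // This is a poem using shift+enter Inside a paragraph block for good stanza formatting
```

Because the replacement adds ordinary spaces, the existing whitespace split collapses any doubled spaces on its own, so no extra cleanup step appears to be needed in this sketch.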