Opened 5 months ago
Last modified 3 months ago
#64552 new defect (bug)
wp_trim_words fails with certain charsets in the Excerpt
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | normal | Version: | 3.0 |
| Component: | Formatting | Keywords: | has-test-info has-patch has-unit-tests |
| Focuses: | | Cc: | |
Description
Bug Report
Following GB74925, we noticed that excerpts were not being adjusted accordingly in the Editor. We also noticed that the current trimming regex does not apply to certain edge-case character types, like
Environment
- WordPress: 7.0-alpha-61215-src
- PHP: 8.2.29
- Server: nginx/1.29.4
- Database: mysqli (Server: 8.4.7 / Client: mysqlnd 8.2.29)
- Browser: Chrome 144.0.0.0
- OS: Windows 10/11
- Theme: Twenty Twenty-Three 1.6
- MU Plugins: None activated
- Plugins:
- Gutenberg 22.4.1
Testing Instructions
- Ideally, use the Twenty Twenty-Three theme
- Create a new post and add the following excerpt:
Fabio vel iudice vincam, sunt in culpa qui officia. Inmensae subtilitatis, obscuris et malesuada fames. Ambitioni dedisse scripsisse iudicaretur. Nec dubitamus multa iter quae et nos invenerat. Petierunt uti sibi concilium totius Galliae in diem certam indicere.
- Go to Appearance > Editor
- Click on the canvas
- Select the excerpt of the newly created Post
- Adjust the Excerpt Max Number of Words (you need Gutenberg trunk to test this; without trunk, it will not work in the Editor either).
- Set it to, for example, 30 words, and save.
- Go to the front end.
- 🐞 The number of words is not the same as in the editor.
Expected Behaviour
- The number of words of the excerpt should be the same in the Editor and in the front end.
Additional Information
It appears that wp_trim_words does not split on all types of whitespace characters; hence, the word count is inconsistent between the Editor and the front end. @wildworks proposed fixing the /[\n\r\t ]+/ pattern for this purpose.
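To illustrate the mismatch, here is a minimal sketch in Python (not the WordPress PHP code). It compares the ASCII-only pattern quoted above with a Unicode-aware whitespace pattern. The helper `count_words` and the sample string are hypothetical, and Python's `\s` is only a stand-in for whatever Unicode-aware pattern a real fix would use.

```python
import re

# The ASCII-only separator pattern quoted in this ticket.
ASCII_SEPARATORS = re.compile(r"[\n\r\t ]+")

# On Python str patterns, \s also matches Unicode whitespace such as
# U+00A0 (non-breaking space) and U+3000 (ideographic space).
UNICODE_SEPARATORS = re.compile(r"\s+")


def count_words(text, pattern):
    """Hypothetical helper: count the non-empty chunks produced by the split."""
    return len([word for word in pattern.split(text) if word])


# Four words, separated by U+00A0, U+3000, and a regular space.
sample = "one\u00a0two\u3000three four"

print(count_words(sample, ASCII_SEPARATORS))    # 2
print(count_words(sample, UNICODE_SEPARATORS))  # 4
```

With the ASCII-only pattern, the first three words are counted as one, so a limit of N words keeps more text than a counter that splits on every whitespace character.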
Change History (3)
This ticket was mentioned in PR #11258 on WordPress/wordpress-develop by @alexodiy.
3 months ago
#3
- Keywords has-patch has-unit-tests added; needs-patch removed
This updates the word boundary handling on UTF-8 sites to use a Unicode-aware whitespace pattern, so ideographic spaces (U+3000), non-breaking spaces (U+00A0), and other Unicode whitespace characters are treated as word separators. This matches the behavior already used in the Gutenberg editor.
For non-UTF-8 sites, the previous regex is kept as a fallback.
Props sirlouen, wildworks.
Fixes #64552.
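For readers who want the shape of the approach described above, the following is a rough Python sketch, not the code from the PR. The function `trim_words`, its `more` suffix, and the `is_utf8` flag (standing in for the site charset check) are all assumptions made for illustration.

```python
import re

# Previous ASCII-only pattern, kept as the fallback for non-UTF-8 sites.
ASCII_WS = re.compile(r"[\n\r\t ]+")

# Unicode-aware stand-in: \s on Python str patterns covers U+00A0, U+3000, etc.
UNICODE_WS = re.compile(r"\s+")


def trim_words(text, num_words, more="...", is_utf8=True):
    """Hypothetical sketch: keep at most num_words words, appending `more` if cut."""
    pattern = UNICODE_WS if is_utf8 else ASCII_WS
    words = [word for word in pattern.split(text) if word]
    if len(words) <= num_words:
        return " ".join(words)
    return " ".join(words[:num_words]) + more


text = "a\u3000b\u00a0c d e"
print(trim_words(text, 3))                 # a b c...
print(trim_words(text, 3, is_utf8=False))  # not trimmed: only 3 "words" are seen
```

In the fallback branch the first three words stay glued together, so the text is treated as three words and returned untrimmed, which mirrors the inconsistency reported in this ticket.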
Pushing back to Future Release, as we don't really have a patch for it yet.