Opened 2 years ago
#56656 new defect (bug)
Move accent removal from `sanitize_title_with_dashes()` to `remove_accents()`
Reported by: | anrghg | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | major | Version: | |
Component: | Formatting | Keywords: | |
Focuses: | | Cc: | |
Description
When `sanitize_title()` attempts to remove accents, it fails, because neither of its two mechanisms works as expected:
- It calls `remove_accents()`, but this only converts a handful of symbols and a set of precomposed Latin letters to Latin base letters.
- One of the filters it applies (registered in `default-filters.php`) calls back `sanitize_title_with_dashes()`, but this only removes five combining accents (plus two spacing acutes). Full list of removed combining diacritics: U+0301, U+0341, U+0300, U+0304, U+030C.
So, when a title contains a combining tilde, the tilde ends up in the slug. Example:
Title: Eñe [U+0045 U+006E U+0303 U+0065]
Slug: Display: eñe/
Slug: Encoded: en%cc%83e/
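The encoded slug above can be reproduced outside WordPress with a minimal sketch. The function `sketch_slug()` is a hypothetical stand-in, not WordPress code: it assumes the slug step amounts to removing the five listed combining diacritics, percent-encoding the remaining non-ASCII bytes, and lowercasing.

```php
<?php
// Hypothetical stand-in for the slug step described in this ticket; not WordPress code.
function sketch_slug( string $title ): string {
    // The five combining diacritics the ticket lists as currently removed.
    $removed = array( "\u{0301}", "\u{0341}", "\u{0300}", "\u{0304}", "\u{030C}" );
    $title   = str_replace( $removed, '', $title );

    // Percent-encode the remaining non-ASCII bytes, then lowercase the result.
    return strtolower( rawurlencode( $title ) );
}

// U+0045 U+006E U+0303 U+0065: the combining tilde is not in the list above.
$title = "E" . "n\u{0303}" . "e";
echo sketch_slug( $title ), "\n"; // en%cc%83e
```

A combining acute (U+0301) in the same position would be removed by this sketch, which is why only some accents leak into slugs.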
This problem was mentioned in #56530.
The proposed solution is to fix `sanitize_title_with_dashes()` as suggested in #56531, and to fix `remove_accents()` by adding:
<?php $string = preg_replace( '/\p{M}/u', '', $string );
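To show what that line does in isolation, here is a small sketch. The helper name `strip_combining_marks()` is hypothetical and only wraps the proposed `preg_replace()` call; it is not the actual body of `remove_accents()`.

```php
<?php
// Hypothetical helper illustrating the proposed line; not the real remove_accents().
function strip_combining_marks( string $string ): string {
    // \p{M} matches any Unicode mark (categories Mn, Mc, Me); /u enables UTF-8 mode.
    $result = preg_replace( '/\p{M}/u', '', $string );

    // preg_replace() returns null on error (for example invalid UTF-8); keep the input then.
    return null === $result ? $string : $result;
}

// U+0045 U+006E U+0303 U+0065: the combining tilde is removed.
echo strip_combining_marks( "En\u{0303}e" ), "\n"; // Ene
```

On its own this pattern leaves precomposed letters such as U+00F1 unchanged, so it would complement rather than replace the existing letter mapping. `\p{M}` also matches marks used by non-Latin scripts, which may or may not be desirable for slugs.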