Make WordPress Core

Opened 2 years ago

#56656 new defect (bug)

Move accent removal from `sanitize_title_with_dashes()` to `remove_accents()`

Reported by: anrghg Owned by:
Milestone: Awaiting Review Priority: normal
Severity: major Version:
Component: Formatting Keywords:
Focuses: Cc:

Description

When sanitize_title() attempts to remove accents, it fails, because neither of the two mechanisms it relies on works as expected:

  1. It calls remove_accents(), but this only converts a handful of symbols and a set of precomposed Latin letters to Latin base letters. It does not touch combining diacritics.
  2. One of the filters it applies (registered in default-filters.php) calls sanitize_title_with_dashes(), but this only removes five combining accents (plus two spacing acutes). Full list of removed combining diacritics: U+0301, U+0341, U+0300, U+0304, U+030C.

As a result, when a title contains a combining tilde (U+0303), the tilde ends up in the slug. Example:

Title: Eñe [U+0045 U+006E U+0303 U+0065]
Slug: Display: eñe/
Slug: Encoded: en%cc%83e/
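
To make the encoded slug above easier to read, here is a minimal sketch in plain PHP. It is not WordPress code: approximate_slug() is a hypothetical helper that only percent-encodes and lowercases the title. It shows that U+0303 is the UTF-8 byte pair CC 83, which is what appears between "n" and "e" in the slug.

```php
<?php
// Hypothetical helper for illustration only, not part of WordPress.
// Percent-encodes every non-URL-safe byte, then lowercases the result.
function approximate_slug( string $title ): string {
    return strtolower( rawurlencode( $title ) );
}

// "Eñe" written as E, n, U+0303 COMBINING TILDE, e.
$title = "En\u{0303}e";

echo approximate_slug( $title ), "\n"; // en%cc%83e
```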

This problem was mentioned in #56530.

The proposed solution is to fix sanitize_title_with_dashes() as suggested in #56531, and to fix remove_accents() by adding:

<?php
// Strip every Unicode combining mark (general category M), e.g. U+0303.
$string = preg_replace( '/\p{M}/u', '', $string );
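
The effect of that one-liner can be checked outside WordPress with a small sketch. strip_combining_marks() is a hypothetical wrapper introduced here for illustration; the null check is an added assumption about how a caller might want to handle invalid UTF-8, since preg_replace() returns null on error.

```php
<?php
// Hypothetical wrapper around the proposed preg_replace() call.
function strip_combining_marks( string $string ): string {
    // \p{M} matches any combining mark; the /u flag enables UTF-8 mode.
    $stripped = preg_replace( '/\p{M}/u', '', $string );

    // preg_replace() returns null on error (for example, invalid UTF-8).
    // Assumption: fall back to the unmodified input in that case.
    return null === $stripped ? $string : $stripped;
}

// E, n, U+0303 COMBINING TILDE, e: the combining tilde is removed.
echo strip_combining_marks( "En\u{0303}e" ), "\n"; // Ene
```

Note that a precomposed ñ (U+00F1) is a letter, not a combining mark, so this pattern leaves it alone; in the ticket's description that case is the one remove_accents() already covers.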

Change History (0)