Make WordPress Core


Timestamp:
07/21/2022 09:09:56 PM
Author:
audrasjb
Message:

Formatting: Normalize to Unicode NFC encoding before converting accent characters in remove_accents().

This changeset adds Unicode sequence normalization from NFD to NFC, via the normalizer_normalize() PHP function which is available with the recommended intl PHP extension.

This fixes an issue where NFD characters were not properly sanitized. It also provides a unit test for NFD sequences (alternate Unicode representations of the same characters).

Props NumidWasNotAvailable, targz, nacin, nunomorgadinho, p_enrique, gitlost, SergeyBiryukov, markoheijnen, mikeschroder, ocean90, pento, helen, rodrigosevero, zodiac1978, ironprogrammer, audrasjb, azaozz, laboiteare, nuryko, virgar, dxd5001, onnimonni, johnbillion.
Fixes #24661, #47763, #35951.
See #30130, #52654.
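
The NFD-to-NFC normalization described in the message can be sketched in isolation as follows. This is a minimal illustration, not code from the changeset: the helper name `example_to_nfc()` is hypothetical, and the extra check for a `false` return value is an added safeguard that the changeset itself does not include. It assumes PHP 7 or later for the `\u{...}` escape syntax; without the intl extension the helper returns its input unchanged.

```php
<?php
// "é" as a single precomposed code point (NFC form): U+00E9.
$nfc = "\u{00E9}";

// "é" as a base letter followed by a combining acute accent (NFD form): U+0065 U+0301.
$nfd = "e\u{0301}";

/**
 * Hypothetical helper (not part of WordPress): returns $text in NFC form
 * when the intl extension is available, otherwise returns it unchanged.
 */
function example_to_nfc( string $text ): string {
    // Same guard as the changeset: only call intl functions if they exist.
    if ( function_exists( 'normalizer_normalize' ) ) {
        if ( ! normalizer_is_normalized( $text, Normalizer::FORM_C ) ) {
            $normalized = normalizer_normalize( $text, Normalizer::FORM_C );
            // normalizer_normalize() returns false on failure; keep the input in that case.
            if ( false !== $normalized ) {
                return $normalized;
            }
        }
    }
    return $text;
}

// The two forms render identically but differ byte for byte.
var_dump( $nfc === $nfd );

// With intl loaded, the NFD input is converted to the precomposed form.
var_dump( example_to_nfc( $nfd ) === $nfc );
```

Because a lookup table keyed on precomposed characters cannot match a decomposed sequence, normalizing first lets both representations of the same character reach the same table entry.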

File:
1 edited

  • trunk/src/wp-includes/formatting.php

--- trunk/src/wp-includes/formatting.php (r53455)
+++ trunk/src/wp-includes/formatting.php (r53754)
@@ -1585,4 +1585,5 @@
  * @since 5.7.0 Added locale support for `de_AT`.
  * @since 6.0.0 Added the `$locale` parameter.
+ * @since 6.1.0 Added Unicode NFC encoding normalization support.
  *
  * @param string $string Text that might have accent characters.
@@ -1598,4 +1599,13 @@
 
     if ( seems_utf8( $string ) ) {
+
+        // Unicode sequence normalization from NFD (Normalization Form Decomposed)
+        // to NFC (Normalization Form [Pre]Composed), the encoding used in this function.
+        if ( function_exists( 'normalizer_normalize' ) ) {
+            if ( ! normalizer_is_normalized( $string, Normalizer::FORM_C ) ) {
+                $string = normalizer_normalize( $string, Normalizer::FORM_C );
+            }
+        }
+
         $chars = array(
             // Decompositions for Latin-1 Supplement.