Make WordPress Core


Timestamp: 07/21/2022 09:09:56 PM
Author: audrasjb
Message:

Formatting: Normalize to Unicode NFC encoding before converting accent characters in remove_accents().

This changeset adds Unicode sequence normalization from NFD to NFC via the normalizer_normalize() PHP function, which is available when the recommended intl PHP extension is installed.
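A minimal sketch of the guarded normalization step described above, assuming only what the message states: normalize to NFC when the intl functions exist, and leave the string untouched otherwise. The helper name normalize_to_nfc() is hypothetical; this is not the code from the changeset.

```php
<?php
/**
 * Hypothetical helper: returns $text in NFC when the intl extension is
 * available, otherwise returns it unchanged.
 */
function normalize_to_nfc( string $text ): string {
	// Both functions are only defined when the intl extension is installed.
	if ( ! function_exists( 'normalizer_is_normalized' ) || ! function_exists( 'normalizer_normalize' ) ) {
		return $text;
	}

	// Skip the conversion when the string is already in NFC.
	if ( normalizer_is_normalized( $text, Normalizer::FORM_C ) ) {
		return $text;
	}

	// normalizer_normalize() returns false on failure (e.g. invalid UTF-8).
	$normalized = normalizer_normalize( $text, Normalizer::FORM_C );

	return false === $normalized ? $text : $normalized;
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT is the NFD form of "é".
$decomposed = "e\u{0301}";

echo bin2hex( normalize_to_nfc( $decomposed ) ), PHP_EOL; // prints c3a9 with intl, 65cc81 without
```

Returning the input unchanged when intl is missing keeps the step optional, which matches the extension being recommended rather than required.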

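This fixes an issue where characters encoded in NFD (a base letter followed by a combining mark, such as "e" + U+0301 instead of the precomposed "é") were not properly sanitized. It also adds a unit test for NFD sequences, which are alternate Unicode representations of the same characters.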
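To see why NFD input slipped through, consider a simplified, hypothetical replacement map keyed on precomposed characters and applied with strtr(), which compares raw bytes. The two-entry $map below is an illustration only, not WordPress's real table.

```php
<?php
// Hypothetical replacement map keyed on precomposed (NFC) characters.
$map = array(
	"\u{00E9}" => 'e', // é LATIN SMALL LETTER E WITH ACUTE
	"\u{00F1}" => 'n', // ñ LATIN SMALL LETTER N WITH TILDE
);

$nfc = "\u{00E9}";  // é as one code point: bytes c3 a9
$nfd = "e\u{0301}"; // é as "e" + COMBINING ACUTE ACCENT: bytes 65 cc 81

// Both strings render as "é", but their bytes differ.
echo bin2hex( $nfc ), ' ', bin2hex( $nfd ), PHP_EOL; // c3a9 65cc81

// strtr() matches bytes, so only the NFC form is replaced;
// the NFD form passes through with its combining mark intact.
echo bin2hex( strtr( $nfc, $map ) ), PHP_EOL; // 65
echo bin2hex( strtr( $nfd, $map ) ), PHP_EOL; // 65cc81
```

Normalizing to NFC first turns the NFD bytes into the precomposed key the map already contains, so no second set of NFD keys is needed.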

Props NumidWasNotAvailable, targz, nacin, nunomorgadinho, p_enrique, gitlost, SergeyBiryukov, markoheijnen, mikeschroder, ocean90, pento, helen, rodrigosevero, zodiac1978, ironprogrammer, audrasjb, azaozz, laboiteare, nuryko, virgar, dxd5001, onnimonni, johnbillion.
Fixes #24661, #47763, #35951.
See #30130, #52654.

File: 1 edited

  • trunk/tests/phpunit/tests/formatting/removeAccents.php

    --- removeAccents.php (r53562)
    +++ removeAccents.php (r53754)
    @@ -10,4 +10,22 @@
         public function test_remove_accents_simple() {
             $this->assertSame( 'abcdefghijkl', remove_accents( 'abcdefghijkl' ) );
    +    }
    +
    +    /**
    +     * @ticket 24661
    +     *
    +     * Tests Unicode sequence normalization from NFD (Normalization Form Decomposed)
    +     * to NFC (Normalization Form [Pre]Composed), the encoding used in `remove_accents()`.
    +     *
    +     * For more information on Unicode normalization, see
    +     * https://unicode.org/faq/normalization.html.
    +     *
    +     * @requires extension intl
    +     */
    +    public function test_remove_accents_latin1_supplement_nfd_encoding() {
    +        $input  = 'ªºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ';
    +        $output = 'aoAAAAAAAECEEEEIIIIDNOOOOOOUUUUYTHsaaaaaaaeceeeeiiiidnoooooouuuuythy';
    +
    +        $this->assertSame( $output, remove_accents( $input ), 'remove_accents replaces Latin-1 Supplement with NFD encoding' );
         }
     
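The new test is only meaningful if $input is actually decomposed, and NFC and NFD text render identically, so the difference cannot be seen on this page. A sketch, assuming the intl extension (which the test also requires) and a hypothetical count_code_points() helper, of how the NFD form of this input can be produced and checked:

```php
<?php
/**
 * Hypothetical helper: counts code points using PCRE's UTF-8 mode,
 * so it does not depend on mbstring.
 */
function count_code_points( string $text ): int {
	return (int) preg_match_all( '/./su', $text );
}

$input = 'ªºÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõöøùúûüýþÿ';

if ( extension_loaded( 'intl' ) ) {
	// Produce both forms explicitly, whatever form the literal above is stored in.
	$nfc = Normalizer::normalize( $input, Normalizer::FORM_C );
	$nfd = Normalizer::normalize( $input, Normalizer::FORM_D );

	echo count_code_points( $nfc ), ' ', count_code_points( $nfd ), PHP_EOL; // 64 117

	// The decomposed string is not NFC, but composing it restores the NFC form.
	var_dump( Normalizer::isNormalized( $nfd, Normalizer::FORM_C ) );        // bool(false)
	var_dump( Normalizer::normalize( $nfd, Normalizer::FORM_C ) === $nfc ); // bool(true)
}
```

Of the 64 characters, 53 decompose into a base letter plus one combining mark; ª, º, Æ, Ð, Ø, Þ, ß, æ, ð, ø and þ have no canonical decomposition and are left as single code points, giving 117 code points in NFD.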