Make WordPress Core


Timestamp:
08/12/2025 06:13:48 PM
Author:
dmsnell
Message:

Add wp_is_valid_utf8() for normalizing UTF-8 checks.

There are several existing mechanisms in Core to determine if a given string contains valid UTF-8 bytes or not. These are spread out and depend on which extensions are installed on the running system and what is set for blog_charset. The seems_utf8() function is one of these mechanisms.

Unfortunately, seems_utf8() does not properly validate UTF-8, it is slow, and its name and historical legacy obscure what the function is for.
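To illustrate what "properly validate" means, the PHP sketch below lists byte sequences that have the lead-byte/continuation-byte shape of UTF-8 but are forbidden by RFC 3629. A check that only looks at that bit pattern can let such sequences through. The helper name `strict_utf8_check()` is hypothetical, and it leans on PCRE's built-in UTF-8 validation (`preg_match()` with the `/u` modifier returns `false` for an invalid subject) purely for demonstration. It is not how Core implements wp_is_valid_utf8().

```php
<?php
// Hypothetical helper for illustration only: PCRE validates the subject as
// UTF-8 when the /u modifier is set, and preg_match() returns false (not 0)
// if the subject is not well-formed UTF-8.
function strict_utf8_check( string $bytes ): bool {
    return 1 === preg_match( '//u', $bytes );
}

// Each entry follows the lead-byte/continuation-byte *pattern* of UTF-8,
// but RFC 3629 forbids it.
$forbidden = array(
    'overlong NUL'       => "\xC0\x80",             // U+0000 in two bytes
    'UTF-16 surrogate'   => "\xED\xA0\x80",         // U+D800
    'above U+10FFFF'     => "\xF4\x90\x80\x80",     // U+110000
    'legacy 5-byte form' => "\xF8\x88\x80\x80\x80", // 5-byte sequences were removed
);

foreach ( $forbidden as $label => $bytes ) {
    printf( "%-20s %s\n", $label, strict_utf8_check( $bytes ) ? 'accepted' : 'rejected' );
}
```

A spec-compliant validator should report every one of these as rejected, while a properly encoded string such as `"caf\xC3\xA9"` ("café") passes.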

This patch deprecates seems_utf8() and introduces wp_is_valid_utf8(), a new, spec-compliant, efficient, and focused UTF-8 validator. The new validator defers to mb_check_encoding() when the mbstring extension provides it and otherwise falls back to a pure-PHP implementation. As a result, the spec-compliant validator is available on every system, regardless of its runtime environment.
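Below is a minimal PHP sketch of the "defer to mbstring, otherwise validate in pure PHP" strategy described above. The names `sketch_is_valid_utf8()` and `sketch_is_valid_utf8_fallback()` are hypothetical. This is not Core's wp_is_valid_utf8() source, and the real implementation may be structured quite differently. The fallback checks each multi-byte sequence against the well-formed byte ranges in Table 3-7 of the Unicode Standard. That is what lets it reject overlong forms, UTF-16 surrogates, and code points above U+10FFFF.

```php
<?php
// Hypothetical wrapper: NOT Core's wp_is_valid_utf8(), only a sketch of the strategy.
function sketch_is_valid_utf8( string $bytes ): bool {
    // Fast path: let the mbstring extension decide when it is loaded.
    if ( function_exists( 'mb_check_encoding' ) ) {
        return mb_check_encoding( $bytes, 'UTF-8' );
    }
    return sketch_is_valid_utf8_fallback( $bytes );
}

// Hypothetical pure-PHP fallback based on Unicode Table 3-7 (RFC 3629).
function sketch_is_valid_utf8_fallback( string $bytes ): bool {
    $length = strlen( $bytes );
    $i      = 0;

    while ( $i < $length ) {
        $b0 = ord( $bytes[ $i ] );

        // U+0000..U+007F is a single byte.
        if ( $b0 < 0x80 ) {
            $i++;
            continue;
        }

        // Pick the number of continuation bytes and the allowed range of the
        // FIRST continuation byte. The narrowed ranges exclude overlong forms
        // (E0, F0), surrogates (ED), and code points above U+10FFFF (F4).
        if ( $b0 >= 0xC2 && $b0 <= 0xDF ) {
            [ $need, $lo, $hi ] = [ 1, 0x80, 0xBF ];
        } elseif ( 0xE0 === $b0 ) {
            [ $need, $lo, $hi ] = [ 2, 0xA0, 0xBF ];
        } elseif ( ( $b0 >= 0xE1 && $b0 <= 0xEC ) || 0xEE === $b0 || 0xEF === $b0 ) {
            [ $need, $lo, $hi ] = [ 2, 0x80, 0xBF ];
        } elseif ( 0xED === $b0 ) {
            [ $need, $lo, $hi ] = [ 2, 0x80, 0x9F ];
        } elseif ( 0xF0 === $b0 ) {
            [ $need, $lo, $hi ] = [ 3, 0x90, 0xBF ];
        } elseif ( $b0 >= 0xF1 && $b0 <= 0xF3 ) {
            [ $need, $lo, $hi ] = [ 3, 0x80, 0xBF ];
        } elseif ( 0xF4 === $b0 ) {
            [ $need, $lo, $hi ] = [ 3, 0x80, 0x8F ];
        } else {
            // 0x80..0xC1 and 0xF5..0xFF can never start a character.
            return false;
        }

        // The sequence is cut off by the end of the string.
        if ( $i + $need >= $length ) {
            return false;
        }

        // The first continuation byte must fall in the narrowed range.
        $b1 = ord( $bytes[ $i + 1 ] );
        if ( $b1 < $lo || $b1 > $hi ) {
            return false;
        }

        // Any remaining continuation bytes must be 0x80..0xBF.
        for ( $k = 2; $k <= $need; $k++ ) {
            $b = ord( $bytes[ $i + $k ] );
            if ( $b < 0x80 || $b > 0xBF ) {
                return false;
            }
        }

        $i += $need + 1;
    }

    return true;
}
```

Only the first continuation byte needs a narrowed range. Once the lead byte and that byte are legal, every further continuation byte may take any value from 0x80 to 0xBF. Because the fallback is always defined, callers get the same answer whether or not mbstring is installed.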

Developed in https://github.com/WordPress/wordpress-develop/pull/9317
Discussed in https://core.trac.wordpress.org/ticket/38044

Props dmsnell, jonsurrell, jorbin.
Fixes #38044.

File:
1 edited

  • trunk/src/wp-admin/includes/image.php

    --- r60475
    +++ r60630
    @@ -1040,5 +1040,5 @@
     
         foreach ( array( 'title', 'caption', 'credit', 'copyright', 'camera', 'iso' ) as $key ) {
    -        if ( $meta[ $key ] && ! seems_utf8( $meta[ $key ] ) ) {
    +        if ( $meta[ $key ] && ! wp_is_valid_utf8( $meta[ $key ] ) ) {
                 $meta[ $key ] = utf8_encode( $meta[ $key ] );
             }
    @@ -1046,5 +1046,5 @@
     
         foreach ( $meta['keywords'] as $key => $keyword ) {
    -        if ( ! seems_utf8( $keyword ) ) {
    +        if ( ! wp_is_valid_utf8( $keyword ) ) {
                 $meta['keywords'][ $key ] = utf8_encode( $keyword );
             }
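In image.php, only the validity check changes. Any metadata field that is not valid UTF-8 is still assumed to be ISO-8859-1 (Latin-1) and passed through utf8_encode(), which PHP has deprecated since 8.2. The PHP sketch below reproduces that normalization without extensions. Both helper names are hypothetical. `preg_match( '//u', ... )` stands in for wp_is_valid_utf8() only so the example runs outside WordPress.

```php
<?php
// Hypothetical helper mirroring utf8_encode(): treat every byte as a Latin-1
// character and emit its UTF-8 encoding.
function sketch_latin1_to_utf8( string $bytes ): string {
    $out    = '';
    $length = strlen( $bytes );
    for ( $i = 0; $i < $length; $i++ ) {
        $b = ord( $bytes[ $i ] );
        if ( $b < 0x80 ) {
            // ASCII bytes are identical in Latin-1 and UTF-8.
            $out .= $bytes[ $i ];
        } else {
            // U+0080..U+00FF become the two-byte form 110000xx 10xxxxxx.
            $out .= chr( 0xC0 | ( $b >> 6 ) ) . chr( 0x80 | ( $b & 0x3F ) );
        }
    }
    return $out;
}

// Hypothetical version of the patched loop: leave valid UTF-8 untouched and
// re-encode everything else on the assumption that it is Latin-1.
function sketch_normalize_meta( array $meta ): array {
    foreach ( array( 'title', 'caption', 'credit', 'copyright', 'camera', 'iso' ) as $key ) {
        // Stand-in for wp_is_valid_utf8(); empty() also skips missing keys.
        if ( ! empty( $meta[ $key ] ) && 1 !== preg_match( '//u', $meta[ $key ] ) ) {
            $meta[ $key ] = sketch_latin1_to_utf8( $meta[ $key ] );
        }
    }
    return $meta;
}
```

With this approach, a Latin-1 title such as `"\xE9t\xE9"` ("été") is re-encoded as `"\xC3\xA9t\xC3\xA9"`, while a caption that is already valid UTF-8 is left as-is. A stricter validator matters here because it decides which fields get re-encoded.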