Make WordPress Core


Timestamp:
09/23/2025 03:34:20 AM
Author:
dmsnell
Message:

Charset: Improve UTF-8 scrubbing ability via new UTF-8 scanning pipeline.

This is the fourth in a series of patches to modernize and standardize UTF-8 handling.

wp_check_invalid_utf8() has long depended on the runtime configuration of the system running it. This has led to hard-to-diagnose issues with text containing invalid UTF-8. The function has also had an apparent defect since its inception: when asked to strip invalid bytes, it returns an empty string.

This patch updates the function to remove all dependency on the system running it. It defers to the mbstring extension if that’s available, falling back to the new UTF-8 scanning pipeline.

To support this work, wp_scrub_utf8() is created with a proper fallback so that the remaining logic inside of wp_check_invalid_utf8() can be minimized. The defect in this function has been fixed, but instead of stripping the invalid bytes it will replace them with the Unicode replacement character for stronger security guarantees.
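
The "prefer mbstring, otherwise fall back" idea described above can be sketched roughly as follows. This is an illustration only and not the code from this changeset: the `example_`-prefixed function names are hypothetical, and the regex-based fallback is a simplified stand-in for the new UTF-8 scanning pipeline. It assumes PHP 7.2 or newer (for `mb_scrub()`) and a Unicode internal encoding for mbstring.

```php
<?php
/**
 * Illustrative sketch only; not the code from this changeset.
 * Function names prefixed with example_ are hypothetical.
 */

/**
 * Pure-PHP fallback: keeps well-formed UTF-8 sequences and replaces
 * every other byte with U+FFFD (one replacement character per byte).
 */
function example_scrub_utf8_fallback( string $text ): string {
	// Group 1 matches one well-formed UTF-8 sequence (no overlong forms,
	// no surrogates, nothing above U+10FFFF). Group 2 matches any other byte.
	$pattern = '/(
		  [\x00-\x7F]
		| [\xC2-\xDF][\x80-\xBF]
		| \xE0[\xA0-\xBF][\x80-\xBF]
		| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}
		| \xED[\x80-\x9F][\x80-\xBF]
		| \xF0[\x90-\xBF][\x80-\xBF]{2}
		| [\xF1-\xF3][\x80-\xBF]{3}
		| \xF4[\x80-\x8F][\x80-\xBF]{2}
	) | (.)/xs';

	return (string) preg_replace_callback(
		$pattern,
		static function ( array $m ): string {
			// A non-empty group 1 means the bytes were valid; keep them.
			return '' !== $m[1] ? $m[0] : "\u{FFFD}";
		},
		$text
	);
}

/**
 * Defers to mbstring when available, otherwise uses the fallback.
 */
function example_scrub_utf8( string $text ): string {
	if ( ! function_exists( 'mb_scrub' ) ) {
		return example_scrub_utf8_fallback( $text );
	}

	// mb_scrub() uses the global substitute character (by default "?"),
	// so switch it to U+FFFD temporarily and restore it afterwards.
	$previous = mb_substitute_character();
	mb_substitute_character( 0xFFFD );
	$scrubbed = mb_scrub( $text, 'UTF-8' );
	mb_substitute_character( $previous );

	return $scrubbed;
}
```

Under these assumptions, `example_scrub_utf8( "a\xFFb" )` yields the input with the stray byte replaced by U+FFFD rather than removed. The two paths in this sketch may group longer invalid sequences differently, so they are not guaranteed to emit the same number of replacement characters.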

Developed in https://github.com/WordPress/wordpress-develop/pull/9498
Discussed in https://core.trac.wordpress.org/ticket/63837

Follow-up to: [60768].
Props askapache, chriscct7, Cyrille37, desrosj, dmsnell, helen, jonsurrell, kitchin, miqrogroove, pbearne, shailu25.
Fixes #63837, #29717.
See #63863.

File:
1 edited

  • trunk/src/wp-settings.php

    r60743 r60793
    @@ -112,4 +112,5 @@
     require ABSPATH . WPINC . '/class-wp-list-util.php';
     require ABSPATH . WPINC . '/class-wp-token-map.php';
    +require ABSPATH . WPINC . '/utf8.php';
     require ABSPATH . WPINC . '/formatting.php';
     require ABSPATH . WPINC . '/meta.php';