Make WordPress Core


Timestamp:
06/02/2024 03:14:35 PM
Author:
dmsnell
Message:

HTML API: Add custom text decoder.

Provides a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's html_entity_decode() function:

  • It isn't aware of 720 of the possible named character references in HTML, leaving many out that should be translated.
  • It isn't aware of the ambiguous ampersand rule, which allows conversion of character references in certain contexts when they are missing their closing ;.
  • It doesn't draw a distinction for the ambiguous ampersand rule when decoding attribute values instead of markup values.
  • Use of html_entity_decode() requires manually passing non-default parameter values to ensure it decodes properly.
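
A minimal sketch of two of the points above, using only PHP's built-in html_entity_decode(). The sample strings are made up for illustration; this does not exercise the WP_HTML_Decoder class added in this changeset.

```php
<?php
// Illustrative only: the input strings below are hypothetical examples.

// A character reference missing its closing semicolon is left untouched by
// html_entity_decode(), although an HTML parser decodes "&amp" followed by a
// space as "&".
$missing_semicolon = html_entity_decode(
	'Fish &amp chips',
	ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE
);

// The document-type flag changes which named references are recognized:
// "&apos;" is not an HTML 4.01 entity, so it only decodes with ENT_HTML5.
$with_html401 = html_entity_decode( '&apos;', ENT_QUOTES | ENT_HTML401 );
$with_html5   = html_entity_decode( '&apos;', ENT_QUOTES | ENT_HTML5 );

var_dump( $missing_semicolon, $with_html401, $with_html5 );
```

The flags are passed explicitly here because the default value of the second argument differs between PHP versions.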

This decoder also provides some conveniences, such as making a
single-pass and interruptible decode operation possible. This will
provide a number of opportunities to optimize detection and decoding
of things like value prefixes, and whether a value contains a given
substring.

Developed in https://github.com/WordPress/wordpress-develop/pull/6387
Discussed in https://core.trac.wordpress.org/ticket/61072

Props dmsnell, gziolo, jonsurrell, jorbin, westonruter, zieladam.
Fixes #61072.

File:
1 edited

  • trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php

    --- trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php (r58233)
    +++ trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php (r58281)
    @@ -16,8 +16,4 @@
      *    This would increase the size of the changes for some operations but leave more
      *    natural-looking output HTML.
    - *  - Properly decode HTML character references in `get_attribute()`. PHP's
    - *    `html_entity_decode()` is wrong in a couple ways: it doesn't account for the
    - *    no-ambiguous-ampersand rule, and it improperly handles the way semicolons may
    - *    or may not terminate a character reference.
      *
      * @package WordPress
    @@ -2500,5 +2496,5 @@
              */
             $enqueued_value = substr( $enqueued_text, $equals_at + 2, -1 );
    -        return html_entity_decode( $enqueued_value );
    +        return WP_HTML_Decoder::decode_attribute( $enqueued_value );
         }
     
    @@ -2573,5 +2569,5 @@
             $raw_value = substr( $this->html, $attribute->value_starts_at, $attribute->value_length );
     
    -        return html_entity_decode( $raw_value );
    +        return WP_HTML_Decoder::decode_attribute( $raw_value );
         }
     
    @@ -2873,5 +2869,5 @@
             }
     
    -        $decoded = html_entity_decode( $text, ENT_QUOTES | ENT_HTML5 | ENT_SUBSTITUTE );
    +        $decoded = WP_HTML_Decoder::decode_text_node( $text );
     
             /*