Make WordPress Core

Changes between Version 1 and Version 3 of Ticket #60698

05/09/2024 02:21:18 AM (7 weeks ago)


  • Ticket #60698 – Description

    In #6387 I have built a spec-compliant HTML text decoder which utilizes this token map. The new decoder is approximately 20% slower than calling `html_entity_decode()` directly, but it properly decodes what PHP can't. In fact, the performance bottleneck in that work is not even in the token map, but in converting a sequence of digits into a UTF-8 string from the given code point.
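The conversion step mentioned above can be sketched as follows. This is a minimal illustration and not the code from the referenced patch: the function name `sketch_code_point_to_utf8()` is hypothetical, and it only handles range and surrogate checks, leaving out the additional remapping rules the HTML specification applies to numeric character references.

```php
<?php
/**
 * Hypothetical helper: encodes a Unicode code point as a UTF-8 byte string.
 * Out-of-range values and surrogates become U+FFFD (a simplifying assumption).
 */
function sketch_code_point_to_utf8( int $code_point ): string {
	$is_surrogate = $code_point >= 0xD800 && $code_point <= 0xDFFF;
	if ( $code_point < 0 || $code_point > 0x10FFFF || $is_surrogate ) {
		return "\u{FFFD}";
	}

	// One byte: 0xxxxxxx.
	if ( $code_point < 0x80 ) {
		return chr( $code_point );
	}

	// Two bytes: 110xxxxx 10xxxxxx.
	if ( $code_point < 0x800 ) {
		return chr( 0xC0 | ( $code_point >> 6 ) )
			. chr( 0x80 | ( $code_point & 0x3F ) );
	}

	// Three bytes: 1110xxxx 10xxxxxx 10xxxxxx.
	if ( $code_point < 0x10000 ) {
		return chr( 0xE0 | ( $code_point >> 12 ) )
			. chr( 0x80 | ( ( $code_point >> 6 ) & 0x3F ) )
			. chr( 0x80 | ( $code_point & 0x3F ) );
	}

	// Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.
	return chr( 0xF0 | ( $code_point >> 18 ) )
		. chr( 0x80 | ( ( $code_point >> 12 ) & 0x3F ) )
		. chr( 0x80 | ( ( $code_point >> 6 ) & 0x3F ) )
		. chr( 0x80 | ( $code_point & 0x3F ) );
}

// The hex digits of a reference such as `&#x2209;` are parsed first, then encoded.
$decoded = sketch_code_point_to_utf8( intval( '2209', 16 ) );
```

Parsing the digits and assembling the bytes is per-reference work that no lookup table can skip, which is one plausible reason for it to dominate the token map lookup.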
    My proposal is adding a new class `WP_Token_Map` providing at least two methods for normal use:
     - `contains( $token )` returns whether the passed string is in the set.
     - `read_token( $text, $offset = 0, &$skip_bytes )` indicates if the character sequence starting at the given offset in the passed string forms a token in the set, and if so, returns the replacement for that token. It also sets `&$skip_bytes` to the length of the token so that calling code can advance past it.
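To make the two methods concrete, here is a deliberately naive sketch. `Sketch_Token_Map` is a hypothetical stand-in, not the proposed class: it uses a plain PHP array rather than optimized lookup tables, and returning `null` when nothing matches is an assumption, since the description above does not say what a failed lookup returns.

```php
<?php
/**
 * Hypothetical stand-in for the proposed class. It is backed by a plain
 * array and tries the longest candidate first, so `notin;` wins over `not`.
 */
final class Sketch_Token_Map {
	/** @var array Token => replacement. */
	private $mappings;

	/** @var int Byte length of the longest token. */
	private $longest = 0;

	public function __construct( array $mappings ) {
		$this->mappings = $mappings;
		foreach ( array_keys( $mappings ) as $token ) {
			$this->longest = max( $this->longest, strlen( (string) $token ) );
		}
	}

	/** Returns whether the passed string is a token in the set. */
	public function contains( string $token ): bool {
		return array_key_exists( $token, $this->mappings );
	}

	/**
	 * Returns the replacement for the token starting at `$offset`, or null
	 * (an assumption) when none matches. Sets `$skip_bytes` to the byte
	 * length of the matched token.
	 */
	public function read_token( string $text, int $offset = 0, &$skip_bytes = null ) {
		$max = min( $this->longest, strlen( $text ) - $offset );
		for ( $length = $max; $length > 0; $length-- ) {
			$candidate = substr( $text, $offset, $length );
			if ( array_key_exists( $candidate, $this->mappings ) ) {
				$skip_bytes = $length;
				return $this->mappings[ $candidate ];
			}
		}
		return null;
	}
}

$map = new Sketch_Token_Map(
	array(
		'amp;'   => '&',
		'lt;'    => '<',
		'not'    => '¬',
		'notin;' => '∉',
	)
);

// Offset 3 points just past the `&` in the text.
$text        = 'a &notin; b';
$skip_bytes  = 0;
$replacement = $map->read_token( $text, 3, $skip_bytes );
$next_offset = 3 + $skip_bytes;
```

The caller adds `$skip_bytes` to its own offset to continue scanning after the token, which is the purpose of the by-reference parameter.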
    It also provides utility functions for pre-computing these classes, as they are designed for relatively static cases where the actual code is intended to be generated dynamically, but stay static over time. For example, HTML5 defines the set of named character references and indicates that the list //shall not// change or be expanded (see the HTML5 spec). Precomputing can save on the start-up cost of building the optimized lookup tables.
     - `static::from_array( array $mappings )` generates a new token map from the given array, whose keys are tokens and whose values are the replacements.
     - `to_array()` dumps the set of mappings into an array suitable for passing back into `from_array()`.
     - `static::from_precomputed_table( ...$table )` instantiates a token set from a precomputed table, skipping the computation for building the table and sorting the tokens.
     - `precomputed_php_source_table()` generates PHP source code which can be loaded with the previous static method for maintenance of the core static token sets.
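The round trip between these four methods might look like the sketch below. `Sketch_Precomputed_Map` is hypothetical, and its table layout (two aligned lists, sorted longest token first) is an assumption made for illustration; the proposed class may store its tables quite differently.

```php
<?php
/**
 * Hypothetical illustration of the precompute round trip. The "table" is
 * two aligned lists: tokens sorted longest-first, and their replacements.
 */
final class Sketch_Precomputed_Map {
	private $tokens;
	private $replacements;

	private function __construct( array $tokens, array $replacements ) {
		$this->tokens       = $tokens;
		$this->replacements = $replacements;
	}

	/** Builds the table: this is the sorting work that precomputing avoids. */
	public static function from_array( array $mappings ): self {
		$tokens = array_map( 'strval', array_keys( $mappings ) );
		usort(
			$tokens,
			static function ( $a, $b ) {
				// Longest first, then byte order so generated source is stable.
				return strlen( $b ) <=> strlen( $a ) ?: strcmp( $a, $b );
			}
		);

		$replacements = array();
		foreach ( $tokens as $token ) {
			$replacements[] = $mappings[ $token ];
		}
		return new self( $tokens, $replacements );
	}

	/** Trusts that the lists are already sorted and aligned; does no work. */
	public static function from_precomputed_table( array $tokens, array $replacements ): self {
		return new self( $tokens, $replacements );
	}

	/** Dumps the mappings in a form accepted by from_array(). */
	public function to_array(): array {
		return array_combine( $this->tokens, $this->replacements );
	}

	/** Generates a PHP expression which recreates this map without sorting. */
	public function precomputed_php_source_table(): string {
		return sprintf(
			"%s::from_precomputed_table(\n%s,\n%s\n)",
			self::class,
			var_export( $this->tokens, true ),
			var_export( $this->replacements, true )
		);
	}
}

$map = Sketch_Precomputed_Map::from_array(
	array(
		'lt;'    => '<',
		'notin;' => '∉',
		'amp;'   => '&',
	)
);

// Generated once during maintenance, then committed as a static source file.
$source     = $map->precomputed_php_source_table();
$round_trip = Sketch_Precomputed_Map::from_array( $map->to_array() );
```

In this arrangement the generated source would be regenerated only when the token list changes, so normal requests pay for neither the sort nor the table construction.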
    == Other potential uses