HTML API: Add custom text decoder.
Provides a custom decoder for strings coming from HTML attributes and
markup. This custom decoder is necessary because of deficiencies in
PHP's html_entity_decode() function:
- It isn't aware of 720 of the possible named character references in
HTML, leaving many out that should be translated.
- It isn't aware of the ambiguous ampersand rule, which allows
conversion of character references in certain contexts when they
are missing their closing
;.
- It doesn't draw a distinction for the ambiguous ampersand rule
when decoding attribute values instead of markup values.
- Use of
html_entity_decode() requires manually passing non-default
paramter values to ensure it decodes properly.
This decoder also provides some conveniences, such as making a
single-pass and interruptable decode operation possible. This will
provide a number of opportunities to optimize detection and decoding
of things like value prefixes, and whether a value contains a given
substring.
Developed in https://github.com/WordPress/wordpress-develop/pull/6387
Discussed in https://core.trac.wordpress.org/ticket/61072
Props dmsnell, gziolo, jonsurrell, jorbin, westonruter, zieladam.
Fixes #61072.