Make WordPress Core


Timestamp:
07/31/2024 04:54:23 PM
Author:
dmsnell
Message:

HTML API: Introduce full parsing mode in HTML Processor.

Until now, the HTML Processor has supported only one parsing mode,
called _the fragment parsing mode_, in which it behaves the same way
that node.innerHTML = html does in the DOM. This mode assumes a
context node and doesn't support parsing an entire document.

As part of the work to add more HTML specification support to the HTML
API, this patch introduces a full parsing mode, which can parse an entire
HTML document from start to end, including the doctype declaration and
the head element's tags.
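To make the difference between the two modes concrete, here is a minimal PHP sketch of how the HTML specification sets up the tree builder in each case. It is not the WordPress implementation: the ParserStartState class and the full_parser_start() and fragment_parser_start() functions are hypothetical names, only a handful of context elements are mapped, and PHP 8.1+ is assumed. A full parse starts in the "initial" insertion mode with an empty stack of open elements; a fragment parse pushes a root html element and derives its insertion mode from the context node.

```php
<?php
// Hypothetical sketch, not the WordPress implementation: models how the HTML
// specification sets up the tree builder for a full parse versus a fragment
// parse. Requires PHP 8.1+ (readonly properties).

final class ParserStartState
{
    /**
     * @param string[]    $stack_of_open_elements Tag names, outermost first.
     * @param string|null $context_element        Context tag name, if any.
     */
    public function __construct(
        public readonly string $insertion_mode,
        public readonly array $stack_of_open_elements,
        public readonly ?string $context_element
    ) {}
}

// Full parsing mode: nothing has been seen yet, so the parser starts in the
// "initial" insertion mode, where a DOCTYPE is expected.
function full_parser_start(): ParserStartState
{
    return new ParserStartState('initial', [], null);
}

// Fragment parsing mode (as with node.innerHTML = html): a root <html>
// element is pushed and the insertion mode is derived from the context
// element. Only a few contexts are mapped in this sketch.
function fragment_parser_start(string $context): ParserStartState
{
    $mode = match ($context) {
        'body', 'div', 'p' => 'in body',
        'table'            => 'in table',
        'tr'               => 'in row',
        default            => throw new InvalidArgumentException(
            "Context <{$context}> is not covered by this sketch."
        ),
    };

    return new ParserStartState($mode, ['html'], $context);
}
```

Because a fragment parse with a body context begins "in body", a DOCTYPE or a head start tag in the input is ignored there as a parse error. The full parser begins before those constructs, which is what lets it handle a complete document.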

Developed in https://github.com/wordpress/wordpress-develop/pull/6977
Discussed in https://core.trac.wordpress.org/ticket/61576

Props: dmsnell, jonsurrell.
See #61576.

File:
1 edited

  • trunk/src/wp-includes/html-api/class-wp-html-processor-state.php

    r58779  r58836
       430     430
       431     431      /**
               432       * The recognized encoding of the input byte stream.
               433       *
               434       * > The stream of code points that comprises the input to the tokenization
               435       * > stage will be initially seen by the user agent as a stream of bytes
               436       * > (typically coming over the network or from the local file system).
               437       * > The bytes encode the actual characters according to a particular character
               438       * > encoding, which the user agent uses to decode the bytes into characters.
               439       *
               440       * @since 6.7.0
               441       *
               442       * @var string|null
               443       */
               444      public $encoding = null;
               445
               446      /**
               447       * The parser's confidence in the input encoding.
               448       *
               449       * > When the HTML parser is decoding an input byte stream, it uses a character
               450       * > encoding and a confidence. The confidence is either tentative, certain, or
               451       * > irrelevant. The encoding used, and whether the confidence in that encoding
               452       * > is tentative or certain, is used during the parsing to determine whether to
               453       * > change the encoding. If no encoding is necessary, e.g. because the parser is
               454       * > operating on a Unicode stream and doesn't have to use a character encoding
               455       * > at all, then the confidence is irrelevant.
               456       *
               457       * @since 6.7.0
               458       *
               459       * @var string
               460       */
               461      public $encoding_confidence = 'tentative';
               462
               463      /**
       432     464       * HEAD element pointer.
       433     465       *
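The two new properties work together: the encoding says how bytes become characters, and the confidence says whether that choice may still be revised. The following PHP sketch models that interplay under simplifying assumptions. EncodingState, sniff(), and change_encoding() are hypothetical names, not part of WP_HTML_Processor_State; the sketch sniffs only byte order marks (skipping the transport-layer and meta prescan steps of the real algorithm), assumes windows-1252 as the tentative fallback, and expects canonical encoding names rather than arbitrary charset labels. PHP 8.0+ is assumed.

```php
<?php
// Hypothetical sketch, not WordPress code: models how the HTML specification
// picks an input encoding and later uses the confidence to decide whether a
// declared charset may still change it. Requires PHP 8.0+.

final class EncodingState
{
    public function __construct(
        public string $encoding,
        // One of 'tentative', 'certain', or 'irrelevant'.
        public string $encoding_confidence
    ) {}

    /**
     * Simplified encoding sniffing: a byte order mark makes the encoding
     * certain; otherwise an assumed fallback is used tentatively. The real
     * algorithm also consults the transport layer and prescans for <meta>.
     */
    public static function sniff(string $bytes, string $fallback = 'windows-1252'): self
    {
        $boms = [
            "\xEF\xBB\xBF" => 'UTF-8',
            "\xFE\xFF"     => 'UTF-16BE',
            "\xFF\xFE"     => 'UTF-16LE',
        ];
        foreach ($boms as $bom => $encoding) {
            if (str_starts_with($bytes, $bom)) {
                return new self($encoding, 'certain');
            }
        }
        return new self($fallback, 'tentative');
    }

    /**
     * Simplified handling of a <meta charset> found while parsing.
     * Returns true if the input would have to be decoded again.
     */
    public function change_encoding(string $new_encoding): bool
    {
        // Only a tentative guess may be revised; 'certain' (e.g. from a BOM)
        // and 'irrelevant' (already-decoded Unicode input) ignore it.
        if ('tentative' !== $this->encoding_confidence) {
            return false;
        }

        // A declared UTF-16 encoding is treated as UTF-8, and
        // x-user-defined as windows-1252.
        if (in_array($new_encoding, ['UTF-16BE', 'UTF-16LE'], true)) {
            $new_encoding = 'UTF-8';
        } elseif ('x-user-defined' === $new_encoding) {
            $new_encoding = 'windows-1252';
        }

        // Either way, the encoding is now settled.
        $this->encoding_confidence = 'certain';
        if ($new_encoding === $this->encoding) {
            return false;
        }

        // A different encoding: the spec restarts decoding (or switches the
        // decoder on the fly when that yields the same characters so far).
        $this->encoding = $new_encoding;
        return true;
    }
}
```

For example, sniffing input with no BOM yields windows-1252 with tentative confidence, and a later change_encoding('UTF-8') switches to UTF-8, marks the confidence certain, and returns true. With a UTF-8 BOM the same call returns false, because the confidence is already certain.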