Make WordPress Core


Timestamp:
12/20/2023 05:50:04 PM
Author:
Bernhard Reiter
Message:

HTML API: Avoid processing incomplete tokens.

Currently the Tag Processor assumes that an input document is a full HTML document. Because of this, if there's lingering content after the last tag match it will treat that content as plaintext and skip over it. This is fine for the Tag Processor because if there is lingering content that isn't a valid tag then there's nothing for next_tag() to match.

However, in order to support a number of feature expansions it is important to recognize that the remaining content may involve partial syntax elements, such as incomplete tags, attributes, or comments.

In this patch we're adding a mode inside the Tag Processor which will flip when we start parsing HTML syntax but the document finishes before the token does. This will provide the ability to:

  • extend the input document,
  • avoid misinterpreting syntax as text, and
  • guess whether we have a complete document and know when we have an incomplete one.

In the process of building this patch a few bugs were identified and fixed in the Tag Processor, namely in its handling of incomplete syntax elements.

Props dmsnell, jonsurrell.
Fixes #60122, #60108.
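
The pause-and-rewind behavior described in the message can be illustrated with a toy scanner. This is a minimal sketch, not the Tag Processor's implementation: the class `Toy_Tag_Scanner` and its `$at` and `$tag` properties are hypothetical names invented for this example, it only recognizes opening tags, and it naively treats the first `>` after a `<` as the end of the tag. The state names and the `paused_at_incomplete_token()` method name mirror those introduced in the changeset.

```php
<?php
// Hypothetical toy scanner that only illustrates the technique; it is
// NOT the WP_HTML_Tag_Processor and skips most HTML syntax rules.
final class Toy_Tag_Scanner {
    const STATE_READY       = 'STATE_READY';
    const STATE_COMPLETE    = 'STATE_COMPLETE';
    const STATE_INCOMPLETE  = 'STATE_INCOMPLETE';
    const STATE_MATCHED_TAG = 'STATE_MATCHED_TAG';

    private $html;
    private $at    = 0;    // Byte offset of the next unscanned byte.
    private $state = self::STATE_READY;
    private $tag   = null; // Upper-cased name of the matched tag, if any.

    public function __construct( $html ) {
        $this->html = $html;
    }

    public function next_tag() {
        // Once complete or paused there is nothing more to scan.
        if ( self::STATE_COMPLETE === $this->state || self::STATE_INCOMPLETE === $this->state ) {
            return false;
        }

        // Clear state left over from the previous step.
        $this->state = self::STATE_READY;
        $this->tag   = null;
        $was_at      = $this->at;

        $opener = strpos( $this->html, '<', $this->at );
        if ( false === $opener ) {
            // Only text remains: the document is complete, not truncated.
            $this->at    = strlen( $this->html );
            $this->state = self::STATE_COMPLETE;
            return false;
        }

        $closer = strpos( $this->html, '>', $opener );
        if ( false === $closer ) {
            // The input ended inside a tag: rewind and pause.
            $this->at    = $was_at;
            $this->state = self::STATE_INCOMPLETE;
            return false;
        }

        $name_chars  = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
        $name_length = strspn( $this->html, $name_chars, $opener + 1 );
        $this->tag   = strtoupper( substr( $this->html, $opener + 1, $name_length ) );
        $this->at    = $closer + 1;
        $this->state = self::STATE_MATCHED_TAG;
        return true;
    }

    // Accessors are guarded by the parser state, as in the changeset.
    public function get_tag() {
        return self::STATE_MATCHED_TAG === $this->state ? $this->tag : null;
    }

    public function paused_at_incomplete_token() {
        return self::STATE_INCOMPLETE === $this->state;
    }
}

$scanner = new Toy_Tag_Scanner( '<p>This <div is="a" partial="token' );
var_dump( $scanner->next_tag() );                   // bool(true), matched P
var_dump( $scanner->next_tag() );                   // bool(false), DIV is truncated
var_dump( $scanner->paused_at_incomplete_token() ); // bool(true)
```

Because the accessors check the state rather than leftover offsets, a failed match cannot expose data from a tag that was only partially parsed.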

File:
1 edited

  • trunk/src/wp-includes/html-api/class-wp-html-tag-processor.php

    r57179 r57211  
    1616 *    This would increase the size of the changes for some operations but leave more
    1717 *    natural-looking output HTML.
    18  *  - Decode HTML character references within class names when matching. E.g. match having
    19  *    class `1<"2` needs to recognize `class="1&lt;&quot;2"`. Currently the Tag Processor
    20  *    will fail to find the right tag if the class name is encoded as such.
    2118 *  - Properly decode HTML character references in `get_attribute()`. PHP's
    2219 *    `html_entity_decode()` is wrong in a couple ways: it doesn't account for the
     
    107104 * given, it will return `true` (the only way to set `false` for an
    108105 * attribute is to remove it).
     106 *
     107 * #### When matching fails
     108 *
     109 * When `next_tag()` returns `false` it could mean different things:
     110 *
     111 *  - The requested tag wasn't found in the input document.
     112 *  - The input document ended in the middle of an HTML syntax element.
     113 *
     114 * When a document ends in the middle of a syntax element it will pause
     115 * the processor. This is to make it possible in the future to extend the
     116 * input document and proceed - an important requirement for chunked
     117 * streaming parsing of a document.
     118 *
     119 * Example:
     120 *
     121 *     $processor = new WP_HTML_Tag_Processor( 'This <div is="a" partial="token' );
     122 *     false === $processor->next_tag();
     123 *
     124 * If a special element (see next section) is encountered but no closing tag
     125 * is found it will count as an incomplete tag. The parser will pause as if
     126 * the opening tag were incomplete.
     127 *
     128 * Example:
     129 *
     130 *     $processor = new WP_HTML_Tag_Processor( '<style>// there could be more styling to come' );
     131 *     false === $processor->next_tag();
     132 *
     133 *     $processor = new WP_HTML_Tag_Processor( '<style>// this is everything</style><div>' );
     134 *     true === $processor->next_tag( 'DIV' );
     135 *
     136 * #### Special elements
     137 *
     138 * Some HTML elements are handled in a special way; their start and end tags
     139 * act like a void tag. These are special because their contents can't contain
     140 * HTML markup. Everything inside these elements is handled in a special way
     141 * and content that _appears_ like HTML tags inside of them isn't. There can
     142 * be no nesting in these elements.
     143 *
     144 * In the following list, "raw text" means that all of the content in the HTML
     145 * until the matching closing tag is treated verbatim without any replacements
     146 * and without any parsing.
     147 *
     148 *  - IFRAME allows no content but requires a closing tag.
     149 *  - NOEMBED (deprecated) content is raw text.
     150 *  - NOFRAMES (deprecated) content is raw text.
     151 *  - SCRIPT content is plaintext apart from legacy rules allowing `</script>` inside an HTML comment.
     152 *  - STYLE content is raw text.
     153 *  - TITLE content is plain text but character references are decoded.
     154 *  - TEXTAREA content is plain text but character references are decoded.
     155 *  - XMP (deprecated) content is raw text.
    109156 *
    110157 * ### Modifying HTML attributes for a found tag
     
    242289 * unquoted values will appear in the output with double-quotes.
    243290 *
     291 * ### Scripting Flag
     292 *
     293 * The Tag Processor parses HTML with the "scripting flag" disabled. This means
     294 * that it doesn't run any scripts while parsing the page. In a browser with
     295 * JavaScript enabled, for example, the script can change the parse of the
     296 * document as it loads. On the server, however, evaluating JavaScript is not
     297 * only impractical, but also unwanted.
     298 *
     299 * Practically this means that the Tag Processor will descend into NOSCRIPT
     300 * elements and process their child tags. Were the scripting flag enabled, such
     301 * as in a typical browser, the contents of NOSCRIPT are skipped entirely.
     302 *
     303 * This allows the HTML API to process the content that will be presented in
     304 * a browser when scripting is disabled, but it offers a different view of a
     305 * page than most browser sessions will experience. E.g. the tags inside the
     306 * NOSCRIPT disappear.
     307 *
     308 * ### Text Encoding
     309 *
     310 * The Tag Processor assumes that the input HTML document is encoded with a
     311 * text encoding compatible with 7-bit ASCII's '<', '>', '&', ';', '/', '=',
     312 * "'", '"', 'a' - 'z', 'A' - 'Z', and the whitespace characters ' ', tab,
     313 * carriage-return, newline, and form-feed.
     314 *
     315 * In practice, this includes almost every single-byte encoding as well as
     316 * UTF-8. Notably, however, it does not include UTF-16. If providing input
     317 * that's incompatible, then convert the encoding beforehand.
     318 *
    244319 * @since 6.2.0
    245320 * @since 6.2.1 Fix: Support for various invalid comments; attribute updates are case-insensitive.
    246321 * @since 6.3.2 Fix: Skip HTML-like content inside rawtext elements such as STYLE.
     322 * @since 6.5.0 Pauses processor when input ends in an incomplete syntax token.
     323 *              Introduces "special" elements which act like void elements, e.g. STYLE.
    247324 */
    248325class WP_HTML_Tag_Processor {
     
    316393     */
    317394    private $stop_on_tag_closers;
     395
     396    /**
     397     * Specifies mode of operation of the parser at any given time.
     398     *
     399     * | State         | Meaning                                                              |
     400     * | --------------|----------------------------------------------------------------------|
     401     * | *Ready*       | The parser is ready to run.                                          |
     402     * | *Complete*    | There is nothing left to parse.                                      |
     403     * | *Incomplete*  | The HTML ended in the middle of a token; nothing more can be parsed. |
     404     * | *Matched tag* | Found an HTML tag; it's possible to modify its attributes.           |
     405     *
     406     * @since 6.5.0
     407     *
     408     * @see WP_HTML_Tag_Processor::STATE_READY
     409     * @see WP_HTML_Tag_Processor::STATE_COMPLETE
     410     * @see WP_HTML_Tag_Processor::STATE_INCOMPLETE
     411     * @see WP_HTML_Tag_Processor::STATE_MATCHED_TAG
     412     *
     413     * @var string
     414     */
     415    private $parser_state = self::STATE_READY;
    318416
    319417    /**
     
    545643     *
    546644     * @since 6.2.0
     645     * @since 6.5.0 No longer processes incomplete tokens at end of document; pauses the processor at start of token.
    547646     *
    548647     * @param array|string|null $query {
     
    563662
    564663        do {
    565             if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     664            if ( false === $this->next_token() ) {
    566665                return false;
    567666            }
    568667
    569             // Find the next tag if it exists.
    570             if ( false === $this->parse_next_tag() ) {
    571                 $this->bytes_already_parsed = strlen( $this->html );
    572 
    573                 return false;
    574             }
    575 
    576             // Parse all of its attributes.
    577             while ( $this->parse_next_attribute() ) {
     668            if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    578669                continue;
    579670            }
    580671
    581             // Ensure that the tag closes before the end of the document.
    582             if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
    583                 return false;
    584             }
    585 
    586             $tag_ends_at = strpos( $this->html, '>', $this->bytes_already_parsed );
    587             if ( false === $tag_ends_at ) {
    588                 return false;
    589             }
    590             $this->token_length         = $tag_ends_at - $this->token_starts_at;
    591             $this->bytes_already_parsed = $tag_ends_at;
    592 
    593             // Finally, check if the parsed tag and its attributes match the search query.
    594672            if ( $this->matches() ) {
    595673                ++$already_found;
    596674            }
    597 
    598             /*
    599              * For non-DATA sections which might contain text that looks like HTML tags but
    600              * isn't, scan with the appropriate alternative mode. Looking at the first letter
    601              * of the tag name as a pre-check avoids a string allocation when it's not needed.
    602              */
    603             $t = $this->html[ $this->tag_name_starts_at ];
    604             if (
    605                 ! $this->is_closing_tag &&
     675        } while ( $already_found < $this->sought_match_offset );
     676
     677        return true;
     678    }
     679
     680    /**
     681     * Finds the next token in the HTML document.
     682     *
     683     * An HTML document can be viewed as a stream of tokens,
     684     * where tokens are things like HTML tags, HTML comments,
     685     * text nodes, etc. This method finds the next token in
     686     * the HTML document and returns whether it found one.
     687     *
     688     * If it starts parsing a token and reaches the end of the
     689     * document then it will seek to the start of the last
     690     * token and pause, returning `false` to indicate that it
     691     * failed to find a complete token.
     692     *
     693     * Possible token types, based on the HTML specification:
     694     *
     695     *  - an HTML tag, whether opening, closing, or void.
     696     *  - a text node - the plaintext inside tags.
     697     *  - an HTML comment.
     698     *  - a DOCTYPE declaration.
     699     *  - a processing instruction, e.g. `<?xml version="1.0" ?>`.
     700     *
     701     * The Tag Processor currently only supports the tag token.
     702     *
     703     * @since 6.5.0
     704     *
     705     * @return bool Whether a token was parsed.
     706     */
     707    public function next_token() {
     708        $this->get_updated_html();
     709        $was_at = $this->bytes_already_parsed;
     710
     711        // Don't proceed if there's nothing more to scan.
     712        if (
     713            self::STATE_COMPLETE === $this->parser_state ||
     714            self::STATE_INCOMPLETE === $this->parser_state
     715        ) {
     716            return false;
     717        }
     718
     719        /*
     720         * The next step in the parsing loop determines the parsing state;
     721         * clear it so that state doesn't linger from the previous step.
     722         */
     723        $this->parser_state = self::STATE_READY;
     724
     725        if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     726            $this->parser_state = self::STATE_COMPLETE;
     727            return false;
     728        }
     729
     730        // Find the next tag if it exists.
     731        if ( false === $this->parse_next_tag() ) {
     732            if ( self::STATE_INCOMPLETE === $this->parser_state ) {
     733                $this->bytes_already_parsed = $was_at;
     734            }
     735
     736            return false;
     737        }
     738
     739        // Parse all of its attributes.
     740        while ( $this->parse_next_attribute() ) {
     741            continue;
     742        }
     743
     744        // Ensure that the tag closes before the end of the document.
     745        if (
     746            self::STATE_INCOMPLETE === $this->parser_state ||
     747            $this->bytes_already_parsed >= strlen( $this->html )
     748        ) {
     749            // Does this appropriately clear state (parsed attributes)?
     750            $this->parser_state         = self::STATE_INCOMPLETE;
     751            $this->bytes_already_parsed = $was_at;
     752
     753            return false;
     754        }
     755
     756        $tag_ends_at = strpos( $this->html, '>', $this->bytes_already_parsed );
     757        if ( false === $tag_ends_at ) {
     758            $this->parser_state         = self::STATE_INCOMPLETE;
     759            $this->bytes_already_parsed = $was_at;
     760
     761            return false;
     762        }
     763        $this->parser_state         = self::STATE_MATCHED_TAG;
     764        $this->token_length         = $tag_ends_at - $this->token_starts_at;
     765        $this->bytes_already_parsed = $tag_ends_at;
     766
     767        /*
     768         * For non-DATA sections which might contain text that looks like HTML tags but
     769         * isn't, scan with the appropriate alternative mode. Looking at the first letter
     770         * of the tag name as a pre-check avoids a string allocation when it's not needed.
     771         */
     772        $t = $this->html[ $this->tag_name_starts_at ];
     773        if (
     774            ! $this->is_closing_tag &&
     775            (
     776                'i' === $t || 'I' === $t ||
     777                'n' === $t || 'N' === $t ||
     778                's' === $t || 'S' === $t ||
     779                't' === $t || 'T' === $t ||
     780                'x' === $t || 'X' === $t
     781            )
     782        ) {
     783            $tag_name = $this->get_tag();
     784
     785            if ( 'SCRIPT' === $tag_name && ! $this->skip_script_data() ) {
     786                $this->parser_state         = self::STATE_INCOMPLETE;
     787                $this->bytes_already_parsed = $was_at;
     788
     789                return false;
     790            } elseif (
     791                ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) &&
     792                ! $this->skip_rcdata( $tag_name )
     793            ) {
     794                $this->parser_state         = self::STATE_INCOMPLETE;
     795                $this->bytes_already_parsed = $was_at;
     796
     797                return false;
     798            } elseif (
    606799                (
    607                     'i' === $t || 'I' === $t ||
    608                     'n' === $t || 'N' === $t ||
    609                     's' === $t || 'S' === $t ||
    610                     't' === $t || 'T' === $t
    611                 ) ) {
    612                 $tag_name = $this->get_tag();
    613 
    614                 if ( 'SCRIPT' === $tag_name && ! $this->skip_script_data() ) {
    615                     $this->bytes_already_parsed = strlen( $this->html );
    616                     return false;
    617                 } elseif (
    618                     ( 'TEXTAREA' === $tag_name || 'TITLE' === $tag_name ) &&
    619                     ! $this->skip_rcdata( $tag_name )
    620                 ) {
    621                     $this->bytes_already_parsed = strlen( $this->html );
    622                     return false;
    623                 } elseif (
    624                     (
    625                         'IFRAME' === $tag_name ||
    626                         'NOEMBED' === $tag_name ||
    627                         'NOFRAMES' === $tag_name ||
    628                         'NOSCRIPT' === $tag_name ||
    629                         'STYLE' === $tag_name
    630                     ) &&
    631                     ! $this->skip_rawtext( $tag_name )
    632                 ) {
    633                     /*
    634                      * "XMP" should be here too but its rules are more complicated and require the
    635                      * complexity of the HTML Processor (it needs to close out any open P element,
    636                      * meaning it can't be skipped here or else the HTML Processor will lose its
    637                      * place). For now, it can be ignored as it's a rare HTML tag in practice and
    638                      * any normative HTML should be using PRE instead.
    639                      */
    640                     $this->bytes_already_parsed = strlen( $this->html );
    641                     return false;
    642                 }
    643             }
    644         } while ( $already_found < $this->sought_match_offset );
     800                    'IFRAME' === $tag_name ||
     801                    'NOEMBED' === $tag_name ||
     802                    'NOFRAMES' === $tag_name ||
     803                    'STYLE' === $tag_name ||
     804                    'XMP' === $tag_name
     805                ) &&
     806                ! $this->skip_rawtext( $tag_name )
     807            ) {
     808                $this->parser_state         = self::STATE_INCOMPLETE;
     809                $this->bytes_already_parsed = $was_at;
     810
     811                return false;
     812            }
     813        }
    645814
    646815        return true;
    647816    }
    648817
     818    /**
     819     * Whether the processor paused because the input HTML document ended
     820     * in the middle of a syntax element, such as in the middle of a tag.
     821     *
     822     * Example:
     823     *
     824     *     $processor = new WP_HTML_Tag_Processor( '<input type="text" value="Th' );
     825     *     false      === $processor->next_tag();
     826     *     true       === $processor->paused_at_incomplete_token();
     827     *
     828     * @since 6.5.0
     829     *
     830     * @return bool Whether the parse paused at the start of an incomplete token.
     831     */
     832    public function paused_at_incomplete_token() {
     833        return self::STATE_INCOMPLETE === $this->parser_state;
     834    }
    649835
    650836    /**
     
    665851     */
    666852    public function class_list() {
     853        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
     854            return;
     855        }
     856
    667857        /** @var string $class contains the string value of the class attribute, with character references decoded. */
    668858        $class = $this->get_attribute( 'class' );
     
    720910     */
    721911    public function has_class( $wanted_class ) {
    722         if ( ! $this->tag_name_starts_at ) {
     912        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    723913            return null;
    724914        }
     
    8171007     */
    8181008    public function set_bookmark( $name ) {
    819         if ( null === $this->tag_name_starts_at ) {
     1009        // It only makes sense to set a bookmark if the parser has paused on a concrete token.
     1010        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    8201011            return false;
    8211012        }
     
    8961087            // Fail if there is no possible tag closer.
    8971088            if ( false === $at || ( $at + $tag_length ) >= $doc_length ) {
    898                 $this->bytes_already_parsed = $doc_length;
    8991089                return false;
    9001090            }
     
    9231113            $at                        += $tag_length;
    9241114            $this->bytes_already_parsed = $at;
     1115
     1116            if ( $at >= strlen( $html ) ) {
     1117                return false;
     1118            }
    9251119
    9261120            /*
     
    10741268                }
    10751269
     1270                if ( $this->bytes_already_parsed >= $doc_length ) {
     1271                    $this->parser_state = self::STATE_INCOMPLETE;
     1272
     1273                    return false;
     1274                }
     1275
    10761276                if ( '>' === $html[ $this->bytes_already_parsed ] ) {
    10771277                    $this->bytes_already_parsed = $closer_potentially_starts_at;
     
    11081308        while ( false !== $at && $at < $doc_length ) {
    11091309            $at = strpos( $html, '<', $at );
     1310
     1311            /*
     1312             * This does not imply an incomplete parse; it indicates that there
     1313             * can be nothing left in the document other than a #text node.
     1314             */
    11101315            if ( false === $at ) {
    11111316                return false;
     
    11141319            $this->token_starts_at = $at;
    11151320
    1116             if ( '/' === $this->html[ $at + 1 ] ) {
     1321            if ( $at + 1 < $doc_length && '/' === $this->html[ $at + 1 ] ) {
    11171322                $this->is_closing_tag = true;
    11181323                ++$at;
     
    11481353             * the document. There is nothing left to parse.
    11491354             */
    1150             if ( $at + 1 >= strlen( $html ) ) {
     1355            if ( $at + 1 >= $doc_length ) {
     1356                $this->parser_state = self::STATE_INCOMPLETE;
     1357
    11511358                return false;
    11521359            }
     
    11621369                 */
    11631370                if (
    1164                     strlen( $html ) > $at + 3 &&
     1371                    $doc_length > $at + 3 &&
    11651372                    '-' === $html[ $at + 2 ] &&
    11661373                    '-' === $html[ $at + 3 ]
     
    11681375                    $closer_at = $at + 4;
    11691376                    // If it's not possible to close the comment then there is nothing more to scan.
    1170                     if ( strlen( $html ) <= $closer_at ) {
     1377                    if ( $doc_length <= $closer_at ) {
     1378                        $this->parser_state = self::STATE_INCOMPLETE;
     1379
    11711380                        return false;
    11721381                    }
     
    11861395                     */
    11871396                    --$closer_at; // Pre-increment inside condition below reduces risk of accidental infinite looping.
    1188                     while ( ++$closer_at < strlen( $html ) ) {
     1397                    while ( ++$closer_at < $doc_length ) {
    11891398                        $closer_at = strpos( $html, '--', $closer_at );
    11901399                        if ( false === $closer_at ) {
     1400                            $this->parser_state = self::STATE_INCOMPLETE;
     1401
    11911402                            return false;
    11921403                        }
    11931404
    1194                         if ( $closer_at + 2 < strlen( $html ) && '>' === $html[ $closer_at + 2 ] ) {
     1405                        if ( $closer_at + 2 < $doc_length && '>' === $html[ $closer_at + 2 ] ) {
    11951406                            $at = $closer_at + 3;
    11961407                            continue 2;
    11971408                        }
    11981409
    1199                         if ( $closer_at + 3 < strlen( $html ) && '!' === $html[ $closer_at + 2 ] && '>' === $html[ $closer_at + 3 ] ) {
     1410                        if ( $closer_at + 3 < $doc_length && '!' === $html[ $closer_at + 2 ] && '>' === $html[ $closer_at + 3 ] ) {
    12001411                            $at = $closer_at + 4;
    12011412                            continue 2;
     
    12101421                 */
    12111422                if (
    1212                     strlen( $html ) > $at + 8 &&
     1423                    $doc_length > $at + 8 &&
    12131424                    '[' === $html[ $at + 2 ] &&
    12141425                    'C' === $html[ $at + 3 ] &&
     
    12211432                    $closer_at = strpos( $html, ']]>', $at + 9 );
    12221433                    if ( false === $closer_at ) {
     1434                        $this->parser_state = self::STATE_INCOMPLETE;
     1435
    12231436                        return false;
    12241437                    }
     
    12341447                 */
    12351448                if (
    1236                     strlen( $html ) > $at + 8 &&
     1449                    $doc_length > $at + 8 &&
    12371450                    ( 'D' === $html[ $at + 2 ] || 'd' === $html[ $at + 2 ] ) &&
    12381451                    ( 'O' === $html[ $at + 3 ] || 'o' === $html[ $at + 3 ] ) &&
     
    12451458                    $closer_at = strpos( $html, '>', $at + 9 );
    12461459                    if ( false === $closer_at ) {
     1460                        $this->parser_state = self::STATE_INCOMPLETE;
     1461
    12471462                        return false;
    12481463                    }
     
    12541469                /*
    12551470                 * Anything else here is an incorrectly-opened comment and transitions
    1256                  * to the bogus comment state - skip to the nearest >.
     1471                 * to the bogus comment state - skip to the nearest >. If no closer is
     1472                 * found then the HTML was truncated inside the markup declaration.
    12571473                 */
    12581474                $at = strpos( $html, '>', $at + 1 );
     1475                if ( false === $at ) {
     1476                    $this->parser_state = self::STATE_INCOMPLETE;
     1477
     1478                    return false;
     1479                }
     1480
    12591481                continue;
    12601482            }
     
    12621484            /*
    12631485             * </> is a missing end tag name, which is ignored.
     1486             *
     1487             * This was also known as the "presumptuous empty tag"
     1488             * in early discussions as it was proposed to close
     1489             * the nearest previous opening tag.
    12641490             *
    12651491             * See https://html.spec.whatwg.org/#parse-error-missing-end-tag-name
     
    12771503                $closer_at = strpos( $html, '>', $at + 2 );
    12781504                if ( false === $closer_at ) {
     1505                    $this->parser_state = self::STATE_INCOMPLETE;
     1506
    12791507                    return false;
    12801508                }
     
    12911519             */
    12921520            if ( $this->is_closing_tag ) {
     1521                // No chance of finding a closer.
     1522                if ( $at + 3 > $doc_length ) {
     1523                    return false;
     1524                }
     1525
    12931526                $closer_at = strpos( $html, '>', $at + 3 );
    12941527                if ( false === $closer_at ) {
     1528                    $this->parser_state = self::STATE_INCOMPLETE;
     1529
    12951530                    return false;
    12961531                }
     
    13171552        $this->bytes_already_parsed += strspn( $this->html, " \t\f\r\n/", $this->bytes_already_parsed );
    13181553        if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     1554            $this->parser_state = self::STATE_INCOMPLETE;
     1555
    13191556            return false;
    13201557        }
     
    13391576        $this->bytes_already_parsed += $name_length;
    13401577        if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     1578            $this->parser_state = self::STATE_INCOMPLETE;
     1579
    13411580            return false;
    13421581        }
     
    13441583        $this->skip_whitespace();
    13451584        if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     1585            $this->parser_state = self::STATE_INCOMPLETE;
     1586
    13461587            return false;
    13471588        }
     
    13521593            $this->skip_whitespace();
    13531594            if ( $this->bytes_already_parsed >= strlen( $this->html ) ) {
     1595                $this->parser_state = self::STATE_INCOMPLETE;
     1596
    13541597                return false;
    13551598            }
     
    13781621
    13791622        if ( $attribute_end >= strlen( $this->html ) ) {
     1623            $this->parser_state = self::STATE_INCOMPLETE;
     1624
    13801625            return false;
    13811626        }
     
    14441689     */
    14451690    private function after_tag() {
    1446         $this->get_updated_html();
    14471691        $this->token_starts_at      = null;
    14481692        $this->token_length         = null;
     
    17872031     */
    17882032    private function get_enqueued_attribute_value( $comparable_name ) {
     2033        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
     2034            return false;
     2035        }
     2036
    17892037        if ( ! isset( $this->lexical_updates[ $comparable_name ] ) ) {
    17902038            return false;
     
    18542102     */
    18552103    public function get_attribute( $name ) {
    1856         if ( null === $this->tag_name_starts_at ) {
     2104        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    18572105            return null;
    18582106        }
     
    19342182     */
    19352183    public function get_attribute_names_with_prefix( $prefix ) {
    1936         if ( $this->is_closing_tag || null === $this->tag_name_starts_at ) {
     2184        if (
     2185            self::STATE_MATCHED_TAG !== $this->parser_state ||
     2186            $this->is_closing_tag
     2187        ) {
    19372188            return null;
    19382189        }
     
    19662217     */
    19672218    public function get_tag() {
    1968         if ( null === $this->tag_name_starts_at ) {
     2219        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    19692220            return null;
    19702221        }
     
    19932244     */
    19942245    public function has_self_closing_flag() {
    1995         if ( ! $this->tag_name_starts_at ) {
     2246        if ( self::STATE_MATCHED_TAG !== $this->parser_state ) {
    19962247            return false;
    19972248        }
     
    20252276     */
    20262277    public function is_tag_closer() {
    2027         return $this->is_closing_tag;
     2278        return (
     2279            self::STATE_MATCHED_TAG === $this->parser_state &&
     2280            $this->is_closing_tag
     2281        );
    20282282    }
    20292283
     
    20452299     */
    20462300    public function set_attribute( $name, $value ) {
    2047         if ( $this->is_closing_tag || null === $this->tag_name_starts_at ) {
     2301        if (
     2302            self::STATE_MATCHED_TAG !== $this->parser_state ||
     2303            $this->is_closing_tag
     2304        ) {
    20482305            return false;
    20492306        }
     
    21782435     */
    21792436    public function remove_attribute( $name ) {
    2180         if ( $this->is_closing_tag ) {
     2437        if (
     2438            self::STATE_MATCHED_TAG !== $this->parser_state ||
     2439            $this->is_closing_tag
     2440        ) {
    21812441            return false;
    21822442        }
     
    22552515     */
    22562516    public function add_class( $class_name ) {
    2257         if ( $this->is_closing_tag ) {
     2517        if (
     2518            self::STATE_MATCHED_TAG !== $this->parser_state ||
     2519            $this->is_closing_tag
     2520        ) {
    22582521            return false;
    22592522        }
    22602523
    2261         if ( null !== $this->tag_name_starts_at ) {
    2262             $this->classname_updates[ $class_name ] = self::ADD_CLASS;
    2263         }
     2524        $this->classname_updates[ $class_name ] = self::ADD_CLASS;
    22642525
    22652526        return true;
     
    22752536     */
    22762537    public function remove_class( $class_name ) {
    2277         if ( $this->is_closing_tag ) {
     2538        if (
     2539            self::STATE_MATCHED_TAG !== $this->parser_state ||
     2540            $this->is_closing_tag
     2541        ) {
    22782542            return false;
    22792543        }
     
    24812745        return true;
    24822746    }
     2747
     2748    /**
     2749     * Parser Ready State
     2750     *
     2751     * Indicates that the parser is ready to run and waiting for a state transition.
     2752     * It may not have started yet, or it may have just finished parsing a token and
     2753     * is ready to find the next one.
     2754     *
     2755     * @since 6.5.0
     2756     *
     2757     * @access private
     2758     */
     2759    const STATE_READY = 'STATE_READY';
     2760
     2761    /**
     2762     * Parser Complete State
     2763     *
     2764     * Indicates that the parser has reached the end of the document and there is
     2765     * nothing left to scan. It finished parsing the last token completely.
     2766     *
     2767     * @since 6.5.0
     2768     *
     2769     * @access private
     2770     */
     2771    const STATE_COMPLETE = 'STATE_COMPLETE';
     2772
     2773    /**
     2774     * Parser Incomplete State
     2775     *
     2776     * Indicates that the parser has reached the end of the document before finishing
     2777     * a token. It started parsing a token but there is a possibility that the input
     2778     * HTML document was truncated in the middle of a token.
     2779     *
     2780     * The parser is reset at the start of the incomplete token and has paused. There
     2781     * is nothing more that can be scanned unless provided a more complete document.
     2782     *
     2783     * @since 6.5.0
     2784     *
     2785     * @access private
     2786     */
     2787    const STATE_INCOMPLETE = 'STATE_INCOMPLETE';
     2788
     2789    /**
     2790     * Parser Matched Tag State
     2791     *
     2792     * Indicates that the parser has found an HTML tag and it's possible to get
     2793     * the tag name and read or modify its attributes (if it's not a closing tag).
     2794     *
     2795     * @since 6.5.0
     2796     *
     2797     * @access private
     2798     */
     2799    const STATE_MATCHED_TAG = 'STATE_MATCHED_TAG';
    24832800}