Make WordPress Core


Timestamp:
06/03/2024 07:45:57 PM (8 months ago)
Author:
dmsnell
Message:

HTML API: Report real and virtual nodes in the HTML Processor.

HTML is a kind of short-hand for a DOM structure. This means that there are
many cases in HTML where an element's opening tag or closing tag is missing (or
both). This is because many of the parsing rules imply creating elements in the
DOM which may not exist in the text of the HTML.

The HTML Processor, being the higher-level counterpart to the Tag Processor, is
already aware of these nodes, but since its inception has not paused on them
when scanning through a document. Instead, these are visible when pausing on a
child of such an element, but otherwise not seen.

In this patch the HTML Processor starts exposing those implicitly-created nodes,
including opening tags and closing tags, that aren't found in the text content
of the HTML input document.

Previously, the sequence of matched tokens when scanning with
WP_HTML_Processor::next_token() would depend on how the HTML document was written,
but with this patch, all semantically equal HTML documents will parse and scan in
the same exact manner, presenting an idealized or "perfect" view of the document
the same way as would occur when traversing a DOM in a browser.
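
As a rough illustration of the behavior described above, the sketch below collects the token sequence for a fragment whose closing tags are omitted and for one that spells them out. It assumes a WordPress build that includes this changeset is already loaded. The helper name `collect_token_names()` and the sample markup are hypothetical and are not part of the patch.

```php
<?php
/**
 * Hypothetical helper (not part of the patch): lists every token the
 * HTML Processor pauses on, marking closing tags with a leading slash.
 */
function collect_token_names( string $html ): array {
	// Outside of WordPress the class is unavailable, so return nothing.
	if ( ! class_exists( 'WP_HTML_Processor' ) ) {
		return array();
	}

	// create_fragment() may return null when a processor cannot be created.
	$processor = WP_HTML_Processor::create_fragment( $html );
	if ( null === $processor ) {
		return array();
	}

	$names = array();
	while ( $processor->next_token() ) {
		$prefix  = $processor->is_tag_closer() ? '/' : '';
		$names[] = $prefix . $processor->get_token_name();
	}

	return $names;
}

// Both inputs describe the same DOM; only the second writes out the closing tags.
$implicit = collect_token_names( '<p>One<p>Two' );
$explicit = collect_token_names( '<p>One</p><p>Two</p>' );
```

If the patch behaves as the message describes, the two arrays should be identical, with the implied closing `P` tags reported for the first input as well.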

Developed in https://github.com/WordPress/wordpress-develop/pull/6348
Discussed in https://core.trac.wordpress.org/ticket/61348

Props audrasjb, dmsnell, gziolo, jonsurrell.
Fixes #61348.

File:
1 edited

Index: trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php
===================================================================
--- trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php (revision 57768)
+++ trunk/tests/phpunit/tests/html-api/wpHtmlProcessorBreadcrumbs.php (revision 58304)
@@ -232,10 +232,16 @@
         $processor = WP_HTML_Processor::create_fragment( $html );
 
-        while ( $processor->step() && null === $processor->get_attribute( 'supported' ) ) {
+        while ( $processor->next_token() && null === $processor->get_attribute( 'supported' ) ) {
             continue;
         }
 
+        $this->assertNull(
+            $processor->get_last_error(),
+            'Bailed on unsupported input before finding supported checkpoint: check test code.'
+        );
+
         $this->assertTrue( $processor->get_attribute( 'supported' ), 'Did not find required supported element.' );
-        $this->assertFalse( $processor->step(), "Didn't properly reject unsupported markup: {$description}" );
+        $processor->next_token();
+        $this->assertNotNull( $processor->get_last_error(), "Didn't properly reject unsupported markup: {$description}" );
     }
 
@@ -248,5 +254,5 @@
         return array(
             'A with formatting following unclosed A' => array(
-                '<a><strong>Click <a supported><big unsupported>Here</big></a></strong></a>',
+                '<a><strong>Click <span supported><a unsupported><big>Here</big></a></strong></a>',
                 'Unclosed formatting requires complicated reconstruction.',
             ),
@@ -326,5 +332,5 @@
             'EM inside DIV'                         => array( '<div>The weather is <em target>beautiful</em>.</div>', array( 'HTML', 'BODY', 'DIV', 'EM' ), 1 ),
             'EM after closed EM'                    => array( '<em></em><em target></em>', array( 'HTML', 'BODY', 'EM' ), 2 ),
-            'EM after closed EMs'                   => array( '<em></em><em><em></em></em><em></em><em></em><em target></em>', array( 'HTML', 'BODY', 'EM' ), 6 ),
+            'EM after closed EMs'                   => array( '<em></em><em><em></em></em><em></em><em></em><em target></em>', array( 'HTML', 'BODY', 'EM' ), 5 ),
             'EM after unclosed EM'                  => array( '<em><em target></em>', array( 'HTML', 'BODY', 'EM', 'EM' ), 1 ),
             'EM after unclosed EM after DIV'        => array( '<em><div><em target>', array( 'HTML', 'BODY', 'EM', 'DIV', 'EM' ), 1 ),