Make WordPress Core

Opened 8 weeks ago

Closed 5 weeks ago

Last modified 5 weeks ago

#62521 closed defect (bug) (fixed)

HTML Processor virtual token seek does not seek to correct location

Reported by: jonsurrell's profile jonsurrell Owned by: gziolo's profile gziolo
Milestone: 6.8 Priority: normal
Severity: normal Version: 6.7
Component: HTML API Keywords: has-patch has-unit-tests
Focuses: Cc:

Description (last modified by jonsurrell)

When an HTML_Processor bookmark is set at a virtual token (a node in the resulting document that does not correspond to an HTML token present in the input string), seek behavior becomes unreliable.

For example:

<?php
$processor = WP_HTML_Processor::create_full_parser( 'text only' );

$advance_and_log_tag = function () use ( $processor ) {
        assert( $processor->next_tag( array( 'tag_closers' => 'visit' ) ) );
        echo str_repeat( '  ', $processor->get_current_depth() ) .
                ( $processor->is_tag_closer() ? '  /' : '' ) .
                $processor->get_token_name() .
                "\n";
};

$advance_and_log_tag();
$advance_and_log_tag();
$advance_and_log_tag();
$advance_and_log_tag();
// Now at `<BODY>` virtual token, not present in the HTML string.
assert( 'BODY' === $processor->get_token_name() && ! $processor->is_tag_closer() );
assert( $processor->set_bookmark( 'apparently <BODY> open tag' ) );
$advance_and_log_tag();
$advance_and_log_tag();
// Now at `</HTML>` virtual token, not present in the HTML string.
assert( $processor->seek( 'apparently <BODY> open tag' ) );
// Expected to return to `<BODY>` open tag.
echo $processor->get_token_name() . "\n";
// prints: #text
assert( 'BODY' === $processor->get_token_name() );
// AssertionError!

The above prints:

  HTML
    HEAD
    /HEAD
    BODY
    /BODY
  /HTML
#text

Fatal error: Uncaught AssertionError: assert('BODY' === $processor->get_token_name()) …

Change History (8)

#1 @jonsurrell
8 weeks ago

  • Description modified (diff)

#2 @jonsurrell
8 weeks ago

Seeking relies on moving to a specific location in the HTML string and advancing to the next token at that location. This means that seeking to virtual tokens unsupported.

I think the most reasonable thing when attempting to set a bookmark at a virtual token is to return false and refuse to set the bookmark.

There could be some much more complicated virtual token accounting, but that would introduce significant complexity and can be considered an enhancement.

#3 @jonsurrell
8 weeks ago

  • Description modified (diff)

This ticket was mentioned in PR #7862 on WordPress/wordpress-develop by @jonsurrell.


8 weeks ago
#4

  • Keywords has-patch has-unit-tests added

Prevent tokens (with unreliable seek behavior) from being set at virtual tokens.

When an HTML_Processor bookmark is set at a virtual token (a node in the resulting document that does not correspond to an HTML token present in the input string), seek behavior becomes unreliable.

For example:

$processor = WP_HTML_Processor::create_full_parser( 'text only' );

$advance_and_log_tag = function () use ( $processor ) {
        assert( $processor->next_tag( array( 'tag_closers' => 'visit' ) ) );
        echo str_repeat( '  ', $processor->get_current_depth() ) .
                ( $processor->is_tag_closer() ? '  /' : '' ) .
                $processor->get_token_name() .
                "\n";
};

$advance_and_log_tag();
$advance_and_log_tag();
$advance_and_log_tag();
$advance_and_log_tag();
// Now at `<BODY>` virtual token, not present in the HTML string.
assert( 'BODY' === $processor->get_token_name() && ! $processor->is_tag_closer() );
assert( $processor->set_bookmark( 'apparently <BODY> open tag' ) );
$advance_and_log_tag();
$advance_and_log_tag();
// Now at `</HTML>` virtual token, not present in the HTML string.
assert( $processor->seek( 'apparently <BODY> open tag' ) );
// Expected to return to `<BODY>` open tag.
echo $processor->get_token_name() . "\n";
// prints: #text
assert( 'BODY' === $processor->get_token_name() );
// AssertionError!

The above prints:

  HTML
    HEAD
    /HEAD
    BODY
    /BODY
  /HTML
#text

Fatal error: Uncaught AssertionError: assert('BODY' === $processor->get_token_name()) …

Trac ticket: https://core.trac.wordpress.org/ticket/62521

#5 @desrosj
8 weeks ago

  • Milestone changed from Awaiting Review to 6.8

@gziolo commented on PR #7862:


5 weeks ago
#6

This looks good. Let's improve the developer hint when incorrect usage is detected.

#7 @gziolo
5 weeks ago

  • Owner set to gziolo
  • Resolution set to fixed
  • Status changed from new to closed

In 59502:

HTML API: Prevent bookmarks from being set on virtual tokens

Fixes the issue when an HTML_Processor bookmark was set at a virtual token (a node in the resulting document that does not correspond to an HTML token present in the input string), seek behavior was unreliable.

Props jonsurrell, gziolo.
Fixes #62521.

Note: See TracTickets for help on using tickets.