Make WordPress Core

Opened 6 weeks ago

Last modified 3 weeks ago

#62290 reopened defect (bug)

Unable to seek to any bookmark in HTML Processor

Reported by: westonruter's profile westonruter Owned by: bernhard-reiter's profile Bernhard Reiter
Milestone: 6.7.2 Priority: normal
Severity: normal Version: 6.6
Component: HTML API Keywords: has-patch has-unit-tests dev-feedback fixed-major
Focuses: Cc:

Description

Given this test case:

<?php
$bookmark_name = 'the-bookmark';

$html = <<<HTML
<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <div id="$bookmark_name"></div>
        </body>
</html>
HTML;

$processor = WP_HTML_Processor::create_full_parser( $html );
while ( $processor->next_tag() ) {
        if ( 'DIV' === $processor->get_tag() && ! $processor->set_bookmark( $bookmark_name ) ) {
                throw new Exception( "Failed to set bookmark" );
        }
}
if ( ! $processor->has_bookmark( $bookmark_name ) ) {
        throw new Exception( "Unexpectedly has_bookmark returned false." );
}
if ( ! $processor->seek( $bookmark_name ) ) {
        throw new Exception( "Failed to seek to bookmark." );
}

The call to seek unexpectedly returns false, even though the bookmark was set and has_bookmark() returned true.

Change History (17)

#1 follow-up: @westonruter
6 weeks ago

  • Milestone changed from Awaiting Review to 6.7.1
  • Version changed from trunk to 6.6

The issue also occurs with fragments:

<?php
$bookmark_name = 'the-bookmark';

$html = <<<HTML
<div id="$bookmark_name"></div>
HTML;

$processor = WP_HTML_Processor::create_fragment( $html );
while ( $processor->next_tag() ) {
        if ( 'DIV' === $processor->get_tag() ) {
                if ( ! $processor->set_bookmark( $bookmark_name ) ) {
                        throw new Exception( "Failed to set bookmark" );
                }
        }
}
if ( ! $processor->has_bookmark( $bookmark_name ) ) {
        throw new Exception( "Unexpectedly has_bookmark returned false." );
}
if ( ! $processor->seek( $bookmark_name ) ) {
        throw new Exception( "Failed to seek to bookmark." );
}

With this code, I get no exception if I run it against the 6.5 branch, but I get exceptions when I run it against the 6.6 and 6.7 branches.

#2 in reply to: ↑ 1 ; follow-up: @mi5t4n
6 weeks ago

Replying to westonruter:

The issue also occurs with fragments:

<?php
$bookmark_name = 'the-bookmark';

$html = <<<HTML
<div id="$bookmark_name"></div>
HTML;

$processor = WP_HTML_Processor::create_fragment( $html );
while ( $processor->next_tag() ) {
        if ( 'DIV' === $processor->get_tag() ) {
                if ( ! $processor->set_bookmark( $bookmark_name ) ) {
                        throw new Exception( "Failed to set bookmark" );
                }
        }
}
if ( ! $processor->has_bookmark( $bookmark_name ) ) {
        throw new Exception( "Unexpectedly has_bookmark returned false." );
}
if ( ! $processor->seek( $bookmark_name ) ) {
        throw new Exception( "Failed to seek to bookmark." );
}

With this code, I get no exception if I run it against the 6.5 branch, but I get exceptions when I run it against the 6.6 and 6.7 branches.

Running above code on the current trunk (6.8-alpha-59274-src), does not throw any exceptions.

Last edited 6 weeks ago by mi5t4n (previous) (diff)

#3 in reply to: ↑ 2 @westonruter
6 weeks ago

Replying to mi5t4n:

Running above code on the current trunk (6.8-alpha-59274-src), does not throw any exceptions.

You're right. I guess I didn't re-test in trunk when I switched to create_fragment since create_full_parser didn't exist in 6.5. Or I didn't re-test in trunk when I removed the root HTML element.

The following does fail for me in trunk (6.8-alpha-59287):

<?php
$bookmark_name = 'the-bookmark';

$html = <<<HTML
<html lang="en">
        <head>
                <meta charset="utf-8">
                <title>...</title>
        </head>
        <body>
                <div id="$bookmark_name"></div>
        </body>
</html>
HTML;

try {
        $processor = WP_HTML_Processor::create_full_parser( $html );
        while ( $processor->next_tag() ) {
                if ( 'DIV' === $processor->get_tag() ) {
                        if ( ! $processor->set_bookmark( $bookmark_name ) ) {
                                throw new Exception( "Failed to set bookmark" );
                        }
                }
        }
        if ( ! $processor->has_bookmark( $bookmark_name ) ) {
                throw new Exception( "Unexpectedly has_bookmark returned false." );
        }
        if ( ! $processor->seek( $bookmark_name ) ) {
                throw new Exception( "Failed to seek to bookmark." );
        }
        printf( "PASS (WP %s)\n", get_bloginfo( 'version' ) );
} catch ( Exception $e ) {
        printf( "FAIL: %s (WP %s)\n", $e->getMessage(), get_bloginfo( 'version' ) );
}

I get:

Notice: Function WP_HTML_Tag_Processor::seek was called <strong>incorrectly</strong>. Unknown bookmark name. Please see <a href="https://developer.wordpress.org/advanced-administration/debug/debug-wordpress/">Debugging in WordPress</a> for more information. (This message was added in version 6.2.0.) in /var/www/html/wp-includes/functions.php on line 6087
FAIL: Failed to seek to bookmark. (WP 6.8-alpha-59287)

If I replace create_full_parser with create_fragment I also get a failure but this time with PHP warnings:

Warning: Attempt to read property "node_name" on null in /var/www/html/wp-includes/html-api/class-wp-html-processor.php on line 5518
Warning: Undefined variable $node in /var/www/html/wp-includes/html-api/class-wp-html-processor.php on line 2991
FAIL: Failed to seek to bookmark. (WP 6.8-alpha-59287)

But then if I use create_fragment and I reduce the $html down to just <div id="$bookmark_name"></div> then at that point it passes.

So it seems there are issues with seeking to bookmarks when a full HTML document is passed into either create_fragment or create_full_parser.

@mi5t4n Please confirm whether you're seeing the same.

This ticket was mentioned in PR #7649 on WordPress/wordpress-develop by @jonsurrell.


6 weeks ago
#4

  • Keywords has-patch has-unit-tests added
  • Add failing test case
  • Work on fix

Trac ticket: https://core.trac.wordpress.org/ticket/62290

#5 @jonsurrell
6 weeks ago

https://github.com/WordPress/wordpress-develop/pull/7649 is rough and does not fix the issue, although it includes some ideas and a valid test case. If someone would like to pick up where that leaves off, they're welcome to it.

I will not be able to look at this myself during the next week.

#6 @mi5t4n
6 weeks ago

@westonruter I have pulled the changes from both the github and svn. But, I am still on the 6.8-alpha-59274-src. So, I could not test that code in the WP 6.8-alpha-59287. Any idea on how to resolve this? TIA

#7 @westonruter
6 weeks ago

@mi5t4n there should be no difference in behavior between these two 6.8-alpha versions.

#8 @mi5t4n
6 weeks ago

@westonruter Yes, I am getting similar errors as yours in the 6.8-alpha-59274-src trunk. But, the reason create_fragment is failing, maybe because create_fragment() is not intended to parse the whole HTML document, rather than parts which appears inside the <body> element.

create_fragment() DOC.

File: src/wp-includes/html-api/class-wp-html-processor.php
266: 	 * Use this for cases where you are processing chunks of HTML that
267: 	 * will be found within a bigger HTML document, such as rendered
268: 	 * block output that exists within a post, `the_content` inside a
269: 	 * rendered site layout.

create_fragment() signature.

File: src/wp-includes/html-api/class-wp-html-processor.php
295: 	public static function create_fragment( $html, $context = '<body>', $encoding = 'UTF-8' ) {
296: 		if ( '<body>' !== $context || 'UTF-8' !== $encoding ) {
297: 			return null;
298: 		}

@jonsurrell commented on PR #7649:


4 weeks ago
#9

There are a lot of parallels with https://github.com/WordPress/wordpress-develop/pull/7348 where the state should be reset appropriately according to the context node.

@jonsurrell commented on PR #7649:


4 weeks ago
#10

This is ready for review @gziolo @westonruter @dmsnell.

@jonsurrell commented on PR #7649:


4 weeks ago
#11

@mi5t4n you were active on https://core.trac.wordpress.org/ticket/62290, if you could provide testing for this change it would help move things along 🙏

@jonsurrell commented on PR #7649:


4 weeks ago
#12

Testing in https://github.com/WordPress/wordpress-develop/pull/7742 discovered an issue present on trunk where the fragment parser root-node may not be present on the stack of open elements in a fragment parser.

I've pushed a test and a fix for that to this PR.

#13 @Bernhard Reiter
4 weeks ago

  • Keywords commit added

#14 @Bernhard Reiter
4 weeks ago

  • Owner set to Bernhard Reiter
  • Resolution set to fixed
  • Status changed from new to closed

In 59391:

HTML API: Ensure that full processor can seek to earlier bookmarks.

When the HTML Processor seeks to an earlier place, it returns the the beginning of the document and proceeds forward until it reaches the appropriate location. This requires resetting internal state so that the processor can correctly proceed from the beginning of the document.

The seeking reset logic was not adapted to account for the full processor (i.e. when created via WP_HTML_Processor::create_full_parser()). This change updates the seek logic to account for the full and fragment parsers as well as other state that has been introduced in the interim and should be reset.

Props jonsurrell, dmsnell, westonruter, mi5t4n.
Fixes #62290.

#16 @Bernhard Reiter
4 weeks ago

  • Keywords dev-feedback fixed-major added; commit removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Re-opening to seek approval by another committer to backport to the 6.7 branch.

#17 @jonsurrell
3 weeks ago

  • Milestone changed from 6.7.1 to 6.7.2

This is an important fix for 6.7, but because the full HTML processors are less common I think it's OK to punt to 6.7.2.

Note: See TracTickets for help on using tickets.