Make WordPress Core

Opened 7 weeks ago

Last modified 3 weeks ago

#62269 reopened defect (bug)

WP_HTML_Processor::next_token() cannot be extended in subclasses to keep track of state

Reported by: westonruter's profile westonruter Owned by: westonruter's profile westonruter
Milestone: 6.7.2 Priority: normal
Severity: normal Version: 6.5
Component: HTML API Keywords: has-unit-tests has-patch fixed-major
Focuses: Cc:

Description (last modified by westonruter)

In the Optimization Detective plugin from the WordPress Core Performance Team there is a need to compute a precise locator for each tag in a document beyond just what get_breadcrumbs() provides. In particular, there is a need to disambiguate between two tags which may be siblings of each other in which case array( 'html', 'body', 'img' ) will be ambiguous. Currently we're using XPaths for this purpose, for example if there are three IMG tags appearing as siblings at the beginning of the BODY, their XPaths are computed as:

  • /*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]
  • /*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]
  • /*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]

In order to compute these XPaths with HTML Tag Processor, the plugin extends the WP_HTML_Tag_Processor class with an wrapped version of next_token() so it can keep track of each new tag encountered to build up the array structure to compute the XPath.

This turns out not to work when extending WP_HTML_Processor because WP_HTML_Processor::next_token() often does recursive calls, resulting in erroneous XPath indices being computed. For example, next_token() is called twice when processing <html> and three times when processing <body>, at least in my sample doc.

The fix seems simple: move the logic from WP_HTML_Processor::next_token() into another private method like WP_HTML_Processor::_next_token() and update any recursive references to also call WP_HTML_Processor::_next_token(). Then WP_HTML_Processor::next_token() can simply just call WP_HTML_Processor::_next_token() and extending classes will be able to rely on each invocation of next_token corresponding to a new token. This would also be similar to what WP_HTML_Tag_Processor::next_token() does in that it is simply wrapping a call to WP_HTML_Tag_Processor::base_class_next_token().

Change History (15)

This ticket was mentioned in PR #7607 on WordPress/wordpress-develop by @westonruter.


7 weeks ago
#1

  • Keywords has-patch has-unit-tests added

#2 @westonruter
7 weeks ago

  • Description modified (diff)
  • Keywords has-patch has-unit-tests removed

#3 @westonruter
7 weeks ago

  • Keywords has-unit-tests added

#4 @westonruter
7 weeks ago

  • Keywords has-patch added

#5 @westonruter
7 weeks ago

  • Owner set to westonruter
  • Status changed from new to assigned

#6 @desrosj
7 weeks ago

  • Milestone changed from 6.7 to 6.7.1

Time ran out for this one with 6.7 RC1 due out any moment. Going to punt.

#7 @westonruter
7 weeks ago

@jonsurrell Do you think my proposed change makes sense?

#8 @jonsurrell
7 weeks ago

Yep, change is approved.

#9 @westonruter
7 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 59285:

HTML API: Fix extensibility of WP_HTML_Processor::next_token().

Break out logic from the next_token() method into a private method which may call itself recursively. This allows for subclasses to override the next_token() method and be assured that each call to next_token() corresponds with the consumption of one single token. This also parallels how WP_HTML_Tag_Processor::next_token() wraps a private base_class_next_token() method.

Props westonruter, jonsurrell.
Fixes #62269.

@westonruter commented on PR #7607:


7 weeks ago
#10

Committed in r59285.

#11 @westonruter
7 weeks ago

  • Keywords fixed-major added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for 6.7.1 consideration.

#13 @westonruter
5 weeks ago

In 59364:

HTML API: Improve private method name used by WP_HTML_Processor::next_token().

This renames the private _next_token method to next_visitable_token. It also removes irrelevant assertions from the unit test.

Follow-up to [59285].

Props dmsnell, jonsurrell, westonruter.
See #62269.

@westonruter commented on PR #7730:


5 weeks ago
#14

Committed in r59364

#15 @jonsurrell
3 weeks ago

  • Milestone changed from 6.7.1 to 6.7.2

This is not urgent, punting to 6.7.2.

Note: See TracTickets for help on using tickets.