Opened 5 months ago
Closed 7 weeks ago
#62269 closed defect (bug) (fixed)
WP_HTML_Processor::next_token() cannot be extended in subclasses to keep track of state
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 6.7.2 | Priority: | normal |
Severity: | normal | Version: | 6.5 |
Component: | HTML API | Keywords: | has-unit-tests has-patch fixed-major |
Focuses: | Cc: |
Description (last modified by )
In the Optimization Detective plugin from the WordPress Core Performance Team there is a need to compute a precise locator for each tag in a document beyond just what get_breadcrumbs()
provides. In particular, there is a need to disambiguate between two tags which may be siblings of each other in which case array( 'html', 'body', 'img' )
will be ambiguous. Currently we're using XPaths for this purpose, for example if there are three IMG
tags appearing as siblings at the beginning of the BODY
, their XPaths are computed as:
/*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]
/*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]
/*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]
In order to compute these XPaths with HTML Tag Processor, the plugin extends the WP_HTML_Tag_Processor
class with an wrapped version of next_token()
so it can keep track of each new tag encountered to build up the array structure to compute the XPath.
This turns out not to work when extending WP_HTML_Processor
because WP_HTML_Processor::next_token()
often does recursive calls, resulting in erroneous XPath indices being computed. For example, next_token()
is called twice when processing <html>
and three times when processing <body>
, at least in my sample doc.
The fix seems simple: move the logic from WP_HTML_Processor::next_token()
into another private method like WP_HTML_Processor::_next_token()
and update any recursive references to also call WP_HTML_Processor::_next_token()
. Then WP_HTML_Processor::next_token()
can simply just call WP_HTML_Processor::_next_token()
and extending classes will be able to rely on each invocation of next_token
corresponding to a new token. This would also be similar to what WP_HTML_Tag_Processor::next_token()
does in that it is simply wrapping a call to WP_HTML_Tag_Processor::base_class_next_token()
.
Change History (20)
This ticket was mentioned in PR #7607 on WordPress/wordpress-develop by @westonruter.
5 months ago
#1
- Keywords has-patch has-unit-tests added
#6
@
5 months ago
- Milestone changed from 6.7 to 6.7.1
Time ran out for this one with 6.7 RC1 due out any moment. Going to punt.
@westonruter commented on PR #7607:
5 months ago
#10
Committed in r59285.
#11
@
5 months ago
- Keywords fixed-major added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for 6.7.1 consideration.
This ticket was mentioned in PR #7730 on WordPress/wordpress-develop by @westonruter.
5 months ago
#12
Trac ticket: https://core.trac.wordpress.org/ticket/62269
Addresses code review feedback from @dmsnell:
@westonruter commented on PR #7730:
5 months ago
#14
Committed in r59364
This ticket was mentioned in Slack in #core by jorbin. View the logs.
3 months ago
This ticket was mentioned in Slack in #core by jorbin. View the logs.
7 weeks ago
This ticket was mentioned in PR #8227 on WordPress/wordpress-develop by @jorbin.
7 weeks ago
#18
This includes the following commits:
- https://core.trac.wordpress.org/changeset/59285
- https://core.trac.wordpress.org/changeset/59364
- https://core.trac.wordpress.org/changeset/59747
The backport needed to be manually merged due to file conflicts.
Trac Ticket: https://core.trac.wordpress.org/ticket/62269
#19
@
7 weeks ago
Used the wrong ticket, there was one additional commit here.
in r59747
Documentation: Update @since to reflect version this might ship in.
When originally committed, this code was targeting 6.7.1. However, it was not backported and included in 6.7.1. Will this be followed up by another version change? You'll need to stay tuned to next week's episode of "As the WordPress Turns" to find out!
Follow-up to [59285] and [59364].
See #62270.
#20
@
7 weeks ago
- Resolution set to fixed
- Status changed from reopened to closed
Fixed in [59757]
HTML API: Fix extensibility of WP_HTML_Processor::next_token().
Break out logic from the next_token() method into a private method which may call itself recursively. This allows for subclasses to override the next_token() method and be assured that each call to next_token() corresponds with the consumption of one single token. This also parallels how WP_HTML_Tag_Processor::next_token() wraps a private base_class_next_token() method.
Reviewed by jonsurrell.
Merges [59285], [59364], and [59747] to 6.7 branch.
Props westonruter, jonsurrell, dmsnell, jorbin.
Trac ticket: https://core.trac.wordpress.org/ticket/62269