Opened 7 weeks ago
Last modified 3 weeks ago
#62269 reopened defect (bug)
WP_HTML_Processor::next_token() cannot be extended in subclasses to keep track of state
Reported by: | westonruter | Owned by: | westonruter |
---|---|---|---|
Milestone: | 6.7.2 | Priority: | normal |
Severity: | normal | Version: | 6.5 |
Component: | HTML API | Keywords: | has-unit-tests has-patch fixed-major |
Focuses: | Cc: |
Description (last modified by )
In the Optimization Detective plugin from the WordPress Core Performance Team there is a need to compute a precise locator for each tag in a document beyond just what get_breadcrumbs()
provides. In particular, there is a need to disambiguate between two tags which may be siblings of each other in which case array( 'html', 'body', 'img' )
will be ambiguous. Currently we're using XPaths for this purpose, for example if there are three IMG
tags appearing as siblings at the beginning of the BODY
, their XPaths are computed as:
/*[1][self::HTML]/*[2][self::BODY]/*[1][self::IMG]
/*[1][self::HTML]/*[2][self::BODY]/*[2][self::IMG]
/*[1][self::HTML]/*[2][self::BODY]/*[3][self::IMG]
In order to compute these XPaths with HTML Tag Processor, the plugin extends the WP_HTML_Tag_Processor
class with an wrapped version of next_token()
so it can keep track of each new tag encountered to build up the array structure to compute the XPath.
This turns out not to work when extending WP_HTML_Processor
because WP_HTML_Processor::next_token()
often does recursive calls, resulting in erroneous XPath indices being computed. For example, next_token()
is called twice when processing <html>
and three times when processing <body>
, at least in my sample doc.
The fix seems simple: move the logic from WP_HTML_Processor::next_token()
into another private method like WP_HTML_Processor::_next_token()
and update any recursive references to also call WP_HTML_Processor::_next_token()
. Then WP_HTML_Processor::next_token()
can simply just call WP_HTML_Processor::_next_token()
and extending classes will be able to rely on each invocation of next_token
corresponding to a new token. This would also be similar to what WP_HTML_Tag_Processor::next_token()
does in that it is simply wrapping a call to WP_HTML_Tag_Processor::base_class_next_token()
.
Change History (15)
This ticket was mentioned in PR #7607 on WordPress/wordpress-develop by @westonruter.
7 weeks ago
#1
- Keywords has-patch has-unit-tests added
#6
@
7 weeks ago
- Milestone changed from 6.7 to 6.7.1
Time ran out for this one with 6.7 RC1 due out any moment. Going to punt.
@westonruter commented on PR #7607:
7 weeks ago
#10
Committed in r59285.
#11
@
7 weeks ago
- Keywords fixed-major added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for 6.7.1 consideration.
This ticket was mentioned in PR #7730 on WordPress/wordpress-develop by @westonruter.
5 weeks ago
#12
Trac ticket: https://core.trac.wordpress.org/ticket/62269
Addresses code review feedback from @dmsnell:
@westonruter commented on PR #7730:
5 weeks ago
#14
Committed in r59364
Trac ticket: https://core.trac.wordpress.org/ticket/62269