#60170 closed enhancement (fixed)
HTML API: Scan every token in an HTML document.
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 6.5 | Priority: | normal |
| Severity: | normal | Version: | 6.5 |
| Component: | HTML API | Keywords: | has-patch has-unit-tests has-dev-note |
| Focuses: | | Cc: | |
Description
The Tag Processor currently visits every syntax element in an HTML document, but it only exposes the ability to inspect and modify tags. To build certain features, it needs to support visiting every kind of syntax token. In this patch the HTML Tag Processor exposes a new system for doing that.
Adding full token-scanning opens up new opportunities, including but not limited to the following list:
- A version of `wp_truncate_html()` that only parses as much HTML as is necessary and that can preserve the HTML tags while ignoring their contribution to the string length (gist).
- Improve the reliability (and potentially the performance) of existing functions like `wp_strip_all_tags()`.
- Introduce new filtering pipelines that coalesce multiple filters into a single pass operating only on text nodes, instead of running multiple passes of multiple buggy regular-expression-based transforms.
- Render HTML as text.
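A rough sketch of the idea behind the first item, written in Python with a naive regular-expression tokenizer rather than the Tag Processor; `truncate_html()` and its void-element list are hypothetical, not a WordPress API. Only text counts toward the limit, tags are copied through unchanged, and any elements still open when the budget runs out are closed. The regex tokenizer is exactly the kind of fragile shortcut this list hopes to replace: a `>` inside a comment or an attribute value would confuse it.

```python
import re

# Naive tokenizer (an assumption for this sketch): a tag-like "<...>",
# a run of text, or a stray "<". Not the Tag Processor's parsing rules.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+|<")
TAG_NAME_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)")
VOID = {"area", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def truncate_html(html: str, max_chars: int) -> str:
    """Keep at most `max_chars` characters of text; markup doesn't count."""
    out: list[str] = []
    open_tags: list[str] = []
    remaining = max_chars
    # finditer is lazy, so scanning stops as soon as the budget is spent.
    for match in TOKEN_RE.finditer(html):
        if remaining <= 0:
            break
        part = match.group(0)
        if len(part) > 1 and part.startswith("<") and part.endswith(">"):
            # Markup (tags, comments, ...) is copied but never counted.
            tag = TAG_NAME_RE.match(part)
            if tag and tag.group(1):  # closing tag
                if open_tags and open_tags[-1] == tag.group(2).lower():
                    open_tags.pop()
            elif tag and tag.group(2).lower() not in VOID and not part.endswith("/>"):
                open_tags.append(tag.group(2).lower())
            out.append(part)
        else:
            # Text (or a stray "<") counts toward the budget.
            # Naive: a character reference like &amp; counts as 5 characters.
            out.append(part[:remaining])
            remaining -= len(part)
    # Close anything still open, innermost first.
    out.extend(f"</{name}>" for name in reversed(open_tags))
    return "".join(out)
```

For example, `truncate_html("<p>Hello <b>world</b>!</p>", 8)` keeps eight text characters and returns `<p>Hello <b>wo</b></p>`.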
This patch introduces the concept of modifiable text, which represents plain text inside an HTML document that cannot contain other tags or syntax tokens. `#text` nodes are modifiable text, as are the contents of a SCRIPT or STYLE element and the text inside an HTML comment. Modifiable text can be modified without changing the structure of the HTML document.
This patch introduces new methods on the Tag Processor:
- `get_token_type()` reports what kind of token is currently matched, e.g. a tag or a comment.
- `get_token_name()` roughly reports the DOM name that would be assigned to the node in a browser.
- `get_modifiable_text()` returns the modifiable text for a node.
No ability to modify the modifiable text is provided yet. See the GitHub PR for a more elaborate description of the new modifiable text concept.
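To make the modifiable-text idea concrete, here is a toy Python model; it is not the PHP `WP_HTML_Tag_Processor`. The `Token` fields `type`, `name` and `modifiable_text` loosely mirror what `get_token_type()`, `get_token_name()` and `get_modifiable_text()` report. The `tokenize()` function, its regular expression, and the choice to treat SCRIPT and STYLE as single tokens carrying their contents are assumptions for illustration, and most HTML parsing rules are ignored.

```python
import re
from dataclasses import dataclass

# Toy model only: NOT the WP_HTML_Tag_Processor. It ignores most HTML rules
# (quoted '>' in attributes, DOCTYPE, CDATA, funky comments, and so on).
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9-]*)[^>]*>")
ATOMIC = {"SCRIPT", "STYLE"}  # contents are raw text, never nested tags


@dataclass
class Token:
    type: str             # '#tag', '#text' or '#comment'
    name: str             # upper-case tag name, '#text' or '#comment'
    modifiable_text: str  # '' for ordinary tags


def tokenize(html: str) -> list[Token]:
    tokens: list[Token] = []
    i = 0
    while i < len(html):
        if html.startswith("<!--", i):
            # Comment: everything between "<!--" and "-->" is modifiable text.
            end = html.find("-->", i + 4)
            end = len(html) if end == -1 else end
            tokens.append(Token("#comment", "#comment", html[i + 4:end]))
            i = end + 3
            continue
        m = TAG_RE.match(html, i)
        if m:
            name = m.group(2).upper()
            i = m.end()
            text = ""
            if not m.group(1) and name in ATOMIC:
                # Treat SCRIPT/STYLE atomically: the opener, its raw contents
                # and the closer become one token carrying the contents.
                close = html.lower().find("</" + name.lower(), i)
                close = len(html) if close == -1 else close
                text = html[i:close]
                gt = html.find(">", close)
                i = len(html) if gt == -1 else gt + 1
            tokens.append(Token("#tag", name, text))
            continue
        # Text runs until the next '<' (a stray '<' is kept as text).
        nxt = html.find("<", i + 1)
        nxt = len(html) if nxt == -1 else nxt
        tokens.append(Token("#text", "#text", html[i:nxt]))
        i = nxt
    return tokens
```

For example, `tokenize('<p>Hi<!-- note --></p><script>if (a < b) {}</script>')` yields five tokens; the last is a SCRIPT tag whose modifiable text is `if (a < b) {}`, and the `<` inside it is never mistaken for markup.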
Change History (22)
#3
14 months ago
- Owner set to dmsnell
- Resolution set to fixed
- Status changed from new to closed
In 57348:
#5
14 months ago
@westonruter, check out the section above titled "What changes in the Tag Processor?"
I've also left a note in the linked ticket. These special tags are now handled atomically, because the contents inside them are special and need special rules for escaping and parsing.
This ticket was mentioned in PR #6021 on WordPress/wordpress-develop by @dmsnell.
14 months ago
#7
When parser states were introduced in WordPress/wordpress-develop#5683, nothing in the `seek()` method reset the parser state. This is problematic because it could leave the parser in the wrong state.
In this patch the parser state is reset so that it gets properly adjusted on the subsequent call to `next_token()`.
Follows [57348]
See Core-60170
Props @kevin940726 for finding and reporting.
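A hypothetical Python model of why the reset matters: `ToyScanner`, its `State` values and the `reset_state` switch are invented for illustration and only mimic the roles of `seek()`, `next_token()` and the parser state described above; the real processor tracks state differently. Once the scanner reaches the end of the input its state is COMPLETE, and if `seek()` rewinds the cursor without resetting that state, the next call to `next_token()` still believes the document is finished.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()     # cursor positioned, nothing matched yet
    MATCHED = auto()   # a token is currently matched
    COMPLETE = auto()  # reached the end of the input


class ToyScanner:
    """Hypothetical scanner over pre-split tokens, with bookmarks."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0          # index of the next token to read
        self.state = State.READY
        self.current = None   # the currently matched token, if any
        self.bookmarks = {}   # bookmark name -> token index

    def next_token(self) -> bool:
        if self.state is State.COMPLETE:
            return False  # a stale COMPLETE state blocks all further scanning
        if self.pos >= len(self.tokens):
            self.state, self.current = State.COMPLETE, None
            return False
        self.current = self.tokens[self.pos]
        self.pos += 1
        self.state = State.MATCHED
        return True

    def set_bookmark(self, name: str) -> None:
        # Remember the position of the currently matched token.
        self.bookmarks[name] = self.pos - 1

    def seek(self, name: str, reset_state: bool = True) -> bool:
        self.pos = self.bookmarks[name]
        if reset_state:
            # The fix: forget the old state so next_token() re-derives it
            # from the bookmarked position instead of trusting stale state.
            self.state = State.READY
        return self.next_token()  # re-match the bookmarked token
```

With `reset_state=False`, seeking back after the scanner has run to the end returns `False`, which reproduces the kind of wrong-state bug described above.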
This ticket was mentioned in PR #6021 on WordPress/wordpress-develop by @dmsnell.
14 months ago
#8
- Keywords has-unit-tests added
When parser states were introduced in WordPress/wordpress-develop#5683, nothing in the `seek()` method reset the parser state. This is problematic because it could leave the parser in the wrong state.
In this patch the parser state is reset so that it gets properly adjusted on the subsequent call to `next_token()`.
Follows [57348]
See Trac ticket 60170
Props @kevin940726 for finding and reporting.
This ticket was mentioned in PR #6021 on WordPress/wordpress-develop by @dmsnell.
14 months ago
#9
Trac ticket: Core-60428
When parser states were introduced in WordPress/wordpress-develop#5725, nothing in the `seek()` method reset the parser state. This is problematic because it could leave the parser in the wrong state.
In this patch the parser state is reset so that it gets properly adjusted on the subsequent call to `next_token()`.
Follows [57211]
See Trac ticket 60170
Props @kevin940726 for finding and reporting.
This ticket was mentioned in PR #6412 on WordPress/wordpress-develop by @dmsnell.
11 months ago
#13
Fixes Core-60170
Funky comments must be at least one character long, but we were only looking for where the comment ends starting _after_ its first character. This patch fixes that.
#14
11 months ago
- Resolution fixed deleted
- Status changed from closed to reopened
Found an issue with short "funky comments" whereby a closing tag with an invalid, single-character tag name would skip past the actual closing `>` and consume the following tag.
For instance, `</!><p>` turns into a single funky comment with modifiable text `!><p`.
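The expected behavior can be sketched as a toy Python function following the HTML specification's "end tag open state"; this is not the Tag Processor's code, and the function name and return values are hypothetical. Any character other than an ASCII letter or `>` starts a bogus ("funky") comment, and the search for its closing `>` begins at that first character, so `</!><p>` yields the one-character comment `!` followed by a separate `<p>` tag.

```python
def scan_end_tag_open(html: str, at: int) -> tuple[str, str, int]:
    """Classify the construct that starts with '</' at index `at`.

    Returns (kind, text, next_index). Toy model of the HTML spec's
    "end tag open state"; not the Tag Processor implementation.
    """
    if not html.startswith("</", at):
        raise ValueError("expected '</' at the given index")
    first = at + 2  # first character after "</"
    if first >= len(html):
        return ("incomplete", "", len(html))
    c = html[first]
    if c == ">":
        # "</>" is a parse error that produces no token at all.
        return ("ignored", "", first + 1)
    # Both a real closer and a funky comment run to the next '>'. Searching
    # from `first` (the comment's first character) means a one-character
    # comment like "</!>" ends at its own '>' instead of a later one.
    end = html.find(">", first)
    if end == -1:
        return ("incomplete", "", len(html))
    if c.isascii() and c.isalpha():
        # Naive: returns the raw tag contents, e.g. "p" for "</p>".
        return ("closer", html[first:end], end + 1)
    # Any other first character starts a bogus ("funky") comment.
    return ("funky-comment", html[first:end], end + 1)
```

Under these rules `</ >` is also a funky comment (its text is a single space), not a tag closer.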
11 months ago
#15
cc: @cbravobernal @darerodz I've removed one of the Interactivity API tests because it was accidentally passing when it shouldn't have been. It was testing whether the Server Directive Processor bails on unbalanced content, except it was testing with a `</ >`, which isn't a tag closer but rather a funky comment. That particular funky comment wasn't properly detected before, and fixing it broke the test.
11 months ago
#16
@dmsnell, I assume you would like to include this fix in 6.5.3 as you reopened the ticket, right?
11 months ago
#17
> would like to include this fix in 6.5.3 as you reopened the ticket, right?
@gziolo this could have been a mistake on my part, but I was told something that led me to believe I need to reopen a ticket if I'm making a bug-fix for it. maybe I misunderstood. people wanted more of the work to be consolidated on fewer Trac tickets, so I was trying to do that. should I re-close the ticket?
11 months ago
#20
@aaronjorbin, if I am not mistaken, you are leading 6.5.x releases, can you help clarify the workflow that @dmsnell explained in https://github.com/WordPress/wordpress-develop/pull/6412#issuecomment-2074280668.
> this could have been a mistake on my part, but I was told something that led me to believe I need to reopen a ticket if I'm making a bug-fix for it. maybe I misunderstood. people wanted more of the work to be consolidated on fewer Trac tickets, so I was trying to do that. should I re-close the ticket?
11 months ago
#21
In general, tickets on closed milestones shouldn't be reopened, but the ticket should still be referenced in the follow-up commit so that anyone later looking at the feature can see the follow-up work.
The doc on commit messages covers how to make this reference, but here is an example.
To have a change considered for backport, the process should be to create a ticket on Trac and put it in the 6.5.3 milestone. Once there, it follows a workflow using the `fixed-major` keyword to signify that code has been committed to trunk, `dev-feedback` to indicate that a second committer should sign off on it, and `dev-reviewed` to show that the second committer has approved it.
Pending any blockers, I'd like to merge this in the next few days and iterate after merge.
There is a known issue handling CDATA sections within SVG elements whose resolution is not clear, particularly because browsers handle them differently from each other and from the HTML specification. The current approach is conservative: it avoids trashing a post in the rare case that it encounters a CDATA section inside an SVG element containing a `>` character.