#61974 closed enhancement (fixed)
HTML API: Add method to split text nodes by null or empty prefixes.
Reported by: | dmsnell | Owned by: | dmsnell |
---|---|---|---|
Milestone: | 6.7 | Priority: | normal |
Severity: | normal | Version: | trunk |
Component: | HTML API | Keywords: | has-patch has-unit-tests |
Focuses: | Cc: |
Description
Some parsing rules in the HTML Processor depend on whether a text node, after decoding, consists entirely of NULL bytes or entirely of whitespace characters. It's awkward and inefficient to make this distinction within the HTML Processor, however, as it requires eagerly decoding text nodes.
The Tag Processor could expose a method to efficiently split apart a text node when needed, and then classify it, to aid in the parsing. This method could further be used to identify inter-element whitespace, which is usually ignored when rendering HTML.
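As a rough illustration of the idea (not the patch itself), the following Python sketch splits a raw text span at its leading run of NULL bytes or whitespace and then classifies a span. The names `TextClass`, `split_text`, and `classify` are hypothetical and chosen only for this example. The sketch also assumes raw characters only: it does not handle character references such as `&#x20;`, which a real implementation would have to consider, given that the classification applies after decoding.

```python
from enum import Enum

# HTML's ASCII whitespace: tab, line feed, form feed, carriage return, space.
HTML_WHITESPACE = " \t\n\f\r"


class TextClass(Enum):
    """Hypothetical classification labels for a text span."""
    NULL_SEQUENCE = "null-sequence"
    WHITESPACE = "whitespace"
    GENERIC = "generic"


def leading_run(text: str, chars: str) -> int:
    """Return the length of the longest prefix of `text` drawn only from `chars`."""
    n = 0
    while n < len(text) and text[n] in chars:
        n += 1
    return n


def split_text(text: str) -> tuple[str, str]:
    """Split `text` into (prefix, rest).

    The prefix is the leading run of NULL bytes if there is one, otherwise
    the leading run of whitespace (possibly empty). Character references
    are NOT decoded here; that is a simplification of this sketch.
    """
    nulls = leading_run(text, "\x00")
    if nulls > 0:
        return text[:nulls], text[nulls:]
    spaces = leading_run(text, HTML_WHITESPACE)
    return text[:spaces], text[spaces:]


def classify(text: str) -> TextClass:
    """Classify a span as all-NULL, all-whitespace, or generic content."""
    if not text:
        # An empty span has nothing to classify; treat it as generic here.
        return TextClass.GENERIC
    if leading_run(text, "\x00") == len(text):
        return TextClass.NULL_SEQUENCE
    if leading_run(text, HTML_WHITESPACE) == len(text):
        return TextClass.WHITESPACE
    return TextClass.GENERIC
```

A caller could split first and classify each piece, so that an all-whitespace prefix between elements can be recognized without decoding the remainder of the text.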
Change History (4)
This ticket was mentioned in PR #7236 on WordPress/wordpress-develop by @dmsnell.
5 days ago
#1
- Keywords has-patch has-unit-tests added
5 days ago
#2
Sorry for the late ticket creation, but I thought it was best to separate this as a feature enhancement. It's mostly an internal method, but since it has potential use to application code, it can retain its own ticket.
Trac ticket: Core-61974
HTML parsing rules at times differentiate character tokens that are all null bytes, all whitespace, or other content. This patch introduces a new function which may be used to classify text node sub-regions, leading to more efficient application of these parsing rules.
Further, when classified in this way, application code may skip some rules and decoding entirely, improving performance.
## Example script
## html5lib tests