Opened 19 months ago
Last modified 18 months ago
#59291 new enhancement
HTML API: Expose raw tag markup to support existing filters
Reported by: | | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 6.4 |
Component: | HTML API | Keywords: | has-patch has-unit-tests |
Focuses: | | Cc: | |
Description
Many existing filters in Core (e.g. `wp_targeted_link_rel`) pass segments of the raw HTML markup for matched tags, and code implementing those hooks performs further analysis on those segments. The HTML API attempts to hide the raw inner markup as much as possible, making it hard to provide full backwards compatibility with these existing filters.
In this patch a new function is added which extracts the raw markup to maintain those existing behaviors. It requires a disclaimer in order to operate, to encourage folks to use the structural and semantic methods provided by the HTML API instead.
Where possible, existing code and filters should be updated so that they no longer depend on the raw HTML.
Change History (4)
This ticket was mentioned in PR #5143 on WordPress/wordpress-develop by @dmsnell.
19 months ago
#1
- Keywords has-unit-tests added
@Bernhard Reiter commented on PR #5143:
19 months ago
#2
Sadly needs a rebase now 🙈 Would you mind taking care of that? I'll land it tomorrow morning 😊
Trac ticket: #59291-trac
Many existing filters in Core pass segments of the raw HTML markup for matched tags, and code implementing those hooks performs further analysis on those segments. The HTML API attempts to hide the raw inner markup as much as possible, making it hard to provide full backwards compatibility with these existing filters.
In this patch a new function is added which extracts the raw markup to maintain those existing behaviors. It requires a disclaimer in order to operate, to encourage folks to use the structural and semantic methods provided by the HTML API instead.
Where possible, existing code and filters should be updated so that they no longer depend on the raw HTML.
## To Consider
We could canonicalize the output here using the HTML5 serialization algorithm. This implies writing all attributes as double-quoted strings with proper escaping and normalizing the whitespace between the tag name and the attributes. That output is already trivial to reconstruct through the existing methods, but doing so incurs a runtime cost.
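To make the canonicalization idea concrete, here is a minimal sketch of what normalized start-tag output could look like. It is written in Python purely for illustration and is not the HTML API's code: `escape_attribute_value` and `serialize_start_tag` are hypothetical names, the input is assumed to be an already-parsed tag name plus attribute map, and writing a valueless attribute as a bare name is an assumption of this sketch. The escaping covers `&`, U+00A0 and `"`, which is what the serialization algorithm has traditionally required in attribute values.

```python
from typing import Dict, Optional


def escape_attribute_value(value: str) -> str:
    """Escape a decoded attribute value for use inside double quotes."""
    # Replace the ampersand first so the entities added below
    # are not escaped a second time.
    return (
        value.replace("&", "&amp;")
        .replace("\u00a0", "&nbsp;")
        .replace('"', "&quot;")
    )


def serialize_start_tag(tag_name: str, attributes: Dict[str, Optional[str]]) -> str:
    """Build a normalized start tag from a name and decoded attributes.

    Hypothetical helper: a value of None stands for an attribute written
    without a value (assumption: it is emitted as a bare name).
    """
    parts = [tag_name.lower()]
    for name, value in attributes.items():
        if value is None:
            parts.append(name.lower())
        else:
            parts.append(f'{name.lower()}="{escape_attribute_value(value)}"')
    # A single space between the tag name and each attribute.
    return "<" + " ".join(parts) + ">"


normalized = serialize_start_tag(
    "A",
    {"HREF": "https://example.com/?a=1&b=2", "title": 'say "hi"', "download": None},
)
print(normalized)
# <a href="https://example.com/?a=1&amp;b=2" title="say &quot;hi&quot;" download>
```

The per-attribute escaping and string concatenation in this sketch is the kind of work a normalizing function would have to repeat for every matched tag, which is the runtime cost in question.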
In this patch I've opted to propose a RAW method to avoid that cost, particularly since this is designed to be used in places in Core that run frequently and on all data (kses, formatting, etc…). Using the raw contents leaves existing "broken" markup as it is, but should not break existing code that relies on those breakages.
If, on the other hand, we normalize the HTML, we could potentially close existing bugs caused by typical parsing failures, such as regexps that only look for double-quoted attributes.
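As an illustration of that class of bug, the sketch below runs a regexp that only expects double-quoted attributes against several equivalent, valid spellings of the same tag. It is written in Python for illustration only; the pattern and the `find_href` helper are hypothetical and are not taken from any Core filter.

```python
import re

# Hypothetical pattern of the kind a filter callback might use:
# it assumes a lowercase name, no spaces around "=", and double quotes.
DOUBLE_QUOTED_HREF = re.compile(r'href="([^"]*)"')

# Four valid ways to write the same attribute in raw HTML.
raw_tags = [
    '<a href="https://example.com/">',
    "<a href='https://example.com/'>",
    "<a href=https://example.com/>",
    '<a HREF = "https://example.com/">',
]


def find_href(raw_tag):
    """Return the captured href value, or None if the pattern misses it."""
    match = DOUBLE_QUOTED_HREF.search(raw_tag)
    return match.group(1) if match else None


results = [find_href(tag) for tag in raw_tags]
print(results)
# ['https://example.com/', None, None, None]
```

Only the first spelling is found. If the filter received normalized markup, all four inputs would arrive in the first form and the same regexp would match each of them.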
We can introduce another function to generate the normalized output, but I am hesitant to make things look too easy and accidentally encourage people to work around the semantic methods in the HTML API in favor of "easier" regexp parsing.