Make WordPress Core

Opened 9 months ago

Last modified 8 months ago

#59291 new enhancement

HTML API: Expose raw tag markup to support existing filters

Reported by: dmsnell Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 6.4
Component: HTML API Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

Many existing filters in Core (e.g. wp_targeted_link_rel) pass segments of the raw HTML markup for matched tags, and the code implementing those hooks performs further analysis on those segments. The HTML API attempts to hide the raw inner markup as much as possible, making it hard to provide full backwards compatibility with these existing filters.

This patch adds a new function that extracts the raw markup so those existing behaviors can be maintained. The function requires a disclaimer in order to operate, as a way of encouraging folks to use the structural and semantic methods provided by the HTML API instead.

Where possible, existing code and filters should be updated so that they no longer depend on the raw HTML.

Change History (4)

This ticket was mentioned in PR #5143 on WordPress/wordpress-develop by @dmsnell.


9 months ago
#1

  • Keywords has-unit-tests added

Trac ticket: #59291-trac

Many existing filters in Core pass segments of the raw HTML markup for matched tags, and the code implementing those hooks performs further analysis on those segments. The HTML API attempts to hide the raw inner markup as much as possible, making it hard to provide full backwards compatibility with these existing filters.

This patch adds a new function that extracts the raw markup so those existing behaviors can be maintained. The function requires a disclaimer in order to operate, as a way of encouraging folks to use the structural and semantic methods provided by the HTML API instead.

Where possible, existing code and filters should be updated so that they no longer depend on the raw HTML.

## To Consider

We could canonicalize the output here using the HTML5 serialization algorithm. This implies writing all attributes as double-quoted strings with proper escaping and normalizing the whitespace between the tag name and the attributes. This is already trivial to reconstruct through the existing methods, but it incurs a runtime cost.

In this patch I've opted to propose a RAW method to avoid that cost, particularly since this is designed to be used in places in Core that are running frequently and on all data (kses, formatting, etc…). Using the raw contents risks leaving "broken" markup broken, so existing breakages stay in place, but it should not break existing code that relies on those breakages.

If, on the other hand, we normalize the HTML, we could potentially close existing bugs that stem from the typical kinds of parsing failures, such as regexps looking only for double-quoted attributes, etc…
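
To make the raw-versus-normalized trade-off concrete, here is a rough sketch of what normalizing a single start tag could look like. It is written in Python with only the standard library so it runs without WordPress; it is not the HTML API and not the code in this patch. `normalize_start_tag` is a hypothetical helper name, and `html.escape` also escapes `'`, `<` and `>`, which may be more than the HTML serialization algorithm strictly requires.

```python
import re
from html import escape
from html.parser import HTMLParser


class _StartTagCollector(HTMLParser):
    """Records the first start tag seen, with decoded attribute values."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tag = None
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; values arrive unescaped.
        if self.tag is None:
            self.tag, self.attrs = tag, attrs


def normalize_start_tag(raw_tag):
    """Re-serialize one start tag with double-quoted, escaped attributes.

    Hypothetical helper for illustration only.
    """
    collector = _StartTagCollector()
    collector.feed(raw_tag)
    collector.close()
    if collector.tag is None:
        return raw_tag  # no start tag found; leave the input untouched

    parts = [collector.tag]
    seen = set()
    for name, value in collector.attrs:
        if name in seen:
            continue  # keep only the first of any duplicated attribute
        seen.add(name)
        if value is None:
            parts.append(name)  # boolean attribute without a value
        else:
            parts.append(f'{name}="{escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"


# A regexp that only understands double-quoted attributes.
REL_PATTERN = re.compile(r'\brel="([^"]*)"')

raw = "<A  HREF='/?a=1&amp;b=2'   rel=nofollow download>"
normalized = normalize_start_tag(raw)

print(REL_PATTERN.search(raw))                   # None: the raw tag is unquoted
print(REL_PATTERN.search(normalized).group(1))   # nofollow
```

The raw tag is valid HTML, yet the double-quote-only regexp misses its `rel` attribute; the same regexp finds it once the tag has been re-serialized. The extra parse and rebuild on every tag is the runtime cost weighed above.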

We can introduce another function to generate the normalized output, but I am hesitant to make things look too easy and accidentally encourage people to work around the semantic methods in the HTML API in favor of "easier" regexp parsing.

@Bernhard Reiter commented on PR #5143:


9 months ago
#2

Sadly needs a rebase now 🙈 Would you mind taking care of that? I'll land it tomorrow morning 😊

@dmsnell commented on PR #5143:


9 months ago
#3

Thanks @ockham - I should have created this as a draft as it needs much more consideration before merge.

#4 @dmsnell
8 months ago

#59290 was marked as a duplicate.
