Make WordPress Core

Opened 11 months ago

Closed 11 months ago

Last modified 9 months ago

#60122 closed enhancement (fixed)

HTML API: avoid processing incomplete tokens

Reported by: dmsnell
Owned by: Bernhard Reiter
Milestone: 6.5
Priority: normal
Severity: normal
Version: 6.5
Component: HTML API
Keywords: has-patch has-unit-tests needs-dev-note
Focuses:
Cc:

Description

Currently the Tag Processor assumes that an input document is a full HTML document. Because of this, if there's lingering content after the last tag match it will treat that content as plaintext and skip over it. This is fine for the Tag Processor because if there is lingering content that isn't a valid tag then there's nothing for next_tag() to match.

However, in order to support a number of feature expansions it is important to recognize that the remaining content may involve partial syntax elements, such as incomplete tags, attributes, or comments.

In this patch we're adding a mode inside the Tag Processor which will flip when we start parsing HTML syntax but the document finishes before the token does. This will provide the ability to:

  • extend the input document,
  • avoid misinterpreting syntax as text, and
  • guess whether we have a complete document, and know for certain when we have an incomplete one.
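
The idea above can be illustrated with a minimal sketch. This is not the Tag Processor's code or API: it is a toy Python scanner, and every name in it (`SketchTagScanner`, `ended_mid_token`, `append`) is hypothetical. It assumes a heavily simplified view of HTML (tags and comments only, with any quote character opening a quoted run inside a tag) and only shows how a scanner can pause at an unfinished token instead of treating it as text.

```python
import re
from typing import Optional


class SketchTagScanner:
    """Toy scanner (hypothetical): finds tags and notices when input ends mid-token."""

    def __init__(self, html: str = "") -> None:
        self.html = html
        self.pos = 0                  # everything before this offset is consumed
        self.ended_mid_token = False  # True when the input stopped inside a token

    def append(self, chunk: str) -> None:
        """Extend the input; scanning resumes at the start of the unfinished token."""
        self.html += chunk
        self.ended_mid_token = False

    def next_tag(self) -> Optional[str]:
        """Return the next tag name (closers prefixed with '/'), or None."""
        html = self.html
        while True:
            start = html.find("<", self.pos)
            if start == -1:
                self.pos = len(html)  # only plain text remained
                return None
            rest = html[start:]
            if rest.startswith("<!--"):
                end = html.find("-->", start + 4)
                if end == -1:
                    return self._pause(start)  # unterminated comment
                self.pos = end + 3
                continue
            if "<!--".startswith(rest) or rest == "</":
                # Input ended right after "<", "<!", "<!-", or "</".
                return self._pause(start)
            is_closer = rest.startswith("</")
            name_at = start + 2 if is_closer else start + 1
            if not html[name_at].isalpha():
                # Simplification: anything else (a literal "<" in text,
                # a DOCTYPE, and so on) is skipped as text.
                self.pos = start + 1
                continue
            end = self._find_tag_end(name_at)
            if end is None:
                return self._pause(start)  # tag opened but never closed
            name = re.match(r"[^\s/>]+", html[name_at:end]).group(0).lower()
            self.pos = end + 1
            return "/" + name if is_closer else name

    def _find_tag_end(self, i: int) -> Optional[int]:
        """Index of the '>' closing the tag, ignoring '>' inside quotes."""
        quote = None
        while i < len(self.html):
            ch = self.html[i]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "\"'":
                quote = ch
            elif ch == ">":
                return i
            i += 1
        return None

    def _pause(self, start: int) -> None:
        """Stay at the start of the unfinished token and record the state."""
        self.pos = start
        self.ended_mid_token = True
        return None


scanner = SketchTagScanner('<p>Hello <a href="https://exam')
first = scanner.next_tag()    # "p"
second = scanner.next_tag()   # None, and scanner.ended_mid_token is True
scanner.append('ple.com">link</a>')
third = scanner.next_tag()    # "a", now that the token is complete
```

In this sketch a `None` result with `ended_mid_token` unset means only text remained, while a `None` result with it set means the caller can supply more input and try again.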

Change History (9)

#1 @Bernhard Reiter
11 months ago

  • Owner set to Bernhard Reiter
  • Resolution set to fixed
  • Status changed from new to closed

In 57211:

HTML API: Avoid processing incomplete tokens.

Currently the Tag Processor assumes that an input document is a full HTML document. Because of this, if there's lingering content after the last tag match it will treat that content as plaintext and skip over it. This is fine for the Tag Processor because if there is lingering content that isn't a valid tag then there's nothing for next_tag() to match.

However, in order to support a number of feature expansions it is important to recognize that the remaining content may involve partial syntax elements, such as incomplete tags, attributes, or comments.

In this patch we're adding a mode inside the Tag Processor which will flip when we start parsing HTML syntax but the document finishes before the token does. This will provide the ability to:

  • extend the input document,
  • avoid misinterpreting syntax as text, and
  • guess whether we have a complete document, and know for certain when we have an incomplete one.

In the process of building this patch a few bugs were identified and fixed in the Tag Processor, namely in the handling of incomplete syntax elements.

Props dmsnell, jonsurrell.
Fixes #60122, #60108.

#2 @kebbet
10 months ago

Since the changeset landed in trunk during the 6.5 cycle, please set the ticket's milestone to 6.5, @bernhard-reiter. Thanks!

#3 @dmsnell
10 months ago

  • Milestone changed from Awaiting Review to 6.5

thanks @kebbet - I've set the milestone to 6.5

#4 @stevenlinx
9 months ago

  • Keywords needs-dev-note added

I assume this is a misc one.

#5 @dmsnell
9 months ago

@stevenlinx can you reword what you are stating or asking? I'm not sure what to make of your comment, but it seems like you are asking a question.

#6 @stevenlinx
9 months ago

@dmsnell, thank you for your question.

"misc" means "misc dev note".

It's a term people use within the Release Doc team.

It means that, given the size of the change (paragraph-sized) and its notability, I designate the dev note to go in the "Miscellaneous developer changes in WordPress X.Y" blog post, which collects paragraph-sized dev notes.

If you disagree about its size or where the dev note shall go, you may state the reasons why it should go elsewhere.

If you do agree, nothing needs to be done on your end.

#7 @dmsnell
9 months ago

thanks for clarifying @stevenlinx - the demonstratives "this" and "one" were definitely confusing without more context. I thought you might be talking about classifying the ticket, but I wasn't sure and didn't want to assume.

happy to help note these changes, but in fact I have been preparing one update post for all of the HTML API changes in 6.5. would it be best to ensure this makes it into that post or is there reason to split all the work into separate notes?

https://make.wordpress.org/core/?p=110857&preview=1&_ppp=939d7aa28d

#8 @stevenlinx
9 months ago

"... I have been preparing one update post for all of the HTML API changes in 6.5. would it be best to ensure this makes it into that post"

Thank you for letting me know.

Yes, I think the change in question is better placed in the standalone dev note post you linked, along with the other HTML API updates.

I'll cross out this dev note being a "misc" one and recategorize it as a "combined minis".

#9 @dmsnell
9 months ago

Thank you @stevenlinx!
