Make WordPress Core

Opened 13 months ago

Closed 13 months ago

Last modified 11 months ago

#60092 closed enhancement (fixed)

HTML API: HTML parser should fail parsing unsupported tags

Reported by: jonsurrell's profile jonsurrell Owned by: jonsurrell's profile jonsurrell
Milestone: 6.5 Priority: normal
Severity: normal Version: 6.5
Component: HTML API Keywords: has-patch has-unit-tests commit needs-dev-note
Focuses: Cc:

Description

The HTML API HTML processor does not yet support all tags. Many tags have some complicated rules in the "in body" insertion mode. For example, list elements are implemented in [https://github.com/WordPress/wordpress-develop/pull/5539 PR 5539.

Implementing these special rules is blocking the implementation for a catch all rule for "any other tag" because we need to prevent special rules from being handled by the catch-all.

Any other start tag
Reconstruct the active formatting elements, if any.

Insert an HTML element for the token.


This change ensures the HTML processor fails when handling special tags. This is the same as existing behavior, but will allow us to implement the catch-all "any other tag" handling without unintentionally handling special elements.

Additionally, we add tests that assert the special elements are unhandled. As they tags are implemented, this should help to ensure they're removed from the unsupported tag list.

This is part of https://github.com/WordPress/gutenberg/discussions/54579.

Change History (17)

This ticket was mentioned in PR #5762 on WordPress/wordpress-develop by @jonsurrell.


13 months ago
#1

  • Keywords has-patch has-unit-tests added

Trac ticket: https://core.trac.wordpress.org/ticket/60092

This implements part of the work described here: https://github.com/WordPress/gutenberg/discussions/54579

Abort with "unsupported" for all called-out tag names in the IN BODY section so that we can implement the "Any other start tag" and "Any other end tag" rules, including support for custom elements.

#2 @jonsurrell
13 months ago

FYI dmsnell westonruter

@dmsnell commented on PR #5762:


13 months ago
#3

@sirreal I've manually reviewed each tag to ensure that it's listed where it should be and missing where it shouldn't. although it doesn't require special handling, I have added in SARCASM so that we can add support for that by itself, perhaps as the last tag for which we add support - I'm sentimental or something.

note that I uncovered failing tests only by implementing the _any other tag_ sections here. we can remove that handling, but given your changes I think it's a nice boost to add them in, and it was hard to review the changes without ensuring that those tags are or aren't properly processed.

I welcome your thoughts on this. thanks for the hard work!

@jonsurrell commented on PR #5762:


13 months ago
#4

I uncovered failing tests only by implementing the any other tag sections here

I was worried about that myself, I think it makes sense to add the _any other tag_ handling here as well. I'll review your changes.

@jonsurrell commented on PR #5762:


13 months ago
#5

I ran the test suite from #5794 against this and I see we handle ~30 more cases (previously skipped as unsupported). No new errors.

@dmsnell commented on PR #5762:


13 months ago
#6

@sirreal it looks like something may have gone wrong in the last two commits as this is now changing the .github directory

@jonsurrell commented on PR #5762:


13 months ago
#7

Thanks @dmsnell, I removed the offending commits.

@dmsnell commented on PR #5762:


13 months ago
#8

@sirreal I added the WPCS exclusion for goto statements in here to silence the warning. that came from #5539

@jonsurrell commented on PR #5762:


13 months ago
#9

I've updated the title to reflect that we explicitly handle all tags now. Tags that require special handling will fail when unsupported and the "all other tags" rules have been implemented (by @dmsnell).

I reviewed the implementation of the all other tags handling. The opener rule is simple. The closer rule is much more difficult to follow, but I've carefully compared the implementation with the specification and I believe it's correct. I'm also confident in these changes thanks to the tests here and from #5794 which I've also run.

@jonsurrell commented on PR #5762:


13 months ago
#10

I pulled one fix from #5794 into this PR where we'd try to access a nonexistent $formatting_element->bookmark_name.

I fixed the tests for this to:

  • Use associative arrays in data provider so tests get better names
  • Close the elements so that the processor enters error state on the unsupported tag instead of pausing.

@dmsnell commented on PR #5762:


13 months ago
#11

@sirreal I've made several changes to this PR

  • refactored the closing tag logic and eliminated the goto. I confirmed that it's safe to remove elements from the stack of open elements while walking _up_.
  • added the list of supported tags to the docblock at the start of the class.
  • updated comments that will generate linting nags.
  • fixed a typo in one of the tests: DEFN was listed but it should have been DFN.

additionally I ran the html5lib test suite against this branch and saw no regressions. the 10 failures relate to the tests not starting in IN BODY context, I believe, and are present in trunk

Tests: 1814, Assertions: 2685, Errors: 1, Failures: 10, Skipped: 802.

I'm a bit concerned that we haven't fully evaluated every branch in the "any other tag" logic; in fact I may try and run coverage on our test suite if I can get that working and make sure.

@dmsnell commented on PR #5762:


13 months ago
#12

try and run coverage on our test suite if I can get that working and make sure.

good news everybody! I ran the coverage tests and it looks like we have 100% coverage for these changes. we are missing coverage for other parts of IN BODY, so maybe we should target 100% of step_in_body() for 6.5

@jonsurrell commented on PR #5762:


13 months ago
#13

I've reviewed the additional changes and they look good to me. Consider this my approval 👍

#14 @Bernhard Reiter
13 months ago

  • Keywords commit added
  • Milestone changed from Awaiting Review to 6.5

#15 @Bernhard Reiter
13 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 57248:

HTML API: Add explicit handling or failure for all tags.

The HTML API HTML processor does not yet support all tags. Many tags (e.g. list elements) have some complicated rules in the "in body" insertion mode.

Implementing these special rules is blocking the implementation for a catch-all rule for "any other tag" because we need to prevent special rules from being handled by the catch-all.

Any other start tag
Reconstruct the active formatting elements, if any.

Insert an HTML element for the token.


This change ensures the HTML Processor fails when handling special tags. This is the same as existing behavior, but will allow us to implement the catch-all "any other tag" handling without unintentionally handling special elements.

Additionally, we add tests that assert the special elements are unhandled. As these tags are implemented, this should help to ensure they're removed from the unsupported tag list.

Props jonsurrell, dmsnell.
Fixes #60092.

#17 @stevenlinx
11 months ago

  • Keywords needs-dev-note added

as part of a standalone dev note post

Note: See TracTickets for help on using tickets.