#60092 closed enhancement (fixed)
HTML API: HTML parser should fail parsing unsupported tags
Reported by: | jonsurrell | Owned by: | jonsurrell |
---|---|---|---|
Milestone: | 6.5 | Priority: | normal |
Severity: | normal | Version: | 6.5 |
Component: | HTML API | Keywords: | has-patch has-unit-tests commit needs-dev-note |
Focuses: | Cc: |
Description
The HTML API HTML processor does not yet support all tags. Many tags have some complicated rules in the "in body" insertion mode. For example, list elements are implemented in [https://github.com/WordPress/wordpress-develop/pull/5539 PR 5539.
Implementing these special rules is blocking the implementation for a catch all rule for "any other tag" because we need to prevent special rules from being handled by the catch-all.
Any other start tag
Reconstruct the active formatting elements, if any.
Insert an HTML element for the token.
…
This change ensures the HTML processor fails when handling special tags. This is the same as existing behavior, but will allow us to implement the catch-all "any other tag" handling without unintentionally handling special elements.
Additionally, we add tests that assert the special elements are unhandled. As they tags are implemented, this should help to ensure they're removed from the unsupported tag list.
This is part of https://github.com/WordPress/gutenberg/discussions/54579.
Change History (17)
This ticket was mentioned in PR #5762 on WordPress/wordpress-develop by @jonsurrell.
13 months ago
#1
- Keywords has-patch has-unit-tests added
13 months ago
#3
@sirreal I've manually reviewed each tag to ensure that it's listed where it should be and missing where it shouldn't. although it doesn't require special handling, I have added in SARCASM so that we can add support for that by itself, perhaps as the last tag for which we add support - I'm sentimental or something.
note that I uncovered failing tests only by implementing the _any other tag_ sections here. we can remove that handling, but given your changes I think it's a nice boost to add them in, and it was hard to review the changes without ensuring that those tags are or aren't properly processed.
I welcome your thoughts on this. thanks for the hard work!
@jonsurrell commented on PR #5762:
13 months ago
#4
I uncovered failing tests only by implementing the any other tag sections here
I was worried about that myself, I think it makes sense to add the _any other tag_ handling here as well. I'll review your changes.
@jonsurrell commented on PR #5762:
13 months ago
#5
I ran the test suite from #5794 against this and I see we handle ~30 more cases (previously skipped as unsupported). No new errors.
13 months ago
#6
@sirreal it looks like something may have gone wrong in the last two commits as this is now changing the .github
directory
@jonsurrell commented on PR #5762:
13 months ago
#7
Thanks @dmsnell, I removed the offending commits.
13 months ago
#8
@sirreal I added the WPCS exclusion for goto statements in here to silence the warning. that came from #5539
@jonsurrell commented on PR #5762:
13 months ago
#9
I've updated the title to reflect that we explicitly handle all tags now. Tags that require special handling will fail when unsupported and the "all other tags" rules have been implemented (by @dmsnell).
I reviewed the implementation of the all other tags handling. The opener rule is simple. The closer rule is much more difficult to follow, but I've carefully compared the implementation with the specification and I believe it's correct. I'm also confident in these changes thanks to the tests here and from #5794 which I've also run.
@jonsurrell commented on PR #5762:
13 months ago
#10
I pulled one fix from #5794 into this PR where we'd try to access a nonexistent $formatting_element->bookmark_name
.
I fixed the tests for this to:
- Use associative arrays in data provider so tests get better names
- Close the elements so that the processor enters error state on the unsupported tag instead of pausing.
13 months ago
#11
@sirreal I've made several changes to this PR
- refactored the closing tag logic and eliminated the
goto
. I confirmed that it's safe to remove elements from the stack of open elements while walking _up_. - added the list of supported tags to the docblock at the start of the class.
- updated comments that will generate linting nags.
- fixed a typo in one of the tests:
DEFN
was listed but it should have beenDFN
.
additionally I ran the html5lib test suite against this branch and saw no regressions. the 10 failures relate to the tests not starting in IN BODY context, I believe, and are present in trunk
Tests: 1814, Assertions: 2685, Errors: 1, Failures: 10, Skipped: 802.
I'm a bit concerned that we haven't fully evaluated every branch in the "any other tag" logic; in fact I may try and run coverage on our test suite if I can get that working and make sure.
13 months ago
#12
try and run coverage on our test suite if I can get that working and make sure.
good news everybody! I ran the coverage tests and it looks like we have 100% coverage for these changes. we are missing coverage for other parts of IN BODY, so maybe we should target 100% of step_in_body()
for 6.5
@jonsurrell commented on PR #5762:
13 months ago
#13
I've reviewed the additional changes and they look good to me. Consider this my approval 👍
@Bernhard Reiter commented on PR #5762:
13 months ago
#16
Committed to Core in https://core.trac.wordpress.org/changeset/57248.
Trac ticket: https://core.trac.wordpress.org/ticket/60092
This implements part of the work described here: https://github.com/WordPress/gutenberg/discussions/54579