#60283 closed enhancement (fixed)
HTML API: Support all HTML tags in standard
Reported by: | jonsurrell | Owned by: | dmsnell |
---|---|---|---|
Milestone: | 6.5 | Priority: | normal |
Severity: | normal | Version: | |
Component: | HTML API | Keywords: | has-patch has-unit-tests needs-dev-note |
Focuses: | Cc: |
Description
The HTML API does not yet support all HTML tags in the HTML standard. Support should be introduced for all HTML tags.
This ticket does not cover foreign content like SVG.
Unsupported tags are listed in WP_HTML_Processor::step_in_body
, currently at this location.
Change History (35)
This ticket was mentioned in PR #5895 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#1
- Keywords has-patch has-unit-tests added
This ticket was mentioned in Slack in #core by jorbin. View the logs.
10 months ago
@jonsurrell commented on PR #5895:
10 months ago
#4
@dmsnell Will you review please?
@jonsurrell commented on PR #5895:
10 months ago
#5
I compared this with the html5lib-tests branch (#5794) and this includes 4 additional tests, no failures.
This ticket was mentioned in PR #5897 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#6
Implement HR tag handling
Trac ticket: https://core.trac.wordpress.org/ticket/60283
This ticket was mentioned in PR #5903 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#7
Add PRE and LISTING tag handling.
A start tag whose tag name is one of: "pre", "listing"
If the stack of open elements has a p element in button scope, then close a p element.
Insert an HTML element for the token.
If the next token is a U+000A LINE FEED (LF) character token, then ignore that token and move on to the next one. (Newlines at the start of pre blocks are ignored as an authoring convenience.)
Set the frameset-ok flag to "not ok".
An end tag whose tag name is one of: "address", "article", "aside", "blockquote", "button", "center", "details", "dialog", "dir", "div", "dl", "fieldset", "figcaption", "figure", "footer", "header", "hgroup", "listing", "main", "menu", "nav", "ol", "pre", "search", "section", "summary", "ul"
If the stack of open elements does not have an element in scope that is an HTML element with the same tag name as that of the token, then this is a parse error; ignore the token.
Otherwise, run these steps:
If the current node is not an HTML element with the same tag name as that of the token, then this is a parse error.
Pop elements from the stack of open elements until an HTML element with the same tag name as the token has been popped from the stack.
Trac ticket: https://core.trac.wordpress.org/ticket/60283
@jonsurrell commented on PR #5903:
10 months ago
#8
This increases html5lib-test assertions to 347 vs 336 on #5794 (11 new passing tests).
This ticket was mentioned in PR #5905 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#9
I started work on <plaintext>
tags, but they don't make much sense until we can support text nodes (https://github.com/WordPress/wordpress-develop/pull/5683). Pausing this work for now.
Trac ticket: https://core.trac.wordpress.org/ticket/60283
This ticket was mentioned in PR #5906 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#10
- Add param, source, track tag handling
- html5lib-tests
Trac ticket: https://core.trac.wordpress.org/ticket/60283
@jonsurrell commented on PR #5897:
10 months ago
#11
This tag has special "in select" handling: https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
@jonsurrell commented on PR #5895:
10 months ago
#12
KEYGEN has special handling in "in select" mode:
https://html.spec.whatwg.org/multipage/parsing.html#parsing-main-inselect
This ticket was mentioned in PR #5907 on WordPress/wordpress-develop by @jonsurrell.
10 months ago
#13
- html5lib-tests
- Add INPUT tag handling
Trac ticket:
I tested this with:
-
src/wp-includes/html-api/class-wp-html-processor.php
diff --git a/src/wp-includes/html-api/class-wp-html-processor.php b/src/wp-includes/html-api/class-wp-html-processor.php index c3874fa51a..b2d267fd98 100644
a b class WP_HTML_Processor extends WP_HTML_Tag_Processor { 170 170 * 171 171 * @var WP_HTML_Processor_State 172 172 */ 173 p rivate$state = null;173 public $state = null; 174 174 175 175 /** 176 176 * Used to create unique bookmark names. -
tests/phpunit/tests/html-api/wpHtmlProcessor.php
diff --git a/tests/phpunit/tests/html-api/wpHtmlProcessor.php b/tests/phpunit/tests/html-api/wpHtmlProcessor.php index d2cf982d72..adc076eae7 100644
a b class Tests_HtmlApi_WpHtmlProcessor extends WP_UnitTestCase { 132 132 $this->assertFalse( $p->next_tag( 'EM' ), 'Should have aborted before finding second EM as it required reconstructing the first EM.' ); 133 133 } 134 134 135 /** 136 * Ensures that input elements correctly set (or not) frameset_ok. 137 * 138 * @ticket 60283 139 * 140 * @covers WP_HTML_Processor::step_in_body 141 */ 142 public function test_input_tag_framest_ok() { 143 $p = WP_HTML_Processor::create_fragment( '<input type="x">' ); 144 $this->assertTrue( $p->next_tag() ); 145 $this->assertTrue( $p->state->frameset_ok ); 146 147 $p = WP_HTML_Processor::create_fragment( '<input type="hidden">' ); 148 $this->assertTrue( $p->next_tag() ); 149 $this->assertFalse( $p->state->frameset_ok ); 150 151 $p = WP_HTML_Processor::create_fragment( '<input type="hIdDEn">' ); 152 $this->assertTrue( $p->next_tag() ); 153 $this->assertFalse( $p->state->frameset_ok ); 154 155 $p = WP_HTML_Processor::create_fragment( '<input>' ); 156 $this->assertTrue( $p->next_tag() ); 157 $this->assertFalse( $p->state->frameset_ok ); 158 } 159 135 160 /** 136 161 * Ensures that special handling of unsupported tags is cleaned up 137 162 * as handling is implemented. Otherwise there's risk of leaving special
#14
@
10 months ago
- Owner set to dmsnell
- Resolution set to fixed
- Status changed from new to closed
In 57314:
10 months ago
#19
Merged in https://core.trac.wordpress.org/changeset/57316
91e51f92a8c2c5e81b2a08cf2600615612cc5ab5
10 months ago
#21
Merged in 85de4aa70dd4f02d1aeb87277b65535438530bcd
https://core.trac.wordpress.org/changeset/57317
10 months ago
#22
Forgot to push the conflict resolution, but it was limited to the line of the HTML Processor's class documentation listing supported tags. The resolution was to add both the KEYGEN
and the LISTING
elements to that line.
10 months ago
#23
I think we should actually change the Tag Processor before doing this, because we will continue to step into new tags if we don't.
@jonsurrell commented on PR #5906:
10 months ago
#26
We should try and add these to the class-level docblock when we add support … please find a way to fit them in towards the top of the file so that people can see what's supported.
I've missed that in all the recent PRs. Thanks for handling them so far, I'll try to include those in the future 👍
#29
@
10 months ago
I'm not following closely the amazing work on the HTML processor so I may be missing something but I'm not sure I understand why non-conforming and obsolete HTML elements are being added to the processor.
For example, the LISTING and PARAM elements are listed in the 'Non-conforming features' of the spec and 'must not be used by authors'.
https://html.spec.whatwg.org/#non-conforming-features
16.2 Non-conforming features
Elements in the following list are entirely obsolete, and must not be used by authors:
listing
Use pre and code instead.
param
Use the data attribute of the object element to set the URL of the external resource.
@dmsnell can you please expand a bit, when you have a chance?
Also, since there are commits still referncing this ticket, I guess the ticket should stay open?
#30
@
10 months ago
hi @afercia 👋
I'm glad you asked this great question.
it's hard to start answering without a preamble, which is that "invalid," "non-conforming," "non-normative,", "broken," and other words describing HTML must be understood as broad and highly context-sensitive terms. for the sake of any HTML parser there is no such thing as "invalid" markup. there is, but it doesn't mean anything other than "handle the markup as specified and note somewhere that it was invalid, if you want to communicate that it's invalid."
the section of the specification you are quoting is specifically discussing authoring HTML and yes, nobody should be creating a new LISTING tag. unfortunately for us and other parsers and browsers, that HTML might exist and we might have to read and understand it. there's no way to simply ignore a deprecated, obsolete, or non-conforming tag if carries its own semantic rules.
the XMP element, for example, switches into a separate parsing mode and the PARAM element is void. if we didn't include support for these tags then we would get off track when parsing any HTML that includes them. sadly, this means that every valid parser will have to support or at least recognize these elements forever in order to avoid breaking content already on the web. In fact, I recently observed that Github is sending </xmp>
on their pages and I'm guessing it's specifically to eagerly mitigate any attack vector based on the invalid XMP, where if someone were to inject <xmp>
onto the page then everything after it would be swallowed as plaintext until a closing </xmp>
is found. Without that guard then the injection would effectively trash the page from that point forward.
it's not currently possible to generate tags in the HTML API, though we do allow creating invalid attribute names as long as they don't break the HTML syntax. these discussions are relevant because the HTML API provides great opportunity to enforce whatever rules we want. for example, should we disallow setting invalid tabindex
values? I suspect that in time we'll find that there are cases we want to be strict but usually we will want to be permissive because of the risk of over-zealously rejecting content that otherwise would behave well in browsers.
for now though, the simple answer is that we must support any existing or legacy rules for deprecated elements in the spec to avoid mis-parsing HTML inputs containing them.
Also, since there are commits still referencing this ticket, I guess the ticket should stay open?
I was hoping that this ticket could close and remain closed. it has a broad title and I'd prefer that no new PRs are opened to be associated with it. the existing PLAINTEXT PR is almost surely not merging into 6.5. this ticket, as broad as it is, is also not comprehensive and so it's not being discussed in a way I'd expect a big tracking ticket to work.
what's your advice? should I rename the ticket? dissociate the already-associated PR from it?
#31
@
10 months ago
@dmsnell thanks for your extended explanation :)
should I rename the ticket? dissociate the already-associated PR from it?
If you're planning to add more tags, I'd think this ticket must stay open as the work on it is not completed.
#32
@
10 months ago
If you're planning to add more tags, I'd think this ticket must stay open as the work on it is not completed.
we are going to add more tags, but I prefer not to on this ticket. if I change the title to something like "HTML API: Support more tags" would that clarify?
every other addition has come in separate tickets, and I find it distracting to mush all the conversations into one. that is, if we keep this ticket open until the HTML Processor is complete I'm not sure what the point of the ticket is. it's not for discussion about whether we should support the tags; and if it's simply to relate the work to the HTML Processor, then maybe we should be using See #58517
or Follows [56274]
?
#33
@
10 months ago
Traditionally the process on Trac is a little different from the one typically used on GitHub for example in the Gutenberg project. On Trac, it's not uncommon, sometimes even recommended, to use a ticket as a 'tracking ticket' and refer commits to it.
Trac isn't only a development tool. It represents the history of WordPress. Grouping related changes under a single ticket can help making the history of the project clearer even after many years.
#34
@
10 months ago
Thanks for sharing your thoughts, @afercia. I don't think there's any comparison to Github here, or at least none was intended on my part; this is solely about Trac.
There are a few things to note:
- None of the HTML tag support patches have been done under a single Trac ticket. Some are individually not that complicated, but some are mini-projects on their own.
- We'll likely be having lingering support patches for some amount of time.
- This Trac ticket was not opened for the purposes I think you are describing; more or less it was accidentally created with a title that sounds like what you're suggesting.
- As written, this ticket even excludes large portions of the HTML5 specification from its scope, making it non-comprehensive even for the title'd purpose.
So, as I responded to your first prompting, I think it would make sense, if we're requiring people not to split work into smaller pieces, to reuse #58517 since that seems to be more like what you are talking about requiring. I would appreciate your thoughts on that point; I'd like to imagine that this Trac ticket never existed, but it was created, and here we are.
Also I'd like some thoughts from folks if there's strong feelings here or if it'd be fine to stay the course on the HTML API developments taking place in separate tickets. It can be hard to anticipate beforehand how complicated some section of the HTML5 specification will be to implement, and the discussion on this ticket, holding only a handful of relatively easy parts to implement, is already difficult for me personally to follow; there's too much noise and scrolling. Additionally, we inevitably get to things like the PLAINTEXT
PR, which doesn't really belong here because as we got into it we realized that it's part of the Tag Processor's work and not part of the HTML Processor, so any merging of the work between the two is pushing the inherent relation that's there.
Add "area", "br", "embed", "img", "keygen", "wbr" tag handling
This exception is currently unsupported,
<br>
is handled, but the special exception for</br>
is not:Trac ticket: https://core.trac.wordpress.org/ticket/60283