#62357 closed feature request (fixed)
HTML API: Allow creating fragment parser from a context node
Reported by: | jonsurrell | Owned by: | jonsurrell |
---|---|---|---|
Milestone: | 6.8 | Priority: | normal |
Severity: | normal | Version: | |
Component: | HTML API | Keywords: | has-patch has-unit-tests commit |
Focuses: | Cc: |
Description
The HTML API has two ways of creating HTML processors, create_fragment
and
create_full_parser
.
create_fragment
is a static method that creates a fragment parser from a very reasonable default context node, a BODY
element.
However, it may be necessary to create a fragment parser with a different context node, for example when working with HTML inside of an SVG or TABLE tag may be parsed very differently.
One strong motivation for this is the implementation of a set_inner_html
method which requires parsing an HTML fragment with the appropriate context node in order to change its contents.
Change History (25)
This ticket was mentioned in PR #7348 on WordPress/wordpress-develop by @jonsurrell.
4 weeks ago
#1
- Keywords has-patch has-unit-tests added
This ticket was mentioned in PR #7777 on WordPress/wordpress-develop by @jonsurrell.
3 weeks ago
#2
This PR builds on (and requires) #7348.
This modifies ::create_fragment( $html, $context )
to use a full processor and create_fragment_at_node
instead of the other way around. This makes more sense and makes the main factory methods more clear, where the state required for fragments is set up in create_fragment_at_node
instead of in both create_fragment
and create_fragment_at_node
.
This allows for more HTML contexts to be provided to the basic create_fragment
where the provided context HTML is appended to <!DOCTYPE html>
, a full processor is created, the last tag opener is found, and a fragment parser is created at that node via create_fragment_at_node
.
The HTML5lib tests are updated accordingly to use this new method to create fragments.
Trac ticket: https://core.trac.wordpress.org/ticket/62357
@jonsurrell commented on PR #7348:
3 weeks ago
#4
@jonsurrell commented on PR #7777:
3 weeks ago
#5
there’s a hidden performance surprise in there
I thought about that. One thing we could do is optimize the most common case where the context is <body>
and have some optimized setup that skips creating a full parser.
Do we need to prefix the DOCTYPE declaration though? What if someone wants a parser in
<!DOCTYPE quirks><p><table><td>
? I don’t know if that makes sense or is warranted.
I added the DOCTYPE because I suspect we usually want to be in no-quirks mode. If that is not added, then the <body>
context that has been used until now would behave differently. The most advanced and low-level option is still available to create a full processor and then create a fragment parser from that if folks _really_ want a fragment in quirks mode.
I'd like to better document this behavior and explain the more advanced option.
3 weeks ago
#6
I thought about that. One thing we could do is optimize the most common case where the context is <body> and have some optimized setup that skips creating a full parser.
The opposite of that is that <body>
doesn’t require that much processing in a case like this.
The advanced option is still available…I'd like to better document this behavior and explain the more advanced option.
This seems proper. That makes a safe default with an escape hatch for those who want to do something where they adopt their own risk.
@Bernhard Reiter commented on PR #7348:
2 weeks ago
#7
How does this relate to create_fragment
's $context
argument? The latter is currently hard-wired to `<body>`.
It seems like create_fragment_at_current_node
will allow to bypass this limitation (as it sets the new parser's context_node
). Is this the only way we'll be able to create a fragment with a non-<body>
context?
If yes: Should we deprecate create_fragment
's $context
arg?
If no: Would it make sense to move some of the logic from create_fragment_at_current_node
to create_fragment
, in order to have the latter support different contexts?
@jonsurrell commented on PR #7348:
2 weeks ago
#8
Those are all good questions and they're not addressed in this PR, but are in #7777.
I wanted to keep these two PRs independent because ::create_fragment
is still the most common way to create a fragment and I didn't want to refactor it in the same changeset.
This method serves as an "advanced" way of creating fragments (with a different doctype, node, or in quirks mode), while ::create_fragment
remains unchanged.
#7777 changes this so that instead of create_fragment_at_current_node
calling create_fragment
and then changing a bunch of internal state, the processor _always_ starts from a full HTML processor and then creates a fragment processor from that. This basically aligns with the HTML specification. This will allow more contexts to be used in create_fragment
as well where it will create a full processor, find the last tag opener in the context, and create the fragment at that location.
There's some relevant conversation on #7777 https://github.com/WordPress/wordpress-develop/pull/7777#pullrequestreview-2433840351.
@jonsurrell commented on PR #7348:
2 weeks ago
#9
Thank you!
How would you like to land this?
I don't feel too strongly, I this can be landed closing #62357, then https://github.com/WordPress/wordpress-develop/pull/7777 could be against another ticket like "allow more fragment context in ::create_fragment
".
https://github.com/WordPress/wordpress-develop/pull/7777 supersedes https://github.com/WordPress/wordpress-develop/pull/7141. That PR was associated with #61576, but that was a general "HTAML API in 6.7" ticket. We can sort out https://github.com/WordPress/wordpress-develop/pull/7777 later.
@jonsurrell commented on PR #7777:
2 weeks ago
#10
This can pull in tests from https://github.com/WordPress/wordpress-develop/pull/7141 and should also do this (mentioned in that description):
Review all documentation looking for places stating that <body> is the only supported context.
@Bernhard Reiter commented on PR #7348:
2 weeks ago
#13
Committed to Core in https://core.trac.wordpress.org/changeset/59444/.
This ticket was mentioned in PR #7859 on WordPress/wordpress-develop by @jonsurrell.
2 weeks ago
#14
Prevent fragments from being created at tag closers. This was noticed by @ockham in https://github.com/WordPress/wordpress-develop/pull/7348:
This got me thinking -- should we disallow create_fragment_at_current_node from being called when the processor is paused on a tag closer?
(If we do, then we should be able to remove this guard, as it would be guaranteed that we're at a tag opener.)
Follow-up [59444]
Trac ticket: https://core.trac.wordpress.org/ticket/62357
#15
@
2 weeks ago
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for follow-up change.
@jonsurrell commented on PR #7777:
2 weeks ago
#16
I've pulled some valuable changes like tests and documentation from https://github.com/WordPress/wordpress-develop/pull/7141 into this PR.
@jonsurrell commented on PR #7777:
2 weeks ago
#17
https://github.com/WordPress/wordpress-develop/pull/7348 has landed. I've merged trunk, added some improvements like tests and _doing_it_wrong
from https://github.com/WordPress/wordpress-develop/pull/7141.
I'm opening this up for review now 🎉
@cbravobernal commented on PR #7859:
2 weeks ago
#19
Committed in https://core.trac.wordpress.org/changeset/59450.
@Bernhard Reiter commented on PR #7777:
11 days ago
#20
The PR says that there are currently conflicts; would you mind rebasing this? 😊
@jonsurrell commented on PR #7777:
11 days ago
#21
Branch is up-to-date and should be conflict free now 👍
@Bernhard Reiter commented on PR #7777:
9 days ago
#22
Branch is up-to-date and should be conflict free now 👍
Conflicting again now 🙈 Mind rebasing once again? 😅 I'll review today.
@jonsurrell commented on PR #7777:
9 days ago
#23
I'll create a dedicated ticket for this. It's a very logical step after https://github.com/WordPress/wordpress-develop/pull/7348, but does introduce a new feature and is a significant refactor.
@jonsurrell commented on PR #7777:
9 days ago
#24
I've created https://core.trac.wordpress.org/ticket/62584 for this.
@Bernhard Reiter commented on PR #7777:
9 days ago
#25
Committed to Core in https://core.trac.wordpress.org/changeset/59467.
## Work in progress
Trac ticket: https://core.trac.wordpress.org/ticket/62357
This supports work on
set_inner_html
in https://github.com/WordPress/wordpress-develop/pull/7326.