Make WordPress Core

Opened 4 weeks ago

Closed 2 weeks ago

Last modified 9 days ago

#62357 closed feature request (fixed)

HTML API: Allow creating fragment parser from a context node

Reported by: jonsurrell's profile jonsurrell Owned by: jonsurrell's profile jonsurrell
Milestone: 6.8 Priority: normal
Severity: normal Version:
Component: HTML API Keywords: has-patch has-unit-tests commit
Focuses: Cc:

Description

The HTML API has two ways of creating HTML processors, create_fragment and
create_full_parser.

create_fragment is a static method that creates a fragment parser from a very reasonable default context node, a BODY element.

However, it may be necessary to create a fragment parser with a different context node, for example when working with HTML inside of an SVG or TABLE tag may be parsed very differently.

One strong motivation for this is the implementation of a set_inner_html method which requires parsing an HTML fragment with the appropriate context node in order to change its contents.

Change History (25)

This ticket was mentioned in PR #7348 on WordPress/wordpress-develop by @jonsurrell.


4 weeks ago
#1

  • Keywords has-patch has-unit-tests added

## Work in progress

OK, but incomplete, skipped, or risky tests!
-Tests: 1500, Assertions: 1079, Skipped: 421.
+Tests: 1686, Assertions: 1222, Skipped: 464.

Trac ticket: https://core.trac.wordpress.org/ticket/62357

This supports work on set_inner_html in https://github.com/WordPress/wordpress-develop/pull/7326.

This ticket was mentioned in PR #7777 on WordPress/wordpress-develop by @jonsurrell.


3 weeks ago
#2

This PR builds on (and requires) #7348.

This modifies ::create_fragment( $html, $context ) to use a full processor and create_fragment_at_node instead of the other way around. This makes more sense and makes the main factory methods more clear, where the state required for fragments is set up in create_fragment_at_node instead of in both create_fragment and create_fragment_at_node.

This allows for more HTML contexts to be provided to the basic create_fragment where the provided context HTML is appended to <!DOCTYPE html>, a full processor is created, the last tag opener is found, and a fragment parser is created at that node via create_fragment_at_node.

The HTML5lib tests are updated accordingly to use this new method to create fragments.

Trac ticket: https://core.trac.wordpress.org/ticket/62357

#3 @jonsurrell
3 weeks ago

  • Milestone changed from Awaiting Review to 6.8

@jonsurrell commented on PR #7777:


3 weeks ago
#5

there’s a hidden performance surprise in there

I thought about that. One thing we could do is optimize the most common case where the context is <body> and have some optimized setup that skips creating a full parser.

Do we need to prefix the DOCTYPE declaration though? What if someone wants a parser in <!DOCTYPE quirks><p><table><td>? I don’t know if that makes sense or is warranted.

I added the DOCTYPE because I suspect we usually want to be in no-quirks mode. If that is not added, then the <body> context that has been used until now would behave differently. The most advanced and low-level option is still available to create a full processor and then create a fragment parser from that if folks _really_ want a fragment in quirks mode.

I'd like to better document this behavior and explain the more advanced option.

@dmsnell commented on PR #7777:


3 weeks ago
#6

I thought about that. One thing we could do is optimize the most common case where the context is <body> and have some optimized setup that skips creating a full parser.

The opposite of that is that <body> doesn’t require that much processing in a case like this.

The advanced option is still available…I'd like to better document this behavior and explain the more advanced option.

This seems proper. That makes a safe default with an escape hatch for those who want to do something where they adopt their own risk.

@Bernhard Reiter commented on PR #7348:


2 weeks ago
#7

How does this relate to create_fragment's $context argument? The latter is currently hard-wired to `<body>`.

It seems like create_fragment_at_current_node will allow to bypass this limitation (as it sets the new parser's context_node). Is this the only way we'll be able to create a fragment with a non-<body> context?

If yes: Should we deprecate create_fragment's $context arg?
If no: Would it make sense to move some of the logic from create_fragment_at_current_node to create_fragment, in order to have the latter support different contexts?

@jonsurrell commented on PR #7348:


2 weeks ago
#8

Those are all good questions and they're not addressed in this PR, but are in #7777.

I wanted to keep these two PRs independent because ::create_fragment is still the most common way to create a fragment and I didn't want to refactor it in the same changeset.

This method serves as an "advanced" way of creating fragments (with a different doctype, node, or in quirks mode), while ::create_fragment remains unchanged.

#7777 changes this so that instead of create_fragment_at_current_node calling create_fragment and then changing a bunch of internal state, the processor _always_ starts from a full HTML processor and then creates a fragment processor from that. This basically aligns with the HTML specification. This will allow more contexts to be used in create_fragment as well where it will create a full processor, find the last tag opener in the context, and create the fragment at that location.

There's some relevant conversation on #7777 https://github.com/WordPress/wordpress-develop/pull/7777#pullrequestreview-2433840351.

@jonsurrell commented on PR #7348:


2 weeks ago
#9

Thank you!

How would you like to land this?

I don't feel too strongly, I this can be landed closing #62357, then https://github.com/WordPress/wordpress-develop/pull/7777 could be against another ticket like "allow more fragment context in ::create_fragment".

https://github.com/WordPress/wordpress-develop/pull/7777 supersedes https://github.com/WordPress/wordpress-develop/pull/7141. That PR was associated with #61576, but that was a general "HTAML API in 6.7" ticket. We can sort out https://github.com/WordPress/wordpress-develop/pull/7777 later.

@jonsurrell commented on PR #7777:


2 weeks ago
#10

This can pull in tests from https://github.com/WordPress/wordpress-develop/pull/7141 and should also do this (mentioned in that description):

Review all documentation looking for places stating that <body> is the only supported context.

#11 @gziolo
2 weeks ago

  • Keywords commit added

#12 @Bernhard Reiter
2 weeks ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 59444:

HTML API: Add method to create fragment at node.

HTML Fragment parsing always happens with a context node, which may impact how a fragment of HTML is parsed. HTML Fragment Processors can be instantiated with a BODY context node via WP_HTML_Processor::create_fragment( $html ).

This changeset adds a static method called create_fragment_at_current_node( string $html_fragment ). It can only be called when the processor is paused at a #tag, with some additional constraints:

  • The opening and closing tags must appear in the HTML input (no virtual tokens).
  • No "self-contained" elements are allowed ( IFRAME, SCRIPT, TITLE, etc.).

If successful, the method will return a WP_HTML_Processor instance whose context is inherited from the node that the method was called from.

Props jonsurrell, bernhard-reiter, gziolo.
Fixes #62357.

This ticket was mentioned in PR #7859 on WordPress/wordpress-develop by @jonsurrell.


2 weeks ago
#14

Prevent fragments from being created at tag closers. This was noticed by @ockham in https://github.com/WordPress/wordpress-develop/pull/7348:

This got me thinking -- should we disallow create_fragment_at_current_node from being called when the processor is paused on a tag closer?

(If we do, then we should be able to remove this guard, as it would be guaranteed that we're at a tag opener.)

Follow-up [59444]
Trac ticket: https://core.trac.wordpress.org/ticket/62357

#15 @jonsurrell
2 weeks ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for follow-up change.

@jonsurrell commented on PR #7777:


2 weeks ago
#16

I've pulled some valuable changes like tests and documentation from https://github.com/WordPress/wordpress-develop/pull/7141 into this PR.

@jonsurrell commented on PR #7777:


2 weeks ago
#17

https://github.com/WordPress/wordpress-develop/pull/7348 has landed. I've merged trunk, added some improvements like tests and _doing_it_wrong from https://github.com/WordPress/wordpress-develop/pull/7141.

I'm opening this up for review now 🎉

#18 @cbravobernal
2 weeks ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 59450:

HTML API: Prevent fragment creation on close tag.

Prevent fragments from being created at tag closers.

Follow-up to [59444].

Props jonsurrell, bernhard-reiter.
Fixes #62357.

@Bernhard Reiter commented on PR #7777:


11 days ago
#20

The PR says that there are currently conflicts; would you mind rebasing this? 😊

@jonsurrell commented on PR #7777:


11 days ago
#21

Branch is up-to-date and should be conflict free now 👍

@Bernhard Reiter commented on PR #7777:


9 days ago
#22

Branch is up-to-date and should be conflict free now 👍

Conflicting again now 🙈 Mind rebasing once again? 😅 I'll review today.

@jonsurrell commented on PR #7777:


9 days ago
#23

I'll create a dedicated ticket for this. It's a very logical step after https://github.com/WordPress/wordpress-develop/pull/7348, but does introduce a new feature and is a significant refactor.

Note: See TracTickets for help on using tickets.