Make WordPress Core

Opened 7 weeks ago

Last modified 6 weeks ago

#62653 new feature request

Add CSS selector based tag navigation to HTML and Tag Processors

Reported by: jonsurrell Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: HTML API Keywords: has-patch has-unit-tests
Focuses: Cc:

Description (last modified by jonsurrell)

CSS selectors provide an excellent way to navigate through an HTML document. They are a familiar interface for many web developers to navigate elements in a page, and they fit well with the HTML API. They're also a core concept in the Block API, where attributes may be sourced based on a selector.

HTML API: Add attribute sourcing describes some of the needs based on selectors used in existing Core blocks:

.blocks-gallery-caption
.blocks-gallery-item
.blocks-gallery-item__caption
.book-author
.message
a
a[download]
audio
blockquote
cite
code
div
figcaption
figure > a
figure a
figure img
figure video,figure img
h1,h2,h3,h4,h5,h6
img
li
ol,ul
p
pre
tbody tr
td,th
tfoot tr
thead tr
video

Some use cases:

Prior work:

Change History (10)

#1 @jonsurrell
7 weeks ago

  • Description modified (diff)

This ticket was mentioned in PR #7857 on WordPress/wordpress-develop by @jonsurrell.


6 weeks ago
#2

  • Keywords has-patch has-unit-tests added

Introduce CSS selector based traversal of HTML documents in the HTML API. Add new select_all and select methods to the Tag Processor and HTML Processor.

// With select_all to traverse a document stopping on matching selectors
$processor = WP_HTML_Processor::create_full_parser( '<p match><div att match><em><i match></i><a match>' );
foreach ( $processor->select_all( 'p, [att], em > *' ) as $_ ) {
        assert( $processor->get_attribute( 'match' ) );
}

// With select to move to a matching selector
$processor = WP_HTML_Processor::create_full_parser( '<p match><div att match><em><i match></i><a match>' );
assert( $processor->select( 'p, [att], em > *' ) );
assert( $processor->get_attribute( 'match' ) );
assert( 'P' === $processor->get_tag() );

A subset of the CSS selector grammar is available as described here:

https://github.com/WordPress/wordpress-develop/blob/355c9a24e0d983813ae73e8cacc59287833d2846/src/wp-includes/html-api/class-wp-css-compound-selector-list.php#L24-L39

Notable variations from selectors specification:

Pseudo-element selectors are not supported. Pseudo elements will not exist in the HTML and it's unclear what benefit they would bring.

Pseudo-class selectors are not supported. Pseudo classes could be useful, but the logic to parse and match pseudo-class selectors would add significant complexity. There's also a lot of variety in pseudo-selectors, and rather than supporting simpler selectors (e.g. :empty) and not supporting more complex selectors (e.g. :nth-child()), pseudo-class selectors are completely unsupported. This is a clear and simple rule that greatly simplifies the implementation.

Complex selectors have limited support. Complex selectors combine selectors using one of the combinators: (whitespace), >, +, or ~. The _Tag Processor does not support complex selectors_ at all: it has no concept of document structure, and all complex selectors are structural. The _HTML Processor does not support the sibling combinators_ + or ~; it only supports parent/child (>) and ancestor/descendant (whitespace) relationships. These can be matched by analyzing breadcrumbs, without tracking additional state in the document. Finally, only type selectors are allowed in non-final position, again because this allows matching against breadcrumbs without tracking additional state:

  • Supported: body heading > h1.page-title[attribute]
  • Unsupported (class / id selector in non-final position): #page > main, .page main
  • Unsupported (sibling combinators): ul li ~ li, ul li + li
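
A minimal sketch of why child and descendant combinators with type-only ancestors can be matched against breadcrumbs alone. The function names and the array representation of the selector are hypothetical and not taken from the PR. Breadcrumbs are assumed to be a root-to-element list of uppercase tag names, and the final compound selector is assumed to have already matched the current element.

```php
<?php
/**
 * Hypothetical helper: checks the ancestor part of a complex selector
 * against breadcrumbs such as array( 'HTML', 'BODY', 'FIGURE', 'A' ).
 *
 * $context lists the non-final type selectors in source order, each paired
 * with the combinator that follows it. For `body figure > a` that is
 * array( array( 'BODY', ' ' ), array( 'FIGURE', '>' ) ).
 */
function sketch_matches_ancestors( array $breadcrumbs, array $context ): bool {
	// The current element is the last breadcrumb.
	return sketch_match_from( $breadcrumbs, count( $breadcrumbs ) - 1, $context );
}

function sketch_match_from( array $breadcrumbs, int $position, array $context ): bool {
	if ( array() === $context ) {
		return true; // Every ancestor requirement has been satisfied.
	}

	// Work right to left: the last entry is the one closest to the element.
	list( $type, $combinator ) = array_pop( $context );

	if ( '>' === $combinator ) {
		// Child combinator: only the immediate parent may match.
		$parent = $position - 1;
		return $parent >= 0
			&& $breadcrumbs[ $parent ] === $type
			&& sketch_match_from( $breadcrumbs, $parent, $context );
	}

	// Descendant combinator: try every ancestor, backtracking when the
	// remaining context does not match above a candidate.
	for ( $i = $position - 1; $i >= 0; $i-- ) {
		if ( $breadcrumbs[ $i ] === $type && sketch_match_from( $breadcrumbs, $i, $context ) ) {
			return true;
		}
	}
	return false;
}

// `figure > a` evaluated while the cursor is on the A element.
$is_child = sketch_matches_ancestors(
	array( 'HTML', 'BODY', 'FIGURE', 'A' ),
	array( array( 'FIGURE', '>' ) )
);
```

Nothing outside the breadcrumbs is consulted here, which is the property described above. Sibling combinators would need information about previously closed elements, which breadcrumbs do not carry.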

Importantly, the selectors supported by the HTML Processor should be sufficient to support all core block attribute selectors according to this PR:

https://github.com/WordPress/wordpress-develop/blob/88e7d30892288caf35cf33b5f99faa75fa7e6e30/src/wp-includes/html-api/class-wp-html-attribute-sourcer.php#L18-L49

_Most_ of the listed selectors are also supported by the Tag Processor with the exception of the complex selectors:

figure > a
figure a
figure img
figure video,figure img
tbody tr
tfoot tr
thead tr

---

## Open questions

### Implementation

The implementation introduces a number of classes that roughly correspond to different parts of the selector grammar. Parsing is handled in the _list classes, which represent the top of the grammar. Matching logic is handled by each selector class; the selector classes implement a matches interface.

### Selector traversal APIs

  • select_all is implemented as a generator and, apart from triggering _doing_it_wrong, has no way to differentiate between "nothing matched" and "the selector was invalid or unsupported". It simply yields nothing in both cases.
  • select uses select_all internally and has the same limitation.
  • select_all expects the document position to remain the same in order to visit all matching tags. Because none of the supported selectors rely on stateful logic, this is not an issue at this time.
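
The first two points can be illustrated with a minimal, self-contained sketch. The sketch_select_all and sketch_select functions below are hypothetical stand-ins that operate on a plain list of tag names and accept only bare type selectors; they are not the methods from the PR, and only mirror the generator shape described above.

```php
<?php
/**
 * Hypothetical stand-in for select_all: yields each tag name that matches
 * a bare type selector. Anything else is treated as unsupported.
 */
function sketch_select_all( array $tags, string $selector ): Generator {
	if ( 1 !== preg_match( '/^[a-z][a-z0-9]*$/i', $selector ) ) {
		// Unsupported selector: the generator simply ends without yielding.
		return;
	}
	foreach ( $tags as $tag ) {
		if ( strtoupper( $selector ) === $tag ) {
			yield $tag;
		}
	}
}

/**
 * Hypothetical stand-in for select: built on the generator, so it
 * inherits the same ambiguity.
 */
function sketch_select( array $tags, string $selector ): bool {
	foreach ( sketch_select_all( $tags, $selector ) as $_ ) {
		return true;
	}
	return false;
}

$tags        = array( 'P', 'DIV', 'P' );
$no_match    = iterator_to_array( sketch_select_all( $tags, 'em' ), false );
$unsupported = iterator_to_array( sketch_select_all( $tags, 'p:empty' ), false );

// Both results are empty arrays, so a caller cannot tell the cases apart.
$indistinguishable = ( $no_match === $unsupported );
```

One possible direction, offered only as an option for the open question, would be to validate the selector in a separate step that can return an error before any generator is created.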

---

Todo:

  • [ ] Get and address feedback on open questions.
  • [ ] Split into smaller PRs for review. Specifically, the selector lists can be split into separate PRs, starting with compound selectors and following with complex selectors.

https://github.com/sirreal/html-api-debugger/pull/5 can be used for testing the parsing and matching behavior.

Trac ticket: …

#3 @jonsurrell
6 weeks ago

As noted in the linked PR, it's not ready for final review. However, I've listed a number of open questions and would appreciate feedback on the questions and high-level feedback on the implementation in general.

@jonsurrell commented on PR #7857:


6 weeks ago
#5

I'm not sure how useful it would be, but I think it would be interesting to add namespaces to type selectors. We could declare 4 namespaces to support: *, html, svg, and math. Then things could be selected like:

  • Any MathML element: math|*
  • SVG Title elements: svg|title
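
A rough sketch of how such namespaced type selectors might be parsed and matched. The function names and array shapes are hypothetical and nothing here is part of the PR. It assumes the four prefixes listed above, treats a missing prefix as any namespace, and compares tag names case-insensitively, which is a simplification for SVG and MathML.

```php
<?php
/**
 * Hypothetical parser for type selectors with an optional namespace prefix,
 * e.g. `math|*`, `svg|title` or plain `title`.
 *
 * Returns null when the input is not a type selector this sketch understands.
 */
function sketch_parse_type_selector( string $selector ): ?array {
	$pattern = '/^(?:(\*|html|svg|math)\|)?(\*|[a-zA-Z][a-zA-Z0-9-]*)$/';
	if ( 1 !== preg_match( $pattern, $selector, $m ) ) {
		return null;
	}
	return array(
		// No prefix is treated like `*|`, meaning any namespace.
		'namespace' => '' === $m[1] ? '*' : $m[1],
		'type'      => strtolower( $m[2] ),
	);
}

/**
 * Checks a parsed selector against an element described by its namespace
 * ('html', 'svg' or 'math') and its tag name.
 */
function sketch_type_selector_matches( array $parsed, string $namespace, string $tag_name ): bool {
	$namespace_ok = '*' === $parsed['namespace'] || $parsed['namespace'] === $namespace;
	$type_ok      = '*' === $parsed['type'] || strtolower( $tag_name ) === $parsed['type'];
	return $namespace_ok && $type_ok;
}

$any_math  = sketch_parse_type_selector( 'math|*' );
$svg_title = sketch_parse_type_selector( 'svg|title' );

$matches_mi         = sketch_type_selector_matches( $any_math, 'math', 'MI' );
$matches_html_title = sketch_type_selector_matches( $svg_title, 'html', 'TITLE' );
```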

#6 @gziolo
6 weeks ago

It's excellent to see progress on this front. It'll further enhance the developer experience when interacting with the HTML API.

I also want to emphasize the more significant impact on WordPress core once this functionality is introduced. In practical terms, it means that we will eventually be able to read sourced block attributes on the server from the HTML saved for blocks. This would be a huge win for all the use cases where developers want to consume blocks as structured data. The hope is that, when all the development concludes, a developer rendering a block on the server will have access to the value of every attribute inside render_callback through the $attributes param. Additionally, two long-standing tickets would be simple to implement:

  • Expose block data directly through REST API endpoints (#53603). When retrieving a post via the REST API you could get the block data as part of the content object instead of the HTML.
  • Expand get_post's $filter parameter to allow block data as a return shape (#53602).

#7 @jonsurrell
6 weeks ago

  • Description modified (diff)

@jonsurrell commented on PR #7857:


6 weeks ago
#8

I'm not happy with how the implementation is distributed. I'd like to move in one of two directions:

Move the parsing logic for selectors into the selector classes, so each selector class is responsible for parsing itself.

I've changed the class structures so that all the selectors extend an abstract base class that provides some constants and utilities for parsing selectors. Each selector class now defines how it is parsed to construct an instance of itself.
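
The shape described here could look roughly like the following sketch. All class names, the IDENT constant, the consume helper and the element array are hypothetical and simplified; they are not the classes from the PR and only illustrate selectors that parse themselves on top of a shared abstract base.

```php
<?php
abstract class Sketch_Selector {
	// Simplified identifier pattern. Real CSS identifiers also allow
	// escapes and non-ASCII characters.
	const IDENT = '[a-zA-Z_][a-zA-Z0-9_-]*';

	/** Parses an instance starting at $offset, advancing $offset on success. */
	abstract public static function parse( string $input, int &$offset ): ?Sketch_Selector;

	/** $element is assumed to look like array( 'tag' => 'H1', 'classes' => array( ... ) ). */
	abstract public function matches( array $element ): bool;

	/** Shared utility: consume $pattern at $offset or return null. */
	protected static function consume( string $pattern, string $input, int &$offset ): ?string {
		if ( 1 !== preg_match( '/\G' . $pattern . '/', $input, $m, 0, $offset ) ) {
			return null;
		}
		$offset += strlen( $m[0] );
		return $m[0];
	}
}

final class Sketch_Type_Selector extends Sketch_Selector {
	private $type;

	private function __construct( string $type ) {
		$this->type = $type;
	}

	public static function parse( string $input, int &$offset ): ?Sketch_Selector {
		$ident = self::consume( self::IDENT, $input, $offset );
		return null === $ident ? null : new self( $ident );
	}

	public function matches( array $element ): bool {
		return 0 === strcasecmp( $this->type, $element['tag'] );
	}
}

final class Sketch_Class_Selector extends Sketch_Selector {
	private $class_name;

	private function __construct( string $class_name ) {
		$this->class_name = $class_name;
	}

	public static function parse( string $input, int &$offset ): ?Sketch_Selector {
		$token = self::consume( '\.' . self::IDENT, $input, $offset );
		return null === $token ? null : new self( substr( $token, 1 ) );
	}

	public function matches( array $element ): bool {
		return in_array( $this->class_name, $element['classes'], true );
	}
}

// Parse `h1.page-title` as a type selector followed by a class selector.
$offset  = 0;
$type    = Sketch_Type_Selector::parse( 'h1.page-title', $offset );
$class   = Sketch_Class_Selector::parse( 'h1.page-title', $offset );
$element = array( 'tag' => 'H1', 'classes' => array( 'page-title' ) );
$matched = $type->matches( $element ) && $class->matches( $element );
```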

This ticket was mentioned in Slack in #core by jonsurrell.


6 weeks ago

#10 @jonsurrell
6 weeks ago

I searched for selectors in the Gutenberg block library at commit b1d943f and came up with this list of the selectors used:

.blocks-gallery-caption
.blocks-gallery-item
.blocks-gallery-item__caption
.wp-block-form-input__input
.wp-block-form-input__label-content
a
a:not([download])
a[download]
audio
blockquote
cite
div.wp-block-gallery figure.blocks-gallery-image img
figcaption
figure > a
figure a
figure img
figure video,figure img
figure
footer
h1,h2,h3,h4,h5,h6
h2
hr
img
input
ol,ul
p
pre
table
tbody tr
td,th
tfoot tr
thead tr
ul.wp-block-gallery .blocks-gallery-item

The implementation proposed in PR #7857 does not support the following selectors from that list:

  • a:not([download]) (pseudo-class :not() unsupported)
  • div.wp-block-gallery figure.blocks-gallery-image img (use of class selectors in non-final position)
  • ul.wp-block-gallery .blocks-gallery-item (use of class selectors in non-final position)
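
The same classification can be approximated with a naive check like the sketch below. The function name is hypothetical and this is not the PR's parser: it works on the selector text with regular expressions, assumes attribute values contain no `]`, and treats only plain tag names as type selectors.

```php
<?php
/**
 * Hypothetical, text-based check: returns a reason when a selector falls
 * outside the subset described in this ticket, or null when it looks supported.
 */
function sketch_unsupported_reason( string $selector ): ?string {
	// Blank out attribute selector contents so characters such as `~` in
	// `[att~=value]` are not mistaken for combinators.
	$stripped = preg_replace( '/\[[^\]]*\]/', '[]', $selector );

	if ( false !== strpos( $stripped, ':' ) ) {
		return 'pseudo-class or pseudo-element';
	}
	if ( 1 === preg_match( '/[+~]/', $stripped ) ) {
		return 'sibling combinator';
	}

	foreach ( explode( ',', $stripped ) as $complex ) {
		// Split on the child combinator or on descendant whitespace.
		$compounds = preg_split( '/\s*>\s*|\s+/', trim( $complex ) );
		// The final compound may use class, id and attribute selectors.
		array_pop( $compounds );
		foreach ( $compounds as $compound ) {
			if ( 1 !== preg_match( '/^[a-zA-Z][a-zA-Z0-9-]*$/', $compound ) ) {
				return 'non-type selector in non-final position';
			}
		}
	}
	return null;
}

$reasons = array();
foreach ( array( 'figure > a', 'a:not([download])', 'ul.wp-block-gallery .blocks-gallery-item' ) as $candidate ) {
	$reasons[ $candidate ] = sketch_unsupported_reason( $candidate );
}
```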

Last edited 6 weeks ago by jonsurrell