Make WordPress Core

Opened 6 months ago

Last modified 5 months ago

#60229 new enhancement

HTML API: Introduce HTML Templating

Reported by: dmsnell
Owned by:
Milestone:
Priority: normal
Severity: normal
Version: 6.5
Component: HTML API
Keywords: has-patch has-unit-tests needs-dev-note dev-feedback
Focuses:
Cc:

Description

WordPress relies on developers remembering to perform proper escaping when building HTML strings. There's no mechanism to ensure that output HTML is safe. This patch introduces WP_HTML_Template::render( $template, $args ) to do just that.

<?php
echo WP_HTML_Template::render(
        <<<HTML
<a href="</%url>">
        <img src="</%url>">
        </%url>
</a>
HTML,
        array( 'url' => 'https://s.wp.com/i/atat.png?w=640&h=480&alt="atat>atst"' ),
);

outputs

<a href="https://s.wp.com/i/atat.png?w=640&amp;h=480&amp;alt=&quot;atat&gt;atst&quot;">
<img src="https://s.wp.com/i/atat.png?w=640&amp;h=480&amp;alt=&quot;atat&gt;atst&quot;">
https://s.wp.com/i/atat.png?w=640&amp;h=480&amp;alt=&quot;atat&gt;atst&quot;
</a>

This proposed templating syntax uses closing tags containing invalid tag names, so-called "funky comments," as placeholders. They were chosen because they are converted to HTML comments in the DOM, because there is near-universal existing support for them in browsers, and because the syntax cannot be nested. The % at the front indicates that the value for the placeholder should come from the args array, under the key named by what follows the %.

This proposal does not yet consider nested HTML, or "raw" HTML. It currently escapes all content. It would be great if the templating engine could properly and safely handle HTML passed into it without risking unintentional exposure, but there must also be some way to communicate that a value inside is already escaped and that its safety is maintained.

By relying on the HTML API, this templating only supports replacement of values inside HTML attributes or in plaintext (#text) nodes. It's not possible to inject HTML tags (unless nested support can be safely added), comments, or other HTML syntax.
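To make the placeholder idea above concrete, here is a minimal sketch in plain PHP. It is not the patch's implementation: `sketch_render_template()` is a hypothetical helper, it uses a regular expression instead of the HTML API, and it applies one escaping rule everywhere via `htmlspecialchars()` rather than choosing escaping by context. It only illustrates how a `</%name>` placeholder maps to an escaped value from the args array.

```php
<?php
/**
 * Hypothetical helper for illustration only; not part of WordPress or PR #5949.
 * Replaces each `</%name>` placeholder with the escaped value of $args['name'].
 */
function sketch_render_template( string $template, array $args ): string {
	return preg_replace_callback(
		// A placeholder starts with `</%` and runs to the first `>`.
		'~</%([^>]*)>~',
		static function ( array $match ) use ( $args ): string {
			$name = $match[1];
			if ( ! array_key_exists( $name, $args ) ) {
				// Assumption: unknown placeholders are left untouched.
				return $match[0];
			}
			// Escape &, <, >, and both quote styles so the value cannot
			// terminate a quoted attribute or open a new tag.
			return htmlspecialchars( (string) $args[ $name ], ENT_QUOTES | ENT_HTML5, 'UTF-8' );
		},
		$template
	);
}

$html = sketch_render_template(
	'<a href="</%url>"></%url></a>',
	array( 'url' => 'https://example.com/?a=1&b="2"' )
);

echo $html, "\n";
// Prints: <a href="https://example.com/?a=1&amp;b=&quot;2&quot;">https://example.com/?a=1&amp;b=&quot;2&quot;</a>
```

Because this sketch works on the raw string, it cannot tell an attribute value from a text node; the proposal relies on the HTML API precisely to get that context.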

Change History (15)

#1 @peterwilsoncc
6 months ago

I think this ticket should have a discussion on make/core before being milestoned.

While templating was mentioned in the HTML API progress report mid-2023, it was done so as an interesting aside rather than a firm proposal.

Without an architecture discussion I think it's premature to implement this in the current release cycle.

#2 @dmsnell
6 months ago

  • Milestone 6.5 deleted

Thanks @peterwilsoncc. Fine to remove the milestone. It's my hope that we can get this in for 6.5, so pending that discussion I'd still like to target it, but I can remove the milestone. (I'm new to setting milestones and didn't see any explainers in any of the material I've read through.)

This ticket was mentioned in Slack in #core by dmsnell.


6 months ago

#4 @joemcgill
6 months ago

@dmsnell I see that this is marked as having a patch, but I'm not seeing one linked to the ticket. Do you already have a PoC PR that can be reviewed to better understand how you envision this being implemented?

This ticket was mentioned in PR #5949 on WordPress/wordpress-develop by @dmsnell.


6 months ago
#5

Trac ticket: Core-60229

🚧👷‍♂️🏗️ This feature is currently being proposed for the *WordPress 6.6* release cycle, being pushed back from 6.5 to allow for more time to let the design work proceed.

## Todo

  • [ ] embed replacement inside the Tag Processor
  • [ ] don't allow replacement of escaped funky comment syntax (currently occurs inside attribute value based on how this is built at the moment)
  • [ ] figure out what rules need to apply for nested HTML
  • [ ] figure out how to differentiate nested HTML without requiring sigils or other extra syntax

## Description

Introduces a function providing _context-aware auto-escaping_ HTML templating.

  • generate HTML markup in PHP without needing to escape anything.
  • replace placeholder content with data provided by PHP.
  • supply true to create a boolean attribute, or false/null to remove an attribute.

### Why use these strange funky comments?

A funky comment looks like a tag closer, but the first character of what would be the tag name is a symbol. This was first mentioned in an HTML API progress report: there's a particular kind of HTML syntax error that meets a number of needs we have seen in attempting to replace dynamic content on the server. Namely:

  • Funky comments cannot be nested, by construction.
  • Once inside a funky comment, all bytes are allowed until the first ASCII >, making parsing trivial and reliable.
  • The first symbol provides a very convenient sigil form to differentiate multiple kinds of bits of content: a placeholder, a shortcode, a translation, etc…
  • These are pure HTML and not a new superset syntax of HTML, meaning that when editing in an HTML editor they should stand out properly.
  • Being existing HTML syntax, they cannot break syntax boundaries and cause further parsing problems down the line.
  • They are concise and easily hand-written.
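The parsing rule in the list above can be sketched as a short scanner. This is an illustration under stated assumptions, not code from the patch: `sketch_find_funky_comments()` is a hypothetical name, and the scan ignores context such as script elements, real HTML comments, and attribute values, which the Tag Processor would account for. It treats `</` followed by any byte other than an ASCII letter or `>` as the start of a funky comment and ends it at the first `>`.

```php
<?php
/**
 * Hypothetical scanner for illustration only; not part of WordPress or PR #5949.
 * Returns the inner text of each funky comment found in $html.
 */
function sketch_find_funky_comments( string $html ): array {
	$found  = array();
	$offset = 0;

	while ( false !== ( $start = strpos( $html, '</', $offset ) ) ) {
		// Everything up to the first ">" belongs to this closer; no nesting is possible.
		$end = strpos( $html, '>', $start + 2 );
		if ( false === $end ) {
			break;
		}

		$first     = $html[ $start + 2 ];
		$is_letter = ( $first >= 'a' && $first <= 'z' ) || ( $first >= 'A' && $first <= 'Z' );

		// A letter starts a normal closing tag, and "</>" is empty; anything else is "funky".
		if ( ! $is_letter && '>' !== $first ) {
			$found[] = substr( $html, $start + 2, $end - $start - 2 );
		}

		$offset = $end + 1;
	}

	return $found;
}

echo implode( ', ', sketch_find_funky_comments( '<p></%name> and <//note> but </p> and </>' ) ), "\n";
// Prints: %name, /note
```

The first byte of each result (`%` or `/` here) is the sigil that could distinguish a placeholder from other kinds of content.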

### What's in this patch?

This preliminary PR only renders child text and doesn't provide a mechanism for composing rendered HTML or rendering pre-processed HTML. That means everything passed as a child will be escaped to render verbatim in the browser. There are open questions about how best to represent nested HTML, and without a clear answer, this code prefers trust and safety over features.

  • does not render nested HTML (escapes everything)
  • does not escape URLs differently than other attributes
  • relies on esc_attr() and esc_html(), which today are the same thing in WordPress, but which could be greatly expanded to improve the overall escaping situation

echo WP_HTML_Template::render(
        <<<HTML
<a href="</%url>">
        <img src="</%url>">
        </%url>
</a>
HTML,
        array( 'url' => 'https://s.wp.com/i/atat.png?w=640&h=480&alt="atat>atst"' ),
);

outputs

[Image: https://i0.wp.com/s.wp.com/i/atat.png?w=640&h=480&alt="atat>atst"]
https://s.wp.com/i/atat.png?w=640&h=480&alt="atat>atst"

This proposed templating syntax uses closing tags containing invalid tag names, so-called "funky comments," as placeholders. They were chosen because they are converted to HTML comments in the DOM, because there is near-universal existing support for them in browsers, and because the syntax cannot be nested. The % at the front indicates that the value for the placeholder should come from the args array, under the key named by what follows the %.

#6 follow-up: @dmsnell
6 months ago

Thanks for asking @joemcgill. This is my oversight: I created the PR on my own fork, so it didn't link up with Core-60229. I recreated it just now in WordPress/wordpress-develop, now that the major refactor to the Tag Processor that this relies on is complete.

the PR is now linked properly

#7 in reply to: ↑ 6 @joemcgill
6 months ago

Replying to dmsnell:

Thanks for asking @joemcgill. This is my oversight: I created the PR on my own fork, so it didn't link up with Core-60229. I recreated it just now in WordPress/wordpress-develop, now that the major refactor to the Tag Processor that this relies on is complete.

the PR is now linked properly

❤️ Thanks!

@gziolo commented on PR #5949:


5 months ago
#8

I was chatting with @nerrad about createInterpolateElement implemented in @wordpress/element JavaScript package. The related dev note: https://make.wordpress.org/core/2020/07/17/introducing-createinterpolateelement/. What are your thoughts on offering a similar experience in PHP? For the context, I was wondering what it would require to reuse the same templating system in JavaScript.

The example from the dev note illustrates it better:

[JSX example not shown: the code block failed to render on this page.]

@dmsnell commented on PR #5949:


5 months ago
#9

What are your thoughts on offering a similar experience in PHP?

This is an interesting idea @gziolo but my initial reaction is wondering what this provides beyond the proposed templating. It feels different than JavaScript because we're already dealing with actual HTML strings. I'd be curious to see some of the tradeoffs this would bring.

In contrast, with the ability to "spread" attributes there's a built-in opportunity to make HTML tags placeholders. <span ...spanAttributes>something</span>. Also, if we pass things in as with JavaScript, we're parsing _two_ strings for each placeholder instead of one. In JavaScript all we're doing is directly creating an object for the replacement, but in PHP we're still creating a string.

@gziolo commented on PR #5949:


5 months ago
#10

In contrast, with the ability to "spread" attributes there's a built-in opportunity to make HTML tags placeholders. <span ...spanAttributes>something</span>.

Right, that's a good point. The same functionality is available but with a different syntax that is, in fact, a bit closer to what people familiar with JSX would expect.

#11 @peterwilsoncc
5 months ago

Rather than reviewing a pull request, I'd like to see much more discussion on whether this is necessary and something WordPress developers ought to maintain into the future.

With the introduction of the site editor, WordPress themes have moved away from PHP templates in favor of HTML files and JSON data. As more features are added to the site editor, the current need for PHP in some instances will be reduced further with dynamic blocks. A templating API will leave future core contributors having to maintain both the new and the old methods of escaping.

#12 @dmsnell
5 months ago

@peterwilsoncc I'll definitely be posting on Make soon; the PR is here so that there's actual work to examine and discuss, but I've been both distracted by and mindful of the schedule for 6.5, so I've waited to invite the broader discussion on HTML generation.

WordPress themes have moved away from PHP templates in favor of HTML files and JSON data. As more features are added to the site editor, the current need for PHP in some instances will be reduced further with dynamic blocks. A templating API will leave future core contributors having to maintain both the new and the old methods of escaping.

While I hear what you are saying about the use of PHP in WordPress themes, I've also observed that we've driven towards more PHP to support various blocks, whereby more developers than before are actively thinking about and writing PHP that produces HTML.

The goal of this templating work is to provide a system of convenient HTML creation that's easier by default than the current array of methods, which are any old mixture of PHP and HTML and which frequently forget to call escaping functions. The primary idea is to remove the need for everyone to remember to manually perform all of the steps in the right way, choosing the right escaping at the right times, in order to ensure that WordPress presents a safe experience.

#13 @alexsanford1
5 months ago

I like the idea of moving to a safe templating language to replace raw PHP templates, from a security perspective. I'm curious whether you've given any thought to using an existing (and mature) syntax/implementation rather than creating something custom? I realize that approach would likely come with its own issues, but may also solve a lot of problems regarding adoption (having a robust feature set for the templating language) and security (using a system that already has many security bugs worked out of its implementation).

#14 follow-up: @dmsnell
5 months ago

whether you've given any thought to using an existing (and mature) syntax/implementation rather than creating something custom?

Thanks for asking @alexsanford1! Yes absolutely. This is in part the end of the work discussed regarding dynamic server-side tokens, a project that ultimately began years ago looking to find placeholder syntax, which in part has guided the development of the HTML API itself.

As briefly discussed in the HTML API Progress Report and in the linked PR's description, the choice of this syntax ultimately came down to finding a safe syntax embedded within HTML itself. It's actually this very syntax which relies on one of the most mature syntaxes and implementations in any system (HTML) and in so doing, avoids the category of issues created when adding new syntax into HTML.

This is both new syntax and also nothing new. It's new inside attribute values because there's no escape hatch there, but it's not new in any other place. This means that unlike every known templating syntax I've seen (outside of JSX), it's not possible to break HTML boundaries with the placeholders. I draw a strong contrast to using something like {{{ }}} which is a level of parsing above or below HTML. A superset syntax raises questions about how to handle the template syntax when it clashes with the HTML. By embracing the HTML syntax WordPress would never have to answer that question, because a misplaced placeholder is going to be interpreted in some other way, perhaps as an HTML attribute name, but won't break the document.

This syntax places additional constraints that I personally consider beneficial. The placeholders cannot be used for the tag name, cannot be used to concatenate HTML elements, and most importantly, cannot be nested at all (because, as HTML syntax, they terminate at the first > following their opening). Should the server fail to replace these, a browser will interpret them as HTML comments, and the biggest leak will be the contents of the placeholder; the page at large will be interpreted as if they never existed, rather than having its HTML structure broken.

All of this is to say that I believe this is the end of the search for a mature existing syntax and not the prompt for one. These avoid the security issues inherent in almost every templating syntax I've ever seen, and provide a context-aware auto-escaping feature that means developers don't have to worry about escaping, because the system does. It is likely, as you call out, that this will mean they don't bring all of the features available in other templating systems, or get them as fast. For the HTML API, though, reliability and trust are more important than speed and features.

#15 in reply to: ↑ 14 @alexsanford1
5 months ago

Thank you, that background is very helpful! :)
