Opened 4 years ago
Last modified 10 months ago
#50867 new enhancement
An API which encourages automatic escaping of HTML
Reported by: | noisysocks | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | HTML API | Keywords: | has-patch needs-unit-tests needs-docs dev-feedback 2nd-opinion |
Focuses: | Cc: |
Description (last modified by )
It's common in WordPress to write PHP code that assembles a large bit of HTML using conditional logic. A good example of this is render_block_core_navigation_link. Unfortunately this type of code can become difficult to read and error prone. For example, we've had several reported XSS vulnerabilities in code like this.
How do we feel about adding an API for safely building large bits of HTML?
Attached is a patch which implements an API inspired by createElement
in @wordpress/element
and the external classnames
JavaScript library.
The primary interface is wp_el
. It takes three arguments: an HTML tag name, an array of HTML attributes, and an array of child elements. You can nest calls to wp_el
within each other to cleanly create deeply nested HTML.
<?php echo wp_el( 'figure', array( 'class' => 'my-image' ), array( wp_el( 'img', array( 'src' => 'https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&name=medium' ) ), wp_el( 'figcaption', null, 'A cold refreshing glass of pilk' ), ) );
<figure class="my-image"><img src="https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&name=medium" /><figcaption>A cold refreshing glass of pilk</figcaption></figure>
Optional arguments and automatic handling of non associative array values can be used to make usage quite succinct.
<?php echo wp_el( 'hr' ); echo wp_el( 'input', 'required' );
<hr /><input required />
The key design detail is that all strings are automatically escaped. If you want to output unescaped HTML you have to do it explicitly.
<?php echo wp_el( 'article', null, wp_dangerous_html( '<marquee>Unescaped HTML</marquee>' ) );
<article><marquee>Unescaped HTML</marquee></article>
Lastly, wp_classnames
provides a convenient way to assemble HTML class
attributes.
<?php echo wp_el( 'li', array( 'class' => wp_classnames( array( 'my-link', 'is-current' => $post->ID === $id, ) ), ), array( wp_el( 'a', array( 'href' => $post->guid, ), $post->post_name ), ) );
<li class="my-link"><a href="http://localhost:8888/?p=215">navigation-stored-in-old-way</a></li>
Thoughts? Are there alternative approaches common in the PHP ecosystem? Does such an API belong in Core? What other approaches can we take to prevent unescaped strings from being output?
Attachments (1)
Change History (19)
#1
@
4 years ago
- Summary changed from An API for assembling large bits of HTML to An API which encourages automatic escaping of HTML
#5
follow-up:
↓ 7
@
4 years ago
Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?
#6
@
4 years ago
This is a fun idea. My first thought is I like it.
In addition to escaping, I would love to see some automated accessibility helpers. For example, throwing a _doing_it_wrong
when an img doesn't contain an alt attribute.
#7
in reply to:
↑ 5
@
4 years ago
To better illustrate this proposal, I rewrote render_block_core_navigation_link
using wp_el
. You can see how that looks here:
https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback
Replying to johnbillion:
Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?
I suppose it's more common in other PHP codebases to use a templating language. See #33472. It's a much more heavyweight solution, though. I like that this proposal is optional: another tool to go into a developer's toolbox.
#8
@
4 years ago
To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here
https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback
From that, I think wp_classnames
has definite merit, that part of the code is much easier to read.
wp_el
is a little harder to read in my opinion, mostly because of the amount of indentation, but I still think it's an improvement over the original concatenated string.
One of the worst parts I find about assembling HTML using current techniques is having to match up opening and closing tags, particularly when there's some conditional logic involved around those tags. It's too easy to make a mistake and end up with invalid HTML.
I like that the proposal here doesn't suffer from that problem that tags are implicitly closed by the function.
#9
@
4 years ago
Thanks for creating this ticket. I come from a Drupal background, where we had a render API for as long as we remember.
- Drupal render API (Array based, supports associative arrays converted to HTML structures).
- https://github.com/spatie/html-element - a bit modern approach.
- PEAR QuickHTML (It's a pear package 😓).
The problem we faced with the Drupal render API (perhaps the most widely used implementation due to Drupal's popularity is that the implementation of the feature practically creates a new domain language that can out off outsiders to WordPress.
In Drupal, each HTML element is an array, which supports nested elements too. Due to historical reasons, they are not objects, which makes it not so easy to autocomplete in IDEs, generate documentation, etc.
Onna higher level, having such an API will be quite benefiting in many areas.
- Automatic and context-aware string escaping.
- Rendering content to other formats than HTML.
- Automatic CSRF/Validation support for forms.
- Positibility to expose hooks to easily alter HTML.
- Caching improvements such as partial caching, lazy rendering, etc.
#11
@
2 years ago
This is great @noisysocks! I'd love to have such an API in core. It's like a reverse of tokenizing the generated HTML.
#12
@
22 months ago
Expanding on what @zieladam wrote, I tossed together another version of this before I realized this Ticket exists.
https://github.com/WordPress/wordpress-develop/pull/4065
That relies on the new Tag Processor to create the HTML, which at first seems a bit heavy-handed, but then maybe isn't so bad because it's working on tiny little strings and shares the rules of the HTML engine powering the new HTML API.
There's a lot of room for an HTML-creator in the new HTML API. There's no concession for CSS classes in my PR apart from the possibility of exporting the embedded Tag Processor directly, which opens up add_class
and remove_class
. That's one of the simpler utilities to tack on though, obviously since it's a small function. Conditional attributes are supported by nature since a boolean attribute's false
value translate into omitting that attribute from the HTML entirely.
From what I could tell, even where it's difficult to grasp what's happening in a builder call, it's also quite difficult to grasp in the existing calls. Worse, it's hard to see if there are obvious defects because of the mix of PHP and HTML syntax.
<<?php echo $safe_type; ?> controls="controls" preload="none" width="<?php echo (int) $theme_width; ?>" <?php if ( 'video' === $safe_type ) { echo ' height="', (int) $theme_height, '"'; } ?> ></<?php echo $safe_type; ?>>
<?php echo WP_HTML::make_tag( $safe_type, array( 'controls' => true, 'preload' => 'none', 'width' => (int) $theme_width, 'height' => 'video' === $safe_type ? (int) $theme_height : false ) );
For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.
👍 surely these rules will frustrate developers, but it's sound technically.
---
the HTML API focuses on strings, despite introducing new classes into WP. the string-nature limits memory bloat. this is why I've proposed accepting the children
parameter as a string.
This ticket was mentioned in PR #5884 on WordPress/wordpress-develop by @dmsnell.
11 months ago
#13
Trac ticket: Core-50867
In this patch the WP_HTML::tag()
method is introduced for safely creating HTML content for a given tag and a number of attributes or inner text. No support is proposed yet for inner tags.
#14
@
11 months ago
I've created a new proposal based on the HTML API whose only purpose is to create a new tag. Although I will add illustrative tests soon, my day is over and I need to sign off for now, but I wanted to share this.
Originally I had hoped that we could get a much more valuable HTML templating system into 6.5, but over the past week I've realized that's a bit too rushed.
With WP_HTML::tag()
it's possible to create safe HTML. It doesn't support nested tags at the moment, as I think that opens some of the more complicated design questions that templating does. Still, when the time comes, I think we'll find that we have to wrap any inner tags in a class to ensure that we don't invite unsafe string operations that could break the output; that class would be a call to WP_HTML::tag()
or some variant, meaning the _only_ user- or develop-input we allow ends up as an encoded string _or_ the result of calling WP_HTML::tag()
with encoded strings.
This is different than general purpose templating and it won't be usable everywhere, but already it provides a helpful utility with additional conveniences over current HTML-generating PHP code. For instance, it's possible to pass attribute values as true
for a boolean attribute or false
to ensure no attribute of the given name appears in the markup.
<?php echo WP_HTML::tag( 'div', array( 'class' => 'is-safe' ), 'Hello, world!' ); // <div class="is-safe">Hello, world!</div> echo WP_HTML::tag( 'input', array( 'type' => '"></script>', 'disabled' => true ), 'Is this > that?' ); // <input type=""></script>" disabled> echo WP_HTML::tag( 'p', null, 'Is this > that?' ); // <p>Is this > that?</p> echo WP_HTML::tag( 'wp-emoji', array( 'name' => ':smile:' ), null, 'self-closing' ); // <wp-emoji name=":smile:" />
10 months ago
#18
If I could suggest one thing to put on your list, if it isn't there already: This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards. Until then, developers will need to // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped
each use of the new methods, or they'll need to amend their PHPCS configurations to exempt the methods themselves. Or, perhaps worse, they'll run wp_kses_post()
over the output just to get rid of the error. The need for a workaround might unnecessarily slow adoption of the new APIs by making them appear to be "not worth the hassle."
If I have availability to take things up with PHPCS when the time comes, I'll be happy to, but I'm not sure that I will, so I thought I'd mention it separately.
10 months ago
#19
This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards.
Should also other methods be on that list? Example, set_attribute
from the `WP_HTML_Tag_Processor. Or did I misunderstood the comment left by @dmsnell in https://github.com/WordPress/wordpress-develop/pull/5888#discussion_r1476793062?
10 months ago
#20
I'd be grateful for help adding this to where it belongs in the WPCS ruleset.
@gziolo that's a good point, but I don't think set_attribute()
and friends are where it will matter. My guess, which is uninformed, is that we only need one or two exemptions for get_updated_html()
and __toString()
. It's another question if we need these both on the Tag Processor and also on the HTML Processor; I'm not sure if the rules assume that subclasses inherit this property of their base class.
On this note it would be rather helpful if we could indicate _not_ to escape things to methods like set_attribute()
. Do we even have any rules that say "be sure not to escape this!"?
A big yes from me. Concatenating HTML is error-prone for the reasons you mention, this looks like it'd make the process safer and also provide a better structure for building markup.
The implementation also doesn't look too complicated.
I feel like it'd be good to also escape the result of
wp_classnames
, as there's potential for it to be used separately fromwp_el
.