Make WordPress Core

Opened 4 years ago

Last modified 7 months ago

#50867 new enhancement

An API which encourages automatic escaping of HTML

Reported by: noisysocks's profile noisysocks Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: HTML API Keywords: has-patch needs-unit-tests needs-docs dev-feedback 2nd-opinion
Focuses: Cc:

Description (last modified by noisysocks)

It's common in WordPress to write PHP code that assembles a large bit of HTML using conditional logic. A good example of this is render_block_core_navigation_link. Unfortunately this type of code can become difficult to read and error prone. For example, we've had several reported XSS vulnerabilities in code like this.

How do we feel about adding an API for safely building large bits of HTML?

Attached is a patch which implements an API inspired by createElement in @wordpress/element and the external classnames JavaScript library.

The primary interface is wp_el. It takes three arguments: an HTML tag name, an array of HTML attributes, and an array of child elements. You can nest calls to wp_el within each other to cleanly create deeply nested HTML.

<?php
echo wp_el(
        'figure',
        array( 'class' => 'my-image' ),
        array(
                wp_el( 'img', array( 'src' => 'https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&name=medium' ) ),
                wp_el(
                        'figcaption',
                        null,
                        'A cold refreshing glass of pilk'
                ),
        )
);
<figure class="my-image"><img src="https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&amp;name=medium" /><figcaption>A cold refreshing glass of pilk</figcaption></figure>

Optional arguments and automatic handling of non associative array values can be used to make usage quite succinct.

<?php
echo wp_el( 'hr' );
echo wp_el( 'input', 'required' );
<hr /><input required />

The key design detail is that all strings are automatically escaped. If you want to output unescaped HTML you have to do it explicitly.

<?php
echo wp_el(
        'article',
        null,
        wp_dangerous_html( '<marquee>Unescaped HTML</marquee>' )
);
<article><marquee>Unescaped HTML</marquee></article>

Lastly, wp_classnames provides a convenient way to assemble HTML class attributes.

<?php
echo wp_el(
        'li',
        array(
                'class' => wp_classnames(
                        array(
                                'my-link',
                                'is-current' => $post->ID === $id,
                        )
                ),
        ),
        array(
                wp_el(
                        'a',
                        array(
                                'href' => $post->guid,
                        ),
                        $post->post_name
                ),
        )
);
<li class="my-link"><a href="http://localhost:8888/?p=215">navigation-stored-in-old-way</a></li>

Thoughts? Are there alternative approaches common in the PHP ecosystem? Does such an API belong in Core? What other approaches can we take to prevent unescaped strings from being output?

Attachments (1)

50867.2.diff (4.6 KB) - added by noisysocks 4 years ago.

Download all attachments as: .zip

Change History (19)

@noisysocks
4 years ago

#1 @noisysocks
4 years ago

  • Summary changed from An API for assembling large bits of HTML to An API which encourages automatic escaping of HTML

#2 @noisysocks
4 years ago

  • Description modified (diff)

#3 @andraganescu
4 years ago

  • Keywords dev-feedback 2nd-opinion added

#4 @talldanwp
4 years ago

A big yes from me. Concatenating HTML is error-prone for the reasons you mention, this looks like it'd make the process safer and also provide a better structure for building markup.

The implementation also doesn't look too complicated.

I feel like it'd be good to also escape the result of wp_classnames, as there's potential for it to be used separately from wp_el.

#5 follow-up: @johnbillion
4 years ago

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

#6 @jorbin
4 years ago

This is a fun idea. My first thought is I like it.

In addition to escaping, I would love to see some automated accessibility helpers. For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.

#7 in reply to: ↑ 5 @noisysocks
4 years ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here:

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback


Replying to johnbillion:

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

I suppose it's more common in other PHP codebases to use a templating language. See #33472. It's a much more heavyweight solution, though. I like that this proposal is optional: another tool to go into a developer's toolbox.

#8 @talldanwp
4 years ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback

From that, I think wp_classnames has definite merit, that part of the code is much easier to read.

wp_el is a little harder to read in my opinion, mostly because of the amount of indentation, but I still think it's an improvement over the original concatenated string.

One of the worst parts I find about assembling HTML using current techniques is having to match up opening and closing tags, particularly when there's some conditional logic involved around those tags. It's too easy to make a mistake and end up with invalid HTML.

I like that the proposal here doesn't suffer from that problem that tags are implicitly closed by the function.

Version 0, edited 4 years ago by talldanwp (next)

#9 @ayeshrajans
4 years ago

Thanks for creating this ticket. I come from a Drupal background, where we had a render API for as long as we remember.

The problem we faced with the Drupal render API (perhaps the most widely used implementation due to Drupal's popularity is that the implementation of the feature practically creates a new domain language that can out off outsiders to WordPress.

In Drupal, each HTML element is an array, which supports nested elements too. Due to historical reasons, they are not objects, which makes it not so easy to autocomplete in IDEs, generate documentation, etc.

Onna higher level, having such an API will be quite benefiting in many areas.

  • Automatic and context-aware string escaping.
  • Rendering content to other formats than HTML.
  • Automatic CSRF/Validation support for forms.
  • Positibility to expose hooks to easily alter HTML.
  • Caching improvements such as partial caching, lazy rendering, etc.

#11 @zieladam
2 years ago

This is great @noisysocks! I'd love to have such an API in core. It's like a reverse of tokenizing the generated HTML.

#12 @dmsnell
19 months ago

Expanding on what @zieladam wrote, I tossed together another version of this before I realized this Ticket exists.

https://github.com/WordPress/wordpress-develop/pull/4065

That relies on the new Tag Processor to create the HTML, which at first seems a bit heavy-handed, but then maybe isn't so bad because it's working on tiny little strings and shares the rules of the HTML engine powering the new HTML API.

There's a lot of room for an HTML-creator in the new HTML API. There's no concession for CSS classes in my PR apart from the possibility of exporting the embedded Tag Processor directly, which opens up add_class and remove_class. That's one of the simpler utilities to tack on though, obviously since it's a small function. Conditional attributes are supported by nature since a boolean attribute's false value translate into omitting that attribute from the HTML entirely.

From what I could tell, even where it's difficult to grasp what's happening in a builder call, it's also quite difficult to grasp in the existing calls. Worse, it's hard to see if there are obvious defects because of the mix of PHP and HTML syntax.

<<?php echo $safe_type; ?> controls="controls" preload="none" width="<?php echo (int) $theme_width; ?>"
        <?php
        if ( 'video' === $safe_type ) {
                echo ' height="', (int) $theme_height, '"';
        }
        ?>
></<?php echo $safe_type; ?>>
<?php echo WP_HTML::make_tag(
        $safe_type,
        array(
                'controls' => true,
                'preload'  => 'none',
                'width'    => (int) $theme_width,
                'height'   => 'video' === $safe_type ? (int) $theme_height : false
        )
);

For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.

👍 surely these rules will frustrate developers, but it's sound technically.

---

the HTML API focuses on strings, despite introducing new classes into WP. the string-nature limits memory bloat. this is why I've proposed accepting the children parameter as a string.

This ticket was mentioned in PR #5884 on WordPress/wordpress-develop by @dmsnell.


8 months ago
#13

Trac ticket: Core-50867

In this patch the WP_HTML::tag() method is introduced for safely creating HTML content for a given tag and a number of attributes or inner text. No support is proposed yet for inner tags.

#14 @dmsnell
8 months ago

I've created a new proposal based on the HTML API whose only purpose is to create a new tag. Although I will add illustrative tests soon, my day is over and I need to sign off for now, but I wanted to share this.

Originally I had hoped that we could get a much more valuable HTML templating system into 6.5, but over the past week I've realized that's a bit too rushed.

With WP_HTML::tag() it's possible to create safe HTML. It doesn't support nested tags at the moment, as I think that opens some of the more complicated design questions that templating does. Still, when the time comes, I think we'll find that we have to wrap any inner tags in a class to ensure that we don't invite unsafe string operations that could break the output; that class would be a call to WP_HTML::tag() or some variant, meaning the _only_ user- or develop-input we allow ends up as an encoded string _or_ the result of calling WP_HTML::tag() with encoded strings.

This is different than general purpose templating and it won't be usable everywhere, but already it provides a helpful utility with additional conveniences over current HTML-generating PHP code. For instance, it's possible to pass attribute values as true for a boolean attribute or false to ensure no attribute of the given name appears in the markup.

<?php
echo WP_HTML::tag( 'div', array( 'class' => 'is-safe' ), 'Hello, world!' );
// <div class="is-safe">Hello, world!</div>

echo WP_HTML::tag( 'input', array( 'type' => '"></script>', 'disabled' => true ), 'Is this > that?' );
// <input type="&quot;&gt;&lt;/script&gt;" disabled>

echo WP_HTML::tag( 'p', null, 'Is this > that?' );
// <p>Is this &gt; that?</p>

echo WP_HTML::tag( 'wp-emoji', array( 'name' => ':smile:' ), null, 'self-closing' );
// <wp-emoji name=":smile:" />

#17 @dmsnell
8 months ago

  • Component changed from General to HTML API

@dlh commented on PR #5884:


7 months ago
#18

If I could suggest one thing to put on your list, if it isn't there already: This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards. Until then, developers will need to // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped each use of the new methods, or they'll need to amend their PHPCS configurations to exempt the methods themselves. Or, perhaps worse, they'll run wp_kses_post() over the output just to get rid of the error. The need for a workaround might unnecessarily slow adoption of the new APIs by making them appear to be "not worth the hassle."

If I have availability to take things up with PHPCS when the time comes, I'll be happy to, but I'm not sure that I will, so I thought I'd mention it separately.

@gziolo commented on PR #5884:


7 months ago
#19

This new method and any future templating methods should be added to the list of auto-escaped functions in the core coding standards.

Should also other methods be on that list? Example, set_attribute from the `WP_HTML_Tag_Processor. Or did I misunderstood the comment left by @dmsnell in https://github.com/WordPress/wordpress-develop/pull/5888#discussion_r1476793062?

@dmsnell commented on PR #5884:


7 months ago
#20

I'd be grateful for help adding this to where it belongs in the WPCS ruleset.

@gziolo that's a good point, but I don't think set_attribute() and friends are where it will matter. My guess, which is uninformed, is that we only need one or two exemptions for get_updated_html() and __toString(). It's another question if we need these both on the Tag Processor and also on the HTML Processor; I'm not sure if the rules assume that subclasses inherit this property of their base class.

On this note it would be rather helpful if we could indicate _not_ to escape things to methods like set_attribute(). Do we even have any rules that say "be sure not to escape this!"?

Note: See TracTickets for help on using tickets.