WordPress.org

Make WordPress Core

Opened 4 months ago

Last modified 4 months ago

#50867 new enhancement

An API which encourages automatic escaping of HTML

Reported by: noisysocks Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch needs-unit-tests needs-docs dev-feedback 2nd-opinion
Focuses: Cc:

Description (last modified by noisysocks)

It's common in WordPress to write PHP code that assembles a large bit of HTML using conditional logic. A good example of this is render_block_core_navigation_link. Unfortunately this type of code can become difficult to read and error prone. For example, we've had several reported XSS vulnerabilities in code like this.

How do we feel about adding an API for safely building large bits of HTML?

Attached is a patch which implements an API inspired by createElement in @wordpress/element and the external classnames JavaScript library.

The primary interface is wp_el. It takes three arguments: an HTML tag name, an array of HTML attributes, and an array of child elements. You can nest calls to wp_el within each other to cleanly create deeply nested HTML.

<?php
echo wp_el(
        'figure',
        array( 'class' => 'my-image' ),
        array(
                wp_el( 'img', array( 'src' => 'https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&name=medium' ) ),
                wp_el(
                        'figcaption',
                        null,
                        'A cold refreshing glass of pilk'
                ),
        )
);
<figure class="my-image"><img src="https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&amp;name=medium" /><figcaption>A cold refreshing glass of pilk</figcaption></figure>

Optional arguments and automatic handling of non associative array values can be used to make usage quite succinct.

<?php
echo wp_el( 'hr' );
echo wp_el( 'input', 'required' );
<hr /><input required />

The key design detail is that all strings are automatically escaped. If you want to output unescaped HTML you have to do it explicitly.

<?php
echo wp_el(
        'article',
        null,
        wp_dangerous_html( '<marquee>Unescaped HTML</marquee>' )
);
<article><marquee>Unescaped HTML</marquee></article>

Lastly, wp_classnames provides a convenient way to assemble HTML class attributes.

<?php
echo wp_el(
        'li',
        array(
                'class' => wp_classnames(
                        array(
                                'my-link',
                                'is-current' => $post->ID === $id,
                        )
                ),
        ),
        array(
                wp_el(
                        'a',
                        array(
                                'href' => $post->guid,
                        ),
                        $post->post_name
                ),
        )
);
<li class="my-link"><a href="http://localhost:8888/?p=215">navigation-stored-in-old-way</a></li>

Thoughts? Are there alternative approaches common in the PHP ecosystem? Does such an API belong in Core? What other approaches can we take to prevent unescaped strings from being output?

Attachments (1)

50867.2.diff (4.6 KB) - added by noisysocks 4 months ago.

Download all attachments as: .zip

Change History (10)

@noisysocks
4 months ago

#1 @noisysocks
4 months ago

  • Summary changed from An API for assembling large bits of HTML to An API which encourages automatic escaping of HTML

#2 @noisysocks
4 months ago

  • Description modified (diff)

#3 @andraganescu
4 months ago

  • Keywords dev-feedback 2nd-opinion added

#4 @talldanwp
4 months ago

A big yes from me. Concatenating HTML is error-prone for the reasons you mention, this looks like it'd make the process safer and also provide a better structure for building markup.

The implementation also doesn't look too complicated.

I feel like it'd be good to also escape the result of wp_classnames, as there's potential for it to be used separately from wp_el.

#5 follow-up: @johnbillion
4 months ago

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

#6 @jorbin
4 months ago

This is a fun idea. My first thought is I like it.

In addition to escaping, I would love to see some automated accessibility helpers. For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.

#7 in reply to: ↑ 5 @noisysocks
4 months ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here:

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback


Replying to johnbillion:

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

I suppose it's more common in other PHP codebases to use a templating language. See #33472. It's a much more heavyweight solution, though. I like that this proposal is optional: another tool to go into a developer's toolbox.

#8 @talldanwp
4 months ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback

From that, I think wp_classnames has definite merit, that part of the code is much easier to read.

wp_el is a little difficult to read in my opinion, mostly because of the amount of indentation, but I still think it's an improvement over the original concatenated string.

One of the worst parts I find about assembling HTML using current techniques is having to match up opening and closing tags, particularly when there's some conditional logic involved around those tags. It's too easy to make a mistake and end up with invalid HTML.

I like that the proposal here doesn't suffer from that problem that tags are implicitly closed by the function.

Last edited 4 months ago by talldanwp (previous) (diff)

#9 @ayeshrajans
4 months ago

Thanks for creating this ticket. I come from a Drupal background, where we had a render API for as long as we remember.

The problem we faced with the Drupal render API (perhaps the most widely used implementation due to Drupal's popularity is that the implementation of the feature practically creates a new domain language that can out off outsiders to WordPress.

In Drupal, each HTML element is an array, which supports nested elements too. Due to historical reasons, they are not objects, which makes it not so easy to autocomplete in IDEs, generate documentation, etc.

Onna higher level, having such an API will be quite benefiting in many areas.

  • Automatic and context-aware string escaping.
  • Rendering content to other formats than HTML.
  • Automatic CSRF/Validation support for forms.
  • Positibility to expose hooks to easily alter HTML.
  • Caching improvements such as partial caching, lazy rendering, etc.
Note: See TracTickets for help on using tickets.