Make WordPress Core

Opened 3 years ago

Last modified 4 months ago

#50867 new enhancement

An API which encourages automatic escaping of HTML

Reported by: noisysocks's profile noisysocks Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: General Keywords: has-patch needs-unit-tests needs-docs dev-feedback 2nd-opinion
Focuses: Cc:

Description (last modified by noisysocks)

It's common in WordPress to write PHP code that assembles a large bit of HTML using conditional logic. A good example of this is render_block_core_navigation_link. Unfortunately this type of code can become difficult to read and error prone. For example, we've had several reported XSS vulnerabilities in code like this.

How do we feel about adding an API for safely building large bits of HTML?

Attached is a patch which implements an API inspired by createElement in @wordpress/element and the external classnames JavaScript library.

The primary interface is wp_el. It takes three arguments: an HTML tag name, an array of HTML attributes, and an array of child elements. You can nest calls to wp_el within each other to cleanly create deeply nested HTML.

<?php
echo wp_el(
        'figure',
        array( 'class' => 'my-image' ),
        array(
                wp_el( 'img', array( 'src' => 'https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&name=medium' ) ),
                wp_el(
                        'figcaption',
                        null,
                        'A cold refreshing glass of pilk'
                ),
        )
);
<figure class="my-image"><img src="https://pbs.twimg.com/media/Ed_W9VQXkAAtYQY?format=jpg&amp;name=medium" /><figcaption>A cold refreshing glass of pilk</figcaption></figure>

Optional arguments and automatic handling of non associative array values can be used to make usage quite succinct.

<?php
echo wp_el( 'hr' );
echo wp_el( 'input', 'required' );
<hr /><input required />

The key design detail is that all strings are automatically escaped. If you want to output unescaped HTML you have to do it explicitly.

<?php
echo wp_el(
        'article',
        null,
        wp_dangerous_html( '<marquee>Unescaped HTML</marquee>' )
);
<article><marquee>Unescaped HTML</marquee></article>

Lastly, wp_classnames provides a convenient way to assemble HTML class attributes.

<?php
echo wp_el(
        'li',
        array(
                'class' => wp_classnames(
                        array(
                                'my-link',
                                'is-current' => $post->ID === $id,
                        )
                ),
        ),
        array(
                wp_el(
                        'a',
                        array(
                                'href' => $post->guid,
                        ),
                        $post->post_name
                ),
        )
);
<li class="my-link"><a href="http://localhost:8888/?p=215">navigation-stored-in-old-way</a></li>

Thoughts? Are there alternative approaches common in the PHP ecosystem? Does such an API belong in Core? What other approaches can we take to prevent unescaped strings from being output?

Attachments (1)

50867.2.diff (4.6 KB) - added by noisysocks 3 years ago.

Download all attachments as: .zip

Change History (13)

@noisysocks
3 years ago

#1 @noisysocks
3 years ago

  • Summary changed from An API for assembling large bits of HTML to An API which encourages automatic escaping of HTML

#2 @noisysocks
3 years ago

  • Description modified (diff)

#3 @andraganescu
3 years ago

  • Keywords dev-feedback 2nd-opinion added

#4 @talldanwp
3 years ago

A big yes from me. Concatenating HTML is error-prone for the reasons you mention, this looks like it'd make the process safer and also provide a better structure for building markup.

The implementation also doesn't look too complicated.

I feel like it'd be good to also escape the result of wp_classnames, as there's potential for it to be used separately from wp_el.

#5 follow-up: @johnbillion
3 years ago

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

#6 @jorbin
3 years ago

This is a fun idea. My first thought is I like it.

In addition to escaping, I would love to see some automated accessibility helpers. For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.

#7 in reply to: ↑ 5 @noisysocks
3 years ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here:

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback


Replying to johnbillion:

Is there existing precedent for this in the wider PHP ecosystem? Any existing libraries that we can utilise or get inspiration from?

I suppose it's more common in other PHP codebases to use a templating language. See #33472. It's a much more heavyweight solution, though. I like that this proposal is optional: another tool to go into a developer's toolbox.

#8 @talldanwp
3 years ago

To better illustrate this proposal, I rewrote render_block_core_navigation_link using wp_el. You can see how that looks here

https://github.com/WordPress/gutenberg/compare/try/refactor-navigation-link-render_callback

From that, I think wp_classnames has definite merit, that part of the code is much easier to read.

wp_el is a little difficult to read in my opinion, mostly because of the amount of indentation, but I still think it's an improvement over the original concatenated string.

One of the worst parts I find about assembling HTML using current techniques is having to match up opening and closing tags, particularly when there's some conditional logic involved around those tags. It's too easy to make a mistake and end up with invalid HTML.

I like that the proposal here doesn't suffer from that problem that tags are implicitly closed by the function.

Last edited 3 years ago by talldanwp (previous) (diff)

#9 @ayeshrajans
3 years ago

Thanks for creating this ticket. I come from a Drupal background, where we had a render API for as long as we remember.

The problem we faced with the Drupal render API (perhaps the most widely used implementation due to Drupal's popularity is that the implementation of the feature practically creates a new domain language that can out off outsiders to WordPress.

In Drupal, each HTML element is an array, which supports nested elements too. Due to historical reasons, they are not objects, which makes it not so easy to autocomplete in IDEs, generate documentation, etc.

Onna higher level, having such an API will be quite benefiting in many areas.

  • Automatic and context-aware string escaping.
  • Rendering content to other formats than HTML.
  • Automatic CSRF/Validation support for forms.
  • Positibility to expose hooks to easily alter HTML.
  • Caching improvements such as partial caching, lazy rendering, etc.

#11 @zieladam
10 months ago

This is great @noisysocks! I'd love to have such an API in core. It's like a reverse of tokenizing the generated HTML.

#12 @dmsnell
4 months ago

Expanding on what @zieladam wrote, I tossed together another version of this before I realized this Ticket exists.

https://github.com/WordPress/wordpress-develop/pull/4065

That relies on the new Tag Processor to create the HTML, which at first seems a bit heavy-handed, but then maybe isn't so bad because it's working on tiny little strings and shares the rules of the HTML engine powering the new HTML API.

There's a lot of room for an HTML-creator in the new HTML API. There's no concession for CSS classes in my PR apart from the possibility of exporting the embedded Tag Processor directly, which opens up add_class and remove_class. That's one of the simpler utilities to tack on though, obviously since it's a small function. Conditional attributes are supported by nature since a boolean attribute's false value translate into omitting that attribute from the HTML entirely.

From what I could tell, even where it's difficult to grasp what's happening in a builder call, it's also quite difficult to grasp in the existing calls. Worse, it's hard to see if there are obvious defects because of the mix of PHP and HTML syntax.

<<?php echo $safe_type; ?> controls="controls" preload="none" width="<?php echo (int) $theme_width; ?>"
        <?php
        if ( 'video' === $safe_type ) {
                echo ' height="', (int) $theme_height, '"';
        }
        ?>
></<?php echo $safe_type; ?>>
<?php echo WP_HTML::make_tag(
        $safe_type,
        array(
                'controls' => true,
                'preload'  => 'none',
                'width'    => (int) $theme_width,
                'height'   => 'video' === $safe_type ? (int) $theme_height : false
        )
);

For example, throwing a _doing_it_wrong when an img doesn't contain an alt attribute.

👍 surely these rules will frustrate developers, but it's sound technically.

---

the HTML API focuses on strings, despite introducing new classes into WP. the string-nature limits memory bloat. this is why I've proposed accepting the children parameter as a string.

Note: See TracTickets for help on using tickets.