Make WordPress Core

Opened 8 months ago

Last modified 4 months ago

#61501 new enhancement

KSES Allow more custom data attributes to align with browsers

Reported by: jonsurrell's profile jonsurrell Owned by:
Milestone: 6.8 Priority: normal
Severity: normal Version: 6.6
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

KSES data attribute handling is unnecessarily restrictive. There are a number of valid data attribute that will be handled well and predictably across all browsers that are not allowed by KSES.

KSES rules should be relaxed to allow more data attributes and better aligning with browsers.

Follow up to #61052.

Change History (5)

#1 @sabernhardt
7 months ago

  • Component changed from General to Formatting

#2 @jonsurrell
4 months ago

PR 6492 has a patch for this.

#3 @jonsurrell
4 months ago

#62131 and Gutenberg PR 65803 demonstrate some motivation to allow more data attributes.

The Interactivity API has a "class" directive like this:

<div data-wp-class--highlight="state.highlight"></div>

This adds or removes the "highlight" class on the element depending on the value of the Interactivity API store's state.highlight value.

Tailwind is a popular CSS framework that can make use of the class directive. However, some of the class names that would need to appear as class directives are not allowed by kses:

<?php
echo wp_kses( '<div data-wp-class--top-[3px]="state">', 'post' );
// <div>

#4 @gziolo
4 months ago

  • Milestone changed from Awaiting Review to 6.8
  • Version set to 6.6

This ticket was mentioned in PR #6429 on WordPress/wordpress-develop by @dmsnell.


4 months ago
#5

  • Keywords has-patch has-unit-tests added

Trac ticket: Core-61501.
_(was Core-61052)_
Alternative in #6598

Allow for additional custom data attributes in wp_kses_attr_check().

In this patch the set of allowable custom data attribute names is expanded to permit those including dashes in the dataset name. The term dataset name here means the name appearing to JavaScript in a browser. This is found by stripping away the data- prefix, lowercasing, and then turning every dash-letter combination into a capital letter.

The Interactivity API relies on data attributes of the form data-wp-bind--enabled. The corresponding dataset name here is wpBind-Enabled, and is currently blocked by wp_kses_attr_check(). This patch allows that and any other data attribute through which would include those dashes in the translated name.

This patch is structured in two parts:

  1. Ensure that Core can recognize whether a given attribute represents a custom data attribute.
  2. Once recognizing the custom data attributes, apply a WordPress-specific set of constraints to determine whether to allow it.

The effect of this change is allowing a double-dash when previously only a single dash after the initial data- prefix was allowed. It's more complicated than this, because, as evidenced in the test suite, double dashes translate into different names based on where they appear and what follows them in the attribute name.

<details><summary>Code used to produce this table</summary>

<style>
td, th {
        margin-right: 1em;
}
</style>
<?php

require_once __PATH_TO_REPO__ . '/wordpress-develop/src/wp-load.php';

$variations = [
        'data-',
        'post-id',
        'data-post-id',
        'data-Post-ID',
        'data-uPPer-cAsE',
        "data-\u{2003}",
        'data-🐄-pasture',
        'data-wp-bind--enabled',
        'data-over_under',
        'data--before',
        'data-after-',
        'data-after-after--',
        'data--one--two--',
        'data---one---two---',
        'data-[wish:granted]',
];

echo '<table><tr><th>Attribute Name<th>Dataset Name<th>Before<th>After</tr>';
foreach ( $variations as $name ) {
        $_name    = $name;
        $_value   = 'some-value';
        $_whole   = "{$_name}=\"{$_value}\"";
        $_vless   = 'n';
        $_element = 'div';
        $_allowed_html = [ 'div' => [ 'data-*' => true ] ];

        $was_allowed = wp_old_kses_check( $_name, $_value, $_whole, $_vless, $_element, $_allowed_html );
        $was_allowed = $was_allowed ? '✅' : '🚫';

        $_name    = $name;
        $_value   = 'some-value';
        $_whole   = "{$_name}=\"{$_value}\"";
        $_vless   = 'n';
        $_element = 'div';
        $_allowed_html = [ 'div' => [ 'data-*' => true ] ];

        $is_allowed = wp_kses_attr_check( $_name, $_value, $_whole, $_vless, $_element, $_allowed_html );
        $is_allowed = $is_allowed ? '✅' : '🚫';

        echo <<<HTML
        <tr>
        <th><code>{$name}</code>
        <td style="font-family: monospace;" is-data {$name}>{$name}
        <td>{$was_allowed}
        <td>{$is_allowed}
        HTML;
}
echo '</table>';

?>
<script>
document.querySelectorAll( '[is-data]' ).forEach( node => {
        node.innerText = Object.keys( node.dataset ).join( ', ' );
} );
</script>
<?php

function wp_old_kses_check( &$name, &$value, &$whole, $vless, $element, $allowed_html ) {
        $name_low    = strtolower( $name );
        $element_low = strtolower( $element );

        if ( ! isset( $allowed_html[ $element_low ] ) ) {
                $name  = '';
                $value = '';
                $whole = '';
                return false;
        }

        $allowed_attr = $allowed_html[ $element_low ];

        if ( ! isset( $allowed_attr[ $name_low ] ) || '' === $allowed_attr[ $name_low ] ) {
                /*
                 * Allow `data-*` attributes.
                 *
                 * When specifying `$allowed_html`, the attribute name should be set as
                 * `data-*` (not to be mixed with the HTML 4.0 `data` attribute, see
                 * https://www.w3.org/TR/html40/struct/objects.html#adef-data).
                 *
                 * Note: the attribute name should only contain `A-Za-z0-9_-` chars,
                 * double hyphens `--` are not accepted by WordPress.
                 */
                if ( str_starts_with( $name_low, 'data-' ) && ! empty( $allowed_attr['data-*'] )
                        && preg_match( '/^data(?:-[a-z0-9_]+)+$/', $name_low, $match )
                ) {
                        /*
                         * Add the whole attribute name to the allowed attributes and set any restrictions
                         * for the `data-*` attribute values for the current element.
                         */
                        $allowed_attr[ $match[0] ] = $allowed_attr['data-*'];
                } else {
                        $name  = '';
                        $value = '';
                        $whole = '';
                        return false;
                }
        }

        if ( 'style' === $name_low ) {
                $new_value = safecss_filter_attr( $value );

                if ( empty( $new_value ) ) {
                        $name  = '';
                        $value = '';
                        $whole = '';
                        return false;
                }

                $whole = str_replace( $value, $new_value, $whole );
                $value = $new_value;
        }

        if ( is_array( $allowed_attr[ $name_low ] ) ) {
                // There are some checks.
                foreach ( $allowed_attr[ $name_low ] as $currkey => $currval ) {
                        if ( ! wp_kses_check_attr_val( $value, $vless, $currkey, $currval ) ) {
                                $name  = '';
                                $value = '';
                                $whole = '';
                                return false;
                        }
                }
        }

        return true;
}

</details>

https://github.com/WordPress/wordpress-develop/assets/5431237/15caab7f-1c69-4fe4-9094-f497cfcaf5e9

These are recommendations. If these naming recommendations are not followed, no errors will occur. The attributes will still be matched using CSS attribute selectors, with the attribute being case insensitive and any attribute value being case-sensitive. Attributes not conforming to these three recommendations will also still be recognized by the JavaScript HTMLElement.dataset property and user-agents will include the attribute in the DOMStringMap containing all the custom data attributes for an HTMLElement.

https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-*

Note: See TracTickets for help on using tickets.