Make WordPress Core

Opened 4 weeks ago

Last modified 11 hours ago

#61052 new enhancement

WP_KSES data attributes: Allow double dash

Reported by: cbravobernal's profile cbravobernal Owned by:
Milestone: 6.6 Priority: normal
Severity: normal Version: 6.5
Component: Security Keywords: has-patch has-unit-tests needs-testing
Focuses: Cc:

Description

Right now, WordPress does not allow double hyphens -- inside data attributes.

https://github.com/WordPress/wordpress-develop/blob/16237a11586b022861933fa738acd957eef6653e/src/wp-includes/kses.php#L1267

That's a problem for Interactivity API directives like data-wp-on--event or data-wp-bind--value.

Right now, all PHP with those directives parsed with this function, will not be rendered, stopping all kind of interactions.

Because of that, static blocks are not supported yet(as they are saved in DB, and pass through a wp_kses before sending them to the browser).

Some plugins are starting to notice the issue, like WooCommerce:
https://github.com/woocommerce/woocommerce/issues/46722

Can this function be updated to allow them?

Is there any concern or edge case I could miss before creating a PR?

Change History (11)

#1 @cbravobernal
4 weeks ago

  • Summary changed from WP_KSES: Allow double dash to WP_KSES data attributes: Allow double dash

This ticket was mentioned in PR #6429 on WordPress/wordpress-develop by @dmsnell.


4 weeks ago
#2

  • Keywords has-patch has-unit-tests added

Trac ticket: Core-61052

Allow spec-compliant data-attributes in wp_kses_attr_check()

#3 @dmsnell
4 weeks ago

I've proposed a patch which is permissive here. Core code may be used to more restrictions, but all of these are allowed in the browser and appear after requesting the dataset for an element. Is there a good reason to reject such valid data attributes, even if they are somewhat obscure?

If there's one imaginable objection it would be to prevent storing data attributes named things like data-no-<img-"tag"-in-here. This is because such code might trip up weaker parsers down the line. However, this would or should be caught higher up than in wp_kses_attr_check() and so I think it's fine here not having that additional restriction.

#4 @gziolo
4 weeks ago

In this case, it would be easier to start with a very specific expectation that covers only the syntax allowed for directives:

  • attribute name prefixed with data-wp-
  • attribute name contains only a-z, 0-9 characters, single dash -, and double dash --
  • value should contain one of state, context, actions, callbacks
  • value can start with !
  • value can be prefixed with a namespace, not sure what is allowed
  • value can only contain a-Z, 0-9, :, .

#5 @gziolo
4 weeks ago

  • Component changed from General to Security
  • Milestone changed from Awaiting Review to 6.6
  • Version set to 6.5

#6 @oglekler
4 days ago

  • Keywords needs-testing added

#7 @jonsurrell
22 hours ago

I checked into some history about why double dashes (or other characters) are not allowed in data attributes. The change landed in r43981 and was discussed in #33121, especially starting after this comment:

This (two hyphens or end hyphen) is true but it does some strange things to the element.dataset property available in JavaScript

Good point. Lets not allow it :)

The reasoning does not seem to be related to any security issues, but more around the potential for strange behavior when accessed via dataset thanks to its automatic dash-style to camelCase conversion.

Given the immediate need to allow double-dashes, the history, and the fact that more restrictive data attribute handling does not seem to have been an issue, I'd try to move ahead with a minimal PR that just allows leading, trailing, or double-dashes.

Ping @azaozz and @peterwilsoncc as the folks involved in the original data attributes with -- decision.

This ticket was mentioned in PR #6598 on WordPress/wordpress-develop by @jonsurrell.


22 hours ago
#8

Allow data attrbitues with leading, trailing, and double - in the attribute name.

This relaxes the hyphen restriction introduced in https://core.trac.wordpress.org/changeset/43981. It is intended to be a simple change to allow data attributes used frequently by the Interactivity API like data-wp-on--click="…".

It is _not_ a substitute for https://github.com/WordPress/wordpress-develop/pull/6429. I'd like to see that move ahead the data attribute restrictions further relaxed to align with what's allowed by the specification. This _is_ intended to be a smaller and easier change to solve the immediate need that can be landed without too much difficulty.

Trac ticket: https://core.trac.wordpress.org/ticket/61052

@dmsnell commented on PR #6429:


18 hours ago
#9

I dug into the relevant section of the spec, but I think the interesting point here is that data attributes in the browser are handled much more loosely (as already mentioned in the description). The spec seems to explicitly disallow ASCII uppercase and : in data attributes, but Chrome doesn't seem to care at all.

this is going to be another case where it's evident that the HTML5 specification is a family of interrelated specifications for different purposes. there are different rules for what we allow _producing_ than there are for what we demand for _reading_.

for example, attributes aren't allowed to be created with brackets in their name, like [if]="var.hasAccent" and yet every parser _must_ recognize this _as_ an attribute. so it's more or less a blur and everything that's allowed is spec, but then we "shouldn't" do things that are against the rules, but they are benign.

on data attributes what I was most concerned about is matching browser behavior. we should have a direct correspondence between the attributes on the server that appear in a dataset for an Element in the browser. if we find cases that all three browsers reject, let's add those in the test suite and note the omission.

@dmsnell commented on PR #6429:


13 hours ago
#10

I've rearranged the code somewhat, doing what I originally wanted to avoid, in creating a new function. Still, the desire to add a test suite overcame my hesitation.

Now, I've added wp_kses_transform_custom_data_attribute_name, which is a mouthful, but does exactly what I believe Core needs to do. That is, it needs to start by recognizing what _is_ and what _isn't_ a data attribute.

Beyond this, we can talk about arbitrarily rejecting some of them, but we can add that on top of this work.

That is, if we only want to allow a subset of data attributes, we can start inside the if where we've detected one, and then reject it, knowing for sure we didn't miss any.

By the way, I ran a test in Safari, Chrome, and Firefox to confirm the test cases I added. They all operate uniformly. Surprisingly, data- (with nothing after the one -) is a valid data attribute whose name is the empty string. This test confirms the relationship between attribute name and appearing in an element's dataset property.

<details><summary>Test Code</summary>

<?php

$variations = [
        'data-',
        'post-id',
        'data-post-id',
        'data-Post-ID',
        "data-\u{2003}",
        'data-🐄-pasture',
        'data-wp-bind--enabled',
        'data--before',
        'data-after-',
        'data---one---two---',
        'data-[wish:granted]',
];

foreach ( $variations as $name ) {
        echo "<div><code>{$name}</code>: <span {$name}='{$name}'></span></div>\n";
}

?>
<script>
document.querySelectorAll( 'span' ).forEach( node => {
        node.innerText = JSON.stringify( node.dataset );
} );
</script>

</details>

https://github.com/WordPress/wordpress-develop/assets/5431237/2e81c244-2645-4324-8839-81ab0dca314b

#11 @peterwilsoncc
11 hours ago

Thanks for the callout, Jon, I hadn't seen this ticket.

Given a core API is using double hyphens, then I think we need to allow it. I'll follow up with a review on the linked pull request.

Note: See TracTickets for help on using tickets.