Make WordPress Core

Opened 9 years ago

Closed 5 years ago

Last modified 4 years ago

#33121 closed defect (bug) (fixed)

wp_kses_attr_check fails to process html data-* attributes

Reported by: isoftware's profile isoftware Owned by: peterwilsoncc's profile peterwilsoncc
Milestone: 5.0 Priority: normal
Severity: major Version: 4.2.3
Component: Editor Keywords: has-patch has-unit-tests commit fixed-5.0
Focuses: Cc:

Description (last modified by SergeyBiryukov)

wp_kses_attr_check does not take into account data attributes and retuns false on valid elements.

You can reproduce this issue with shortcodes in data attributes using the following scenario
<a data-trigger="[button-trigger]" href="#">[button-name]</>

Attachments (7)

33121.diff (1.2 KB) - added by azaozz 5 years ago.
33121.2.diff (1.2 KB) - added by azaozz 5 years ago.
33121.3.diff (5.3 KB) - added by peterwilsoncc 5 years ago.
33121.4.diff (6.3 KB) - added by peterwilsoncc 5 years ago.
33121.5.diff (4.7 KB) - added by azaozz 5 years ago.
33121.6.diff (5.6 KB) - added by peterwilsoncc 5 years ago.
33121.7.diff (7.5 KB) - added by peterwilsoncc 5 years ago.

Download all attachments as: .zip

Change History (35)

#1 @SergeyBiryukov
9 years ago

  • Component changed from General to Formatting
  • Description modified (diff)
  • Focuses template removed

#2 @ocean90
9 years ago

  • Keywords close added

It depends on the unfiltered_html capability which only Administrators and Editors have by default. The data- attributes are not whitelisted.

#3 @isoftware
9 years ago

Thank you for your swift repsonse.

I have the following shortcode on a site that has only one admin user so all content is created from admin:

...
<a data-slide-index="[counter-increment output='true']" href="#"></a>
...

On debug I can see that the counter-increment shortcode is returning the correct integer value but after the wp_kses_attr_check returns false, wp_kses_one_attr returns empty string and the returned shortcode content is :

...
<a href="#"></a>
...

#4 @isoftware
9 years ago

  • Keywords close removed

In addition I could not find a way to generically whitelist the data-attributes. One needs to hook into the wp_kses_allowed_html filter to register explicitly data attributes for specific elements which seems quite complicated for the shortcodes example.

#5 follow-up: @miqrogroove
9 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

#6 in reply to: ↑ 5 @isoftware
9 years ago

Replying to miqrogroove:

Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/

Please note that in the link supplied there is no mention of html data attributes beeing excluded and there should be a mention somewhere in the documentation that they are considered unsafe.

#7 @sswells
9 years ago

  • Resolution invalid deleted
  • Severity changed from normal to major
  • Status changed from closed to reopened

Can we please reopen this? This looks like a bug to me, and not explained by the API changes. For example, if I use something like this:
&lt;a href="[the-link]" data-trigger="[button-trigger]">[button-name]&lt;/a>

Then the shortcode inside the data attribute is not longer processed, but the others are. The shortcodes should be processed inside a data attribute, just as they are inside an href.

#8 @miqrogroove
9 years ago

  • Component changed from Formatting to Shortcodes
  • Keywords close added

#9 @SergeyBiryukov
9 years ago

  • Milestone set to Awaiting Review

#10 @miqrogroove
8 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

At the 9 September meeting, participants agreed to close this ticket because fixing it would require a reversion of a previous security patch and would involve generally unwanted changes to the Shortcode API.

For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/

#11 @pento
6 years ago

  • Keywords needs-patch needs-unit-tests added; close removed
  • Milestone set to 5.0
  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening, as data-* attributes will need to be allowed by KSES for several default Gutenberg blocks.

@azaozz
5 years ago

#12 @azaozz
5 years ago

  • Component changed from Shortcodes to Editor
  • Keywords has-patch added; needs-patch removed

In 33121.diff:

Wondering if we should pass the restrictions (array of possible values) or just short circuit as soon as we match a data-* attribute. The thing is that the restrictions will apply to all data attributes in the element, not to a specific attribute.

Last edited 5 years ago by azaozz (previous) (diff)

@azaozz
5 years ago

#13 @azaozz
5 years ago

In 33121.2.diff: also, do not override a specifically set data-something attribute.

Last edited 5 years ago by azaozz (previous) (diff)

#14 @peterwilsoncc
5 years ago

  • Keywords has-unit-tests added; needs-unit-tests removed

In 33121.3.diff I've taken a quite different approach:

  • Removed hard coding of data- prefix, any wildcard can be added in _wp_add_global_attributes().
  • Wildcard checks are only run if the full attribute doesn't exist in $allowed_attr
  • Added in a couple of unit tests
  • Regardless of approach, test_wp_kses_attr_data_attribute_is_allowed() should be a valid test.

#15 @azaozz
5 years ago

Looking at 33121.3.diff, don't think it is a good idea to allow adding attributes with a wildcard. (Note that _wp_add_global_attributes() is just a convenience function to add some attribute names to all tags. There is no way to check how an attribute name was added. So plugins will also be able to add the data-* attribute name which is the expected behavior.)

That would mean --somebody-- can add on-* or even o-* and allow all onerror, onclick, onmouseover, etc. attributes.

What's worse there is no way to sanitize the values of these wildcard attributes (this part in the patch $allowed_attr[$name_low] = true;), so things like onerror="alert(document.cookie)" become very possible and not immediately recognizable.

KSES already has support for allowing any attribute on any tag: adding it by full name. So if somebody wants to add onerror for images, they can easily do that and specify a list of acceptable values. Why would we need to bypass that with a wildcard and reduce functionality (not possible to check/sanitize attribute values)?

The only exception to the above are attributes with variable names, and the only HTML 5.0 attribute with a variable name is data-*. Thinking we should extend KSES to allow data attributes, but hardcode the data- part. This is inline with the existing logic in KSES.

Uh, sorry for the longer comment :) The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.

Last edited 5 years ago by azaozz (previous) (diff)

#16 @azaozz
5 years ago

@peterwilsoncc thanks for adding the test :)

Looking at data--invaild="gone" and data-also-invaild-="gone", it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.

Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low ) would mean we don't allow attribute names containing two hyphens like data-wp-id (which is somewhat common).

Last edited 5 years ago by azaozz (previous) (diff)

#17 follow-up: @peterwilsoncc
5 years ago

tl;dr in 33121.4.diff:

  • Test added to ensure prefix is followed by a hyphen
  • Test added to ensure attributes with multiple hyphens allowed
  • Regex altered to require one or more instance of (-[a-z0-9_]+), ie '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)+$/'

Replying to azaozz:

@peterwilsoncc thanks for adding the test :)

And thanks for the review.

Looking at data--invaild="gone" and data-also-invaild-="gone", it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.

This is true but it does some strange things to the element.dataset property available in JavaScript so I decided to prevent it. I've created a bin with an example https://jsbin.com/muloxeq/edit?html,js,console,output

I'm happy to change this if needs be.

The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.

I'm not sure this is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-* and bypass checking. The regex change I mention below hardens against this.

I'm also prepared to be misunderstanding something, so are you able to let me know if that's the case.

That would mean --somebody-- can add on-* or even o-* and allow all onerror, onclick, onmouseover, etc. attributes.

This isn't the case as the hyphen is required before any characters in the regex group (-[a-z0-9_]+).

As it's important to block, I've added a test in 33121.4.diff to ensure against it.

Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low ) would mean we don't allow attribute names containing two hyphens like data-wp-id (which is somewhat common).

This is also incorrect, I've added such an example as a test in 33121.4.diff.

However, the zero or more regex (*) did allow users to add data="something", so I've changed that in the latest patch to be one or more (+).

#18 in reply to: ↑ 17 @azaozz
5 years ago

Replying to peterwilsoncc:

This (two hyphens or end hyphen) is true but it does some strange things to the element.dataset property available in JavaScript

Good point. Lets not allow it :)

I'm not sure this (overriding of unset attributes) is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-* and bypass checking.

Ah, I see, my error. Now the regex matches only attribute names that have a hyphen.

I'm still not sold on supporting all of these attribute names "globally" with no possibility to check the values. Except data-*, none of them have variable names and imho should be added the same way like the rest of the attributes are added: by full name.

In any case, what is the purpose of allowing all attributes with a hyphen?

  • It would support data-* which we should have.
  • It would also support aria-* which would be nice to have. There are about 45 aria attributes and many of them have a list of possible values, like:
        aria-busy ( true | false ) 'false'
        aria-checked ( true | false | mixed | undefined ) 'undefined'
        aria-disabled ( true | false ) 'false'
        aria-expanded ( true | false | undefined ) 'undefined'
        aria-grabbed ( true | false | undefined ) 'undefined'
        aria-hidden ( true | false ) 'false'
        aria-invalid ( grammar | false | spelling | true ) 'false'
        aria-pressed ( true | false | mixed | undefined ) 'undefined'
        aria-selected ( true | false | undefined ) 'undefined'
    

(see http://www.w3.org/MarkUp/DTD/aria-attributes-1.mod)

For a security/sanitizing function I'd expect all of these to be hard-coded, both the attribute name and possible values, after a security audit of whether any of the values may be misused. Again, this is inline with how KSES works.

Apart from data-* and aria-* there are accept-charset and http-equiv (see https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes). Don't think it makes sense supporting these.

Am I missing something? Are there other user cases for allowing attributes with a hyphen and not support checking of their values?

Last edited 5 years ago by azaozz (previous) (diff)

@azaozz
5 years ago

#19 @azaozz
5 years ago

In 33121.5.diff:

  • Based on 33121.2.diff.
  • Fix the regex so it doesn't allow trailing hyphen or two consecutive hyphens.
  • Add the tests.

All tests from 33121.4.diff pass except the array( '33121-*', true ), changed that to array( 'data-*', true ), :)

#20 @azaozz
5 years ago

Thinking more about this, a "middle point" between the two approaches would be to tweak _wp_add_global_attributes() so it can return just the array of $global_attributes. Then check the prefix against it. That way only hard-coded wildcard attributes will be allowed. (That would still require sanity checking/security auditing of all aria-* attributes if we decide to add them.)

#21 @peterwilsoncc
5 years ago

I like the mixed approach in 33121.5.diff.

The main thing I was trying to avoid was doubling up on adding the data-* prefix in wp_kses_attr_check() and _wp_add_global_attributes().

It occurs to me that in your patches the prefix doesn't need to be added to _wp_add_global_attributes(). If core needs multiple allowed prefixes in the future we can create a new private function, until then we can use your regex.

#22 @peterwilsoncc
5 years ago

33121.6.diff is based on the previous patch with:

  • multiline comment format changed
  • @since added to docblocks
  • Modified test_wildcard_attribute_prefixes to try variations of data- prefix to validate

Following my last comment I could see why you included data-* in two locations. Let's go with the compromise version.

#23 @pento
5 years ago

  • Keywords commit added

33121.6.diff looks good, let's party!

#24 @peterwilsoncc
5 years ago

I call 33121.7.diff a valuable reminder I should have run the full test suite three days ago ;)

  • alters some REST API tests ensuring API requests are run through kses for users without unfiltered_html caps.

#25 @peterwilsoncc
5 years ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from reopened to closed

In 43727:

KSES: Allow HTML data-* attributes.

Add global support for HTML attributes prefixed data- for authors and contributors, as required by the new editor.

Props azaozz, peterwilsoncc.
Fixes #33121.

#26 @peterwilsoncc
5 years ago

  • Keywords fixed-5.0 added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopened for trunk.

#27 follow-up: @jeremyfelt
5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 43981:

KSES: Allow HTML data-* attributes.

Add global support for HTML attributes prefixed data- for authors and contributors, as required by the new editor.

Merges [43727] to trunk.

Props azaozz, peterwilsoncc.
Fixes #33121.

#28 in reply to: ↑ 27 @yura1980
4 years ago

Replying to jeremyfelt:

In 43981:

KSES: Allow HTML data-* attributes.

Add global support for HTML attributes prefixed data- for authors and contributors, as required by the new editor.

Merges [43727] to trunk.

Props azaozz, peterwilsoncc.
Fixes #33121.

I'm trying to modify HTML markup inside WooCommerce Gutenberg block using this filter:

woocommerce_blocks_product_grid_item_html

wp_kses and wp_kses_post functions stripping out 'add to cart' button's data attributes with underscore sign:

data-product_id
data-product_sku

I'm wondering can I prevent stripping these attributes by wp_kses function?

Last edited 4 years ago by yura1980 (previous) (diff)
Note: See TracTickets for help on using tickets.