WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 10 months ago

#33121 closed defect (bug) (fixed)

wp_kses_attr_check fails to process html data-* attributes

Reported by: isoftware Owned by: peterwilsoncc
Milestone: 5.0 Priority: normal
Severity: major Version: 4.2.3
Component: Editor Keywords: has-patch has-unit-tests commit fixed-5.0
Focuses: Cc:
PR Number:

Description (last modified by SergeyBiryukov)

wp_kses_attr_check does not take into account data attributes and retuns false on valid elements.

You can reproduce this issue with shortcodes in data attributes using the following scenario
<a data-trigger="[button-trigger]" href="#">[button-name]</>

Attachments (7)

33121.diff (1.2 KB) - added by azaozz 12 months ago.
33121.2.diff (1.2 KB) - added by azaozz 12 months ago.
33121.3.diff (5.3 KB) - added by peterwilsoncc 12 months ago.
33121.4.diff (6.3 KB) - added by peterwilsoncc 12 months ago.
33121.5.diff (4.7 KB) - added by azaozz 12 months ago.
33121.6.diff (5.6 KB) - added by peterwilsoncc 12 months ago.
33121.7.diff (7.5 KB) - added by peterwilsoncc 12 months ago.

Download all attachments as: .zip

Change History (34)

#1 @SergeyBiryukov
4 years ago

  • Component changed from General to Formatting
  • Description modified (diff)
  • Focuses template removed

#2 @ocean90
4 years ago

  • Keywords close added

It depends on the unfiltered_html capability which only Administrators and Editors have by default. The data- attributes are not whitelisted.

#3 @isoftware
4 years ago

Thank you for your swift repsonse.

I have the following shortcode on a site that has only one admin user so all content is created from admin:

...
<a data-slide-index="[counter-increment output='true']" href="#"></a>
...

On debug I can see that the counter-increment shortcode is returning the correct integer value but after the wp_kses_attr_check returns false, wp_kses_one_attr returns empty string and the returned shortcode content is :

...
<a href="#"></a>
...

#4 @isoftware
4 years ago

  • Keywords close removed

In addition I could not find a way to generically whitelist the data-attributes. One needs to hook into the wp_kses_allowed_html filter to register explicitly data attributes for specific elements which seems quite complicated for the shortcodes example.

#5 follow-up: @miqrogroove
4 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to invalid
  • Status changed from new to closed

#6 in reply to: ↑ 5 @isoftware
4 years ago

Replying to miqrogroove:

Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/

Please note that in the link supplied there is no mention of html data attributes beeing excluded and there should be a mention somewhere in the documentation that they are considered unsafe.

#7 @sswells
4 years ago

  • Resolution invalid deleted
  • Severity changed from normal to major
  • Status changed from closed to reopened

Can we please reopen this? This looks like a bug to me, and not explained by the API changes. For example, if I use something like this:
&lt;a href="[the-link]" data-trigger="[button-trigger]">[button-name]&lt;/a>

Then the shortcode inside the data attribute is not longer processed, but the others are. The shortcodes should be processed inside a data attribute, just as they are inside an href.

#8 @miqrogroove
4 years ago

  • Component changed from Formatting to Shortcodes
  • Keywords close added

#9 @SergeyBiryukov
4 years ago

  • Milestone set to Awaiting Review

#10 @miqrogroove
4 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

At the 9 September meeting, participants agreed to close this ticket because fixing it would require a reversion of a previous security patch and would involve generally unwanted changes to the Shortcode API.

For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/

#11 @pento
13 months ago

  • Keywords needs-patch needs-unit-tests added; close removed
  • Milestone set to 5.0
  • Resolution wontfix deleted
  • Status changed from closed to reopened

Reopening, as data-* attributes will need to be allowed by KSES for several default Gutenberg blocks.

@azaozz
12 months ago

#12 @azaozz
12 months ago

  • Component changed from Shortcodes to Editor
  • Keywords has-patch added; needs-patch removed

In 33121.diff:

Wondering if we should pass the restrictions (array of possible values) or just short circuit as soon as we match a data-* attribute. The thing is that the restrictions will apply to all data attributes in the element, not to a specific attribute.

Last edited 12 months ago by azaozz (previous) (diff)

@azaozz
12 months ago

#13 @azaozz
12 months ago

In 33121.2.diff: also, do not override a specifically set data-something attribute.

Last edited 12 months ago by azaozz (previous) (diff)

#14 @peterwilsoncc
12 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

In 33121.3.diff I've taken a quite different approach:

  • Removed hard coding of data- prefix, any wildcard can be added in _wp_add_global_attributes().
  • Wildcard checks are only run if the full attribute doesn't exist in $allowed_attr
  • Added in a couple of unit tests
  • Regardless of approach, test_wp_kses_attr_data_attribute_is_allowed() should be a valid test.

#15 @azaozz
12 months ago

Looking at 33121.3.diff, don't think it is a good idea to allow adding attributes with a wildcard. That would mean --somebody-- can add on-* or even o-* and allow all onerror, onclick, onmouseover, etc. attributes.

What's worse there is no way to sanitize the values of these wildcard attributes (this part in the patch $allowed_attr[$name_low] = true;), so things like onerror="alert(document.cookie)" become very possible and not immediately recognizable.

KSES already has support for allowing any attribute on any tag: adding it by full name. So if somebody wants to add onerror for images, they can easily do that and specify a list of acceptable values. Why would we need to bypass that with a wildcard and reduce functionality (not possible to check/sanitize attribute values)?

The only exception to the above are attributes with variable names, and the only HTML 5.0 attribute with a variable name is data-*. Thinking we should extend KSES to allow data attributes, but hardcode the data- part. This is inline with the existing logic in KSES.

Uh, sorry for the longer comment :) The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.

Version 1, edited 12 months ago by azaozz (previous) (next) (diff)

#16 @azaozz
12 months ago

@peterwilsoncc thanks for adding the test :)

Looking at data--invaild="gone" and data-also-invaild-="gone", it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.

Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low ) would mean we don't allow attribute names containing two hyphens like data-wp-id (which is somewhat common).

Last edited 12 months ago by azaozz (previous) (diff)

#17 follow-up: @peterwilsoncc
12 months ago

tl;dr in 33121.4.diff:

  • Test added to ensure prefix is followed by a hyphen
  • Test added to ensure attributes with multiple hyphens allowed
  • Regex altered to require one or more instance of (-[a-z0-9_]+), ie '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)+$/'

Replying to azaozz:

@peterwilsoncc thanks for adding the test :)

And thanks for the review.

Looking at data--invaild="gone" and data-also-invaild-="gone", it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.

This is true but it does some strange things to the element.dataset property available in JavaScript so I decided to prevent it. I've created a bin with an example https://jsbin.com/muloxeq/edit?html,js,console,output

I'm happy to change this if needs be.

The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.

I'm not sure this is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-* and bypass checking. The regex change I mention below hardens against this.

I'm also prepared to be misunderstanding something, so are you able to let me know if that's the case.

That would mean --somebody-- can add on-* or even o-* and allow all onerror, onclick, onmouseover, etc. attributes.

This isn't the case as the hyphen is required before any characters in the regex group (-[a-z0-9_]+).

As it's important to block, I've added a test in 33121.4.diff to ensure against it.

Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low ) would mean we don't allow attribute names containing two hyphens like data-wp-id (which is somewhat common).

This is also incorrect, I've added such an example as a test in 33121.4.diff.

However, the zero or more regex (*) did allow users to add data="something", so I've changed that in the latest patch to be one or more (+).

#18 in reply to: ↑ 17 @azaozz
12 months ago

Replying to peterwilsoncc:

This (two hyphens or end hyphen) is true but it does some strange things to the element.dataset property available in JavaScript

Good point. Lets not allow it :)

I'm not sure this (overriding of unset attributes) is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-* and bypass checking.

Ah, I see, my error. Now the regex matches only attribute names that have a hyphen.

I'm still not sold on supporting all of these attribute names "globally" with no possibility to check the values. Except data-*, none of them have variable names and imho should be added the same way like the rest of the attributes are added: by full name.

In any case, what is the purpose of allowing all attributes with a hyphen?

  • It would support data-* which we should have.
  • It would also support aria-* which would be nice to have. There are about 45 aria attributes and many of them have a list of possible values, like:
        aria-busy ( true | false ) 'false'
        aria-checked ( true | false | mixed | undefined ) 'undefined'
        aria-disabled ( true | false ) 'false'
        aria-expanded ( true | false | undefined ) 'undefined'
        aria-grabbed ( true | false | undefined ) 'undefined'
        aria-hidden ( true | false ) 'false'
        aria-invalid ( grammar | false | spelling | true ) 'false'
        aria-pressed ( true | false | mixed | undefined ) 'undefined'
        aria-selected ( true | false | undefined ) 'undefined'
    

(see http://www.w3.org/MarkUp/DTD/aria-attributes-1.mod)

For a security/sanitizing function I'd expect all of these to be hard-coded, both the attribute name and possible values, after a security audit of whether any of the values may be misused. Again, this is inline with how KSES works.

Apart from data-* and aria-* there are accept-charset and http-equiv (see https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes). Don't think it makes sense supporting these.

Am I missing something? Are there other user cases for allowing attributes with a hyphen and not support checking of their values?

Last edited 12 months ago by azaozz (previous) (diff)

@azaozz
12 months ago

#19 @azaozz
12 months ago

In 33121.5.diff:

  • Based on 33121.2.diff.
  • Fix the regex so it doesn't allow trailing hyphen or two consecutive hyphens.
  • Add the tests.

All tests from 33121.4.diff pass except the array( '33121-*', true ), changed that to array( 'data-*', true ), :)

#20 @azaozz
12 months ago

Thinking more about this, a "middle point" between the two approaches would be to tweak _wp_add_global_attributes() so it can return just the array of $global_attributes. Then check the prefix against it. That way only hard-coded wildcard attributes will be allowed. (That would still require sanity checking/security auditing of all aria-* attributes if we decide to add them.)

#21 @peterwilsoncc
12 months ago

I like the mixed approach in 33121.5.diff.

The main thing I was trying to avoid was doubling up on adding the data-* prefix in wp_kses_attr_check() and _wp_add_global_attributes().

It occurs to me that in your patches the prefix doesn't need to be added to _wp_add_global_attributes(). If core needs multiple allowed prefixes in the future we can create a new private function, until then we can use your regex.

#22 @peterwilsoncc
12 months ago

33121.6.diff is based on the previous patch with:

  • multiline comment format changed
  • @since added to docblocks
  • Modified test_wildcard_attribute_prefixes to try variations of data- prefix to validate

Following my last comment I could see why you included data-* in two locations. Let's go with the compromise version.

#23 @pento
12 months ago

  • Keywords commit added

33121.6.diff looks good, let's party!

#24 @peterwilsoncc
12 months ago

I call 33121.7.diff a valuable reminder I should have run the full test suite three days ago ;)

  • alters some REST API tests ensuring API requests are run through kses for users without unfiltered_html caps.

#25 @peterwilsoncc
12 months ago

  • Owner set to peterwilsoncc
  • Resolution set to fixed
  • Status changed from reopened to closed

In 43727:

KSES: Allow HTML data-* attributes.

Add global support for HTML attributes prefixed data- for authors and contributors, as required by the new editor.

Props azaozz, peterwilsoncc.
Fixes #33121.

#26 @peterwilsoncc
12 months ago

  • Keywords fixed-5.0 added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopened for trunk.

#27 @jeremyfelt
10 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 43981:

KSES: Allow HTML data-* attributes.

Add global support for HTML attributes prefixed data- for authors and contributors, as required by the new editor.

Merges [43727] to trunk.

Props azaozz, peterwilsoncc.
Fixes #33121.

Note: See TracTickets for help on using tickets.