#33121 closed defect (bug) (fixed)
wp_kses_attr_check fails to process html data-* attributes
Reported by: | isoftware | Owned by: | peterwilsoncc |
---|---|---|---|
Milestone: | 5.0 | Priority: | normal |
Severity: | major | Version: | 4.2.3 |
Component: | Editor | Keywords: | has-patch has-unit-tests commit fixed-5.0 |
Focuses: | Cc: |
Description (last modified by )
wp_kses_attr_check does not take into account data attributes and retuns false on valid elements.
You can reproduce this issue with shortcodes in data attributes using the following scenario
<a data-trigger="[button-trigger]" href="#">[button-name]</>
Attachments (7)
Change History (35)
#1
@
9 years ago
- Component changed from General to Formatting
- Description modified (diff)
- Focuses template removed
#3
@
9 years ago
Thank you for your swift repsonse.
I have the following shortcode on a site that has only one admin user so all content is created from admin:
... <a data-slide-index="[counter-increment output='true']" href="#"></a> ...
On debug I can see that the counter-increment shortcode is returning the correct integer value but after the wp_kses_attr_check returns false, wp_kses_one_attr returns empty string and the returned shortcode content is :
... <a href="#"></a> ...
#4
@
9 years ago
- Keywords close removed
In addition I could not find a way to generically whitelist the data-attributes. One needs to hook into the wp_kses_allowed_html filter to register explicitly data attributes for specific elements which seems quite complicated for the shortcodes example.
#5
follow-up:
↓ 6
@
9 years ago
- Milestone Awaiting Review deleted
- Resolution set to invalid
- Status changed from new to closed
Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/
#6
in reply to:
↑ 5
@
9 years ago
Replying to miqrogroove:
Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/
Please note that in the link supplied there is no mention of html data attributes beeing excluded and there should be a mention somewhere in the documentation that they are considered unsafe.
#7
@
9 years ago
- Resolution invalid deleted
- Severity changed from normal to major
- Status changed from closed to reopened
Can we please reopen this? This looks like a bug to me, and not explained by the API changes. For example, if I use something like this:
<a href="[the-link]" data-trigger="[button-trigger]">[button-name]</a>
Then the shortcode inside the data attribute is not longer processed, but the others are. The shortcodes should be processed inside a data attribute, just as they are inside an href.
#10
@
9 years ago
- Milestone Awaiting Review deleted
- Resolution set to wontfix
- Status changed from reopened to closed
At the 9 September meeting, participants agreed to close this ticket because fixing it would require a reversion of a previous security patch and would involve generally unwanted changes to the Shortcode API.
For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/
#11
@
6 years ago
- Keywords needs-patch needs-unit-tests added; close removed
- Milestone set to 5.0
- Resolution wontfix deleted
- Status changed from closed to reopened
Reopening, as data-*
attributes will need to be allowed by KSES for several default Gutenberg blocks.
#12
@
6 years ago
- Component changed from Shortcodes to Editor
- Keywords has-patch added; needs-patch removed
In 33121.diff:
- Allow
data-*
attributes. Needs to be set asdata-*
when adding it, not to be mixed with the HTML 4.0data
attribute (https://www.w3.org/TR/html40/struct/objects.html#adef-data). - Add
data-*
to all elements by default.
Wondering if we should pass the restrictions (array of possible values) or just short circuit as soon as we match a data-*
attribute. The thing is that the restrictions will apply to all data attributes in the element, not to a specific attribute.
#13
@
6 years ago
In 33121.2.diff: also, do not override a specifically set data-something
attribute.
#14
@
6 years ago
- Keywords has-unit-tests added; needs-unit-tests removed
In 33121.3.diff I've taken a quite different approach:
- Removed hard coding of
data-
prefix, any wildcard can be added in_wp_add_global_attributes()
. - Wildcard checks are only run if the full attribute doesn't exist in
$allowed_attr
- Added in a couple of unit tests
- Regardless of approach,
test_wp_kses_attr_data_attribute_is_allowed()
should be a valid test.
#15
@
6 years ago
Looking at 33121.3.diff, don't think it is a good idea to allow adding attributes with a wildcard. (Note that _wp_add_global_attributes()
is just a convenience function to add some attribute names to all tags. There is no way to check how an attribute name was added. So plugins will also be able to add the data-*
attribute name which is the expected behavior.)
That would mean --somebody-- can add on-*
or even o-*
and allow all onerror
, onclick
, onmouseover
, etc. attributes.
What's worse there is no way to sanitize the values of these wildcard attributes (this part in the patch $allowed_attr[$name_low] = true;
), so things like onerror="alert(document.cookie)"
become very possible and not immediately recognizable.
KSES already has support for allowing any attribute on any tag: adding it by full name. So if somebody wants to add onerror
for images, they can easily do that and specify a list of acceptable values. Why would we need to bypass that with a wildcard and reduce functionality (not possible to check/sanitize attribute values)?
The only exception to the above are attributes with variable names, and the only HTML 5.0 attribute with a variable name is data-*
. Thinking we should extend KSES to allow data attributes, but hardcode the data-
part. This is inline with the existing logic in KSES.
Uh, sorry for the longer comment :) The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.
#16
@
6 years ago
@peterwilsoncc thanks for adding the test :)
Looking at data--invaild="gone"
and data-also-invaild-="gone"
, it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-
.
Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low )
would mean we don't allow attribute manes containing two hyphens like data-wp-id
(which is somewhat common).
#17
follow-up:
↓ 18
@
6 years ago
tl;dr in 33121.4.diff:
- Test added to ensure prefix is followed by a hyphen
- Test added to ensure attributes with multiple hyphens allowed
- Regex altered to require one or more instance of
(-[a-z0-9_]+)
, ie'/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)+$/'
Replying to azaozz:
@peterwilsoncc thanks for adding the test :)
And thanks for the review.
Looking at
data--invaild="gone"
anddata-also-invaild-="gone"
, it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.
This is true but it does some strange things to the element.dataset
property available in JavaScript so I decided to prevent it. I've created a bin with an example https://jsbin.com/muloxeq/edit?html,js,console,output
I'm happy to change this if needs be.
The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.
I'm not sure this is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-*
and bypass checking. The regex change I mention below hardens against this.
I'm also prepared to be misunderstanding something, so are you able to let me know if that's the case.
That would mean --somebody-- can add
on-*
or eveno-*
and allow allonerror
,onclick
,onmouseover
, etc. attributes.
This isn't the case as the hyphen is required before any characters in the regex group (-[a-z0-9_]+)
.
As it's important to block, I've added a test in 33121.4.diff to ensure against it.
Also
preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low )
would mean we don't allow attribute names containing two hyphens likedata-wp-id
(which is somewhat common).
This is also incorrect, I've added such an example as a test in 33121.4.diff.
However, the zero or more regex (*
) did allow users to add data="something"
, so I've changed that in the latest patch to be one or more (+
).
#18
in reply to:
↑ 17
@
6 years ago
Replying to peterwilsoncc:
This (two hyphens or end hyphen) is true but it does some strange things to the
element.dataset
property available in JavaScript
Good point. Lets not allow it :)
I'm not sure this (overriding of unset attributes) is the case if we require the hyphen following any prefixes, so a developer won't be able to add
href-*
and bypass checking.
Ah, I see, my error. Now the regex matches only attribute names that have a hyphen.
I'm still not sold on supporting all of these attribute names "globally" with no possibility to check the values. Except data-*
, none of them have variable names and imho should be added the same way like the rest of the attributes are added: by full name.
In any case, what is the purpose of allowing all attributes with a hyphen?
- It would support
data-*
which we should have. - It would also support
aria-*
which would be nice to have. There are about 45 aria attributes and many of them have a list of possible values, like:aria-busy ( true | false ) 'false' aria-checked ( true | false | mixed | undefined ) 'undefined' aria-disabled ( true | false ) 'false' aria-expanded ( true | false | undefined ) 'undefined' aria-grabbed ( true | false | undefined ) 'undefined' aria-hidden ( true | false ) 'false' aria-invalid ( grammar | false | spelling | true ) 'false' aria-pressed ( true | false | mixed | undefined ) 'undefined' aria-selected ( true | false | undefined ) 'undefined'
(see http://www.w3.org/MarkUp/DTD/aria-attributes-1.mod)
For a security/sanitizing function I'd expect all of these to be hard-coded, both the attribute name and possible values, after a security audit of whether any of the values may be misused. Again, this is inline with how KSES works.
Apart from data-*
and aria-*
there are accept-charset
and http-equiv
(see https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes). Don't think it makes sense supporting these.
Am I missing something? Are there other user cases for allowing attributes with a hyphen and not support checking of their values?
#19
@
6 years ago
In 33121.5.diff:
- Based on 33121.2.diff.
- Fix the regex so it doesn't allow trailing hyphen or two consecutive hyphens.
- Add the tests.
All tests from 33121.4.diff pass except the array( '33121-*', true ),
changed that to array( 'data-*', true ),
:)
#20
@
6 years ago
Thinking more about this, a "middle point" between the two approaches would be to tweak _wp_add_global_attributes()
so it can return just the array of $global_attributes
. Then check the prefix against it. That way only hard-coded wildcard attributes will be allowed. (That would still require sanity checking/security auditing of all aria-*
attributes if we decide to add them.)
#21
@
6 years ago
I like the mixed approach in 33121.5.diff.
The main thing I was trying to avoid was doubling up on adding the data-*
prefix in wp_kses_attr_check()
and _wp_add_global_attributes()
.
It occurs to me that in your patches the prefix doesn't need to be added to _wp_add_global_attributes()
. If core needs multiple allowed prefixes in the future we can create a new private function, until then we can use your regex.
#22
@
6 years ago
33121.6.diff is based on the previous patch with:
- multiline comment format changed
@since
added to docblocks- Modified
test_wildcard_attribute_prefixes
to try variations ofdata-
prefix to validate
Following my last comment I could see why you included data-*
in two locations. Let's go with the compromise version.
#24
@
6 years ago
I call 33121.7.diff a valuable reminder I should have run the full test suite three days ago ;)
- alters some REST API tests ensuring API requests are run through kses for users without
unfiltered_html
caps.
#25
@
6 years ago
- Owner set to peterwilsoncc
- Resolution set to fixed
- Status changed from reopened to closed
In 43727:
#26
@
6 years ago
- Keywords fixed-5.0 added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopened for trunk.
#27
follow-up:
↓ 28
@
6 years ago
- Resolution set to fixed
- Status changed from reopened to closed
In 43981:
#28
in reply to:
↑ 27
@
4 years ago
Replying to jeremyfelt:
In 43981:
I'm trying to modify HTML markup inside WooCommerce Gutenberg block using this filter:
woocommerce_blocks_product_grid_item_html
wp_kses and wp_kses_post functions stripping out 'add to cart' button's data attributes with underscore sign:
data-product_id data-product_sku
I'm wondering can I prevent stripping these attributes by wp_kses function?
It depends on the
unfiltered_html
capability which only Administrators and Editors have by default. The data- attributes are not whitelisted.