#33121 closed defect (bug) (fixed)
wp_kses_attr_check fails to process html data-* attributes
Reported by: | isoftware | Owned by: | peterwilsoncc |
---|---|---|---|
Milestone: | 5.0 | Priority: | normal |
Severity: | major | Version: | 4.2.3 |
Component: | Editor | Keywords: | has-patch has-unit-tests commit fixed-5.0 |
Focuses: | Cc: |
Description (last modified by )
wp_kses_attr_check does not take into account data attributes and retuns false on valid elements.
You can reproduce this issue with shortcodes in data attributes using the following scenario
<a data-trigger="[button-trigger]" href="#">[button-name]</>
Attachments (7)
Change History (35)
#1
@
9 years ago
- Component changed from General to Formatting
- Description modified (diff)
- Focuses template removed
#3
@
9 years ago
Thank you for your swift repsonse.
I have the following shortcode on a site that has only one admin user so all content is created from admin:
... <a data-slide-index="[counter-increment output='true']" href="#"></a> ...
On debug I can see that the counter-increment shortcode is returning the correct integer value but after the wp_kses_attr_check returns false, wp_kses_one_attr returns empty string and the returned shortcode content is :
... <a href="#"></a> ...
#4
@
9 years ago
- Keywords close removed
In addition I could not find a way to generically whitelist the data-attributes. One needs to hook into the wp_kses_allowed_html filter to register explicitly data attributes for specific elements which seems quite complicated for the shortcodes example.
#5
follow-up:
↓ 6
@
9 years ago
- Milestone Awaiting Review deleted
- Resolution set to invalid
- Status changed from new to closed
Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/
#6
in reply to:
↑ 5
@
9 years ago
Replying to miqrogroove:
Please see the explanation at https://make.wordpress.org/core/2015/07/23/changes-to-the-shortcode-api/
Please note that in the link supplied there is no mention of html data attributes beeing excluded and there should be a mention somewhere in the documentation that they are considered unsafe.
#7
@
9 years ago
- Resolution invalid deleted
- Severity changed from normal to major
- Status changed from closed to reopened
Can we please reopen this? This looks like a bug to me, and not explained by the API changes. For example, if I use something like this:
<a href="[the-link]" data-trigger="[button-trigger]">[button-name]</a>
Then the shortcode inside the data attribute is not longer processed, but the others are. The shortcodes should be processed inside a data attribute, just as they are inside an href.
#10
@
9 years ago
- Milestone Awaiting Review deleted
- Resolution set to wontfix
- Status changed from reopened to closed
At the 9 September meeting, participants agreed to close this ticket because fixing it would require a reversion of a previous security patch and would involve generally unwanted changes to the Shortcode API.
For details, please see https://make.wordpress.org/core/2015/09/29/shortcode-roadmap-draft-two/
#11
@
6 years ago
- Keywords needs-patch needs-unit-tests added; close removed
- Milestone set to 5.0
- Resolution wontfix deleted
- Status changed from closed to reopened
Reopening, as data-*
attributes will need to be allowed by KSES for several default Gutenberg blocks.
#12
@
6 years ago
- Component changed from Shortcodes to Editor
- Keywords has-patch added; needs-patch removed
In 33121.diff:
- Allow
data-*
attributes. Needs to be set asdata-*
when adding it, not to be mixed with the HTML 4.0data
attribute (https://www.w3.org/TR/html40/struct/objects.html#adef-data). - Add
data-*
to all elements by default.
Wondering if we should pass the restrictions (array of possible values) or just short circuit as soon as we match a data-*
attribute. The thing is that the restrictions will apply to all data attributes in the element, not to a specific attribute.
#13
@
6 years ago
In 33121.2.diff: also, do not override a specifically set data-something
attribute.
#14
@
6 years ago
- Keywords has-unit-tests added; needs-unit-tests removed
In 33121.3.diff I've taken a quite different approach:
- Removed hard coding of
data-
prefix, any wildcard can be added in_wp_add_global_attributes()
. - Wildcard checks are only run if the full attribute doesn't exist in
$allowed_attr
- Added in a couple of unit tests
- Regardless of approach,
test_wp_kses_attr_data_attribute_is_allowed()
should be a valid test.
#15
@
6 years ago
Looking at 33121.3.diff, don't think it is a good idea to allow adding attributes with a wildcard. That would mean --somebody-- can add on-*
or even o-*
and allow all onerror
, onclick
, onmouseover
, etc. attributes.
What's worse there is no way to sanitize the values of these wildcard attributes (this part in the patch $allowed_attr[$name_low] = true;
), so things like onerror="alert(document.cookie)"
become very possible and not immediately recognizable.
KSES already has support for allowing any attribute on any tag: adding it by full name. So if somebody wants to add onerror
for images, they can easily do that and specify a list of acceptable values. Why would we need to bypass that with a wildcard and reduce functionality (not possible to check/sanitize attribute values)?
The only exception to the above are attributes with variable names, and the only HTML 5.0 attribute with a variable name is data-*
. Thinking we should extend KSES to allow data attributes, but hardcode the data-
part. This is inline with the existing logic in KSES.
Uh, sorry for the longer comment :) The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.
#16
@
6 years ago
@peterwilsoncc thanks for adding the test :)
Looking at data--invaild="gone"
and data-also-invaild-="gone"
, it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-
.
Also preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low )
would mean we don't allow attribute names containing two hyphens like data-wp-id
(which is somewhat common).
#17
follow-up:
↓ 18
@
6 years ago
tl;dr in 33121.4.diff:
- Test added to ensure prefix is followed by a hyphen
- Test added to ensure attributes with multiple hyphens allowed
- Regex altered to require one or more instance of
(-[a-z0-9_]+)
, ie'/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)+$/'
Replying to azaozz:
@peterwilsoncc thanks for adding the test :)
And thanks for the review.
Looking at
data--invaild="gone"
anddata-also-invaild-="gone"
, it seems having two hyphens or a hyphen as last char of the data-* attribute name is valid per https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/data-* and https://www.w3.org/TR/REC-xml/#NT-Name. Also seems quite a few chars are valid there, but still thinking we should only support a-z0-9_-.
This is true but it does some strange things to the element.dataset
property available in JavaScript so I decided to prevent it. I've created a bin with an example https://jsbin.com/muloxeq/edit?html,js,console,output
I'm happy to change this if needs be.
The TL;DR: don't think allowing wildcard attributes in KSES is a good thing. It brings us to a pretty dangerous place and at the same time reduces some of the existing functionality: sanitizing attribute values.
I'm not sure this is the case if we require the hyphen following any prefixes, so a developer won't be able to add href-*
and bypass checking. The regex change I mention below hardens against this.
I'm also prepared to be misunderstanding something, so are you able to let me know if that's the case.
That would mean --somebody-- can add
on-*
or eveno-*
and allow allonerror
,onclick
,onmouseover
, etc. attributes.
This isn't the case as the hyphen is required before any characters in the regex group (-[a-z0-9_]+)
.
As it's important to block, I've added a test in 33121.4.diff to ensure against it.
Also
preg_match( '/^' . preg_quote( $prefix ) . '(-[a-z0-9_]+)*$/', $name_low )
would mean we don't allow attribute names containing two hyphens likedata-wp-id
(which is somewhat common).
This is also incorrect, I've added such an example as a test in 33121.4.diff.
However, the zero or more regex (*
) did allow users to add data="something"
, so I've changed that in the latest patch to be one or more (+
).
#18
in reply to:
↑ 17
@
6 years ago
Replying to peterwilsoncc:
This (two hyphens or end hyphen) is true but it does some strange things to the
element.dataset
property available in JavaScript
Good point. Lets not allow it :)
I'm not sure this (overriding of unset attributes) is the case if we require the hyphen following any prefixes, so a developer won't be able to add
href-*
and bypass checking.
Ah, I see, my error. Now the regex matches only attribute names that have a hyphen.
I'm still not sold on supporting all of these attribute names "globally" with no possibility to check the values. Except data-*
, none of them have variable names and imho should be added the same way like the rest of the attributes are added: by full name.
In any case, what is the purpose of allowing all attributes with a hyphen?
- It would support
data-*
which we should have. - It would also support
aria-*
which would be nice to have. There are about 45 aria attributes and many of them have a list of possible values, like:aria-busy ( true | false ) 'false' aria-checked ( true | false | mixed | undefined ) 'undefined' aria-disabled ( true | false ) 'false' aria-expanded ( true | false | undefined ) 'undefined' aria-grabbed ( true | false | undefined ) 'undefined' aria-hidden ( true | false ) 'false' aria-invalid ( grammar | false | spelling | true ) 'false' aria-pressed ( true | false | mixed | undefined ) 'undefined' aria-selected ( true | false | undefined ) 'undefined'
(see http://www.w3.org/MarkUp/DTD/aria-attributes-1.mod)
For a security/sanitizing function I'd expect all of these to be hard-coded, both the attribute name and possible values, after a security audit of whether any of the values may be misused. Again, this is inline with how KSES works.
Apart from data-*
and aria-*
there are accept-charset
and http-equiv
(see https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes). Don't think it makes sense supporting these.
Am I missing something? Are there other user cases for allowing attributes with a hyphen and not support checking of their values?
#19
@
6 years ago
In 33121.5.diff:
- Based on 33121.2.diff.
- Fix the regex so it doesn't allow trailing hyphen or two consecutive hyphens.
- Add the tests.
All tests from 33121.4.diff pass except the array( '33121-*', true ),
changed that to array( 'data-*', true ),
:)
#20
@
6 years ago
Thinking more about this, a "middle point" between the two approaches would be to tweak _wp_add_global_attributes()
so it can return just the array of $global_attributes
. Then check the prefix against it. That way only hard-coded wildcard attributes will be allowed. (That would still require sanity checking/security auditing of all aria-*
attributes if we decide to add them.)
#21
@
6 years ago
I like the mixed approach in 33121.5.diff.
The main thing I was trying to avoid was doubling up on adding the data-*
prefix in wp_kses_attr_check()
and _wp_add_global_attributes()
.
It occurs to me that in your patches the prefix doesn't need to be added to _wp_add_global_attributes()
. If core needs multiple allowed prefixes in the future we can create a new private function, until then we can use your regex.
#22
@
6 years ago
33121.6.diff is based on the previous patch with:
- multiline comment format changed
@since
added to docblocks- Modified
test_wildcard_attribute_prefixes
to try variations ofdata-
prefix to validate
Following my last comment I could see why you included data-*
in two locations. Let's go with the compromise version.
#24
@
6 years ago
I call 33121.7.diff a valuable reminder I should have run the full test suite three days ago ;)
- alters some REST API tests ensuring API requests are run through kses for users without
unfiltered_html
caps.
#25
@
6 years ago
- Owner set to peterwilsoncc
- Resolution set to fixed
- Status changed from reopened to closed
In 43727:
#26
@
6 years ago
- Keywords fixed-5.0 added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopened for trunk.
#27
follow-up:
↓ 28
@
6 years ago
- Resolution set to fixed
- Status changed from reopened to closed
In 43981:
#28
in reply to:
↑ 27
@
4 years ago
Replying to jeremyfelt:
In 43981:
I'm trying to modify HTML markup inside WooCommerce Gutenberg block using this filter:
woocommerce_blocks_product_grid_item_html
wp_kses and wp_kses_post functions stripping out 'add to cart' button's data attributes with underscore sign:
data-product_id data-product_sku
I'm wondering can I prevent stripping these attributes by wp_kses function?
It depends on the
unfiltered_html
capability which only Administrators and Editors have by default. The data- attributes are not whitelisted.