#49464 closed defect (bug) (fixed)
wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names
Reported by: | codeforest | Owned by: | whyisjake |
---|---|---|---|
Milestone: | 5.5 | Priority: | normal |
Severity: | major | Version: | 5.3.2 |
Component: | Formatting | Keywords: | has-patch has-unit-tests |
Focuses: | Cc: |
Description
If we have a shortcode inside an HTML tag, like this:
<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>
The regex inside wp_kses_hair and wp_kses_hair_parse strips data-op3-timer-seconds as an invalid attribute name, although it is a legal one.
XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):
- Element names are case-sensitive
- Element names must start with a letter or underscore
- Element names cannot start with the letters xml (or XML, or Xml, etc)
- Element names can contain letters, digits, hyphens, underscores, and periods
- Element names cannot contain spaces
The solution would be to adjust the attribute-name regex to allow digits anywhere except in the first position.
// Original regex line:
'[-a-zA-Z:]+' // Attribute name.
// New regex line; digits are allowed, but not as the first character:
'[_a-zA-Z][-_a-zA-Z0-9:]*' // Attribute name.
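The difference between the two patterns can be checked in isolation. The following is a minimal sketch, not the code from the patch: the helper attribute_name_matches() and the sample names are hypothetical, and the patterns are anchored with ^ and $ here so that the whole name has to match.

```php
<?php
// Attribute-name patterns quoted in the ticket, anchored for this sketch
// so that the entire name must match.
$old_pattern = '/^[-a-zA-Z:]+$/';
$new_pattern = '/^[_a-zA-Z][-_a-zA-Z0-9:]*$/';

/**
 * Hypothetical helper: true if $name matches $pattern in full.
 */
function attribute_name_matches( $pattern, $name ) {
    return 1 === preg_match( $pattern, $name );
}

// Sample names chosen for illustration only.
$names = array( 'href', 'data-op3-timer-seconds', 'data-test_test', '3d-model' );

foreach ( $names as $name ) {
    printf(
        "%-24s old: %-3s new: %s\n",
        $name,
        attribute_name_matches( $old_pattern, $name ) ? 'yes' : 'no',
        attribute_name_matches( $new_pattern, $name ) ? 'yes' : 'no'
    );
}
```

With these inputs, href matches both patterns, data-op3-timer-seconds and data-test_test match only the new one, and 3d-model matches neither because it starts with a digit.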
Attachments (3)
Change History (21)
#2, 5 years ago
- Keywords has-patch needs-unit-tests added
- Milestone changed from Awaiting Review to 5.5
#3, 5 years ago
This would fix a bug I was examining at exactly the same time. wp_kses_post is stripping out "data-" attributes if they contain underscores, like data-test_test.
$test1 = wp_kses_post('<a href="http://google.de">Google</a>');
$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');
$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');
$test1 and $test2 are fine, but in $test3 the attribute gets stripped out.
The RegEx from the patch would solve this:
Before patch:
https://regex101.com/r/bAeYTE/1
After patch:
https://regex101.com/r/Hbnfmo/1
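The same before/after comparison can be reproduced locally without WordPress. This is a simplified sketch under stated assumptions: sketch_collect_attributes() is a hypothetical stand-in, not the parser inside wp_kses_hair, and it only handles double-quoted values of the form name="value".

```php
<?php
/**
 * Hypothetical helper: collects name="value" pairs whose name matches
 * $name_regex. Only double-quoted values are handled, which is enough
 * for the sample string below but not for real-world HTML.
 */
function sketch_collect_attributes( $attr_string, $name_regex ) {
    // (?<!\S) requires the name to begin at the start of the string or after whitespace.
    $pattern = '/(?<!\S)(' . $name_regex . ')="([^"]*)"/';
    $found   = array();

    if ( preg_match_all( $pattern, $attr_string, $matches, PREG_SET_ORDER ) ) {
        foreach ( $matches as $match ) {
            $found[ $match[1] ] = $match[2];
        }
    }

    return $found;
}

// Attribute string taken from the $test3 example above.
$attrs = 'data-test_test="yyy" href="http://google.de"';

// Name pattern before the patch: only href is collected.
print_r( sketch_collect_attributes( $attrs, '[-a-zA-Z:]+' ) );

// Name pattern after the patch: data-test_test and href are both collected.
print_r( sketch_collect_attributes( $attrs, '[_a-zA-Z][-_a-zA-Z0-9:]*' ) );
```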
#8, 5 years ago
- Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names
#9 (follow-up: ↓ 10), 5 years ago
Hi @codeforest - welcome to WordPress Trac!
You are right: the regex for HTML attribute names does not follow the standard, and your patch fixes it.
There is a small chunk in the patch (under the valid_unicode function) that only changes indentation, so I will reroll the patch with that alignment fixed.
#10 (in reply to: ↑ 9), 5 years ago
Replying to ayeshrajans:
There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.
Thanks Ayesh, I missed this change; my IDE probably did not like the indentation there. Sorry, and thanks again.
This ticket was mentioned in Slack in #core by david.baumwald, 4 years ago.
#13 (in reply to: ↑ 12), 4 years ago
Replying to codeforest:
Do we know if this is accepted for 5.5?
It is milestoned for 5.5 and the 5.5 lead (@whyisjake) is watching the ticket, so I think it looks good.
There are many things going on, but hopefully this gets merged before the beta on 7 July 2020:
https://make.wordpress.org/core/5-5/
Patch for the above