WordPress.org

Make WordPress Core

Opened 3 months ago

Last modified 3 months ago

#49464 new defect (bug)

wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

Reported by: codeforest Owned by:
Milestone: 5.5 Priority: normal
Severity: major Version: 5.3.2
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

If we have a shortcode inside HTML tag like this:

<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>

The regex inside wp_kses_hair and wp_kses_hair_parse is stripping data-op3-timer-seconds as invalid attribute name, while it is a legal one.

XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

The solution would be to adjust the regex for attribute names to allow for digits that are not on the first place.

    // original regex line
    '[-a-zA-Z:]+'   // Attribute name.
    // new regex line, we are allowing digits if not on the first place
    '[_a-zA-Z][-_a-zA-Z0-9:]*'   // Attribute name.

Attachments (3)

#49464_updated_regex_for_attribute_names.diff (1007 bytes) - added by codeforest 3 months ago.
Patch for the above
#49464_updated_regex_for_attribute_names_with_unit_tests.diff (3.8 KB) - added by codeforest 3 months ago.
New Patch with Unit tests included
49464.3.patch (3.4 KB) - added by ayeshrajans 3 months ago.
Tests: https://travis-ci.org/Ayesh/wordpress-develop/jobs/652822665

Download all attachments as: .zip

Change History (13)

#1 @SergeyBiryukov
3 months ago

  • Component changed from General to Formatting

#2 @SergeyBiryukov
3 months ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5

#3 @zodiac1978
3 months ago

This would fix a bug I was examining exactly at the same time. wp_kses_post is stripping out "data-" attributes if they are containg underscores, like data-test_test.

$test1 = wp_kses_post('<a href="http://google.de">Google</a>');

$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');

$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');

1 and 2 would be fine, but for 3 the attribute gets stripped out.

The RegEx from the patch would solve this:

Before patch:
https://regex101.com/r/bAeYTE/1

After patch:
https://regex101.com/r/Hbnfmo/1

@codeforest
3 months ago

New Patch with Unit tests included

#4 @johnpgreen
3 months ago

Likely related: #48608.

#5 follow-up: @dlh
3 months ago

Somewhat duplicative of #34406 as well.

#6 in reply to: ↑ 5 @codeforest
3 months ago

Replying to dlh:

Somewhat duplicative of #34406 as well.

In a way, that patch from #34406 has a mistake, as it allows attribute names to start with a digit, which is actually not allowed.

#7 @zodiac1978
3 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Let's close #48608 and #34406 in favor of this ticket because it has the best solution IMHO.

Maybe the title could be changed to show the broader scope of the solution. The underscore is added as well.

#8 @codeforest
3 months ago

  • Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

#9 follow-up: @ayeshrajans
3 months ago

Hi @codeforest - welcome to WordPress Trac!
You are right the regex for HTML attributes do not follow the standard, and your patch fixes it.

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

#10 in reply to: ↑ 9 @codeforest
3 months ago

Replying to ayeshrajans:

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

Thanks Ayesh, I missed this change, probably my IDE did not like the indenting there. Sorry and thanks again.

Note: See TracTickets for help on using tickets.