WordPress.org

Make WordPress Core

Opened 6 weeks ago

Last modified 6 weeks ago

#49464 new defect (bug)

wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

Reported by: codeforest Owned by:
Milestone: 5.5 Priority: normal
Severity: major Version: 5.3.2
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

If we have a shortcode inside HTML tag like this:

<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>

The regex inside wp_kses_hair and wp_kses_hair_parse is stripping data-op3-timer-seconds as invalid attribute name, while it is a legal one.

XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

The solution would be to adjust the regex for attribute names to allow for digits that are not on the first place.

    // original regex line
    '[-a-zA-Z:]+'   // Attribute name.
    // new regex line, we are allowing digits if not on the first place
    '[_a-zA-Z][-_a-zA-Z0-9:]*'   // Attribute name.

Attachments (3)

#49464_updated_regex_for_attribute_names.diff (1007 bytes) - added by codeforest 6 weeks ago.
Patch for the above
#49464_updated_regex_for_attribute_names_with_unit_tests.diff (3.8 KB) - added by codeforest 6 weeks ago.
New Patch with Unit tests included
49464.3.patch (3.4 KB) - added by ayeshrajans 6 weeks ago.
Tests: https://travis-ci.org/Ayesh/wordpress-develop/jobs/652822665

Download all attachments as: .zip

Change History (13)

#1 @SergeyBiryukov
6 weeks ago

  • Component changed from General to Formatting

#2 @SergeyBiryukov
6 weeks ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5

#3 @zodiac1978
6 weeks ago

This would fix a bug I was examining exactly at the same time. wp_kses_post is stripping out "data-" attributes if they are containg underscores, like data-test_test.

$test1 = wp_kses_post('<a href="http://google.de">Google</a>');

$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');

$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');

1 and 2 would be fine, but for 3 the attribute gets stripped out.

The RegEx from the patch would solve this:

Before patch:
https://regex101.com/r/bAeYTE/1

After patch:
https://regex101.com/r/Hbnfmo/1

@codeforest
6 weeks ago

New Patch with Unit tests included

#4 @johnpgreen
6 weeks ago

Likely related: #48608.

#5 follow-up: @dlh
6 weeks ago

Somewhat duplicative of #34406 as well.

#6 in reply to: ↑ 5 @codeforest
6 weeks ago

Replying to dlh:

Somewhat duplicative of #34406 as well.

In a way, that patch from #34406 has a mistake, as it allows attribute names to start with a digit, which is actually not allowed.

#7 @zodiac1978
6 weeks ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Let's close #48608 and #34406 in favor of this ticket because it has the best solution IMHO.

Maybe the title could be changed to show the broader scope of the solution. The underscore is added as well.

#8 @codeforest
6 weeks ago

  • Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

#9 follow-up: @ayeshrajans
6 weeks ago

Hi @codeforest - welcome to WordPress Trac!
You are right the regex for HTML attributes do not follow the standard, and your patch fixes it.

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

#10 in reply to: ↑ 9 @codeforest
6 weeks ago

Replying to ayeshrajans:

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

Thanks Ayesh, I missed this change, probably my IDE did not like the indenting there. Sorry and thanks again.

Note: See TracTickets for help on using tickets.