Make WordPress Core

Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#49464 closed defect (bug) (fixed)

wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

Reported by: codeforest's profile codeforest Owned by: whyisjake's profile whyisjake
Milestone: 5.5 Priority: normal
Severity: major Version: 5.3.2
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

If we have a shortcode inside HTML tag like this:

<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>

The regex inside wp_kses_hair and wp_kses_hair_parse is stripping data-op3-timer-seconds as invalid attribute name, while it is a legal one.

XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

The solution would be to adjust the regex for attribute names to allow for digits that are not on the first place.

    // original regex line
    '[-a-zA-Z:]+'   // Attribute name.
    // new regex line, we are allowing digits if not on the first place
    '[_a-zA-Z][-_a-zA-Z0-9:]*'   // Attribute name.

Attachments (3)

#49464_updated_regex_for_attribute_names.diff (1007 bytes) - added by codeforest 5 years ago.
Patch for the above
#49464_updated_regex_for_attribute_names_with_unit_tests.diff (3.8 KB) - added by codeforest 5 years ago.
New Patch with Unit tests included
49464.3.patch (3.4 KB) - added by ayeshrajans 5 years ago.
Tests: https://travis-ci.org/Ayesh/wordpress-develop/jobs/652822665

Download all attachments as: .zip

Change History (21)

#1 @SergeyBiryukov
5 years ago

  • Component changed from General to Formatting

#2 @SergeyBiryukov
5 years ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5

#3 @zodiac1978
5 years ago

This would fix a bug I was examining exactly at the same time. wp_kses_post is stripping out "data-" attributes if they are containg underscores, like data-test_test.

$test1 = wp_kses_post('<a href="http://google.de">Google</a>');

$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');

$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');

1 and 2 would be fine, but for 3 the attribute gets stripped out.

The RegEx from the patch would solve this:

Before patch:
https://regex101.com/r/bAeYTE/1

After patch:
https://regex101.com/r/Hbnfmo/1

@codeforest
5 years ago

New Patch with Unit tests included

#4 @johnpgreen
5 years ago

Likely related: #48608.

#5 follow-up: @dlh
5 years ago

Somewhat duplicative of #34406 as well.

#6 in reply to: ↑ 5 @codeforest
5 years ago

Replying to dlh:

Somewhat duplicative of #34406 as well.

In a way, that patch from #34406 has a mistake, as it allows attribute names to start with a digit, which is actually not allowed.

#7 @zodiac1978
5 years ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Let's close #48608 and #34406 in favor of this ticket because it has the best solution IMHO.

Maybe the title could be changed to show the broader scope of the solution. The underscore is added as well.

#8 @codeforest
5 years ago

  • Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

#9 follow-up: @ayeshrajans
5 years ago

Hi @codeforest - welcome to WordPress Trac!
You are right the regex for HTML attributes do not follow the standard, and your patch fixes it.

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

#10 in reply to: ↑ 9 @codeforest
5 years ago

Replying to ayeshrajans:

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

Thanks Ayesh, I missed this change, probably my IDE did not like the indenting there. Sorry and thanks again.

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


4 years ago

#12 follow-up: @codeforest
4 years ago

Do we know if this is accepted for 5.5?

#13 in reply to: ↑ 12 @zodiac1978
4 years ago

Replying to codeforest:

Do we know if this is accepted for 5.5?

It is milestoned for 5.5 and the 5.5 lead (@whyisjake) is watching the ticket, so I think it looks good.

There are many things going on, but hopefully this is merged before the beta on 7th July 2020
https://make.wordpress.org/core/5-5/

#14 @whyisjake
4 years ago

  • Owner set to whyisjake
  • Status changed from new to accepted

#15 @whyisjake
4 years ago

#34406 was marked as a duplicate.

#16 @whyisjake
4 years ago

#48608 was marked as a duplicate.

#17 @whyisjake
4 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 48132:

Formatting: Extend wp_kses_hair and wp_kses_hair_parse to allow digits and underscores.

Fixes a lot of issues around parsing XML/HTML attributes.

Fixes #49464.

See #34406, #48608.

Props codeforest, zodiac1978, johnpgreen, dlh, ayeshrajans, johnpgreen, rilwis, travisnorthcutt, miqrogroove, chriscct7, whyisjake.

#18 @SergeyBiryukov
4 years ago

In 48139:

Docs: Use sentence case for comments in tests/kses.php, per the documentation standards.

Follow-up to [48132].

See #49464, #49572.

Note: See TracTickets for help on using tickets.