WordPress.org

Make WordPress Core

Opened 15 months ago

Closed 11 months ago

Last modified 11 months ago

#49464 closed defect (bug) (fixed)

wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

Reported by: codeforest Owned by: whyisjake
Milestone: 5.5 Priority: normal
Severity: major Version: 5.3.2
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

If we have a shortcode inside HTML tag like this:

<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>

The regex inside wp_kses_hair and wp_kses_hair_parse is stripping data-op3-timer-seconds as invalid attribute name, while it is a legal one.

XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

The solution would be to adjust the regex for attribute names to allow for digits that are not on the first place.

    // original regex line
    '[-a-zA-Z:]+'   // Attribute name.
    // new regex line, we are allowing digits if not on the first place
    '[_a-zA-Z][-_a-zA-Z0-9:]*'   // Attribute name.

Attachments (3)

#49464_updated_regex_for_attribute_names.diff (1007 bytes) - added by codeforest 15 months ago.
Patch for the above
#49464_updated_regex_for_attribute_names_with_unit_tests.diff (3.8 KB) - added by codeforest 15 months ago.
New Patch with Unit tests included
49464.3.patch (3.4 KB) - added by ayeshrajans 15 months ago.
Tests: https://travis-ci.org/Ayesh/wordpress-develop/jobs/652822665

Download all attachments as: .zip

Change History (21)

#1 @SergeyBiryukov
15 months ago

  • Component changed from General to Formatting

#2 @SergeyBiryukov
15 months ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5

#3 @zodiac1978
15 months ago

This would fix a bug I was examining exactly at the same time. wp_kses_post is stripping out "data-" attributes if they are containg underscores, like data-test_test.

$test1 = wp_kses_post('<a href="http://google.de">Google</a>');

$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');

$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');

1 and 2 would be fine, but for 3 the attribute gets stripped out.

The RegEx from the patch would solve this:

Before patch:
https://regex101.com/r/bAeYTE/1

After patch:
https://regex101.com/r/Hbnfmo/1

@codeforest
15 months ago

New Patch with Unit tests included

#4 @johnpgreen
15 months ago

Likely related: #48608.

#5 follow-up: @dlh
15 months ago

Somewhat duplicative of #34406 as well.

#6 in reply to: ↑ 5 @codeforest
15 months ago

Replying to dlh:

Somewhat duplicative of #34406 as well.

In a way, that patch from #34406 has a mistake, as it allows attribute names to start with a digit, which is actually not allowed.

#7 @zodiac1978
15 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Let's close #48608 and #34406 in favor of this ticket because it has the best solution IMHO.

Maybe the title could be changed to show the broader scope of the solution. The underscore is added as well.

#8 @codeforest
15 months ago

  • Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

#9 follow-up: @ayeshrajans
15 months ago

Hi @codeforest - welcome to WordPress Trac!
You are right the regex for HTML attributes do not follow the standard, and your patch fixes it.

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

#10 in reply to: ↑ 9 @codeforest
15 months ago

Replying to ayeshrajans:

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

Thanks Ayesh, I missed this change, probably my IDE did not like the indenting there. Sorry and thanks again.

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


11 months ago

#12 follow-up: @codeforest
11 months ago

Do we know if this is accepted for 5.5?

#13 in reply to: ↑ 12 @zodiac1978
11 months ago

Replying to codeforest:

Do we know if this is accepted for 5.5?

It is milestoned for 5.5 and the 5.5 lead (@whyisjake) is watching the ticket, so I think it looks good.

There are many things going on, but hopefully this is merged before the beta on 7th July 2020
https://make.wordpress.org/core/5-5/

#14 @whyisjake
11 months ago

  • Owner set to whyisjake
  • Status changed from new to accepted

#15 @whyisjake
11 months ago

#34406 was marked as a duplicate.

#16 @whyisjake
11 months ago

#48608 was marked as a duplicate.

#17 @whyisjake
11 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 48132:

Formatting: Extend wp_kses_hair and wp_kses_hair_parse to allow digits and underscores.

Fixes a lot of issues around parsing XML/HTML attributes.

Fixes #49464.

See #34406, #48608.

Props codeforest, zodiac1978, johnpgreen, dlh, ayeshrajans, johnpgreen, rilwis, travisnorthcutt, miqrogroove, chriscct7, whyisjake.

#18 @SergeyBiryukov
11 months ago

In 48139:

Docs: Use sentence case for comments in tests/kses.php, per the documentation standards.

Follow-up to [48132].

See #49464, #49572.

Note: See TracTickets for help on using tickets.