WordPress.org

Make WordPress Core

Opened 6 months ago

Closed 7 weeks ago

Last modified 7 weeks ago

#49464 closed defect (bug) (fixed)

wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

Reported by: codeforest Owned by: whyisjake
Milestone: 5.5 Priority: normal
Severity: major Version: 5.3.2
Component: Formatting Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

If we have a shortcode inside HTML tag like this:

<a href="https://example.com/[op_get_param param='promoCode' default='Zvonko']" data-op3-timer-seconds="0">Some link</a>

The regex inside wp_kses_hair and wp_kses_hair_parse is stripping data-op3-timer-seconds as invalid attribute name, while it is a legal one.

XML elements must follow these naming rules (source: https://www.w3schools.com/xml/xml_elements.asp):

  • Element names are case-sensitive
  • Element names must start with a letter or underscore
  • Element names cannot start with the letters xml (or XML, or Xml, etc)
  • Element names can contain letters, digits, hyphens, underscores, and periods
  • Element names cannot contain spaces

The solution would be to adjust the regex for attribute names to allow for digits that are not on the first place.

    // original regex line
    '[-a-zA-Z:]+'   // Attribute name.
    // new regex line, we are allowing digits if not on the first place
    '[_a-zA-Z][-_a-zA-Z0-9:]*'   // Attribute name.

Attachments (3)

#49464_updated_regex_for_attribute_names.diff (1007 bytes) - added by codeforest 6 months ago.
Patch for the above
#49464_updated_regex_for_attribute_names_with_unit_tests.diff (3.8 KB) - added by codeforest 6 months ago.
New Patch with Unit tests included
49464.3.patch (3.4 KB) - added by ayeshrajans 6 months ago.
Tests: https://travis-ci.org/Ayesh/wordpress-develop/jobs/652822665

Download all attachments as: .zip

Change History (21)

#1 @SergeyBiryukov
6 months ago

  • Component changed from General to Formatting

#2 @SergeyBiryukov
6 months ago

  • Keywords has-patch needs-unit-tests added
  • Milestone changed from Awaiting Review to 5.5

#3 @zodiac1978
6 months ago

This would fix a bug I was examining exactly at the same time. wp_kses_post is stripping out "data-" attributes if they are containg underscores, like data-test_test.

$test1 = wp_kses_post('<a href="http://google.de">Google</a>');

$test2 = wp_kses_post('<a data-test="xxx" href="http://google.de">Google</a>');

$test3 = wp_kses_post('<a data-test_test="yyy" href="http://google.de">Google</a>');

1 and 2 would be fine, but for 3 the attribute gets stripped out.

The RegEx from the patch would solve this:

Before patch:
https://regex101.com/r/bAeYTE/1

After patch:
https://regex101.com/r/Hbnfmo/1

@codeforest
6 months ago

New Patch with Unit tests included

#4 @johnpgreen
6 months ago

Likely related: #48608.

#5 follow-up: @dlh
6 months ago

Somewhat duplicative of #34406 as well.

#6 in reply to: ↑ 5 @codeforest
6 months ago

Replying to dlh:

Somewhat duplicative of #34406 as well.

In a way, that patch from #34406 has a mistake, as it allows attribute names to start with a digit, which is actually not allowed.

#7 @zodiac1978
6 months ago

  • Keywords has-unit-tests added; needs-unit-tests removed

Let's close #48608 and #34406 in favor of this ticket because it has the best solution IMHO.

Maybe the title could be changed to show the broader scope of the solution. The underscore is added as well.

#8 @codeforest
6 months ago

  • Summary changed from wp_kses_hair and wp_kses_hair_parse regex is not allowing digits in attribute names to wp_kses_hair and wp_kses_hair_parse regex is not allowing digits or underscores in attribute names

#9 follow-up: @ayeshrajans
6 months ago

Hi @codeforest - welcome to WordPress Trac!
You are right the regex for HTML attributes do not follow the standard, and your patch fixes it.

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

#10 in reply to: ↑ 9 @codeforest
6 months ago

Replying to ayeshrajans:

There is a small chunk in the patch (under valid_unicode function) that only has changes in indentation, so I will reroll the patch with those alignments fixed.

Thanks Ayesh, I missed this change, probably my IDE did not like the indenting there. Sorry and thanks again.

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


8 weeks ago

#12 follow-up: @codeforest
8 weeks ago

Do we know if this is accepted for 5.5?

#13 in reply to: ↑ 12 @zodiac1978
8 weeks ago

Replying to codeforest:

Do we know if this is accepted for 5.5?

It is milestoned for 5.5 and the 5.5 lead (@whyisjake) is watching the ticket, so I think it looks good.

There are many things going on, but hopefully this is merged before the beta on 7th July 2020
https://make.wordpress.org/core/5-5/

#14 @whyisjake
7 weeks ago

  • Owner set to whyisjake
  • Status changed from new to accepted

#15 @whyisjake
7 weeks ago

#34406 was marked as a duplicate.

#16 @whyisjake
7 weeks ago

#48608 was marked as a duplicate.

#17 @whyisjake
7 weeks ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 48132:

Formatting: Extend wp_kses_hair and wp_kses_hair_parse to allow digits and underscores.

Fixes a lot of issues around parsing XML/HTML attributes.

Fixes #49464.

See #34406, #48608.

Props codeforest, zodiac1978, johnpgreen, dlh, ayeshrajans, johnpgreen, rilwis, travisnorthcutt, miqrogroove, chriscct7, whyisjake.

#18 @SergeyBiryukov
7 weeks ago

In 48139:

Docs: Use sentence case for comments in tests/kses.php, per the documentation standards.

Follow-up to [48132].

See #49464, #49572.

Note: See TracTickets for help on using tickets.