Make WordPress Core

Opened 6 years ago

Last modified 6 days ago

#43457 new defect (bug)

`wp_html_split` valid HTML attributes issues

Reported by: soulseekah
Owned by:
Milestone: Awaiting Review
Priority: normal
Severity: normal
Version:
Component: Shortcodes
Keywords: has-patch has-unit-tests
Focuses:
Cc:

Description

There are a handful of valid HTML attribute values that break wp_html_split().

Since it works by scanning for the < and > characters without regard for quoted attribute values, we can break it in many ways, starting from:

https://mathiasbynens.be/demo/crazy-class
https://mathiasbynens.be/demo/html5-id

And ending with the far less exotic:

<span data-content="<p>abcd</p>">loading...</span>

The same goes for CSS attribute selectors inside <style> tags.
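The failure can be reproduced with a simplified model of the tag pattern. This Python sketch assumes the element branch of the regex behind wp_html_split() is roughly `<[^>]*>?`; the real PHP pattern also handles comments and CDATA, which are left out here. `NAIVE_TAG` and `naive_split` are illustrative names only.

```python
import re

# Assumption: a simplified stand-in for the element branch of the regex used by
# wp_html_split(). It matches "<", then any run of non-">" characters, then an
# optional ">". Comment and CDATA handling are deliberately left out.
NAIVE_TAG = re.compile(r'(<[^>]*>?)')


def naive_split(html):
    """Split html into text and tag chunks.

    Because the pattern has a capturing group, re.split() keeps the matched
    tags in the result, much like PHP's preg_split() with
    PREG_SPLIT_DELIM_CAPTURE.
    """
    return NAIVE_TAG.split(html)


sample = '<span data-content="<p>abcd</p>">loading...</span>'
for chunk in naive_split(sample):
    print(repr(chunk))
```

The span's opening tag is cut off at the `>` of the `<p>` inside the attribute value, the attribute's `</p>` becomes a separate tag chunk, and the closing `">` leaks into the text chunk `">loading...`.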

Related #43456, #39153, #40191

Attachments (1)

43457.tests.diff (943 bytes) - added by soulseekah 6 years ago.


Change History (4)

This ticket was mentioned in PR #5697 on WordPress/wordpress-develop by co6x0.


6 days ago
#1

  • Keywords has-patch has-unit-tests added

Ensures that valid HTML is handled correctly by wptexturize(), wp_html_split(), etc.
I started working on this PR when I noticed that using TailwindCSS child selectors would break the layout of block themes (also reported in Trac ticket #57381).

I identified a problem with the regular expression defined in _get_wptexturize_split_regex(), which is used by wptexturize().
This problem appears to affect get_the_block_template_html(), causing the block theme layout collapse described above.
Changing this regex fixes the layout issue.

wp_html_split() uses almost the same regex.
Updating it to a similar regex will also fix the other Trac tickets caused by this function.

According to the HTML Standard at html.spec.whatwg.org, quoted attribute values can contain a wide range of characters, including < and >.
With this in mind, I modified the regex so that characters inside quotation marks are not treated as tag delimiters.
This fixes tags being cut off at a GREATER-THAN SIGN (>) inside an attribute value, and prevents other valid HTML structures from being mishandled.
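A minimal Python sketch of the quote-aware idea, assuming a simplified tag pattern rather than the actual PCRE regex from the PR (which also handles comments and CDATA): a tag is `<` followed by any mix of double-quoted strings, single-quoted strings, or characters other than `>` and quotes. `QUOTE_AWARE_TAG`, `quote_aware_split`, and the Tailwind-style class `[&>*]:p-4` are assumed examples, not code from PR #5697.

```python
import re

# Assumption: an illustrative quote-aware tag pattern, not the exact regex from
# PR #5697. Inside a tag it consumes, in any order:
#   "..."   a double-quoted attribute value (may contain < and >)
#   '...'   a single-quoted attribute value (may contain < and >)
#   [^>"']  any other character except the tag terminator and quotes
# Each alternative starts with a different character, so at most one applies
# at any position.
QUOTE_AWARE_TAG = re.compile(r'''(<(?:"[^"]*"|'[^']*'|[^>"'])*>?)''')


def quote_aware_split(html):
    """Split html into text and tag chunks, keeping tags as captured chunks."""
    return QUOTE_AWARE_TAG.split(html)


for sample in (
    '<span data-content="<p>abcd</p>">loading...</span>',
    '<div class="[&>*]:p-4">x</div>',
):
    print(quote_aware_split(sample))
```

Unquoted attribute values may not contain < or > in valid HTML, so only the quoted forms need special handling. The trade-off is that an unbalanced quote inside a tag (invalid markup anyway) can make a pattern like this mis-split, which is why such a change needs thorough tests.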

I've included tests to cover these changes in tests/phpunit/tests/formatting/wpTexturize.php and tests/phpunit/tests/formatting/wpHtmlSplit.php. If there's anything I've missed, please let me know.

Trac ticket: https://core.trac.wordpress.org/ticket/43457
Trac ticket: https://core.trac.wordpress.org/ticket/45387
Trac ticket: https://core.trac.wordpress.org/ticket/57381

co6x0 commented on PR #5697:


6 days ago
#2

Added commit.
Removed the transformation of & to &#038; in HTML attribute values that was added in ticket #35008 (<https://core.trac.wordpress.org/ticket/35008>).

That ticket appears to have been opened because the W3C HTML Validator reported the bare & as invalid HTML, but under the current HTML specification the & in the URL is valid.
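For context on the & change: `&#038;` is the decimal numeric character reference for `&` (U+0026), so once the attribute value is decoded, both spellings yield the same URL. A quick check with Python's standard `html.unescape()`; the URL is a hypothetical example.

```python
import html

# Two spellings of the same (hypothetical) query string: a bare "&" and the
# "&#038;" numeric character reference produced by the removed transformation.
bare = "https://example.com/?a=1&b=2"
encoded = "https://example.com/?a=1&#038;b=2"

# After character-reference decoding, both produce the same URL.
print(html.unescape(bare) == html.unescape(encoded))
```

Note that `html.unescape()` follows text-content decoding rules, so it is only a rough stand-in for how browsers decode attribute values.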

co6x0 commented on PR #5697:


6 days ago
#3
