Make WordPress Core

Opened 7 years ago

Last modified 3 months ago

#43457 new defect (bug)

`wp_html_split` valid HTML attributes issues

Reported by: soulseekah Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Shortcodes Keywords: has-patch has-unit-tests
Focuses: Cc:

Description

There are a handful of valid HTML attributes that shatter wp_html_split.

Since it works by looking for the < character, we can break it in many ways, starting from:

https://mathiasbynens.be/demo/crazy-class
https://mathiasbynens.be/demo/html5-id

And ending in the less exotic and crazy:

<span data-content="<p>abcd</p>">loading...</span>

Same goes for CSS attribute selectors in <style> tags.
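
To make the failure mode concrete, here is a minimal sketch in Python. The pattern below is a deliberately simplified, hypothetical stand-in for a "find `<`, read up to the next `>`" splitter; it is not the regex that wp_html_split actually uses, and `naive_html_split` is a name invented for this illustration.

```python
import re

# Hypothetical, simplified tag pattern: everything from "<" to the next ">".
# NOT the real WordPress regex; it only illustrates the class of bug.
NAIVE_TAG = re.compile(r"(<[^>]*>)")

def naive_html_split(html):
    """Split HTML into alternating text and tag chunks, dropping empty strings."""
    return [part for part in NAIVE_TAG.split(html) if part]

sample = '<span data-content="<p>abcd</p>">loading...</span>'
parts = naive_html_split(sample)

for part in parts:
    print(repr(part))
# '<span data-content="<p>'
# 'abcd'
# '</p>'
# '">loading...'
# '</span>'
```

The opening tag is cut off at the first `>` inside the attribute value, so the rest of the attribute leaks into what is treated as text content.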

Related #43456, #39153, #40191

Attachments (1)

43457.tests.diff (943 bytes) - added by soulseekah 7 years ago.


Change History (6)

This ticket was mentioned in PR #5697 on WordPress/wordpress-develop by co6x0.


10 months ago
#1

  • Keywords has-patch has-unit-tests added

Ensures valid HTML is handled correctly by wptexturize(), wp_html_split(), etc.
I started working on this PR when I noticed that using TailwindCSS child selectors would break the layout of a block theme (also reported in Trac ticket #57381).

I have identified a problem with the regular expression defined in _get_wptexturize_split_regex() used in wptexturize().
This problem seemed to be affecting get_the_block_template_html() and causing the block theme layout collapse described above.
Changing this regex fixes the layout issue.

Also, wp_html_split() uses almost the same regex.
Other Trac tickets caused by this function should also be fixed by updating it to a similar regex.

According to the HTML reference at html.spec.whatwg.org, attribute values can contain a variety of characters, including < and >.
With this in mind, I have modified the regex so that it does not treat characters inside quotation marks as tag delimiters.
This fixes the mishandling of the GREATER-THAN SIGN (>) and prevents other valid HTML structures from being mishandled.
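
As a rough illustration of the idea (not the actual regex from this PR), a quote-aware tag pattern can be sketched in Python as follows. The pattern, the function name `quote_aware_split`, and the Tailwind-style class in the second sample are all assumptions made for this example.

```python
import re

# Hypothetical quote-aware tag pattern. Inside a tag, consume one of:
#   - a complete double-quoted run,
#   - a complete single-quoted run,
#   - any single character that is neither a quote nor ">".
# A ">" therefore only closes the tag when it appears outside quotes.
QUOTE_AWARE_TAG = re.compile(r"""(<(?:"[^"]*"|'[^']*'|[^'">])*>)""")

def quote_aware_split(html):
    """Split HTML into alternating text and tag chunks, dropping empty strings."""
    return [part for part in QUOTE_AWARE_TAG.split(html) if part]

print(quote_aware_split('<span data-content="<p>abcd</p>">loading...</span>'))
# ['<span data-content="<p>abcd</p>">', 'loading...', '</span>']

print(quote_aware_split('<div class="[&>*]:p-4">x</div>'))
# ['<div class="[&>*]:p-4">', 'x', '</div>']
```

This sketch ignores comments, CDATA, and the contents of <style> and <script> elements, and it does not try to recover from unbalanced quotes inside a tag, so it should be read as a demonstration of the quoting rule only.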

I've included tests to cover these changes in tests/phpunit/tests/formatting/wpTexturize.php and tests/phpunit/tests/formatting/wpHtmlSplit.php. If there's anything I've missed, please let me know.

Trac ticket: https://core.trac.wordpress.org/ticket/43457
Trac ticket: https://core.trac.wordpress.org/ticket/45387
Trac ticket: https://core.trac.wordpress.org/ticket/57381

co6x0 commented on PR #5697:


10 months ago
#2

Added commit.
Removed the transformation of & to &#038; in HTML attribute values modified by <https://core.trac.wordpress.org/ticket/35008>.

That ticket seems to have been created because the W3C HTML Validator reported the markup as invalid HTML, but as of now, the & in the URL is valid.

@dmsnell commented on PR #5697:


4 months ago
#4

howdy! just wanted to stop by and mention that I've been exploring updating these same functions using the HTML API, which provides a full spec-compliant parse of the HTML stream.

You can find some rough notes on the broader roadmap

@co6x0 commented on PR #5697:


3 months ago
#5

@dmsnell
Thank you for letting me know.
Handling this with regular expressions has been challenging, so it would be wonderful if we could address it using the HTML API.

Please let me know if there's anything I can help with!
I will close this PR now.
