Opened 7 years ago
Last modified 3 months ago
#43457 new defect (bug)
`wp_html_split` valid HTML attributes issues
Reported by: | soulseekah | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | |
Component: | Shortcodes | Keywords: | has-patch has-unit-tests |
Focuses: | | Cc: | |
Description
There are a handful of valid HTML attributes that shatter `wp_html_split`.
Since it works by looking for the `<` character, we can break it in many ways, starting from:
https://mathiasbynens.be/demo/crazy-class
https://mathiasbynens.be/demo/html5-id
And ending in the less exotic and crazy:
<span data-content="<p>abcd</p>">loading...</span>
Same goes for CSS attribute selectors in <style> tags.
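To make the failure mode concrete, here is a minimal sketch in Python. It assumes a naive tag matcher ("`<`, then anything up to the first `>`"); `NAIVE_TAG` and `naive_html_split` are hypothetical names for illustration, and this is not the regex WordPress actually uses. It only shows why a `<` or `>` inside a quoted attribute value cuts a tag in the wrong place.

```python
import re

# Hypothetical stand-in for a naive tag matcher: "<", then anything up to the
# first ">". This is NOT the WordPress regex; it only illustrates the problem.
NAIVE_TAG = re.compile(r"(<[^>]*>)")


def naive_html_split(html):
    """Split into alternating text/tag pieces, dropping empty strings."""
    return [piece for piece in NAIVE_TAG.split(html) if piece]


sample = '<span data-content="<p>abcd</p>">loading...</span>'
for piece in naive_html_split(sample):
    print(repr(piece))
# The opening tag is cut at the ">" inside the attribute value:
#   '<span data-content="<p>'
#   'abcd'
#   '</p>'
#   '">loading...'
#   '</span>'
```

The attribute value is torn apart, and the remainder `">loading...` is treated as plain text, so anything that later processes "text" pieces can corrupt the markup.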
Attachments (1)
Change History (6)
This ticket was mentioned in PR #5697 on WordPress/wordpress-develop by co6x0.
10 months ago
#1
- Keywords has-patch has-unit-tests added
10 months ago
#2
Added commit.
Removed the transformation of `&` to its character-reference form in HTML attribute values, as modified by <https://core.trac.wordpress.org/ticket/35008>.
That ticket seems to have been created because the W3C HTML Validator found it to be invalid HTML, but as of now, the `&` in the URL is valid.
4 months ago
#4
howdy! just wanted to stop by and mention that I've been exploring updating these same functions using the HTML API, which provides a full spec-compliant parse of the HTML stream.
You can find some rough notes on the broader roadmap
Ensures valid HTML is handled correctly by `wptexturize()`, `wp_html_split()`, etc.
I started working on this PR when I noticed that using TailwindCSS child selectors would break the layout of a block theme (also reported in Trac ticket 57381). I have identified a problem with the regular expression defined in `_get_wptexturize_split_regex()`, which is used in `wptexturize()`. This problem seemed to be affecting `get_the_block_template_html()` and causing the block theme layout collapse described above.
Changing this regex fixes the layout issue.
Also, `wp_html_split()` uses almost the same regex. Other Trac tickets caused by this function will also be fixed by updating it to a similar regex.
According to the HTML reference at html.spec.whatwg.org, attribute values can contain a variety of characters.
With this in mind, I have modified the regex to exclude matching characters within quotation marks.
This fixes the misplacement of the GREATER-THAN SIGN (`>`) and prevents other valid HTML structures from being mishandled. I've included tests to cover these changes in `tests/phpunit/tests/formatting/wpTexturize.php` and `tests/phpunit/tests/formatting/wpHtmlSplit.php`. If there's anything I've missed, please let me know.
Trac ticket: https://core.trac.wordpress.org/ticket/43457
Trac ticket: https://core.trac.wordpress.org/ticket/45387
Trac ticket: https://core.trac.wordpress.org/ticket/57381
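The idea of excluding characters inside quotation marks from matching can be sketched in Python as follows. This is an assumption-laden illustration, not the regex from the PR: `QUOTE_AWARE_TAG` and `quote_aware_split` are hypothetical names, and the pattern ignores comments, CDATA, and `<script>`/`<style>` contents, which the real functions also have to deal with.

```python
import re

# Hypothetical quote-aware tag matcher (an assumption, not the PR's regex).
# Inside a tag, a double- or single-quoted run is consumed as one unit, so a
# "<" or ">" within an attribute value cannot end the tag early.
QUOTE_AWARE_TAG = re.compile(r"""(<(?:"[^"]*"|'[^']*'|[^'">])*>)""")


def quote_aware_split(html):
    """Split into alternating text/tag pieces, dropping empty strings."""
    return [piece for piece in QUOTE_AWARE_TAG.split(html) if piece]


# The example from this ticket now keeps its opening tag intact.
print(quote_aware_split('<span data-content="<p>abcd</p>">loading...</span>'))

# A Tailwind-style child selector in a class attribute (illustrative value).
print(quote_aware_split('<div class="[&>*]:p-4">x</div>'))
```

Because the three alternatives inside the tag start with different characters, the pattern does not backtrack between them, which keeps matching linear on well-formed input.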