#63694 (HTML Processing Improvements in 6.9) – WordPress Trac

This ticket was mentioned in PR #9248 on WordPress/wordpress-develop by @dmsnell.

9 months ago #1

Keywords has-patch added

Trac ticket: Core-63694

wp_kses_hair() is built around an impressive state machine for parsing the $attr of an HTML tag, that is, the span of text after the tag name and before the closing >. Unfortunately, that parsing code doesn’t fully implement the HTML specification and may be prone to mis-parsing.

This patch replaces the existing state machine with a straightforward use of the HTML API to parse the attributes for us, constructing a shell tag for the $attr string and reading the attributes structurally. This shell is necessary because a previous stage of the pipeline has already separated what it thinks is the so-called “attribute list” from a tag.

This ticket was mentioned in PR #9264 on WordPress/wordpress-develop by @dmsnell.

9 months ago #2

Keywords has-unit-tests added

Trac ticket: Core-63694

Prep work for #9248.

This ticket was mentioned in PR #9259 on WordPress/wordpress-develop by @dmsnell.

9 months ago #3

Trac ticket: Core-63694

Prep work for #9248.

This ticket was mentioned in PR #9258 on WordPress/wordpress-develop by @dmsnell.

9 months ago #4

Trac ticket: Core-63694

Prep work for #9248.

This ticket was mentioned in PR #9257 on WordPress/wordpress-develop by @dmsnell.

9 months ago #5

Trac ticket: Core-63694

Prep work for #9248

This ticket was mentioned in PR #9255 on WordPress/wordpress-develop by @dmsnell.

9 months ago #6

Trac ticket: Core-63694

Prep work for #9252.
Part of #9248.

This ticket was mentioned in PR #9270 on WordPress/wordpress-develop by @dmsnell.

9 months ago #7

Trac ticket: Core-63694

This probably improves the performance in terms of both CPU time and memory compared to the old PCRE-based approach.

This ticket was mentioned in PR #9271 on WordPress/wordpress-develop by @dmsnell.

8 months ago #8

Trac ticket: Core-63694

This ticket was mentioned in PR #9272 on WordPress/wordpress-develop by @dmsnell.

8 months ago #9

Trac ticket: Core-63694

This also decodes the URL whereas the previous code didn’t, so a URL written with HTML character references (for example, with the colon in http:// encoded) will be properly decoded as http://.

This ticket was mentioned in Slack in #core by benjamin_zekavica.

8 months ago

#11 @dmsnell
8 months ago

In 60485:

HTML API: Use assertEqualHTML() in wp_rel_ugc() tests.

In some tests, the expected output was updated to its pure HTML state, removing the wrapping call to wp_slash(). Instead, stripslashes() has been applied to the output of the code under test. This leaves more readable test failures.

Developed in https://github.com/WordPress/wordpress-develop/pull/9255
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694.

@dmsnell commented on PR #9255:

8 months ago #12

Merged in https://github.com/WordPress/wordpress-develop/commit/6f60f80be0941091b4f9b9e6d2e344797872703d

#13 @dmsnell
8 months ago

In 60486:

HTML API: Use assertEqualHTML() in wp_kses() tests.

Developed in https://github.com/WordPress/wordpress-develop/pull/9257
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694.

@dmsnell commented on PR #9257:

8 months ago #14

Merged in https://github.com/WordPress/wordpress-develop/commit/737c2a2685b099a8a309848bfaff618849dbbe9a

#15 @dmsnell
8 months ago

In 60487:

HTML API: Use assertEqualHTML() in post filtering tests.

Developed in https://github.com/WordPress/wordpress-develop/pull/9258
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694.

@dmsnell commented on PR #9258:

8 months ago #16

Merged in https://github.com/WordPress/wordpress-develop/commit/f99471e5310f1c68d8486d79fe117ecf99f6a696

#17 @jonsurrell
8 months ago

GB50050 may be an interesting candidate for improvement. It seems related to these efforts.

@jonsurrell commented on PR #9270:

8 months ago #18

I believe this would fix https://core.trac.wordpress.org/ticket/45387.

#19 @nerrad
7 months ago

Owner set to nerrad
Resolution set to fixed
Status changed from new to closed

In 60665:

HTML API: Reliably parse HTML in get_url_in_content()

As part of a larger effort in #63694, this utilizes WP_HTML_Tag_Processor instead of regex to parse the string passed into get_url_in_content().

As a benefit this also decodes the URL whereas the previous code didn’t, so a URL written with HTML character references (for example, with the colon in http:// encoded) will be properly decoded as http://.

Developed in: https://github.com/WordPress/wordpress-develop/pull/9272
Discussed in: https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell, nerrad.
Fixes #63694.

@dmsnell commented on PR #9272:

7 months ago #20

Merged in [60665]

#21 @TobiasBg
7 months ago

Resolution fixed deleted
Status changed from closed to reopened

As there's an is_string() check already, the ! empty( $href ) should be changed to '' !== $href.

#22 @TobiasBg
7 months ago

Pinging @nerrad to make sure this doesn't get lost. Thanks!

This ticket was mentioned in PR #9809 on WordPress/wordpress-develop by @TobiasBg.

7 months ago #23

As there's an is_string() check already, the ! empty( $href ) can be simplified to a string comparison, as other variable types that are checked in empty() won't appear.

empty() also treats the string "0" as empty, so ! empty( $href ) is false for it; "0" would however be a valid (relative) URL and thus should be detectable by the function.
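
The difference between the two conditions can be checked in plain PHP. This is a minimal sketch and not the code in get_url_in_content(); the two helper names are hypothetical and exist only for illustration.

```php
<?php
// Hypothetical helpers for illustration only; neither exists in WordPress.

/** Mirrors the earlier condition: is_string() combined with ! empty(). */
function href_passes_empty_check( $href ): bool {
	return is_string( $href ) && ! empty( $href );
}

/** Mirrors the proposed condition: is_string() combined with a strict comparison. */
function href_passes_strict_check( $href ): bool {
	return is_string( $href ) && '' !== $href;
}

// Compare both checks on a full URL, the relative URL "0", an empty string, and a missing attribute.
foreach ( array( 'https://example.com/', '0', '', null ) as $href ) {
	printf(
		"%s => empty-check: %s, strict-check: %s\n",
		var_export( $href, true ),
		var_export( href_passes_empty_check( $href ), true ),
		var_export( href_passes_strict_check( $href ), true )
	);
}
```

Only the string "0" is treated differently: the empty() based check rejects it, while the strict comparison accepts it.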

Trac ticket: https://core.trac.wordpress.org/ticket/63694#comment:21

#24 @SergeyBiryukov
7 months ago

Resolution set to fixed
Status changed from reopened to closed

In 60726:

Coding Standards: Simplify a conditional in get_url_in_content().

As there's an is_string() check already, ! empty( $href ) can be simplified to a string comparison, as the other variable types that are checked in empty() won't appear.

empty() also treats the string "0" as empty, so ! empty( $href ) is false for it; "0" would however be a valid (relative) URL and thus should be detectable by the function.

Follow-up to [60665].

Props TobiasBg.
Fixes #63694.

@dmsnell commented on PR #9271:

7 months ago #25

Rude, @github-actions. Learn some manners.

#26 @dmsnell
7 months ago

Resolution fixed deleted
Status changed from closed to reopened

@dmsnell commented on PR #9270:

7 months ago #27

@github-actions why don’t I come in and mess with all of your work unsolicited, huh?

@dmsnell commented on PR #9259:

7 months ago #28

@github-actions: You are older than two weeks, therefore you have been deemed obsolete. Closing your bot.

@dmsnell commented on PR #9264:

7 months ago #29

If I had a dime, @github-actions, for every time you barged in and interrupted my work, I wouldn’t need to work.

Oh if only you gave some indication of where to go to restrain your over-zealous ideology, forcing your mindset arbitrarily on those around you, oh what a bug report or patch I would love to provide. But no, you are faceless, left only to deny and reject and delete. You are @github-actions-I-will-destroy bot, born to raze and raised to burn.

#30 @dmsnell
7 months ago

#63810 was marked as a duplicate.

This ticket was mentioned in PR #9850 on WordPress/wordpress-develop by @dmsnell.

7 months ago #31

Trac ticket: Core-63694
See: #9270

This ticket was mentioned in PR #9851 on WordPress/wordpress-develop by @dmsnell.

7 months ago #32

Trac ticket: Core-63694
Replaces #6651
See: #9270, #9850

This ticket was mentioned in Slack in #core by welcher.

6 months ago

This ticket was mentioned in PR #10043 on WordPress/wordpress-develop by @dmsnell.

6 months ago #34

Trac ticket: Core-63694.

This patch introduces a new CSS helper module containing a new function, wp_split_class_names(). This function wraps some code to rely on the HTML API to take an HTML class attribute value and return a Generator to iterate over the classes in that value.

Many existing functions perform ad-hoc parsing of CSS class names, usually by splitting on a space character. However, there are issues with this approach:

There is no decoding of HTML character references, which is normative inside HTML attributes.
There is no handling of null bytes.
Class names can be split by more than just the space character.
There is no handling of duplicates, and while mostly benign, code forgetting to account for duplicates can lead to defects.

The new function handles the nuances to let developers focus on reading CSS class names, adding new class names, and removing class names. This serves as a middle ground between legacy code interacting with CSS class names in isolation and code processing full HTML documents.
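
The four issues listed above can be illustrated with a standalone sketch in plain PHP. It is not the implementation from this patch, which relies on the HTML API and returns a Generator; sketch_split_class_names() is a hypothetical name, and html_entity_decode() only approximates HTML character reference decoding in edge cases.

```php
<?php
/**
 * Hypothetical illustration only, not the function proposed in the patch.
 *
 * Takes the raw (still HTML-encoded) value of a class attribute and returns
 * the unique class names in order of first appearance.
 */
function sketch_split_class_names( string $raw_attribute_value ): array {
	// 1. Decode character references, e.g. "&#45;" becomes "-".
	$decoded = html_entity_decode( $raw_attribute_value, ENT_QUOTES | ENT_HTML5, 'UTF-8' );

	// 2. Replace null bytes with U+FFFD, the replacement character.
	$decoded = str_replace( "\x00", "\u{FFFD}", $decoded );

	// 3. Split on any run of HTML ASCII whitespace, not only the space character.
	$names = preg_split( '/[ \t\n\f\r]+/', $decoded, -1, PREG_SPLIT_NO_EMPTY );

	// 4. Drop duplicates while keeping the first occurrence of each name.
	return array_values( array_unique( $names ) );
}

print_r( sketch_split_class_names( "wide\talign&#45;left  wide\nhero" ) );
```

Splitting the same input on a single space character would instead leave the tab and newline inside the names, keep the undecoded reference, and produce an empty entry for the doubled space.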

@westonruter commented on PR #10043:

6 months ago #35

The name isn’t great.

What about wp_parse_css_class_names()? I think this would be more clear. Mentioning “css” makes it clear you're not talking about PHP class names somehow. And “parse” implies it's not as simple as just splitting on whitespace tokens.

@westonruter commented on PR #10043:

6 months ago #36

Should it be more useful to people wanting to conditionally add class names? Something more akin to classnames() in JS? We could pass varargs which are string|false or an array of additional class names to add.

Seems cool, but do we have any use cases for this in core PHP? It would be nice to include some example implementations in the core codebase for this function to actually leverage it.

@dmsnell commented on PR #10043:

6 months ago #37

What about wp_parse_css_class_names()?

I like this, though I still like split since it communicates the intent. parse here feels like it communicates more than it performs. I am changing it to wp_split_css_class_list() — maybe something like wp_explode_css_class_names() would also work, at the cost of getting long.

Would love to continue stewing on the name. Overly-short, overly-long, it’s hard to find one that’s just right.

@dmsnell commented on PR #10043:

6 months ago #38

I’ve turned this into a static method on the Tag Processor, but I instantly don’t like it because it lost the nuance of decoding HTML character references.

This is a conundrum, however, because existing code mixes decoded and non-decoded class names. For example, code will read the class attribute on an HTML string, but then add new raw class names to a list. While it’s unlikely that someone adds a class whose name _should_ be &, if they do so, there’s a discrepancy between the existing classes and this new one — what should be escaped or unescaped?

---

I may revert the last commit. While it’s helpful that this function properly splits and deduplicates the class names, decoding the HTML character references was an important piece as well, and I think that’s a bit harder to merge into the Tag Processor’s interface.

@dmsnell commented on PR #10043:

6 months ago #39

@westonruter I tossed out some refactors in #10215. They highlight two things to me:

there needs to be more clarity around whether the inputs are HTML escaped or not.
the functions should return an array and not an iterator.

It also leads me to feel like having a new separate function is best and exporting the internals of the HTML API is a mistake. Perhaps there is room for two new functions:

wp_parse_html_class_attribute()
wp_split_decoded_class_list()

Something like this to more clearly communicate whether things like null bytes and character references shall be transformed or whether it’s assumed that the class names are the “raw” and unescaped class names built within source code.

This ticket was mentioned in PR #10218 on WordPress/wordpress-develop by @dmsnell.

6 months ago #40

Trac ticket: Core-63694.
See wordpress/gutenberg#72264.

For classic themes, image blocks need to create a DIV wrapper which contains alignment classes from the inner FIGURE. This has been processed using PCRE matching.

With this change the HTML API is used instead of PCRE functions to provide more semantic transformation, clearer intent, and eliminate possible parsing issues.

@dmsnell commented on PR #10218:

5 months ago #41

@tellthemachines I updated this patch; it still had the wrong negation in it that you found on the Gutenberg side. Here is the diff of the diffs between this patch and the one applied in Gutenberg.

--- /var/folders/lv/12zyh9p565q7mmycrw6zqkvw0000gn/T//.psub.C4OXyR	2025-10-17 15:55:08
+++ /var/folders/lv/12zyh9p565q7mmycrw6zqkvw0000gn/T//.psub.zmgqZj	2025-10-17 15:55:09
@@ -1,11 +1,19 @@
-diff --git a/src/wp-includes/block-supports/layout.php b/src/wp-includes/block-supports/layout.php
-index 454eea3c80..63eb384e77 100644
---- a/src/wp-includes/block-supports/layout.php
-+++ b/src/wp-includes/block-supports/layout.php
-@@ -1074,50 +1074,53 @@ add_filter( 'render_block_core/group', 'wp_restore_group_inner_container', 10, 2
+diff --git a/lib/block-supports/layout.php b/lib/block-supports/layout.php
+index bc6da575724..667c7b5c614 100644
+--- a/lib/block-supports/layout.php
++++ b/lib/block-supports/layout.php
+@@ -1113,7 +1113,6 @@ if ( function_exists( 'wp_restore_group_inner_container' ) ) {
+ }
+ add_filter( 'render_block_core/group', 'gutenberg_restore_group_inner_container', 10, 2 );
+
+-
+ /**
+  * For themes without theme.json file, make sure
+  * to restore the outer div for the aligned image block
+@@ -1124,50 +1123,53 @@ add_filter( 'render_block_core/group', 'gutenberg_restore_group_inner_container'
   * @return string Filtered block content.
   */
- function wp_restore_image_outer_container( $block_content, $block ) {
+ function gutenberg_restore_image_outer_container( $block_content, $block ) {
 -	$image_with_align = "
 -/# 1) everything up to the class attribute contents
 -(
@@ -90,4 +98,4 @@
 +	return "{$wrapper_processor->get_updated_html()}{$figure_processor->get_updated_html()}</div>";
  }

- add_filter( 'render_block_core/image', 'wp_restore_image_outer_container', 10, 2 );
+ if ( function_exists( 'wp_restore_image_outer_container' ) ) {

I’m going to merge this, based on a high confidence that the changes are identical now. But we might want to confirm during the Beta phase that I did this right 😄

#42 @dmsnell
5 months ago

In 60968:

HTML API: Backport from Gutenberg of layout image container refactor.

For classic themes, image blocks need to create a DIV wrapper which contains alignment classes from the inner FIGURE. This has been processed using PCRE matching.

With this change the HTML API is used instead of PCRE functions to provide more semantic transformation, clearer intent, and to eliminate possible parsing issues.

Developed in https://github.com/WordPress/wordpress-develop/pull/10218
Discussed in https://core.trac.wordpress.org/ticket/63694

Gutenberg patch in https://github.com/WordPress/gutenberg/pull/72264

Props dmsnell, isabel_brison.
See #63694.

@dmsnell commented on PR #10218:

5 months ago #43

Merged in https://github.com/wordpress/wordpress-develop/commit/e37c4b5b7f1e68675806fbc414f8ab285c99962e
[60968]

#44 @dmsnell
5 months ago

In 60971:

HTML API: Rely on assertEqualHTML in media tests.

As part of ongoing work to improve the reliability of HTML parsing code in WordPress, this patch replaces strict string-equality tests with semantic tests using assertEqualHTML() to more directly assert intended behaviors.

Developed in https://github.com/WordPress/wordpress-develop/pull/9264
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694

@dmsnell commented on PR #9264:

5 months ago #45

Merged in https://github.com/wordpress/wordpress-develop/commit/86546fd68fe20c2b93f2193428f13c2a6627726f
[60971]

#46 @dmsnell
5 months ago

In 60972:

HTML API: Rely on assertEqualHTML in oEmbed filtering tests.

As part of ongoing work to improve the reliability of HTML parsing code in WordPress, this patch replaces the use of PCRE matches in oEmbed filtering tests with semantic assertions via the HTML API and assertEqualHTML().

Developed in https://github.com/WordPress/wordpress-develop/pull/9259
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694

@dmsnell commented on PR #9259:

5 months ago #47

Merged in https://github.com/wordpress/wordpress-develop/commit/b84fc32f9b30149db715560119f37795d508dc27
[60972]

#48 @dmsnell
5 months ago

In 60974:

HTML API: Rely on assertEqualHTML in wp_rel_nofollow() tests.

As part of ongoing work to improve the reliability of HTML parsing code in WordPress, this patch replaces strict string-equality checks with semantic checks via assertEqualHTML() for more-direct assertions.

Developed in https://github.com/WordPress/wordpress-develop/pull/9251
Discussed in https://core.trac.wordpress.org/ticket/63694

Props dmsnell, jonsurrell.
See #63694

#49 @wildworks
5 months ago

The 6.9 Beta1 release is coming soon and I would like to know the status of this ticket. Should we close this ticket?

#50 @wildworks
5 months ago

Resolution set to fixed
Status changed from reopened to closed

As the Beta1 release begins, I will close this ticket. If there are any other issues that need to be addressed, please leave a comment.

This ticket was mentioned in Slack in #core by wildworks.

5 months ago

This ticket was mentioned in Slack in #core by desrosj.

5 months ago

@jonsurrell commented on PR #9248:

3 months ago #53

This function had quirks that change with this PR and I want to understand them.

I created a test suite for wp_kses_hair(), then I merged this branch and updated to get a diff of test changes. I also looked at several of the most popular results from WP Directory to understand usage.

My review of the most common usages suggests that _this change is safe to make and would not negatively impact plugin authors_.

Historically the value and whole properties of the returned array indicate the raw parsed bytes from the HTML (with some exceptions). This means that HTML character references are not decoded. This represents an abstraction leak between the HTML and structural return value.
- Should this refactor leave the messy return values in place, or should it decode the attribute values to enforce the view of the world developers are imagining when calling it (that all values are normal PHP strings and not HTML text node strings)?

This is a tricky question. It doesn't _seem_ like folks rely on specifics of the input representation being present in the output, however it's certainly possible.

In one of the examples from plugins, esc_attr() is called on the attribute value to construct a new HTML string. This should be perfectly fine because the original HTML was re-encoded in this PR and esc_attr() will avoid double-encoding. They also statically wrap with ", which made the esc_attr() necessary because the attribute value could have contained "!

After some reflection, I believe the behavior you've implemented here _is a good decision_. Consider that the input is HTML and the output (value and whole) have always been some form of HTML. The difference here is a _normalization_ of the HTML in the output.
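
A rough way to picture this normalization in plain PHP is to decode a raw attribute value and then re-encode it inside double quotes. The sketch below only approximates the behavior with html_entity_decode() and htmlspecialchars(); the PR itself relies on the HTML API, and sketch_normalize_attribute() is a hypothetical name used for illustration.

```php
<?php
/**
 * Hypothetical illustration, not the code in the PR: decode the raw
 * attribute value, re-encode it, and wrap it in double quotes.
 */
function sketch_normalize_attribute( string $name, string $raw_value ): string {
	$flags = ENT_QUOTES | ENT_HTML5;

	// Numeric and named character references collapse to their plain characters.
	$decoded = html_entity_decode( $raw_value, $flags, 'UTF-8' );

	// Re-encoding escapes &, <, >, and both quote styles in one consistent form.
	$encoded = htmlspecialchars( $decoded, $flags, 'UTF-8' );

	return "{$name}=\"{$encoded}\"";
}

// Raw values taken from the test cases in the behavior diff.
echo sketch_normalize_attribute( 'title', '&#60;test&#62;' ), "\n";
echo sketch_normalize_attribute( 'title', "It's working" ), "\n";
echo sketch_normalize_attribute( 'title', '&invalid; &#; &#x;' ), "\n";
```

With these inputs the sketch should produce the same whole strings that the updated tests expect, including the escaped ampersands for the invalid references, although it is not guaranteed to agree with the HTML API on every edge case.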

---

Behavior diff:

tests/phpunit/tests/kses/wpKsesHair.php

diff --git a/tests/phpunit/tests/kses/wpKsesHair.php b/tests/phpunit/tests/kses/wpKsesHair.php
index 2ed83679f2e3d..05d573bc070bc 100644

  public function data_attribute_parsing() {
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => 'My Title',
                                         'whole' => "title='My Title'",
+                                        'whole' => 'title="My Title"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => '&#60;test&#62;',
                                         'whole' => 'title="&#60;test&#62;"',
+                                        'value' => '&lt;test&gt;',
+                                        'whole' => 'title="&lt;test&gt;"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => '&#x3C;hex&#x3E;',
                                         'whole' => 'title="&#x3C;hex&#x3E;"',
+                                        'value' => '&lt;hex&gt;',
+                                        'whole' => 'title="&lt;hex&gt;"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => '&#X3C;HEX&#X3E;',
                                         'whole' => 'title="&#X3C;HEX&#X3E;"',
+                                        'value' => '&lt;HEX&gt;',
+                                        'whole' => 'title="&lt;HEX&gt;"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => '&invalid; &#; &#x;',
                                         'whole' => 'title="&invalid; &#; &#x;"',
+                                        'value' => '&amp;invalid; &amp;#; &amp;#x;',
+                                        'whole' => 'title="&amp;invalid; &amp;#; &amp;#x;"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                                 'data-text' => array(
                                         'name'  => 'data-text',
                                         'value' => 'Single quoted value',
                                         'whole' => "data-text='Single quoted value'",
+                                        'whole' => 'data-text="Single quoted value"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                                 'alt'   => array(
                                         'name'  => 'alt',
                                         'value' => 'single',
                                         'whole' => "alt='single'",
+                                        'whole' => 'alt="single"',
                                         'vless' => 'n',
                                 ),
                                 'id'    => array(
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => "It's working",
                                         'whole' => 'title="It\'s working"',
+                                        'value' => 'It&apos;s working',
+                                        'whole' => 'title="It&apos;s working"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                         array(
                                 'title' => array(
                                         'name'  => 'title',
                                         'value' => 'He said "hello"',
                                         'whole' => 'title=\'He said "hello"\'',
+                                        'value' => 'He said &quot;hello&quot;',
+                                        'whole' => 'title="He said &quot;hello&quot;"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                 yield 'invalid attribute name starting with number' => array(
                         '1invalid="value"',
+                        array(),
+                        array(
+                                '1invalid' => array(
+                                        'name'  => '1invalid',
+                                        'value' => 'value',
+                                        'whole' => '1invalid="value"',
+                                        'vless' => 'n',
+                                ),
+                        ),
                 );
                 yield 'invalid attribute name special chars' => array(
                         '@invalid="value" $bad="value"',
+                        array(),
+                        array(
+                                '@invalid' => array(
+                                        'name'  => '@invalid',
+                                        'value' => 'value',
+                                        'whole' => '@invalid="value"',
+                                        'vless' => 'n',
+                                ),
+                                '$bad'     => array(
+                                        'name'  => '$bad',
+                                        'value' => 'value',
+                                        'whole' => '$bad="value"',
+                                        'vless' => 'n',
+                                ),
+                        ),
                 );
                 yield 'duplicate attributes first wins' => array(
-…
+ public function data_attribute_parsing() {
                 yield 'malformed unclosed double quote' => array(
                         'title="unclosed class="test"',
+                        array(),
+                        array(
+                                'title' => array(
+                                        'name'  => 'title',
+                                        'value' => 'unclosed class=',
+                                        'whole' => 'title="unclosed class="',
+                                        'vless' => 'n',
+                                ),
+                                'test"' => array(
+                                        'name'  => 'test"',
+                                        'value' => '',
+                                        'whole' => 'test"',
+                                        'vless' => 'y',
+                                ),
+                        ),
                 );
                 yield 'very long attribute value' => array(
-…
+ public function data_attribute_parsing() {
                                 'alt'   => array(
                                         'name'  => 'alt',
                                         'value' => '',
                                         'whole' => "alt=''",
+                                        'whole' => 'alt=""',
                                         'vless' => 'n',
                                 ),
                                 'class' => array(
-…
+ public function data_attribute_parsing() {
                 yield 'forward slashes between attributes' => array(
                         'att / att2=2 /// att3="3"',
                         array(
                                 'att'   => array(
+                                'att'  => array(
                                         'name'  => 'att',
                                         'value' => '',
                                         'whole' => 'att',
-…
+ public function data_attribute_parsing() {
                                 'att'  => array(
                                         'name'  => 'att',
                                         'value' => 'val',
                                         'whole' => "att='val'",
+                                        'whole' => 'att="val"',
                                         'vless' => 'n',
                                 ),
                                 'att2' => array(
                                         'name'  => 'att2',
                                         'value' => 'val2',
                                         'whole' => "att2='val2'",
+                                        'whole' => 'att2="val2"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                                 'att'  => array(
                                         'name'  => 'att',
                                         'value' => 'val',
                                         'whole' => "att='val'",
+                                        'whole' => 'att="val"',
                                         'vless' => 'n',
                                 ),
                                 'att2' => array(
                                         'name'  => 'att2',
                                         'value' => 'val2',
                                         'whole' => "att2='val2'",
+                                        'whole' => 'att2="val2"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                                 'att'  => array(
                                         'name'  => 'att',
                                         'value' => 'val',
                                         'whole' => "att='val'",
+                                        'whole' => 'att="val"',
                                         'vless' => 'n',
                                 ),
                                 'att2' => array(
                                         'name'  => 'att2',
                                         'value' => 'val2',
                                         'whole' => "att2='val2'",
+                                        'whole' => 'att2="val2"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                                 'att'  => array(
                                         'name'  => 'att',
                                         'value' => 'val',
                                         'whole' => "att='val'",
+                                        'whole' => 'att="val"',
                                         'vless' => 'n',
                                 ),
                                 'att2' => array(
                                         'name'  => 'att2',
                                         'value' => 'val2',
                                         'whole' => "att2='val2'",
+                                        'whole' => 'att2="val2"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_attribute_parsing() {
                 // Malformed Equals Patterns.
                 yield 'multiple equals signs' => array(
                         'att=="val"',
+                        array(),
+                        array(
+                                'att' => array(
+                                        'name'  => 'att',
+                                        'value' => '=&quot;val&quot;',
+                                        'whole' => 'att="=&quot;val&quot;"',
+                                        'vless' => 'n',
+                                ),
+                        ),
                 );
                 yield 'equals with strange spacing' => array(
                         'att= ="val"',
+                        array(),
+                        array(
+                                'att' => array(
+                                        'name'  => 'att',
+                                        'value' => '=&quot;val&quot;',
+                                        'whole' => 'att="=&quot;val&quot;"',
+                                        'vless' => 'n',
+                                ),
+                        ),
                 );
                 yield 'triple equals signs' => array(
                         'att==="val"',
+                        array(),
+                        array(
+                                'att' => array(
+                                        'name'  => 'att',
+                                        'value' => '==&quot;val&quot;',
+                                        'whole' => 'att="==&quot;val&quot;"',
+                                        'vless' => 'n',
+                                ),
+                        ),
                 );
                 yield 'equals echo pattern' => array(
                         "att==echo 'something'",
                         array(
-                                'att' => array(
+                                'att'         => array(
                                         'name'  => 'att',
                                         'value' => '=echo',
                                         'whole' => 'att="=echo"',
                                         'vless' => 'n',
                                 ),
+                                "'something'" => array(
+                                        'name'  => "'something'",
+                                        'value' => '',
+                                        'whole' => "'something'",
+                                        'vless' => 'y',
+                                ),
                         ),
                 );
                 yield 'attribute starting with equals' => array(
                         '= bool k=v',
                         array(
+                                '='    => array(
+                                        'name'  => '=',
+                                        'value' => '',
+                                        'whole' => '=',
+                                        'vless' => 'y',
+                                ),
                                 'bool' => array(
                                         'name'  => 'bool',
                                         'value' => '',
-…
+ public function data_attribute_parsing() {
                 yield 'mixed quotes and equals chaos' => array(
                         'k=v ="' . "' j=w",
                         array(
-                                'k' => array(
+                                'k'        => array(
                                         'name'  => 'k',
                                         'value' => 'v',
                                         'whole' => 'k="v"',
                                         'vless' => 'n',
                                 ),
+                                '="' . "'" => array(
+                                        'name'  => '="' . "'",
+                                        'value' => '',
+                                        'whole' => '="' . "'",
+                                        'vless' => 'y',
+                                ),
+                                'j'        => array(
+                                        'name'  => 'j',
+                                        'value' => 'w',
+                                        'whole' => 'j="w"',
+                                        'vless' => 'n',
+                                ),
                         ),
                 );
                 yield 'triple equals quoted whitespace' => array(
                         '==="  "',
+                        array(),
+                        array(
+                                '=' => array(
+                                        'name'  => '=',
+                                        'value' => '=&quot;',
+                                        'whole' => '=="=&quot;"',
+                                        'vless' => 'n',
+                                ),
+                                '"' => array(
+                                        'name'  => '"',
+                                        'value' => '',
+                                        'whole' => '"',
+                                        'vless' => 'y',
+                                ),
+                        ),
                 );
                 yield 'boolean with contradictory value' => array(
-…
+ public function data_attribute_parsing() {
                 yield 'empty attribute name with value' => array(
                         '="value" class="test"',
                         array(
+                                'class' => array(
+                                '="value"' => array(
+                                        'name'  => '="value"',
+                                        'value' => '',
+                                        'whole' => '="value"',
+                                        'vless' => 'y',
+                                ),
+                                'class'    => array(
                                         'name'  => 'class',
                                         'value' => 'test',
                                         'whole' => 'class="test"',
-…
+ public function data_protocol_filtering() {
                                 'href' => array(
                                         'name'  => 'href',
                                         'value' => 'alert(1)',
-                                        'whole' => "href='alert(1)'",
+                                        'whole' => 'href="alert(1)"',
                                         'vless' => 'n',
                                 ),
                         ),
-…
+ public function data_protocol_filtering() {
                         array(
                                 'src' => array(
                                         'name'  => 'src',
-                                        'value' => 'text/html,<script>alert(1)</script>',
-                                        'whole' => 'src="text/html,<script>alert(1)</script>"',
+                                        'value' => 'text/html,&lt;script&gt;alert(1)&lt;/script&gt;',
+                                        'whole' => 'src="text/html,&lt;script&gt;alert(1)&lt;/script&gt;"',
                                         'vless' => 'n',
                                 ),
                         ),

</details>

Here are two examples from the most popular plugins in the WP Directory search:

From YITH (this appears to be part of the yith library used in many of their plugins):

/**
 * Transform attributes array to HTML attributes string.
 * If using a string, the attributes will be escaped.
 * Prefer using arrays.
 *
 * @param array|string $attributes The attributes.
 * @param bool         $echo       Set to true to print it directly; false otherwise.
 *
 * @return string
 * @since 3.7.0
 * @since 3.8.0 Escaping attributes when using strings; allow value-less attributes by setting value to null.
 */
function yith_plugin_fw_html_attributes_to_string( $attributes = array(), $echo = false ) {
	$html_attributes = '';

	if ( ! ! $attributes ) {
		if ( is_string( $attributes ) ) {
			$parsed_attrs = wp_kses_hair( $attributes, wp_allowed_protocols() );
			$attributes   = array();
			foreach ( $parsed_attrs as $attr ) {
				$attributes[ $attr['name'] ] = 'n' === $attr['vless'] ? $attr['value'] : null;
			}
		}

		if ( is_array( $attributes ) ) {
			$html_attributes = array();
			foreach ( $attributes as $key => $value ) {
				if ( ! is_null( $value ) ) {
					$html_attributes[] = esc_attr( $key ) . '="' . esc_attr( $value ) . '"';
				} else {
					$html_attributes[] = esc_attr( $key );
				}
			}
			$html_attributes = implode( ' ', $html_attributes );
		}
	}

	if ( $echo ) {
		// Already escaped above.
		echo $html_attributes; // phpcs:ignore WordPress.Security.EscapeOutput.OutputNotEscaped
	}

	return $html_attributes;
}

And Jetpack:

$params = wp_kses_hair( $params, array( 'http' ) );

$width  = isset( $params['width'] ) ? (int) $params['width']['value'] : 0;
$height = isset( $params['height'] ) ? (int) $params['height']['value'] : 0;
$wh     = '';

if ( $width && $height ) {
	$wh = "&w=$width&h=$height";
}

$url = esc_url_raw( "https://www.youtube.com/watch?v={$match[3]}{$wh}" );
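
Both snippets depend only on the shape of the array that wp_kses_hair() returns: entries keyed by attribute name, each carrying 'name', 'value', 'whole', and 'vless'. The sketch below is a minimal illustration of those two access patterns. It does not call wp_kses_hair(); the $parsed array is a hypothetical, hand-written stand-in whose field layout follows the test expectations in the diff above, and the attribute names and values are invented for the example.

```php
<?php
// Hypothetical stand-in for a wp_kses_hair() result, keyed by attribute name.
// The four fields mirror the test expectations quoted in the diff; the
// attribute names and values are made up for illustration.
$parsed = array(
	'width'    => array(
		'name'  => 'width',
		'value' => '640',
		'whole' => 'width="640"',
		'vless' => 'n',
	),
	'disabled' => array(
		'name'  => 'disabled',
		'value' => '',
		'whole' => 'disabled',
		'vless' => 'y',
	),
);

// Jetpack-style access: look an attribute up by name and read its value.
$width = isset( $parsed['width'] ) ? (int) $parsed['width']['value'] : 0;

// YITH-style access: flatten to name => value, with null for value-less attributes.
$attributes = array();
foreach ( $parsed as $attr ) {
	$attributes[ $attr['name'] ] = 'n' === $attr['vless'] ? $attr['value'] : null;
}

var_dump( $width, $attributes );
```

Neither pattern reads the 'whole' field, so under these assumptions the switch from single-quoted to double-quoted 'whole' strings shown in the updated tests would not affect callers like these.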

Reported by:	dmsnell	Owned by:	nerrad
Milestone:	6.9	Priority:	normal
Severity:	normal	Version:	6.9
Component:	HTML API	Keywords:	has-patch has-unit-tests
Focuses:		Cc:


#63694 closed enhancement (fixed)

HTML Processing Improvements in 6.9

Description

Change History (53)

This ticket was mentioned in PR #9248 on WordPress/wordpress-develop by @dmsnell. 9 months ago #1
This ticket was mentioned in PR #9264 on WordPress/wordpress-develop by @dmsnell. 9 months ago #2
This ticket was mentioned in PR #9259 on WordPress/wordpress-develop by @dmsnell. 9 months ago #3
This ticket was mentioned in PR #9258 on WordPress/wordpress-develop by @dmsnell. 9 months ago #4
This ticket was mentioned in PR #9257 on WordPress/wordpress-develop by @dmsnell. 9 months ago #5
This ticket was mentioned in PR #9255 on WordPress/wordpress-develop by @dmsnell. 9 months ago #6
This ticket was mentioned in PR #9270 on WordPress/wordpress-develop by @dmsnell. 9 months ago #7
This ticket was mentioned in PR #9271 on WordPress/wordpress-develop by @dmsnell. 8 months ago #8
This ticket was mentioned in PR #9272 on WordPress/wordpress-develop by @dmsnell. 8 months ago #9
This ticket was mentioned in Slack in #core by benjamin_zekavica. View the logs. 8 months ago
#11 @dmsnell 8 months ago
@dmsnell commented on PR #9255: 8 months ago #12
#13 @dmsnell 8 months ago
@dmsnell commented on PR #9257: 8 months ago #14
#15 @dmsnell 8 months ago
@dmsnell commented on PR #9258: 8 months ago #16
#17 @jonsurrell 8 months ago
@jonsurrell commented on PR #9270: 8 months ago #18
#19 @nerrad 7 months ago
@dmsnell commented on PR #9272: 7 months ago #20
#21 @TobiasBg 7 months ago
#22 @TobiasBg 7 months ago
This ticket was mentioned in PR #9809 on WordPress/wordpress-develop by @TobiasBg. 7 months ago #23
#24 @SergeyBiryukov 7 months ago
@dmsnell commented on PR #9271: 7 months ago #25
#26 @dmsnell 7 months ago
@dmsnell commented on PR #9270: 7 months ago #27
@dmsnell commented on PR #9259: 7 months ago #28
@dmsnell commented on PR #9264: 7 months ago #29
#30 @dmsnell 7 months ago
This ticket was mentioned in PR #9850 on WordPress/wordpress-develop by @dmsnell. 7 months ago #31
This ticket was mentioned in PR #9851 on WordPress/wordpress-develop by @dmsnell. 7 months ago #32
This ticket was mentioned in Slack in #core by welcher. View the logs. 6 months ago
This ticket was mentioned in PR #10043 on WordPress/wordpress-develop by @dmsnell. 6 months ago #34
@westonruter commented on PR #10043: 6 months ago #35
@westonruter commented on PR #10043: 6 months ago #36
@dmsnell commented on PR #10043: 6 months ago #37
@dmsnell commented on PR #10043: 6 months ago #38
@dmsnell commented on PR #10043: 6 months ago #39
This ticket was mentioned in PR #10218 on WordPress/wordpress-develop by @dmsnell. 6 months ago #40
@dmsnell commented on PR #10218: 5 months ago #41
#42 @dmsnell 5 months ago
@dmsnell commented on PR #10218: 5 months ago #43
#44 @dmsnell 5 months ago
@dmsnell commented on PR #9264: 5 months ago #45
#46 @dmsnell 5 months ago
@dmsnell commented on PR #9259: 5 months ago #47
#48 @dmsnell 5 months ago
#49 @wildworks 5 months ago
#50 @wildworks 5 months ago
This ticket was mentioned in Slack in #core by wildworks. View the logs. 5 months ago
This ticket was mentioned in Slack in #core by desrosj. View the logs. 5 months ago
@jonsurrell commented on PR #9248: 3 months ago #53

tests/phpunit/tests/kses/wpKsesHair.php
