#64340 closed defect (bug) (fixed)
HTML API may double-escape class names when adding repeatedly
| Reported by: |
|
Owned by: |
|
|---|---|---|---|
| Milestone: | 6.9.1 | Priority: | normal |
| Severity: | normal | Version: | 6.9 |
| Component: | HTML API | Keywords: | has-patch has-unit-tests dev-reviewed |
| Focuses: | Cc: |
Description (last modified by )
WP_HTML_Tag_Processor and WP_HTML_Processor may incorrectly encode class names containing the characters &, <, >, ", or ' when modifying them via class methods like ::add_class() and calling ::get_updated_html().
For example:
<?php $p = new WP_HTML_Tag_Processor('<div></div>'); $p->next_tag(); $p->add_class('&'); echo $p->get_updated_html() . "\n"; $p->add_class('OK'); echo $p->get_updated_html() . "\n";
Will print:
<div class="&"></div> <div class="&amp; OK"></div>
Notice that the first pass is correct, & has been correctly encoded in the class attribute as &. However, after calling ::add_class() and ::get_updated_html() again, the & has incorrectly been double-encoded as &amp;.
The same code in WordPress 6.8 would print:
<div class="&"></div> <div class="& OK"></div>
This is related to [60919] that was released in WordPress 6.9. The double-encoding behavior was present before, but it was "corrected" in this case by the use of esc_attr() that avoids any double-encoding. When esc_attr() usage was removed in [60919], the double-escaping behavior manifests causing this issue.
This was originally reported by GitHub user ktmn in Gutenberg issue 73713.
Change History (10)
This ticket was mentioned in PR #10591 on WordPress/wordpress-develop by @jonsurrell.
7 weeks ago
#1
- Keywords has-patch has-unit-tests added
@jonsurrell commented on PR #10591:
7 weeks ago
#3
@ktmn this is a proposed fix for https://github.com/WordPress/gutenberg/issues/73713.
#5
@
7 weeks ago
- Keywords dev-feedback added
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening for backport of [61346] to the 6.9 branch for inclusion in 6.9.1.
#7
@
7 weeks ago
There's another report - https://github.com/WordPress/gutenberg/issues/73733. Agreed that this is a good candidate for the minor release.
ktmn commented on PR #10591:
7 weeks ago
#9
The fix works well for me.
Couple things I tested (working as intended I think):
$html = '<div class="& &"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
var_dump($p->has_class('&')); // true
var_dump($p->has_class('&')); // false
var_dump($p->has_class('&amp;')); // false
$p->add_class('test');
$html = $p->get_updated_html();
echo $html; // <div class="& test"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div ></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->remove_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="&"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&');
$html = $p->get_updated_html();
echo $html; // <div class="& &amp;"></div>
$html = '<div class="&"></div>';
$p = new WP_HTML_Tag_Processor($html);
$p->next_tag();
$p->add_class('&amp;');
$html = $p->get_updated_html();
echo $html; // <div class="& &amp;amp;"></div>
Any & is always converted to &, for CSS selectors both work OK:
echo <<<HTML
<style>.\[\&\]\:text-red { color: red; }</style>
<div class="[&]:text-red">Text is red</div>
<div class="[&]:text-red">Text is still red</div>
HTML;
(not encoding the chars when not needed would look prettier (and shorter) though, if it's possible to do it safely)
While the class names here get encoded, CSS stays working:
echo '<style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style>';
$html = <<<HTML
<div class="[&_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
<div class="[&_.class\_\_name]:font-bold"><p class="class__name">Text</p></div>
HTML;
$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag()) {
$p->add_class('test');
}
$html = $p->get_updated_html();
echo $html;
// <div class="[&_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>
// <div class="[&_.class\_\_name]:font-bold test"><p class="class__name test">Text</p></div>
echo <<<HTML
<style>
.\[\&\:\:before\]\:content-\[\'\\2193\'\]:before,
.\[\&\:\:before\]\:content-\[\"\\2193\"\]:before,
.\[\&\:\:before\]\:content-\[\'↓\'\]:before,
.\[\&\:\:before\]\:content-\[\"↓\"\]:before {
content: "↓";
}
</style>
HTML;
$html = <<<HTML
<p class="[&::before]:content-['↓']">Test</p>
<p class="[&::before]:content-["↓"]">Test</p>
<p class="[&::before]:content-['↓']">Test</p>
<p class="[&::before]:content-["↓"]">Test</p>
<p class='[&::before]:content-["↓"]'>Test</p>
<p class="[&::before]:content-['\\2193']">Test</p>
<p class="[&::before]:content-["\\2193"]">Test</p>
<p class="[&::before]:content-['\\2193']">Test</p>
<p class="[&::before]:content-["\\2193"]">Test</p>
HTML;
$p = new WP_HTML_Tag_Processor($html);
while($p->next_tag('p')) {
$p->add_class('test');
}
$html = $p->get_updated_html();
echo $html;
// <p class="[&::before]:content-['↓'] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-['↓'] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-["↓"] test">Test</p>
// <p class="[&::before]:content-['\2193'] test">Test</p>
// <p class="[&::before]:content-["\2193"] test">Test</p>
// <p class="[&::before]:content-['\2193'] test">Test</p>
// <p class="[&::before]:content-["\2193"] test">Test</p>
Encoding stayed stable after running processor output through the processor again.
@jonsurrell commented on PR #10591:
7 weeks ago
#10
Awesome testing, thanks for following up @ktmn. I agree, the HTML processing appears to be correct in the examples you shared.
---
Any
&is always converted to&, for CSS selectors both work OK
This is correct whenever the class attribute is modified. There are a few cases where the class attribute did not require modification and the tag processor leaves the plain & class attribute value as-is. We can see it in the examples that add a class that is already present or remove a class that is not present.
---
(not encoding the chars when not needed would look prettier (and shorter) though, if it's possible to do it safely)
This is technically possible and allowed, but generally not worth the extra complexity. & _will decode as_ &, but & may be interpreted in different ways depending on what follows, so & is just less error-prone. It's possible to encode fewer things, but encoding all of &<>"' aligns with what WordPress generally does and is safe.
---
The last two (more complex) cases don't render exactly the same for me. I think it's more related to how the style is being printed and may be related to quoting, escaping, PHP versions, and possibly even GitHub's code fence escaping. It's hard to say.
Here are the working style elements for me the do have the correct behavior with the example output from the tag processor:
<style>.\[\&_\.class\\_\\_name\]\:font-bold .class__name { font-weight: bold; }</style> <style> .\[\&\:\:before\]\:content-\[\'\\2193\'\]:before, .\[\&\:\:before\]\:content-\[\"\\2193\"\]:before, .\[\&\:\:before\]\:content-\[\'↓\'\]:before, .\[\&\:\:before\]\:content-\[\"↓\"\]:before { content: "↓"; } </style>
Here's the example in the Live DOM Viewer. If you don't know it, it's a very handy tool for inspecting HTML processing.
Trac ticket: