Make WordPress Core

Opened 5 months ago

Closed 4 months ago

#63630 closed defect (bug) (fixed)

Encoded HTML entities are decoded for users without unfiltered_html

Reported by: jonsurrell
Owned by: jonsurrell
Milestone: 6.9
Priority: normal
Severity: normal
Version: 2.0
Component: General
Keywords: has-patch has-unit-tests dev-feedback 2nd-opinion
Focuses:
Cc:

Description (last modified by jonsurrell)

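When a user without the unfiltered_html capability authors a post containing text that looks like a numeric (decimal or hex) HTML character reference, that literal text is not preserved. On save, it is replaced with the character reference itself, so the browser renders the referenced character instead of the intended text.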

For example, a user authors a post with the text &#39; (or, equivalently, the following HTML) in the block editor:

<!-- wp:paragraph -->
<p>&amp;#39;</p>
<!-- /wp:paragraph -->

The user's intent is to write the literal text &#39;, which is correctly encoded in HTML as &amp;#39;.

However, when the post is saved, the content is transformed so that the intended escaping is removed. This leaves a bare numeric character reference in the HTML, and the browser renders the corresponding character (' in this example).

Querying the post's post_content field confirms this:

<!-- wp:paragraph -->
<p>&#039;</p>
<!-- /wp:paragraph -->

When the post is published, the text ' is displayed instead of the expected &#39;.
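The corruption can be reproduced with a simplified Python model of the pre-fix normalization. The model assumes wp_kses_normalize_entities first disarms every & as &amp; and then restores allowed named, decimal, and hex references in that order. The allowlist, the code-point check, and all function names below are hypothetical simplifications, not the PHP in kses.php.

```python
import re

# Hypothetical, tiny allowlist; the real kses list of entity names is much longer.
ALLOWED_NAMES = {"amp", "lt", "gt", "quot", "nbsp"}

NAMED = re.compile(r"&amp;([A-Za-z]{2,8}[0-9]{0,2});")
DECIMAL = re.compile(r"&amp;#(0*[0-9]{1,7});")
HEX = re.compile(r"&amp;#[Xx](0*[0-9A-Fa-f]{1,6});")


def valid_code_point(n: int) -> bool:
    # Simplified validity check based on the XML Char ranges (an assumption).
    return (n in (0x9, 0xA, 0xD) or 0x20 <= n <= 0xD7FF
            or 0xE000 <= n <= 0xFFFD or 0x10000 <= n <= 0x10FFFF)


def restore_named(m: re.Match) -> str:
    name = m.group(1)
    return f"&{name};" if name in ALLOWED_NAMES else f"&amp;{name};"


def restore_decimal(m: re.Match) -> str:
    digits = m.group(1)
    if not valid_code_point(int(digits)):
        return f"&amp;#{digits};"  # invalid references stay escaped
    # Strip leading zeros, then pad to three digits, matching the &#039; seen in post_content.
    return "&#" + digits.lstrip("0").rjust(3, "0") + ";"


def restore_hex(m: re.Match) -> str:
    hexdigits = m.group(1)
    if not valid_code_point(int(hexdigits, 16)):
        return f"&amp;#x{hexdigits};"
    return "&#x" + hexdigits.lstrip("0") + ";"


def normalize_pre_fix(text: str) -> str:
    text = text.replace("&", "&amp;")          # 1. disarm every ampersand
    text = NAMED.sub(restore_named, text)      # 2. restore named references
    text = DECIMAL.sub(restore_decimal, text)  # 3. restore decimal references
    text = HEX.sub(restore_hex, text)          # 4. restore hex references
    return text


print(normalize_pre_fix("<p>&amp;#39;</p>"))   # <p>&#039;</p>
print(normalize_pre_fix("<p>&amp;#x27;</p>"))  # <p>&#x27;</p>
```

In this model, the named-entity pass turns the disarmed &amp;amp; back into &amp;. The decimal (or hex) pass then runs, treats the resulting &amp;#39; as a disarmed reference, and "restores" it to &#039;.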

Change History (15)

#1 @jonsurrell
5 months ago

  • Description modified (diff)

#2 @jonsurrell
5 months ago

It's helpful to compare this with a case where the platform behaves as expected.

If a user without the unfiltered_html capability creates a post with the text content &amp;, the block editor will produce the following post content:

<!-- wp:paragraph -->
<p>&amp;amp;</p>
<!-- /wp:paragraph -->

The text is correctly HTML-encoded, and the encoding is preserved on save. The editor loads the same content, and the frontend displays the intended text: &amp;.
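Why does &amp;amp; survive when &amp;#39; does not? The following minimal Python sketch assumes kses disarms every & as &amp; and then restores allowed named references in a single left-to-right regex pass. This is a simplified model, not the PHP in kses.php, and the one-name allowlist is hypothetical.

```python
import re

# Hypothetical minimal allowlist; only "amp" matters for this example.
ALLOWED_NAMES = {"amp"}
NAMED = re.compile(r"&amp;([A-Za-z]{2,8}[0-9]{0,2});")


def disarm_and_restore_named(text: str) -> str:
    disarmed = text.replace("&", "&amp;")  # "&amp;amp;" becomes "&amp;amp;amp;"
    # re.sub scans left to right and never rescans its own replacements,
    # so the leading "&amp;amp;" collapses back to "&amp;" exactly once.
    return NAMED.sub(
        lambda m: f"&{m.group(1)};" if m.group(1) in ALLOWED_NAMES else f"&amp;{m.group(1)};",
        disarmed,
    )


print(disarm_and_restore_named("<p>&amp;amp;</p>"))  # <p>&amp;amp;</p>
```

Disarming followed by one named-restore pass round-trips &amp;amp; unchanged. In the &amp;#39; case, a numeric-restore step that runs after this pass sees the freshly restored &amp; and unescapes it.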

This ticket was mentioned in PR #9099 on WordPress/wordpress-develop by @jonsurrell.


5 months ago
#3

  • Keywords has-patch has-unit-tests added

Prevent the wp_kses_normalize_entities function from transforming inputs like &amp;#39; into &#039;, which changes their meaning. That transformation is not a normalization: it produces HTML with significantly different content.
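A corrected ordering can be sketched as follows. This is a simplified Python model, not the PHP in kses.php, and it assumes the fix restores decimal and hex references before named ones. The allowlist, the code-point check, and the function names are hypothetical. Because the numeric patterns run first, they never see an &amp; that the named-entity step has just restored.

```python
import re

# Hypothetical, tiny allowlist; the real kses list of entity names is much longer.
ALLOWED_NAMES = {"amp", "lt", "gt", "quot", "nbsp"}

NAMED = re.compile(r"&amp;([A-Za-z]{2,8}[0-9]{0,2});")
DECIMAL = re.compile(r"&amp;#(0*[0-9]{1,7});")
HEX = re.compile(r"&amp;#[Xx](0*[0-9A-Fa-f]{1,6});")


def valid_code_point(n: int) -> bool:
    # Simplified validity check based on the XML Char ranges (an assumption).
    return (n in (0x9, 0xA, 0xD) or 0x20 <= n <= 0xD7FF
            or 0xE000 <= n <= 0xFFFD or 0x10000 <= n <= 0x10FFFF)


def restore_decimal(m: re.Match) -> str:
    digits = m.group(1)
    if not valid_code_point(int(digits)):
        return f"&amp;#{digits};"  # invalid references stay escaped
    return "&#" + digits.lstrip("0").rjust(3, "0") + ";"  # &#39; becomes &#039;


def restore_hex(m: re.Match) -> str:
    hexdigits = m.group(1)
    if not valid_code_point(int(hexdigits, 16)):
        return f"&amp;#x{hexdigits};"
    return "&#x" + hexdigits.lstrip("0") + ";"


def restore_named(m: re.Match) -> str:
    name = m.group(1)
    return f"&{name};" if name in ALLOWED_NAMES else f"&amp;{name};"


def normalize_fixed(text: str) -> str:
    text = text.replace("&", "&amp;")          # 1. disarm every ampersand
    text = DECIMAL.sub(restore_decimal, text)  # 2. numeric references first...
    text = HEX.sub(restore_hex, text)
    text = NAMED.sub(restore_named, text)      # 3. ...named references last
    return text


for sample in ["<p>&amp;#39;</p>", "<p>&#39;</p>", "<p>&amp;amp;</p>"]:
    print(sample, "->", normalize_fixed(sample))
```

With this order, the escaped text &amp;#39; is left unchanged, while a genuine reference &#39; is still normalized to &#039;. Already-escaped entities like &amp;amp; also round-trip unchanged.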

Trac ticket: https://core.trac.wordpress.org/ticket/63630

This change includes https://github.com/WordPress/wordpress-develop/pull/9095 which should be reviewed and landed first.

#4 @jonsurrell
5 months ago

I believe this issue goes back as far as [649], when kses.php was introduced in WordPress.

The unfiltered_html behavior changed in [2896]; however, I did not check how that affected the behavior on post save.

I found what appears to be a recent version of KSES in which the entity-normalization transforms follow the same order proposed in PR 9099. This suggests that the original order was a bug in KSES and that the fix is appropriate. Unfortunately, I was unable to find any commit history discussing the change.

#10 @jonsurrell
5 months ago

  • Keywords needs-testing dev-feedback added

This ticket was mentioned in Slack in #core by jonsurrell.


5 months ago

#12 @SirLouen
5 months ago

  • Keywords needs-testing removed

Patch Test Report

Description

✅ This report validates that the indicated patch works as expected.

Patch tested: https://github.com/WordPress/wordpress-develop/pull/9099.diff

Environment

  • WordPress: 6.9-alpha-60093-src
  • PHP: 8.2.28
  • Server: nginx/1.29.0
  • Database: mysqli (Server: 8.4.5 / Client: mysqlnd 8.2.28)
  • Browser: Chrome 138.0.0.0
  • OS: Windows 10/11
  • Theme: Twenty Twenty-Five 1.2
  • MU Plugins:
    • Exporting Test 1.0.0
  • Plugins:
    • Test Reports 1.2.0

Reproduction Steps

  1. Follow the instructions in the original post (or see the screenshots in the supplemental artifacts).

Actual Results

  1. ✅ Issue resolved with patch.

Additional Notes

  • The patch works as expected, but formatting and KSES in general are areas I have never explored in depth. I wonder whether letting users without unfiltered_html unescape certain characters could allow some users to build their way up to something with security implications.

Without unfiltered_html capability, before patch

Backend:
https://i.imgur.com/pBl3Ctz.png
Frontend:
https://i.imgur.com/BwkYgBJ.png

With unfiltered_html capability

Backend:
https://i.imgur.com/whI0uah.png
Frontend:
https://i.imgur.com/Tu9G2Xv.png

Without unfiltered_html capability, after patch

Frontend:
https://i.imgur.com/SYZeY7a.png

@jonsurrell commented on PR #9099:


5 months ago
#13

PR 9095 landed in [60446], so this change is now unblocked.

@dmsnell commented on PR #9099:


5 months ago
#14

as a clarification, I wanted to note that while this introduces a change of behavior, it's not doing so in a way I would expect to break things.

previously, those characters could be corrupted by Core; if some plugin looks for them and tries to repair them, that code might now see different data coming through the filter stack.

in all of the cases where the behavior differs, though, the existing results are fundamentally broken because Core corrupted the content at the start. this should only improve the situation.

This ticket was mentioned in Slack in #core by jonsurrell.


5 months ago

This ticket was mentioned in Slack in #core by benjamin_zekavica.


5 months ago

#17 @jonsurrell
5 months ago

  • Milestone changed from Awaiting Review to 6.9

I'm adding this to 6.9. The patch has tests, testing feedback, and an approval. I'd still like to get feedback from more reviewers on the change in case I've overlooked how this could negatively impact users.

#18 @jonsurrell
5 months ago

  • Keywords 2nd-opinion added

@jonsurrell commented on PR #9099:


4 months ago
#19

I shared this and requested review at the July 16, 2025 Core devchat.

I plan to land this in the next few days unless there are reviews that raise concerns or other issues.

#20 @jonsurrell
4 months ago

  • Owner set to jonsurrell
  • Resolution set to fixed
  • Status changed from assigned to closed

In 60616:

KSES: Prevent normalization from unescaping escaped numeric character references.

Fixes an issue where wp_kses_normalize_entities would transform inputs like "&amp;#39;" into "&#039;", changing the intended HTML text.

This behavior has been present since the initial version of KSES was introduced in [649].

[2896] applied the normalization to post content for users without the "unfiltered_html" capability.

Developed in https://github.com/WordPress/wordpress-develop/pull/9099.

Props jonsurrell, dmsnell, sirlouen.
Fixes #63630.
