Make WordPress Core

Opened 8 years ago

Last modified 18 hours ago

#43946 new defect (bug)

Inconsistent encoding of comment_content

Reported by: leewillis77's profile leewillis77 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.9.5
Component: Comments Keywords: needs-patch
Focuses: Cc:

Description

The contents of the comments.comment_content field in the WordPress database are stored inconsistently depending on whether the user creating the comment was logged in or not.

> select comment_ID, comment_content, comment_type from wp_comments where comment_ID > 7;
+------------+-----------------------------------------------------+--------------+
| comment_ID | comment_content                                     | comment_type |
+------------+-----------------------------------------------------+--------------+
|          8 | This is a good & strong comment (comment, anon) |              |
|          9 | This is a good & strong comment (comment, auth)     |              |
+------------+-----------------------------------------------------+--------------+

In the example above, comment 8 was left by an anonymous user, and comment 9 was left by an authenticated user. Notice how the comment_content field has been HTML-escaped for ID 8, but not for ID 9.

Steps to reproduce:

  • Install a fresh copy of WordPress
  • As the site administrator, navigate to the default "hello-world" post
  • Leave a comment including characters that should be escaped when output to HTML, e.g. &
  • Log out
  • Navigate to the default "hello-world" post
  • Leave a comment including characters that should be escaped when output to HTML, e.g. &

Compare the contents of the two comments in the wp_comments table.

Attachments (1)

43946.diff (868 bytes) - added by subrataemfluence 8 years ago.

Download all attachments as: .zip

Change History (13)

This ticket was mentioned in Slack in #core-comments by presskopp. View the logs.


8 years ago

#2 @subrataemfluence
8 years ago

  • Keywords needs-patch added

I confirm that following the steps mentioned I could reproduce the issue!

#3 @subrataemfluence
8 years ago

  • Keywords has-patch added; needs-patch removed

#4 follow-up: @leewillis77
8 years ago

Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).

I also think any patch needs to deal somehow with legacy data.

#5 in reply to: ↑ 4 @subrataemfluence
8 years ago

So, basically what you mean is "NOT" to escape while storing and rather keep it raw. Escape the value the way it is required when outputted. Meaning you want it to be other way round. Am I right?

Replying to leewillis77:

Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).

I also think any patch needs to deal somehow with legacy data.

#6 @subrataemfluence
8 years ago

  • Keywords reporter-feedback added

I tried to reproduce the same issue today with a fresh WP install and twentysebenteen activated. For both Logged-In and Non-Logged-In statuses, the& changed to & in the database. The way it store the data (for both user_id 0 and user_id > 0 ) today is different from yesterdaty! Straneg!!

The patch is not applied yet.

Here is a screenshot of the database records in wp_comments table.
https://tinyurl.com/y83q25ym

#7 @leewillis77
8 years ago

Hmm, it depends what you write:

"This is great & awesome" < triggers the error

"This is great &amp; awesome" < does not - the & in &amp; does not get escaped

As a side affect, the latter comment also gets displayed in a browser as "This is great & awesome" rather than "This is great &amp; awesome" - a prime example of why the current scenario is broken....

#8 @callumbw95
3 months ago

  • Keywords needs-testing added; reporter-feedback removed

Hey all,

I have just taken a look at this in 6.8 and this is still a persisting issue today. As of such I have modernised the original patch into a PR as this solves the issue absolutely perfectly, and then we can check this against the current test suite! Props to @subrataemfluence for the original patch!

I really don't believe there is much of a barrier to getting this into core, so hopefully we can get this over the line in the next release! 😃

Last edited 3 months ago by callumbw95 (previous) (diff)

This ticket was mentioned in PR #10374 on WordPress/wordpress-develop by @callumbw95.


3 months ago
#9

Update comment content to pass it through html_esc() regardless of being logged in / not.

Trac ticket: #43946

#10 @westonruter
3 months ago

  • Keywords needs-patch added; has-patch removed

The issue is that when a user is logged-in, namely as an administrator, then they can do unfiltered_html. This means they can add arbitrary HTML. Even a non-administrator, however, is allowed to use a subset of HTML which is allow-listed by Kses. Passing all submitted comment text through esc_html() will break the ability for users to format any of their comments with HTML. So unfortunately the proposed patch isn't the right solution.

#11 @sabernhardt
3 months ago

If the comment needs to normalize entities like the ampersand before saving it to the database, wp_kses_normalize_entities() might fit.

#12 @sajib1223
18 hours ago

  • Keywords needs-testing removed

Reproduction Report

Description

This report validates whether the issue can be reproduced.

Environment

  • WordPress: 6.9
  • PHP: 8.3.29
  • Server: nginx/1.27.5
  • Database: mysqli (Server: 8.0.33 / Client: mysqlnd 8.3.29)
  • Browser: Firefox 147.0
  • OS: Windows 10/11
  • Theme: Twenty Twenty-Five 1.4
  • MU Plugins:
    • Object Cache Pro (MU) 1.19.0
  • Plugins:
    • Test Reports 1.2.1

Actual Results

  1. ✅ Error condition occurs (reproduced).

Steps to reproduce

  1. Install fresh WordPress installation
  2. Navigate to the default "Hello world!" post as administrator
  3. Submit comment: This is great & awesome
  4. Log out
  5. Navigate to the same post as a visitor (logged out)
  6. Submit comment: This is great & awesome
  7. Check the wp_comments table in the database

Additional Notes

Comments are stored inconsistently in the comment_content field depending on authentication status:

  • Logged-out user: HTML entities are escaped (& becomes &amp;)
  • Logged-in user: HTML entities are stored unescaped (raw &)

Supplemental Artifacts

Database query results showing the inconsistency:

https://files.catbox.moe/jvqr9x.png

Last edited 18 hours ago by sajib1223 (previous) (diff)
Note: See TracTickets for help on using tickets.