WordPress.org

Make WordPress Core

Opened 8 weeks ago

Last modified 7 weeks ago

#43946 new defect (bug)

Inconsistent encoding of comment_content

Reported by: leewillis77 Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 4.9.5
Component: Comments Keywords: has-patch reporter-feedback
Focuses: Cc:

Description

The contents of the comments.comment_content field in the WordPress database are stored inconsistently depending on whether the user creating the comment was logged in or not.

> select comment_ID, comment_content, comment_type from wp_comments where comment_ID > 7;
+------------+-----------------------------------------------------+--------------+
| comment_ID | comment_content                                     | comment_type |
+------------+-----------------------------------------------------+--------------+
|          8 | This is a good & strong comment (comment, anon) |              |
|          9 | This is a good & strong comment (comment, auth)     |              |
+------------+-----------------------------------------------------+--------------+

In the example above, comment 8 was left by an anonymous user, and comment 9 was left by an authenticated user. Notice how the comment_content field has been HTML-escaped for ID 8, but not for ID 9.

Steps to reproduce:

  • Install a fresh copy of WordPress
  • As the site administrator, navigate to the default "hello-world" post
  • Leave a comment including characters that should be escaped when output to HTML, e.g. &
  • Log out
  • Navigate to the default "hello-world" post
  • Leave a comment including characters that should be escaped when output to HTML, e.g. &

Compare the contents of the two comments in the wp_comments table.

Attachments (1)

43946.diff (868 bytes) - added by subrataemfluence 8 weeks ago.

Download all attachments as: .zip

Change History (8)

This ticket was mentioned in Slack in #core-comments by presskopp. View the logs.


8 weeks ago

#2 @subrataemfluence
8 weeks ago

  • Keywords needs-patch added

I confirm that following the steps mentioned I could reproduce the issue!

#3 @subrataemfluence
8 weeks ago

  • Keywords has-patch added; needs-patch removed

#4 follow-up: @leewillis77
8 weeks ago

Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).

I also think any patch needs to deal somehow with legacy data.

#5 in reply to: ↑ 4 @subrataemfluence
7 weeks ago

So, basically what you mean is "NOT" to escape while storing and rather keep it raw. Escape the value the way it is required when outputted. Meaning you want it to be other way round. Am I right?

Replying to leewillis77:

Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).

I also think any patch needs to deal somehow with legacy data.

#6 @subrataemfluence
7 weeks ago

  • Keywords reporter-feedback added

I tried to reproduce the same issue today with a fresh WP install and twentysebenteen activated. For both Logged-In and Non-Logged-In statuses, the& changed to & in the database. The way it store the data (for both user_id 0 and user_id > 0 ) today is different from yesterdaty! Straneg!!

The patch is not applied yet.

Here is a screenshot of the database records in wp_comments table. https://tinyurl.com/y83q25ym

#7 @leewillis77
7 weeks ago

Hmm, it depends what you write:

"This is great & awesome" < triggers the error

"This is great &amp; awesome" < does not - the & in &amp; does not get escaped

As a side affect, the latter comment also gets displayed in a browser as "This is great & awesome" rather than "This is great &amp; awesome" - a prime example of why the current scenario is broken....

Note: See TracTickets for help on using tickets.