Opened 7 years ago
Last modified 7 years ago
#43946 new defect (bug)
Inconsistent encoding of comment_content
Reported by: | leewillis77 | Owned by: | |
---|---|---|---|
Milestone: | Awaiting Review | Priority: | normal |
Severity: | normal | Version: | 4.9.5 |
Component: | Comments | Keywords: | has-patch reporter-feedback |
Focuses: | Cc: |
Description
The contents of the comments.comment_content field in the WordPress database are stored inconsistently depending on whether the user creating the comment was logged in or not.
> select comment_ID, comment_content, comment_type from wp_comments where comment_ID > 7; +------------+-----------------------------------------------------+--------------+ | comment_ID | comment_content | comment_type | +------------+-----------------------------------------------------+--------------+ | 8 | This is a good & strong comment (comment, anon) | | | 9 | This is a good & strong comment (comment, auth) | | +------------+-----------------------------------------------------+--------------+
In the example above, comment 8 was left by an anonymous user, and comment 9 was left by an authenticated user. Notice how the comment_content field has been HTML-escaped for ID 8, but not for ID 9.
Steps to reproduce:
- Install a fresh copy of WordPress
- As the site administrator, navigate to the default "hello-world" post
- Leave a comment including characters that should be escaped when output to HTML, e.g. &
- Log out
- Navigate to the default "hello-world" post
- Leave a comment including characters that should be escaped when output to HTML, e.g. &
Compare the contents of the two comments in the wp_comments table.
Attachments (1)
Change History (8)
This ticket was mentioned in Slack in #core-comments by presskopp. View the logs.
7 years ago
#4
follow-up:
↓ 5
@
7 years ago
Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).
I also think any patch needs to deal somehow with legacy data.
#5
in reply to:
↑ 4
@
7 years ago
So, basically what you mean is "NOT" to escape while storing and rather keep it raw. Escape the value the way it is required when outputted. Meaning you want it to be other way round. Am I right?
Replying to leewillis77:
Thanks for the reproduction, and the proposed patch. However, I'm not sure that patch really addresses the issue. In general, best practice is to store content "raw" and escape it as appropriate on usage (bearing in mind that different output schemes will have different encoding needs, e.g. XML vs HTML vs JSON, so ideally any such patch should do that (ensuring that it *is* properly escaped on usage).
I also think any patch needs to deal somehow with legacy data.
#6
@
7 years ago
- Keywords reporter-feedback added
I tried to reproduce the same issue today with a fresh WP install and twentysebenteen activated. For both Logged-In and Non-Logged-In statuses, the&
changed to &
; in the database. The way it store the data (for both user_id 0 and user_id > 0 ) today is different from yesterdaty! Straneg!!
The patch is not applied yet.
Here is a screenshot of the database records in wp_comments
table.
https://tinyurl.com/y83q25ym
#7
@
7 years ago
Hmm, it depends what you write:
"This is great & awesome" < triggers the error
"This is great & awesome" < does not - the & in & does not get escaped
As a side affect, the latter comment also gets displayed in a browser as "This is great & awesome" rather than "This is great & awesome" - a prime example of why the current scenario is broken....
I confirm that following the steps mentioned I could reproduce the issue!