Opened 14 years ago
Closed 14 years ago
#9934 closed defect (bug) (fixed)
Apostrophe in comment author causes comment to be spammed - esc_html
| Reported by: | | Owned by: | |
|---|---|---|---|
| Milestone: | 2.8 | Priority: | high |
| Severity: | blocker | Version: | |
| Component: | Comments | Keywords: | |
| Focuses: | | Cc: | |
Description
Since [11380] - which added esc_html filtering to many items - comments containing an apostrophe (and possibly other characters) in the author name field are flagged as spam by WordPress.
The root cause is that esc_html() uses decimal entity encoding, so O'Connor becomes O&#039;Connor. But wp_blacklist_check() regards any comment containing a decimal entity as spam (and worse, does so silently and without any way for the blog administrator to stop it).
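To make the interaction concrete, here is a minimal Python model of the two steps. It is not WordPress code: esc_html_model and looks_like_spam_model are hypothetical stand-ins, and the regex is an assumption based only on the description above (any decimal numeric entity is treated as spam). The real esc_html() also avoids double-encoding existing entities, which this sketch does not attempt.

```python
import re

# Hypothetical stand-in for esc_html(): encodes the five HTML-special
# characters, using the decimal entity &#039; for the apostrophe.
def esc_html_model(text: str) -> str:
    replacements = [
        ("&", "&amp;"),   # runs first so the entities added below are not re-escaped
        ("<", "&lt;"),
        (">", "&gt;"),
        ('"', "&quot;"),
        ("'", "&#039;"),
    ]
    for raw, entity in replacements:
        text = text.replace(raw, entity)
    return text

# Assumed shape of the entity test in wp_blacklist_check():
# any decimal numeric entity marks the comment as spam.
DECIMAL_ENTITY = re.compile(r"&#\d+;")

def looks_like_spam_model(author: str) -> bool:
    return DECIMAL_ENTITY.search(author) is not None

raw_author = "O'Connor"
escaped_author = esc_html_model(raw_author)

print(escaped_author)
print(looks_like_spam_model(raw_author), looks_like_spam_model(escaped_author))
```

In this model the raw name passes the check and the escaped name fails it, which is the regression in a nutshell.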
Possible solutions:
1. esc_html() should use hex entity encoding, not decimal
2. comment_author_name shouldn't use esc_html()
3. wp_blacklist_check() shouldn't spam comments containing decimal entities
All three are trivial fixes so I haven't included a patch. I'd favour (1) if only because it eliminates the regression and reverts to the old behaviour.
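As a rough illustration of why the first solution sidesteps the problem, the Python sketch below compares the two encodings against a decimal-entity test. The regex and the helper escape_apostrophe are assumptions for illustration, not the actual WordPress implementation.

```python
import re

# Assumed shape of the blacklist test: decimal numeric entities only.
DECIMAL_ENTITY = re.compile(r"&#\d+;")

def escape_apostrophe(text: str, use_hex: bool) -> str:
    # Hypothetical helper: encode only the apostrophe, in either form.
    entity = "&#x27;" if use_hex else "&#039;"
    return text.replace("'", entity)

decimal_form = escape_apostrophe("O'Connor", use_hex=False)
hex_form = escape_apostrophe("O'Connor", use_hex=True)

# The hex form has an "x" after "&#", so the digits-only pattern cannot match it.
decimal_flagged = DECIMAL_ENTITY.search(decimal_form) is not None
hex_flagged = DECIMAL_ENTITY.search(hex_form) is not None

print(decimal_form, decimal_flagged)
print(hex_form, hex_flagged)
```

Both forms render as the same apostrophe in a browser; only the decimal one trips a digits-only pattern.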
Attachments (1)
Change History (22)
#4
in reply to:
↑ description
@
14 years ago
Replying to tellyworth:
- wp_blacklist_check() shouldn't spam comments containing decimal entities
This needs to be fixed. Aside from being an un-filterable hack, it's probably needlessly blacklisting trackbacks with such entities.
#5
@
14 years ago
see also: http://core.trac.wordpress.org/ticket/6992#comment:22 and follow-ups
#6
@
14 years ago
I'd encourage Tellyworth's 4th option; storing escaped/converted data in the database is almost always a bad idea. It causes problems in other areas where WordPress does that, and it's going to be a real pain to undo.
#12
@
14 years ago
- Resolution set to fixed
- Status changed from assigned to closed
See #9965 for the blacklist issue. The regression should be fixed so I'll close this ticket.
#13
@
14 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
I just re-tested with r11480 and the problem is still present. Investigating.
#14
@
14 years ago
Confirmed, the same problem is still present even after [11460].
wp_specialchars is used on comment_author prior to comment spam filtering. wp_specialchars() calls _wp_specialchars(), which encodes an apostrophe to its decimal numeric entity (formatting.php around line 273).
Removing the blacklist entity check as per #9965 will fix it but that's just covering up the symptom. The real issue is that WP is futzing with comment data before passing it to spam filters, which hampers their ability to produce accurate results.
#15
@
14 years ago
That moves wp_allow_comment() before wp_filter_comment(). I don't know if that will bust any plugins though.
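A small Python model of the two orderings, for anyone following along. The functions are hypothetical stand-ins named after wp_filter_comment() and wp_allow_comment(); the real functions do far more, and the entity regex is an assumption taken from the ticket description.

```python
import re

DECIMAL_ENTITY = re.compile(r"&#\d+;")  # assumed blacklist entity test

def filter_comment_model(comment: dict) -> dict:
    # Stand-in for wp_filter_comment(): escapes the apostrophe in the author name.
    return {**comment, "author": comment["author"].replace("'", "&#039;")}

def allow_comment_model(comment: dict) -> str:
    # Stand-in for wp_allow_comment(): returns "spam" or "approved".
    return "spam" if DECIMAL_ENTITY.search(comment["author"]) else "approved"

def filter_then_check(comment: dict) -> str:
    # Old order: the spam check sees the already-escaped author.
    return allow_comment_model(filter_comment_model(comment))

def check_then_filter(comment: dict):
    # Patched order: the spam check sees the raw author, filtering happens afterwards.
    status = allow_comment_model(comment)
    stored = filter_comment_model(comment)
    return status, stored

comment = {"author": "O'Connor"}
print(filter_then_check(comment))
print(check_then_filter(comment))
```

Note that with the patched order the stored value is still escaped; only the input to the spam check changes.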
#16
@
14 years ago
wp_specialchars(), when passed only one argument, calls esc_html(). esc_html() defaults to ENT_QUOTES. wp_specialchars() used to default to ENT_NOQUOTES.
Do we need esc_html_db() for these instances? (Yes, I know we should escape as little as possible when sending to the db, but I'm going for the minimal fix for 2.8.)
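The change in defaults can be modelled in a few lines of Python. This is only a sketch: specialchars_model is a hypothetical function, and the two constants borrow the names of PHP's ENT_NOQUOTES and ENT_QUOTES flags with arbitrary values.

```python
# Names mirror the PHP flags; the values here are arbitrary labels.
ENT_NOQUOTES = "noquotes"
ENT_QUOTES = "quotes"

def specialchars_model(text: str, quote_style: str) -> str:
    # &, < and > are always encoded; & goes first to avoid re-escaping.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    if quote_style == ENT_QUOTES:
        # Quotes are encoded only under ENT_QUOTES.
        text = text.replace('"', "&quot;").replace("'", "&#039;")
    return text

old_default = specialchars_model("O'Connor", ENT_NOQUOTES)
new_default = specialchars_model("O'Connor", ENT_QUOTES)

print(old_default)
print(new_default)
```

Under the old default the apostrophe survives untouched, so no entity ever reaches the blacklist check; under the new default it does.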
Actually there's a fourth option, and I think this ought to be the long-term fix:
Spam filtering really needs to happen on raw POST data, before plugins and sanitizers have the opportunity to screw with it. esc_html()'s behaviour would be fine if it occurred only at display time. But the data passed to spam filters (and, importantly, the data stored in the wp_comments table - which is subsequently used when reporting false positives and missed spam to Akismet and other spam filtering services) need to be as close as possible to the original.
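A minimal Python sketch of that flow, under stated assumptions: comments_table is a hypothetical in-memory stand-in for wp_comments, the spam test is the same assumed decimal-entity regex, and Python's html.escape stands in for display-time escaping (it happens to emit the hex form &#x27; for an apostrophe).

```python
import html
import re

DECIMAL_ENTITY = re.compile(r"&#\d+;")  # assumed spam heuristic
comments_table = []  # hypothetical stand-in for the wp_comments table

def submit_comment(raw_author: str, raw_body: str) -> dict:
    # The spam check runs on the untouched submission.
    is_spam = bool(DECIMAL_ENTITY.search(raw_author + raw_body))
    # The raw values are stored, so later spam/ham reports match the original.
    row = {"author": raw_author, "body": raw_body, "spam": is_spam}
    comments_table.append(row)
    return row

def render_author(row: dict) -> str:
    # Escaping happens only when the value is written into HTML.
    return html.escape(row["author"], quote=True)

row = submit_comment("O'Connor", "Nice post")
print(row["spam"], row["author"])
print(render_author(row))
```

The stored row keeps the original apostrophe, and the entity only appears in the rendered output.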