WordPress.org

Make WordPress Core

Opened 10 years ago

Closed 10 years ago

#9934 closed defect (bug) (fixed)

Apostrophe in comment author causes comment to be spammed - esc_html

Reported by: tellyworth Owned by: markjaquith
Milestone: 2.8 Priority: high
Severity: blocker Version:
Component: Comments Keywords:
Focuses: Cc:

Description

Since [11380] - which added esc_html filtering to many items - comments containing an apostrophe (and possibly other characters) in the author name field are flagged as spam by Wordpress.

The root cause is that esc_html() uses decimal entity encoding, so O'Connor becomes O'Connor. But wp_blacklist_check() regards any comment containing a decimal entity as spam (and worse, does so silently and without any way for the blog administrator to stop it).

Possible solutions:

  1. esc_html() should use hex entity encoding, not decimal
  1. comment_author_name shouldn't use esc_html()
  1. wp_blacklist_check() shouldn't spam comments containing decimal entities

All three are trivial fixes so I haven't included a patch. I'd favour (1) if only because it eliminates the regression and reverts to the old behaviour.

Attachments (1)

9934.diff (655 bytes) - added by ryan 10 years ago.

Download all attachments as: .zip

Change History (22)

#1 @ryan
10 years ago

  • Owner set to markjaquith
  • Status changed from new to assigned

#2 @tellyworth
10 years ago

Actually there's a fourth option, and I think this ought to be the long-term fix:

Spam filtering really needs to happen on raw POST data, before plugins and sanitizers have the opportunity to screw with it. esc_html()'s behaviour would be fine if it occurred only at display time. But the data passed to spam filters (and, importantly, the data stored in the wp_comments table - which is subsequently used when reporting false positives and missed spam to Akismet and other spam filtering services) need to be as close as possible to the original.

#3 @ryan
10 years ago

  • Milestone changed from Unassigned to 2.8

#4 in reply to: ↑ description @filosofo
10 years ago

Replying to tellyworth:

  1. wp_blacklist_check() shouldn't spam comments containing decimal entities

This needs to be fixed. Aside from being an un-filterable hack, it's probably needlessly blacklisting trackbacks with such entities.

#6 @josephscott
10 years ago

I'd encourage Tellyworth's 4th option, storing escaped/converted data in the database is almost always a bad idea. It's causes problems in other areas where WordPress does that, and it's going to be a real pain to undo.

#7 @ryan
10 years ago

  • Severity changed from major to blocker

#8 @ryan
10 years ago

(In [11460]) Don't use esc_html() for DB bound data. see #9934

#9 @ryan
10 years ago

That good enough for now?

#10 @filosofo
10 years ago

Let's leave the ticket open until we can do something about wp_blacklist_check()

#11 @ryan
10 years ago

Remove entity encoding check? Having it in a blacklist check is unexpected.

#12 @ryan
10 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

See #9965 for the blacklist issue. The regression should be fixed so I'll close this ticket.

#13 @tellyworth
10 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I just re-tested with r11480 and the problem is still present. Investigating.

#14 @tellyworth
10 years ago

Confirmed, the same problem is still present even after [11460].

wp_specialchars is used on comment_author prior to comment spam filtering. wp_specialchars() calls _wp_specialchars(), which encodes an apostrophe to its decimal numeric entity (formatting.php around line 273).

Removing the blacklist entity check as per #9965 will fix it but that's just covering up the symptom. The real issue is that WP is futzing with comment data before passing it to spam filters, which hampers their ability to produce accurate results.

@ryan
10 years ago

#15 @ryan
10 years ago

That moves wp_allow_comment() before wp_filter_comment(). I don't know if that will bust any plugins though.

#16 @ryan
10 years ago

wp_specialchars(), when passed only one argument, calls esc_html(). esc_html() defaults to ENT_QUOTES. wp_specialchars() used to default to ENT_NOQUOTES.

Do we need esc_html_db() for these instances. (Yes, I know we should escape as little as possible when sending to the db, but I'm going for the minimal fix for 2.8.)

#17 @Denis-de-Bernardy
10 years ago

esc_html_raw() (consistent with esc_url_raw()) might be a better name?

#18 @ryan
10 years ago

(In [11488]) Use _wp_specialchars to get NOQUOTES. see #9934

#19 @ryan
10 years ago

_wp_specialchars() defaults to ENT_NOQUOTES, which should restore previous behavior. The broader issue of entity encoding on the way to the DB can be dealt with in 2.9.

#20 @Denis-de-Bernardy
10 years ago

fixed, right?

#21 @ryan
10 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
Note: See TracTickets for help on using tickets.