Make WordPress Core

Opened 11 years ago

Closed 10 years ago

Last modified 10 years ago

#23231 closed defect (bug) (fixed)

preprocess_comment should get the normalised comment data

Reported by: westi's profile westi Owned by: dkotter's profile dkotter
Milestone: 4.0 Priority: low
Severity: minor Version: 3.5
Component: Comments Keywords: westi-like has-patch good-first-bug
Focuses: Cc:


In wp_new_comment we normalise some data like user_ID/user_id so that we support people providing both versions but we fire the filter to preprocess the comment first.

This means if someone writes a filter they have to re-implement this normalisation if they want to filter the comment data properly.

An example of where this currently goes wrong is when Akismet tries to fetch the roles of the user submitting the comment -

This only works if the comment comes via a route that supplies the data correctly and not one that doesn't.

Attachments (3)

comment.diff (801 bytes) - added by dkotter 10 years ago.
23231.diff (1.2 KB) - added by dkotter 10 years ago.
23231.2.diff (1.4 KB) - added by dkotter 10 years ago.

Download all attachments as: .zip

Change History (16)

#1 @westi
11 years ago

There is some history as to why we support both in #11222 / [12267] / [12300]

10 years ago

#2 @dkotter
10 years ago

  • Keywords has-patch added; needs-patch removed

Added a patch that normalizes the user_ID data before sending on to the preprocess_comment filter. I wasn't sure whether we should leave the normalizing of the same data after the filter, as it ends up with duplicate code just a few lines apart, but for now I've left that there. My thought was someone could change the data in the filter call, so re-normalizing it might make sense.

#3 @helen
10 years ago

  • Keywords good-first-bug added; easy removed
  • Owner set to dkotter
  • Status changed from new to assigned

#4 @SergeyBiryukov
10 years ago

  • Milestone changed from Awaiting Review to 4.0

#5 @nacin
10 years ago

Hi dkotter, this patch looks like a good start. I think we could probably split the code after the filter into two parts — normalization and sanitization. Standardize on user_id before the filter, cast it to an integer after. Basically, take the if statement and move it up, while leave the elseif there.

10 years ago

#6 @dkotter
10 years ago

Alright, added a new patch, made the changes as requested, which I think is a better approach. Again, let me know if more is needed here. Thanks!

#7 @nacin
10 years ago

This is exactly what I was envisioning. The patch is perfect. I did however catch an odd situation. Imagine:

  • Function is passed user_id 1
  • Filter sets user_ID 2

With the current code, the function takes user_ID 2 and overrides user_id 1 with it. With the patched code, the function will end up with user_id 1. By normalizing user_id we lose the fact that user_ID could have been set by the filter.

(Also, with the initial patch, a user_ID would have become a user_id, then have been passed to the filter. The original user_ID will then override any changes the filter makes to user_id.)

I'll be honest, I don't entirely care about breaking this, but we still probably shouldn't. Open to suggestions.

#8 follow-up: @dkotter
10 years ago

Hmm.. yeah. That's probably (hopefully?) an edge case and seems more like someone shooting themselves in the foot, trying to change the user ID of a comment, but it is a possibility and something we should probably try and avoid.

The only thought I had was to keep the normalization part above the filter, setting both user_id and user_ID there, then after the filter, check if those values are equal. If they're not, then one of those was changed in the filter so at that point, normalize the data again, so we don't end up with two different values. The problem I see there is there's not a way I could think of to tell which one was changed and which one wasn't, so as you mention, if someone changes user_id and not user_ID, and we normalize back to user_ID, they lose their changes.

So not sure if there is a good answer here. We don't want two separate ID's set, but not sure if we want to "guess" on what value (either user_id or user_ID) someone actually wanted to use.

#9 in reply to: ↑ 8 ; follow-up: @SergeyBiryukov
10 years ago

Replying to dkotter:

The problem I see there is there's not a way I could think of to tell which one was changed and which one wasn't

We could probably just save the initial values to separate variables and then compare them to filtered values.

10 years ago

#10 in reply to: ↑ 9 @dkotter
10 years ago

Replying to SergeyBiryukov:

We could probably just save the initial values to separate variables and then compare them to filtered values.

Ah, yeah, not sure why I didn't think of that. Most recent patch takes that approach. Normalizes data before filter, saving user_id to a variable, then after filter, checks if it's been changed, and if so, re-normalizes things.

#11 @SergeyBiryukov
10 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 28915:

Normalize 'user_id' and 'user_ID' values in wp_new_comment() before passing the comment data to 'preprocess_comment' filter.

props dkotter.
fixes #23231.

#12 @xknown
10 years ago

There's a PHP warning when $commentdata doesn't contain the user_ID or user_id fields. For example, in this line there's likely a PHP warning

#13 @SergeyBiryukov
10 years ago

In 28922:

Avoid a PHP notice in wp_new_comment() if user ID is not passed.

see #23231.

Note: See TracTickets for help on using tickets.