Make WordPress Core

Opened 10 years ago

Last modified 3 weeks ago

#26858 assigned defect (bug)

Comment duplicate check has a slow query

Reported by: pento's profile pento Owned by: pbearne's profile pbearne
Milestone: Future Release Priority: normal
Severity: normal Version: 2.0
Component: Database Keywords: needs-testing has-patch
Focuses: performance Cc:


When checking for duplicate comments, the query is pretty slow:

SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '1' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'foo' OR comment_author_email = '' ) AND comment_content = 'some content' LIMIT 1;

This will use the comment_post_ID index, which means it will scan all comments on that post. For a post with thousands of comments, this starts to get slow.

Instead, we should change KEY comment_post_ID (comment_post_ID) to KEY comment_post_ID (comment_post_ID,comment_content(255)), to cover this case.

Props MySQL Performance Blog for finding this.

Attachments (2)

26858.diff (1.8 KB) - added by pento 10 years ago.
26858.2.diff (1.6 KB) - added by DrewAPicture 9 years ago.

Download all attachments as: .zip

Change History (15)

10 years ago

#1 @nacin
10 years ago

Peter Zaitsev emailed me about this and I was looking into it earlier. Sounds good to me.

What I'm wondering is if we could actually ditch the comment_content conditional. If we could query all comments against the post with one more limiting factor — an index on comment_author and/or comment_author_email — we might be able to avoid adding a one-off comment_content index (we don't do a query like this anywhere else). And then we'd pull all comments against a post, by that author, into PHP, and look for duplicate content that way.

As we do an OR right now for comment_author and comment_author_email, a different index wouldn't actually help. I don't know how much sense this makes, though, and am trying to figure out the use case. Someone double-submits an identical comment, it gets caught, even if we compare author = % AND author_email = %s. If someone submits a comment, then goes back and changes *only* their name or email, it wouldn't get caught. I guess that kind of a correction is possible.

There are a few other ugly comment queries (including another in that post, compounded by a plugin-added comment_subscribe field); perhaps it's worth looking into these for 3.9 and seeing if a few smart indexes could handle a lot of our problems.

#2 @pento
10 years ago

That's a good point, adding a one-off index isn't a great solution.

A problem could be if an admin has unchecked the "Comment author must fill out name and e-mail" option, that would possibly result in many comments having empty name/email fields. I don't expect that to be a common problem, though.

#3 @johnbillion
10 years ago

  • Milestone changed from 3.9 to Future Release

#4 @chriscct7
9 years ago

  • Focuses performance added
  • Keywords needs-refresh added; has-patch removed

9 years ago

#5 @DrewAPicture
9 years ago

  • Keywords needs-refresh removed
  • Milestone changed from Future Release to 4.4
  • Owner set to pento
  • Status changed from new to assigned

26858.2.diff refreshes the patch (hopefully I did that right). Also fixes a typo from the original patch:


KEY comment_post_ID_content (comment_post_ID, comment_content(255))

Should've been:

KEY comment_post_ID (comment_post_ID,comment_content(255))

Also, following [31349] I opted to use $max_index_length instead of 255.

#6 @DrewAPicture
9 years ago

  • Keywords has-patch added

#7 @pento
9 years ago

  • Keywords dev-feedback has-patch removed
  • Milestone changed from 4.4 to Future Release
  • Owner pento deleted

Adding a single index isn't the right way to go - as @nacin mentioned, this needs a deeper dive into the queries used by the various comment functions, to see if we can make several of them better at once, rather than adding an index that's only used by one function.

As a side note, comment_post_ID_content is the correct index name - it's a compound index, so it's named for both columns - like comment_approved_date_gmt (comment_approved,comment_date_gmt).

#8 @desrosj
19 months ago

  • Keywords needs-testing added
  • Milestone set to Future Release

Re-adding a milestone to this one.

This could benefit from some investigation with the latest version of WordPress to figure out what has changed (if anything).

Perhaps the performance team would be able to dig in here? cc: @flixos90 @mxbclang

#9 @mxbclang
19 months ago

Thanks, @desrosj! We can take a look.

This ticket was mentioned in PR #6301 on WordPress/wordpress-develop by @pbearne.

3 months ago

  • Keywords has-patch added

The database version for WordPress has been updated. As a result, a new database upgrade routine "upgrade_660" has been introduced. Also, an adjustment has been made to refine the structure of the 'comment_post_ID' index in the comments table, enhancing database performance.

Trac ticket: 26858

#11 @pbearne
3 months ago

  • Keywords has-patch removed

I have refreshed the patch

#12 @pbearne
3 months ago

  • Owner set to pbearne

#13 @pbearne
3 weeks ago

  • Keywords has-patch added
Note: See TracTickets for help on using tickets.