Make WordPress Core

Opened 4 years ago

Last modified 2 years ago

#26858 assigned defect (bug)

Comment duplicate check has a slow query

Reported by: pento Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.0
Component: Database Keywords:
Focuses: performance Cc:


When checking for duplicate comments, the query is pretty slow:

SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '1' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'foo' OR comment_author_email = 'foo@bar.com' ) AND comment_content = 'some content' LIMIT 1;

This will use the comment_post_ID index, which means it will scan all comments on that post. For a post with thousands of comments, this starts to get slow.

Instead, we should change KEY comment_post_ID (comment_post_ID) to KEY comment_post_ID (comment_post_ID,comment_content(255)), to cover this case.

Props MySQL Performance Blog for finding this.

Attachments (2)

26858.diff (1.8 KB) - added by pento 4 years ago.
26858.2.diff (1.6 KB) - added by DrewAPicture 2 years ago.

Download all attachments as: .zip

Change History (9)

4 years ago

#1 @nacin
4 years ago

Peter Zaitsev emailed me about this and I was looking into it earlier. Sounds good to me.

What I'm wondering is if we could actually ditch the comment_content conditional. If we could query all comments against the post with one more limiting factor — an index on comment_author and/or comment_author_email — we might be able to avoid adding a one-off comment_content index (we don't do a query like this anywhere else). And then we'd pull all comments against a post, by that author, into PHP, and look for duplicate content that way.

As we do an OR right now for comment_author and comment_author_email, a different index wouldn't actually help. I don't know how much sense this makes, though, and am trying to figure out the use case. Someone double-submits an identical comment, it gets caught, even if we compare author = % AND author_email = %s. If someone submits a comment, then goes back and changes *only* their name or email, it wouldn't get caught. I guess that kind of a correction is possible.

There are a few other ugly comment queries (including another in that post, compounded by a plugin-added comment_subscribe field); perhaps it's worth looking into these for 3.9 and seeing if a few smart indexes could handle a lot of our problems.

#2 @pento
4 years ago

That's a good point, adding a one-off index isn't a great solution.

A problem could be if an admin has unchecked the "Comment author must fill out name and e-mail" option, that would possibly result in many comments having empty name/email fields. I don't expect that to be a common problem, though.

#3 @johnbillion
4 years ago

  • Milestone changed from 3.9 to Future Release

#4 @chriscct7
2 years ago

  • Focuses performance added
  • Keywords needs-refresh added; has-patch removed

2 years ago

#5 @DrewAPicture
2 years ago

  • Keywords needs-refresh removed
  • Milestone changed from Future Release to 4.4
  • Owner set to pento
  • Status changed from new to assigned

26858.2.diff refreshes the patch (hopefully I did that right). Also fixes a typo from the original patch:


KEY comment_post_ID_content (comment_post_ID, comment_content(255))

Should've been:

KEY comment_post_ID (comment_post_ID,comment_content(255))

Also, following [31349] I opted to use $max_index_length instead of 255.

#6 @DrewAPicture
2 years ago

  • Keywords has-patch added

#7 @pento
2 years ago

  • Keywords dev-feedback has-patch removed
  • Milestone changed from 4.4 to Future Release
  • Owner pento deleted

Adding a single index isn't the right way to go - as @nacin mentioned, this needs a deeper dive into the queries used by the various comment functions, to see if we can make several of them better at once, rather than adding an index that's only used by one function.

As a side note, comment_post_ID_content is the correct index name - it's a compound index, so it's named for both columns - like comment_approved_date_gmt (comment_approved,comment_date_gmt).

Note: See TracTickets for help on using tickets.