Opened 11 years ago
Last modified 5 months ago
#26858 assigned defect (bug)
Comment duplicate check has a slow query
Reported by: | pento | Owned by: | pbearne |
---|---|---|---|
Milestone: | Future Release | Priority: | normal |
Severity: | normal | Version: | 2.0 |
Component: | Database | Keywords: | needs-testing has-patch |
Focuses: | performance | Cc: |
Description
When checking for duplicate comments, the query is pretty slow:
SELECT comment_ID FROM wp_comments WHERE comment_post_ID = '1' AND comment_parent = '0' AND comment_approved != 'trash' AND ( comment_author = 'foo' OR comment_author_email = 'foo@bar.com' ) AND comment_content = 'some content' LIMIT 1;
This will use the comment_post_ID index, which means it will scan all comments on that post. For a post with thousands of comments, this starts to get slow.
Instead, we should change KEY comment_post_ID (comment_post_ID)
to KEY comment_post_ID (comment_post_ID,comment_content(255))
, to cover this case.
Props MySQL Performance Blog for finding this.
Attachments (2)
Change History (15)
#2
@
11 years ago
That's a good point, adding a one-off index isn't a great solution.
A problem could be if an admin has unchecked the "Comment author must fill out name and e-mail" option, that would possibly result in many comments having empty name/email fields. I don't expect that to be a common problem, though.
#5
@
9 years ago
- Keywords needs-refresh removed
- Milestone changed from Future Release to 4.4
- Owner set to pento
- Status changed from new to assigned
26858.2.diff refreshes the patch (hopefully I did that right). Also fixes a typo from the original patch:
This:
KEY comment_post_ID_content (comment_post_ID, comment_content(255))
Should've been:
KEY comment_post_ID (comment_post_ID,comment_content(255))
Also, following [31349] I opted to use $max_index_length
instead of 255.
#7
@
9 years ago
- Keywords dev-feedback has-patch removed
- Milestone changed from 4.4 to Future Release
- Owner pento deleted
Adding a single index isn't the right way to go - as @nacin mentioned, this needs a deeper dive into the queries used by the various comment functions, to see if we can make several of them better at once, rather than adding an index that's only used by one function.
As a side note, comment_post_ID_content
is the correct index name - it's a compound index, so it's named for both columns - like comment_approved_date_gmt (comment_approved,comment_date_gmt)
.
#8
@
2 years ago
- Keywords needs-testing added
- Milestone set to Future Release
Re-adding a milestone to this one.
This could benefit from some investigation with the latest version of WordPress to figure out what has changed (if anything).
Perhaps the performance team would be able to dig in here? cc: @flixos90 @mxbclang
This ticket was mentioned in PR #6301 on WordPress/wordpress-develop by @pbearne.
7 months ago
#10
- Keywords has-patch added
The database version for WordPress has been updated. As a result, a new database upgrade routine "upgrade_660" has been introduced. Also, an adjustment has been made to refine the structure of the 'comment_post_ID' index in the comments table, enhancing database performance.
Trac ticket: 26858
Peter Zaitsev emailed me about this and I was looking into it earlier. Sounds good to me.
What I'm wondering is if we could actually ditch the comment_content conditional. If we could query all comments against the post with one more limiting factor — an index on comment_author and/or comment_author_email — we might be able to avoid adding a one-off comment_content index (we don't do a query like this anywhere else). And then we'd pull all comments against a post, by that author, into PHP, and look for duplicate content that way.
As we do an OR right now for comment_author and comment_author_email, a different index wouldn't actually help. I don't know how much sense this makes, though, and am trying to figure out the use case. Someone double-submits an identical comment, it gets caught, even if we compare author = % AND author_email = %s. If someone submits a comment, then goes back and changes *only* their name or email, it wouldn't get caught. I guess that kind of a correction is possible.
There are a few other ugly comment queries (including another in that post, compounded by a plugin-added comment_subscribe field); perhaps it's worth looking into these for 3.9 and seeing if a few smart indexes could handle a lot of our problems.