Opened 13 years ago
Last modified 5 years ago
#14711 new defect (bug)
Indexes for comment_author_email and user_id
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 3.0 |
Component: | Database | Keywords: | needs-patch |
Focuses: | performance | Cc: |
Description
There are currently no indexes on any of the author columns in wp_comments. That means there's no efficient way to count the comments written by a given commenter, for example.
The enclosed patch adds two separate non-unique indexes, one each for comment_author_email and user_id.
Attachments (2)
Change History (19)
#2
in reply to:
↑ 1
@
13 years ago
Replying to westi:
This sounds reasonable.
Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?
Not the foggiest. I can see user_id being trivial. comment_author_email has the potential to be large in some circumstances. It might be worth limiting that index to 8 or 10 characters or thereabouts if that's a concern.
#3
follow-up:
↓ 4
@
13 years ago
One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.
#4
in reply to:
↑ 3
;
follow-up:
↓ 5
@
13 years ago
Replying to mrmist:
One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.
There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?
#5
in reply to:
↑ 4
@
13 years ago
Replying to tellyworth:
There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?
Probably not, but then simply because we are already doing something isn't necessarily a good reason to do something else.
Note - I'm not dismissing the idea completely, just suggesting that impact of any new indexes should be tested across the various scenerios.
#6
@
13 years ago
- Cc mdawaffe added
What are the actual queries you'd want to do here?
I imagine, for example, that you'd want to restrict by comment_approved, or order a list by date. In that case, it's not clear how much a single column index like that will help.
A query I would like to be able to do is: find the most recent comment by a given user/email.
SELECT * FROM wp_comments WHERE user_id = 1 AND comment_approved = 1 ORDER BY comment_date_gmt DESC LIMIT 1
In that case, the best index would be the following.
user_id, comment_approved, comment_date_gmt
That index (and a similar one for comment_author_email) would also work for your COUNT queries (at least, the queries I'm imagining you want):
SELECT COUNT( * ) FROM wp_comments WHERE user_id = 1 AND comment_approved = 1; SELECT COUNT( * ) FROM wp_comments WHERE comment_author_email = 'spammer@example.com' AND comment_approved = 'spam'
#7
@
13 years ago
- Keywords reporter-feedback added
@tellyworthy: Can you please prodvide example SQL queries you'd like to see supported (and reference the according operation in the application those queries are used in)?
#8
@
13 years ago
@hakre: we'd like to use this in the Akismet plugin, see http://plugins.trac.wordpress.org/ticket/1198 and the corresponding changeset http://plugins.trac.wordpress.org/changeset/294390
That query will almost certainly change before the plugin release, but it'll give you an idea of what we're trying to do and demonstrate that there's a practical need.
#9
@
13 years ago
Code for this recently hit the Akismet WP plugin and it makes for queries that look like:
SELECT COUNT( * ) FROM wp_comments WHERE user_id = 5 AND comment_approved = 1
and:
SELECT COUNT( * ) FROM wp_comments WHERE comment_author_email = 'joseph@josephscott.org' AND comment_author = 'Joseph Scott' AND comment_author_url = 'http://josephscott.org/' AND comment_approved = 1
The first is for comments left by users that were logged in, the second for those who weren't logged in. Unfortunately both of these queries do full table scans. Two new indexes that would help: user_id and comment_author_email (this seemed like the most likely to get a hit vs. author, author_url).
#12
@
10 years ago
- Component changed from Comments to Database
- Keywords 3.2-early removed
Related: #21435
#16
@
7 years ago
I found this ticket while looking for slow queries on a WooCommerce installation. This installation has about 2.5 million entries on wp_comments table (the majority of those entries are not WordPress comments but order notes, a custom comment type used by WooCommerce).
The query used by Akismet to get the number of approved comments made by a user (the same query that was previously mentioned in this ticket) takes about 1 second to run. On pages where this method is called multiple times, like the comments page and the dashboard, this significantly impacts the page load time. Adding a index to user_id field reduced the query time to about 0.001 seconds.
This sounds reasonable.
Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?