Make WordPress Core

Opened 13 years ago

Last modified 5 years ago

#14711 new defect (bug)

Indexes for comment_author_email and user_id

Reported by: tellyworth's profile tellyworth Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.0
Component: Database Keywords: needs-patch
Focuses: performance Cc:

Description

There are currently no indexes on any of the author columns in wp_comments. That means there's no efficient way to count the comments written by a given commenter, for example.

The enclosed patch adds two separate non-unique indexes, one each for comment_author_email and user_id.

Attachments (2)

comment-indexes-r15542.patch (970 bytes) - added by tellyworth 13 years ago.
comment-indexes-r15542-alt.patch (974 bytes) - added by tellyworth 13 years ago.
alternate patch with the comment_author_index size limited to 10 characters

Download all attachments as: .zip

Change History (19)

#1 follow-up: @westi
13 years ago

  • Cc westi added
  • Milestone changed from Awaiting Review to 3.1

This sounds reasonable.

Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?

#2 in reply to: ↑ 1 @tellyworth
13 years ago

Replying to westi:

This sounds reasonable.

Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?

Not the foggiest. I can see user_id being trivial. comment_author_email has the potential to be large in some circumstances. It might be worth limiting that index to 8 or 10 characters or thereabouts if that's a concern.

#3 follow-up: @mrmist
13 years ago

One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.

@tellyworth
13 years ago

alternate patch with the comment_author_index size limited to 10 characters

#4 in reply to: ↑ 3 ; follow-up: @tellyworth
13 years ago

Replying to mrmist:

One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.

There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?

#5 in reply to: ↑ 4 @mrmist
13 years ago

Replying to tellyworth:

There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?

Probably not, but then simply because we are already doing something isn't necessarily a good reason to do something else.

Note - I'm not dismissing the idea completely, just suggesting that impact of any new indexes should be tested across the various scenerios.

#6 @mdawaffe
13 years ago

  • Cc mdawaffe added

What are the actual queries you'd want to do here?

I imagine, for example, that you'd want to restrict by comment_approved, or order a list by date. In that case, it's not clear how much a single column index like that will help.

A query I would like to be able to do is: find the most recent comment by a given user/email.

SELECT * FROM wp_comments
    WHERE user_id = 1 AND comment_approved = 1
    ORDER BY comment_date_gmt DESC LIMIT 1

In that case, the best index would be the following.

user_id, comment_approved, comment_date_gmt

That index (and a similar one for comment_author_email) would also work for your COUNT queries (at least, the queries I'm imagining you want):

SELECT COUNT( * ) FROM wp_comments
    WHERE user_id = 1 AND comment_approved = 1;

SELECT COUNT( * ) FROM wp_comments
    WHERE comment_author_email = 'spammer@example.com'
    AND comment_approved = 'spam'

#7 @hakre
13 years ago

  • Keywords reporter-feedback added

@tellyworthy: Can you please prodvide example SQL queries you'd like to see supported (and reference the according operation in the application those queries are used in)?

#8 @tellyworth
13 years ago

@hakre: we'd like to use this in the Akismet plugin, see http://plugins.trac.wordpress.org/ticket/1198 and the corresponding changeset http://plugins.trac.wordpress.org/changeset/294390

That query will almost certainly change before the plugin release, but it'll give you an idea of what we're trying to do and demonstrate that there's a practical need.

#9 @josephscott
13 years ago

Code for this recently hit the Akismet WP plugin and it makes for queries that look like:

SELECT COUNT( * )
FROM wp_comments
WHERE user_id = 5
AND comment_approved = 1

and:

SELECT COUNT( * )
FROM wp_comments
WHERE comment_author_email = 'joseph@josephscott.org'
AND comment_author = 'Joseph Scott'
AND comment_author_url = 'http://josephscott.org/'
AND comment_approved = 1

The first is for comments left by users that were logged in, the second for those who weren't logged in. Unfortunately both of these queries do full table scans. Two new indexes that would help: user_id and comment_author_email (this seemed like the most likely to get a hit vs. author, author_url).

#10 @ryan
13 years ago

  • Keywords 3.2-early added
  • Milestone changed from 3.1 to Future Release

#11 @ocean90
13 years ago

  • Keywords reporter-feedback removed

#12 @pento
10 years ago

  • Component changed from Comments to Database
  • Keywords 3.2-early removed

Related: #21435

Last edited 10 years ago by pento (previous) (diff)

#13 @SergeyBiryukov
10 years ago

  • Focuses performance added

#14 @chriscct7
8 years ago

  • Keywords needs-patch added

#16 @rodrigosprimo
7 years ago

I found this ticket while looking for slow queries on a WooCommerce installation. This installation has about 2.5 million entries on wp_comments table (the majority of those entries are not WordPress comments but order notes, a custom comment type used by WooCommerce).

The query used by Akismet to get the number of approved comments made by a user (the same query that was previously mentioned in this ticket) takes about 1 second to run. On pages where this method is called multiple times, like the comments page and the dashboard, this significantly impacts the page load time. Adding a index to user_id field reduced the query time to about 0.001 seconds.

#17 @lukecavanagh
7 years ago

@rodrigosprimo

Also WooCommerce product reviews are stored in the comments table.

Note: See TracTickets for help on using tickets.