WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 5 months ago

#14711 new defect (bug)

Indexes for comment_author_email and user_id

Reported by: tellyworth Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 3.0
Component: Database Keywords:
Focuses: performance Cc:

Description

There are currently no indexes on any of the author columns in wp_comments. That means there's no efficient way to count the comments written by a given commenter, for example.

The enclosed patch adds two separate non-unique indexes, one each for comment_author_email and user_id.

Attachments (2)

comment-indexes-r15542.patch (970 bytes) - added by tellyworth 4 years ago.
comment-indexes-r15542-alt.patch (974 bytes) - added by tellyworth 4 years ago.
alternate patch with the comment_author_index size limited to 10 characters

Download all attachments as: .zip

Change History (15)

comment:1 follow-up: westi4 years ago

  • Cc westi added
  • Milestone changed from Awaiting Review to 3.1

This sounds reasonable.

Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?

comment:2 in reply to: ↑ 1 tellyworth4 years ago

Replying to westi:

This sounds reasonable.

Got any idea how long the table upgrade to add those might take on a site with alot of comments versus how much quicker they will make the queries?

Not the foggiest. I can see user_id being trivial. comment_author_email has the potential to be large in some circumstances. It might be worth limiting that index to 8 or 10 characters or thereabouts if that's a concern.

comment:3 follow-up: mrmist4 years ago

One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.

tellyworth4 years ago

alternate patch with the comment_author_index size limited to 10 characters

comment:4 in reply to: ↑ 3 ; follow-up: tellyworth4 years ago

Replying to mrmist:

One thing to beware of with that would be the low cardinatlity on user_id on sites with few / a single author. You could end up imposing drag on a query that pulls a lot of columns based on that.

There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?

comment:5 in reply to: ↑ 4 mrmist4 years ago

Replying to tellyworth:

There are already indexes on several columns that are or could be low cardinality (comment_approved, comment_parent, comment_post_ID). Will adding user_id make a significant difference?

Probably not, but then simply because we are already doing something isn't necessarily a good reason to do something else.

Note - I'm not dismissing the idea completely, just suggesting that impact of any new indexes should be tested across the various scenerios.

comment:6 mdawaffe4 years ago

  • Cc mdawaffe added

What are the actual queries you'd want to do here?

I imagine, for example, that you'd want to restrict by comment_approved, or order a list by date. In that case, it's not clear how much a single column index like that will help.

A query I would like to be able to do is: find the most recent comment by a given user/email.

SELECT * FROM wp_comments
    WHERE user_id = 1 AND comment_approved = 1
    ORDER BY comment_date_gmt DESC LIMIT 1

In that case, the best index would be the following.

user_id, comment_approved, comment_date_gmt

That index (and a similar one for comment_author_email) would also work for your COUNT queries (at least, the queries I'm imagining you want):

SELECT COUNT( * ) FROM wp_comments
    WHERE user_id = 1 AND comment_approved = 1;

SELECT COUNT( * ) FROM wp_comments
    WHERE comment_author_email = 'spammer@example.com'
    AND comment_approved = 'spam'

comment:7 hakre4 years ago

  • Keywords reporter-feedback added

@tellyworthy: Can you please prodvide example SQL queries you'd like to see supported (and reference the according operation in the application those queries are used in)?

comment:8 tellyworth4 years ago

@hakre: we'd like to use this in the Akismet plugin, see http://plugins.trac.wordpress.org/ticket/1198 and the corresponding changeset http://plugins.trac.wordpress.org/changeset/294390

That query will almost certainly change before the plugin release, but it'll give you an idea of what we're trying to do and demonstrate that there's a practical need.

comment:9 josephscott4 years ago

Code for this recently hit the Akismet WP plugin and it makes for queries that look like:

SELECT COUNT( * )
FROM wp_comments
WHERE user_id = 5
AND comment_approved = 1

and:

SELECT COUNT( * )
FROM wp_comments
WHERE comment_author_email = 'joseph@josephscott.org'
AND comment_author = 'Joseph Scott'
AND comment_author_url = 'http://josephscott.org/'
AND comment_approved = 1

The first is for comments left by users that were logged in, the second for those who weren't logged in. Unfortunately both of these queries do full table scans. Two new indexes that would help: user_id and comment_author_email (this seemed like the most likely to get a hit vs. author, author_url).

comment:10 ryan4 years ago

  • Keywords 3.2-early added
  • Milestone changed from 3.1 to Future Release

comment:11 ocean904 years ago

  • Keywords reporter-feedback removed

comment:12 pento5 months ago

  • Component changed from Comments to Database
  • Keywords 3.2-early removed

Related: #21435

Last edited 5 months ago by pento (previous) (diff)

comment:13 SergeyBiryukov5 months ago

  • Focuses performance added
Note: See TracTickets for help on using tickets.