Make WordPress Core

Opened 10 years ago

Closed 10 years ago

#33871 closed enhancement (fixed)

comment_exists() results in a bad query

Reported by: apokalyptik's profile apokalyptik Owned by: boonebgorges's profile boonebgorges
Milestone: 4.4 Priority: normal
Severity: normal Version: 4.4
Component: Comments Keywords: has-patch
Focuses: performance Cc:

Description

The function comment_exists() which is located in src/wp-admin/includes/comment.php composes a query that checks on two fields which are not included in any indexes in the database: comment_author and comment_date.

I am unaware of how often this function is used "in the wild" but I do know that extensive use of it is made in certain importer plugins (including the WXR importer)

For small comments tables this is not a signifigant problem (which is probably why it's flown under the radar for so long.) But for very large sites with considerable numbers of comments calling this function will result in a full table scan.

Simply using comment_date_gmt instead of comment_date we get to make use of an index which should normally have very high cardinality (even on very large sites) making scanning on the found comment_author values a massively more performant operation.

Since I don't imagine it's safe to update the function itself for all sites I'm attatching a patch to add a new function, comment_exists_gmt() to migrate to as a nearly drop-in replacement (only requiring the obvious change of having to supply the gmt date specifically)

Attachments (2)

wp-admin--includes--comments.php.diff (1005 bytes) - added by apokalyptik 10 years ago.
add the function function comment_exists_gmt()
33871.diff (3.7 KB) - added by boonebgorges 10 years ago.

Download all attachments as: .zip

Change History (12)

@apokalyptik
10 years ago

add the function function comment_exists_gmt()

#1 @Viper007Bond
10 years ago

  • Component changed from Import to Comments
  • Focuses performance added
  • Keywords has-patch added

I think the PHPDoc should be updated to mention that comment_exists_gmt() should be used where possible due to it's considerable speed improvements.

Nice work. :)

#2 @DrewAPicture
10 years ago

  • Keywords needs-docs added

#3 @wonderboymusic
10 years ago

  • Milestone changed from Awaiting Review to 4.4
  • Owner set to boonebgorges
  • Status changed from new to assigned

#4 @boonebgorges
10 years ago

For the sake of brevity, what do we think about adding a $timezone parameter to comment_exists()? It would default to 'blog' - current behavior - but you could pass 'gmt' to look at comment_date_gmt instead. See get_lastcommentmodified() for a similar strategy.

#5 @boonebgorges
10 years ago

In 34456:

Add unit test for comment_exists().

See #33871.

#6 @boonebgorges
10 years ago

In 34457:

Fix comment_exists() unit test introduced in [34456].

See #33871.

@boonebgorges
10 years ago

#7 follow-up: @boonebgorges
10 years ago

  • Keywords needs-docs removed

33871.diff is what I have in mind. Thoughts, apokalyptik or Viper007Bond?

#8 in reply to: ↑ 7 @apokalyptik
10 years ago

Replying to boonebgorges:

33871.diff is what I have in mind. Thoughts, apokalyptik or Viper007Bond?

If you're going to go this route I think a ternary is needlessly complex. The following is more lines, but instantly understandable even to those that don't think in ternary (which is many people, and so a pay-it-forward to future maintainers). Huge +1 for tests, which are awesome!

$date_field  = 'comment_date';
if ( 'gmt' === $timezone ) {
    $date_field  = 'comment_date_gmt';
}

#9 @boonebgorges
10 years ago

I don't care about the ternary :) Thanks for the feedback.

#10 @boonebgorges
10 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 34460:

Allow comment_exists() to match based on GMT date.

The comment_date_gmt field of the wp_comments table is indexed, which makes
WHERE matches against the field much faster than against the unindexed
comment_date. For bulk operations like data import, the speed difference can
be meaningful.

We continue to default to 'blog' for $timezone, to preserve compatibility
with existing uses.

Props apokalyptik.
Fixes #33871.

Note: See TracTickets for help on using tickets.