Opened 10 years ago
Closed 10 years ago
#33871 closed enhancement (fixed)
comment_exists() results in a bad query
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 4.4 | Priority: | normal |
Severity: | normal | Version: | 4.4 |
Component: | Comments | Keywords: | has-patch |
Focuses: | performance | Cc: |
Description
The function comment_exists()
which is located in src/wp-admin/includes/comment.php
composes a query that checks on two fields which are not included in any indexes in the database: comment_author
and comment_date
.
I am unaware of how often this function is used "in the wild" but I do know that extensive use of it is made in certain importer plugins (including the WXR importer)
For small comments tables this is not a signifigant problem (which is probably why it's flown under the radar for so long.) But for very large sites with considerable numbers of comments calling this function will result in a full table scan.
Simply using comment_date_gmt
instead of comment_date we get to make use of an index which should normally have very high cardinality (even on very large sites) making scanning on the found comment_author values a massively more performant operation.
Since I don't imagine it's safe to update the function itself for all sites I'm attatching a patch to add a new function, comment_exists_gmt()
to migrate to as a nearly drop-in replacement (only requiring the obvious change of having to supply the gmt date specifically)
Attachments (2)
Change History (12)
#1
@
10 years ago
- Component changed from Import to Comments
- Focuses performance added
- Keywords has-patch added
I think the PHPDoc should be updated to mention that comment_exists_gmt()
should be used where possible due to it's considerable speed improvements.
Nice work. :)
#3
@
10 years ago
- Milestone changed from Awaiting Review to 4.4
- Owner set to boonebgorges
- Status changed from new to assigned
#4
@
10 years ago
For the sake of brevity, what do we think about adding a $timezone
parameter to comment_exists()
? It would default to 'blog' - current behavior - but you could pass 'gmt' to look at comment_date_gmt
instead. See get_lastcommentmodified()
for a similar strategy.
#7
follow-up:
↓ 8
@
10 years ago
- Keywords needs-docs removed
33871.diff is what I have in mind. Thoughts, apokalyptik or Viper007Bond?
#8
in reply to:
↑ 7
@
10 years ago
Replying to boonebgorges:
33871.diff is what I have in mind. Thoughts, apokalyptik or Viper007Bond?
If you're going to go this route I think a ternary is needlessly complex. The following is more lines, but instantly understandable even to those that don't think in ternary (which is many people, and so a pay-it-forward to future maintainers). Huge +1 for tests, which are awesome!
$date_field = 'comment_date';
if ( 'gmt' === $timezone ) {
$date_field = 'comment_date_gmt';
}
add the function function comment_exists_gmt()