WordPress.org

Make WordPress Core

Opened 2 years ago

Last modified 3 months ago

#19901 accepted enhancement

Speeding up Dashboard and Comment moderation SQL load

Reported by: FolioVision Owned by: markjaquith
Milestone: Future Release Priority: normal
Severity: major Version: 3.3.1
Component: Comments Keywords: has-patch 2nd-opinion needs-testing
Focuses: performance Cc:

Description

The standard Wordpress function for counting the comments for Admin Bar and Dashboard named wp_count_comments is using a single SQL query with GROUP BY clause. That makes it slow on a large site with hundreds of thousands of comments.

SELECT comment_approved, COUNT(*) AS num_comments FROM wp_comments GROUP BY comment_approved;

This takes 0.3 seconds on our site with 400,000 comments. When there are 10 editors logged in, we can see increasing server load.

Our solution is to run 5 faster queries instead:

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'trash'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'spam'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = '0'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'post-trash'
SELECT COUNT( comment_ID ) FROM wp_comments

Takes 0.042144 on the same site. The last query gets the number of all the comments, then we subtract the previous query totals to get number of approved comments.

On a database of 4 million comments the difference is 1.52 seconds for the original wp_count_comments and 0.01 seconds for our alternative count with 5 queries.

Here is a link to our quick piece of code which hooks to the required filter hook and replaces the original slow function wp_count_comments: http://foliovision.com/downloads/fv_wp_count_comments.php.txt

But this is a hack - it would be much better to fix this in core by replacing the existing slow queries with 5 fast ones and subtraction to get total approved comments.

This speedup can be very important on large sites, as often there are 10 or more writers and moderators working at the same time. What can happen with the existing code is that the slow count comments query can back up MySQL and then writers can no longer save or open posts to edit. They get very, very frustrated and even angry.

This fix will allow Wordpress to scale much larger on relatively modest hardware (no separate MySQL dual quad server).

Thanks for listening.

Martin

Attachments (1)

19901.diff (2.2 KB) - added by markjaquith 2 years ago.

Download all attachments as: .zip

Change History (18)

comment:1 nacin2 years ago

  • Keywords needs-patch added; wp_count_comments sql dashboard speed performance removed
  • Milestone changed from Awaiting Review to 3.4

Nice!

comment:2 markjaquith2 years ago

  • Owner set to markjaquith
  • Status changed from new to accepted

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

On a small site (50,000 comments), there was still a speedup, going from 50ms to 10ms. So I think we should do this for every site, not just sites with large numbers of comments.

I'm going ahead with a patch.

comment:3 follow-up: ryan2 years ago

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

comment:4 follow-up: ryan2 years ago

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

comment:5 in reply to: ↑ 4 ryan2 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

For the last query that doesn't have a WHERE.

comment:6 ryan2 years ago

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

comment:7 follow-up: coffee2code2 years ago

Replying to markjaquith:

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

+100

comment:8 in reply to: ↑ 3 FolioVision2 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Even if there are alternative comment_status states, it shouldn't make much difference as those alternative states are not counted anyway in the core stats. But definitely it would be a better idea to have registration for comment_status.

Alec

comment:9 in reply to: ↑ 7 ; follow-up: markjaquith2 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Could do. But until then, it's not a big deal if the "approved" number contains any custom ones. We really don't support custom comment_approved statuses... I think I tried doing it once and there were multiple places where it broke. So I don't think that concern should hold us back on improving count performance.

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

Replying to ryan:

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

We already have one:

	$stats = apply_filters('wp_count_comments', array(), $post_id);
	if ( !empty($stats) )
		return $stats;

Replying to coffee2code:

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

That only gives you the total. We need the count for each comment_approved status.

markjaquith2 years ago

comment:10 markjaquith2 years ago

  • Keywords has-patch 2nd-opinion needs-testing added; needs-patch removed
  • Severity changed from critical to major

Took a swing at it.

comment:11 markjaquith2 years ago

We can also speed up comment queries in "count" mode (i.e. the mode where it just returns the count) if we determine that the query type is one that is handled by wp_count_comments().

Example, within WP_Comment_Query::query()

		if ( $count ) {
			if ( $clauses['join']   == '' &&
			     $clauses['where']  == "( comment_approved = '0' OR comment_approved = '1' )" &&
			     $clauses['limits'] == '' ) {
				// We can return a much faster result for this specific case (default view)
				return wp_count_comments()->approved + wp_count_comments()->moderated;
			}
			return $wpdb->get_var( $query );
		}

That query can also take a few seconds with a large number of queries.

comment:13 ramiy2 years ago

Related: #17275

comment:14 in reply to: ↑ 9 ryan2 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

With 2,488,928 rows and the comments table on SSDs, the old query ran in 1.2449 seconds. With the new way, the last query alone takes 1.9485 seconds. Given that someone with this many rows will probably be using innodb instead of myisam to avoid table locking difficulties, the new queries make things worse. Further, MySQL 5.6 defaults to innodb. Of course, no one uses that yet. :-)

That's not to say the patch shouldn't go in. Just keep it in mind.

comment:15 wonderboymusic2 years ago

I just ran these queries on eMusic's production database cluster. Our wp_comments table (InnoDB) has only 300,000 rows -

original query: 161ms
5 queries: 98.2ms

I think most large-scale MySQL users are defaulting to InnoDB, if for no reason other than row-level locking. Facebook swears by InnoDB and their MySQL perf team is focusing a lot on InnoDB row cache and InnoDB buffer pool for Timeline (which is stored in MySQL).

This is a nice improvement, but there are way more dangerous queries in core related to full-table scans of LONGTEXT columns. Our wp_usermeta table has 11,000,000 rows and gets pounded by them - but that's off-topic...

comment:16 ryan2 years ago

  • Milestone changed from 3.4 to Future Release

comment:17 nacin3 months ago

  • Component changed from Performance to Comments
  • Focuses performance added
Note: See TracTickets for help on using tickets.