WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 5 days ago

#19901 accepted enhancement

Speeding up Dashboard and Comment moderation SQL load

Reported by: FolioVision Owned by: markjaquith
Milestone: Future Release Priority: normal
Severity: major Version: 3.3
Component: Comments Keywords: needs-testing dev-feedback needs-refresh
Focuses: performance Cc:

Description

The standard Wordpress function for counting the comments for Admin Bar and Dashboard named wp_count_comments is using a single SQL query with GROUP BY clause. That makes it slow on a large site with hundreds of thousands of comments.

SELECT comment_approved, COUNT(*) AS num_comments FROM wp_comments GROUP BY comment_approved;

This takes 0.3 seconds on our site with 400,000 comments. When there are 10 editors logged in, we can see increasing server load.

Our solution is to run 5 faster queries instead:

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'trash'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'spam'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = '0'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'post-trash'
SELECT COUNT( comment_ID ) FROM wp_comments

Takes 0.042144 on the same site. The last query gets the number of all the comments, then we subtract the previous query totals to get number of approved comments.

On a database of 4 million comments the difference is 1.52 seconds for the original wp_count_comments and 0.01 seconds for our alternative count with 5 queries.

Here is a link to our quick piece of code which hooks to the required filter hook and replaces the original slow function wp_count_comments: http://foliovision.com/downloads/fv_wp_count_comments.php.txt

But this is a hack - it would be much better to fix this in core by replacing the existing slow queries with 5 fast ones and subtraction to get total approved comments.

This speedup can be very important on large sites, as often there are 10 or more writers and moderators working at the same time. What can happen with the existing code is that the slow count comments query can back up MySQL and then writers can no longer save or open posts to edit. They get very, very frustrated and even angry.

This fix will allow Wordpress to scale much larger on relatively modest hardware (no separate MySQL dual quad server).

Thanks for listening.

Martin

Attachments (2)

19901.diff (2.2 KB) - added by markjaquith 4 years ago.
union_count_comments.php (1.5 KB) - added by ComputerGuru 5 months ago.
Using UNION to update comment counts

Download all attachments as: .zip

Change History (26)

#1 @nacin
4 years ago

  • Keywords needs-patch added; wp_count_comments sql dashboard speed performance removed
  • Milestone changed from Awaiting Review to 3.4

Nice!

#2 @markjaquith
4 years ago

  • Owner set to markjaquith
  • Status changed from new to accepted

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

On a small site (50,000 comments), there was still a speedup, going from 50ms to 10ms. So I think we should do this for every site, not just sites with large numbers of comments.

I'm going ahead with a patch.

#3 follow-up: @ryan
4 years ago

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

#4 follow-up: @ryan
4 years ago

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

#5 in reply to: ↑ 4 @ryan
4 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

For the last query that doesn't have a WHERE.

#6 @ryan
4 years ago

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

#7 follow-up: @coffee2code
4 years ago

Replying to markjaquith:

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

+100

#8 in reply to: ↑ 3 @FolioVision
4 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Even if there are alternative comment_status states, it shouldn't make much difference as those alternative states are not counted anyway in the core stats. But definitely it would be a better idea to have registration for comment_status.

Alec

#9 in reply to: ↑ 7 ; follow-up: @markjaquith
4 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Could do. But until then, it's not a big deal if the "approved" number contains any custom ones. We really don't support custom comment_approved statuses... I think I tried doing it once and there were multiple places where it broke. So I don't think that concern should hold us back on improving count performance.

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

Replying to ryan:

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

We already have one:

	$stats = apply_filters('wp_count_comments', array(), $post_id);
	if ( !empty($stats) )
		return $stats;

Replying to coffee2code:

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

That only gives you the total. We need the count for each comment_approved status.

@markjaquith
4 years ago

#10 @markjaquith
4 years ago

  • Keywords has-patch 2nd-opinion needs-testing added; needs-patch removed
  • Severity changed from critical to major

Took a swing at it.

#11 @markjaquith
4 years ago

We can also speed up comment queries in "count" mode (i.e. the mode where it just returns the count) if we determine that the query type is one that is handled by wp_count_comments().

Example, within WP_Comment_Query::query()

		if ( $count ) {
			if ( $clauses['join']   == '' &&
			     $clauses['where']  == "( comment_approved = '0' OR comment_approved = '1' )" &&
			     $clauses['limits'] == '' ) {
				// We can return a much faster result for this specific case (default view)
				return wp_count_comments()->approved + wp_count_comments()->moderated;
			}
			return $wpdb->get_var( $query );
		}

That query can also take a few seconds with a large number of queries.

#13 @ramiy
4 years ago

Related: #17275

#14 in reply to: ↑ 9 @ryan
4 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

With 2,488,928 rows and the comments table on SSDs, the old query ran in 1.2449 seconds. With the new way, the last query alone takes 1.9485 seconds. Given that someone with this many rows will probably be using innodb instead of myisam to avoid table locking difficulties, the new queries make things worse. Further, MySQL 5.6 defaults to innodb. Of course, no one uses that yet. :-)

That's not to say the patch shouldn't go in. Just keep it in mind.

#15 @wonderboymusic
4 years ago

I just ran these queries on eMusic's production database cluster. Our wp_comments table (InnoDB) has only 300,000 rows -

original query: 161ms
5 queries: 98.2ms

I think most large-scale MySQL users are defaulting to InnoDB, if for no reason other than row-level locking. Facebook swears by InnoDB and their MySQL perf team is focusing a lot on InnoDB row cache and InnoDB buffer pool for Timeline (which is stored in MySQL).

This is a nice improvement, but there are way more dangerous queries in core related to full-table scans of LONGTEXT columns. Our wp_usermeta table has 11,000,000 rows and gets pounded by them - but that's off-topic...

#16 @ryan
4 years ago

  • Milestone changed from 3.4 to Future Release

#17 @nacin
2 years ago

  • Component changed from Performance to Comments
  • Focuses performance added

#18 @chriscct7
9 months ago

  • Keywords dev-feedback needs-refresh added; has-patch 2nd-opinion removed
  • Severity changed from major to normal
  • Version changed from 3.3.1 to 3.3

#19 @ComputerGuru
5 months ago

I was going through my old plugins and found one I'd made of this ticket many, many years ago. I've updated the code and verified that it remains consistent with the current behavior of wp_count_comments as of WP 4.4.2.

I'm actually not sure about the WordPress policy when it comes to using MySQL UNIONs, but I've changed (at least for myself) the behavior of the patch to issue all 4 queries as one UNION ALL query to avoid the round-trip to the db.

The query as it now stands:

(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'trash') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'spam') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = '0') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'post-trash') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments);

I'm attaching the revised "plugin" to this ticket.

@ComputerGuru
5 months ago

Using UNION to update comment counts

#20 @ComputerGuru
5 months ago

Oh, I forgot to mention - I also cleaned up the code significantly by just copy-and-pasting the inner loop from the original get_comment_count() method.

#21 @FolioVision
11 days ago

  • Severity changed from normal to major

@markjaquith @nacin

Has our patch for large comment groups been integrated yet?

Seems a bit to have isolated a 150x performance and have it fail in the face of excuses like this:

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

Why should we be hacking and reducing functionality instead of fixing when the fix is handed to you on a silver platter?

or even weaker arguments:

This is a nice improvement, but there are way more dangerous queries in core related to full-table scans of LONGTEXT columns.

A nice improvement should be ignored as there are worse queries. Well let's start by fixing this one and then fix the other ones too.

The waffling about a 50% performance hit on the non-default InnoDB configuration in the face of a 150x increase on the default MyISAM is also embarassing.

Gentlemen (there are no women on this ticket), 4 years ago we gave Core WordPress a 150x improvement and you still four years later can't implement it?

Is it any wonder you have trouble getting coders not "in the club" to contribute to core WordPress? If I sound frustrated, I am. I've spent ten years of my life working on WordPress and we can't improve performance and security enough due to silly internal bickering like this.

Sincerely, Alec Kinnear

Last edited 11 days ago by FolioVision (previous) (diff)

#22 @FolioVision
10 days ago

Since InnoDB is the default database engine of MySQL since version 5.5.5 we did our test again on a site with 400,000 comments.

MyISAM:

The single query: 0.24-0.28 seconds
Using multiple queries (our fix): 0.10-0.14 seconds

InnoDB:

The single query: 0.15-0.20 seconds
Using multiple queries: 0.1 - 0.16 seconds

Another test on a site with 826,000 comments running on WP Engine with InnoDB:

The single query: 0.26-0.30 seconds
Using multiple queries: 0.18 - 0.22 seconds

So that contradicts what @ryan posted: https://core.trac.wordpress.org/ticket/19901#comment:14 The solution with multiple queries is faster than a single query.

I'm also noticing there is another query which looks kind-of like a duplicate:

SELECT COUNT(*) FROM wp_comments  WHERE ( ( comment_approved = '0' OR comment_approved = '1' ) )  ORDER BY wp_comments.comment_date_gmt DESC

It's called by WP_Comment_Query. I'm not sure if it's used for paging or something like that. Normally it doesn't slow down the site much, as that table is already in cache, but it's something to look into.

Thanks,
Martin Vicenik

Last edited 10 days ago by FolioVision (previous) (diff)

#23 @jb510
10 days ago

Just another data point (wp_comments w/ 280,000 rows)
The existing single query took 0.14s
The multiple queries with unions took 0.09s

InnoDB table and MySQL 5.6

Last edited 10 days ago by jb510 (previous) (diff)

#24 @SergeyBiryukov
5 days ago

Needs a refreshed patch after [33822].

Note: See TracTickets for help on using tickets.