Make WordPress Core

Opened 12 years ago

Closed 2 years ago

Last modified 16 months ago

#19901 closed enhancement (fixed)

Speeding up Dashboard and Comment moderation SQL load

Reported by: foliovision's profile FolioVision Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.0 Priority: normal
Severity: major Version: 3.3
Component: Comments Keywords: needs-testing has-patch commit assigned-for-commit add-to-field-guide
Focuses: performance Cc:

Description

The standard Wordpress function for counting the comments for Admin Bar and Dashboard named wp_count_comments is using a single SQL query with GROUP BY clause. That makes it slow on a large site with hundreds of thousands of comments.

SELECT comment_approved, COUNT(*) AS num_comments FROM wp_comments GROUP BY comment_approved;

This takes 0.3 seconds on our site with 400,000 comments. When there are 10 editors logged in, we can see increasing server load.

Our solution is to run 5 faster queries instead:

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'trash'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'spam'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = '0'
SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'post-trash'
SELECT COUNT( comment_ID ) FROM wp_comments

Takes 0.042144 on the same site. The last query gets the number of all the comments, then we subtract the previous query totals to get number of approved comments.

On a database of 4 million comments the difference is 1.52 seconds for the original wp_count_comments and 0.01 seconds for our alternative count with 5 queries.

Here is a link to our quick piece of code which hooks to the required filter hook and replaces the original slow function wp_count_comments: http://foliovision.com/downloads/fv_wp_count_comments.php.txt

But this is a hack - it would be much better to fix this in core by replacing the existing slow queries with 5 fast ones and subtraction to get total approved comments.

This speedup can be very important on large sites, as often there are 10 or more writers and moderators working at the same time. What can happen with the existing code is that the slow count comments query can back up MySQL and then writers can no longer save or open posts to edit. They get very, very frustrated and even angry.

This fix will allow Wordpress to scale much larger on relatively modest hardware (no separate MySQL dual quad server).

Thanks for listening.

Martin

Attachments (4)

19901.diff (2.2 KB) - added by markjaquith 12 years ago.
union_count_comments.php (1.5 KB) - added by ComputerGuru 8 years ago.
Using UNION to update comment counts
19901.2.diff (7.6 KB) - added by FolioVision 7 years ago.
19901.3.diff (954 bytes) - added by peterwilsoncc 2 years ago.

Download all attachments as: .zip

Change History (45)

#1 @nacin
12 years ago

  • Keywords needs-patch added; wp_count_comments sql dashboard speed performance removed
  • Milestone changed from Awaiting Review to 3.4

Nice!

#2 @markjaquith
12 years ago

  • Owner set to markjaquith
  • Status changed from new to accepted

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

On a small site (50,000 comments), there was still a speedup, going from 50ms to 10ms. So I think we should do this for every site, not just sites with large numbers of comments.

I'm going ahead with a patch.

#3 follow-up: @ryan
12 years ago

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

#4 follow-up: @ryan
12 years ago

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

#5 in reply to: ↑ 4 @ryan
12 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.

For the last query that doesn't have a WHERE.

#6 @ryan
12 years ago

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

#7 follow-up: @coffee2code
12 years ago

Replying to markjaquith:

I validated these results. The large dataset was 4.7 million comments. It's much, much faster to query the individual approval statuses. It seems that statuses with more members take longer, so inferring the "1" result by subtracting the other results from the total is a nice speedup.

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

+100

#8 in reply to: ↑ 3 @FolioVision
12 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Even if there are alternative comment_status states, it shouldn't make much difference as those alternative states are not counted anyway in the core stats. But definitely it would be a better idea to have registration for comment_status.

Alec

#9 in reply to: ↑ 7 ; follow-up: @markjaquith
12 years ago

Replying to ryan:

Plugins can set comment_status to anything. I think we'd need to introduce comment status registration, like we do with post status.

Could do. But until then, it's not a big deal if the "approved" number contains any custom ones. We really don't support custom comment_approved statuses... I think I tried doing it once and there were multiple places where it broke. So I don't think that concern should hold us back on improving count performance.

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

Replying to ryan:

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

We already have one:

	$stats = apply_filters('wp_count_comments', array(), $post_id);
	if ( !empty($stats) )
		return $stats;

Replying to coffee2code:

Any performance difference in doing an IN() as opposed to separate queries?

SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved IN ( 'trash', 'spam', '0', 'post-trash' );

Could then throw a filter on the comment statuses prior to inclusion in the IN() so non-public comments can be omitted. Which somewhat ties in with...

That only gives you the total. We need the count for each comment_approved status.

@markjaquith
12 years ago

#10 @markjaquith
12 years ago

  • Keywords has-patch 2nd-opinion needs-testing added; needs-patch removed
  • Severity changed from critical to major

Took a swing at it.

#11 @markjaquith
12 years ago

We can also speed up comment queries in "count" mode (i.e. the mode where it just returns the count) if we determine that the query type is one that is handled by wp_count_comments().

Example, within WP_Comment_Query::query()

		if ( $count ) {
			if ( $clauses['join']   == '' &&
			     $clauses['where']  == "( comment_approved = '0' OR comment_approved = '1' )" &&
			     $clauses['limits'] == '' ) {
				// We can return a much faster result for this specific case (default view)
				return wp_count_comments()->approved + wp_count_comments()->moderated;
			}
			return $wpdb->get_var( $query );
		}

That query can also take a few seconds with a large number of queries.

#13 @ramiy
12 years ago

Related: #17275

#14 in reply to: ↑ 9 @ryan
12 years ago

Replying to ryan:

Is this with myisam or innodb? innodb will count the rows and possibly be slower.
For the last query that doesn't have a WHERE.

Tested on standard MyISAM, which is what we should be optimizing for. Can you test on one of the big wpcom InnoDB comment tables and post numbers? Be sure to query with SELECT SQL_NO_CACHE ... to disable qcache.

With 2,488,928 rows and the comments table on SSDs, the old query ran in 1.2449 seconds. With the new way, the last query alone takes 1.9485 seconds. Given that someone with this many rows will probably be using innodb instead of myisam to avoid table locking difficulties, the new queries make things worse. Further, MySQL 5.6 defaults to innodb. Of course, no one uses that yet. :-)

That's not to say the patch shouldn't go in. Just keep it in mind.

#15 @wonderboymusic
12 years ago

I just ran these queries on eMusic's production database cluster. Our wp_comments table (InnoDB) has only 300,000 rows -

original query: 161ms
5 queries: 98.2ms

I think most large-scale MySQL users are defaulting to InnoDB, if for no reason other than row-level locking. Facebook swears by InnoDB and their MySQL perf team is focusing a lot on InnoDB row cache and InnoDB buffer pool for Timeline (which is stored in MySQL).

This is a nice improvement, but there are way more dangerous queries in core related to full-table scans of LONGTEXT columns. Our wp_usermeta table has 11,000,000 rows and gets pounded by them - but that's off-topic...

#16 @ryan
12 years ago

  • Milestone changed from 3.4 to Future Release

#17 @nacin
10 years ago

  • Component changed from Performance to Comments
  • Focuses performance added

#18 @chriscct7
8 years ago

  • Keywords dev-feedback needs-refresh added; has-patch 2nd-opinion removed
  • Severity changed from major to normal
  • Version changed from 3.3.1 to 3.3

#19 @ComputerGuru
8 years ago

I was going through my old plugins and found one I'd made of this ticket many, many years ago. I've updated the code and verified that it remains consistent with the current behavior of wp_count_comments as of WP 4.4.2.

I'm actually not sure about the WordPress policy when it comes to using MySQL UNIONs, but I've changed (at least for myself) the behavior of the patch to issue all 4 queries as one UNION ALL query to avoid the round-trip to the db.

The query as it now stands:

(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'trash') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'spam') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = '0') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments WHERE comment_approved = 'post-trash') UNION ALL
(SELECT COUNT( comment_ID ) FROM wp_comments);

I'm attaching the revised "plugin" to this ticket.

@ComputerGuru
8 years ago

Using UNION to update comment counts

#20 @ComputerGuru
8 years ago

Oh, I forgot to mention - I also cleaned up the code significantly by just copy-and-pasting the inner loop from the original get_comment_count() method.

#21 @FolioVision
8 years ago

  • Severity changed from normal to major

@markjaquith @nacin

Has our patch for large comment groups been integrated yet?

Seems a bit to have isolated a 150x performance and have it fail in the face of excuses like this:

Also, a hack I've seen done for sites with millions of comments and high comment volume is to just not do the count queries. Comment counts become rather meaningless for such sites. Maybe worth a throttle or don't care switch. Maybe not.

Why should we be hacking and reducing functionality instead of fixing when the fix is handed to you on a silver platter?

or even weaker arguments:

This is a nice improvement, but there are way more dangerous queries in core related to full-table scans of LONGTEXT columns.

A nice improvement should be ignored as there are worse queries. Well let's start by fixing this one and then fix the other ones too.

The waffling about a 50% performance hit on the non-default InnoDB configuration in the face of a 150x increase on the default MyISAM is also embarassing.

Gentlemen (there are no women on this ticket), 4 years ago we gave Core WordPress a 150x improvement and you still four years later can't implement it? It looks like Ryan's main role here is to shoot down external contributions.

Is it any wonder you have trouble getting coders not "in the club" to contribute to core WordPress?

Sincerely, Alec Kinnear

Version 0, edited 8 years ago by FolioVision (next)

#22 @FolioVision
8 years ago

Since InnoDB is the default database engine of MySQL since version 5.5.5 we did our test again on a site with 400,000 comments.

MyISAM:

The single query: 0.24-0.28 seconds
Using multiple queries (our fix): 0.10-0.14 seconds

InnoDB:

The single query: 0.15-0.20 seconds
Using multiple queries: 0.1 - 0.16 seconds

Another test on a site with 826,000 comments running on WP Engine with InnoDB:

The single query: 0.26-0.30 seconds
Using multiple queries: 0.18 - 0.22 seconds

So that contradicts what @ryan posted: https://core.trac.wordpress.org/ticket/19901#comment:14 The solution with multiple queries is faster than a single query.

I'm also noticing there is another query which looks kind-of like a duplicate:

SELECT COUNT(*) FROM wp_comments  WHERE ( ( comment_approved = '0' OR comment_approved = '1' ) )  ORDER BY wp_comments.comment_date_gmt DESC

It's called by WP_Comment_Query. I'm not sure if it's used for paging or something like that. Normally it doesn't slow down the site much, as that table is already in cache, but it's something to look into.

Thanks,
Martin Vicenik

Last edited 8 years ago by FolioVision (previous) (diff)

#23 @jb510
8 years ago

Just another data point (wp_comments w/ 280,000 rows)
The existing single query took 0.14s
The multiple queries with unions took 0.09s

InnoDB table and MySQL 5.6

Last edited 8 years ago by jb510 (previous) (diff)

#24 @SergeyBiryukov
8 years ago

Needs a refreshed patch after [33822].

@FolioVision
7 years ago

#25 @FolioVision
7 years ago

  • Keywords needs-refresh removed

In the patch which I just added the current logic for comment counting was moved into the if ( $post_id > 0 ) { statement to keep the logic the same when querying the number of comments for a single post and in else there is the new logic for a global comment count - which uses multiple SQL queries as discussed.

I tested the old code (using group by) and the comments were counted in 0.415 seconds. With the new code it's 0.267 seconds. I did 5 test repetitions for each case using MySQL with InnoDB engine and no SQL caching (we can't count on it as on a live website there are query results to store there for front-end etc.). The number of comments was 916,480. On a server with high load these numbers would jump up much higher, so the savings here worth it - imagine it's 2.67 seconds instead of 4.15 seconds or worse.

When using with a specific $post_id there are no speed benefits, so it's probably better to just keep the SQL count down for it, as this function might be also used also in front-end on comment submission etc.

#26 @Znuff
4 years ago

I've been scouring the whole trac for this particular issue, and I'm surprised that after all these years (this ticket is _8_ years old, #32366 is "only" 4, and there are others that are around 5-6 years old).

Once again, I am baffled that nobody really wants to tackle this issue and fix it in the core.

Right now I am working on a large-ish site with around ~1mil comments, and I've stumbled upon this specific part of the dashboard.

More specifically, I'm surprised by this piece of code:

<?php
/**
 * Add edit comments link with awaiting moderation count bubble.
 *
 * @since 3.1.0
 *
 * @param WP_Admin_Bar $wp_admin_bar
 */
function wp_admin_bar_comments_menu( $wp_admin_bar ) {
  if ( ! current_user_can( 'edit_posts' ) ) {
    return;
  }

  $awaiting_mod  = wp_count_comments();
  $awaiting_mod  = $awaiting_mod->moderated;
  $awaiting_text = sprintf(
    /* translators: %s: number of comments in moderation */
    _n( '%s Comment in moderation', '%s Comments in moderation', $awaiting_mod ),
    number_format_i18n( $awaiting_mod )
  );

  $icon   = '<span class="ab-icon"></span>';
  $title  = '<span class="ab-label awaiting-mod pending-count count-' . $awaiting_mod . '" aria-hidden="true">' . number_format_i18n( $awaiting_mod ) . '</span>';
  $title .= '<span class="screen-reader-text comments-in-moderation-text">' . $awaiting_text . '</span>';

  $wp_admin_bar->add_menu(
    array(
      'id'    => 'comments',
      'title' => $icon . $title,
      'href'  => admin_url( 'edit-comments.php' ),
    )
  );
}

So wp_count_comments() gets called only for one thing and one thing in particular in this case: getting the moderated (pending) comments. This seems to be a rather wasteful query, when you could simply do:

<?php
global $wpdb;
$awaiting_mod  = $wpdb->get_var("SELECT comment_approved, count(*) as total from $wpdb->comments where comment_approved='0';");

And shave around 99% of the query time.

#27 @FolioVision
4 years ago

Hello @Znuff

it's been 8 years, so lets do a quick check if this ticket makes sense. The web hosts have evolved and the SQL performance is much better with the SSD drives that with the spinning drives 8 years ago (or course that's not a reason to not run bad queries).

Looking at the WordPress source for get_comment_count() https://github.com/WordPress/WordPress/blob/f545bb3f634c8e949db9051bd88924df592c3337/wp-includes/comment.php#L390-L397 I see it still runs:

SELECT comment_approved, COUNT( * ) AS total FROM wp_comments GROUP BY comment_approved;

When I run that without SQL cache

SELECT sql_no_cache comment_approved, COUNT( * ) AS total FROM wp_5_comments GROUP BY comment_approved;

I tested this again o 3 website:

1) Website with 1,624,487.

...it takes 0.55 seconds.

When I run all the individual queries as mentioned in the first post in this ticket I get times:

  • 0.01 for trash
  • 0.00 for spam
  • 0.00 for unapproved
  • 0.00 for post-trash
  • 0.00 for the total count

So the difference is huge, although I have to admin there are not many comments in trash or spam.

2) I tried on another site with 713,245 comments and that query with group by takes 0.31 s. Then with the individual queries I get these times:

  • 0.17 for trash
  • 0.00 for spam
  • 0.00 for unapproved
  • 0.00 for post-trash
  • 0.00 for the total count

Which is also better.

3) I tried on yet another site with comments and that query with group by takes 0.13 s. Then with the individual queries I get these times:

  • 0.01 for trash
  • 0.00 for spam
  • 0.00 for unapproved
  • 0.00 for post-trash
  • 0.07 for the total count

Which is a bit better.

Still, my patch from 2 years ago was not accepted.

Last edited 4 years ago by FolioVision (previous) (diff)

#28 @Rahe
3 years ago

Hello,

Same problem here, with 1 394 203 comments it add 500ms for each admin page load.
We have only two comments that are not approved and the realted query will take 0.00s from mysql cli return.

Last edited 3 years ago by Rahe (previous) (diff)

This ticket was mentioned in PR #2446 on WordPress/wordpress-develop by uday-kokitkar.


2 years ago
#29

  • Keywords has-patch added

#30 @uday17035
2 years ago

I have created PR with the required changes. @spacedmonkey could you please review it?

@markjaquith if you won't mind, may I own this ticket?

spacedmonkey commented on PR #2446:


2 years ago
#31

CC @markjaquith for review.

#32 @spacedmonkey
2 years ago

  • Keywords commit added; dev-feedback removed
  • Milestone changed from Future Release to 6.0

I have reviewed @uday17035 patch. It looks good to merge to me. I like the improve unit test.

I am going to move this ticket into 6.0 milestone in hope that it can be merged.

#33 @peterwilsoncc
2 years ago

  • Owner changed from markjaquith to spacedmonkey
  • Status changed from accepted to assigned

Reassigning this to you, Jonny, as it seems your most likely to commit when the time comes.

#34 @spacedmonkey
2 years ago

@peterwilsoncc I am happy with the patch. I am going to commit if you are happy.

#35 @peterwilsoncc
2 years ago

  • Keywords assigned-for-commit added

@spacedmonkey Go for it, I've hit approve on the PR and assigned you for commit.

I've run phpunit --filter Tests_Get_Comment_Count with the new tests with and without the patch applied and the results are identical, everything passes.

#36 @spacedmonkey
2 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 53036:

Comments: Improve performance of the wp_count_comments function.

Improve performance of the wp_count_comments function by replacing a complex query with multiple calls to the get_comments function. Passing the count parameter to the get_comments function results in a simple
count query that returns quickly. Using get_comments also means that query is cached and run through filters.

Props FolioVision, markjaquith, nacin, ryan, coffee2code, wonderboymusic, ComputerGuru, jb510, SergeyBiryukov, Znuff, Rahe, uday17035, spacedmonkey, peterwilsoncc.
Fixes #19901.

#37 @johnbillion
2 years ago

Trivia: With this optimisation now in place, some WordPress admin screens will perform zero database queries when a persistent object cache is in use and the cache is warm. Neat.

#38 @peterwilsoncc
2 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

@spacedmonkey Reopening this because I realised the docblock needs updating:

  • remove reference to always using live count
  • remove wpdb global (from docblock and within function)

Also does wp_count_comments() still need a cache or should a follow up be created to remove it in 6.1 now that get_comment_count() caches via the comment query?

#39 @peterwilsoncc
2 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 53225:

Comments: Improve accuracy of get_comment_count() docblock.

Remove reference to uncached database query now get_comment_count() uses WP_Comment_Query which contains caching. Remove reference to $wpdb global, it is no longer used.

Follow up to [53036].

Fixes #19901.

#40 @bph
23 months ago

  • Keywords add-to-field-guide added

#41 @desrosj
16 months ago

#31072 was marked as a duplicate.

Note: See TracTickets for help on using tickets.