Make WordPress Core

Opened 11 months ago

Closed 7 months ago

Last modified 7 months ago

#58368 closed defect (bug) (fixed)

WordPress dashboard is very slow when there are many comments (and the database isn't great)

Reported by: guss77's profile Guss77 Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.4 Priority: normal
Severity: minor Version: 6.0
Component: Comments Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description

It seems the admin dashboard - as far as I can tell in every page, so it must be something in the side bar - is calling wp_count_comments(). That causes the following database query to be run:

SELECT COUNT(*)
FROM wp_xp7b48_comments
WHERE ( comment_approved = '1' )
ORDER BY wp_xp7b48_comments.comment_date_gmt DESC

... among other queries that are as problematic, but as we have several million approved comments on the side, this is the one that is slow - it takes > 10 seconds on our (very tiny) server.

The problem is that it doesn't have to be that way - there is absolutely no reason to have an ORDER BY clause in a COUNT(*) comment - that is just asking for the query to be needlessly slow.

I can workaround the issue by editing wp-includes/class-wp-comment-query.php and in get_comment_ids() change:

        $this->sql_clauses['orderby'] = $orderby;

to

    if ( !$this->query_vars['count'] )
        $this->sql_clauses['orderby'] = $orderby;

This reduces the run time for wp_count_comments() from > 20 seconds less than 0.5 seconds.

See attached screenshots from Query Monitor dev panel, showing slowness and results of workaround.

Attachments (9)

Screenshot_20230521_175108.png (89.5 KB) - added by Guss77 11 months ago.
Screenshot of Query Monitor showing the problematic query (this one is actually tame considering some other runs)
Screenshot_20230521_175335.png (117.4 KB) - added by Guss77 11 months ago.
Screenshot showing a very bad breakdown of get_comment_ids() queries using Query Monitor
Screenshot_20230521_175414.png (109.8 KB) - added by Guss77 11 months ago.
Screenshot showing a much improved breakdown of get_comment_ids() queries using Query Monitor, after workaround.
wp-comment-query-get-comment-count-no-orderby.patch (644 bytes) - added by Guss77 11 months ago.
Workaround as a patch - I really think this should be accepted as is.
wp-admin-comments-slow-queries-before-1.png (101.9 KB) - added by FolioVision 11 months ago.
Slow comment type drop down query with the current WordPress code.
wp-admin-comments-slow-queries-with-plugin-1.png (93.3 KB) - added by FolioVision 11 months ago.
Slow comment type drop down query with the remove-orderby-when-counting-comments.php mini-plugin
wp-admin-comments-slow-queries-58368-comment-count-no-orderby-1.png (99.8 KB) - added by FolioVision 11 months ago.
Slow comment type drop down query with the 58368-comment-count-no-orderby branch.
wp-admin-comments-slow-queries-wp-admin-comments-screen-performance-1.png (94.1 KB) - added by FolioVision 11 months ago.
Slow comment type drop down query with the wp-admin-comments-screen-performance branch
wp-admin-comments-slow-queries-wp-admin-comments-screen-performance-3.png (67.6 KB) - added by FolioVision 11 months ago.
Slow comment type drop down query with the wp-admin-comments-screen-performance branch - no longer slow

Download all attachments as: .zip

Change History (74)

@Guss77
11 months ago

Screenshot of Query Monitor showing the problematic query (this one is actually tame considering some other runs)

@Guss77
11 months ago

Screenshot showing a very bad breakdown of get_comment_ids() queries using Query Monitor

@Guss77
11 months ago

Screenshot showing a much improved breakdown of get_comment_ids() queries using Query Monitor, after workaround.

@Guss77
11 months ago

Workaround as a patch - I really think this should be accepted as is.

#1 @Guss77
11 months ago

#57168 was marked as a duplicate.

#2 @Guss77
11 months ago

Apparently I already opened a ticket about this issue, on 6.1.1, with a worse fix.

That's the problem - it is an issue that I keep needing to manually workaround every time I update WordPress.

#3 @Guss77
11 months ago

I added a PR - https://github.com/WordPress/wordpress-develop/pull/4481 - hopefully this will help with fixing this issue.

Thanks.

#4 @Guss77
11 months ago

  • Focuses performance added

This ticket was mentioned in PR #4481 on WordPress/wordpress-develop by guss77.


11 months ago
#5

If the caller wants a count, sorting just slows down the database (sometimes massively) without providing any more information.

Trac ticket: https://github.com/guss77/wordpress-develop/pull/1

EDIT by @audrasjb: Trac ticket: https://core.trac.wordpress.org/ticket/58368

#6 @davidbaumwald
11 months ago

  • Milestone changed from Awaiting Review to 6.3

@audrasjb commented on PR #4481:


11 months ago
#7

Thanks for the PR, I added the Trac ticket number in the description to make sure the PR appears in the related Trac ticket. Also, I added one commit to fix some WordPress Coding Standards issues.

#8 follow-up: @spacedmonkey
11 months ago

@Guss77 How many comments do you have on that site? I am finding this hard to replicate this locally.

#10 in reply to: ↑ 8 @Guss77
11 months ago

Replying to spacedmonkey:

@Guss77 How many comments do you have on that site? I am finding this hard to replicate this locally.

The website has > 5M comments, and is running both the database, web server and FPM-CGI daemon on a VPC that is equivalent to an AWS EC2 micro instance (1 heavily throttled Xeon 6xxx core w/ 1GB RAM). I would expect this will repro only on severely memory and CPU constrained implementations.

#11 follow-up: @spacedmonkey
11 months ago

@Guss77 Are you using object caching. A site with 5M should really be using object caching...

#12 @davidbaumwald
11 months ago

Just noting that this is common in larger WooCommerce sites. I think this is a good change, regardless if caching is in use or not.

#13 in reply to: ↑ 11 @Guss77
11 months ago

Replying to spacedmonkey:

@Guss77 Are you using object caching. A site with 5M should really be using object caching...

We're not using an object cache - its will use RAM that we cannot afford. Generally - IMHO - using an object cache to cache database results seems counterintuitive: the same RAM and CPU can better be used by your database server's internal caching which must be better - it can cache more effectively much closer to the data.

Regardless whether it is a good idea to use an object cache or not - it looks like disabling sorting in a query that absolutely does not need sorting, solves the problem, so if a small change with no adverse effects negates the need for complex and expensive infrastructure (and a first expensive query), why not just do that?

#14 follow-up: @westonruter
11 months ago

Skipping sorting when only querying for a count makes sense to me.

Per @joemcgill, the problematic code is the call to wp_count_comments() in wp-admin/menu.php.

This function uses the object cache to store the comment count for a post. If this were made into a transient instead, this would address the lack of persistent object cache.

Aside: @Guss77 another option for you would be to add a wp_count_comments filter in that same function. This would allow you to short-circuit the expensive query, and you could use the query that lacks the ORDER BY.

Lastly, I just wanted to note that in the AMP plugin we had this same problem of a count in the menu slowing down admin page loads, here specifically the count for AMP validation errors. We solved the problem by fetching the count over the REST API and then storing it in sessionStorage, and we only do the fetch once the menu item is actually shown (via IntersectionObserver): this would help in the case where the admin menu is collapsed. See the code here: https://github.com/ampproject/amp-wp/blob/develop/assets/src/amp-validation/counts/index.js

#15 in reply to: ↑ 14 @Guss77
11 months ago

Replying to westonruter:

Aside: @Guss77 another option for you would be to add a wp_count_comments filter in that same function. This would allow you to short-circuit the expensive query, and you could use the query that lacks the ORDER BY.

My current solution is to apply the patch manually to the installation of WordPress that I use - which will cause it to be undone if the next version is released without addressing this issue.

Hopefully, that will not be the case, but if so I will have to implement the filter function - though that solution will have to make a manual SQL query that mimics WP_Comment_Query->get_comment_ids() and will be vulnerable to bitrot due to future WordPress core changes - so, not a great solution.

Last edited 11 months ago by Guss77 (previous) (diff)

#16 @peterwilsoncc
11 months ago

I think using a transient would be problematic on large sites, a different transient would be created for each post. If set not to expire, then each of these transients would be auto-loaded.

I hacked together a mini-plugin using the comments_clauses filter to test the results with a lot of comments (@Guss77 this may suit your purpose rather than hacking core). Unfortunately generating millions of comments takes a while so I haven't been able to get some numbers.

I don't mind the proposed patch but would need to test to make sure it doesn't have unintended consequences. DB behavior without a ordering clauses can lead to unexpected results, as Jonny and I have seen on other tickets.

#17 @spacedmonkey
11 months ago

In get_comment_count we could change parameters passed to WP_Comment_Query to orderby => none. This would solve this issue. We could consider making this change to WP_Comment_Query to order orderby => none if fields => count, maybe in another ticket.

This ticket was mentioned in PR #4553 on WordPress/wordpress-develop by @peterwilsoncc.


11 months ago
#18

  • Keywords has-unit-tests added

Modifies all the WP_Comment_Query calls to remove the orderby if count === true.

I can never remember what comes from the GB repo, so it is possible src/wp-includes/blocks/comments.php should be changed upstream.

Trac ticket: https://core.trac.wordpress.org/ticket/58368

#19 follow-up: @peterwilsoncc
11 months ago

See the linked pull request as a proof of concept to remove the orderby from all the orderby clauses from counting queries in WP-Develop.

With 180K comments, I am not seeing a great deal of difference in the dashboard

#20 in reply to: ↑ 19 @Guss77
11 months ago

Replying to peterwilsoncc:

See the linked pull request as a proof of concept to remove the orderby from all the orderby clauses from counting queries in WP-Develop.

Thank you - this also looks like it will solve the problem (I'll be able to verify only much later).

With 180K comments, I am not seeing a great deal of difference in the dashboard

From my experience this issue only manifests when the hardware is very constrained (see above comments about this) and/or a very large comments table. I think the database issue is that the MySQL server can't load the entire comments table into memory - which it needs to do for efficient sorting, and indexing does not help (there actually is a good index on the table). On my system with the issue, loading the comments table will require several times the RAM available to the MySQL server.

#21 @FolioVision
11 months ago

Great work @Guss77. We ran across the same issue and propose a similar solution, except with database indexes.

With all due respect, those who do not have access to sites with a million+ comments (@peterwilsoncc, @spacedmonkey) probably should not be commenting on this issue. Hopelessly inefficient comment code doesn't affect you, we all know that.

I'd continue to recommend database indexing as it means that when these queries are necessary they are very efficient. For small databases, the space used is not much.

The idea that one must introduce object caching or redis or any ultra complex caching method (all of which have their own huge downsides) as soon as a WordPress site grows beyond toy size/personal weblog is both elitist and defeatist. We (Core WordPress developers) could do a lot more to allow WordPress to scale to medium size before requiring a fully handrolled install and dedicated server or worse multiple servers. I understand that many of us make our money (including Automattic) with custom solutions but are we so venial that we must cripple WordPress sites of even moderate size?

@Guss77's proposal not to sort if not necessary is such an obvious fix, it's hard to believe anyone would argue against it.

#22 follow-up: @peterwilsoncc
11 months ago

@FolioVision I don't think your comment is very helpful.

I think there is potential for improvement here, otherwise I wouldn't have taken the time to provide a mini-plugin to avoid hacking Core and the subsequent pull request.

That I am having trouble reproducing the issue is an indication I need assistance testing my patch on sites with a lot of comments.

You can help move this ticket forward by testing the patch: that will allow committers to validate whether the approach works or whether another approach is needed. Data that will help:

  • provide performance results of the existing queries in Core.
  • provide performance results of the proposed patches
  • it would be most helpful to test the results both with and without object caching enabled
  • provide the database version and engine you are testing with -- how databases process queries without an orderby is undefined so the results can differ with different version/engine combinations

When testing performance it's best to provide an average from multiple test runs, between 10 and 20. This will help account for environmental issues due to other processes running on the server.

@FolioVision
11 months ago

Slow comment type drop down query with the current WordPress code.

@FolioVision
11 months ago

Slow comment type drop down query with the remove-orderby-when-counting-comments.php mini-plugin

@FolioVision
11 months ago

Slow comment type drop down query with the 58368-comment-count-no-orderby branch.

@FolioVision
11 months ago

Slow comment type drop down query with the wp-admin-comments-screen-performance branch

@FolioVision
11 months ago

Slow comment type drop down query with the wp-admin-comments-screen-performance branch - no longer slow

#23 follow-up: @FolioVision
11 months ago

In our ticket #58488 we solve a problem with slow WP_Comments_List_Table::comment_type_dropdown() which is used for the comment type filter drop down menu on wp-admin -> Comments. This ticket was suggested as related.

So we did the performance testing with the different fixes posted in this ticket.

Each test was repeated 3 times, but it was running on our own staging server so we know nothing was interfering.

wp-admin-comments-slow-queries-before-1.png shows the performance with the current WordPress code. That slow query to figure out if the pingbacks are present took about 6.7 to 7 seconds.

wp-admin-comments-slow-queries-with-plugin-1.png shows performance with the remove-orderby-when-counting-comments.php mini-plugin - it took 7.6 to 8 seconds.

It does not help as WP_Comments_List_Table::comment_type_dropdown() which we are trying to fix does not count the comments, it only needs to see if there is any comment of the desired type. So that kind of a check (if the comments are being counted, then do not use order by) is not going to help.

wp-admin-comments-slow-queries-58368-comment-count-no-orderby-1.png shows performance with the 58368-comment-count-no-orderby branch: https://github.com/WordPress/wordpress-develop/pull/4553 The query would take 7.1 to 7.8 seconds that way.

In that pull request @peterwilsoncc tried to adjust the get_comments() calls in different places to remove the ordering where it's not necessary. But the exact single place where it matters for our issue was omitted: https://github.com/WordPress/wordpress-develop/pull/4571/files#diff-1018e4974931b2cea88b413dadc533d3cf535f977c7f3a5f310e0bc7376233a8R522

So our fix it still needed. This is how much faster it gets - 0.7 to seconds, see wp-admin-comments-slow-queries-wp-admin-comments-screen-performance-1.png.

Or it does not even appear in the list of slow queries, see wp-admin-comments-slow-queries-wp-admin-comments-screen-performance-3.png.

Please check the fix at #58488, it's one line of code and we also had to add the database index for comment_type on wp_comments table.

The server that we tested with is a 2 CPU 4 GB RAM VPS on DigitalOcean. Our database server is MySQL 5.7.37 and the database engine is InnoDB.

#24 @FolioVision
11 months ago

duplicate

Last edited 11 months ago by FolioVision (previous) (diff)

#25 follow-ups: @spacedmonkey
10 months ago

I think the biggest issue here is that comment_type does not have an index. I feel like it really should. This will likely fix these issues.

#26 in reply to: ↑ 25 @Guss77
10 months ago

Replying to spacedmonkey:

I think the biggest issue here is that comment_type does not have an index. I feel like it really should. This will likely fix these issues.

I have tried all the indexing options - indexing helps to quickly find the rows that you need instead of performing a full table scan, but when trying to load all approved comments, that's almost all of them (we have required logins and very few unapproved comments), and then you have to sort all of the rows - which is the actual performance problem (as disabling sorting immediately fixes the proble), and an index will never help you sort.

#27 in reply to: ↑ 25 @FolioVision
10 months ago

Replying to spacedmonkey:

I think the biggest issue here is that comment_type does not have an index.

In our testing just adding the index did not always solve the issue of the slow Comment Type drop down. It helped on a website with a dedicated MariaDB database server, but on a shared web host we also had to stop using ORDER BY to improve the performance.

Full details in our related ticket: #58488

#28 in reply to: ↑ 25 @Guss77
10 months ago

Replying to spacedmonkey:

I think the biggest issue here is that comment_type does not have an index.

To provide more actual info, on why indexing doesn't (and can't) help, here is how the comments table currently looks - with tons of indexes:

Create Table: CREATE TABLE `wp_xp7b48_comments` (
  `comment_ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `comment_post_ID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_author` text COLLATE utf8mb4_unicode_ci,
  `comment_author_email` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `comment_author_url` varchar(200) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `comment_author_IP` varchar(100) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `comment_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `comment_content` mediumtext COLLATE utf8mb4_unicode_ci,
  `comment_karma` int(11) NOT NULL DEFAULT '0',
  `comment_approved` varchar(20) COLLATE utf8mb4_unicode_ci DEFAULT '1',
  `comment_agent` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `comment_type` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'comment',
  `comment_parent` bigint(20) unsigned NOT NULL DEFAULT '0',
  `user_id` bigint(20) unsigned NOT NULL DEFAULT '0',
  `comment_subscribe` enum('Y','N') COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'N',
  `comment_mail_notify` tinyint(4) NOT NULL DEFAULT '0',
  PRIMARY KEY (`comment_ID`),
  KEY `comment_post_ID` (`comment_post_ID`),
  KEY `comment_approved_date_gmt` (`comment_approved`,`comment_date_gmt`),
  KEY `comment_date_gmt` (`comment_date_gmt`),
  KEY `comment_parent` (`comment_parent`),
  KEY `user_id` (`user_id`),
  KEY `comment_type` (`comment_type`),
  KEY `comment_approved` (`comment_approved`),
  KEY `comment_date` (`comment_date`),
  KEY `comment_author_email` (`comment_author_email`(10)),
  KEY `opt_last_commented_posts` (`comment_post_ID`,`comment_approved`,`comment_type`,`comment_ID`)
) ENGINE=MyISAM AUTO_INCREMENT=12253647 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

As you can see, there is an index on comment_type, but it isn't useful for the query in question - mostly because wp_count_comments() doesn't even look at comment_type - as shown in the ticket description.

Running EXPLAIN shows us that the query does use indexes - it just isn't helpful when you are loading literally 100% of the rows and then needs to sort them "in memory" on a server without enough memory:

mysql> explain SELECT COUNT(*) FROM wp_xp7b48_comments WHERE ( comment_approved = '1' )
  ORDER BY wp_xp7b48_comments.comment_date_gmt DESC \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wp_xp7b48_comments
   partitions: NULL
         type: ref
possible_keys: comment_approved_date_gmt,comment_approved
          key: comment_approved
      key_len: 83
          ref: const
         rows: 5831275
     filtered: 100.00
        Extra: NULL

#29 follow-up: @spacedmonkey
10 months ago

Could we change the count query to something like this then.

SELECT COUNT(id) FROM wp_xp7b48_comments WHERE ( comment_approved = '1' )

#30 in reply to: ↑ 29 @Guss77
10 months ago

Replying to spacedmonkey:

Could we change the count query to something like this then.

This is kind of the whole point of this ticket 😅

#31 in reply to: ↑ 23 ; follow-up: @peterwilsoncc
10 months ago

Replying to FolioVision:

In that pull request @peterwilsoncc tried to adjust the get_comments() calls in different places to remove the ordering where it's not necessary. But the exact single place where it matters for our issue was omitted: https://github.com/WordPress/wordpress-develop/pull/4571/files#diff-1018e4974931b2cea88b413dadc533d3cf535f977c7f3a5f310e0bc7376233a8R522

Good catch, I've added it in to the dropdown in my PR. I did a quick search for 'number'\s*=> 1 in the code base to see if I'd missed any similar occurrences but couldn't find any.

#32 in reply to: ↑ 31 ; follow-up: @FolioVision
10 months ago

Replying to peterwilsoncc:

Good catch, I've added it in to the dropdown in my PR.

Thank you for the follow up.

I tested the new version of your branch: https://github.com/peterwilsoncc/wordpress-develop/tree/try/58368-comment-count-no-orderby

With standard WordPress the query responsible for the comment type drop down:

SELECT wp_5_comments.comment_ID
FROM wp_5_comments
WHERE ( ( comment_approved = '0'
OR comment_approved = '1' ) )
AND comment_type IN ('pingback', 'trackback')
ORDER BY wp_5_comments.comment_date_gmt DESC
LIMIT 0,1

Would take 7.28 seconds on average.

With the new improved code the query would look like this:

SELECT wp_5_comments.comment_ID
FROM wp_5_comments
WHERE ( ( comment_approved = '0'
OR comment_approved = '1' ) )
AND comment_type IN ('pingback', 'trackback')
LIMIT 0,1

But strangely it would take 8.68 seconds.

As I explained in our ticket the comment_type index is what really helps: #58488

With that index the query would take 0.3 miliseconds on average.

Each test was repeated 4 times and I would go from the original code to the new branch and then back and run more tests to ensure the numbers are not affected by something else.

#33 in reply to: ↑ 32 ; follow-up: @spacedmonkey
10 months ago

Replying to FolioVision:

As I explained in our ticket the comment_type index is what really helps: #58488

I said this above and was told i was wrong.

I think the biggest issue here is that comment_type does not have an index. I feel like it really should. This will likely fix these issues.

I am not enjoying the tone of this ticket. I am trying to help getting a fix committed to core and getting rude responses. Please keep these comments civil.

I think the fix for this is two fold.

Remove orderby for count to WP_Comment_Query and add a comment_type index. I think the index maybe have to be another ticket. Adding an index is not simple, as it locks the table.

#34 @spacedmonkey
10 months ago

  • Milestone changed from 6.3 to 6.4

With Beta 1 on the 27th of June and no clear solution forward, I am punting this change to 6.4.

#35 in reply to: ↑ 33 @FolioVision
10 months ago

Replying to spacedmonkey:

Replying to FolioVision:

As I explained in our ticket the comment_type index is what really helps: #58488

I said this above and was told i was wrong.

I think the biggest issue here is that comment_type does not have an index. I feel like it really should. This will likely fix these issues.

The comment_type index will help our issue of slow "All comment types" on the wp-admin -> Comments screen.

@Guss77 only said that the comment_type index does not help with slow wp_count_comments(). Which makes sense as comment_type is not used in these queries.

I think the fix for this is two fold.

Remove orderby for count to WP_Comment_Query and add a comment_type index. I think the index maybe have to be another ticket. Adding an index is not simple, as it locks the table.

That's exactly right.

In our testing adding comment_type index locks the table for:

  • 5 seconds for 2,668,659 comments (dedicated SQL server with low load)
  • 23 second for 795,581 comments (shared server with higher load)

I tried again on that shared server now and it took 17 seconds. The server load was at 2.5 to 2.7 and the VPS has 16 CPU cores and 62 GB of RAM.

So the comment_type index is really a separate issue. We started our ticket at #58488 and only got here as it was marked as related or partial duplicate. Which is true, but adding the database table index is indeed a different challenge.

WordPress does use the maintenance mode during upgrades, would locking the table be a big obstacle in that case?

#36 follow-up: @Guss77
10 months ago

@spacedmonkey and @FolioVision - is it possible to leave the issue of the comment_type index in #58488 and have this issue focused on removing the ordering clauses when issuing wp_count_comments() - as: (a) that was the original reported issue and the index issue is unrelated to the original report (i.e. it does not help nor hinder a solution); and (b) the index issue has a prior report about it (as noted).

I would really love to have the slowness of wp_count_comments() fixed in the 6.3 beta, before we (who are affected) have to update and patch manually again.

Several good solutions have been offered to solve the problem in core - can we have a discussion on which is best and just go with it?

#37 @spacedmonkey
10 months ago

  • Milestone changed from 6.4 to 6.3

As this is a bug, I am going to move this back to 6.3. Let's leave the index to #58488 as mentioned above.

#38 in reply to: ↑ 36 @peterwilsoncc
10 months ago

Replying to Guss77:

@spacedmonkey and @FolioVision - is it possible to leave the issue of the comment_type index in #58488 and have this issue focused on removing the ordering clauses when issuing wp_count_comments() - as: (a) that was the original reported issue and the index issue is unrelated to the original report (i.e. it does not help nor hinder a solution); and (b) the index issue has a prior report about it (as noted).

I think this makes sense. It appears there is potential to improve the speed with an index but if removing the orderby will get the code part of the way there, then it makes sense to do that.

@peterwilsoncc commented on PR #4553:


10 months ago
#39

@spacedmonkey Do you have any thoughts on tests I should add?

@spacedmonkey commented on PR #4553:


10 months ago
#40

@spacedmonkey Do you have any thoughts on tests I should add?

@peterwilsoncc I don't personally think this needs to have unit tests.

#41 @spacedmonkey
10 months ago

  • Owner set to peterwilsoncc
  • Status changed from new to assigned

This change looks good. Assigning to @peterwilsoncc to commit. If not, let's punt.

#42 @spacedmonkey
10 months ago

  • Keywords has-unit-tests removed

#43 follow-up: @peterwilsoncc
10 months ago

Running some EXPLAINs on MySQL 8.0.30-ubuntu using the innoDB engine:

For the count queries, the query optimizer is discarding the orderby clause provided by WP and defaulting to it's preference:

EXPLAIN SELECT COUNT(*)
FROM wp_comments
WHERE ( ( comment_approved = '0'
OR comment_approved = '1' ) )
ORDER BY wp_comments.comment_date_gmt DESC;

It actually uses the comment_approved_date_gmt index because that makes more sense according to the WHERE clause.

For the queries determining if the post type is used that include LIMIT 0,1 then removing the orderby clause does make a difference:

EXPLAIN SELECT wp_comments.comment_ID
FROM wp_comments
WHERE ( ( comment_approved = '0'
OR comment_approved = '1' ) )
AND comment_type IN ('', 'comment')
ORDER BY wp_comments.comment_date_gmt DESC
LIMIT 0,1;

The above uses the comment_date_gmt index, which isn't ideal for the job. By removing the orderby it uses the comment_approved_date_gmt index which seems a better fit.

My inclination is to put the change in my PR in for the comment type dropdown (ie, the LIMIT 0,1 queries) as I think that is where any benefit from this ticket will come from. For the counting queries, I don't think the changes actually help due to the optimizer.

#44 in reply to: ↑ 43 ; follow-up: @Guss77
10 months ago

Replying to peterwilsoncc:

Running some EXPLAINs on MySQL 8.0.30-ubuntu using the innoDB engine:

For the count queries, the query optimizer is discarding the orderby clause provided by WP and defaulting to it's preference:

Thank you for doing the detailed analysis on this issue.

Unfortunately, I'm not using MySQL 8, and - as I reported in the original ticket description - for the first query that you tested, the database I use does not discard the unneeded ORDER BY and that massively slows it down.

As far as I can tell, the ORDER BY clause is always wrong when there is a COUNT, and even if some database implementations are smart enough to ignore it - it should still be removed.

#45 in reply to: ↑ 44 @Guss77
10 months ago

Replying to Guss77:

Unfortunately, I'm not using MySQL 8, and - as I reported in the original ticket description - for the first query that you tested, the database I use does not discard the unneeded ORDER BY and that massively slows it down.

A small correction - I'm not sure how to tell whether the database optimizer discarded the unneeded ORDER BY, but the query @peterwilsoncc listed first, i.e.:

SELECT COUNT(*)
FROM wp_comments
WHERE ( ( comment_approved = '0'
OR comment_approved = '1' ) )
ORDER BY wp_comments.comment_date_gmt DESC

Completes in a more or less reasonable time on my system (around a second - not great, but manageable). The problem I have - as reported in the ticket description - is with a slightly different query:

SELECT COUNT(*)
FROM wp_comments
WHERE ( comment_approved = '1' )
ORDER BY wp_comments.comment_date_gmt DESC

This one - which is what wp_count_comments() does - runs 20 times slower.

Here's the EXPLAIN result of the first one:

type: range
possible_keys: comment_approved_date_gmt,comment_approved
key: comment_approved_date_gmt
ref: NULL
filtered: 100
Extra: Using where; Using index

and here's the EXPLAIN results for the second - slower - one:

type: ref
possible_keys: comment_approved_date_gmt,comment_approved
key: comment_approved
ref: const
filtered: 100
Extra: NULL

and this is the EXPLAIN results for the second query (the problematic one, in my case) where the ORDER BY clause was manually removed:

type: ref
possible_keys: comment_approved_date_gmt,comment_approved
key: comment_approved
ref: const
filtered: 100
Extra: Using index

I'm not a database expert, but I think what it means is that for the first version (checking for all possible values of comment_approved), the database figures out that the WHERE clause is meaningless and discards it - quickly returning all rows. It probably doesn't bother with ordering. For the second version - the slow one, it knows there's some work to do, tries to use the comment_approved index - probably ignoring the ORDER BY clause (I think?) - but then, for some reason, doesn't actually uses the index. I think it scans the table doing extra work and running out of cache. For the third - and faster - version it suddenly can use the comment_approved index, even though the only change is the removal of the ORDER BY.

BTW, this is on MySQL 5.7.

#46 follow-up: @spacedmonkey
10 months ago

  • Milestone changed from 6.3 to 6.4
  • Version changed from 6.2.2 to 6.0

Okay, I spent a little more time researching this one.

First of all, the currently PR #4553 does seem to help improve the query. Using query monitor, I am seeing these values with 50k comments.

SELECT COUNT(*) FROM wp_comments WHERE ( comment_approved = '1' ) ORDER BY wp_comments.comment_date_gmt DESC

0.0580

VS

SELECT COUNT(*) FROM wp_comments WHERE ( comment_approved = '1' )

0.0521

This doesn't exactly fix the problem.

I tried some other stuff, I changed the query to this.

SELECT COUNT(wp_comments.comment_ID) FROM wp_comments WHERE ( comment_approved = '1' )

0.0562

Which did make things better either.
I also created a index for comment_approved. Which also did nothing.

I believe that [53036], did make this issue worse, as before it did one query with a group, now it does multiple queries.

So I am unclear here what to do make this faster. As there is no clear path forward to fix this issue, I am going to punt this to WP 6.4, where we can find a solution.

@peterwilsoncc If you want, you can commit the existing PR, but I think we should leave this ticket as is to do more research.

#47 in reply to: ↑ 46 @Guss77
10 months ago

Replying to spacedmonkey:

[…]

0.0580

VS

[…]

0.0521

This doesn't exactly fix the problem.

I can suggest that 50 milliseconds is quite fast, and possibly most of those 50ms is overhead that you can't decrease in your setup (such as just transferring the data). When you get to sub-100ms times, there's no a lot of improvement left to be had.

I'm very glad to see that you've tested that this change has no negative effect! Thank you!

On my setup - as I've explained multiple times on this ticket - the difference imparted by removing the ORDER BY clause from a query time of "many seconds" to "under a second". I wouldn't expect to see the same order of improvement when you start with a sub-0.1 seconds query time.

I suggest that if you aren't looking at a significant query time (more than 0.1 seconds), you cannot reproduce the issue and therefor I wouldn't expect you to see an improvement by applying the fix.

As the fix - according to your testing - has no negative effects, and as its (from an SQL syntax perspective) the Right Thing, how about we apply it anyway, to the earliest version possible - so we can have this issue resolved for the people that are experiencing this slow down?

Thank you in advance.

#48 @spacedmonkey
9 months ago

I don't like committing something until we know what the problem was in the first place. I am for committing the the change as is, I don't see how it would hurt.

For now, you could use the action pre_get_comments to change your WP_Comment_Query.

 add_action( 'pre_get_comments', function( $wp_comment_query ){
       if ( $wp_comment_query->query_vars['count'] ) {
          $wp_comment_query->query_vars['orderby'] = 'none';
       }
 });

#49 in reply to: ↑ 22 @Hareesh Pillai
9 months ago

I'm starting to think what is the harm in removing the ORDER BY clause if there is solid proof of no negative impact on the query performance. We seem to be focussed on verifying if there is a positive impact or not, and in an attempt to quantify that, we are overlooking the non-negative impact. The fact that at least some environments would benefit from this change merits a commit, IMHO.

And, of course, we should leave the discussion about adding an index to the comment_type column to the other ticket.

To take this forward, I wish if we could try and answer this concern raised earlier:

You can help move this ticket forward by testing the patch: that will allow committers to validate whether the approach works or whether another approach is needed. Data that will help:

  • provide the database version and engine you are testing with -- how databases process queries without an orderby is undefined so the results can differ with different version/engine combinations

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


7 months ago

This ticket was mentioned in PR #5341 on WordPress/wordpress-develop by @spacedmonkey.


7 months ago
#51

  • Keywords has-unit-tests added

#52 follow-up: @spacedmonkey
7 months ago

  • Owner changed from peterwilsoncc to spacedmonkey

So after more discovery, I believe this is the best path forward.

  1. Commit PR. The orders do not fix this issue 100% but do not the database queries better.
  2. I have created a follow on ticket #59488. This adds database indexes to the two fields that are queried in these counts. This is a none trival ticket, as we have to discuss if you updating existing sites.

I have taken over this ticket from @peterwilsoncc. If we agrees, I will commit.

#53 in reply to: ↑ 52 @Guss77
7 months ago

Replying to spacedmonkey:

So after more discovery, I believe this is the best path forward.

Thank you. I would very much appreciate seeing this fix merged before the next release 👍👍

#54 @spacedmonkey
7 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56747:

Comments: Improve WP_Comment_Query count query performance by setting 'order by' to 'none'.

In cases where WP_Comment_Query or get_comments is employed with the 'count' parameter set to true, specify 'order by' as 'none'. Since these queries serve solely to determine the count of comments matching specific query parameters, the 'order by' clause becomes redundant and places unnecessary strain on the database server, resulting in slower query execution. Given that count queries are executed on every admin request to retrieve comment counts, this change enhances the performance of the wp-admin interface.

Props guss77, davidbaumwald, SergeyBiryukov, westonruter, peterwilsoncc, foliovision, hareesh-pillai, spacedmonkey.
Fixes #58368

#58 @spacedmonkey
7 months ago

  • Keywords needs-dev-note added

This ticket was mentioned in PR #5386 on WordPress/wordpress-develop by @peterwilsoncc.


7 months ago
#59

https://core.trac.wordpress.org/ticket/58368

  • primes parent id cache in update_post_cache()
  • adds @since annotation
  • renames function for clarity --- it is currently a little unclear and could imply the parent post object is cached in its entirity
  • adds parameter to _prime_post_caches for priming the post parent ID caches if not already primed.

The docblock for the new parameter needs some work.

#60 @peterwilsoncc
7 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I've opened a follow up pull request to address some concerns I have

  • I'm not sure the name of the function is clear that it only primes the parent IDs, not the parent post object 9I think @spacedmonkey had this question at some point too)
  • Include priming of the parent ID cache when priming post objects to avoid additional queries at a later time.
  • Include a parameter for caching the parent ID of already cached posts.

@peterwilsoncc commented on PR #5386:


7 months ago
#61

I don't like these changes. Changing update_post_cache, would prime a cache that is not used in most cases. This is just another memcache / memory writing call, which is wasteful.

In this case, I'd question the value of introducing the new cache and instead change the shape of the cache in WP_Query calls requesting id=>parent fields to include the post parent.

https://github.com/WordPress/wordpress-develop/blob/44e521c861c29e6d540910df64fa5fd8c460e443/src/wp-includes/class-wp-query.php#L3263-L3267

For backcompat, the change of the cache's shape would need to be considered wehn getting the data.

#62 @spacedmonkey
7 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

@peterwilsoncc I think your PR is designed for #59188. For that reason I am going to close this ticket again and update your PR.

joemcgill commented on PR #5386:


7 months ago
#63

@peterwilsoncc and @spacedmonkey, it seems like there's consensus here to commit the function name change and the @since annotation updates, but some disagreement about whether the new cache should be primed or if it should be removed and replaced with updates to the main WP_Query cache, if I'm understanding @peterwilsoncc's comment above correctly:

I'd question the value of introducing the new cache and instead change the shape of the cache in WP_Query calls requesting id=>parent fields to include the post parent

Can you two clarify what is needed to move this forward?

@spacedmonkey commented on PR #5386:


7 months ago
#64

@joemcgill

What we are trying to avoid from the original ticket, was lookuping the whole post object if all you need is a parent id. This is why this new cache was introduced.

So here is something I considered in the original PR.

  • We could change all id queries to get id and parent id. Then cache the parent ids in as another key in cache the array. Okay, this would help id=>parent queries, but make id queries slower, as you are requested more from the database. As id queries are FAR more common, I think we should optimize for id queries over the rare id=>parent.
  • On the first run on the query, the parent cache is primed if id=>parent. These caches are not expiring. There are only invalided if you update the post or flush the whole cache. Meaning, if you are using object caching, then the first time you load the post list page in wp-admin, the parent will be primed and cache will live forever. If not there a couple of posts that were updated, the new function will do a very small query to get the missing caches.
  • Priming this cache when it is not used or needed, takes resources. Adding this priming may negatively affect performance.

How about just updating the post parent cache on wp_insert_post, that way the value would be updated in cache just once and not primed all the time?

#65 @spacedmonkey
7 months ago

  • Keywords needs-dev-note removed
Note: See TracTickets for help on using tickets.