Make WordPress Core

Opened 5 years ago

Last modified 5 months ago

#47280 reviewing enhancement

SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17

Reported by: javorszky's profile javorszky Owned by: johnbillion's profile johnbillion
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Database Keywords: has-patch has-unit-tests early changes-requested
Focuses: performance Cc:

Description

Per https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_found-rows

The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version. As a replacement, considering executing your query with LIMIT, and then a second query with COUNT(*) and without LIMIT to determine whether there are additional rows.

This is not yet immediately important because most hosts are on 5.5, or 5.6, rarely 5.7, but given the speed with which trac tickets move that impact very core functionalities, I thought it best to open this ticket to get the work started.

This impacts all the 6 places where it's being used, though one of them is in the WP_Query definition.

Change History (65)

#1 @johnbillion
5 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release
  • Severity changed from minor to normal
  • Version trunk deleted

#2 follow-up: @johnbillion
5 years ago

@javorszky Is this something you're willing to work on? As SQL_CALC_FOUND_ROWS can get slow for a large data set, does the replacment SELECT COUNT(*) WHERE ... method recommended in the MySQL docs have a performance advantage?

#3 @javorszky
5 years ago

No. For various reasons I have stepped back from contributing to WordPress in any way except for this ticket via comments.

Regarding the performance advantage, as long as the select count(column) from table where ... happens on indices, and uses only one column, it should be okay.

Also per mysql docs, there's not much in terms of alternatives.

Alternative is to not have the number of total rows available any more, which I suspect would break bc.

#4 in reply to: ↑ 2 @morgantocker
4 years ago

Former MySQL Product Manager here. Just a reply to this question specifically:

does the replacment SELECT COUNT(*) WHERE ... method recommended in the MySQL docs have a performance advantage?

Yes it can. What happens with SQL_CALC_FOUND_ROWS is that it disables the optimizations that MySQL can apply when there is a LIMIT.

The corresponding COUNT(*) query will have its own optimizations available too, such as covering indexes.

So the problem with SQL_CALC_FOUND_ROWS is you kind of get the worst of both worlds - with neither types of optimizations applying. Issuing two queries means one more network roundtrip, but on the MySQL-side both of the queries will be more efficient.

This ticket was mentioned in PR #330 on WordPress/wordpress-develop by morgo.


4 years ago
#5

  • Keywords has-patch added; needs-patch removed

Signed-off-by: Morgan Tocker <tocker@…>

Trac ticket: https://core.trac.wordpress.org/ticket/47280

This ticket was mentioned in Slack in #core by antpb. View the logs.


4 years ago

#7 @SergeyBiryukov
4 years ago

  • Milestone changed from Future Release to 5.7

#8 @wpe_bdurette
4 years ago

Awesome to see this is happening. The GitHub PR only includes fixes for two of the occurrences SQL_CALC_ROWS_FOUND. There are others. I'm working on a PR to address all instances in core.

The usage in class-wp-query.php as some interesting implications. There is a filter on the posts query allowing for a completely arbitrary query to be run. Options that come to mind are to try to parse that query to replace the fields with COUNT(*) or simply wrap the query (e.g., SELECT COUNT(*) from ($query) p). For safety, I plan to do the latter, but I have no idea how that will play with the query optimizer.

Edit: Realizing the above won't work. At a minimum I need to strip any LIMIT clauses off the original query.

Last edited 4 years ago by wpe_bdurette (previous) (diff)

#9 @morgantocker
4 years ago

@wpe_bdurette I started a reply before seeing your edit. But I will include it here since it might still be useful.

Consider this example:

CREATE TABLE t1 (id INT NOT NULL);
INSERT INTO t1 VALUES (1),(2),(3),(4),(5);
SELECT * FROM t1 LIMIT 3;
mysql> SELECT COUNT(*) FROM (SELECT * FROM t1 LIMIT 3) p; # applies LIMIT, returns incomplete total
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)


mysql> SELECT COUNT(*) FROM (SELECT * FROM t1) p; # correct
+----------+
| COUNT(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

In terms of efficiency:

In MySQL 5.7+ it is just as efficient as rewriting the query to not use the subquery. You should be able to see this in explain:

mysql [localhost:8022] {msandbox} (test) > explain SELECT COUNT(*) FROM (SELECT * FROM t1) p;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > show warnings;
+-------+------+---------------------------------------------------------------+
| Level | Code | Message                                                       |
+-------+------+---------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `test`.`t1` |
+-------+------+---------------------------------------------------------------+
1 row in set (0.00 sec)

The optimization that rewrites subqueries in the from clause is called "derived_merge". It is a MySQL 5.7+ feature, and enabled by default. Earlier versions will be less efficient:

mysql [localhost:8022] {msandbox} (test) > set optimizer_switch="derived_merge=off";
Query OK, 0 rows affected (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > explain SELECT COUNT(*) FROM (SELECT * FROM t1) p;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
|  2 | DERIVED     | t1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                 |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from (/* select#2 */ select `test`.`t1`.`id` AS `id` from `test`.`t1`) `p` |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I don't think the worse performance in 5.6 and lower will be that impactful for the typical sized database, but I will let others judge that one. There is technically a difference.

#10 @wpe_bdurette
4 years ago

Thanks @morgantocker! That's super helpful.

Whether or not I use a subselect or just replace the select_expr from the original query, I'm going to have to do some surgery on a query string in PHP. This is less than ideal without a full parser. It strikes me as safer to strip the LIMIT clause and move to a subselect, so that's what I'll do.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

#13 @hellofromTonya
4 years ago

  • Keywords early added
  • Milestone changed from 5.7 to Future Release

With 5.7 Beta 1 landing in less than 6 days and no activity, we agreed during core scrub today to punt this ticket to Future Release. Marking it as early for 5.8.

If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#14 @madpeter
3 years ago

#54077 was marked as a duplicate.

#15 @madpeter
3 years ago

  • Focuses performance added
  • Keywords early removed
  • Severity changed from normal to major

This issue missed 5.8 and has been open for more than 2 years now.

#16 @pembo13
3 years ago

Is this being actively worked on? Luckily, it doesn't look like MariaDB has deprecated this as well.

#17 @johnbillion
3 years ago

  • Owner set to johnbillion
  • Status changed from new to reviewing

I wonder if we could get more clever by lazily populating the found_posts and max_num_pages properties only when one of them is accessed. The existing SELECT FOUND_ROWS() query needs to run immediately after the query you're counting the results for, but the corresponding SELECT COUNT(...) does not.

This way if a query runs but its found_rows or max_num_pages properties are never subsequently read then the query to count the results never needs to run.

The count_found_rows argument for the query classes should remain in case there's a need to prevent the found_posts and max_num_pages properties from being populated.

  • Need to account for PHP serialization and JSON serialization of the query object.
  • Probably need to convert found_posts and max_num_pages into private properties and then expose them via __get() which lazily populates (and caches) their values.
  • Need tests for it all.
  • Need to account for third party search integrations (eg. Elasticsearch) that override the SQL query to count results, should be handled by the existing found_posts_query filter.

I can work on this during 6.0 unless someone else wants to.

Last edited 3 years ago by johnbillion (previous) (diff)

This ticket was mentioned in PR #2119 on WordPress/wordpress-develop by johnbillion.


3 years ago
#18

  • Keywords has-unit-tests added

Trac ticket: https://core.trac.wordpress.org/ticket/47280

This continues the work of @morgo on #330. Additions:

  • [X] Convert found_posts and max_num_pages into private properties and then expose them via __get() which lazily populates (and caches) their values
  • [X] Tests for basic queries

Todo:

  • [ ] Account for PHP serialization and JSON serialization of the WP_Query object -- the newly private properties still need to be included
  • [ ] Many more tests for complex queries
  • [ ] Test with third party search integrations (eg. Elasticsearch) that override the SQL query to count results, should be handled by the existing found_posts_query filter
  • [ ] Repeat the process for all query classes that can perform count queries:
    • [X] WP_Query
    • [ ] WP_Network_Query
    • [ ] WP_Site_Query
    • [ ] WP_Comment_Query

#19 @johnbillion
3 years ago

@morgantocker @wpe_bdurette The PR at https://github.com/WordPress/wordpress-develop/pull/2119 is still heavily in progress but I wouldn't mind a review on it. So far I've been focusing on the lazy loading aspect of the found_posts and max_num_pages properties, and soon I'll focus on ensuring that the switch from SELECT FOUND_ROWS() to the unbounded COUNT works as expected for as many different query arguments as is reasonably possible.

So far I've added tests and query adjustments for simple queries:

  • Basic query
  • No limits: posts_per_page => -1 and nopaging => true (no LIMIT clause)
  • Paginated query: paged => 3 (LIMIT with an offset)
  • Author query: author => {id} (basic field match of post_author = {id})

And queries that result in joins:

  • Term query: tag => {id} (LEFT JOIN with GROUP BY ID)
  • Meta query: meta_key => {key} (INNER JOIN with GROUP BY ID)

Do you have suggestions for other clause types that can affect the accuracy of the SELECT COUNT query that we're using? It's generated by re-running the same query without its GROUP BY, ORDER BY, and LIMIT clauses: https://github.com/johnbillion/wordpress-develop/blob/26174be996cd46f1e99788b1dccfce6adaf921a8/src/wp-includes/class-wp-query.php#L3027-L3029

Once the changes for WP_Query are in a good place, the process can be repeated for the other query classes.

#20 follow-up: @xParham
3 years ago

Hey @johnbillion, can we also maybe add count as an option for the fields query parameter that would then only run the COUNT query and populate the found_posts/total_users values and skip running the main query to save some query time in cases where only the count is needed?

#21 follow-up: @misterwebdeveloper
3 years ago

Hi,

Experienced software developer of 20 years here. I just want to start by saying MySQL is a crazy good open source platform and kudos to everyone involved building it.

Sorry if I'm posting this in the wrong place...

Let me play devil's advocate.

As far as I can see the recommendation is to break DRY and duplicate SELECT statements.
Presumably this feature was orignally added to prevent that.

The MYSQL recommendation: duplicate the original query. So a SQL statement using SQL_CALC_FOUND_ROWS is 150 lines, most of those lines should be duplicated? This isn't a pragmatic solution. What if an existing system is using 50k lines? Is there something I'm missing here?

Another suggestion is to rewrite systems to use lazy loading. Not all systems are lazily loaded social media platforms. Pagination still has it's place and isn't going anywhere long term.

Is it possible to fix/improve the performance issues in this feature? How about MySQL re-runs the proc in the background without the LIMIT code when it encounters SQL_CALC_FOUND_ROWS? That way there the dev doesn't need to duplicate the code.

My regards

#22 in reply to: ↑ 20 @johnbillion
3 years ago

Replying to xParham:

Hey @johnbillion, can we also maybe add count as an option for the fields query parameter

Interesting idea! I'll open a follow-up ticket for that so we keep this one focused.

#23 in reply to: ↑ 21 @johnbillion
3 years ago

Replying to misterwebdeveloper:

Thanks for the comment. I don't think any of this is relevant to this particular ticket, which is about implementing the MySQL change into the WordPress software. Cheers!

This ticket was mentioned in Slack in #core by johnbillion. View the logs.


3 years ago

#26 @johnbillion
3 years ago

  • Keywords early added
  • Severity changed from major to normal

#27 follow-up: @rjasdfiii
2 years ago

  • @javorszky - It is a common misconception that one needs to say "COUNT(col)". That goes to the extra effort of counting only rows where "col IS NOT NULL". Unless you want that, you should do "COUNT(*)",
  • @johnbillion - "SELECT FOUND_ROWS() query needs to run immediately after the query you're counting the results for, but the corresponding SELECT COUNT(...) does not." I think this is backward. No, even worse. If row(s) where inserted/deleted between the main SELECT and the COUNT, the count could be incorrect -- possibly visibly incorrect. Example: SELECT finds 4 rows; a DELETE intervenes; then COUNT says 3 rows. (OK, maybe no WP users have multiple connections??
  • @misterwebdeveloper - Doing the COUNT in the background: Have the web page request the count via AJAX; then fill in the count when the response comes back. That way, the page will run _faster_ than it does today. And you get the count [eventually].

#28 @desrosj
2 years ago

#55781 was marked as a duplicate.

#29 in reply to: ↑ 27 @johnbillion
2 years ago

Replying to rjasdfiii:

If row(s) where inserted/deleted between the main SELECT and the COUNT, the count could be incorrect

Thanks for raising this concern. For this problem to occur and cause unexpected or undesired results would require:

  1. That a WP_Query is performed
  2. That one or more post insertion, post deletion, post update, change to post meta, or change to post term relationship occurs after the SQL query runs but before either of the found_posts or max_num_pages properties are accessed
  3. That the above change affects the result of the clauses of the SQL query of the concerned WP_Query
  4. If max_num_pages is used, that the resulting change to the found_posts value affects the calculation of max_num_pages (a 1 in 10 chance given 10 posts per page)

There certainly is a window for this opportunity to present itself within a standard request to WordPress (for example it's common for pagination controls in themes to be displayed at the bottom of the page), but it's a small window.

I think the relatively low chance and infrequency of this situation happening, combined with the low severity, is outweighed by the benefit for all other cases where lazy loading the count results in the elimination of the count query when it's not needed.

#32 @spacedmonkey
2 years ago

I put together a much simpler solution in #2829.

#33 @johnbillion
2 years ago

Cheers @spacedmonkey, are you concerned that the lazy-loading of the max_num_pages and found_posts properties is a risky change? Or are you just proposing a simpler solution to begin with? I think the lazy loading is very close to being complete, I need to find some more time to further test it with plugins that use these properties and related filters.

AkramiPro commented on PR #2829:


2 years ago
#34

Count query is not working when using GROUP By in sql query .
and if you remove GROUP By the result is not the same as SELECT FOUND_ROWS()

#35 @akramipro
2 years ago

COUNT(*) query is not working when using GROUP BY in sql query .
and if you remove GROUP BYthe result is not the same as SELECT FOUND_ROWS()

see this exam. main query result has 415 rows but with count query if we remove GROUP BY we get 423 rows

https://i.ibb.co/M5rsqzY/179920451-1c238783-b255-4b07-ac39-e50296100f36.png
https://i.ibb.co/Y8KRVNQ/179920575-81b41c72-c033-456c-a442-9e9fe82e4765.png

#36 @akramipro
2 years ago

After research that I have , I found a solution for getting a count of queries that have GROUP BY .
we can remove GROUP BY and then change count syntax to cover only fields that are used in GROUP BY .
Please see my previous comment and compare it with this image to understand what I say .

https://i.ibb.co/dGYJvKj/Screenshot-3.png

#37 @akramipro
2 years ago

also i wrote this code for count query and it's work like charm:

note: this filter: found_posts_query to set my custom count query and in this filter we can't access $clauses so you need to save $clauses in WP_Query object with this filter : posts_clauses

<?php

/**
 * Fix WordPress Slow Queries
 *
 * @author Mahdi Akrami
 */
class FIX_WP_SLOW_QUERY {

        public static function init() {

                /**
                 * WP_Query
                 */

                add_filter( 'found_posts_query', [ __CLASS__, 'add_found_rows_query' ], 999, 2 );

                add_filter( 'posts_request_ids', [ __CLASS__, 'remove_found_rows_query' ], 999 );

                add_filter( 'posts_pre_query', function ( $posts, \WP_Query $query ) {

                        $query->request = self::remove_found_rows_query( $query->request );

                        return $posts;
                }, 999, 2 );

                add_filter( 'posts_clauses', function ( $clauses, \WP_Query $wp_query ) {

                        $wp_query->fw_clauses = $clauses;

                        return $clauses;
                }, 999, 2 );

        }

        public static function remove_found_rows_query( $sql ) {

                return str_replace( ' SQL_CALC_FOUND_ROWS ', '', $sql );
        }

        public static function add_found_rows_query( $sql, WP_Query $query ) {

                global $wpdb;

                $distinct = $query->fw_clauses['distinct'] ?? '';
                $join     = $query->fw_clauses['join'] ?? '';
                $where    = $query->fw_clauses['where'] ?? '';
                $groupby  = $query->fw_clauses['groupby'] ?? '';

                $count = 'COUNT(*)';

                if ( ! empty( $groupby ) ) {
                        $count = "COUNT( distinct $groupby )";
                }

                return "
                        SELECT $distinct $count
                        FROM {$wpdb->posts} $join
                        WHERE 1=1 $where
                ";
        }

}

FIX_WP_SLOW_QUERY::init();

#38 @rjasdfiii
2 years ago

@akramipro - It looks like that query could end up with "DISTINCT" twice; this seems wrong:

SELECT DISTINCT COUNT(DISTINCT groupbyvar) ...

Also, it is essentially wrong to mix DISTINCT and GROUP BY:

SELECT DISTINCT ... GROUP BY ...

(Since I don't know what happens after "return", I may be incorrectly critiquing the code.)

#39 @akramipro
2 years ago

@rjasdfiii - no i think you are wrong . about DISTINCT i tested before and it works fine with no errors . also there is no GROUP BY in my count query . you should read my code with more attention !
Anyway, I tested my code in site with 3.5 million wp_postmeta records and I checked sql logs and all count queries executed with no error and paginations work like before.
Last night my server cpu was 100% but after I fixed the count queries it's under 10% !!

#40 @akramipro
2 years ago

@rjasdfiii - also i should mention that wordpress add GROUP BY to $groupby after posts_clauses filter so this is safe to use $groupby in my count query $count = "COUNT( distinct $groupby )" since it's just contain group column name.

https://i.ibb.co/ZY0h9Wz/download.jpg

#41 @rjasdfiii
2 years ago

Please provide the generated SQL, after the $ variables are interpolated.

#42 @akramipro
2 years ago

this is the generate sql i send it before !

https://i.ibb.co/dGYJvKj/Screenshot-3.png

#43 @rjasdfiii
2 years ago

Get the 423 rows, look at the 8 rows with duplicate wp_posts.ID. That should explain the purpose of the GROUP BY -- and/or point out redundant or conflicting information in postmeta.

#44 @akramipro
2 years ago

@rjasdfiii I can't understand you at all. If there is an issue in my code please explain it or give the right solution .
wp default query return 415 rows . ( without my code)
My count query also returns 415 rows.( with my code)
What is the problem here ?

AkramiPro commented on PR #2119:


2 years ago
#45

Hi . i write some code to fix wordpress slow query bug in count queries .

i explain the code here: link

<?php

/**
 * Fix WordPress Slow Queries
 *
 * @author Mahdi Akrami
 */
class FIX_WP_SLOW_QUERY {

        public static function init() {

                /**
                 * WP_Query
                 */

                add_filter( 'found_posts_query', [ __CLASS__, 'add_found_rows_query' ], 999, 2 );

                add_filter( 'posts_request_ids', [ __CLASS__, 'remove_found_rows_query' ], 999 );

                add_filter( 'posts_pre_query', function ( $posts, \WP_Query $query ) {

                        $query->request = self::remove_found_rows_query( $query->request );

                        return $posts;
                }, 999, 2 );

                add_filter( 'posts_clauses', function ( $clauses, \WP_Query $wp_query ) {

                        $wp_query->fw_clauses = $clauses;

                        return $clauses;
                }, 999, 2 );

        }

        public static function remove_found_rows_query( $sql ) {

                return str_replace( ' SQL_CALC_FOUND_ROWS ', '', $sql );
        }

        public static function add_found_rows_query( $sql, WP_Query $query ) {

                global $wpdb;

                $distinct = $query->fw_clauses['distinct'] ?? '';
                $join     = $query->fw_clauses['join'] ?? '';
                $where    = $query->fw_clauses['where'] ?? '';
                $groupby  = $query->fw_clauses['groupby'] ?? '';

                $count = 'COUNT(*)';

                if ( ! empty( $groupby ) ) {
                        $count = "COUNT( distinct $groupby )";
                }

                return "
                        SELECT $distinct $count
                        FROM {$wpdb->posts} $join
                        WHERE 1=1 $where
                ";
        }

}

FIX_WP_SLOW_QUERY::init();

#46 @spacedmonkey
2 years ago

Just flagging that I am currently working on #22176 that will likely end up in 6.1 / 6.2. This adds caching to WP_Query. This affects this patch, because the found_post property is cached. If the value is cached, even in memory, it means that less times it is run.

#47 @johnbillion
2 years ago

Thinking out loud, if #22176 lands before this then the cached values for found_posts and max_num_pages can also be lazily populated. I'll bear it in mind.

#48 @johnbillion
2 years ago

@akramipro I've pushed some more commits to the pull request which accounts for the GROUP BY clause using a field name or names other than wp_posts.ID. Thanks.

@rjasdfiii The point that @akramipro was trying to make is that if the original query contains a GROUP BY clause then the subsequent COUNT(...) query needs to contain a DISTINCT clause in order to account for rows that were grouped in the main query. This can be seen when performing a meta query with the relation field set to OR and a result set that contains posts that match both the meta clauses. Without the COUNT(DISTINCT ...) clause the count will not be correct.

#49 @spacedmonkey
2 years ago

Now that #22176 is committed in [54111], this patch will need to be refactored.

#50 @archon810
2 years ago

Think we can expect to see this one finished and released early next year? It's going to result in huge perf improvements.

#51 @JaworskiMatt
2 years ago

This is a very important improvement no matter what MySQL version the website is using. Huge databases end up being extremely slow because of this, and in case of our plugin WordPress insists on counting the rows in our WP_Queryies even when we don't need it for anything, leading to major performance degradation. Trying to circumvent it with filters has never worked fine.

We hope this makes it into Core sooner than later.

@johnbillion commented on PR #330:


22 months ago
#53

I'm going to close this off in favour of #3863 which carries on this work. Cheers!

@johnbillion commented on PR #2119:


22 months ago
#54

Closing this in favour of #3863

#55 @johnbillion
22 months ago

I've opened a new PR for this: https://github.com/WordPress/wordpress-develop/pull/3863

Lazy loading the found_posts and max_num_pages properties was an interesting exercise but it introduces a few potential problems, not least potentially inaccurate counts if a post in the result set is updated in between the query being run and one of those properties being accessed.

The new PR scales back the changes to just the replacement of SQL_COUNT_FOUND_ROWS and SELECT FOUND_ROWS() with a performant COUNT query that's the approach recommended by MySQL. It includes feedback from the discussion on this ticket and the original PRs.

New tests have been added and some existing tests have been updated.

There is one outstanding issue which is to decide how to handle a change to the query via the posts_request filter (the only filter that runs between the query being constructed and executed) which could potentially result in an inaccurate count. I don't think there's a way to avoid this while still removing SQL_COUNT_FOUND_ROWS.

Testing and feedback welcome and appreciated.

@spacedmonkey commented on PR #3863:


16 months ago
#56

With 10k+ posts

Before
https://i0.wp.com/github.com/WordPress/wordpress-develop/assets/237508/73fa7bdf-ab79-40d3-8bbe-0a14e19cdf73

After
https://i0.wp.com/github.com/WordPress/wordpress-develop/assets/237508/3a21b8fe-e4e4-44c5-899d-2c2028b2b706

#58 @mukesh27
14 months ago

  • Keywords changes-requested added

#59 follow-up: @johnbillion
14 months ago

Quick status update on this:

  1. The posts_request filter is a real stumbling block for this change. The SQL to perform the count query can be reliably adjusted to work without SQL_COUNT_FOUND_ROWS, but the posts_request filter runs between the query being constructed and before it runs, which means arbitrary changes to the query can be made in a filter and cause the count query to be incorrect. We'll need some suggestions for how to solve this.
  2. There are some valid concerns about the performance of the count query, even though it's the query structure that MySQL recommends. This needs some benchmarking.
  3. A few other todos in the description of the PR.

#60 @spacedmonkey
14 months ago

I posted some quick benchmarks here.

The posts_request filter is a real stumbling block for this change. The SQL to perform the count query can be reliably adjusted to work without SQL_COUNT_FOUND_ROWS, but the posts_request filter runs between the query being constructed and before it runs, which means arbitrary changes to the query can be made in a filter and cause the count query to be incorrect. We'll need some suggestions for how to solve this.

It is worth noting that is the same problem with the filter posts_request_ids.

Maybe the best we can do is detect if posts_request / posts_request_ids have a filter hooked in and if do, fallback to the old style SQL_CALC_FOUND_ROWS queries. This way, string replacement would work within filters. We could add a _doing_it_wrong and write a dev note to ensure developers update their code, when it falls back. Any clever manipulation of query after it has been filter may result in an invalid count query. Worse, we may not know it is invalid and count could just be wrong.

I think we should be pushing developers to use the posts_pre_query if they want to run custom queries or change the value WP_Query returns.

archon810 commented on PR #3863:


12 months ago
#61

Any updates here please? SQL_CALC_FOUND_ROWS is twice as slow as count+select in our use cases and would mean massive performance improvements. https://core.trac.wordpress.org/ticket/47280 is almost at the finish line.

#62 @johnbillion
11 months ago

#60058 was marked as a duplicate.

#63 @leendertvb
9 months ago

Since some of our clients were facing this issue, we decided to create a plugin to fix the issue for now. I know it is not ideal to use a plugin for this.
Currently it provides a fix for the posts_query and comments_query through the filters found_posts_query and found_comments_query.
Please let me know your thoughts on this.

https://wordpress.org/plugins/count-pagination-fix/

#64 in reply to: ↑ 59 @enghell
7 months ago

Replying to johnbillion:

Quick status update on this:

  1. The posts_request filter is a real stumbling block for this change. The SQL to perform the count query can be reliably adjusted to work without SQL_COUNT_FOUND_ROWS, but the posts_request filter runs between the query being constructed and before it runs, which means arbitrary changes to the query can be made in a filter and cause the count query to be incorrect. We'll need some suggestions for how to solve this.
  2. There are some valid concerns about the performance of the count query, even though it's the query structure that MySQL recommends. This needs some benchmarking.
  3. A few other todos in the description of the PR.

My two cents on this, it may be wise to deprecate posts_request filter and push posts_pre_query as replacement.

mibmo commented on PR #3863:


5 months ago
#65

what's blocking here? i'd love to help fix this.

Note: See TracTickets for help on using tickets.