WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 7 days ago

#47280 reviewing enhancement

SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17

Reported by: javorszky Owned by: johnbillion
Milestone: Future Release Priority: normal
Severity: major Version:
Component: Database Keywords: has-patch
Focuses: performance Cc:

Description

Per https://dev.mysql.com/doc/refman/8.0/en/information-functions.html#function_found-rows

The SQL_CALC_FOUND_ROWS query modifier and accompanying FOUND_ROWS() function are deprecated as of MySQL 8.0.17 and will be removed in a future MySQL version. As a replacement, considering executing your query with LIMIT, and then a second query with COUNT(*) and without LIMIT to determine whether there are additional rows.

This is not yet immediately important because most hosts are on 5.5, or 5.6, rarely 5.7, but given the speed with which trac tickets move that impact very core functionalities, I thought it best to open this ticket to get the work started.

This impacts all the 6 places where it's being used, though one of them is in the WP_Query definition.

Change History (17)

#1 @johnbillion
3 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release
  • Severity changed from minor to normal
  • Version trunk deleted

#2 follow-up: @johnbillion
2 years ago

@javorszky Is this something you're willing to work on? As SQL_CALC_FOUND_ROWS can get slow for a large data set, does the replacment SELECT COUNT(*) WHERE ... method recommended in the MySQL docs have a performance advantage?

#3 @javorszky
2 years ago

No. For various reasons I have stepped back from contributing to WordPress in any way except for this ticket via comments.

Regarding the performance advantage, as long as the select count(column) from table where ... happens on indices, and uses only one column, it should be okay.

Also per mysql docs, there's not much in terms of alternatives.

Alternative is to not have the number of total rows available any more, which I suspect would break bc.

#4 in reply to: ↑ 2 @morgantocker
18 months ago

Former MySQL Product Manager here. Just a reply to this question specifically:

does the replacment SELECT COUNT(*) WHERE ... method recommended in the MySQL docs have a performance advantage?

Yes it can. What happens with SQL_CALC_FOUND_ROWS is that it disables the optimizations that MySQL can apply when there is a LIMIT.

The corresponding COUNT(*) query will have its own optimizations available too, such as covering indexes.

So the problem with SQL_CALC_FOUND_ROWS is you kind of get the worst of both worlds - with neither types of optimizations applying. Issuing two queries means one more network roundtrip, but on the MySQL-side both of the queries will be more efficient.

This ticket was mentioned in PR #330 on WordPress/wordpress-develop by morgo.


18 months ago

  • Keywords has-patch added; needs-patch removed

Signed-off-by: Morgan Tocker <tocker@…>

Trac ticket: https://core.trac.wordpress.org/ticket/47280

This ticket was mentioned in Slack in #core by antpb. View the logs.


13 months ago

#7 @SergeyBiryukov
13 months ago

  • Milestone changed from Future Release to 5.7

#8 @wpe_bdurette
13 months ago

Awesome to see this is happening. The GitHub PR only includes fixes for two of the occurrences SQL_CALC_ROWS_FOUND. There are others. I'm working on a PR to address all instances in core.

The usage in class-wp-query.php as some interesting implications. There is a filter on the posts query allowing for a completely arbitrary query to be run. Options that come to mind are to try to parse that query to replace the fields with COUNT(*) or simply wrap the query (e.g., SELECT COUNT(*) from ($query) p). For safety, I plan to do the latter, but I have no idea how that will play with the query optimizer.

Edit: Realizing the above won't work. At a minimum I need to strip any LIMIT clauses off the original query.

Last edited 13 months ago by wpe_bdurette (previous) (diff)

#9 @morgantocker
13 months ago

@wpe_bdurette I started a reply before seeing your edit. But I will include it here since it might still be useful.

Consider this example:

CREATE TABLE t1 (id INT NOT NULL);
INSERT INTO t1 VALUES (1),(2),(3),(4),(5);
SELECT * FROM t1 LIMIT 3;
mysql> SELECT COUNT(*) FROM (SELECT * FROM t1 LIMIT 3) p; # applies LIMIT, returns incomplete total
+----------+
| COUNT(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)


mysql> SELECT COUNT(*) FROM (SELECT * FROM t1) p; # correct
+----------+
| COUNT(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

In terms of efficiency:

In MySQL 5.7+ it is just as efficient as rewriting the query to not use the subquery. You should be able to see this in explain:

mysql [localhost:8022] {msandbox} (test) > explain SELECT COUNT(*) FROM (SELECT * FROM t1) p;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | t1    | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > show warnings;
+-------+------+---------------------------------------------------------------+
| Level | Code | Message                                                       |
+-------+------+---------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from `test`.`t1` |
+-------+------+---------------------------------------------------------------+
1 row in set (0.00 sec)

The optimization that rewrites subqueries in the from clause is called "derived_merge". It is a MySQL 5.7+ feature, and enabled by default. Earlier versions will be less efficient:

mysql [localhost:8022] {msandbox} (test) > set optimizer_switch="derived_merge=off";
Query OK, 0 rows affected (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > explain SELECT COUNT(*) FROM (SELECT * FROM t1) p;
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
|  2 | DERIVED     | t1         | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    5 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+------+---------+------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)

mysql [localhost:8022] {msandbox} (test) > show warnings;
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                                                 |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `COUNT(*)` from (/* select#2 */ select `test`.`t1`.`id` AS `id` from `test`.`t1`) `p` |
+-------+------+-------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

I don't think the worse performance in 5.6 and lower will be that impactful for the typical sized database, but I will let others judge that one. There is technically a difference.

#10 @wpe_bdurette
13 months ago

Thanks @morgantocker! That's super helpful.

Whether or not I use a subselect or just replace the select_expr from the original query, I'm going to have to do some surgery on a query string in PHP. This is less than ideal without a full parser. It strikes me as safer to strip the LIMIT clause and move to a subselect, so that's what I'll do.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


11 months ago

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


10 months ago

#13 @hellofromTonya
10 months ago

  • Keywords early added
  • Milestone changed from 5.7 to Future Release

With 5.7 Beta 1 landing in less than 6 days and no activity, we agreed during core scrub today to punt this ticket to Future Release. Marking it as early for 5.8.

If any maintainer or committer feels this can be resolved in time, or wishes to assume ownership during a specific cycle, feel free to update the milestone accordingly.

#14 @madpeter
3 months ago

#54077 was marked as a duplicate.

#15 @madpeter
3 months ago

  • Focuses performance added
  • Keywords early removed
  • Severity changed from normal to major

This issue missed 5.8 and has been open for more than 2 years now.

#16 @pembo13
3 months ago

Is this being actively worked on? Luckily, it doesn't look like MariaDB has deprecated this as well.

#17 @johnbillion
7 days ago

  • Owner set to johnbillion
  • Status changed from new to reviewing

I wonder if we could get more clever by lazily populating the found_posts and max_num_pages properties only when one of them is accessed. The existing SELECT FOUND_ROWS() query needs to run immediately after the query you're counting the results for, but the corresponding SELECT COUNT(...) does not.

This way if a query runs but its found_rows or max_num_pages properties are never subsequently read then the query to count the results never needs to run.

The count_found_rows argument for the query classes should remain in case there's a need to prevent the found_posts and max_num_pages properties from being populated.

  • Need to account for PHP serialization and JSON serialization of the query object.
  • Probably need to convert found_posts and max_num_pages into private properties and then expose them via __get() which lazily populates (and caches) their values.
  • Need tests for it all.
  • Need to account for third party search integrations (eg. Elasticsearch) that override the SQL query to count results, should be handled by the existing found_posts_query filter.

I can work on this during 6.0 unless someone else wants to.

Last edited 7 days ago by johnbillion (previous) (diff)
Note: See TracTickets for help on using tickets.