Make WordPress Core

Opened 6 years ago

Closed 2 months ago

Last modified 3 weeks ago

#50161 closed enhancement (fixed)

Slow query in WP_Posts_List_Table, no way to avoid with filters/hooks

Reported by: lucasms's profile LucasMS Owned by: sergeybiryukov's profile SergeyBiryukov
Milestone: 6.9 Priority: normal
Severity: normal Version: 5.4.1
Component: Database Keywords:
Focuses: administration, performance Cc:

Description

The post listing page on my wordpress installation is taking more than 15seconds to load (wp_posts w/ more than 120k records, running on an AWS EC2 t2.large instance), and the main cause of this is the following query in the constructor of class wp-admin/includes/class-wp-posts-list-table.php:

SELECT COUNT( 1 )
			FROM $wpdb->posts
			WHERE post_type = %s
			AND post_status NOT IN ( '" . implode( "','", $exclude_states ) . "' )
			AND post_author = %d

It is used just to count the posts the logged-in user posted and show it over the listing table (in "Mine" link).
There is no way to filter this out or enhance the query (other than changing this file).

Suggestion: try to create an database index for this query (I tried, but failed), or enhance it in some way to make it faster.

Other option is to remove this "Mine (xx)" link above the listing, using hooks.

Attachments (1)

patch-50161-new-index.diff (1.9 KB) - added by josephscott 4 months ago.
type_status_author index

Download all attachments as: .zip

Change History (42)

#1 @LucasMS
6 years ago

Removing the 'mine' index from $views in views_edit-post filter will only remove the link, but the query is still executed.

<?php
add_filter('views_edit-post', 'remove_mine_view');
function remove_mine_view( $views ) {
        unset($views['mine']);
        return $views;
}

#2 @Otto42
6 years ago

This query is already pretty well optimized using the existing default indexes.

Example:

EXPLAIN SELECT COUNT( 1 ) FROM wp_posts WHERE post_type = 'post' AND post_status NOT IN ( 'draft' ) AND post_author = 1

This comes back as a SIMPLE select using the keys of type_status_date, and post_author, with "Using index condition" and "Using where".

Basically, it already has indexes. The type_status_date is an index on post_type and post_status, with post_date there but not needed in this case. The post_author selection is also indexed by itself (as just post_author).

If you're having issues with it, you may want to check your indexes on wp_posts to make sure they're all there. You can find the default keys on the table definition here:

https://core.trac.wordpress.org/browser/trunk/src/wp-admin/includes/schema.php#L159

#3 @SergeyBiryukov
5 years ago

  • Component changed from Administration to Posts, Post Types

This ticket was mentioned in Slack in #core-media by joedolson. View the logs.


5 years ago

#5 @flixos90
5 months ago

  • Focuses ui removed
  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Thank you for the report @LucasMS.

There hasn't been any feedback on this ticket for several years, and the query appears to work well for most WordPress sites. Adding an index in the database for this only seems relevant for very large sites, so I would argue this falls under custom optimization that you could attempt for your website specifically. I don't think it would benefit enough users to be added to WordPress Core.

#6 @matt
5 months ago

  • Resolution wontfix deleted
  • Status changed from closed to reopened

WordPress should be natively performant into the millions of items in wp_posts, and the cost in terms of size and overhead is minimal for smaller sites if the index is increased to support this.

#7 @josephscott
4 months ago

To fill out the details on the potential impact of a new index to speed things the All Posts page up I setup a fresh WP 6.8.1 install with PHP 8.4.7, MySQL 8.4.2 on my Apple M3 laptop ( no memcached or redis ).

I'm specifically looking at how long it takes for the initial page request to finish for /wp-admin/edit.php. I generated 340,000 posts for this test. They are spread across two users, and the admin user view of the All Posts page shows:

  • All: 340,000
  • Mine: 339,999
  • Published: 339,800
  • Drafts: 100
  • Private: 100

Again, looking only at how long the initial page request for /wp-admin/edit.php was taking, I was seeing times in the 0.820 seconds range. This wasn't TTFB, it was the total time to get the single HTML page response back. After enabling SPX to get an idea of which parts were taking the longest, the vast majority of the time was spent in wpdb::_do_query, which wasn't a huge surprise.

But looking at just the _do_query parts from the SPX report showed that one query in particular was the slowest part. It was the query from class-wp-posts-list-table.php - https://github.com/WordPress/WordPress/blob/2f41f3fdaded86e2f4baca87001138542fb7fc55/wp-admin/includes/class-wp-posts-list-table.php#L92-L102

		$this->user_posts_count = (int) $wpdb->get_var(
			$wpdb->prepare(
				"SELECT COUNT( 1 )
				FROM $wpdb->posts
				WHERE post_type = %s
				AND post_status NOT IN ( '" . implode( "','", $exclude_states ) . "' )
				AND post_author = %d",
				$post_type,
				get_current_user_id()
			)
		);

In the SPX captured request it indicated that query was taking 0.360 seconds. That was more than 40% of the total time to request the page.

To isolate just the query, I started running it directly against MySQL. In my case that was:

SELECT COUNT( 1 )
                FROM wp_posts
                WHERE post_type = 'post'
                AND post_status NOT IN ( 'trash','auto-draft','inherit','request-pending','request-confirmed','request-failed','request-completed' )
                AND post_author = 1

Doing that 10 times gave me a p75 response time of 0.346 seconds ( min: 0.341 seconds, max: 0.356 seconds ). That confirms that this query alone was taking up a big chunk of time to finish the request.

Adding EXPLAIN to that query reported:

  • id: 1
  • select_type: SIMPLE
  • table: wp_posts
  • partitions: NULL
  • type: ref
  • possible_keys: type_status_date,post_author
  • key: post_author
  • key_len: 8
  • ref: const
  • rows: 145,414
  • filtered: 50.04
  • extra: Using where

Trying a few different things, I eventually landed on this new index:

CREATE INDEX type_status_author ON wp_posts (post_type, post_status, post_author);

Running the same COUNT( 1 ) query another 10 times gave a new p75 response time of 0.123 seconds ( min: 0.119 seconds, max: 0.125 seconds ). Doing EXPLAIN on the query with the new index in place reported:

  • id: 1
  • select_type: SIMPLE
  • table: wp_posts
  • partitions: NULL
  • type: range
  • possible_keys: type_status_date,post_author,type_status_author
  • key: type_status_author
  • key_len: 164
  • ref: NULL
  • rows: 145,520
  • filtered: 50.00
  • extra: Using where; Using index

To confirm the impact that this would have on the full All Posts ( /wp-admin/edit.php ) HTTP request I ran ten curl requests for it before and after. This is not TTFB, it was the total time the curl request took.

  • p75 BEFORE: 0.826 seconds ( min: 0.812 seconds, max: 0.829 seconds )
  • p75 AFTER: 0.558 seconds ( min: 0.550 seconds, max: 0.563 seconds )

That confirmed that adding the index, speeding up that single query, reduced the time for the All Posts HTTP page request significantly. Speeding up a request by 32% doesn't come along every day.

Adding an index does come with some trade offs. Inserting into the wp_posts table will require an update to the new index. In my test setup I found that inserts took about 0.0001 seconds longer at p75 with the new index in place. That seems small enough that I don't think it is a concern.

The other part of having a new index is space. I ran the following query to get a list of the index sizes ( in MB ) for the wp_posts table:

SELECT
    database_name,
    TABLE_NAME,
    index_name,
    ROUND(
        stat_value * @@innodb_page_size / 1024 / 1024,
        2
    ) size_in_mb
FROM
    mysql.innodb_index_stats
WHERE
    stat_name = 'size' AND index_name != 'PRIMARY' AND database_name = 'wordpress' AND TABLE_NAME = 'wp_posts'
ORDER BY
    size_in_mb
DESC;
  • post_name: 30.6 MB
  • type_status_date: 18.5 MB
  • type_status_author: 13.5 MB
  • post_author: 7.5 MB
  • post_parent: 7.5 MB

Adding 13.5 MB to the set of indexes is a sizable increase. But in the bigger picture, the wp_posts table is also 500 MB. Making a 13.5 MB new index a 2.7% increase.

All in all, I think the trade off of having a new index is worth the improvement.

@josephscott
4 months ago

type_status_author index

This ticket was mentioned in PR #9312 on WordPress/wordpress-develop by @josephscott.


4 months ago
#8

  • Keywords has-patch added

See https://core.trac.wordpress.org/ticket/50161

When you have a large number of posts, not having this index can significantly slow down some DB queries.

This ticket was mentioned in Slack in #core by josephscott. View the logs.


4 months ago

#10 @johnjamesjacoby
4 months ago

Love this idea. Patch looks perfect.

Index size on disk seems like a reasonable trade-off for the improved user experience. Big bbPress' benefit big-time.

WordPress should be natively performant into the millions of items in wp_posts

Agree – similar to the early work that went into millions of users.

#11 @siliconforks
4 months ago

I'm skeptical how useful this index will actually be in practice - it seems like it is designed for just this one specific SQL query, and even for that query the index does not appear to entirely solve the issue. Even if it reduces the query time to 0.1 seconds, that is still a relatively slow query. The problem is that database indexes are designed for retrieving specific data, not for counting things. An index might make counting somewhat faster (because the database might be able to count using only the index and avoid reading the full table) but it is still likely to get slower and slower as more and more blog posts are added to the table.

For what it's worth, I tested this on a Ubuntu 22.04 machine with MySQL 8.0.42-0ubuntu0.22.04.2 and found that on this system MySQL does not actually use the index at all:

mysql> EXPLAIN SELECT COUNT( 1 )
    -> FROM wp_posts
    -> WHERE post_type = 'post'
    -> AND post_status NOT IN ( 'trash','auto-draft','inherit','request-pending','request-confirmed','request-failed','request-completed' )
    -> AND post_author = 1
    -> \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wp_posts
   partitions: NULL
         type: ref
possible_keys: type_status_date,post_author,type_status_author
          key: post_author
      key_len: 8
          ref: const
         rows: 168194
     filtered: 50.03
        Extra: Using where

Granted, that might just be some bug in older MySQL versions and may not actually be an issue going forward. Still, I don't think using an index is really a reliable way of speeding up counting things. If you really want to solve the issue, I think you would have to do one of the following:

  1. Store the count somewhere instead of re-counting rows every time. (Note that WordPress already does something similar in a few places - e.g., wp_posts.comment_count, wp_term_taxonomy.count.)
  1. Do what @LucasMS originally suggested and provide a filter/hook to avoid counting rows entirely. Note that it would not be necessary to remove the "Mine" link entirely - merely skipping the count would be sufficient. (The other counts are not fast either, so it would be good to be able to skip those as well.)

#12 follow-up: @johnjamesjacoby
4 months ago

Store the count somewhere instead of re-counting rows every time. (Note that WordPress already does something similar in a few places - e.g., wp_posts.comment_count, wp_term_taxonomy.count.)

The "Mine" queries are per-user & per-post-type, so the only place to keep it is usermeta. I don't really consider it viable though, because of the overhead from meta writes & user meta/object cache invalidation for every post-status change.

(bbPress does this using the user-options API – see bbp_get_user_topic_count_raw() for a similar query – but only with the knowledge & vision that there are very few multi-forum setups out there to overpopulate usermeta the way multisite/blogs/posts would.)

We could, at the very least, add caching to these Mine queries, but that would require more PHP than just adding the index via SQL, and isn't universally useful to plugins, custom post types & statuses, etc...

The other counts are not fast either, so it would be good to be able to skip those as well.

Or add an index for them, too? 😅

Actually, the SQL inside of wp_count_posts() could probably be rewritten to leverage this new index, as it's the same post_type, post_status, post_author combo thanks to readable being used.

Granted, that might just be some bug in older MySQL versions

My guess is MySQL's query optimizer gives up after a certain number of NOT IN's relative to the cardinality of the data in the post_status column, and that including & excluding fewer statuses would usually help.

I tested this on a Ubuntu 22.04 machine with MySQL 8.0.42-0ubuntu0.22.04.2
it is still likely to get slower and slower as more and more blog posts are added to the table.

How many posts did you test this with? (Edit: 168194 I see!) I think the more posts there are, the more useful this index becomes relative to the existing slowness?

Last edited 4 months ago by johnjamesjacoby (previous) (diff)

#13 in reply to: ↑ 12 @siliconforks
4 months ago

Replying to johnjamesjacoby:

My guess is MySQL's query optimizer gives up after a certain number of NOT IN's relative to the cardinality of the data in the post_status column, and that including & excluding fewer statuses would usually help.

It seems like it refuses to use the index whenever there's a NOT IN. Even if it's just one item, like NOT IN ( 'trash' ), it still doesn't use the index. The only way I can get it to use the index is to specify the index explicitly:

SELECT COUNT( 1 )
FROM wp_posts
USE INDEX (type_status_author)
WHERE post_type = 'post'
AND post_status NOT IN ( 'trash','auto-draft','inherit','request-pending','request-confirmed','request-failed','request-completed' )
AND post_author = 1

Then it does use the index.

How many posts did you test this with?

I tried to make it roughly the same as @josephscott did with 340,000 blog posts.

#14 follow-up: @josephscott
4 months ago

@johnjamesjacoby where do we go from here? I'd really like to get this into WP 6.9.

#15 in reply to: ↑ 14 @johnjamesjacoby
4 months ago

  • Keywords 2nd-opinion added
  • Milestone set to 6.9

Replying to josephscott:

@johnjamesjacoby where do we go from here? I'd really like to get this into WP 6.9.

Good question... I think this doable in 6.9 for what I know about the intended scope of this release. I'll milestone it as such right now.

Replying to siliconforks:

The only way I can get it to use the index is to specify the index explicitly:

That's not ideal. USE INDEX makes sense here, but nothing in WordPress Core uses that variety of hint, so we'll definitely need a 2nd opinion before integrating it, and I'll add that keyword.

(bbPress has 1 FORCE INDEX in it to address a similar MySQL optimization concern, inside of _bbp_has_replies_where().)

#16 follow-up: @josephscott
3 months ago

@siliconforks I've been trying to reproduce this issue -

For what it's worth, I tested this on a Ubuntu 22.04 machine with MySQL 8.0.42-0ubuntu0.22.04.2 and found that on this system MySQL does not actually use the index at all

But so far have not been able to. I tried it with a quick Docker approach:

docker run --name test-mysql-server -e MYSQL_ROOT_PASSWORD=--password-- -d mysql:8.0.42

And then I spun up an Ubuntu 22.04 container, brought it up to date and did apt-get install mysql-server to get a server.

$ dpkg -l |grep mysql-server
ii mysql-server 8.0.42-0ubuntu0.22.04.2 all MySQL database server (metapackage depending on the latest version)
ii mysql-server-8.0 8.0.42-0ubuntu0.22.04.2 arm64 MySQL database server binaries and system database setup
ii mysql-server-core-8.0 8.0.42-0ubuntu0.22.04.2 arm64 MySQL database server binaries

This was on my MacBook Pro, so arm64 versions all the way around. I made no changes to the MySQL server config.

 mysql> EXPLAIN SELECT COUNT( 1 )
    ->         FROM wp_posts
    ->         WHERE post_type = 'post'
    ->         AND post_status NOT IN ( 'trash','auto-draft','inherit','request-pending','request-confirmed','request-failed','request-completed' )
    ->         AND post_author = 1
    -> \G
 *************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: wp_posts
   partitions: NULL
         type: range
 possible_keys: type_status_date,post_author,type_status_author
          key: type_status_author
       key_len: 164
          ref: NULL
         rows: 30
     filtered: 39.64
        Extra: Using where; Using index
 1 row in set, 1 warning (0.00 sec)

My first thought on why this might be happening was different in query optimizers between MySQL versions, but that doesn't appear to be the case.

I used a smaller data set, to make it easier to quickly import into different environments. This reduced wp_posts table only has 1,848 rows.

Not sure what is causing the difference in index use. If you have any ideas I am happy to try them out. Better yet, if you can reproduce it with a Docker image that would be great. Then I could confirm if this is a version specific issue.

#17 in reply to: ↑ 16 @siliconforks
3 months ago

Replying to josephscott:

I used a smaller data set, to make it easier to quickly import into different environments. This reduced wp_posts table only has 1,848 rows.

I'm not sure what's going on there, but did you try it with your original data set?

This is what I used to populate the wp_posts table for testing (using wp eval-file):

<?php

define( 'NUM_BATCHES', 3400 );
define( 'POST_CONTENT', 'XXX' );
define( 'STARTING_TIMESTAMP', 500 * 1000 * 1000 );

global $wpdb;

$unix_timestamp = STARTING_TIMESTAMP;
for ( $batch = 0; $batch < NUM_BATCHES; ++$batch ) {
        $sql = [];
        for ( $i = 0; $i < 100; ++$i ) {
                if ( $batch === 0 && $i === 0 ) {
                        $post_author = 2;
                } else {
                        $post_author = 1;
                }
                $post_date = gmdate( 'Y-m-d H:i:s', $unix_timestamp );
                $post_content = POST_CONTENT;
                $post_title = 'Posted at ' . $post_date;
                $post_excerpt = POST_CONTENT;
                if ( $batch === 3398 ) {
                        $post_status = 'draft';
                } elseif ( $batch === 3399 ) {
                        $post_status = 'private';
                } else {
                        $post_status = 'publish';
                }
                $post_name = $post_title;
                $post_name = strtolower( $post_name );
                $post_name = preg_replace( '/[^a-z0-9]/', '-', $post_name );
                $to_ping = '';
                $pinged = '';
                $post_content_filtered = POST_CONTENT;

                $sql[] = $wpdb->prepare( '(%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', $post_author, $post_date, $post_date, $post_content, $post_title, $post_excerpt, $post_status, $post_name, $to_ping, $pinged, $post_date, $post_date, $post_content_filtered );
                $unix_timestamp += 600;
        }
        $sql = 'INSERT INTO ' . $wpdb->posts . ' (post_author, post_date, post_date_gmt, post_content, post_title, post_excerpt, post_status, post_name, to_ping, pinged, post_modified, post_modified_gmt, post_content_filtered) VALUES ' . implode( ', ', $sql );
        $wpdb->query( $sql );
        echo '.';
        flush();
}
echo "\n";

#18 @josephscott
3 months ago

After more testing ( and some suggestions and experiments from @mreishus ) I think what we have settled on is that use of the new type_status_author can be data dependent. I don't think this is a problem, we should let the MySQL query optimizer pick the index it thinks is best given the conditions of the data and database.

For those that might want to follow along, prepare to go down the rabbit hole. I have a GH repo - https://github.com/josephscott/wporg-mysql-tsa-index - that explores a few different conditions and data sets to show when the new index does and does not get used. All of the tests in the repo are making use of MySQL 8.0.42.

The only requirements are Docker and PHP. Most of the tests are run against a subset of my previously generated hundreds of thousands of posts. These work as expected. An alternate dataset ( from @mreishus ) provide conditions where the new type_status_author does not get used. There are also variations on when the new index is added and running analyze on the table.

Here is the output of the existing tests:

$ php tests.php 

***** Test: 1 *****
> NO TSA index

MySQL version: 8.0.42
dropping table: wp_posts
running: create-table.sql
running: autoinc.sql
running: wp-posts-data.sql
Using index: type_status_date


***** Test: 2 *****
> New TSA index immediately after create table, before insert

MySQL version: 8.0.42
dropping table: wp_posts
running: create-table.sql
running: autoinc.sql
running: tsa-index.sql
running: wp-posts-data.sql
Using index: type_status_author


***** Test: 2b *****
> New TSA index immediately after create table, before insert (alt data set)

MySQL version: 8.0.42
dropping table: wp_posts
running: create-table.sql
running: autoinc.sql
running: tsa-index.sql
running: wp-posts-data-alt.sql
Using index: post_author


***** Test: 3 *****
> New TSA index immediately after create table, then analyze, before insert

MySQL version: 8.0.42
dropping table: wp_posts
running: create-table.sql
running: autoinc.sql
running: tsa-index.sql
running: analyze-wp-posts.sql
running: wp-posts-data.sql
Using index: type_status_author


***** Test: 4 *****
> New TSA index after inserting data

MySQL version: 8.0.42
dropping table: wp_posts
running: create-table.sql
running: autoinc.sql
running: wp-posts-data.sql
running: tsa-index.sql
Using index: type_status_author

There is also a verbose option that will provide the full EXPLAIN key/value pairs. For this test I was only concerned about the specific index that was being used.

This ticket was mentioned in Slack in #core by josephscott. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-performance by josephscott. View the logs.


3 months ago

#21 @josephscott
3 months ago

Mentioned @craigfrancis on this - as he is listed as the maintainer over the database component - https://make.wordpress.org/core/components/

This ticket was mentioned in Slack in #core-performance by josephscott. View the logs.


3 months ago

This ticket was mentioned in Slack in #hosting by josephscott. View the logs.


3 months ago

This ticket was mentioned in Slack in #core-performance by josephscott. View the logs.


3 months ago

#25 @SergeyBiryukov
3 months ago

  • Owner set to SergeyBiryukov
  • Status changed from reopened to accepted

@mukesh27 commented on PR #9312:


3 months ago
#26

@josephscott In https://core.trac.wordpress.org/ticket/15499 we explored adding an index for the get_lastpostmodified query. There was some great discussion on it, though we didn’t commit the change in the end. Worth a read/review.

@josephscott commented on PR #9312:


3 months ago
#27

Unfortunately that ticket, which was opened 15 years ago, ended up like so many other database performance improvement discussions - demonstration of being able to make significant improvements in query times, but no one actually gets it committed.

This ticket was mentioned in Slack in #core by benjamin_zekavica. View the logs.


3 months ago

#29 @josephscott
2 months ago

I'd also be game with taking a different approach to this - for example, only including the index for new sites. In that situation I would remove the index from the upgrade process.

If that would make people feel more comfortable with this please let me know.

#30 @josephscott
2 months ago

@jonsurrell wanted to see if you had any input or ideas on moving this forward.

#31 follow-up: @jonsurrell
2 months ago

I spent a lot of time to understand this and here are my findings:

  • Unsurprisingly, the available indexes seem fine when there are not a lot of rows. Queries on tables few rows are not particularly interesting because they tend to be fast regardless.
  • When the rows get into the hundreds of thousands, the query slows down noticeably. WordPress should remain performant well beyond this threshold.
  • I generated data and did a lot of testing. I found at least three different compound keys that yield significant performance improvements in the neighborhood of 3x faster.

Based on my findings I'm in favor of adding the new index as proposed in PR 9312.


This is a relevant point and one I was wondering about. Is the new index more broadly applicable, or can it be?

Actually, the SQL inside of wp_count_posts() could probably be rewritten to leverage this new index, as it's the same post_type, post_status, post_author combo thanks to readable being used.

I tested this out on a smaller test site. I added three different indexes (described below), monitored some queries with the Query Monitor plugin and then explained them.

I observed that queries without readable continued to use an existing index, but _do_ have the type_status_author index as a candidate:

EXPLAIN SELECT post_status, COUNT( * ) AS num_posts FROM wp_posts WHERE post_type = 'post' GROUP BY post_status;
+------+-------------+----------+------+-------------------------------------+------------------+---------+-------+------+--------------------------+
| id   | select_type | table    | type | possible_keys                       | key              | key_len | ref   | rows | Extra                    |
+------+-------------+----------+------+-------------------------------------+------------------+---------+-------+------+--------------------------+
|    1 | SIMPLE      | wp_posts | ref  | type_status_date,type_status_author | type_status_date | 82      | const | 217  | Using where; Using index |
+------+-------------+----------+------+-------------------------------------+------------------+---------+-------+------+--------------------------+

However, when I force the readable code path to be entered to adjust the query, I observed the following:

EXPLAIN SELECT post_status, COUNT( * ) AS num_posts FROM wp_posts WHERE post_type = 'post' AND (post_status != 'private' OR ( post_author = 1 AND post_status = 'private' )) GROUP BY post_status;
+------+-------------+----------+-------+---------------------------------------------------------------------------------------+--------------------+---------+------+------+--------------------------+
| id   | select_type | table    | type  | possible_keys                                                                         | key                | key_len | ref  | rows | Extra                    |
+------+-------------+----------+-------+---------------------------------------------------------------------------------------+--------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | wp_posts | range | type_status_date,post_author,type_status_author,author_type_status,type_author_status | type_status_author | 172     | NULL | 218  | Using where; Using index |
+------+-------------+----------+-------+---------------------------------------------------------------------------------------+--------------------+---------+------+------+--------------------------+

The proposed type_author_status is used! The other indexes I tried were also available in this case, but were not selected. This is more supports my conclusion below that the proposed index is a good choice.


I don't have a site available for benchmarking with sufficiently large posts table, but I did some benchmarking locally that align with the findings @josephscott shared above.

I generated a posts table and inserted random data. Users were randomly selected from 7 users, post types were random from the standard set of post types, and so on for status, etc. The table contained 400,000 rows.

I tried a few different ways of benchmarking, primarily I used hyperfine to run the query via PHP with or without different indexes.

I tried three different indexes:

  • No new index (current WordPress Core).
  • TSA as proposed by @josephscott (post_type, post_status, post_author)
  • ATS (post_author, post_type, post_status )
  • TAS (post_type, post_author, post_status )

In my test data, TSA, ATS, and TAS all seemed to yield a greater than 3x improvement. ATS and TAS seemed to yield the largest speedup with a negligible difference, while TSA had slightly smaller improvement.

Given that this was generated test data and not likely to follow real world data patterns (for example, I doubt the distribution of post type, author, and status is uniform) I'm confident enough that the performance improvement is real and likely to mitigate the problem described in this ticket for large sites. I'm not confident enough in the results to recommend a specific index over another and would instead rely on the TSA index that @josephscott selected based on working with real sites.

Last edited 2 months ago by jonsurrell (previous) (diff)

#32 in reply to: ↑ 31 ; follow-up: @SirLouen
2 months ago

  • Component changed from Posts, Post Types to Database

Replying to jonsurrell:

Given that this was generated test data and not likely to follow real world data patterns (for example, I doubt the distribution of post type, author, and status is uniform) I'm confident enough that the performance improvement is real and likely to mitigate the problem described in this ticket for large sites. I'm not confident enough in the results to recommend a specific index over another and would instead rely on the TSA index that @josephscott selected based on working with real sites.

Like @mukesh27 commented, there is another variant commented on in #15499 that can be seen in PR 3117

We added CREATE INDEX type_status_modified ON wp_posts (post_type, post_status, post_modified_gmt); and CREATE INDEX type_status_modified_no_id ON wp_posts (post_type, post_status, post_date_gmt);

I have not tested it yet, but since you made the test env, could you also check this to see whether this is better or not? At a first glance I would say it should not.

Last edited 2 months ago by SirLouen (previous) (diff)

#33 in reply to: ↑ 32 ; follow-up: @jonsurrell
2 months ago

Replying to SirLouen:

Like @mukesh27 commented, there is another variant commented on in #15499 that can be seen in PR 3117
I have not tested it yet, but since you made the test env, could you also check this to see whether this is better or not? At a first glance I would say it should not.

Will you clarify what you wanted me to test? I understand those changes target different queries. I'm not sure if this is what you meant, but I tried adding the indexes introduced in the other PR and ran my benchmarks (against the same queries in this ticket). Those indexes did not seem to have any significant impact.

#34 @josephscott
2 months ago

PR 9312 has been rebased

Last edited 2 months ago by josephscott (previous) (diff)

#35 in reply to: ↑ 33 @SirLouen
2 months ago

Replying to jonsurrell:

Will you clarify what you wanted me to test? I understand those changes target different queries. I'm not sure if this is what you meant, but I tried adding the indexes introduced in the other PR and ran my benchmarks (against the same queries in this ticket). Those indexes did not seem to have any significant impact.

It seems that mukesh commented that in the WCUS there were a group of people that were looking into this. Since you got the testing env fresh with all the posts and that, I wanted to quickly see if it was worthy or not.

Now that I got more time, I've gotten back more deeply into that PR and I'm seeing that its partially related, but not completely related. The problem here is that that PR aimed for an improvement of a query that is not even as frequent as this. So I think that we can just ignore that one, and focus on this.

#36 @SergeyBiryukov
2 months ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In 60717:

Database: Add type_status_author index for the posts table.

This aims to improve performance of some queries on installations with a large number of posts.

Follow-up to [3678], [3690], [9290], [13576].

Props josephscott, LucasMS, Otto42, flixos90, matt, johnjamesjacoby, siliconforks, mukesh27, jonsurrell, SirLouen, SergeyBiryukov.
Fixes #50161.

#37 @SergeyBiryukov
2 months ago

In 60718:

Database: Update $wp_db_version to point to the correct commit number.

Follow-up to [60717].

See #50161.

#38 @peterwilsoncc
2 months ago

  • Keywords has-patch 2nd-opinion removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening for some follow up: dbDelta() will modify the table without the need to call $wpdb->query() in the upgrade routine. The database version bump alone will trigger it, code ref

Testing notes:

  • Checkout r60716
  • Fresh install: wp db clean --yes; wp core install --url=http://wp-dev.local/ --title="WP Dev" --admin_user=admin --admin_password=password --admin_email=admin@example.com (your specific commands will vary)
  • Log in
  • Checkout trunk
  • Remove ALTER command in the upgrade routine
  • Reload dashboard page
  • Go through db upgrade as prompted
  • Review DB schema

PR incoming but I won't commit until I get a logic check as dbDelta() is a little funky.

Edit:

The call to upgrade_690() was in the wrong location, once moved an error is thrown during upgrade:

[08-Sep-2025 04:43:50 UTC] WordPress database error Duplicate key name 'type_status_author' for query ALTER TABLE wp_posts ADD INDEX type_status_author (post_type,post_status,post_author) made by wp_upgrade, upgrade_all, upgrade_690
Last edited 2 months ago by peterwilsoncc (previous) (diff)

#39 @SergeyBiryukov
2 months ago

In 60721:

Upgrade/Install: Correct the database upgrade routine for WordPress 6.9.

This commit:

  • Moves the call to upgrade_690() to the correct place so it actually gets called.
  • Wraps Hello Dolly upgrade code in a version check per standard practice.
  • Removes the $wpdb->query() call to avoid attempting to create the new index twice, once in the upgrade and once in dbDelta().

Follow-up to [60666], [60716], [60717].

Props peterwilsoncc, mukesh27.
See #50161, #53323.

#40 @peterwilsoncc
2 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Closing as fixed following @SergeyBiryukov's commit, r60721, fixing/removing the upgrade routine given dbDelta().

This ticket was mentioned in Slack in #core by westonruter. View the logs.


3 weeks ago

Note: See TracTickets for help on using tickets.