Opened 3 years ago

Last modified 6 months ago

#15499 new enhancement

Add an index for get_lastpostmodified query

Reported by: simonwheatley Owned by:
Priority: normal Milestone: Future Release
Component: Database Version: 3.0.1
Severity: normal Keywords: has-patch dev-feedback 3.6-early
Cc:

Description

I had a friend (Jools Wills) look over a WordPress site recently, to get a fresh view on what might be optimised, and he noticed a query which might benefit from an additional index on WP_Posts. The query SELECT post_modified_gmt FROM $wpdb->posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1 in get_lastpostmodified is run for last modified date in GMT, and currently doesn't use an index. This SQL is run whenever certain types of feed are requested as far as I can see.

We added CREATE INDEX type_status_modified ON wp_posts (post_type, post_status, post_modified_gmt); and CREATE INDEX type_status_modified_no_id ON wp_posts (post_type, post_status, post_date_gmt); and the query runs a lot faster now. The following timings were taken running the first query (post_modified_gmt) on a 36,362 row posts table. Note that it doesn't use filesort after the index has been added.

Before:

mysql> EXPLAIN SELECT post_modified_gmt FROM slgr_posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1;
+----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+
| id | select_type | table      | type | possible_keys    | key              | key_len | ref         | rows  | Extra                       |
+----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+
|  1 | SIMPLE      | slgr_posts | ref  | type_status_date | type_status_date | 124     | const,const | 24718 | Using where; Using filesort |
+----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+
1 row in set (0.03 sec)
  • 0.21290683746338ms
  • 0.25690102577209ms
  • 0.230553150177ms
  • 0.2274341583252ms
  • 0.23083996772766ms

After:

mysql> EXPLAIN SELECT post_modified_gmt FROM slgr_posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1;
+----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+
| id | select_type | table      | type | possible_keys                         | key                  | key_len | ref         | rows  | Extra       |
+----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+
|  1 | SIMPLE      | slgr_posts | ref  | type_status_date,type_status_modified | type_status_modified | 124     | const,const | 24718 | Using where |
+----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+
1 row in set (0.00 sec)
  • 0.00082707405090332ms
  • 0.00072288513183594ms
  • 0.00074386596679688ms
  • 0.00066494941711426ms
  • 0.00066208839416504ms

In get_lastpostmodified both these queries are run, so the total savings in my case on a quiet server are nearly 0.5 seconds... worth having, I reckon.

I've not created a patch for schema changes before, but I think the only place the change would need to go would be scheme.php? Suggested patch attached.

Attachments (1)

new-keys.diff (569 bytes) - added by simonwheatley 3 years ago.
New indexes for wp_posts

Download all attachments as: .zip

Change History (10)

New indexes for wp_posts

comment:1   dd322 years ago

  • Keywords 3.2-early added
  • Milestone changed from Awaiting Review to Future Release
  • Keywords 3.2-early removed

These are good keys and should be added.

  • Milestone changed from Future Release to 3.5

Let me know if there's anything further you need from me for this.

I recall barry hesitating for the idea of adding an index to improve a single query.

This query is persistently cached, so this is definitely less of a concern than if we couldn't reasonably cache it.

get_lastpostmodified() is only used in core with 'GMT' (and only for feeds), so I think at most we need to add type_status_modified.

An index to improve a single query is fine, if that query is run frequently enough.

I don't think using the object cache is a good enough reason to not do this - do that many sites really have a persistent cache? Similarly, I think these indexes would be more helpful on self-hosted sites than they would be harmful on WP.com.

I'm not following why we'd do one but not the other - both queries are run every time get_lastpostmodified() is called.

Finally, nobody likes long read locks. :)

I didn't catch that get_lastpostmodified() also calls get_lastpostdate(), triggering the other query. Lame.

Here's the thing. I don't know why. get_lastpostdate() wants to know the newest published post. Since it's published, it can't be in the future. get_lastpostmodified() wants to know the latest modified date. There is never going to be a situation that I can think of (aside from manually messing around with the values) where the latest post_modified date in a table is going to be less than the latest post_date in a table.

I think we can just axe the call to get_lastpostdate() from get_lastpostmodified() for the same amount of functionality.

This compare-modified-and-last-post-date thing was introduced, no joke, in [908]. Probably time it got challenged.

  • Keywords 3.6-early added
  • Milestone changed from 3.5 to Future Release

The great majority of the time these functions are called are within a feed context, and in fact, they're called multiple times there. Once, in the WP::send_headers() then once for the feed generation. I'm not sure if this query is caching within WP or mysql at all and thus whether the second call is significant, but I would expect it is significant.

I'm looking to remove the first call, and set the headers later in #22742. If there are hesitations to adding the index, then perhaps reducing the query by 1/2 would be another way to deal with this issue. (Although, without a full investigation, I like the index).

Note: See TracTickets for help on using tickets.