Opened 3 years ago
Last modified 6 months ago
#15499 new enhancement
Add an index for get_lastpostmodified query
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Future Release |
| Component: | Database | Version: | 3.0.1 |
| Severity: | normal | Keywords: | has-patch dev-feedback 3.6-early |
| Cc: |
Description
I had a friend (Jools Wills) look over a WordPress site recently, to get a fresh view on what might be optimised, and he noticed a query which might benefit from an additional index on WP_Posts. The query SELECT post_modified_gmt FROM $wpdb->posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1 in get_lastpostmodified is run for last modified date in GMT, and currently doesn't use an index. This SQL is run whenever certain types of feed are requested as far as I can see.
We added CREATE INDEX type_status_modified ON wp_posts (post_type, post_status, post_modified_gmt); and CREATE INDEX type_status_modified_no_id ON wp_posts (post_type, post_status, post_date_gmt); and the query runs a lot faster now. The following timings were taken running the first query (post_modified_gmt) on a 36,362 row posts table. Note that it doesn't use filesort after the index has been added.
Before:
mysql> EXPLAIN SELECT post_modified_gmt FROM slgr_posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1; +----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+ | 1 | SIMPLE | slgr_posts | ref | type_status_date | type_status_date | 124 | const,const | 24718 | Using where; Using filesort | +----+-------------+------------+------+------------------+------------------+---------+-------------+-------+-----------------------------+ 1 row in set (0.03 sec)
- 0.21290683746338ms
- 0.25690102577209ms
- 0.230553150177ms
- 0.2274341583252ms
- 0.23083996772766ms
After:
mysql> EXPLAIN SELECT post_modified_gmt FROM slgr_posts WHERE post_status = 'publish' AND post_type = 'post' ORDER BY post_modified_gmt DESC LIMIT 1; +----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+ | 1 | SIMPLE | slgr_posts | ref | type_status_date,type_status_modified | type_status_modified | 124 | const,const | 24718 | Using where | +----+-------------+------------+------+---------------------------------------+----------------------+---------+-------------+-------+-------------+ 1 row in set (0.00 sec)
- 0.00082707405090332ms
- 0.00072288513183594ms
- 0.00074386596679688ms
- 0.00066494941711426ms
- 0.00066208839416504ms
In get_lastpostmodified both these queries are run, so the total savings in my case on a quiet server are nearly 0.5 seconds... worth having, I reckon.
I've not created a patch for schema changes before, but I think the only place the change would need to go would be scheme.php? Suggested patch attached.
Attachments (1)
Change History (10)
simonwheatley — 3 years ago
- Keywords 3.2-early added
- Milestone changed from Awaiting Review to Future Release
comment:4
simonwheatley — 10 months ago
Let me know if there's anything further you need from me for this.
I recall barry hesitating for the idea of adding an index to improve a single query.
This query is persistently cached, so this is definitely less of a concern than if we couldn't reasonably cache it.
get_lastpostmodified() is only used in core with 'GMT' (and only for feeds), so I think at most we need to add type_status_modified.
An index to improve a single query is fine, if that query is run frequently enough.
I don't think using the object cache is a good enough reason to not do this - do that many sites really have a persistent cache? Similarly, I think these indexes would be more helpful on self-hosted sites than they would be harmful on WP.com.
I'm not following why we'd do one but not the other - both queries are run every time get_lastpostmodified() is called.
Finally, nobody likes long read locks. :)
I didn't catch that get_lastpostmodified() also calls get_lastpostdate(), triggering the other query. Lame.
Here's the thing. I don't know why. get_lastpostdate() wants to know the newest published post. Since it's published, it can't be in the future. get_lastpostmodified() wants to know the latest modified date. There is never going to be a situation that I can think of (aside from manually messing around with the values) where the latest post_modified date in a table is going to be less than the latest post_date in a table.
I think we can just axe the call to get_lastpostdate() from get_lastpostmodified() for the same amount of functionality.
This compare-modified-and-last-post-date thing was introduced, no joke, in [908]. Probably time it got challenged.
- Keywords 3.6-early added
- Milestone changed from 3.5 to Future Release
comment:9
brokentone — 6 months ago
The great majority of the time these functions are called are within a feed context, and in fact, they're called multiple times there. Once, in the WP::send_headers() then once for the feed generation. I'm not sure if this query is caching within WP or mysql at all and thus whether the second call is significant, but I would expect it is significant.
I'm looking to remove the first call, and set the headers later in #22742. If there are hesitations to adding the index, then perhaps reducing the query by 1/2 would be another way to deal with this issue. (Although, without a full investigation, I like the index).

New indexes for wp_posts