WordPress.org

Make WordPress Core

Opened 7 years ago

Closed 5 years ago

Last modified 5 years ago

#8351 closed enhancement (fixed)

Poorly optimized queries in wp_get_calendar()

Reported by: vladimir_kolesnikov Owned by:
Milestone: 3.1 Priority: normal
Severity: normal Version:
Component: Optimization Keywords: has-patch commit
Focuses: Cc:

Description

Performance of the queries in is too bad wp_get_archives() and wp_get_calendar() if you have a lot of records in wp_posts table (so bad that hosting company suspends the account).

1. wp_get_archives():

EXPLAIN SELECT DISTINCT YEAR(post_date) AS `year`, MONTH(post_date) AS `month`, count(ID) as posts FROM wp_posts WHERE post_type = 'post' AND post_status = 'publish' GROUP BY YEAR(post_date), MONTH(post_date) ORDER BY post_date DESC

+----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+
| id | select_type | table    | type | possible_keys    | key              | key_len | ref         | rows | Extra                                                     |
+----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | wp_posts | ref  | type_status_date | type_status_date | 124     | const,const | 3594 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+----------+------+------------------+------------------+---------+-------------+------+-----------------------------------------------------------+ 

Nothing you can do with this query, as grouping by expression cannot use indices by definition. The only thing is to remove DISTINCT from the query, as grouping by year and month guarantees that every (year, month) pair in the result set will be unique. Same for yearly and daily archives.

2. wp_get_calendar():

Although we can do almost nothing to wp_get_archives() (due to DB design deficiencies), we can significantly boost wp_get_calendar().

First, wp_get_calendar() checks if we have any posts at all:

SELECT ID from wp_posts WHERE post_type = 'post' AND post_status = 'publish' ORDER BY post_date DESC LIMIT 1

Since we only need to check the existence of the record, we don't have any reason to use ORDER BY (although it does nothing harmful as MySQL is able to use index).

If the query succeeded, WP starts searching for the posts in past months:

SELECT DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year FROM wp_posts WHERE post_date < '2008-11-01' AND post_type = 'post' AND post_status = 'publish' ORDER BY post_date DESC LIMIT 1

What is wrong: since we need only ONE record (see LIMIT 1), it will be unique in the result set anyway, DISTINCT is not needed. Here's the EXPLAIN of the query:

+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                                     |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL | 3238 | Using where; Using index; Using temporary |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+ 

As you can see, MySQL needs a temporary table (it doesn't matter that it will have only one row, it will be created). If we remove DISTINCT, we can get rid of the temporary table:

EXPLAIN SELECT MONTH(post_date) AS month, YEAR(post_date) AS year FROM wp_posts WHERE post_date < '2008-11-01' AND post_type = 'post' AND post_status = 'publish' ORDER BY post_date DESC LIMIT 1

+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL | 3238 | Using where; Using index |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+ 

Then WP looks for the posts written in the next months:

SELECT DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year FROM wp_posts
WHERE post_date > '2008-11-01' AND MONTH( post_date ) != MONTH( '2008-11-01' )
AND post_type = 'post' AND post_status = 'publish' ORDER BY post_date ASC LIMIT 1

+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                                     |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL |  356 | Using where; Using index; Using temporary |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+

Same as above: DISTINCT is not needed. But we still can rewrite the query in a better way:

EXPLAIN  SELECT MONTH(post_date) AS month, YEAR(post_date) AS year FROM wp_posts WHERE post_date > '2008-11-31 23:59:59' AND post_type = 'post' AND post_status = 'publish' ORDER BY post_date ASC LIMIT 1;+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                    |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL |    1 | Using where; Using index |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+--------------------------+

We got rid of Using temporary AND we were able to make MySQL scan much less rows!

Next, WP looks for the days in the month when at least one post was published:

SELECT DISTINCT DAYOFMONTH(post_date)
FROM wp_posts WHERE MONTH(post_date) = '11'
AND YEAR(post_date) = '2008'
AND post_type = 'post' AND post_status = 'publish'
AND post_date < '2008-11-24 18:30:04'

Only range optimization can be used here. But, as I have already mentioned, MySQL is unable to use indices for expressions. Thus, our range search will only be limited from the top:

+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                                     |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL | 3594 | Using where; Using index; Using temporary | 
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+

It is very easy to rewrite the query:

SELECT DISTINCT DAYOFMONTH(post_date)
FROM wp_posts
WHERE post_date >= '2008-11-01 00:00:00'
AND post_type = 'post' AND post_status = 'publish'
AND post_date < '2008-11-24 18:30:04'

+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
| id | select_type | table    | type  | possible_keys              | key              | key_len | ref  | rows | Extra                                     |
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+
|  1 | SIMPLE      | wp_posts | range | type_status_date,post_type | type_status_date | 132     | NULL |  267 | Using where; Using index; Using temporary | 
+----+-------------+----------+-------+----------------------------+------------------+---------+------+------+-------------------------------------------+

As you can see, MySQL needed to scan only 267 rows instead 3594, and the more records you have in your wp_posts table, the more performance is.

Finally, WP fetches the days and post titles:

SELECT post_title, DAYOFMONTH(post_date) as dom
FROM wp_posts
WHERE YEAR(post_date) = '2008'
AND MONTH(post_date) = '11'
AND post_date < '2008-11-24 18:30:04'
AND post_type = 'post'
AND post_status = 'publish'

The above optimization can be applied here.

Attachments (3)

patch.diff (4.2 KB) - added by vladimir_kolesnikov 7 years ago.
Patch for the Ticket #8351
general-template.php.diff (4.5 KB) - added by vladimir_kolesnikov 6 years ago.
New patch for #8351/9197 (branches/2.7 only)
general-template.php.2.diff (2.4 KB) - added by vladimir_kolesnikov 6 years ago.

Download all attachments as: .zip

Change History (23)

@vladimir_kolesnikov7 years ago

Patch for the Ticket #8351

comment:1 @ryan7 years ago

  • Keywords early added
  • Milestone changed from 2.7 to 2.8

Patch seems good, but we're too close to the 2.7 final release to change queries. Let's schedule this for early 2.8 and consider backporting to a 2.7 dot release.

comment:2 @djr7 years ago

  • Cc djr added
  • Keywords mysql queries database added

comment:3 @djr7 years ago

See also 7415 for more queries to be optimized.

comment:4 @ryan7 years ago

  • Resolution set to fixed
  • Status changed from new to closed

(In [10602]) Better archives and calendar queries. Props vladimir_kolesnikov. fixes #8351

comment:5 @ryan7 years ago

(In [10646]) Revert get_calendar() changes from [10602]. Broke calendar links. see #8351

comment:7 @vladimir_kolesnikov6 years ago

  • Cc vladimir@… added
  • Resolution fixed deleted
  • Status changed from closed to reopened

Attched is the patch that fixes the bug for branches/2.7 - no posts for subsequent months will be shown and queries become optimized.

NB: the patch is for branches/2.7 and has not been tested on trunk.

@vladimir_kolesnikov6 years ago

New patch for #8351/9197 (branches/2.7 only)

comment:8 @Denis-de-Bernardy6 years ago

  • Keywords tested added; early mysql queries database removed

comment:9 @ryan6 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

There probably won't be another 2.7 release, and we had to revert some of the calendar queries because they broke the calendar. If you want to try a new patch for that against 2.8, reopen.

comment:10 @vladimir_kolesnikov6 years ago

  • Component changed from General to Optimization
  • Keywords needs-patch added; has-patch tested removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

we had to revert some of the calendar queries because they broke the calendar

I saw that. The latest patch takes this into account.

If you want to try a new patch for that against 2.8, reopen.

OK, will create a patch for 2.8.

comment:11 @Denis-de-Bernardy6 years ago

  • Summary changed from Poorly optimized queries in wp_get_archives() and wp_get_calendar() to Poorly optimized queries in wp_get_calendar()

comment:12 @Denis-de-Bernardy6 years ago

  • Milestone changed from 2.8 to Future Release

punting to future, pending new patch

comment:13 @Denis-de-Bernardy6 years ago

  • Milestone changed from Future Release to 2.9

comment:14 @Denis-de-Bernardy6 years ago

  • Type changed from defect (bug) to enhancement

comment:15 @vladimir_kolesnikov6 years ago

  • Keywords has-patch needs-testing added; needs-patch removed

Added a new patch against rev. 11862

comment:16 @ryan6 years ago

  • Milestone changed from 2.9 to 3.0

comment:17 @Denis-de-Bernardy6 years ago

  • Keywords tested commit added; needs-testing removed

calendar is working fine and as expected here. calendar queries run as so:

without:

SELECT DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year
		FROM www_posts
		WHERE post_date < '2009-11-01'
		AND post_type = 'post' AND post_status = 'publish'
			ORDER BY post_date DESC
			LIMIT 1

0.7ms

SELECT	DISTINCT MONTH(post_date) AS month, YEAR(post_date) AS year
		FROM www_posts
		WHERE post_date >	'2009-11-01'
		AND MONTH( post_date ) != MONTH( '2009-11-01' )
		AND post_type = 'post' AND post_status = 'publish'
			ORDER	BY post_date ASC
			LIMIT 1

0.8ms

SELECT DISTINCT DAYOFMONTH(post_date)
		FROM www_posts WHERE MONTH(post_date) = '11'
		AND YEAR(post_date) = '2009'
		AND post_type = 'post' AND post_status = 'publish'
		AND post_date < '2010-03-01 19:58:02'

1.4ms

SELECT post_title, DAYOFMONTH(post_date) as dom FROM www_posts WHERE YEAR(post_date) = '2009' AND MONTH(post_date) = '11' AND post_date < '2010-03-01 19:58:02' AND post_type = 'post' AND post_status = 'publish'

7.4ms

with:

SELECT MONTH(post_date) AS month, YEAR(post_date) AS year
		FROM www_posts
		WHERE post_date < '2009-11-01'
		AND post_type = 'post' AND post_status = 'publish'
			ORDER BY post_date DESC
			LIMIT 1

0.7ms

SELECT MONTH(post_date) AS month, YEAR(post_date) AS year
		FROM www_posts
		WHERE post_date > '2009-11-30 23:59:59'
		AND post_type = 'post' AND post_status = 'publish'
			ORDER BY post_date ASC
			LIMIT 1

0.8ms

SELECT DISTINCT DAYOFMONTH(post_date)
		FROM www_posts WHERE post_date >= '2009-11-01 00:00:00'
		AND post_type = 'post' AND post_status = 'publish'
		AND post_date <= '2009-11-30 23:59:59'

0.6ms

SELECT post_title, DAYOFMONTH(post_date) as dom FROM www_posts WHERE post_date >= '2009-11-01 00:00:00' AND post_date <= '2009-11-30 23:59:59' AND post_type = 'post' AND post_status = 'publish'

0.8ms

comment:18 @nacin5 years ago

  • Keywords tested removed
  • Milestone changed from 3.0 to 3.1

comment:19 @markjaquith5 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

(In [15585]) More efficient calendar queries. fixes #8351. props vladimir_kolesnikov

comment:20 @nacin5 years ago

  • Milestone changed from Awaiting Triage to 3.1
Note: See TracTickets for help on using tickets.