WordPress.org

Make WordPress Core

Opened 9 years ago

Closed 2 years ago

#9642 closed enhancement (maybelater)

Database Schema Optimizations

Reported by: Denis-de-Bernardy Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.8
Component: Database Keywords: needs-patch needs-refresh
Focuses: performance Cc:

Description

Diffs coming...

Attachments (3)

9642-upgrade.php.diff (1.2 KB) - added by Denis-de-Bernardy 9 years ago.
9642-wpdb-debug-tools.php.diff (583 bytes) - added by Denis-de-Bernardy 9 years ago.
db debug tools (updated)
9642-posts.diff (6.8 KB) - added by Denis-de-Bernardy 9 years ago.
this is not finished yet

Download all attachments as: .zip

Change History (45)

#1 @Denis-de-Bernardy
9 years ago

  • Component changed from General to Optimization

#2 @Denis-de-Bernardy
9 years ago

  • Owner changed from anonymous to Denis-de-Bernardy

#3 @Denis-de-Bernardy
9 years ago

9642-upgrade.php.diff gets rid of multitudes of pointless upgrade queries that can be done in one go. they're not used often, since it's old stuff, but I bumped into it while looking into posts-related queries...

#4 @Denis-de-Bernardy
9 years ago

that last patch reveals quite a bit about existing WP queries :-)

#5 @Denis-de-Bernardy
9 years ago

useful debugging code, after the require in wp-config.php:

function dump_queries() {
	global $wpdb;
	foreach ( $wpdb->queries as $query ) {
		echo '<pre style="text-align: left; padding: 10px; border: solid 1px #000;">' . "\n";
		var_dump($query);
		echo '</pre>' . "\n";
	}
}

if ( defined('SAVEQUERIES') && defined('QUERIES_DEBUG') ) {
	add_action('wp_footer', 'dump_queries');
	add_action('admin_footer', 'dump_queries');
}

#6 @Denis-de-Bernardy
9 years ago

  • Keywords has-patch added

@Denis-de-Bernardy
9 years ago

db debug tools (updated)

#7 @DD32
9 years ago

Correct me if i'm wrong, But THEN is a MySQL 5 thing is it not?

#8 @Denis-de-Bernardy
9 years ago

no no... case when foo then bar when foo then bar else bar end is ansi.

#9 @Denis-de-Bernardy
9 years ago

*grin*. This is sooo much better:

mysql> explain select www_posts.* FROM www_posts  WHERE 1=1  AND www_posts.post_type = 'post' AND (www_posts.post_status = 'publish' OR www_posts.post_status = 'private')  ORDER BY www_posts.post_date_gmt DESC LIMIT 0, 10 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: www_posts
         type: ref
possible_keys: type_parent,type_date_gmt,type_modified_gmt
          key: type_date_gmt
      key_len: 62
          ref: const
         rows: 304
        Extra: Using where
1 row in set (0.01 sec)

#10 @Denis-de-Bernardy
9 years ago

The most horrible to optimize are the page related queries. MySQL's optimizer is stubbornly refuses to use an index on multi-field sorts such as menu_order, post_title ASC...

#11 @Denis-de-Bernardy
9 years ago

oh silly me, nevermind that, there is no limit. no wonder it's file-sorting. :D

#12 @DD32
9 years ago

case when foo then bar when foo then bar else bar end is ansi.

Ah... The docs suggest its MySQL5+ only.. but it is in 4.. Interesting SQL i've never come across before :)

@Denis-de-Bernardy
9 years ago

this is not finished yet

#13 @Denis-de-Bernardy
9 years ago

  • Milestone changed from 2.8 to 2.9

#17 @Denis-de-Bernardy
9 years ago

  • Type changed from defect (bug) to enhancement

#18 @Denis-de-Bernardy
9 years ago

  • Status changed from new to accepted

#19 @Denis-de-Bernardy
9 years ago

  • Keywords has-patch removed

#20 @dd32
9 years ago

I was nearly going to query the need for the upgrade patch from here.. ("Why bother touching the old upgrade code.. its working and not used by new installs") But given thats for the 2.0 -> onwards upgrade.. I think that might be a worthy inclusion.

Certainly going to be a lot more efficient than the current loop which could end up with a large number of queries being made.

#21 @aaroncampbell
9 years ago

  • Cc aaroncampbell added

#22 follow-up: @dbuser123
9 years ago

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

#23 follow-up: @dbuser123
9 years ago

Why is autoload in the options table not an enum field?

#24 in reply to: ↑ 23 @Denis-de-Bernardy
9 years ago

Replying to dbuser123:

Why is autoload in the options table not an enum field?

For legacy reasons and/or far away compatibility with other DB engines?

Else, agreed. It should be a tinyint(1).

D.

#25 follow-up: @dd32
9 years ago

Posted before but still not fixed.

Probably because its not a query WordPress makes.

I believe that belongs to the Subscribe to comments plugin

#26 in reply to: ↑ 25 ; follow-up: @dbuser123
9 years ago

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one. KEY comment_approved (comment_approved), KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

#27 @Denis-de-Bernardy
9 years ago

  • Owner Denis-de-Bernardy deleted
  • Status changed from accepted to assigned

#28 @ryan
9 years ago

  • Milestone changed from 2.9 to 3.0

#29 @mattrude
9 years ago

  • Cc m@… added

#30 @hakre
8 years ago

  • Keywords has-patch added

#31 @hakre
8 years ago

Reviewed the ticket again. Last patch has the comment this is not finished yet. Looks like something is still missing here?

#32 @hakre
8 years ago

  • Keywords reporter-feedback added; has-patch removed

#33 @Denis-de-Bernardy
8 years ago

  • Keywords needs-patch added; reporter-feedback removed

Yeah, there are tons of potential optimizations. Low hanging fruits, too. But this ought better be revisited after the merge, rather than at the same time. The merge is going to make a load of patches stale.

#34 @miqrogroove
8 years ago

  • Milestone changed from 3.0 to Future Release

#35 @simonwheatley
8 years ago

  • Cc simon@… added

#36 in reply to: ↑ 22 @Otto42
7 years ago

Replying to dbuser123:

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

Old, but it needs to be said: If we're doing this somewhere, then we're doing it wrong.

Email addresses are case sensitive. The local part, anyway. See RFC 2821: http://www.faqs.org/rfcs/rfc2821.html

The local-part of a mailbox MUST BE treated as case sensitive. Therefore, SMTP implementations MUST take care to preserve the case of mailbox local-parts. Mailbox domains are not case sensitive. In particular, for some hosts the user "smith" is different from the user "Smith". However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged.

While most all systems nowadays are case insensitive, technically in the first part of the email address, case matters. So we shouldn't be using LCASE here at all, or lowercasing email addresses.

#37 @nacin
7 years ago

I don't know of a system that doesn't treat emails as case insensitive. Granted, the RFC says that, but it doesn't apply to the domain, and local servers pretty much always handle the the both. I'll call it the Facebook effect -- needing to play to the lowest common denominator of users.

I agree with killing the LCASE, if we can standardize on lowercase on storage. There's another ticket somewhere (from tellyworth, re: Akismet) asking for the index.

#38 in reply to: ↑ 26 @westi
6 years ago

Replying to dbuser123:

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one. KEY comment_approved (comment_approved), KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

This is lost and buried in this very generic ticket so I've raised a new one #19935

#39 @husobj
5 years ago

  • Cc ben@… added

#40 @nacin
5 years ago

  • Component changed from Optimization to Database

#41 @chriscct7
3 years ago

  • Focuses performance added
  • Keywords needs-refresh added

Needs to be finished, and what is already there needs to be refreshed (baring in mind some of it was merged in #19935)

#42 @pento
2 years ago

  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from assigned to closed

This ticket is wildly outdated.

If someone feels like taking another stab at it, they're welcome to, but tackling individual optimisations as they come up in individual tickets seems to have made more progress over the years.

Note: See TracTickets for help on using tickets.