WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 14 months ago

#9642 assigned enhancement

Database Schema Optimizations

Reported by: Denis-de-Bernardy Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.8
Component: Database Keywords: needs-patch
Focuses: Cc:

Description

Diffs coming...

Attachments (3)

9642-upgrade.php.diff (1.2 KB) - added by Denis-de-Bernardy 5 years ago.
9642-wpdb-debug-tools.php.diff (583 bytes) - added by Denis-de-Bernardy 5 years ago.
db debug tools (updated)
9642-posts.diff (6.8 KB) - added by Denis-de-Bernardy 5 years ago.
this is not finished yet

Download all attachments as: .zip

Change History (43)

comment:1 Denis-de-Bernardy5 years ago

  • Component changed from General to Optimization

comment:2 Denis-de-Bernardy5 years ago

  • Owner changed from anonymous to Denis-de-Bernardy

comment:3 Denis-de-Bernardy5 years ago

9642-upgrade.php.diff gets rid of multitudes of pointless upgrade queries that can be done in one go. they're not used often, since it's old stuff, but I bumped into it while looking into posts-related queries...

comment:4 Denis-de-Bernardy5 years ago

that last patch reveals quite a bit about existing WP queries :-)

comment:5 Denis-de-Bernardy5 years ago

useful debugging code, after the require in wp-config.php:

function dump_queries() {
	global $wpdb;
	foreach ( $wpdb->queries as $query ) {
		echo '<pre style="text-align: left; padding: 10px; border: solid 1px #000;">' . "\n";
		var_dump($query);
		echo '</pre>' . "\n";
	}
}

if ( defined('SAVEQUERIES') && defined('QUERIES_DEBUG') ) {
	add_action('wp_footer', 'dump_queries');
	add_action('admin_footer', 'dump_queries');
}

comment:6 Denis-de-Bernardy5 years ago

  • Keywords has-patch added

Denis-de-Bernardy5 years ago

db debug tools (updated)

comment:7 DD325 years ago

Correct me if i'm wrong, But THEN is a MySQL 5 thing is it not?

comment:8 Denis-de-Bernardy5 years ago

no no... case when foo then bar when foo then bar else bar end is ansi.

comment:9 Denis-de-Bernardy5 years ago

*grin*. This is sooo much better:

mysql> explain select www_posts.* FROM www_posts  WHERE 1=1  AND www_posts.post_type = 'post' AND (www_posts.post_status = 'publish' OR www_posts.post_status = 'private')  ORDER BY www_posts.post_date_gmt DESC LIMIT 0, 10 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: www_posts
         type: ref
possible_keys: type_parent,type_date_gmt,type_modified_gmt
          key: type_date_gmt
      key_len: 62
          ref: const
         rows: 304
        Extra: Using where
1 row in set (0.01 sec)

comment:10 Denis-de-Bernardy5 years ago

The most horrible to optimize are the page related queries. MySQL's optimizer is stubbornly refuses to use an index on multi-field sorts such as menu_order, post_title ASC...

comment:11 Denis-de-Bernardy5 years ago

oh silly me, nevermind that, there is no limit. no wonder it's file-sorting. :D

comment:12 DD325 years ago

case when foo then bar when foo then bar else bar end is ansi.

Ah... The docs suggest its MySQL5+ only.. but it is in 4.. Interesting SQL i've never come across before :)

Denis-de-Bernardy5 years ago

this is not finished yet

comment:13 Denis-de-Bernardy5 years ago

  • Milestone changed from 2.8 to 2.9

comment:17 Denis-de-Bernardy5 years ago

  • Type changed from defect (bug) to enhancement

comment:18 Denis-de-Bernardy5 years ago

  • Status changed from new to accepted

comment:19 Denis-de-Bernardy5 years ago

  • Keywords has-patch removed

comment:20 dd325 years ago

I was nearly going to query the need for the upgrade patch from here.. ("Why bother touching the old upgrade code.. its working and not used by new installs") But given thats for the 2.0 -> onwards upgrade.. I think that might be a worthy inclusion.

Certainly going to be a lot more efficient than the current loop which could end up with a large number of queries being made.

comment:21 aaroncampbell5 years ago

  • Cc aaroncampbell added

comment:22 follow-up: dbuser1235 years ago

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

comment:23 follow-up: dbuser1235 years ago

Why is autoload in the options table not an enum field?

comment:24 in reply to: ↑ 23 Denis-de-Bernardy5 years ago

Replying to dbuser123:

Why is autoload in the options table not an enum field?

For legacy reasons and/or far away compatibility with other DB engines?

Else, agreed. It should be a tinyint(1).

D.

comment:25 follow-up: dd325 years ago

Posted before but still not fixed.

Probably because its not a query WordPress makes.

I believe that belongs to the Subscribe to comments plugin

comment:26 in reply to: ↑ 25 ; follow-up: dbuser1235 years ago

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one.
KEY comment_approved (comment_approved),
KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

comment:27 Denis-de-Bernardy5 years ago

  • Owner Denis-de-Bernardy deleted
  • Status changed from accepted to assigned

comment:28 ryan4 years ago

  • Milestone changed from 2.9 to 3.0

comment:29 mattrude4 years ago

  • Cc m@… added

comment:30 hakre4 years ago

  • Keywords has-patch added

comment:31 hakre4 years ago

Reviewed the ticket again. Last patch has the comment this is not finished yet. Looks like something is still missing here?

comment:32 hakre4 years ago

  • Keywords reporter-feedback added; has-patch removed

comment:33 Denis-de-Bernardy4 years ago

  • Keywords needs-patch added; reporter-feedback removed

Yeah, there are tons of potential optimizations. Low hanging fruits, too. But this ought better be revisited after the merge, rather than at the same time. The merge is going to make a load of patches stale.

comment:34 miqrogroove4 years ago

  • Milestone changed from 3.0 to Future Release

comment:35 simonwheatley3 years ago

  • Cc simon@… added

comment:36 in reply to: ↑ 22 Otto423 years ago

Replying to dbuser123:

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

Old, but it needs to be said: If we're doing this somewhere, then we're doing it wrong.

Email addresses are case sensitive. The local part, anyway. See RFC 2821: http://www.faqs.org/rfcs/rfc2821.html

The local-part of a mailbox
MUST BE treated as case sensitive. Therefore, SMTP implementations
MUST take care to preserve the case of mailbox local-parts. Mailbox
domains are not case sensitive. In particular, for some hosts the
user "smith" is different from the user "Smith". However, exploiting
the case sensitivity of mailbox local-parts impedes interoperability
and is discouraged.

While most all systems nowadays are case insensitive, technically in the first part of the email address, case matters. So we shouldn't be using LCASE here at all, or lowercasing email addresses.

comment:37 nacin3 years ago

I don't know of a system that doesn't treat emails as case insensitive. Granted, the RFC says that, but it doesn't apply to the domain, and local servers pretty much always handle the the both. I'll call it the Facebook effect -- needing to play to the lowest common denominator of users.

I agree with killing the LCASE, if we can standardize on lowercase on storage. There's another ticket somewhere (from tellyworth, re: Akismet) asking for the index.

comment:38 in reply to: ↑ 26 westi2 years ago

Replying to dbuser123:

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one.
KEY comment_approved (comment_approved),
KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

This is lost and buried in this very generic ticket so I've raised a new one #19935

comment:39 husobj14 months ago

  • Cc ben@… added

comment:40 nacin14 months ago

  • Component changed from Optimization to Database
Note: See TracTickets for help on using tickets.