Make WordPress Core

Opened 15 years ago

Closed 8 years ago

#9642 closed enhancement (maybelater)

Database Schema Optimizations

Reported by: denis-de-bernardy's profile Denis-de-Bernardy Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.8
Component: Database Keywords: needs-patch needs-refresh
Focuses: performance Cc:

Description

Diffs coming...

Attachments (3)

9642-upgrade.php.diff (1.2 KB) - added by Denis-de-Bernardy 15 years ago.
9642-wpdb-debug-tools.php.diff (583 bytes) - added by Denis-de-Bernardy 15 years ago.
db debug tools (updated)
9642-posts.diff (6.8 KB) - added by Denis-de-Bernardy 15 years ago.
this is not finished yet

Download all attachments as: .zip

Change History (45)

#1 @Denis-de-Bernardy
15 years ago

  • Component changed from General to Optimization

#2 @Denis-de-Bernardy
15 years ago

  • Owner changed from anonymous to Denis-de-Bernardy

#3 @Denis-de-Bernardy
15 years ago

9642-upgrade.php.diff gets rid of multitudes of pointless upgrade queries that can be done in one go. they're not used often, since it's old stuff, but I bumped into it while looking into posts-related queries...

#4 @Denis-de-Bernardy
15 years ago

that last patch reveals quite a bit about existing WP queries :-)

#5 @Denis-de-Bernardy
15 years ago

useful debugging code, after the require in wp-config.php:

function dump_queries() {
	global $wpdb;
	foreach ( $wpdb->queries as $query ) {
		echo '<pre style="text-align: left; padding: 10px; border: solid 1px #000;">' . "\n";
		var_dump($query);
		echo '</pre>' . "\n";
	}
}

if ( defined('SAVEQUERIES') && defined('QUERIES_DEBUG') ) {
	add_action('wp_footer', 'dump_queries');
	add_action('admin_footer', 'dump_queries');
}

#6 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch added

@Denis-de-Bernardy
15 years ago

db debug tools (updated)

#7 @DD32
15 years ago

Correct me if i'm wrong, But THEN is a MySQL 5 thing is it not?

#8 @Denis-de-Bernardy
15 years ago

no no... case when foo then bar when foo then bar else bar end is ansi.

#9 @Denis-de-Bernardy
15 years ago

*grin*. This is sooo much better:

mysql> explain select www_posts.* FROM www_posts  WHERE 1=1  AND www_posts.post_type = 'post' AND (www_posts.post_status = 'publish' OR www_posts.post_status = 'private')  ORDER BY www_posts.post_date_gmt DESC LIMIT 0, 10 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: www_posts
         type: ref
possible_keys: type_parent,type_date_gmt,type_modified_gmt
          key: type_date_gmt
      key_len: 62
          ref: const
         rows: 304
        Extra: Using where
1 row in set (0.01 sec)

#10 @Denis-de-Bernardy
15 years ago

The most horrible to optimize are the page related queries. MySQL's optimizer is stubbornly refuses to use an index on multi-field sorts such as menu_order, post_title ASC...

#11 @Denis-de-Bernardy
15 years ago

oh silly me, nevermind that, there is no limit. no wonder it's file-sorting. :D

#12 @DD32
15 years ago

case when foo then bar when foo then bar else bar end is ansi.

Ah... The docs suggest its MySQL5+ only.. but it is in 4.. Interesting SQL i've never come across before :)

@Denis-de-Bernardy
15 years ago

this is not finished yet

#13 @Denis-de-Bernardy
15 years ago

  • Milestone changed from 2.8 to 2.9

#17 @Denis-de-Bernardy
15 years ago

  • Type changed from defect (bug) to enhancement

#18 @Denis-de-Bernardy
15 years ago

  • Status changed from new to accepted

#19 @Denis-de-Bernardy
15 years ago

  • Keywords has-patch removed

#20 @dd32
15 years ago

I was nearly going to query the need for the upgrade patch from here.. ("Why bother touching the old upgrade code.. its working and not used by new installs") But given thats for the 2.0 -> onwards upgrade.. I think that might be a worthy inclusion.

Certainly going to be a lot more efficient than the current loop which could end up with a large number of queries being made.

#21 @aaroncampbell
15 years ago

  • Cc aaroncampbell added

#22 follow-up: @dbuser123
15 years ago

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

#23 follow-up: @dbuser123
15 years ago

Why is autoload in the options table not an enum field?

#24 in reply to: ↑ 23 @Denis-de-Bernardy
15 years ago

Replying to dbuser123:

Why is autoload in the options table not an enum field?

For legacy reasons and/or far away compatibility with other DB engines?

Else, agreed. It should be a tinyint(1).

D.

#25 follow-up: @dd32
15 years ago

Posted before but still not fixed.

Probably because its not a query WordPress makes.

I believe that belongs to the Subscribe to comments plugin

#26 in reply to: ↑ 25 ; follow-up: @dbuser123
15 years ago

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one.
KEY comment_approved (comment_approved),
KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

#27 @Denis-de-Bernardy
15 years ago

  • Owner Denis-de-Bernardy deleted
  • Status changed from accepted to assigned

#28 @ryan
15 years ago

  • Milestone changed from 2.9 to 3.0

#29 @mattrude
15 years ago

  • Cc m@… added

#30 @hakre
15 years ago

  • Keywords has-patch added

#31 @hakre
15 years ago

Reviewed the ticket again. Last patch has the comment this is not finished yet. Looks like something is still missing here?

#32 @hakre
15 years ago

  • Keywords reporter-feedback added; has-patch removed

#33 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added; reporter-feedback removed

Yeah, there are tons of potential optimizations. Low hanging fruits, too. But this ought better be revisited after the merge, rather than at the same time. The merge is going to make a load of patches stale.

#34 @miqrogroove
15 years ago

  • Milestone changed from 3.0 to Future Release

#35 @simonwheatley
14 years ago

  • Cc simon@… added

#36 in reply to: ↑ 22 @Otto42
14 years ago

Replying to dbuser123:

Posted before but still not fixed.

Another slow query (0.30s for a 50 MB comments table) I regularly encounter is this one:

SELECT comment_post_ID
FROM comments
WHERE LCASE(comment_author_email) = 'user@host.com' AND comment_subscribe='Y' AND comment_approved = '1'
GROUP BY comment_post_ID

It would be better to always store e-mail as lowercase, stop using the LCASE function, and add an index on (comment_author_email, comment_post_ID).

Old, but it needs to be said: If we're doing this somewhere, then we're doing it wrong.

Email addresses are case sensitive. The local part, anyway. See RFC 2821: http://www.faqs.org/rfcs/rfc2821.html

The local-part of a mailbox
MUST BE treated as case sensitive. Therefore, SMTP implementations
MUST take care to preserve the case of mailbox local-parts. Mailbox
domains are not case sensitive. In particular, for some hosts the
user "smith" is different from the user "Smith". However, exploiting
the case sensitivity of mailbox local-parts impedes interoperability
and is discouraged.

While most all systems nowadays are case insensitive, technically in the first part of the email address, case matters. So we shouldn't be using LCASE here at all, or lowercasing email addresses.

#37 @nacin
14 years ago

I don't know of a system that doesn't treat emails as case insensitive. Granted, the RFC says that, but it doesn't apply to the domain, and local servers pretty much always handle the the both. I'll call it the Facebook effect -- needing to play to the lowest common denominator of users.

I agree with killing the LCASE, if we can standardize on lowercase on storage. There's another ticket somewhere (from tellyworth, re: Akismet) asking for the index.

#38 in reply to: ↑ 26 @westi
13 years ago

Replying to dbuser123:

Replying to dd32:

Probably because its not a query WordPress makes.

I see, sorry for the disturbance.

A minor improvement for the comments table: the first index is redundant since it is a prefix of the second one.
KEY comment_approved (comment_approved),
KEY comment_approved_date_gmt (comment_approved,comment_date_gmt),

This is lost and buried in this very generic ticket so I've raised a new one #19935

#39 @husobj
12 years ago

  • Cc ben@… added

#40 @nacin
12 years ago

  • Component changed from Optimization to Database

#41 @chriscct7
9 years ago

  • Focuses performance added
  • Keywords needs-refresh added

Needs to be finished, and what is already there needs to be refreshed (baring in mind some of it was merged in #19935)

#42 @pento
8 years ago

  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from assigned to closed

This ticket is wildly outdated.

If someone feels like taking another stab at it, they're welcome to, but tackling individual optimisations as they come up in individual tickets seems to have made more progress over the years.

Note: See TracTickets for help on using tickets.