WordPress.org

Make WordPress Core

Opened 8 years ago

Closed 4 years ago

Last modified 4 years ago

#2699 closed task (blessed) (fixed)

Make option_name index unique

Reported by: johnjosephbachir Owned by: ryan
Milestone: 2.9 Priority: high
Severity: normal Version: 2.8
Component: Optimization Keywords: tested
Focuses: Cc:

Description (last modified by lloydbudd)

Duplicate options leads to bad things, as we've experienced on WordPress.com . This was not the route issue, but definitely an easy part of the solution.

Seems like golden oldie:

http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005638.html

http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005647.html

http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005651.html

Attachments (9)

upgrade-scheme.php.diff (521 bytes) - added by johnjosephbachir 8 years ago.
2699.diff (2.1 KB) - added by ryan 5 years ago.
2699.2.diff (1.9 KB) - added by Denis-de-Bernardy 5 years ago.
similar to Ryan's patch without the unneeded wp_cache delete
2699.3.diff (4.3 KB) - added by Denis-de-Bernardy 5 years ago.
same the previous patch, but deprecate option_id as the same time
2699.4.diff (3.6 KB) - added by ryan 5 years ago.
2699.5.diff (713 bytes) - added by Denis-de-Bernardy 5 years ago.
fix the sql syntax
2699.6.diff (3.4 KB) - added by ryan 4 years ago.
Make option_id primary. Add uniques for option_name and autoload
2699.7.diff (538 bytes) - added by Denis-de-Bernardy 4 years ago.
2699.8.diff (535 bytes) - added by Denis-de-Bernardy 4 years ago.
switch to key (autoload)

Download all attachments as: .zip

Change History (60)

comment:1 rgovostes8 years ago

  • Keywords has-patch added

comment:2 matt7 years ago

  • Milestone changed from 2.1 to 2.2

comment:3 foolswisdom7 years ago

  • Milestone changed from 2.2 to 2.3

comment:4 Nazgul7 years ago

  • Milestone 2.3 (trunk) deleted
  • Resolution set to wontfix
  • Status changed from new to closed

No traction in over a year, so closing as wontfix.

Feel free to reopen if you have additional info/patches/comments/...

comment:5 lloydbudd5 years ago

  • Keywords early added; has-patch removed
  • Milestone set to 2.9
  • Priority changed from low to high
  • Resolution wontfix deleted
  • Status changed from closed to reopened
  • Version changed from 2.0.2 to 2.8

comment:6 lloydbudd5 years ago

  • Owner changed from anonymous to ryan
  • Status changed from reopened to assigned

comment:7 lloydbudd5 years ago

  • Description modified (diff)

Updating desc:

Duplicate options leads to bad things, as we've experienced on WordPress.com . This was not the route issue, but definitely part of the solution.

ryan5 years ago

comment:8 ryan5 years ago

Adds a function that is run before the schema is upgraded. That function deletes duplicate options. The high option ID is kept, the rest deleted. Then a new unique key is added. dbDelta doesn't seem to delete the old key, so we might need to add an explicit delete to the upgrade.

comment:9 Denis-de-Bernardy5 years ago

  • Keywords dup added

there is a dup ticket

comment:10 Denis-de-Bernardy5 years ago

  • Keywords dup removed

cross-referencing: #8949

comment:11 Denis-de-Bernardy5 years ago

  • Keywords has-patch 2nd-opinion added

suggesting we do the following:

  • ensure no more dups exist in 2.8
  • drop the option_id field
  • primary key (blog_id, option_name)
  • drop key(option_name), when then becomes unneeded (and which is useless when there are several blog_ids)

or:

  • keep option_id field and ignore it
  • primary key (blog_id, option_name)
  • drop key(option_name)

comment:12 Denis-de-Bernardy5 years ago

Quick question, while on the topic, and amending my previous comment: the blog_id field in the database doesn't seem to ever be used. Shouldn't we drop it?

Denis-de-Bernardy5 years ago

similar to Ryan's patch without the unneeded wp_cache delete

Denis-de-Bernardy5 years ago

same the previous patch, but deprecate option_id as the same time

comment:13 Denis-de-Bernardy5 years ago

new patch attached that drops the two unneeded fields in wp_options.

ryan5 years ago

comment:14 ryan5 years ago

dbDelta() doesn't handle changing around primary keys so I did it by hand in pre_schema_upgrade(). I also kept option_id and blog_id around for now just in case some plugins are using it. I'd like to avoid even small plugin breakages in 2.9. We can deprecate these later. Unfortunately, since option_id is auto_increment we have to put a key on it until we are rid of it.

comment:15 ryan5 years ago

  • Type changed from enhancement to task (blessed)

comment:16 ryan5 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [11883]) Make option_name the primary key for the options table. Props Denis-de-Bernardy. fixes #2699

comment:17 Viper007Bond5 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Upgrade on my live blog went fine (no dupes I guess). Attempted to upgrade my localhost blog and it errored out:

WordPress database error: [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '', '2480', '2482', '2482', '2491', '2492)' at line 1]
DELETE FROM wp_trunk_options WHERE option_id IN (2480', '2480', '2482', '2482', '2491', '2492)

WordPress database error: [Duplicate entry 'yarpp_use_template' for key 'PRIMARY']
ALTER TABLE wp_trunk_options ADD PRIMARY KEY (option_name)
Upgrade Complete

Your WordPress database has been successfully upgraded!

Note the missing single quotes on the first query. I assume the second query's error is due to the first query failying.

comment:18 Viper007Bond5 years ago

"failying"? That's a new one...

comment:19 Denis-de-Bernardy5 years ago

Can you send a dump of your options table by email?

comment:20 Denis-de-Bernardy5 years ago

oh, nm, it's a syntax error

Denis-de-Bernardy5 years ago

fix the sql syntax

comment:21 dd325 years ago

(In [11883])

Surely that'd have been better off in a upgrade_290() style function?

Also, Need to warn those whose utilise blog_id, I think VirtualMultiblog uses it at least..

comment:22 dd325 years ago

oh, nm, it's a syntax error

Not just that.. But an array_unique() should be added, notice the Duplicate ID's its finding?

comment:23 dd325 years ago

Surely that'd have been better off in a upgrade_290() style function?

Oh, I see why it was done now..

comment:24 azaozz5 years ago

(In [11884]) Fix quotes, props Denis-de-Bernardy, see #2699

comment:25 follow-up: ryan5 years ago

Some WPMU installs use InnoDB for the options table. Having a varchar as the primary key isn't the best for InnoDB since it prefers small, sequential primary keys. Should we make option_id primary and option_name unique?

comment:26 hakre5 years ago

  • Keywords early has-patch 2nd-opinion removed

Is this fixed now or not? Can someone please provide a step-by-step instruction to test?

comment:27 in reply to: ↑ 25 ; follow-up: johnjosephbachir5 years ago

Hey folks -- cool to see my ticket resurrected after 3 years :-)

Replying to ryan:

Some WPMU installs use InnoDB for the options table. Having a varchar as the primary key isn't the best for InnoDB since it prefers small, sequential primary keys. Should we make option_id primary and option_name unique?

That's true -- it's a pretty bad idea to make an innodb primary key non-numeric or non-sequential, and here we are doing both.

Having a primary key on option_id, and then a separate unique key covering whatever columns we like, will provide the same performance benefits, with a slight increase in disk space used by that table and its indexes (but maybe not compared to what we had before-- I'd have to think a little harder to be sure). At any rate, it's a good design.

in r11883:

PRIMARY KEY  (option_name), 
KEY option_id (option_id) 

Best for InnoDB (and probably best or equivalent in myisam as well:)

PRIMARY KEY  (option_id), 
UNIQUE option_name (option_name) 

And while we're at it, having the unique key cover all of the columns in popular queries would allow for innodb to not have to go back to the primary data store to get the info, and only stay in the index, cutting down on a spindle access or two, per query.

Because the most common query on that table by far is this (found in functions.php around line 347):

SELECT option_value FROM $wpdb->options WHERE option_name = '$setting' LIMIT 1

We could get a big performance win with this index:

PRIMARY KEY  (option_id), 
UNIQUE option_name (option_name, option_value ) 

And that's what I recommend as the least obtrusive optimization. (assuming we do still want to enforce uniqueness of option_name at the DBMS layer-- otherwise just make it a straight INDEX)

Now, something to consider is that

  1. option_id is currently a bigint, which means it is 8 bytes. This restricts the number of rows in that table to (wait for it…) 18,446,744,073,709,551,615
  2. the bigger the column on which a primary key is made, the worse the performance (both for space and time complexity), especially if it is bigger than 4 bytes (int).
  3. so, assuming that we are okay with that table holding only the number of rows allowed by int (4,294,967,295 -- yes, that's 4.3 billion), it would be beneficial to make option_id an int

But-- that's the case with all the tables in the system that won't need to hold more than 4.2 billion entries. I can make a separate ticket for that if you guys think that's an idea that should be discussed further.

comment:28 johnjosephbachir5 years ago

  • Cc j@… added

comment:29 in reply to: ↑ 27 johnjosephbachir5 years ago

Replying to johnjosephbachir:

the most common query on that table by far is this (found in functions.php around line 347):

Actually I forgot about preloading autoloaded options (functions.php around line 423):

SELECT option_name, option_value FROM $wpdb->options WHERE autoload = 'yes'

So, adding an index to cover that:

PRIMARY KEY  (option_id), 
UNIQUE option_name (option_name, option_value ),
UNIQUE autoloaded_options (autoload, option_name, option_value )

Note that each of the indexes is needed to cover each of the mentioned queries, because the order of the columns is crucial.

Actually, now I'm realizing that there isn't an index on autoload at all, which is kind of wild, because that means that query results in a tablescan on the options table. Since that table rarely gets big, MySQL's own on-the-fly optimizations must be sufficient (I certainly have never seen that query in a slow query log).

Or maybe there aren't a lot of options with the autoload flag set.

comment:30 Denis-de-Bernardy5 years ago

UNIQUE autoloaded_options (autoload, option_name) makes sense, yeah.

D.

comment:31 jdub4 years ago

Owen Taylor, one of the GNOME sysadmins, just resolved an issue on blogs.gnome.org (WPMU) by adding an index on autoload. Should the no-index-on-autoload issue be put into a separate report?

ryan4 years ago

Make option_id primary. Add uniques for option_name and autoload

comment:32 ryan4 years ago

option_value is not in the keys due to:

WordPress database error BLOB/TEXT column 'option_value' used in key specification without a key length for query ALTER TABLE wp_options ADD UNIQUE autoloaded_options (autoload, option_name, option_value ) made by wp_upgrade, make_db_current_silent, dbDelta

comment:33 Denis-de-Bernardy4 years ago

  • Keywords tested added

comment:34 ryan4 years ago

Upgrade from 2.8.6 to 2.9 runs clean. Need to make sure 2.9-beta-1 upgrades clean so we don't trip up beta testers.

comment:35 Denis-de-Bernardy4 years ago

I had tested both upgrade paths:

  • upgrade from WP 2.8.6 to WP 2.9 beta-1, upgrade DB, apply patch, and upgrade DB yet again.
  • upgrade from WP 2.8.6 to patched WP 2.9 beta, upgrade DB directly.

option_id is around and primary keyed in both cases. I must confess I didn't check if the auto-increment flag was still around.

comment:36 ryan4 years ago

(In [12217]) Make option_id primary. Add uniques for option_name and autoload. see #2699

comment:37 ryan4 years ago

(In [12218]) Check DB ver 12217. see #2699

comment:38 miqrogroove4 years ago

Hi ryan, please see #10994 kthx :)

comment:39 ryan4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

#10994 taken care of. I'll venture to close this as fixed now.

comment:40 follow-up: barry4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Can someone explain why

UNIQUE autoloaded_options (autoload, option_name)

makes sense?

1) If we already have a UNIQUE index on option_name, having another one on (autoload, option_name) is redundant and just adds overhead on updates/inserts etc. So.... if we were going to add an index here, it seems that something like

KEY autoload (autoload)

is what we would want.

2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.

Some EXPLAIN output from a case where this index is useful I think is necessary to keep it in core. In my tests, MySQL didn't even use the index b/c the optimizer thought a full table scan would be faster due to the low cardinality of the index in question. I think by adding option_name to the index and increasing the cardinality you can trick the MySQL optimizer to use the index in some cases, but I still don't see how it would be more efficient.

I propose dropping this autoloaded_options index unless it can be shown (with EXPLAIN output) to be useful.

comment:41 in reply to: ↑ 40 johnjosephbachir4 years ago

Replying to barry:

1) If we already have a UNIQUE index on option_name, having another one on (autoload, option_name) is redundant and just adds overhead on updates/inserts etc. So.... if we were going to add an index here, it seems that something like

In a query where both autoload and option_name are in the WHERE clause, mysql will be able to use this index for both columns. In a query where only option_name is in the WHERE clause, mysql will not use this index (or at least that was the case a few years ago, maybe it has changed. I think Oracle does try to use such indexes in such cases, which of course requires "ignoring" the first level of the b-tree)

KEY autoload (autoload)

is what we would want.

In a query that uses both autoload and option_name in the where clause, MySQL will only use the option_name index, because it typically only uses one index per table per query.

2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.

Some EXPLAIN output from a case where this index is useful I think is necessary to keep it in core. In my tests, MySQL didn't even use the index b/c the optimizer thought a full table scan would be faster due to the low cardinality of the index in question. I think by adding option_name to the index and increasing the cardinality you can trick the MySQL optimizer to use the index in some cases, but I still don't see how it would be more efficient.

That's very true about the low cardinality (although I've never understood why that's the case, and several things I've read don't explain it. If the data is evenly split 50/50 with a cardinality of 2, shouldn't an index on that column cut down the tablescan time by 50%?)

At any rate-- as I noted in a comment above, I've never seen a slow query log entry based on a select on autoload, so for whatever reason MySQL doesn't end up needing it. Although, jdub above notes an instance where adding an index on it solved a problem for a particular installation. jdub, do you have any more info for us?

comment:42 follow-up: Denis-de-Bernardy4 years ago

Replying to barry:

KEY autoload (autoload)

is what we would want.

Barry is actually right, here. It's really meant for fetching all autoloaded options. The unique index on option name is fine for queries that have both in the where clause. I'll attach a patch in a sec.

2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.

Actually... and best I'm aware of course, index use mostly depends on the size and data distribution in the table. Too little rows, or too common values, mean the index isn't used. The odds are strong this rules our your test site.

My live site, for instance, has had plenty of plugins over time. The options table is cluttered with leftover junk from old plugins, and old features such as cached RSS and so on. And now transients, etc. Here's what my april 2009 dump returns:

mysql> create index option_autoload on www_options ( autoload );
Query OK, 246 rows affected (0.42 sec)
Records: 246  Duplicates: 0  Warnings: 0

mysql> explain select * from www_options where autoload = 'yes';
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table       | type | possible_keys   | key             | key_len | ref   | rows | Extra       |
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
|  1 | SIMPLE      | www_options | ref  | option_autoload | option_autoload | 62      | const |  216 | Using where | 
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Denis-de-Bernardy4 years ago

comment:43 in reply to: ↑ 42 barry4 years ago

Replying to Denis-de-Bernardy:

Replying to barry:

KEY autoload (autoload)

is what we would want.

Barry is actually right, here. It's really meant for fetching all autoloaded options. The unique index on option name is fine for queries that have both in the where clause. I'll attach a patch in a sec.

The patch still has the UNIQUE in there - it needs to be removed.

Denis-de-Bernardy4 years ago

switch to key (autoload)

comment:44 Denis-de-Bernardy4 years ago

Replying to johnjosephbachir:

At any rate-- as I noted in a comment above, I've never seen a slow query log entry based on a select on autoload, so for whatever reason MySQL doesn't end up needing it. Although, jdub above notes an instance where adding an index on it solved a problem for a particular installation. jdub, do you have any more info for us?

NDB clusters like it when there are plenty of indexes.

@Barry: thanks!

comment:45 barry4 years ago

Replying to Denis-de-Bernardy:

mysql> create index option_autoload on www_options ( autoload );
Query OK, 246 rows affected (0.42 sec)
Records: 246 Duplicates: 0 Warnings: 0

mysql> explain select * from www_options where autoload = 'yes';
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
| 1 | SIMPLE | www_options | ref | option_autoload | option_autoload | 62 | const | 216 | Using where |
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
}}}

What is the count of yes vs no options in your table? In most WP installs I have seen, the vast majority are autoload=yes and that is the only case we query for in bulk meaning we are returning the majority of the rows in the table thus the index is not used.

comment:46 miqrogroove4 years ago

As long as we're piling back on this ticket, someone explain why a column that is never used (option_id) has become the primary key? ryan, does the performance of an unused key ever count for anything in DB performance? Let's also dispel some of the misinformation. MySQL never returns column values from indexes on BLOBs.

If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed

Indexes on BLOBs do nothing unless the BLOB is in a WHERE condition. Unique keys never increase performance. If someone in the world has an options table with 60,000 rows in it and only 100 are autoload true, then MySQL can optimize the query based on row count.

MySQL uses index cardinality estimates only in join optimization.

comment:47 Denis-de-Bernardy4 years ago

@Barry - On the live site, it is 36 vs 227. Most of the 36 is tinymce advanced, and transients related to rss feeds.

comment:49 ryan4 years ago

Let's drop the autoload key for now and take it up in 3.0.

comment:50 ryan4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

comment:51 hakre4 years ago

Related: #11649

Note: See TracTickets for help on using tickets.