#2699 closed task (blessed) (fixed)
Make option_name index unique
Reported by: | johnjosephbachir | Owned by: | ryan |
---|---|---|---|
Milestone: | 2.9 | Priority: | high |
Severity: | normal | Version: | 2.8 |
Component: | Optimization | Keywords: | tested |
Focuses: | Cc: |
Description (last modified by )
Duplicate options leads to bad things, as we've experienced on WordPress.com . This was not the route issue, but definitely an easy part of the solution.
Seems like golden oldie:
http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005638.html
http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005647.html
http://comox.textdrive.com/pipermail/wp-hackers/2006-April/005651.html
Attachments (9)
Change History (60)
#4
@
17 years ago
- Milestone 2.3 (trunk) deleted
- Resolution set to wontfix
- Status changed from new to closed
#5
@
15 years ago
- Keywords early added; has-patch removed
- Milestone set to 2.9
- Priority changed from low to high
- Resolution wontfix deleted
- Status changed from closed to reopened
- Version changed from 2.0.2 to 2.8
#7
@
15 years ago
- Description modified (diff)
Updating desc:
Duplicate options leads to bad things, as we've experienced on WordPress.com . This was not the route issue, but definitely part of the solution.
#8
@
15 years ago
Adds a function that is run before the schema is upgraded. That function deletes duplicate options. The high option ID is kept, the rest deleted. Then a new unique key is added. dbDelta doesn't seem to delete the old key, so we might need to add an explicit delete to the upgrade.
#11
@
15 years ago
- Keywords has-patch 2nd-opinion added
suggesting we do the following:
- ensure no more dups exist in 2.8
- drop the option_id field
- primary key (blog_id, option_name)
- drop key(option_name), when then becomes unneeded (and which is useless when there are several blog_ids)
or:
- keep option_id field and ignore it
- primary key (blog_id, option_name)
- drop key(option_name)
#12
@
15 years ago
Quick question, while on the topic, and amending my previous comment: the blog_id field in the database doesn't seem to ever be used. Shouldn't we drop it?
#14
@
15 years ago
dbDelta() doesn't handle changing around primary keys so I did it by hand in pre_schema_upgrade(). I also kept option_id and blog_id around for now just in case some plugins are using it. I'd like to avoid even small plugin breakages in 2.9. We can deprecate these later. Unfortunately, since option_id is auto_increment we have to put a key on it until we are rid of it.
#17
@
15 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Upgrade on my live blog went fine (no dupes I guess). Attempted to upgrade my localhost blog and it errored out:
WordPress database error: [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '', '2480', '2482', '2482', '2491', '2492)' at line 1] DELETE FROM wp_trunk_options WHERE option_id IN (2480', '2480', '2482', '2482', '2491', '2492) WordPress database error: [Duplicate entry 'yarpp_use_template' for key 'PRIMARY'] ALTER TABLE wp_trunk_options ADD PRIMARY KEY (option_name) Upgrade Complete Your WordPress database has been successfully upgraded!
Note the missing single quotes on the first query. I assume the second query's error is due to the first query failying.
#21
@
15 years ago
(In [11883])
Surely that'd have been better off in a upgrade_290() style function?
Also, Need to warn those whose utilise blog_id, I think VirtualMultiblog uses it at least..
#22
@
15 years ago
oh, nm, it's a syntax error
Not just that.. But an array_unique() should be added, notice the Duplicate ID's its finding?
#23
@
15 years ago
Surely that'd have been better off in a upgrade_290() style function?
Oh, I see why it was done now..
#25
follow-up:
↓ 27
@
15 years ago
Some WPMU installs use InnoDB for the options table. Having a varchar as the primary key isn't the best for InnoDB since it prefers small, sequential primary keys. Should we make option_id primary and option_name unique?
#26
@
15 years ago
- Keywords early has-patch 2nd-opinion removed
Is this fixed now or not? Can someone please provide a step-by-step instruction to test?
#27
in reply to:
↑ 25
;
follow-up:
↓ 29
@
15 years ago
Hey folks -- cool to see my ticket resurrected after 3 years :-)
Replying to ryan:
Some WPMU installs use InnoDB for the options table. Having a varchar as the primary key isn't the best for InnoDB since it prefers small, sequential primary keys. Should we make option_id primary and option_name unique?
That's true -- it's a pretty bad idea to make an innodb primary key non-numeric or non-sequential, and here we are doing both.
Having a primary key on option_id, and then a separate unique key covering whatever columns we like, will provide the same performance benefits, with a slight increase in disk space used by that table and its indexes (but maybe not compared to what we had before-- I'd have to think a little harder to be sure). At any rate, it's a good design.
in r11883:
PRIMARY KEY (option_name), KEY option_id (option_id)
Best for InnoDB (and probably best or equivalent in myisam as well:)
PRIMARY KEY (option_id), UNIQUE option_name (option_name)
And while we're at it, having the unique key cover all of the columns in popular queries would allow for innodb to not have to go back to the primary data store to get the info, and only stay in the index, cutting down on a spindle access or two, per query.
Because the most common query on that table by far is this (found in functions.php around line 347):
SELECT option_value FROM $wpdb->options WHERE option_name = '$setting' LIMIT 1
We could get a big performance win with this index:
PRIMARY KEY (option_id), UNIQUE option_name (option_name, option_value )
And that's what I recommend as the least obtrusive optimization. (assuming we do still want to enforce uniqueness of option_name at the DBMS layer-- otherwise just make it a straight INDEX
)
Now, something to consider is that
- option_id is currently a
bigint
, which means it is 8 bytes. This restricts the number of rows in that table to (wait for it…) 18,446,744,073,709,551,615 - the bigger the column on which a primary key is made, the worse the performance (both for space and time complexity), especially if it is bigger than 4 bytes (
int
). - so, assuming that we are okay with that table holding only the number of rows allowed by
int
(4,294,967,295 -- yes, that's 4.3 billion), it would be beneficial to make option_id anint
But-- that's the case with all the tables in the system that won't need to hold more than 4.2 billion entries. I can make a separate ticket for that if you guys think that's an idea that should be discussed further.
#29
in reply to:
↑ 27
@
15 years ago
Replying to johnjosephbachir:
the most common query on that table by far is this (found in functions.php around line 347):
Actually I forgot about preloading autoloaded options (functions.php around line 423):
SELECT option_name, option_value FROM $wpdb->options WHERE autoload = 'yes'
So, adding an index to cover that:
PRIMARY KEY (option_id), UNIQUE option_name (option_name, option_value ), UNIQUE autoloaded_options (autoload, option_name, option_value )
Note that each of the indexes is needed to cover each of the mentioned queries, because the order of the columns is crucial.
Actually, now I'm realizing that there isn't an index on autoload at all, which is kind of wild, because that means that query results in a tablescan on the options table. Since that table rarely gets big, MySQL's own on-the-fly optimizations must be sufficient (I certainly have never seen that query in a slow query log).
Or maybe there aren't a lot of options with the autoload flag set.
#31
@
15 years ago
Owen Taylor, one of the GNOME sysadmins, just resolved an issue on blogs.gnome.org (WPMU) by adding an index on autoload. Should the no-index-on-autoload issue be put into a separate report?
#32
@
15 years ago
option_value is not in the keys due to:
WordPress database error BLOB/TEXT column 'option_value' used in key specification without a key length for query ALTER TABLE wp_options ADD UNIQUE autoloaded_options (autoload, option_name, option_value ) made by wp_upgrade, make_db_current_silent, dbDelta
#34
@
15 years ago
Upgrade from 2.8.6 to 2.9 runs clean. Need to make sure 2.9-beta-1 upgrades clean so we don't trip up beta testers.
#35
@
15 years ago
I had tested both upgrade paths:
- upgrade from WP 2.8.6 to WP 2.9 beta-1, upgrade DB, apply patch, and upgrade DB yet again.
- upgrade from WP 2.8.6 to patched WP 2.9 beta, upgrade DB directly.
option_id is around and primary keyed in both cases. I must confess I didn't check if the auto-increment flag was still around.
#39
@
15 years ago
- Resolution set to fixed
- Status changed from reopened to closed
#10994 taken care of. I'll venture to close this as fixed now.
#40
follow-up:
↓ 41
@
15 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
Can someone explain why
UNIQUE autoloaded_options (autoload, option_name)
makes sense?
1) If we already have a UNIQUE index on option_name, having another one on (autoload, option_name) is redundant and just adds overhead on updates/inserts etc. So.... if we were going to add an index here, it seems that something like
KEY autoload (autoload)
is what we would want.
2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.
Some EXPLAIN output from a case where this index is useful I think is necessary to keep it in core. In my tests, MySQL didn't even use the index b/c the optimizer thought a full table scan would be faster due to the low cardinality of the index in question. I think by adding option_name to the index and increasing the cardinality you can trick the MySQL optimizer to use the index in some cases, but I still don't see how it would be more efficient.
I propose dropping this autoloaded_options index unless it can be shown (with EXPLAIN output) to be useful.
#41
in reply to:
↑ 40
@
15 years ago
Replying to barry:
1) If we already have a UNIQUE index on option_name, having another one on (autoload, option_name) is redundant and just adds overhead on updates/inserts etc. So.... if we were going to add an index here, it seems that something like
In a query where both autoload and option_name are in the WHERE clause, mysql will be able to use this index for both columns. In a query where only option_name is in the WHERE clause, mysql will not use this index (or at least that was the case a few years ago, maybe it has changed. I think Oracle does try to use such indexes in such cases, which of course requires "ignoring" the first level of the b-tree)
KEY autoload (autoload)
is what we would want.
In a query that uses both autoload and option_name in the where clause, MySQL will only use the option_name index, because it typically only uses one index per table per query.
2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.
Some EXPLAIN output from a case where this index is useful I think is necessary to keep it in core. In my tests, MySQL didn't even use the index b/c the optimizer thought a full table scan would be faster due to the low cardinality of the index in question. I think by adding option_name to the index and increasing the cardinality you can trick the MySQL optimizer to use the index in some cases, but I still don't see how it would be more efficient.
That's very true about the low cardinality (although I've never understood why that's the case, and several things I've read don't explain it. If the data is evenly split 50/50 with a cardinality of 2, shouldn't an index on that column cut down the tablescan time by 50%?)
At any rate-- as I noted in a comment above, I've never seen a slow query log entry based on a select on autoload, so for whatever reason MySQL doesn't end up needing it. Although, jdub above notes an instance where adding an index on it solved a problem for a particular installation. jdub, do you have any more info for us?
#42
follow-up:
↓ 43
@
15 years ago
Replying to barry:
KEY autoload (autoload)
is what we would want.
Barry is actually right, here. It's really meant for fetching all autoloaded options. The unique index on option name is fine for queries that have both in the where clause. I'll attach a patch in a sec.
2) Assuming the index proposed in #1 is what we have, we run into another problem - the index is never used. The autoload field in every WP install I have seen just has 2 values (yes/no) making the cardinality of that index 2 which means it's effectiveness as an index is almost none.
Actually... and best I'm aware of course, index use mostly depends on the size and data distribution in the table. Too little rows, or too common values, mean the index isn't used. The odds are strong this rules our your test site.
My live site, for instance, has had plenty of plugins over time. The options table is cluttered with leftover junk from old plugins, and old features such as cached RSS and so on. And now transients, etc. Here's what my april 2009 dump returns:
mysql> create index option_autoload on www_options ( autoload ); Query OK, 246 rows affected (0.42 sec) Records: 246 Duplicates: 0 Warnings: 0 mysql> explain select * from www_options where autoload = 'yes'; +----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+ | 1 | SIMPLE | www_options | ref | option_autoload | option_autoload | 62 | const | 216 | Using where | +----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+ 1 row in set (0.00 sec)
#43
in reply to:
↑ 42
@
15 years ago
Replying to Denis-de-Bernardy:
Replying to barry:
KEY autoload (autoload)
is what we would want.
Barry is actually right, here. It's really meant for fetching all autoloaded options. The unique index on option name is fine for queries that have both in the where clause. I'll attach a patch in a sec.
The patch still has the UNIQUE in there - it needs to be removed.
#44
@
15 years ago
Replying to johnjosephbachir:
At any rate-- as I noted in a comment above, I've never seen a slow query log entry based on a select on autoload, so for whatever reason MySQL doesn't end up needing it. Although, jdub above notes an instance where adding an index on it solved a problem for a particular installation. jdub, do you have any more info for us?
NDB clusters like it when there are plenty of indexes.
@Barry: thanks!
#45
@
15 years ago
Replying to Denis-de-Bernardy:
mysql> create index option_autoload on www_options ( autoload );
Query OK, 246 rows affected (0.42 sec)
Records: 246 Duplicates: 0 Warnings: 0
mysql> explain select * from www_options where autoload = 'yes';
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
| 1 | SIMPLE | www_options | ref | option_autoload | option_autoload | 62 | const | 216 | Using where |
+----+-------------+-------------+------+-----------------+-----------------+---------+-------+------+-------------+
1 row in set (0.00 sec)
}}}
What is the count of yes vs no options in your table? In most WP installs I have seen, the vast majority are autoload=yes and that is the only case we query for in bulk meaning we are returning the majority of the rows in the table thus the index is not used.
#46
@
15 years ago
As long as we're piling back on this ticket, someone explain why a column that is never used (option_id) has become the primary key? ryan, does the performance of an unused key ever count for anything in DB performance? Let's also dispel some of the misinformation. MySQL never returns column values from indexes on BLOBs.
If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed
Indexes on BLOBs do nothing unless the BLOB is in a WHERE condition. Unique keys never increase performance. If someone in the world has an options table with 60,000 rows in it and only 100 are autoload true, then MySQL can optimize the query based on row count.
MySQL uses index cardinality estimates only in join optimization.
No traction in over a year, so closing as wontfix.
Feel free to reopen if you have additional info/patches/comments/...