WordPress.org

Make WordPress Core

Opened 2 years ago

Closed 5 months ago

#24044 closed enhancement (maybelater)

Add index to wp_options to aid/improve performance

Reported by: DanBUK Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.6
Component: Options, Meta APIs Keywords: has-patch
Focuses: performance Cc:

Description

I work for a hosting company and we manage many '000s of WP sites for people. In order to improve the performance of all of these instances I find myself adding an index onto wp_options autoload many times daily.

I'm aware this is a small gain, but any gain is good.

This index will be hit on every single page load.

Attachments (7)

autoload_key.patch (552 bytes) - added by DanBUK 2 years ago.
Patch to add index onto wp_options autoload
24044.diff (566 bytes) - added by cliffseal 14 months ago.
24044-test.php (6.2 KB) - added by pento 14 months ago.
24044-test-results.txt (5.7 KB) - added by pento 14 months ago.
dan_results.txt (1.7 KB) - added by DanBUK 14 months ago.
test results
dan_test.php (3.3 KB) - added by DanBUK 14 months ago.
dan_results_myisam.txt (1.7 KB) - added by DanBUK 14 months ago.
test results myisam

Download all attachments as: .zip

Change History (46)

@DanBUK2 years ago

Patch to add index onto wp_options autoload

comment:1 @nacin2 years ago

Could you show some statistics for how this helps performance on your sites? See ticket:14258#comment:3 for why we have not done this.

comment:2 @DanBUK2 years ago

  • Summary changed from Add index to wp_options to aid improve performance to Add index to wp_options to aid/improve performance

This may be the case, but there are some godforsaken ugly plugins out there which mean that wp_options ends up '000s of rows. In these situations this will help with performance.

Also the maintenance of such an index is not placing a lot of burden on the DB. Overall this is reducing the work to deliver the page, that is what matters.

comment:3 @DanBUK2 years ago

Also the index will still be used for statistics, rather than assuming a full table scan every time. The index has a high level of cardinality.

comment:4 follow-up: @DanBUK2 years ago

And assuming that all wp_options tables will have > 30% autoload == yes isn't prudent, there are many ways in which people use the wp_options table.

comment:5 in reply to: ↑ 4 @nacin2 years ago

Replying to DanBUK:

And assuming that all wp_options tables will have > 30% autoload == yes isn't prudent, there are many ways in which people use the wp_options table.

I disagree; it's a very fair assumption. "Ugly" plugins that add thousands of rows to a table are still very, very likely to autoload all of them, because that's the default. The alternative is each get_option() is an extra query, which you'd probably notice as a host.

The maintenance of an index is not placing a lot of burden, no. But if the index is just sitting there waiting to be ignored, then it doesn't do us much good. The larger issue is that I haven't seen any evidence — in terms of performance or otherwise — to actually show that an index here would reduce the work to deliver the page.

I'm not saying you've added an index to thousands of sites yet assuredly have no performance improvements to show for it. I'm saying, show us the measurable improvement so we can see it for ourselves.

comment:6 @toscho2 years ago

  • Cc info@… added

comment:7 @pento2 years ago

  • Cc gary@… added

comment:8 @DeanMarkTaylor23 months ago

  • Cc DeanMarkTaylor added

comment:9 @krazybean22 months ago

On a table with above 1k rows in wp_options

Before autoload index:
mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | wp_options | ALL | NULL | NULL | NULL | NULL | 1460 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+

After adding index named i1

mysql> create index i1 on wp_options(autoload, option_name, option_value(50));

mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| 1 | SIMPLE | wp_options | ref | i1 | i1 | 62 | const | 178 | Using where |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+

After the index was created the total rows searched dropped from 1460 to 178.

Version 0, edited 22 months ago by krazybean (next)

comment:10 @cliffseal14 months ago

  • Keywords dev-feedback added

On a WordPress 'app' in which lots of non-autoloaded transients are used, I got similar results, running the same queries as @krazybean.

Before:

mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | wp_options | ALL  | NULL          | NULL | NULL    | NULL | 8196 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.01 sec)

After:

mysql> create index i1 on wp_options(autoload, option_name, option_value(50));
Query OK, 0 rows affected (0.52 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref   | rows | Extra       |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
|  1 | SIMPLE      | wp_options | ref  | i1            | i1   | 62      | const |  155 | Using where |
+----+-------------+------------+------+---------------+------+---------+-------+------+-------------+
1 row in set (0.00 sec)

Understood this may not affect the general population, but I wonder how we could best help folks who make extensive use of transients in certain situations.

comment:11 @nacin14 months ago

What's the purpose behind adding option_value to that index? It looks like it would only serve to increase the length of the index.

What's the performance difference on these queries? Reducing rows examined from 1460 to 178 (or even 10,000 to 100) is still peanuts. DanBUK mentioned adding this index many times daily, but I am trying to figure out if that was simply cathartic or if it was actually making a difference.

comment:12 @cliffseal14 months ago

I was just trying to replicate his results.

I wish I had tracked the difference when I added it to a much larger version of that same app—important queries went from several seconds in some cases to < 1 in all. I assure you it was directly related to adding an autoload index.

Again, I know this isn't the most common configuration, but it's allowing me to work with tons of transients at once with decent page load times.

If you can tell me what sort of feedback/results you need, I'll find a way to get it for you.

comment:13 @pento14 months ago

  • Keywords reporter-feedback added; dev-feedback removed

There are a few things to consider here.

First up, there's unlikely to be any benefit to adding (option_name, option_value(50)) to the index. There are no queries in WordPress core that would use them.

So, if we were going to add an index here, it would just be on autoload, which would have a very low cardinality. For some sites, this would potentially avoid a table scan, but I'm inclined to think that most sites would have >30% of rows with autoload set to yes.

Anyway, to determine if this index would be helpful to you, and the degree of helpfulness, it would be most useful to see how long the query takes to run with and without the index, rather than looking at the number of rows read. (If you're running on a test machine, run the query at least twice, to ensure the cache is warmed up.

If you're not using InnoDB, I'd also be curious to see how switching to that, with a large innodb_buffer_pool_size, changes the query time.

Finally, if you're keeping stats on page generation time, I'd be interested to to see how this changes in production with an autoload index.

comment:14 @cliffseal14 months ago

Agreed on adding that specific index—again, just trying to replicate results. Simply adding an autoload index is what made the monumental change in my case.

I'll run these tests and post the results here. Thanks!

Last edited 14 months ago by cliffseal (previous) (diff)

comment:15 @nacin14 months ago

  • Component changed from Performance to Options and Meta
  • Focuses performance added

comment:16 @cliffseal14 months ago

On an InnoDB database with about 615K rows in wp_options, here's what came across, running locally in MAMP. I started with an autoload index, removed it, and added it back.

SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';

628 rows in set (0.01 sec)

ALTER TABLE wp_options DROP INDEX autoload;
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';

628 rows in set (2.30 sec)

CREATE INDEX autoload ON wp_options(autoload, option_name);
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';

628 rows in set (0.00 sec)

Does that help?

comment:17 @cliffseal14 months ago

Also, I went through these steps again and ran the SELECT query a few more times after removing the index as requested. Still got:

628 rows in set (2.24 sec)
628 rows in set (2.24 sec)
628 rows in set (2.23 sec)

Then:

CREATE INDEX autoload ON wp_options(autoload, option_name);
SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes';
628 rows in set (0.01 sec)
628 rows in set (0.01 sec)
628 rows in set (0.00 sec)

comment:18 @nacin14 months ago

Why do you have 615K rows in your wp_options table?

comment:19 @cliffseal14 months ago

Because the Transients API has been perfect for a specific project needing lots of individual external API calls cached.

Again, not arguing for wp_options size, just trying to show where the index can make a big difference.

comment:20 follow-up: @nacin14 months ago

I figured it would be transients.

You should enable an external object cache, that way transients aren't stored in the database. If the lack of an index on this table forces hosting companies to consider the availability of an external object cache for their users, I'm not necessarily going to be bothered by that.

I find it hard to justify an index that benefits an infinitesimal number of sites, probably slows down many others, especially were the sites affected have better steps to take to improve performance.

comment:21 @cliffseal14 months ago

Actually, in this specific case, object caching _was_ enabled and kept getting overloaded, so this was the only option after having to disable it.

Again, totally understood this is not a common case, but I also wanted to demonstrate that it could make a big difference for some.

Here's a patch just in case. :)

Last edited 14 months ago by cliffseal (previous) (diff)

@cliffseal14 months ago

comment:22 in reply to: ↑ 20 ; follow-ups: @DanBUK14 months ago

Sorry I never came back with statistics. I'll try and find some of the customer sites I've worked on and generate some data.

Replying to pento:

If you're not using InnoDB, I'd also be curious to see how switching to that, with a large innodb_buffer_pool_size, changes the query time.

In almost all cases everyone should be using InnoDB for a WP database, excluding the tables that have FULLTEXT indexes of course. (Then again, full text indexing should really be shifted to something like Solr, but that is another story...) The nature of the wp_comments especially cannot scale with MyISAM, the write level of a busy WP site means that you'll end up with table locks. And the number of plugins that utilise wp_meta or wp_options to store post views, or similar that get updated on every view... Again leads to locking that blocks the viewing of the site.

Just by having a large buffer pool that doesn't change how the MySQL query planner/statistic collator/table scanning works. It's not about if the data is in RAM or not, the wp_options table even in the cases I've seen with up to 10k rows, isn't actually a lot of data, and will either end up in the buffer pool or the OS file system caches. The issue is that the planning has to examine all the rows in the table, every time the page is loaded. That's dependent upon CPU time.

Heck you can put your buffer pool up to 5x your database size and you'll see 0 performance improvement over 1x. Ideally the buffer pool should be as big as the dataset that is accessed often. Ergo if you have a 10GB InnoDB dataset and 9GB of that is archive of logs/actions/very old posts/etc you only really need a 1GB buffer pool.

But that is the thing, we are moving into the realm of database optimisation, and to be honest from my view that isn't always seen by developers. Don't even get me started on Magento.

Replying to nacin:

You should enable an external object cache, that way transients aren't stored in the database. If the lack of an index on this table forces hosting companies to consider the availability of an external object cache for their users, I'm not necessarily going to be bothered by that.

Quite often the hosting company cannot control the customer fully. The customer may have limited budget and we work within their constraints.

Also there are lots of plugins out there that are already using transients, most users of WP that are not developers/sys admins/etc do not understand that maybe they have used the inappropriate method of storing data.

Replying to nacin:

I find it hard to justify an index that benefits an infinitesimal number of sites, probably slows down many others, especially were the sites affected have better steps to take to improve performance.

How do you think this will slow things down? It is an index of boolean cardinality, therefore the binary tree will have only two branches (YES|NO). The addition/updating of rows only require very small index changes. I really cannot understand how you think it will slow things down.

comment:23 in reply to: ↑ 22 @DanBUK14 months ago

Replying to DanBUK:

But that is the thing, we are moving into the realm of database optimisation, and to be honest from my view that isn't always seen by developers. Don't even get me started on Magento.

My main basis for this is that developers don't see the application later in production on much busier sites that their test box that takes 10 requests per minute. Once you are dealing with a site that takes 100 requests per second minor changes to indexes in this suggestion make a huge difference. The Read/Write ratio is what counts.

comment:24 in reply to: ↑ 22 ; follow-up: @nacin14 months ago

Replying to DanBUK:

But that is the thing, we are moving into the realm of database optimisation, and to be honest from my view that isn't always seen by developers.

Pardon me :) I'm just watching over this ticket, playing devil's advocate and working to turn assumptions into data. On the other hand, pento spent something like four years at MySQL AB and Oracle. So feel free to continue to geek out with him.

comment:25 in reply to: ↑ 24 @cliffseal14 months ago

Replying to nacin:

I'm just watching over this ticket, playing devil's advocate and working to turn assumptions into data.

That's needed! I've been using this for a while now, but never saw a ton of good data on it either.

How can we test the impact of the index on a DB on the other extreme and test the assumption it might slow things down?

@pento14 months ago

@pento14 months ago

comment:26 @pento14 months ago

Okay, here are some test results! All of this was on MySQL 5.5, I didn't test on other versions. The MyISAM results should be largely the same on other versions, the InnoDB results may be a little slower on older versions, but I don't expect there to be massive differences.

Anyway, here are the variables tested:

  • Engine: InnoDB or MyISAM
  • Rows: 100, 1000, 10000, 100000, 1000000
  • Percentage of rows selected: 5%, 20%, 50%, 80%, 95%

The full testing script and results are attached, but here's the summary:

  • Adding this index causes nearly all queries to be slower, except for InnoDB at 100k rows. I have no idea why this is the case.
  • Once MyISAM gets to 100K rows, adding the index makes it exponentially slower.
  • I killed the test script before it finished MyISAM 1M rows, because each query was taking > 2 minutes to run.

@DanBUK14 months ago

test results

@DanBUK14 months ago

comment:27 @DanBUK14 months ago

  • Keywords reporter-feedback removed

I've performed some slightly different testing. With some restrictions.

As we are only issuing queries with autoload=yes the results for =no are not relevant.

With any site that has a high level of read/write concurrency MyISAM won't scale. (You would be surprised how many modules update wp_options on every page load.)

The results indicate that only in the case when 50% or more of the rows have autoload=yes things slow down. But this is by a negligible amount, even with 100k rows in wp_options the difference was only 0.02s (6.45%). Yet in the case of 100k rows and 20% of the rows have autoload=yes the gains in speed are 0.07s (35%).

Therefore I propose that the gains offered by adding this index outweigh the losses.

@DanBUK14 months ago

test results myisam

comment:28 @DanBUK14 months ago

And just for completeness, similar results as I mentioned for innodb are the same for MyISAM.

comment:29 follow-up: @pento14 months ago

With attachment:dan_test.php, I still had the same result of MyISAM getting really slow for large tables:

myisam_VARCHAR20_Size_100000_Percent_50 'YES' 0.97143130302429
myisam_VARCHAR20_INDEXED_Size_100000_Percent_50 'YES' 11.403997087479
>> Slower

myisam_VARCHAR20_Size_100000_Percent_20 'YES' 0.96122975349426
myisam_VARCHAR20_INDEXED_Size_100000_Percent_20 'YES' 5.91277384758
>> Slower

myisam_VARCHAR20_Size_100000_Percent_5 'YES' 0.94139242172241
myisam_VARCHAR20_INDEXED_Size_100000_Percent_5 'YES' 1.2993580341339
>> Slower

As much as I'd love to ignore MyISAM, the vast majority of WordPress sites use it, including WordPress.com. While we're clearly getting different results here, we can't take the risk that other sites will suffer the same slowness I'm seeing.

comment:30 @cliffseal14 months ago

So, the utility of this index is (roughly) inversely proportional to the percentage of rows where autoload = 'yes'? That'd make sense, considering, in my case, 0.001% of rows are such.

To me, this now goes back to:

Replying to nacin:

"Ugly" plugins that add thousands of rows to a table are still very, very likely to autoload all of them, because that's the default.

If that's the case, is there any way we can 'police' this behavior better with plugins, or define and encourage a best practice for it?

comment:31 in reply to: ↑ 29 ; follow-up: @DanBUK14 months ago

Replying to pento:

How do you have your MySQL instance configured? (I'm really surprised your getting such slow results.)

I have the following:-

thread_cache_size = 8
innodb_buffer_pool_size = 1G
key_buffer_size = 128M
table_cache = 1024

(Running on a 2CPU / 2GB KVM Centos 6 VM, gah I just noticed it's only MySQL 5.1, will install 5.5 and re-test.)
I added my MyISAM results as an attachment earlier.

comment:32 in reply to: ↑ 31 @DanBUK14 months ago

Replying to DanBUK:

(Running on a 2CPU / 2GB KVM Centos 6 VM, gah I just noticed it's only MySQL 5.1, will install 5.5 and re-test.)

I'm getting similar results on 5.5 MyISAM:-

myisam_VARCHAR20_Size_100000_Percent_50 'YES' 0.23201403617859
myisam_VARCHAR20_INDEXED_Size_100000_Percent_50 'YES' 0.29787020683289
>> Slower

myisam_VARCHAR20_Size_100000_Percent_20 'YES' 0.13488883972168
myisam_VARCHAR20_INDEXED_Size_100000_Percent_20 'YES' 0.12055621147156
>> Faster

myisam_VARCHAR20_Size_100000_Percent_5 'YES' 0.086687040328979
myisam_VARCHAR20_INDEXED_Size_100000_Percent_5 'YES' 0.030073165893555
>> Faster

comment:33 @pento14 months ago

I've had a bit more of a play around with my MySQL settings, but I haven't managed to make performance suck less on MyISAM.

I'm using a standard VVV install for testing. You're welcome to play around with it and see if you can get better performance out of it, but I'm strongly inclined to not add this index. VVV is a pretty popular base for WordPress dev shops to create sites with these days, so if I'm running into performance problems, there's a good chance a bunch of other sites will, too.

comment:34 @jeichorn9 months ago

At Pagely on a semi-regular basis we end up with sites with hundreds of thousands of rows in wp_options. The cause of this is normally storing sessions in wp_options. In a case like this having the index on autoload is huge win.

Before index

mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table      | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | wp_options | ALL  | NULL          | NULL | NULL    | NULL | 807663 | Using where |
+----+-------------+------------+------+---------------+------+---------+------+--------+-------------+

After index:

mysql> explain select option_name, option_value from wp_options where autoload = 'yes';
+----+-------------+------------+------+---------------+----------+---------+-------+------+-----------------------+
| id | select_type | table      | type | possible_keys | key      | key_len | ref   | rows | Extra                 |
+----+-------------+------------+------+---------------+----------+---------+-------+------+-----------------------+
|  1 | SIMPLE      | wp_options | ref  | autoload      | autoload | 62      | const |  265 | Using index condition |
+----+-------------+------------+------+---------------+----------+---------+-------+------+-----------------------+

We are talking about reducing query time by 1 second in this instance. As the average size of wp_options grows and grows its a nice low cost index to add that protects people from plugins that use wp_options as a temp data store.

Last edited 9 months ago by DrewAPicture (previous) (diff)

comment:35 @SergeyBiryukov7 months ago

#29637 was marked as a duplicate.

comment:36 @brokentone6 months ago

Even without thousands of rows, we found benefits in adding this index ourselves. I'd advocate for it to be added in core.

Last edited 6 months ago by brokentone (previous) (diff)

comment:37 @JanR5 months ago

I hope you don't mind me barging in here... A referrer from ticket #29637 to my site brought me to this ticket, and apparently it has been submitted more than once (#14258). So there is some demand for this.

In a live-situation, adding an index on the autoload column resulted in a drop of examined rows, from 33927 to 1508. I call that significant. The real performance gain depends on a lot of factors, I know, but I just wanted to share these numbers.

PS: before one starts to install and test with MySQL 5.5 (or 5.6 for that matter), notice the innodb_buffer_pool_instances *and* innodb_buffer_pool_size settings. One can even increase InnoDB performance by adjusting innodb_write_io_threads and innodb_read_io_threads. As long as you have the RAM. Don't use MyISAM (IMO).

comment:38 @jorbin5 months ago

  • Keywords close added

I think we need to pull the trigger and make a decision here rather than letting this ticket waste away.

All the testing shows that adding it reduces performance for MyISAM which if we take a "first, do no harm" approach, means that this comes accross as a non starter. I think we should close as maybelater and if evidence of it not harming performance for MyISAM comes up, we can revisit.

This of course doesn't prevent individual site owners from adding the index themselves.

comment:39 @pento5 months ago

  • Keywords close removed
  • Milestone Awaiting Review deleted
  • Resolution set to maybelater
  • Status changed from new to closed

Agreed, I'm open to revisiting this when we have more InnoDB users. As InnoDB use is steadily increasing, we'll get there eventually. :-)

In the mean time, for folks who happen upon this ticket, you have two options:

  • Try adding this index yourself, and see if it improves performance for you
  • Use an external object cache to avoid hitting the DB
Note: See TracTickets for help on using tickets.