Make WordPress Core

Opened 2 months ago

Last modified 3 weeks ago

#62790 reviewing defect (bug)

"autoload" index is not used

Reported by: kkmuffme's profile kkmuffme Owned by: joemcgill's profile joemcgill
Milestone: 6.9 Priority: normal
Severity: minor Version: 5.3
Component: Database Keywords: needs-patch early
Focuses: performance Cc:

Description

Added in https://core.trac.wordpress.org/ticket/24044
This index is not used anymore by MySQL and slows down WP (bc of indexing cost)

EXPLAIN FORMAT=JSON SELECT option_name, option_value FROM wpoptions WHERE autoload IN ( 'yes', 'on', 'auto-on', 'auto' )

Why?

  • the cardinality of the index is too low, the mysql optimizer decides not to use the index
  • the IN() is too unspecific (e.g. if the query would be = 'yes' the index would be used)

=> this is actually the reason why there was a performance gain when this was added https://core.trac.wordpress.org/ticket/24044#comment:17 but why there is none now

  • lots of things have changed in the meantime in mysql performance (e.g. myisam to innodb,...)

Optionally the index can be forced like:

EXPLAIN FORMAT=JSON SELECT option_name, option_value FROM wpoptions FORCE INDEX (autoload) WHERE autoload IN ( 'yes', 'on', 'auto-on', 'auto' )

However I couldn't see a performance gain by forcing the index either.

Suggestion: remove the index again, since it's not used anyway

Change History (9)

#1 @johnbillion
4 weeks ago

  • Focuses performance added

#2 @joemcgill
4 weeks ago

  • Milestone changed from Awaiting Review to 6.8
  • Owner set to joemcgill
  • Status changed from new to reviewing

#3 @joemcgill
4 weeks ago

  • Keywords needs-patch added

Thanks for raising this @kkmuffme. I can confirm that the current index does not seem to be used.

Optionally the index can be forced...

I've tested this approach with the following query on a default install:

EXPLAIN SELECT option_name, option_value
FROM wp_options
FORCE INDEX (autoload)
WHERE autoload IN ( 'yes', 'on', 'auto-on', 'auto' )

However I couldn't see a performance gain by forcing the index either.

I can confirm that this forces the index to be used, and results in fewer rows to queried. In a default install the difference is very slight, but on a larger DB with many options, this may still provide the previous benefit.

There were a lot of tests done on the approach in #24044 prior to creating the index, so it would be useful to get some similar data now on whether trying to update the index in a way that will allow it to be used is worth pursuing.

#4 follow-up: @joemcgill
4 weeks ago

After doing a bit more testing, I am not seeing a huge performance benefit from removing the index either. Given that the IN clause is now generated from wp_autoload_values_to_autoload() since [57920], and that value can be filtered to produce a more narrow IN clause where the index could still apply, it may be worth retaining the index for folks who are still able to make use of the benefit unless the performance gains from removing the index are worthwhile.

#5 in reply to: ↑ description @siliconforks
4 weeks ago

Replying to kkmuffme:

Added in https://core.trac.wordpress.org/ticket/24044
This index is not used anymore by MySQL

This does not appear to be true in all cases.

I tested it on a site with several thousand rows in the wp_options table, and it appears to be using the index:

mysql> EXPLAIN FORMAT=JSON
    -> SELECT option_name, option_value
    -> FROM wp_options
    -> WHERE autoload IN ( 'yes', 'on', 'auto-on', 'auto' )
    -> \G
*************************** 1. row ***************************
EXPLAIN: {
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "988.76"
    },
    "table": {
      "table_name": "wp_options",
      "access_type": "range",
      "possible_keys": [
        "autoload"
      ],
      "key": "autoload",
      "used_key_parts": [
        "autoload"
      ],
      "key_length": "82",
      "rows_examined_per_scan": 2195,
      "rows_produced_per_join": 2195,
      "filtered": "100.00",
      "index_condition": "(`**********`.`wp_options`.`autoload` in ('yes','on','auto-on','auto'))",
      "cost_info": {
        "read_cost": "769.26",
        "eval_cost": "219.50",
        "prefix_cost": "988.76",
        "data_read_per_join": "1M"
      },
      "used_columns": [
        "option_name",
        "option_value",
        "autoload"
      ]
    }
  }
}

#6 in reply to: ↑ 4 ; follow-up: @kkmuffme
4 weeks ago

Replying to joemcgill:

After doing a bit more testing, I am not seeing a huge performance benefit from removing the index either

How did you test that? The performance "benefit" (= actually removing the performance penalty) of removing the index is when adding or updating data to the options table. Removing the index slightly speeds up adding/updating data and reduces CPU load. For people where this index didn't provide any gains, this will be minimal. For people where this index at some point did provide gains, the benefits of removing them will be bigger

Given that the IN clause is now generated from wp_autoload_values_to_autoload()

I think this is an important point. Since using = with 1 value will still use the index:

EXPLAIN FORMAT=JSON SELECT option_name, option_value FROM wpoptions WHERE autoload = 'yes';

What should happen, I guess, is that possibly redundant (yes, on, auto-on, ...) are reduced to a single value (e.g. "on") like it used to be, as this will make sure the index is used again.
While this is straightforward for yes/on no/off, someone who is familiar why/for what auto/auto-on was implemented should take a look at if those are really necessary?
As it stands, it's faster to just update the autoload value of all options if the admin's/site's preference changes than using auto/auto-on/auto-off and thus slowing every options table query down.

---

For testing it doesn't make a lot of sense to test with a default WP install that has maybe a few 100 or maybe 2000 rows in the table, since you won't ever encounter this in the wild.
If a site is using just 3-4 plugins and not using an external object cache (transients!) with only 1 active user and cron running, you'll end up with 4k+ rows in the options table.

With a basic WooCommerce site with a few 100 products and no object cache, but few carts, I got around 6000 rows.
Using:
8.0.35-27 - Percona

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "1357.52"
    },
    "table": {
      "table_name": "wpoptions",
      "access_type": "ALL",
      "possible_keys": [
        "autoload"
      ],
      "rows_examined_per_scan": 6855,
      "rows_produced_per_join": 3776,
      "filtered": "55.08",
      "cost_info": {
        "read_cost": "979.92",
        "eval_cost": "377.60",
        "prefix_cost": "1357.52",
        "data_read_per_join": "3M"
      },
      "used_columns": [
        "option_name",
        "option_value",
        "autoload"
      ],
      "attached_condition": "(`db_`.`wpoptions`.`autoload` in ('yes','on','auto-on','auto'))"
    }
  }
}	

Forced is 4x as expensive:

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "6197.83"
    },
    "table": {
      "table_name": "wpoptions",
      "access_type": "range",
      "possible_keys": [
        "autoload"
      ],
      "key": "autoload",
      "used_key_parts": [
        "autoload"
      ],
      "key_length": "82",
      "rows_examined_per_scan": 5484,
      "rows_produced_per_join": 5484,
      "filtered": "100.00",
      "index_condition": "(`db_`.`wpoptions`.`autoload` in ('yes','on','auto-on','auto'))",
      "cost_info": {
        "read_cost": "5649.44",
        "eval_cost": "548.40",
        "prefix_cost": "6197.84",
        "data_read_per_join": "4M"
      },
      "used_columns": [
        "option_name",
        "option_value",
        "autoload"
      ]
    }
  }
}	

#7 in reply to: ↑ 6 ; follow-up: @joemcgill
3 weeks ago

Replying to kkmuffme:

How did you test that? The performance "benefit" (= actually removing the performance penalty) of removing the index is when adding or updating data to the options table. Removing the index slightly speeds up adding/updating data and reduces CPU load. For people where this index didn't provide any gains, this will be minimal. For people where this index at some point did provide gains, the benefits of removing them will be bigger

Thanks for clarifying that the performance concern is about updating/adding options rather than reading options. I was focused on the query time used when calling wp_load_alloptions(), which is what will have the biggest impact on most end users.

What should happen, I guess, is that possibly redundant (yes, on, auto-on, ...) are reduced to a single value (e.g. "on") like it used to be, as this will make sure the index is used again.
While this is straightforward for yes/on no/off, someone who is familiar why/for what auto/auto-on was implemented should take a look at if those are really necessary?

I disagree with this suggestion. These values were added intentionally as a part of #42441 in order to maintain backwards compatibly, while opening up the ability for use to move away from autoloading all options by default, unless a developer remembers to explicitly pass a falsey value to the $autoload param when calling add|update_option().

#8 in reply to: ↑ 7 ; follow-up: @kkmuffme
3 weeks ago

Replying to joemcgill:

I disagree with this suggestion. These values were added intentionally as a part of #42441 in order to maintain backwards compatibly, while opening up the ability for use to move away from autoloading all options by default, unless a developer remembers to explicitly pass a falsey value to the $autoload param when calling add|update_option().

Ironically, that feature achieved the opposite of what it set out to do though. By adding multiple different autoload values, it actually makes loading the options slower since the index isn't used anymore and if it is used (as seen by the "forced" cases) the query is actually slower in all cases except if you actually do have a huge option value which is prevented from being loaded.

I think the changes of that ticket must be seen in combination with the index usage issue at hand here.

The better approach at that other ticket would have been to validate (eg option value size) when adding/updating only and based on that assign yes/no instead of adding additional values auto-on,... which make loading options overall slower. Since whether an option should autoload or not is already known at the point it gets saved, so it doesn't make sense to delegate (some of) that logic to the point options are retrieved?

#9 in reply to: ↑ 8 @joemcgill
3 weeks ago

  • Keywords early added
  • Milestone changed from 6.8 to 6.9

Replying to kkmuffme:

Ironically, that feature achieved the opposite of what it set out to do though. By adding multiple different autoload values, it actually makes loading the options slower since the index isn't used anymore and if it is used (as seen by the "forced" cases) the query is actually slower in all cases except if you actually do have a huge option value which is prevented from being loaded.

In some cases maybe so, but the feature ultimately set out to avoid auto-loading overly large options which has was measured to be a much larger performance problem then the time saved by ensuring the index is being used.

Regardless, this conversation is getting off-topic from the point of this ticket, which is whether or not we should remove the index. I think it's worth gathering more data about the performance effects of doing so before making that change. I'm going to move this to the 6.9 milestone early so we have ample time to review and make a decision.

Note: See TracTickets for help on using tickets.