#31245 closed enhancement (fixed)
Replace alloptions with a key cache
Reported by: | rmccue | Owned by: | SergeyBiryukov |
---|---|---|---|
Milestone: | 5.3.1 | Priority: | normal |
Severity: | normal | Version: | 2.1 |
Component: | Options, Meta APIs | Keywords: | has-patch needs-testing |
Focuses: | performance | Cc: |
Description
Let's talk about alloptions
. alloptions
is a particularly fragile piece of WordPress, since it's exceptionally vulnerable to accidental concurrency issues due to its design.
A primer on alloptions
, for those who haven't dug into it: alloptions
is a cache key that stores all the options marked with autoload, thereby reducing the number of external calls out to an object cache from hundreds down to 1. It does this by running SELECT * FROM wp_options WHERE autoload = 'yes'
, then storing that array (map of option_name => option_value
) under the alloptions
cache key. (Non-autoloaded options are stored as individual cache items instead, with a key of option_name
in the options
group.)
When you add an option, you can pick the autoload
flag. This flag is 'yes'
(true
) by default, meaning most calls to add_option
will set options to autoload. Additionally, if you add an option via update_option
, there's no way to set this flag. These combined mean that the vast majority of options are stored in the alloptions
cache.
Why is this an issue?
The alloptions
cache is loaded very early in the WordPress request lifetime. From this moment until the end of a request, option reads are taken from this cache rather than the database.
In addition, every time an autoloaded option is updated or deleted, alloptions
is similarly updated. However, this means that on any autoloaded option being updated, every autoloaded option has its value set to the value at load time.
This is a huge concurrency bug, as it's very easy to run into accidentally. As a proof of concept, try updating two different options in two different requests at the same time, and you'll see that whichever ran first will lose the changes. (See #25623)
(We're running into this issue a few times a week, due to certain plugins running an update_option
on a high percentage of requests. These plugins are obviously being dumb in doing so, but they're revealing the core bug quite nicely.)
How do we fix this?
The biggest issue is that alloptions
is a single cache item containing every option. If we reduce this to one cache item per option, the concurrency problem is then reduced to individual options (and can be mitigated in the places it actually matters, typically in plugins). Rather than storing all the options, we can instead cache the keys that matter, then grab the values for all of those at once.
This means that we may still have minor concurrency issues when adding and removing options, however these happen much less often than updating existing ones. In addition, the cache would now only be used to decide which options to load at the start of the request, so worst case scenario, you end up with extra object cache get
calls. This is much better than the current case, where a concurrency problem can return incorrect data.
This change would require adding a wp_cache_get_multi
to take full advantage of the cache, however compatibility could be shimmed here with a loop and individual calls to wp_cache_get
. This would cause worse performance for object caches that don't support multi-get (or haven't been updated), but at the benefit of no longer returning potentially bad data.
There's a few things that need to be done for this:
- Add
wp_cache_multi_get
, and talk to object cache maintainers to update the popular ones to support it (Memcache, APC) - Change
wp_load_alloptions
to pull fromalloptionskeys
, then run a multi-get on those - Ensure
alloptionskeys
is a protected option name - Store value caches per-key
Attachments (8)
Change History (110)
#3
@
10 years ago
- Keywords has-patch needs-testing added
Added a first patch which:
- Caches autoloaded option keys in
alloptionskeys
- Adds
alloptionskeys
to the protected names list - Sets both
alloptionskeys
and the individual option caches ifalloptionskeys
isn't found during load - Changes
add_option
anddelete_option
to add/remove keys fromalloptionskeys
- Changes
add_option
,update_option
anddelete_option
to use individual cache items for each option
This fixes #28701 in the process.
I'm not entirely sure on the _wp_cache_compat_get_multi
naming/usage pattern here, but doing it in a compatibility file requires us to either define the function inside wp_start_object_cache
or in a new file after wp_start_object_cache
but before default-filters.php
is loaded. Thoughts?
I'd like to get more testing in on this one too. I've tested locally with WP's default cache, and I have a patched version of Memcache ready to go too, but needs a bit more testing still.
This ticket was mentioned in Slack in #core by rmccue. View the logs.
10 years ago
#5
@
10 years ago
Also worth noting: the format of wp_cache_get_multi
conflicts with rboren's original memcached plugin, wonderboymusic's Memcached Redux as well as tollmanz's PECL Memcached object cache. These are all mutually incompatible as well, so that's fun.
I'm happy to change the format of the proposed wp_cache_get_multi
here if we want, but let's pick one and stick with it. :)
#6
@
10 years ago
I'd just like to note down the formats out there in the wild already.
Note: in the below, group
is a normal group name, like options
, id
is a cache key like alloptions
, and key
is an automatically generated key like $salt:$db_prefix:$group:$id
Original memcached
Input:
get_multi( array( $group => array( $id, $id2, ... ), $group2 => array( $id3 ), ... ) )
Output:
array( $key => $value, $key2 => $value, $key3 => $value )
Available as: $wp_object_cache->get_multi
PECL Memcache
Input:
get_multi( array( $id, $id2, $id3, ... ), array( $group, $group, $group2, ... ) )
Output:
array( $key => $value, $key2 => $value, $key3 => value )
Available as: $wp_object_cache->getMulti
and wp_cache_get_multi
Memcached Redux
Input:
get_multi( array( $id, $group ), array( $id2, $group ), array( $id3, $group2 ) )
Output:
array( $value, $value, $value )
(Note that the output order is identical to the input order, however it is a
list, not an associative array.)
Available as $wp_object_cache->get_multi
and wp_cache_get_multi
Anything that uses $key
is pretty useless for keying, as they're internal keys
and implementation changes depending on the cache. You'd also need to pull them
back out again. You can almost treat them as ordered to use it, but as far as I
can see, they don't guarantee this.
#7
@
10 years ago
Implementation for memcache: https://github.com/humanmade/memcache-object-cache/pull/2
This ticket was mentioned in Slack in #core by rmccue. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by drew. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by johnbillion. View the logs.
10 years ago
#11
@
10 years ago
- Keywords 4.3-early added
- Milestone changed from 4.2 to Future Release
This needs to go into a cycle early so we can coordinate with the existing caching plugins which implement get_multi()
, and to get some solid testing in lieu of unit tests.
#12
@
10 years ago
Suggestions for function name: wp_cache_get_mutiple
or wp_cache_get_batch
.
The thing that really bugs me about implementing this is the 'default' group that is virtually hidden in the cache API at this point because it's set by default when no group is supplied.
I think that return value should be something in line of $return[$group][$key] = $value
which allows for easy identification what data belongs where. Also missing keys would just be missing from the return array.
Though allowing for empty default group would result in 'default' as return group, so not a 1-on-1 expectation.
That said.
I am currently in the process of concepting a WP_Cache_Control handler on which Handlers can be registered. This would make it possible to have cache fallbacks/fallthrough and no longer have the need for wp_using_ext_object_cache
.
Note: this concept will remove the need for the _wp_cache_compat_get_multi
.
Will post a patch on #22661 in the near future.
Voting for putting this on the Cache API instead of Options, Meta APIs.
This ticket was mentioned in Slack in #core by johnbillion. View the logs.
9 years ago
#15
@
9 years ago
- Keywords changed from has-patch, needs-testing to has-patch needs-testing
@rmccue, no movement in 6 weeks. Do you think two more weeks will make the difference or should we move it to Future Release?
#16
@
9 years ago
- Keywords 4.4-early added
- Milestone changed from 4.3 to Future Release
As @johnbillion says, we need a full cycle for this to soak. Pains me to say it, but let's bump.
This ticket was mentioned in Slack in #core by helen. View the logs.
9 years ago
#18
@
9 years ago
I've been looking into whether we can adapt this for WordPress.com, and I have some concerns, mainly about the wp_cache_set()
loop.
The wp_cache_set()
loop would cause many set calls instead of one. Depending on traffic patterns that could significantly increase the load on the cache, especially on large multisite installs. Adding a wp_cache_set_multi()
function would fix that. Unfortunately the pecl memcache extension only supports single gets http://php.net/manual/en/memcache.set.php, so that's not a solution for us or anyone else who relies on it. (The memcached extension appears to support set multi, but internally it might just iterate over set, see http://php.net/manual/en/memcached.setmulti.php).
I'm looking into whether we can simplify things a bit and make it filterable, perhaps combined with some of the ideas in #23330 (specifically, making notoptions optional).
#19
@
9 years ago
I'm looking into this and with some help from @rmccue, I can reliably reproduce this locally. I've seen the results of this bug in production environments, but can now reproduce it locally. I want to document this here for when I forget and for others who venture into this place.
Prerequisites:
- Memcached installed and running
- PECL Memcache installed (https://pecl.php.net/package/memcache)
- Memcache object cache installed (https://wordpress.org/plugins/memcached/)
- Using WP CLI, insert an autoloading option
wp option add this-damn-bug drives-me-nuts
- Open two terminal windows and use WP CLI to invoke a WP shell session in both
wp shell
- In the first window, update the option:
update_option( 'this-damn-bug', 'is-fixed' );
- In the second window, add another, unrelated option:
update_option( 'this-bug-sucks', 'really-hard' );
- Close both WP shell sessions.
- Get the value of the original option:
wp option get this-damn-bug # Returns "drives-me-nuts" wp cache get alloptions options | grep this-damn-bug # Returns "drives-me-nuts" wp dq query mysql> SELECT option_value FROM wp_options WHERE option_name = 'this-damn-bug' LIMIT 1; # Returns "is-fixed"
The result is that the object cache is unchanged, yet the DB value is changed.
#20
@
9 years ago
Note: this is possible to trigger in any context, not just wp-cli. wp-cli makes it easy to test though, since it has long-running sessions: the second "request" (wp shell
here) sets alloptions
back to whatever it was when the process started.
#21
@
9 years ago
Argh! I guess this isn't going to be fixed in the 4.4? We're using the Redis object cache (just to add a little flavor) and this bug is causing white-screen-of-death on sites that it affects until the cache is cleared (which is a huge hit to our server as it stores multiple sites but you can only clear them en-masse).
Alternate idea
Currently when you change an autoloaded option it updates the option cache with old-alloptions+your-change, and the problem is old-alloptions might already contain out of date values. So here's my question:
Why not just delete the alloptions cache completely, rather than updating it. It's described in this ancient+closed forum thread:
https://wordpress.org/support/topic/caching-of-options?replies=2
It wouldn't be as fast as the current way, but would avoid the race condition (since subsequent pageloads would get a fully fresh copy). The downside is that alloptions needs to get totally reloaded, but loading it in is fast anyway right?
AFAICT the problem is that a system which fetched or saved options to the cache individually would be much much slower, but the system for getting/setting them together is fast. This implies that we should still get decent performance letting the whole cache be reloaded at once even if it happens more than we'd like.
Dumping the whole cache when a field is updated is simple and seems like it would solve the problem without any concerns about compatibility with the "multi" options of other plugins. This is especially important once you factor in non-mecached servers, since e.g. Redis has very different support for bulk handling of keys.
This ticket was mentioned in Slack in #core by joehoyle. View the logs.
9 years ago
This ticket was mentioned in Slack in #core-restapi by rmccue. View the logs.
9 years ago
#25
follow-up:
↓ 37
@
9 years ago
The more I think about this, the more I think this won't be suitable for core, as it's going to be a lot slower in situations where the object cache can't handle a get/set multi (as pointed out by @tellyworth).
The approach of allowing the cache drop-in to support it's own alloptions gettings / setting would be better. Right now, this takes a crazy amount of hacking to do.
#26
follow-up:
↓ 67
@
9 years ago
@joehoyle Don't want to give up on this ticket but willing to rename it/migrate it elsewhere since I agree with you that the current title "Replace alloptions with a key cache" isn't the right approach.
Joe what do you think of solving this by clearing the alloptions
cache at the right times (triggering a reload from the db) rather than micro-managing the fields that were just changed?
In our testing it has proved very effective at avoiding the race condition entirely. When a process saves an autoloaded option we run wp_cache_delete('alloptions', 'options');
and the next time an option is fetched WP does the single query to reload them (including any changes made by other processes).
Specifically this solution solves the reproduction steps laid out by @tollmanz (Thanks so much for that BTW, made our testing much easier!).
Here is a filter that solves the problem for us and relies only on what's already in core:
/** * Deletes the alloptions cache value whenever an option is modified. * * This code triggers a cache delete even when non-autoloaded options are saved because * we don't have easy access to check if the option is autoload=1 or not * * Ideally this should be done in core itself, only when alloptions needs to be updated. */ function gv_redis_object_cache_delete_alloptions() { wp_cache_delete('alloptions', 'options'); } add_action('added_option', 'gv_redis_object_cache_delete_alloptions'); add_action('deleted_option', 'gv_redis_object_cache_delete_alloptions'); add_action('updated_option', 'gv_redis_object_cache_delete_alloptions');
For anyone suffering from the WSOD bug like we were this is probably a worthwhile solution, but as stated in the phpdoc, it's not at all ideal and will result in resetting the alloptions
cache more often than necessary.
Core would be in a MUCH better position to solve this directly while handling the alloptions
cache and knowing whether the option being saved is autoloaded or not.
It can also be fixed at the object-cache.php
level, which is the solution we're currently using. By adding a filter on the SET command in our Redis object cache we can run wp_cache_delete('alloptions', 'options');
only when the actual alloptions
cache is updated. This seems insane/ironic but works very well to fix this problem and seems unlikely to cause any serious performance problems.
Quick note on stupid plugins
FWIW for anyone experiencing problems related to alloptions
I recommend doing an audit of the ongoing actions taken by your object cache. I added logging to our object-cache.php
and found several plugin features (some by me, some from repo) that were triggering option changes on every pageload for foolish reasons. Cleaning up these noisy/constant updates will probably make a huge difference to the incidence of {{[alloptions}}} bugs and obviously reduce the overall load on your system.
This ticket was mentioned in Slack in #core by mikengarrett. View the logs.
9 years ago
This ticket was mentioned in Slack in #core by jeremyfelt. View the logs.
9 years ago
#29
@
9 years ago
- Keywords early added; 4.4-early removed
Moving forward here, I think it's going to be fairly tough getting a viable multi_get()
cache implementation in place for the majority of cache-enabled sites without significant detriment to the rest of the sites, as @tellyworth points out.
what do you think of solving this by clearing the alloptions cache at the right times
@jeremyclarke Simply deleting the cache bucket isn't entirely viable for some users, as the next page load has the potential to cause a large DB load as many simultaneous requests hit a cold cache. However, that could easily be worked with with the next suggestion:
The approach of allowing the cache drop-in to support it's own alloptions gettings / setting would be better. Right now, this takes a crazy amount of hacking to do.
This seems like a viable option to me, and I'm curious what it'd actually look like @joehoyle - @tollmanz @rmccue what do you think?
#30
@
8 years ago
- Keywords early removed
- Milestone changed from Future Release to 4.6
I am going to tackle this, and we are going to solve it this cycle.
This ticket was mentioned in Slack in #core-multisite by rmccue. View the logs.
8 years ago
#32
@
8 years ago
There is an old ticket for introducing multi-get to the cache API: #20875. I noted in https://core.trac.wordpress.org/ticket/20875#comment:19 that we have use cases for multi-get outside of the 'alloptions' fiasco. So I'd like to suggest that we pursue multi-get support independent of whether and how it's used in the current ticket.
#33
@
8 years ago
@boonebgorges Along the lines of your "I would hate to see the Cache API improvement held up because we can't come to a consensus about alloptions
" comment, @joehoyle and I have been discussing this ticket, and we want to try and push forward on it without the multi-get support. Right now, the ticket is held up because we don't have consensus on multi-get.
I believe @joehoyle was planning on working on a patch for this that doesn't require multi-get support. I'll let him weigh in on that.
(n.b. He is currently in the middle of the Atlantic, so responses may be delayed.)
This ticket was mentioned in Slack in #core by voldemortensen. View the logs.
8 years ago
#35
@
8 years ago
- Milestone changed from 4.6 to Future Release
Punting. If a working patch is produced so this can be brought back into the milestone.
This ticket was mentioned in Slack in #core by danielbachhuber. View the logs.
8 years ago
#37
in reply to:
↑ 25
;
follow-up:
↓ 39
@
8 years ago
Replying to joehoyle:
The more I think about this, the more I think this won't be suitable for core, as it's going to be a lot slower in situations where the object cache can't handle a get/set multi (as pointed out by @tellyworth).
The approach of allowing the cache drop-in to support it's own alloptions gettings / setting would be better. Right now, this takes a crazy amount of hacking to do.
@joehoyle Did you ever get to a working patch for this? WP LCache exhibits the race condition issue more severely than most drop-ins, so I need to start working on an alternative.
At this point, I'm thinking the alternative alloptions implementation could be a library bundled by the object cache drop-ins that want to support it.
#38
@
8 years ago
Not to hi-jack, but for medium-sized multisite installls, I started using this approach:
https://github.com/stuttter/wp-blog-meta
It has the same scalability concerns as any global table does, but allowed me to circumvent this issue as options can be short-circuited and rerouted ontop of the meta-data API & caching layer.
#39
in reply to:
↑ 37
@
8 years ago
Replying to danielbachhuber:
At this point, I'm thinking the alternative alloptions implementation could be a library bundled by the object cache drop-ins that want to support it.
Honestly, I think that's the correct answer here in the end, although I don't think you should library it, it should just be a standard part of the object cache.
The object cache has freedom to route get/set for alloptions
however it wants, it can route it as a standard request, or it can route it through a multiget.
If core adds something in the future that's better, object caches can implement that, or their existing method may continue to work.
#40
@
8 years ago
We changed our object cache implementations to hijack alloptions
(Memcache and Memcached).
The problem with this approach is that it's pretty terrible handling without WordPress helping out more. The object cache receives an opaque heap of data that then needs to be parsed back out, diffed, and saved correctly.
There needs to be changes in WordPress core for this. I can't remember the exact approach we discussed, maybe @joehoyle can.
#41
@
8 years ago
Thanks @rmccue, those examples were quite helpful.
For the next person to run into this, here's WP LCache's implementation (with tests). I didn't find splitting alloptions
to be horrible, but having to keep record of the alloptions
keys in a separate cache does seem a bit dodgy.
So it's mentioned, WP LCache doesn't absolutely need to have a multi get implementation because most cache gets will be hitting its L1 cache (APCu). Other drop-ins will need to implement multi get as well to mitigate the performance impact of introducing hundreds of new cache keys.
This ticket was mentioned in Slack in #core-multisite by jjj. View the logs.
8 years ago
#43
@
8 years ago
For multisite folks who stumble in here, consider #25344/#37181 and #37923.
- Using the meta-data API for network options
- Using the meta-data API for a shared
wp_blogmeta
global table, to store meta-data for sites that may be useful to query against (names, descriptions, post/user counts, last activity, etc...)
/tangent
This ticket was mentioned in Slack in #core by sergey. View the logs.
8 years ago
#45
@
8 years ago
Another really out-there idea:
- A
WP_Option
class - A
WP_Option_Query
class for querying options - Introduce a
wp_optionmeta
database table - key/value storage for every option - Use the metadata API for the above (inherit caching, etc...)
Treat options like post objects:
- Query for them
- Cache them (locally & persistently)
- Query for their option-meta as needed
- Abandon the
alloptions
logic entirely - Let the cache API sort it all out
¯\_(ツ)_/¯
Then core could:
- Transition some keys from
wp_options
into a newWP_Option
object with related meta - Route requests for converted keys from the old option name to the new option name
- Have a legitimate
register_option()
API to begin to replace the ailing Settings API
Conversely:
- Abandon
wp_options
completely - Dream up a completely new and adequate object & schema for today's needs
- Consider the same object & meta pairing as above
(Edit: typos & grammar)
This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.
7 years ago
#47
@
7 years ago
31245.2.diff is my first-pass patch at using option_name
as the cache-key. It includes the following changes to accomplish this:
- A
WP_Option
class, with variables to match database columns, and magic methods to help with routing variable names around (I found these personally useful during the research/discovery/invention/transition phases) - A
WP_Option_Query
class, used to interact with thewp_options
database table in a way that is more like the query classes we are using/building today, vs. the direct bespoke MySQL statements we've relied on forever previously - A few common helper functions for interacting with objects in key caches (
wp_clean_option_cache()
, etc...) - Usage of a new
notoptions
cache group, for queries towp_options
for options that do not exist there.
Notables:
- A product of all of this is the removal of the need to call
wp_load_alloptions()
on every call toget_option()
. Some changes were made to unit tests to accommodate for this by calling it directly inside those offending tests. It's likely not ideal to do this in our tests, but the significant performance boost seems worth it to leave in for my first pass. - All other tests pass for me except
test_wp_get_attachment_image_should_use_wp_get_attachment_metadata
which somehow seems unrelated - I made other code changes to these and related functions, but I did try to be conservative and stick only to lines of code that were directly effected or near lines that were.
- I've probably spent close to 100 hours massaging this patch, and I ran into many significant snags along the way to get this to work in a backwards compatible way.
- One design compromise was to mock a
WP_Option
object to save in a newnotoptions
cache group. It's /easier/ (and likely makes more sense) to just save theoption_value
, but doing so increases the complexity of the logic in PHP to a degree that doesn't seem to actually make anything easier or better for ourselves. - The lack of continuity of our
pre_
filters in these functions makes it completely impossible to improve thealloptions/notoptions
cache problem without extremely elaborate filters onquery
and so on. Normally I'd prefer to make a plugin as a proof-of-concept, but that is unfortunately impossible here as this API is coded today (and my patch does not make any effort to improve that, though it isn't any more difficult either.) - Please be open and critical, and scrutinize this deeply. I'm too close to the concept now to make many significant improvements to this patch alone, but I'm convinced that something similar to this approach is the ideal solution today (beyond schema changes,
wp_optionmeta
, etc...) - The way that
options.php
is included at the top offunctions.php
is weird to me, but I rolled with it here.
Thank you in advance for taking any time to review my patch. 💜
#48
@
7 years ago
I really like this proposed change. I'm not sure about the usage of metadata for options, but this can be discussed separately. Having distinct classes for options and a better way to query and cache them is a major improvement, not only for the obvious issues, but also for allowing developers to query options more flexibly for all the edge-cases where they currently make direct DB requests. :)
I did a first review on 31245.2.diff and have the following comments:
- Why does line 236 of
class-wp-option-query.php
check whether multisite is disabled? Not saying this is wrong, but I'd like to know what the background for doing that is. - The query vars for querying by option ID should either be called simply
option
,option__in
andoption__not_in
(or alternativelyid
,id__in
andid__not_in
). This ensures they align with similar names from other Core query classes. I think this change is especially important since theoption_id
column is not the "actual" property inWP_Option
(it is justid
which is good!), so I don't think that name should be used anywhere. - I'd prefer for
WP_Option::get_instance()
to have a different name. Other Core object classes already contain aget_instance()
method that accepts an ID and returns the instance for that ID.WP_Option::get_instance()
is more flexible though, as it behaves more like functions such asget_post()
orget_site()
. Therefore I think we should give it a different name, so that we would theoretically be able to add a method with similar flexibility to the other Core object classes that has the same name. I'm thinking about simply usingWP_Option::get()
, just to be future-proof (think about a possibleWP_Post::get()
as replacement forget_post()
, orWP_Site::get()
as replacement forget_site()
). - I don't think we need to support an array being passed to the
WP_Option
constructor. No other Core object class supports this, and supporting that only adds a bit of unnecessary complexity to the logic. - I like the idea of having properties with more sane names in
WP_Option
, however it appears there are some issues with setting them. The constructor simply sets theget_object_vars()
keys from the database result, which would not be the properties that the class declares. We need to make sure the correct properties are set by mapping$option_id
to$id
,$option_name
to$name
and$option_value
to$value
. - I'm not sure we should even support the original database column names through magic properties. There are no backward-compatibility concerns as this is new. :) Especially the magic
ID
property should not be supported. It has never been associated with options, and at some point in the Core chat it was decided to go withid
as property name for an ID wherever possible. Due to those reasons, I think no magic methods should be present in the class. - Currently full
WP_Option
objects are stored in cache. I think it's fine to store the full objects, but it should be plainstdClass
objects, as storing the entire class instances can cause unexpected problems. Some logic could be addedwp_update_option_cache()
to ensure that.
#49
@
7 years ago
My feedback
- Remove check if is_multisite() in WP_Option_Query
- wp_option:: get_instance should work like get_instance methods in core using id. Another method called get_by_name should be added.
- wp_option:: get_instance $use_cache could be removed and a filter of 'option_use_cache'. When installing
add_filter( 'option_use_cache', '__return_true');
- Set no_found_rows to false but default. Pagation will not be needed in most use cases.
if ( ( 'home' === $option ) && empty( $value ) ) {
should go backif ( 'home' == $option && '' == $value )
. There might be good reason to have a false value in database.- populate_options should use WP_option_query and add_option.
- It is a shame that we have to store the whole object in cache. It would be better to just store the option_ids and use something like wp_cache_get_multi.
#50
@
7 years ago
In 31245.3.diff I try something a little different. I really like the changed in 31245.2.diff however, I don't think that is fixes the race condition, where autoloaded options revert. In my patch just deleted the alloptions cache value after add/update/delete options. It treats alloptions cache as poisoned on update and deletes it. I believe this is the best and most simple solution to this error. It is also how the meta api works. https://github.com/WordPress/WordPress/blob/master/wp-includes/meta.php#L107
Delete the alloption cache brings some problems. It does mean that next time you interact with options, it will call load_all_options again, resulting in another query. This added query broke a lot of the tests, so I had to remove tests that do query counting. Query counting as a test feel like a bad way to test.
As for WP_Option and WP_Option_Query, this is great work and should be broken into another ticket.
#51
@
7 years ago
I am personally strongly -1 on introducing WP_Option
. Options are properties of sites, not fully-fledged objects. I am likewise strongly -1 on introducing wp_optionmeta
- to me, it'd be similar to introducing a wp_postmetameta
.
I'm ambivalent towards introducing a WP_Option_Query
; I'm not sure options really need the full complexity that comes with a query class. That said, I think there is some utility in it, so I'm not against it.
I am very strongly against treating options as objects however. Similar to how the meta tables are key-value stores for their parent objects (i.e. postmeta is a key-value store for the parent post), options are a key-value store for the parent site. The only real additional property that options have in the database (autoload) is a concession for performance, not a real additional properties. (The ID property has been functionally useless since 1.5, and apart from upgrade routines for v1.5 and v2.9, is never used.)
#52
@
7 years ago
(On the ID point: IDs were effectively deprecated in #2699 (2.9), and were only kept around for plugin compatibility, so options has been enforced as a key-value store since then.)
#53
@
7 years ago
My recommendation is the following.
We use my patch 31245.3.diff , as it fixes the race condition documented at above. It is a tidy solution as it doesn't require the introduction of get_cache_multi and possible breaks. I would like @rmccue & @joehoyle to take a look at my patch, as I know it is issue they have looked and fixed outside of core.
As for WP_Option_Query and WP_Option, I like standardising interfaces in core. WP_Option_Query feels like a good fit for the api, as there is value in search for options by name. WP_Option_Query has some nice filters and should replace the internals of wp_load_alloptions. For that reason, WP_Option_Query work should be moved to #33958 .
#54
@
7 years ago
My concern with 31245.3.diff is the performance hit. Our redis object cache at Pagely works around the race condition already, this gets rid of my ability to keep our performant implementation.
Object cache authors can already add this solution to the race if they don't' want something fancier.
This ticket was mentioned in Slack in #core by swissspidy. View the logs.
7 years ago
#56
@
7 years ago
- Why not adding a
wp_cache_get_group
to retrieve all values from a group? This would allow the underlying cache storage to exploit the relational model within group data, to possibly speed up a multi fetch. There is currently no cache implementation I know that would profit. But think about a cache implementation were fetching two adjacent values in a single group is cheaper than fetching two values from two different groups.
- Like @spacedmonkey said, introducing a multi-fetch here is not the proper way to fix the race condition. I think there is a much easier fix, see fabian-race-fix.diff . The additional condition
wp_using_ext_object_cache()
is necessary because the core implementation of the object cache is actually wrong.wp_cache_get
should return "not found" whenever$force
is true on a non-persistent cache.
#57
follow-up:
↓ 58
@
7 years ago
For the record this is the work around used on WordPress VIP.
/**
* Fix a race condition in alloptions caching
*
* See https://core.trac.wordpress.org/ticket/31245
*/
function _wpcom_vip_maybe_clear_alloptions_cache( $option ) {
if ( ! wp_installing() ) {
$alloptions = wp_load_alloptions(); //alloptions should be cached at this point
if ( isset( $alloptions[ $option ] ) ) { //only if option is among alloptions
wp_cache_delete( 'alloptions', 'options' );
}
}
}
add_action( 'added_option', '_wpcom_vip_maybe_clear_alloptions_cache' );
add_action( 'updated_option', '_wpcom_vip_maybe_clear_alloptions_cache' );
add_action( 'deleted_option', '_wpcom_vip_maybe_clear_alloptions_cache' );
This is the same as patch 31245.3.diff. Which means this patch has been tested as scale...
#58
in reply to:
↑ 57
@
7 years ago
Replying to spacedmonkey:
For the record this is the work around used on WordPress VIP.
...
This is the same as patch 31245.3.diff. Which means this patch has been tested as scale...
Yup, we've been using essentially this code (delete alloptions when updating options and let it reaload) for the past two years since I posted above and never had problems with it.
I don't have time to read all the complex ideas that were posted in between, but this simple one sure works fine.
#59
@
7 years ago
- Severity changed from normal to major
- Type changed from enhancement to defect (bug)
- Version set to 4.9
Faced with this bug every day. It really interferes with normal work of the editorial Board of our newspaper. What you need to do to have this error corrected? Patch from user @spacedmonkey, while solving the problem. Made it into a separate plugin (https://github.com/mihdan/mihdan-ticket-31245-patch).
#62
@
7 years ago
- Severity changed from major to normal
- Type changed from defect (bug) to enhancement
Resetting ticket attributes to keep it in the correct trac reports. Changing them won't help get it fixed faster.
#63
follow-up:
↓ 64
@
7 years ago
@dd32: I have to agree with @mihdan, it's a bug, not an enhancement. It's major, not normal.
#64
in reply to:
↑ 63
@
7 years ago
- Version changed from 4.9 to 2.1
Replying to tillkruess:
The issue being run into is a bug (one which I also run into as well), however the fix proposed here is an enhancement and is why it was filed as such.
Setting version to 2.1 as that's when we introduced the alloptions functionality being referenced here, the version field should reference the earliest version of WordPress something applies to.
#67
in reply to:
↑ 26
;
follow-up:
↓ 68
@
6 years ago
Replying to jeremyclarke:
Quick note on stupid plugins
FWIW for anyone experiencing problems related to
alloptions
I recommend doing an audit of the ongoing actions taken by your object cache. I added logging to ourobject-cache.php
and found several plugin features (some by me, some from repo) that were triggering option changes on every pageload for foolish reasons. Cleaning up these noisy/constant updates will probably make a huge difference to the incidence of {{[alloptions}}} bugs and obviously reduce the overall load on your system.
Heya @jeremyclarke are you able to show what you did to get better insight into what plugins are triggering option updates on every page load?
#68
in reply to:
↑ 67
@
6 years ago
Replying to lkraav:
Heya @jeremyclarke are you able to show what you did to get better insight into what plugins are triggering option updates on every page load?
In my case, I had an object caching plugin running, which meant that options were going straight there, and not to the DB, so I couldn't use normal DB profiling to figure it out.
If you aren't using an object cache, I recommend Query Monitor:
https://en-ca.wordpress.org/plugins/query-monitor/
Easy to use and super effective at showing you what is taking time, causing queries, and eating up resources on your site. If you disable object cache, you should see anything updating options because it will cause DB queries that will show in the plugin.
p.s. PLZ tag @jerclarke, unfortunately there's no way to rename my old wordpress.org account.
#70
@
5 years ago
Anyone still face this issue in 2019? :)
I know some web hosts definitely do. (I don't want to call them to feel called out, so I won't list them here.)
#71
@
5 years ago
@dd32 What is it gonna take to get this "enhancement" into core? Clearly everyone knows the issue still exists and is just ignoring it, the fix is well tested at scale, could have been merged into any given point release or major version in the past few years.
WP has been super focused on getting major performance improvements, a working object cache can definitely help there.
Just wondering why everyone agrees it needs addressing, have a fix, yet like so many other tickets the fix never gets applied. Works done, lets apply it, close this and move on.
#72
@
5 years ago
everyone agrees it needs addressing
Not everyone agrees on what the correct fix is.
Agreement isn’t necessarily a prerequisite for committing, but it doesn’t hurt, and I’d lobby for some version of it here considering how important this issue is.
#73
@
5 years ago
@dd32 @johnjamesjacoby I have a patch 31245.3.diff which I haven't really got any feedback on. Can you review and give me usable feedback and I will work on a updated patch.
For what is worth, my patch has been running on WordPress.com VIP platform for years and I have run this patch for years in my projects with no issue.
#75
in reply to:
↑ 74
;
follow-up:
↓ 97
@
5 years ago
Replying to johnbillion:
Is it also in place on VIP Go?
Yes it is - https://github.com/Automattic/vip-go-mu-plugins/blob/master/misc.php#L73-L85
#76
@
5 years ago
- Owner changed from rmccue to SergeyBiryukov
- Status changed from assigned to reviewing
#77
@
5 years ago
@johnjamesjacoby Sorry, I thought everyone had seen @spacedmonkey solution above (Comment 57 above), that was what I was referring to, it was tested at scale, has multiple users confirming it works etc. It is the accepted solution in all rights, or should be.
Just my 2 cents.
This ticket was mentioned in Slack in #core by sergey. View the logs.
5 years ago
#81
follow-ups:
↓ 82
↓ 99
@
5 years ago
I like the simple direction that 31245.3.diff has gone in, but I have two main concerns: In a multi-database-server environment or where there's heavy load/lots of requests/second that this just swaps one cache race issue for another.
This "solves" the cache race issue by having the next option get/page load pulling the data directly from the database, but in doing so introduces two cache race issues:
- In a heavy-option-write situation (Where the original cache race issue is most present) If an option write occurs in another page request between the time the DB Read and Cache set there's a "small" (Depending upon server speed/load) margin of time where the same race issue can occur.
- In a multi-database environment, #1 is more pronounced if the second thread setting the cache is reading from a secondary database where there's a likelyhood that the replication time isn't effectively instant. For example, take a site that's receiving 100 req/s: 1 thread updates an option, 99 threads immediately query their secondary database that's out of date by a millisecond and cache the old data.
The second case above can be worked around by having alloptions queries being made against the primary DB server, although that might not be ideal for load purposes, but would an easy way to avoid it.
The first case is arguably better than the current situation though, and might fix it for some sites.
I believe this is the best and most simple solution to this error. It is also how the meta api works
Those cases are far more likely to happen with options than meta, as options often have a much heavier write load due to the ways that options have been used - In a way, that options/alloptions were never designed for.
The benefit of shifting to multiple cache keys / multi_get() isn't maybe as obvious at first, but it works around the above in a fairly performant manner as a) race issues are not an issue (Multiple cache keys can update independently), b) there's no sudden DB load increase when clearing the cache (Not that this is much of an issue with modern servers) and c)avoids any per-key limits on the object cache (For example, the 1MB Memcache value limit that used to be hit on WordPress.com so often that I think it still has logic to split some cache values over multiple cache keys)
Personally, I like how this issue has been fixed in existing Object caches, for example, Human Made's Memcache object-cache.php has specific logic for internally converting the core alloptions cache to individual cache keys using it's get_multi implementation. It solves a lot of the problems that this ticket has, by just fixing the bug at the object-cache layer, so that core doesn't have to worry about how every different object cache differs.
(I guess that's also another concern of mine - This change breaking an existing cache's implementation of alloptions - 31245.3.diff won't break the above implementation, but it'll make it less performant)
Perhaps an alternate options here is to specifically offload the alloptions caching to the object cache? Fix the Core API to do something like if ( is_callable( $wp_object_cache, 'get_alloptions' ) ) { return $wp_object_cache->get_alloptions(); } else { /* existing code */ }
#82
in reply to:
↑ 81
;
follow-up:
↓ 83
@
5 years ago
Replying to dd32:
(I guess that's also another concern of mine - This change breaking an existing cache's implementation of alloptions - 31245.3.diff won't break the above implementation, but it'll make it less performant)
One option that comes to mind is implementing 31245.3.diff not in the hardcoded way it is in the patch, but instead hooked to added_option
/updated_option
/deleted_option
, as demonstrated in comment:57. That way existing object caches could choose whether to go with core's wp_cache_delete( 'alloptions', 'options' )
or unhook it and keep their own implementation for better performance.
I'm also curious about fabian-race-fix.diff, do you have an opinion on that patch?
#83
in reply to:
↑ 82
@
5 years ago
Replying to SergeyBiryukov:
One option that comes to mind is implementing 31245.3.diff not in the hardcoded way it is in the patch, but instead hooked to
added_option
/updated_option
/deleted_option
, as demonstrated in comment:57. That way existing object caches could choose whether to go with core'swp_cache_delete( 'alloptions', 'options' )
or unhook it and keep their own implementation for better performance.
That would be a cleaner way of implementing it, and wouldn't be much different to the current implementation since the actions run immediately after the cache is updated.
I guess the difference is that this would be relegating it to object caches as an optimization they could add, vs adding it a a "first class new feature" that the object caches would be expected to implement.
I'm also curious about fabian-race-fix.diff, do you have an opinion on that patch?
That'll also fix some variants of the issue, especially with long-running processes or where options are being used as a temporary data store in response to "expensive" actions (ie. external HTTP calls, etc). In an environment where there's heavy option writing clashes would still be possible, but much minimised unless they happened at the same few milliseconds as mentioned earlier.
It'd also remove the possibility of a sudden DB read influx of the other approach.
This seems like the safest option here that will have an impact while also reducing the side effects. Probably won't fix it entirely, but will fix it for a not insignificant number of use-cases.
This ticket was mentioned in Slack in #core by sergey. View the logs.
5 years ago
#87
follow-up:
↓ 88
@
5 years ago
What are the blockers for getting attachment:31245.3.diff into trunk?
I'm not thrilled that it removes tests.
#88
in reply to:
↑ 87
@
5 years ago
Replying to tellyworth:
What are the blockers for getting attachment:31245.3.diff into trunk?
I'm not thrilled that it removes tests.
Per discussion starting with comment:82, 31245.4.diff would be a cleaner way of implementing it, allowing existing object caches to choose whether to go with core's wp_cache_delete( 'alloptions', 'options' )
or unhook it and keep their own implementation for better performance.
However, I'm leaning towards 31245.4.fabian-race-fix.diff (a refresh of fabian-race-fix.diff). Per comment:83, it appears to be the safest option here that will have an impact while also reducing the side effects. It also doesn't need to remove any tests, the full suite seems to pass as is with the patch applied.
#89
follow-up:
↓ 90
@
5 years ago
Are the two patches intended to be used together, or are they mutually exclusive?
Also, it looks like the race-fix patches probably have no effect on a base install, since the $force
parameter to WP_Object_Cache::get()
is unused. Is that intended?
#90
in reply to:
↑ 89
;
follow-up:
↓ 91
@
5 years ago
Replying to tellyworth:
Are the two patches intended to be used together, or are they mutually exclusive?
Not mutually exclusive, but I think 31245.4.diff might no longer be necessary if 31245.5.fabian-race-fix.diff manages to fix most manifestations of the issue without the side effects of a sudden DB load increase after clearing the cache, or reducing performance of existing caches' implementations, per comment:81 and comment:83.
Also, it looks like the race-fix patches probably have no effect on a base install, since the
$force
parameter toWP_Object_Cache::get()
is unused. Is that intended?
Yes, in my testing using the steps from comment:19 (with Redis instead of Memcached), the base install appears to be unaffected, the issue only occurs when using persistent cache, which does handle the $force
parameter.
The testing also revealed that $force_cache
should be set not only in update_option()
, but also in add_option()
and delete_option()
. This was missed in the previous refresh, and is now done in 31245.5.fabian-race-fix.diff.
#91
in reply to:
↑ 90
@
5 years ago
Replying to SergeyBiryukov:
Yes, in my testing using the steps from comment:19 (with Redis instead of Memcached), the base install appears to be unaffected, the issue only occurs when using persistent cache, which does handle the
$force
parameter.
Just noting that this is similar to what _get_cron_lock() is doing to retrieve the cron lock.
That appears to be the only other usage of wp_cache_get()
's $force
parameter in core so far, introduced in [18659]. It's a bit different in that it also performs a DB query if not wp_using_ext_object_cache()
. Looks like that's not necessary here.
#93
@
5 years ago
- Keywords fixed-major added
- Milestone changed from 5.4 to 5.3.1
Moving for 5.3.1 consideration.
#97
in reply to:
↑ 75
@
5 years ago
Replying to spacedmonkey:
Replying to johnbillion:
Is it also in place on VIP Go?
Yes it is - https://github.com/Automattic/vip-go-mu-plugins/blob/master/misc.php#L73-L85
Now that there is a fix in place in WP 5.3.1, is it important to remove the previous fix for the alloptions race condition?
#98
@
5 years ago
- Keywords needs-dev-note added; fixed-major removed
This should receive a dev note.
#99
in reply to:
↑ 81
@
4 years ago
After running this fix in production for a few months now, I want to document that the alloptions race condition still exists in some circumstances.
Our site runs https://wordpress.org/plugins/redis-cache/ from Till Kruss. Our site also uses HyperDB with a write master and a read slave. We have a healthy amount of traffic, typically we have ~50 reqs/sec or so, but that number is highly variable.
At periodic and unpredictable times, the server load on our Redis server will jump from 20% to 100%, rendering the site inoperable as Redis times out repeatedly. The logs show hundreds of events doing calls to set the alloptions cache, which in turn tries to save it in Redis. Clearly there is still a race condition possible here.
wp_cache_set('alloptions', Array, 'options') WP_Object_Cache->set('alloptions', Array, 'options', 43200) Redis->setex('1:options:allop...', 43200, 'a:568:{s:7:"sit...')
In the case of our site, the calls to writing alloptions to the cache come from a few different plugins. Jetpack's sync function does it. So does https://wordpress.org/plugins/safe-redirect-manager/ and https://wordpress.org/plugins/the-events-calendar/.
I am curious from @spacedmonkey what the experience has been on VIP Go with this new functionality introduced in 5.3.1 and whether the previous patch remains in place in production.
Also @SergeyBiryukov, @dd32, @fabifott note that this patch may have improved, but evidently did not fix the potential for alloptions race conditions when used with an object cache.
#100
@
4 years ago
note that this patch may have improved, but evidently did not fix the potential for alloptions race conditions when used with an object cache.
As I noted above, this was never going to fix it 100%. It was only ever going to reduce the instances of it, and hopefully limit those who experience it to the more complex scenario's.
With plugins like Jetpack and specifically Jetpack Sync which make use of a lot of options, when Primary/Secondary DB's come into play there's always going to be a risk of multiple requests attempting to fill the cache, and worse a request reading from the secondary DB before the replication has occurred. WordPress can't really work around all of these when Object Caches and Primary/Secondary DB interfaces are handled by plugins.
IMHO the only way that this can be reliably fixed in those environments is to have the Object Cache treat the alloptions
cache differently than the rest, like how LCache, Memcache, and Memcached do (All mentioned above in comments ~3 years ago). Something similar to those for the Redis implementation would probably solve a bunch of problems, and based on my understanding of Redis might even be easier than how it's implemented in Memcache (due to the different APIs).
#101
@
4 years ago
@lisota: The [Pro version](https://wprediscache.com) of Redis Object Cache is currently test driving using a Redis hash for the alloptions
keys to see how it performs and if all race conditions are resolved on high traffic sites. If you can't wait for it to land in the open source version, feel free to submit a PR.
Previously discussed on Slack: https://wordpress.slack.com/archives/core/p1423196524002507