The design of update_option etc makes race conditions likely to happen
|Reported by:||joehoyle||Owned by:|
After spending a few hours chasing a nasty race condition bug in production today, I thought it would be good to discuss the design of the $alloptions object caching and how it can lead to data loss very easily when using any persistent object caching.
- Thread A: Loads options from cache
- Thread B: Loads options from cache
- Thread A: Updates options to cache
- Thread B: Updates options to cache, which includes the "outdated" option loaded in step 2
This is not a new problem, and was discussed in some part in #17519.
With most persistent object caching plugins, this is not a problem limited to options, as they use an internal (in memory) cache which is used before going to the cache backend (such as memcached) - this means get / updating anything into the cache will not propagate between threads as each thread has potentially already populated it's internal cache tool of now stale data.
The "fix" for that scenario is to not use internal cache pools, so query memcached directly (aside: this is too slow so not really a solution).
However, options is a much worse case, because a single cache key alloptions is used. So updating a single option, pushes all (potentially stale) options back to the cache store. To draw a parallel with MySQL, this would be like the MySQL query in update_option pushing every option in memory back to the db.
As for a persistent object caching plugin fixing this internally - I don't see how it could with maintaining the use of an internal cache -- it's not so bad with updating a post_meta key for example, as the "race condition" scenario can only mess up that single post meta entry - with options, a race condition can lead to a lot data loss.
Having said that - what is the "solution" here? Presumably the options are cached as one large array for performance reasons so it's one cache get / set from memcached / apc etc to load everything in. I am not sure how much of an impact that really has - would having a single cache key for each option make things much worse?
Has anyone else come up against this much before? I know I have run into in a lot of api drriven stuff, as writes can often happen in concurrent running threads.
Change History (11)
2 years ago
- Keywords reporter-feedback added; dev-feedback removed
- Milestone changed from Awaiting Review to Future Release