WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 23 months ago

#33276 new defect (bug)

redundant wp_using_ext_object_cache check in get/set_transient

Reported by: shmulikk Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 2.8
Component: Cache API Keywords: close
Focuses: performance Cc:

Description

Hello, That's my first WP bug report.

I'm using external memcached plugin. I'm testing WP for stability and abusing the system while shutting down different components.

I've found a stability issue when using external object caching plugin, not specifically in the external plugin(s) but in WP core.

If you look at the following methods you see there's a check for whether an external object caching plugin exist and if there is - it short circuit to directly use it.

get_transient line 602: https://developer.wordpress.org/reference/functions/get_transient/ set_transient line 667: https://developer.wordpress.org/reference/functions/set_transient/

This check is redundant because if you look at the execution route of when such external plugin doesn't exist - it first try to utilize wp_cache_set/get as part of the get/set_option or wp_load_alloptions functions anyway.

Now on top of that being redundant it also causes stability and performance issues in case where the external plugin fails to work with the persistent cache for some reason (i.e. memcached is down). If we remove this short circuit / redundant check - we gain a fallback mechanism to the options table. So at least we get the original WP behavior instead of to a broken state.

Performance tests I've made on the application without these checks resulted with same performance and stability of the WP core even when external persistent cache is unhealthy. I couldn't see any regression in application functionality to either core or 3rd party plugins we're using.

Best, Shmuel.

Attachments (1)

option.php.diff (3.8 KB) - added by shmulikk 3 years ago.
git diff of suggested fix

Download all attachments as: .zip

Change History (15)

#1 @atomicjack
3 years ago

-removed-

Last edited 3 years ago by atomicjack (previous) (diff)

#2 in reply to: ↑ description ; follow-up: @johnbillion
3 years ago

  • Keywords 2nd-opinion added
  • Version changed from 4.2.3 to 2.8

Replying to atomicjack:

Submit a .patch/.diff file and unit tests.

Please be a little more polite when interacting with other people on Trac. This sort of comment is not very welcoming for shmulikk.

Replying to shmulikk:

@shmulikk Thanks for the report, and welcome to WordPress Trac.

Unfortunately there's currently no unit test coverage for persistent object caching. It's a difficult thing to test because it typically requires multiple page loads in order to verify its behaviour. That said, removing the wp_using_ext_object_cache() conditional check means that the $group argument used for transients changes from transient to options. This creates potential for cache pollution if a transient shares its key with an option, or vice versa.

There is potential to allow get_transient() to fall back to its non-persistent cache behaviour when there's no value for the transient in the object cache, but this would likely mean an extra database query each time this happens because the value won't be present in the $alloptions array.

If object cache fault tolerance is desirable it'll need to be a bit more robust. Suggestions welcome.

#3 @dimadin
3 years ago

It seems that you are proposing to make transients functions look like methods in helper class I have created. In other words, it's the same as when transients are stored in the database.

#4 @shmulikk
3 years ago

Hi, Thank you for the quick and detailed feedback.

Basically once an external cache plugin misbehave - the transients functionality breaks, causing any feature dependent on transients to fail. In my experience one core feature breaks - the wp_cron (https://codex.wordpress.org/Function_Reference/wp_cron) which results with cron runs with every request instead of once per interval (default once per 60 seconds). I think this solely is a good cause to have more fault tolerance transients. On top of that also couple other plugins we use are breaking too as they never manage to persist data between requests using transients.

Now there are two roads to take IMO:

  1. Assume any third party caching plugin must fallback to the DB for persistent caching.
  2. Make the transients functionality to behave as it usually does, ignoring any external caching plugin existance in it's logic.

If we remove this logic to look for external 3rd party caching plugins in the core, we gain:

  1. WP functionality is working as expected even while persistent caching doesn't.
  2. Removing short-circuiting, which in case the 3rd party works - behave just about the same. Except of the valid point by @johnbillion - that the cache group will shift from 'transient' to 'options'.

The thing is that my suggestion is not a new behavior. This fallback from 'transient' cache group to 'options' happens anyway in the current code set of WP core. My only suggestion is to always use it, even while 3rd party caching plugins are in place. Once we always use it - we enable all the stability / robustness we need, which is already there.

Regarding tests - I've created a while ago an open source framework to performance test WP. This is a subset of what I'm using to test the stability and performance of our own system, in addition to hugh set of cucumber (functional) tests we are running for our own customizations / plugins, which also go through core functionality.

Maybe I'm missing the point of this short-circuit, but I cannot understand why having it in the first place.

Regarding diffs - I'll attach later today, just so this ticket will be more detailed.

@shmulikk
3 years ago

git diff of suggested fix

#5 @shmulikk
3 years ago

How can I help removing/reducing concerns of this change?

I wouldn't want to maintain local option.php version on my own, I rather have this changed in the core WP or otherwise I need to look for other workarounds which will not block me from upgrading WP due to this local change...

If there's anything which can speed up and assist the review process, please let me know.

#6 @shmulikk
3 years ago

Hello, I'm unsure how to push this forward? Should I ping anyone, should I clarify anything further?

Any status update is welcome. Thanks.

#7 in reply to: ↑ 2 @shmulikk
3 years ago

@johnbillion I'm not getting any attention for few months now. Is there anything I can do to clarify the topic? This is a core bug IMO which needs to be fixed.

Replying to johnbillion:

Replying to atomicjack:

Submit a .patch/.diff file and unit tests.

Please be a little more polite when interacting with other people on Trac. This sort of comment is not very welcoming for shmulikk.

Replying to shmulikk:

@shmulikk Thanks for the report, and welcome to WordPress Trac.

Unfortunately there's currently no unit test coverage for persistent object caching. It's a difficult thing to test because it typically requires multiple page loads in order to verify its behaviour. That said, removing the wp_using_ext_object_cache() conditional check means that the $group argument used for transients changes from transient to options. This creates potential for cache pollution if a transient shares its key with an option, or vice versa.

There is potential to allow get_transient() to fall back to its non-persistent cache behaviour when there's no value for the transient in the object cache, but this would likely mean an extra database query each time this happens because the value won't be present in the $alloptions array.

If object cache fault tolerance is desirable it'll need to be a bit more robust. Suggestions welcome.

#8 @shmulikk
2 years ago

That's still an issue in core WP. Anyone? for the moment we are running with a local modification but I would prefer to fix it in WP.

#9 @dd32
2 years ago

For an explanation of why this logic exists:

When you're using an external cache, all options ultimately get stored in an alloptions cache, which on large sites will exceed the storage limits once transients are added (for example, Memcache is 1MB by default).

As transients are transient, all code that uses them should expect the transient to expire in as little as instantly.

So; Removing the object cache checks within get/set_transient() isn't going to be possible, however, fixing it to detect a cache failure (wp_cache_set() returning false perhaps?) is a potential enhancement we can look at here.

#10 follow-up: @shmulikk
23 months ago

Hi @dd32 Thanks for looking into it. For some reason I didn't not get any notification and I just happened to re-check the status here.

Baiscally you are right with claim that all code should deal with not being able to get transients. The problem is that once an external caching plugin is used - the transients will completely fail, while the options mechanism will gracefully fall back. This means that the fact that I'm adding a caching plugin might actually hurt me because it will never fall back to the options table, thus functionality and performance are affected because the original code that try to use transients will fall back to its own defaults or re-run of logic due to the missing data.

This is specially true to the cron issue I've mentioned above in one of my comments.

#11 in reply to: ↑ 10 @dd32
23 months ago

  • Keywords close added; 2nd-opinion removed

Baiscally you are right with claim that all code should deal with not being able to get transients. The problem is that once an external caching plugin is used - the transients will completely fail, while the options mechanism will gracefully fall back.

I understand the issue here, but I don't think this is something we can handle in Core gracefully in a manner that is appropriate for the vast majority of sites that actually rely upon object caches. If you're running an object cache on a large site, falling back to the database would be disastrous for a site in the event the object cache went unavailable.

#12 @lukecavanagh
23 months ago

On the large site example, it would make more sense just restarting Redis.

#13 @shmulikk
23 months ago

Even if all my claims are not accepted, I would still argue against this short circuit in the logic of the transients. I fail to see why this code is in there, as the exact same code will get executed in the corresponding Options functions.

Best,

#14 @dd32
23 months ago

The code is intentionally duplicated, as Transients are designed not to hit the database in the event that an object cache is in use (Whether that object cache is working correctly or not, is out of the hands of WordPress).

A better way to approach this, I think, would be for for a object cache dropin to be able to notify WordPress that it's unhealthy and not working correctly, possibly by filtering wp_using_ext_object_cache() somehow.

Note: See TracTickets for help on using tickets.