Make WordPress Core

Opened 6 months ago

Closed 3 months ago

Last modified 3 months ago

#63450 closed defect (bug) (wontfix)

Race condition for transient with no expiration and object cache active

Reported by: hugod's profile hugod Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Cache API Keywords: has-patch close
Focuses: Cc:

Description

When object cache is installed and is temporarily unavailable, transient values with no expiration can be obsolete.

Here is a scenario:

If some major feature depends on this transient (e.g. rewrite rules), it can lead to weird behaviors that are hard to debug.
For the record, we encountered such a bug in Polylang ([see]https://github.com/polylang/polylang/pull/1653).

Change History (10)

This ticket was mentioned in PR #8826 on WordPress/wordpress-develop by @hugod.


6 months ago
#1

  • Keywords has-patch added

As per the title, force deletion or creation of the transient in wp_options table when object cache is active.
Preventing race condition between object cache and wp_options table for a transient without expiration.

Trac ticket: https://core.trac.wordpress.org/ticket/63450

#2 @audrasjb
6 months ago

  • Milestone changed from Awaiting Review to 6.9

Thanks for the report and patch, moving this for 6.9 consideration.

#3 @peterwilsoncc
3 months ago

  • Keywords close added

I'm strongly inclined to close this as wontfix. There's a few reasons so I apologise if this becomes long winded.

The linked pull request modifies the transients to store them in both the options table and the object cache.

For sites with a persistent object cache, this will result in the transients being cached twice: once in the transient and once in the options group. For non-expiring transients the option will be stored in the alloptions cache. This will have an adverse affect on the performance on these sites as the double storage will lead to increased evictions and potential issues caused by the size of the alloptions key.

Toggling the object cache on and off in the manner described isn't a good idea and will lead to problems. If the caching server is flaky, I consider that a hosting error that needs to be resolved, no different to a database server being flaky.

If some major feature depends on this transient (e.g. rewrite rules), it can lead to weird behaviors that are hard to debug.

I'm not too concerned about this because transients are designed to be highly disposable, their existence from one second to another isn't guaranteed; this is the case even for non-expiring transients.

If a feature requires data be persistent in some fashion, then transients aren't the solution.

#4 @TimothyBlynJacobs
3 months ago

  • Milestone 6.9 deleted
  • Resolution set to wontfix
  • Status changed from new to closed

Agreed, I think the performance impact for "working" sites would be too great.

#5 @hugod
3 months ago

Hi,
Thanks for your feedback. I get your concerns about performance. But this is related only to my code proposal. Not the issue itself.
I think you focused too much on the PR and not the issue. There are other ways to fix it IMHO.
For example, we could consider to *not* fetch data from wp_options table at all when querying transient with object cache enabled.

I'm not too concerned about this because transients are designed to be highly disposable, their existence from one second to another isn't guaranteed; this is the case even for non-expiring transients.

I agree, but in this case I expect to either get the right value or nothing at all, not another value.

#6 @TimothyBlynJacobs
3 months ago

For example, we could consider to *not* fetch data from wp_options table at all when querying transient with object cache enabled.

Is this not already the case? AFAICT the branch to check the database is not executed if an external object cache is in use.

#7 @hugod
3 months ago

Is this not already the case? AFAICT the branch to check the database is not executed if an external object cache is in use.

Indeed, my bad. I've opened the ticket a long time ago and didn't re-read it correctly.
So I guess we're stuck with the issue.

#8 @TimothyBlynJacobs
3 months ago

Yeah I don’t think there is much for Core to really do here. But really if the client is toggling on and off an object cache, they should be flushing the object cache or clearing all transients at that time.

#9 @hugod
3 months ago

Just for you know, the issue appeared on servers without the client disabling the object cache.
But sometimes the object cache would be unavailable for some reason (update, malfunction...).
But I guess this is too specific to be addressed in Core.

#10 @peterwilsoncc
3 months ago

@hugod It's probably worth documenting the recommendation to clear transient options when enabling a cache in the handbook entry on optimization.

We could consider calling delete_option when a transient changes if an object cache is enabled but my instinct is that this too would be a performance hit that would be best left for a plugin using the set_transient_{$transient} hook.

Note: See TracTickets for help on using tickets.