Make WordPress Core

Opened 3 years ago

Closed 9 months ago

Last modified 7 months ago

#52710 closed enhancement (fixed)

Taxonomy: Make it possible to bypass automatic caching of results in `get_terms()`

Reported by: dlh's profile dlh Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.4 Priority: normal
Severity: normal Version: 2.3
Component: Taxonomy Keywords: needs-unit-tests has-patch needs-dev-note
Focuses: performance Cc:

Description

With a few exceptions, every call to get_terms() places the query results in the object cache for 24 hours as part of WP_Term_Query::get_terms(). The query results are cached under a key that is based on the last_changed cache value for the terms cache group, such that whenever last_changed changes, all cache entries become stale.

Many different events can bump the last_changed time for terms, including creating, updating, or deleting terms, or changing the terms assigned to a post.

For busy sites (even not-so-busy ones), these events happen many times a day, if not many times an hour. The chances of any cached term results surviving for 24 hours without being made stale is basically zero.

For sites that are behind a full-page HTML cache, the chances of a request reaching WordPress such that the cached term results can be reused before they're made stale are also somewhat small.

Lastly, larger sites are likely to have implemented their own query caching in ways that are optimized for their use cases and that make core's caching redundant.

For these sites, the result is a heap of unreachable cache entries waiting to expire or be evicted. The goal of this ticket is to give developers a way to prevent the caching from occurring.

The attached patch proposes a boolean update_get_terms_cache parameter for get_terms() and WP_Term_Query::get_terms() (so named for the cache key prefix). When false, caching is skipped. The parameter can be set on a per-query basis or made false by default via the get_terms_defaults filter.

That's one idea, and I'm open to any others that might gain a consensus. For example, the cache TTL could be made into a parameter. So could the cache_group, allowing developers to direct the results into a non-persistent group.

Attachments (1)

52710.diff (4.9 KB) - added by dlh 3 years ago.

Download all attachments as: .zip

Change History (25)

@dlh
3 years ago

This ticket was mentioned in Slack in #core by dlh. View the logs.


3 years ago

#2 @spacedmonkey
2 years ago

  • Keywords needs-unit-tests has-patch added

Related: #36949 as this also introduces no caching option.

#3 follow-up: @peterwilsoncc
2 years ago

For sites wishing to avoid the issue you describe, a one line plugin wp_cache_add_non_persistent_groups( 'terms' ) may achieve the same thing.

@dlh are you able to let me know if I am missing something?

#4 @spacedmonkey
2 years ago

Can't you just pass a random string to the cache_domain field. It would mean that cache's would always miss.

You can do this in a small plugin as well.

add_filter('get_terms_defaults', function( $query_var_defaults ){
   $query_var_defaults['cache_domain'] = microtime();
   return $query_var_defaults;  
});

#5 follow-up: @dlh
2 years ago

Hey @peterwilsoncc — yes, I think that one-liner would have the same effect, but with a couple other effects that might not be as desirable:

  • Term objects would no longer be persistently cached, since those also go into the 'terms' group. In the scenario I was describing, individual terms themselves don't change all that often, but they're still accessed all the time, so having them in the object cache might still be beneficial.
  • More speculatively, sites would would be cut off from any future caching enhancements that core puts into place under terms.

Still, for sites like the one I was describing, perhaps those are the tradeoffs they have to make? Or let me know if I've misunderstood anything.

@spacedmonkey - if I'm following correctly, I don't think setting cache_domain to a random string would have the expected effect. As you say, the resulting wp_cache_get() would always miss; after that, the data then stored with wp_cache_set() would never be hit. Both of those calls would represent unnecessary traffic to the caching backend, and the cached data would be dead on arrival, just waiting to be evicted or to expire without any chance of being used.

#6 follow-up: @spacedmonkey
2 years ago

If the argument for this ticket, that because terms get invalidated all the time, that WordPress should not both to cache? I am not sure I under the logic of that. For my blog for example, I don't update the site that much, so caches would be warned and used lots on post / page loads. We can't control how many times user update their site, but we can't try and cache the best we can. What value does not storing caches have? Not filling up an object cache?

#8 @spacedmonkey
2 years ago

I have created a simpler patch at #2216. I added cache_results param, which is inline with the same param in WP_Query.

#9 in reply to: ↑ 6 @dlh
2 years ago

Replying to spacedmonkey:

If the argument for this ticket, that because terms get invalidated all the time, that WordPress should not both to cache? I am not sure I under the logic of that. For my blog for example, I don't update the site that much, so caches would be warned and used lots on post / page loads. We can't control how many times user update their site, but we can't try and cache the best we can. What value does not storing caches have? Not filling up an object cache?

No, I don't think the argument is that WordPress shouldn't bother to cache, but instead that there might be situations in which this particular caching behavior could be counterproductive, and it would nice to be able to control it and measure the effect of bypassing it.

For your blog and sites like it, I agree, the proposal in this ticket would probably be unnecessary. The sites that I have in mind are sites that publish new content several times an hour, as well as networks of tens or hundreds such sites.

For those sites, we can't control how often they update a site, but we can estimate how often they will. And, yes: In my experience, at least, every network request to the object cache and cached object that takes up space without being used has an effect on the performance of the site or sites, and so the value is in reducing those.

#10 in reply to: ↑ 5 @peterwilsoncc
2 years ago

Replying to dlh:

  • Term objects would no longer be persistently cached, since those also go into the 'terms' group. In the scenario I was describing, individual terms themselves don't change all that often, but they're still accessed all the time, so having them in the object cache might still be beneficial.

Of course, thanks for pointing that out among the other items.

#11 @spacedmonkey
2 years ago

Any chance of a review of my PR @peterwilsoncc ?

#12 @peterwilsoncc
2 years ago

I'll post a review on the PR.

I noticed the argument cache_domain was added in [18128] for #13318 but apparently never used in the wp_cache_*() function calls.

Reading the ticket, it seems that this was to allow get_terms() to use a custom domain so plugin developers could change the domain used by WP_Term_Query if they wished to make it non-persistent without affecting the domain used for term objects. This accounts for the issue @dlh identifies in the comment above.

I could also be completely misunderstanding the purpose of the ticket and cache_domain was intended to be affect the hash generated for the cache key.

Rabbit holes down the rabbit holes...

#13 @dlh
2 years ago

For my part, I had read the intent of [18128] to be the latter: to affect the hash for the cache key. But making it possible to change the direct the queries into a different cache group, e.g. a cache_group parameter, would also be a nice way to address the original concern.

#14 @spacedmonkey
9 months ago

Current patch has been refreshed and ready for another review.

@peterwilsoncc commented on PR #2216:


9 months ago
#15

The equivalent parameter in WP_Query also prevents the caching of the individual objects (ie posts).

In the intervening 20 months since writing this comment, I've realised it was incorrect. In WP_Query the cache_results parameter only affects the main query, not the individual posts.

#16 in reply to: ↑ 3 @peterwilsoncc
9 months ago

Replying to peterwilsoncc:

For sites wishing to avoid the issue you describe, a one line plugin wp_cache_add_non_persistent_groups( 'terms' ) may achieve the same thing.

@dlh are you able to let me know if I am missing something?

In #57625 the cache group for term queries was changed to term-queries so this may be an option again for your use case @dlh.


I've added a couple of notes to the updated PR.

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


9 months ago

#18 @spacedmonkey
9 months ago

Ready for review again.

#19 @spacedmonkey
9 months ago

  • Keywords commit needs-dev-note added
  • Milestone changed from Awaiting Review to 6.4
  • Owner set to spacedmonkey
  • Status changed from new to assigned

Marking as ready for commit.

#20 @spacedmonkey
9 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56585:

Taxonomy: Introduce 'cache_results' parameter to WP_Term_Query for bypassing query caching.

Incorporating a new 'cache_results' parameter into the WP_Term_Query class, this commit empowers developers with the ability to bypass query caches, explicitly triggering database queries when needed. This brings the WP_Term_Query class inline with WP_Query and WP_User_Query that already have a 'cache_results' parameter.

Update the term_exists function to use this new parameter, so the term query caches are not used while importing.

Props dlh, spacedmonkey, peterwilsoncc.
Fixes #52710.

#22 @spacedmonkey
9 months ago

  • Keywords commit removed

#23 @codente
8 months ago

I'm reviewing tickets that may require dev notes with @webcommsat and we're wondering if this change should be included in the email to the plugin authors which is currently being collated by Marcomms. As a plugin maintainer myself, due to the nature of the change and the purpose of it being to give developers more control over caching, I feel it would be useful to plugin authors.

Documentation tracker issue for this trac ticket:
https://github.com/WordPress/Documentation-Issue-Tracker/issues/1181

We have noted that performance have clustered a number of tickets together for the dev notes.
We will cross-reference the issues to reflect this.

#24 @johnbillion
7 months ago

In 57046:

Docs: Correct some docblock formatting errors.

Fixes #59784

See #12009, #52710

Note: See TracTickets for help on using tickets.