WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 2 years ago

#34239 closed task (blessed) (duplicate)

Split queries in `get_terms()` and `wp_get_object_terms()`

Reported by: boonebgorges Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Taxonomy Keywords: needs-patch
Focuses: performance Cc:
PR Number:

Description

The cache strategy in get_terms() looks like this:

  1. Query for full rows from the term/term_taxonomy tables.
  2. Update the cache for the individual terms found.
  3. Store the results of the query in a single cache key.

wp_get_object_terms() does 1 and 2, but doesn't cache its query results at all.

This is a wasteful strategy.

  • In the case of get_terms(), term objects can be stored dozens of times in the cache. Each combination of arguments passed to get_terms() results in a separate cache key, and each of these slots is home to full term items. We don't have problems with invalidation because we're using the sledgehammer of a 'last_changed' incrementor, but we are certainly taking up more space in persistent cache storage than we need to.
  • $wpdb->get_results( "SELECT * FROM ..." ) is significantly more expensive than $wpdb->get_col( "SELECT term_id FROM ..." ), both in terms of the memory used to store results and the index strategy that MySQL can use. Selecting full results was probably necessary when terms could be shared between taxonomies - term_id wouldn't have been a unique identifier - but we shouldn't have that problem anymore.

Let's split the query in both of these functions, so that we do the following:

  • The primary SQL query will fetch term_ids only. $wpdb->get_col( "SELECT term_id FROM..." )
  • We'll cache the results of that query, which will be an array of integers, instead of an array of stdClass objects with a bunch of data in them, using the existing 'last_changed' technique. (For now, we can limit ourselves to caching only get_terms() queries. It'd be great to do wp_get_object_terms() too, but this will probably take some more research.)
  • Do a SELECT * FROM ... WHERE term_id IN (...) query to get data corresponding to uncached terms, then prime the cache for those terms.
  • Fill the term objects from the cache before returning from the function.

Both get_terms() and wp_get_object_terms() support a 'fields' parameter, which allows the selection of only a subset of fields, as an alternative to 'all' (or 'all_with_object_id'). At some point down the line, it'd be great to convert as many of these as possible to use the persistent cache described above. But for now, it probably makes sense to start with the 'all' queries only.

Change History (3)

#1 @boonebgorges
4 years ago

In 35032:

Don't cache WP_Term objects in wp_get_object_cache().

The data stored in the cache should be raw database query results, not
WP_Term objects (which may be modified by plugins, and may contain additional
properties that shouldn't be cached).

If term relationships caches were handled in wp_get_object_terms() - where
a database query takes place - it would be straightforward to cache raw data.
See #34239. Since, in fact, get_the_terms() caches the value it gets from
wp_get_object_terms(), we need a technique that allows us to get raw data
from a WP_Term object. Mirroring WP_User, we introduce a data property
on term objects, which get_the_terms() uses to fetch cacheable term info.

Fixes #34262.

#2 @boonebgorges
3 years ago

Previously: #24189

#3 @boonebgorges
2 years ago

  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Closing this ticket in favor of #37189, which has patches.

Note: See TracTickets for help on using tickets.