Opened 9 months ago
Last modified 3 weeks ago
#21760 new enhancement
get_term_by() calls are not cached
| Reported by: |
|
Owned by: | |
|---|---|---|---|
| Priority: | normal | Milestone: | Future Release |
| Component: | Taxonomy | Version: | 2.3 |
| Severity: | normal | Keywords: | has-patch needs-unit-tests |
| Cc: | aaroncampbell, ben@…, erick@… |
Description (last modified by wonderboymusic)
get_term() is the simplest way to retrieve one term, but it requires term_id and taxonomy. Because of this, terms are cached with term_id as key and $taxonomy as bucket. As a result, you can't easily grab a term by slug, unless you use get_term_by( 'slug' ). get_term_by( 'slug' ) and get_term_by( 'name' ) don't even have a query cache, so they go to the database every time. Because you can't get a term by slug without hitting the db, every place you want to get a term by slug: you first have to transform it into a term_id where it will then be cached. This is inefficient because the user may query by slug constantly and never by term_id.
Attachments (7)
Change History (37)
wonderboymusic — 9 months ago
See also #11531.
We should probably rename these buckets. Rather than $taxonomy and {$taxonomy}_relationships, we should use something that isn't valid in taxonomy/CPT name. taxonomy:$taxonomy and relationships:$taxonomy seem sufficient. These are the only buckets I know of that have a dynamic component to them.
comment:3
wonderboymusic — 9 months ago
Per chat with Nacin - gonna work on separate cache buckets for each type and will resurrect get_term_by() which can leverage them. Also, get_term() would have a hard time distinguishing between id, name, and slug for the category named, slugged, term_id'd "1" etc
- Summary changed from Allow get_term() to accept string as first argument to get_term_by() calls are not cached
FWIW, by overloading get_term( $id ... to accept int|string|object, it brings up conditions where
- Slug is numeric
- a term_id is passed in, but return term_ids as strings in some locations, so you can't rely upon type being an int for term_ids
comment:7
wonderboymusic — 9 months ago
Attached a new patch:
1) get_term_by() calls are cached
2) term_cache_group() abstracts cache bucket for terms
3) update_term_cache() adds to cache of all 3 buckets
4) delete_term_cache() deletes cache of all 3 buckets
5) get_terms() calls update_term_cache(), deprecates the buggy list_terms_excusions in favor of get_terms filter
6) Lots of cleanup and streamlining of clean_term_cache() calls
- Keywords has-patch needs-unit-tests added; needs-patch removed
This looks like an awesome patch, but it's tiring to read, due to all the whitespace fixes. I'll try again in the morning.
In related news, the test suite doesn't seem to have a single call to get_term_by( 'slug', ... );
Also, the patch seems to fail an existing test:
There was 1 failure: 1) Tests_Term::test_set_object_terms_by_name Failed asserting that '1' matches expected 5. /home/scribu/svn/wp-tests/tests/term.php:135
Finally, we get some use out of these darn things. :)
wonderboymusic — 9 months ago
comment:10
wonderboymusic — 9 months ago
Updated the patch - the problem was a $wpdb->prepare thing in the IN query in clean_term_cache(). All tests pass now.
WITH ONE CAVEAT:
tests-posts-tax-cache-group.diff needs to be applied to Unit Tests so that the Post tests read from the proper term cache group
comment:11
wonderboymusic — 9 months ago
- Description modified (diff)
comment:12
scribu — 9 months ago
Right, except term_cache_group() is introduced in your main patch. :)
In this case, just add additional test cases to tests-posts-tax-cache-group.diff
comment:13
scribu — 9 months ago
Also, I'm not sure term_cache_group() should be a public function; maybe rename it to _term_cache_group() and mark it as @private.
comment:14
ryan — 9 months ago
Instead of caching multiple copies of an object, we usually cache IDs that can be used to look up a single cached copy. slug -> ID, ID -> object.
I'm not really digging term_cache_group(). I consider cache keys and groups public API that shouldn't change.
comment:15
nacin — 9 months ago
Some thoughts:
- The code in delete_term_cache() seems like it belongs in clean_term_cache(). (This code is similarly in clean_user_cache().)
- Not *entirely* sure that term_cache_group() is necessary. It only gets called in get_term(), get_term_by(), get_terms(), and a few relationship functions. Doing "taxonomy:$taxonomy:slugs" etc. seems pretty safe, and not unreasonably more complicated (or longer) than _term_cache_group(). Though... we might need to special-case the core taxonomies to avoid renaming those buckets, as ryan pointed out.
- With user login, slug, and email buckets, we don't cache the user object, but rather, just the ID. That way, we can then consult the ID-based bucket for the object. We should do the same here. See WP_User::get_data_by() for how that works.
- Removing list_terms_exclusions should probably come in the separate ticket, #21267.
- In get_cat_ID(), you changed if ( $cat ) to an is_object() check. Is that necessary? Will get_term_by() ever turn a value that evaluates to true that is *not* an object?
- I had to do an svn diff -x --ignore-all-space to be able to review the patch effectively. Whitespace changes, especially in functions not quite touched by the patch, make things a bit more difficult to review.
comment:16
wonderboymusic — 9 months ago
Updated patch to only cache IDs for slug and name - sorry about the whitespace, some of the term functions are esoteric and hard to read without some daylight. I can revert those changes in a subsequent .diff
My reasoning for term_cache_group was 1) standardizing group generation 2) flexibility in making changes in the future, only has to be done in one place. If this is deemed unnecessary, is it to retain the old groups?
comment:17
wonderboymusic — 9 months ago
Another piece of this - get_terms calls update_term_cache if you got all the fields, regardless of checking the hide_empty arg.
$terms = $wpdb->get_results( $query );
if ( 'all' == $fields ) {
update_term_cache( $terms );
}
I did a performance test where I created 20,000 tags and called get_terms( 'post_tag', array( 'hide_empty' => false ). This will create 60,000 cache entries which exhausted memory until I bumped up to 256M. I tested posts with 1000 tags and they seemed fine, but this one may need to exclude terms with count of 0 before calling update_term_cache with every returned term
comment:18
wonderboymusic — 9 months ago
updated the patch:
1) removed all of my whitespace fixes (that was a gift from heaven, let's just say I learned my lesson)
2) restored all return-by-reference things
3) moved delete_term_cache code to clean_term_cache
The patch is WAY more light now - next up is removing term_cache_group as per Boren's feedback
comment:19
nacin — 9 months ago
Why were the return-by-references added back? I wonder if we still need them anymore.
We might need to keep term_cache_group() as we'll need to keep the standard category, post_tag, and link_category buckets, but we should be able to get away with prefixing the buckets of any custom taxonomy.
wonderboymusic — 9 months ago
comment:20
wonderboymusic — 9 months ago
return-by-references need to die everywhere, they are PHP4 fluff. I just added them back because I was trying to tighten to scope of my changes. The only scenario where they make sense to use is $a =& $b, not $a =& my_function();
Patch has been updated with ret-by-refs removed.
"we'll need to keep the standard category, post_tag, and link_category buckets" - I took a stab at this, does it accomplish what you want?
comment:21
bpetty — 7 months ago
- Keywords needs-refresh added
I don't feel very comfortable with refreshing this patch myself with the latest changes, but I'll be happy to jump back in and do some testing if someone else can jump on this quickly for 3.5.
comment:22
scribu — 7 months ago
- Milestone changed from Awaiting Review to Future Release
Considering it's an enhancement, the patch isn't trivial, it doesn't have unit tests yet and that we're 5 weeks away from release, it's not for 3.5.
comment:23
aaroncampbell — 5 months ago
- Cc aaroncampbell added
wonderboymusic — 5 months ago
wonderboymusic — 5 months ago
comment:24
wonderboymusic — 5 months ago
- Keywords needs-refresh removed
- Milestone changed from Future Release to 3.6
21760.diff = Patch refreshed for trunk, passes all Unit Tests.
term-cache-group.diff = Patch for Unit Tests so term_cache_group() works. i recall Ryan hating term_cache_group() - I am very open to suggestions on it, but I like it.
comment:25
husobj — 2 months ago
- Cc ben@… added
comment:26
follow-up:
↓ 27
ryan — 2 months ago
I don't mind the function. I mind changing cache locations. There is too much plugin code that directly accesses the current cache locations for both builtin and custom taxonomies.
list_terms_exclusions shouldn't be deprecated. I see too many plugins using it.
comment:27
in reply to:
↑ 26
nacin — 7 weeks ago
Replying to ryan:
I don't mind the function. I mind changing cache locations. There is too much plugin code that directly accesses the current cache locations for both builtin and custom taxonomies.
At that point, the function isn't needed. But, would be nice to namespace these as much as possible (and thus keep the function)? The only existing buckets are $taxonomy and {$taxonomy}_relationships. Maybe new buckets can all be taxonomy:$taxonomy:$group. I'd personally love to escape the generic $taxonomy bucket, at least for custom taxonomies, as the problem there is gonna continue to exist — #11531 — but I'll take what I can get.
comment:28
nacin — 5 weeks ago
- Milestone changed from 3.6 to Future Release
This is cool but let's review it early 3.7.
comment:29
johnjamesjacoby — 3 weeks ago
Late to this party, but already wrote up a patch based on a function we're using for WordPress.com VIP's.
Patch also cleans up get_term()
johnjamesjacoby — 3 weeks ago
comment:30
ethitter — 3 weeks ago
- Cc erick@… added

Cache keys are cast to strings. That means there would be a collision of the term_id of 1, the slug of '1', and the name of '1'. Even without the casting woes, slug and name would conflict.
For users, we handle this with separate cache buckets: users (by ID), userlogins, useremail, and userslugs.
This can just as easily be handled in the same bucket with prefixes - IDs are stored as is, login-$login, email-$email, slug-$slug. Or for terms, you'd have IDs, slug-$slug, name-$name.
update_term_cache() should set slug and name cache keys, in addition to ID, and that should then be leveraged by get_term_by() and friends.