Make WordPress Core

Opened 12 years ago

Closed 7 years ago

Last modified 6 years ago

#21760 closed enhancement (fixed)

get_term_by() calls are not cached

Reported by: wonderboymusic's profile wonderboymusic Owned by: ocean90's profile ocean90
Milestone: 4.8 Priority: normal
Severity: major Version: 2.3
Component: Taxonomy Keywords: has-patch has-unit-tests
Focuses: performance Cc:

Description (last modified by wonderboymusic)

get_term() is the simplest way to retrieve one term, but it requires term_id and taxonomy. Because of this, terms are cached with term_id as key and $taxonomy as bucket. As a result, you can't easily grab a term by slug, unless you use get_term_by( 'slug' ). get_term_by( 'slug' ) and get_term_by( 'name' ) don't even have a query cache, so they go to the database every time. Because you can't get a term by slug without hitting the db, every place you want to get a term by slug: you first have to transform it into a term_id where it will then be cached. This is inefficient because the user may query by slug constantly and never by term_id.

Attachments (31)

get-term-mixed.diff (11.5 KB) - added by wonderboymusic 12 years ago.
tests-posts-tax-cache-group.diff (1.3 KB) - added by wonderboymusic 12 years ago.
term-cache.diff (13.0 KB) - added by wonderboymusic 12 years ago.
term-cache-group.diff (549 bytes) - added by wonderboymusic 11 years ago.
21760.diff (10.8 KB) - added by wonderboymusic 11 years ago.
21760.2.diff (9.8 KB) - added by ryan 11 years ago.
Refresh patch, don't deprecate list_terms_exclusions, back compat cache keys
21760.patch (7.7 KB) - added by johnjamesjacoby 11 years ago.
21760.3.diff (2.6 KB) - added by wonderboymusic 10 years ago.
21760.4.diff (3.0 KB) - added by wonderboymusic 10 years ago.
21760.5.diff (5.2 KB) - added by wonderboymusic 10 years ago.
21760.6.diff (5.0 KB) - added by wonderboymusic 10 years ago.
21760.7.diff (6.8 KB) - added by wonderboymusic 10 years ago.
21760.8.diff (6.9 KB) - added by tollmanz 10 years ago.
s/$bucket/$group and hash name keys
21760.9.diff (7.0 KB) - added by tollmanz 10 years ago.
Add notion of forcing the incrementor update
21760.10.diff (9.2 KB) - added by tollmanz 10 years ago.
21760.2.patch (10.5 KB) - added by boonebgorges 9 years ago.
21760.11.diff (2.2 KB) - added by nacin 9 years ago.
I don't see how this is any less DRY. It is certainly more obvious as to what is happening.
21760.12.diff (5.1 KB) - added by tollmanz 9 years ago.
21760.13.diff (4.1 KB) - added by tollmanz 9 years ago.
21760.14.diff (10.0 KB) - added by boonebgorges 9 years ago.
21760.3.patch (1.1 KB) - added by spacedmonkey 8 years ago.
21760.4.patch (2.2 KB) - added by spacedmonkey 8 years ago.
21760.5.patch (9.9 KB) - added by ocean90 7 years ago.
21760.6.patch (9.9 KB) - added by johnjamesjacoby 7 years ago.
Same as .5 but swaps 'hide_empty' for 'get' => 'all'
21760.6.1.patch (9.3 KB) - added by ocean90 7 years ago.
Same as .6, but removes my debug cruft from .5
21760.7.patch (10.1 KB) - added by ocean90 7 years ago.
21760.8.patch (11.5 KB) - added by ocean90 7 years ago.
21760.9.patch (12.2 KB) - added by ocean90 7 years ago.
21760.10.patch (3.2 KB) - added by ocean90 7 years ago.
21760.11.patch (560 bytes) - added by sstoqnov 7 years ago.
Return false if the term value is empty
21760.12.patch (4.7 KB) - added by sstoqnov 7 years ago.

Download all attachments as: .zip

Change History (160)

#1 @nacin
12 years ago

Cache keys are cast to strings. That means there would be a collision of the term_id of 1, the slug of '1', and the name of '1'. Even without the casting woes, slug and name would conflict.

For users, we handle this with separate cache buckets: users (by ID), userlogins, useremail, and userslugs.

This can just as easily be handled in the same bucket with prefixes - IDs are stored as is, login-$login, email-$email, slug-$slug. Or for terms, you'd have IDs, slug-$slug, name-$name.

update_term_cache() should set slug and name cache keys, in addition to ID, and that should then be leveraged by get_term_by() and friends.

#2 @nacin
12 years ago

See also #11531.

We should probably rename these buckets. Rather than $taxonomy and {$taxonomy}_relationships, we should use something that isn't valid in taxonomy/CPT name. taxonomy:$taxonomy and relationships:$taxonomy seem sufficient. These are the only buckets I know of that have a dynamic component to them.

#3 @wonderboymusic
12 years ago

Per chat with Nacin - gonna work on separate cache buckets for each type and will resurrect get_term_by() which can leverage them. Also, get_term() would have a hard time distinguishing between id, name, and slug for the category named, slugged, term_id'd "1" etc

#4 @scribu
12 years ago

  • Keywords needs-patch added; has-patch removed

#5 @scribu
12 years ago

  • Summary changed from Allow get_term() to accept string as first argument to get_term_by() calls are not cached

#6 @dd32
12 years ago

FWIW, by overloading get_term( $id ... to accept int|string|object, it brings up conditions where

  • Slug is numeric
  • a term_id is passed in, but return term_ids as strings in some locations, so you can't rely upon type being an int for term_ids

#7 @wonderboymusic
12 years ago

Attached a new patch:

1) get_term_by() calls are cached
2) term_cache_group() abstracts cache bucket for terms
3) update_term_cache() adds to cache of all 3 buckets
4) delete_term_cache() deletes cache of all 3 buckets
5) get_terms() calls update_term_cache(), deprecates the buggy list_terms_excusions in favor of get_terms filter
6) Lots of cleanup and streamlining of clean_term_cache() calls

#8 @scribu
12 years ago

  • Keywords has-patch needs-unit-tests added; needs-patch removed

This looks like an awesome patch, but it's tiring to read, due to all the whitespace fixes. I'll try again in the morning.

In related news, the test suite doesn't seem to have a single call to get_term_by( 'slug', ... );

#9 @scribu
12 years ago

Also, the patch seems to fail an existing test:

There was 1 failure:

1) Tests_Term::test_set_object_terms_by_name
Failed asserting that '1' matches expected 5.

/home/scribu/svn/wp-tests/tests/term.php:135

Finally, we get some use out of these darn things. :)

#10 @wonderboymusic
12 years ago

Updated the patch - the problem was a $wpdb->prepare thing in the IN query in clean_term_cache(). All tests pass now.

WITH ONE CAVEAT:
tests-posts-tax-cache-group.diff needs to be applied to Unit Tests so that the Post tests read from the proper term cache group

#11 @wonderboymusic
12 years ago

  • Description modified (diff)

#12 @scribu
12 years ago

Right, except term_cache_group() is introduced in your main patch. :)

In this case, just add additional test cases to tests-posts-tax-cache-group.diff

#13 @scribu
12 years ago

Also, I'm not sure term_cache_group() should be a public function; maybe rename it to _term_cache_group() and mark it as @private.

#14 @ryan
12 years ago

Instead of caching multiple copies of an object, we usually cache IDs that can be used to look up a single cached copy. slug -> ID, ID -> object.

I'm not really digging term_cache_group(). I consider cache keys and groups public API that shouldn't change.

#15 @nacin
12 years ago

Some thoughts:

  • The code in delete_term_cache() seems like it belongs in clean_term_cache(). (This code is similarly in clean_user_cache().)
  • Not *entirely* sure that term_cache_group() is necessary. It only gets called in get_term(), get_term_by(), get_terms(), and a few relationship functions. Doing "taxonomy:$taxonomy:slugs" etc. seems pretty safe, and not unreasonably more complicated (or longer) than _term_cache_group(). Though... we might need to special-case the core taxonomies to avoid renaming those buckets, as ryan pointed out.
  • With user login, slug, and email buckets, we don't cache the user object, but rather, just the ID. That way, we can then consult the ID-based bucket for the object. We should do the same here. See WP_User::get_data_by() for how that works.
  • Removing list_terms_exclusions should probably come in the separate ticket, #21267.
  • In get_cat_ID(), you changed if ( $cat ) to an is_object() check. Is that necessary? Will get_term_by() ever turn a value that evaluates to true that is *not* an object?
  • I had to do an svn diff -x --ignore-all-space to be able to review the patch effectively. Whitespace changes, especially in functions not quite touched by the patch, make things a bit more difficult to review.

#16 @wonderboymusic
12 years ago

Updated patch to only cache IDs for slug and name - sorry about the whitespace, some of the term functions are esoteric and hard to read without some daylight. I can revert those changes in a subsequent .diff

My reasoning for term_cache_group was 1) standardizing group generation 2) flexibility in making changes in the future, only has to be done in one place. If this is deemed unnecessary, is it to retain the old groups?

#17 @wonderboymusic
12 years ago

Another piece of this - get_terms calls update_term_cache if you got all the fields, regardless of checking the hide_empty arg.

$terms = $wpdb->get_results( $query );
if ( 'all' == $fields ) {
	update_term_cache( $terms );
}

I did a performance test where I created 20,000 tags and called get_terms( 'post_tag', array( 'hide_empty' => false ). This will create 60,000 cache entries which exhausted memory until I bumped up to 256M. I tested posts with 1000 tags and they seemed fine, but this one may need to exclude terms with count of 0 before calling update_term_cache with every returned term

#18 @wonderboymusic
12 years ago

updated the patch:

1) removed all of my whitespace fixes (that was a gift from heaven, let's just say I learned my lesson)
2) restored all return-by-reference things
3) moved delete_term_cache code to clean_term_cache

The patch is WAY more light now - next up is removing term_cache_group as per Boren's feedback

#19 @nacin
12 years ago

Why were the return-by-references added back? I wonder if we still need them anymore.

We might need to keep term_cache_group() as we'll need to keep the standard category, post_tag, and link_category buckets, but we should be able to get away with prefixing the buckets of any custom taxonomy.

#20 @wonderboymusic
12 years ago

return-by-references need to die everywhere, they are PHP4 fluff. I just added them back because I was trying to tighten to scope of my changes. The only scenario where they make sense to use is $a =& $b, not $a =& my_function();

Patch has been updated with ret-by-refs removed.

"we'll need to keep the standard category, post_tag, and link_category buckets" - I took a stab at this, does it accomplish what you want?

#21 @bpetty
11 years ago

  • Keywords needs-refresh added

I don't feel very comfortable with refreshing this patch myself with the latest changes, but I'll be happy to jump back in and do some testing if someone else can jump on this quickly for 3.5.

#22 @scribu
11 years ago

  • Milestone changed from Awaiting Review to Future Release

Considering it's an enhancement, the patch isn't trivial, it doesn't have unit tests yet and that we're 5 weeks away from release, it's not for 3.5.

#23 @aaroncampbell
11 years ago

  • Cc aaroncampbell added

#24 @wonderboymusic
11 years ago

  • Keywords needs-refresh removed
  • Milestone changed from Future Release to 3.6

21760.diff Patch refreshed for trunk, passes all Unit Tests.

term-cache-group.diff Patch for Unit Tests so term_cache_group() works. i recall Ryan hating term_cache_group() - I am very open to suggestions on it, but I like it.

Last edited 11 years ago by wonderboymusic (previous) (diff)

#25 @husobj
11 years ago

  • Cc ben@… added

#26 follow-up: @ryan
11 years ago

I don't mind the function. I mind changing cache locations. There is too much plugin code that directly accesses the current cache locations for both builtin and custom taxonomies.

list_terms_exclusions shouldn't be deprecated. I see too many plugins using it.

@ryan
11 years ago

Refresh patch, don't deprecate list_terms_exclusions, back compat cache keys

#27 in reply to: ↑ 26 @nacin
11 years ago

Replying to ryan:

I don't mind the function. I mind changing cache locations. There is too much plugin code that directly accesses the current cache locations for both builtin and custom taxonomies.

At that point, the function isn't needed. But, would be nice to namespace these as much as possible (and thus keep the function)? The only existing buckets are $taxonomy and {$taxonomy}_relationships. Maybe new buckets can all be taxonomy:$taxonomy:$group. I'd personally love to escape the generic $taxonomy bucket, at least for custom taxonomies, as the problem there is gonna continue to exist — #11531 — but I'll take what I can get.

#28 @nacin
11 years ago

  • Milestone changed from 3.6 to Future Release

This is cool but let's review it early 3.7.

#29 @johnjamesjacoby
11 years ago

Late to this party, but already wrote up a patch based on a function we're using for WordPress.com VIP's.

Patch also cleans up get_term()

#30 @ethitter
11 years ago

  • Cc erick@… added

#31 @wonderboymusic
11 years ago

  • Owner set to johnjamesjacoby
  • Status changed from new to assigned

Assigning to JJJ to investigate his patch

#32 @Chouby
11 years ago

  • Cc frederic.demarle@… added

#33 @nacin
10 years ago

Can someone step up to opine on this? How are we doing for unit tests in this area? Would be good to make sure there are some tests to assert there is no cache pollution.

#34 @nacin
10 years ago

#11531 was marked as a duplicate.

#35 @wonderboymusic
10 years ago

  • Keywords needs-refresh added
  • Milestone changed from Future Release to 4.0
  • Owner changed from johnjamesjacoby to wonderboymusic

this should definitely be tackled as part of our Taxonomy work in 4.0 - I will take another stab at this

#36 @helen
10 years ago

wonderboymusic - speak/unit test now or punt from 4.0, leaving it to you.

#37 @wonderboymusic
10 years ago

I will either look at this with gusto in the next few days or punt with gusto

#38 @wonderboymusic
10 years ago

  • Keywords needs-refresh removed

I'm gonna do some profiling/testing, but 21760.3.diff should do it for get_term_by( 'slug', ... ).

#39 @wonderboymusic
10 years ago

21760.4.diff is better.

Unit test query counts

Install (before patch): 403
Install (after patch): 384

Test suite (before patch): 208474
Test suite (after patch): 208257

#40 @wonderboymusic
10 years ago

  • Keywords needs-unit-tests removed

21760.5.diff adds unit tests

#41 @wonderboymusic
10 years ago

21760.6.diff is a simpler approach and passively invalidates slugs via an increment on the bucket. In my previous attempts, clean_term_cache() ended up producing 1000-2000 more database queries when running the entire test suite. No bueno.

#42 @wonderboymusic
10 years ago

After adding entries for slug AND name in 21760.7.diff:

Test suite (after patch): 208,212 (-262 database calls)

The homepage of my test install makes 6 less queries.

Added unit tests for get_term_by( 'name', .... )

#43 @wonderboymusic
10 years ago

  • Keywords commit added

I like this, but I will throw some cold water on my face and wait

#44 @tollmanz
10 years ago

Nice work! I love the idea of wp_get_last_changed(). An incrementor API like this would be most excellent in core.

A couple thoughts on the patch:

  • Line 980 of src/wp-includes/taxonomy.php should use $term->term_id as the key, not the $term object.
  • $bucket is not a term used in core (other than fleetingly in a few comments). I think for clarity and consistency, we should stick with the $group nomenclature.
  • The keys for the term names will prove problematic. Whitespace is not allowed in Memcached keys. Additionally, these keys can only be a grand total of 250 characters in Memcached (and I'm assuming there are other limits in other object caches). These should be hashed or sanitized in someway before being used as keys.
  • Playing devil's advocate, if a term is not found in get_term_by(), should that result still be cached to avoid the redundant look up in the future? I think that would be nice, but complicates checking for the "non-existent" values, as well as the invalidation of that cache.
  • I don't think update_term_cache() will work. 1) Using wp_cache_add() will fail to save new data as it is already set. A quick and dirty fix for this is to use wp_cache_set(). It'll guarantee new data; however, there is a slight performance penalty. 2) Since we are using incrementors, we should...well...use them. We should generate a new incrementor and save the new values with the new incrementor. This causes a cache miss on the old incrementor and saves the new one. This could be accomplished by adding a $force parameter to wp_get_last_changed(). If set to true, it would not pull the incrementor from cache and instead regenerate it and save over the old one. Alternatively, it could just erase the old one.

Finally, as excited as I am about the incrementor idea, I do have some concerns. Incrementors work great with caches that use an LRU eviction policy (e.g., Memcached). You can fairly carelessly add incrementors to cause cache misses/invalidations. This works well because the old incrementors are no longer accessed and eventually fall off the slab. With some caching systems, the eviction policy is a bit problematic for an incrementor system (e.g., APC fills the cache until full, them dumps everything). We need to be really careful about how we do this to make sure it will consistently work across object caches. Ideally, when we change the incrementor, we delete the old one or overwrite it.

I think we need to be very careful as we implement this. An incrementor system can be extremely powerful, yet also really dangerous. Perhaps we should start a separate ticket to explore this? If we were to implement this in core, it would be great to have a more robust system for handling this with methods for setting, getting, and invalidating incrementors.

Last edited 10 years ago by tollmanz (previous) (diff)

#45 @tollmanz
10 years ago

Scratch my comment about using $term->term_id instead of $term. I see that $term is changed from the object into the term ID earlier in the code.

@tollmanz
10 years ago

s/$bucket/$group and hash name keys

@tollmanz
10 years ago

Add notion of forcing the incrementor update

#46 @tollmanz
10 years ago

21760.8.diff and 21760.9.diff change the $bucket nomenclature to $group, hash the term name key, and add a way to invalidate the incrementor. The first two changes are in .8 and all three are in .9.

#47 @rmccue
10 years ago

Psst, while we're in the term caching arena, #28743 (especially if we want to hit clean_term_cache anyway).

@tollmanz
10 years ago

#48 @tollmanz
10 years ago

21760.10.diff iterates on this patch by adding /updating the following:

  • I found a slight bug in the get_term_by() adds the $term object to the cache. The values were cached before the $term object was run through the sanitization function. The sanitization function converts ints to strings. Therefore, a term grabbed from the cache would be different than a term grabbed directly from MySQL since the term grabbed from cache would never be sanitized. I updated two of the unit tests to specifically test for this.
  • I added suspended cached invalidation awareness to address #28743.
  • A unit test was added to test #28743.

Thoughts/questions:

  • I'm starting to think that we shouldn't add the if ( $cache ) conditional. If the value is cached, the filters and sanitization functions are completely missed. I can see an argument that the terms will have already been run through the filters and sanitization routines; however, this would cause problems if a user, using a persistent cache, installs a plugin that should further manipulate these terms. The user would need to wait until the next cache invalidation for it to take effect. These things cause devs to go batty.
  • We should probably add suspended cached invalidation awareness to update_term_cache(), right? I looked at how this is handled in update_post_cache() and it is not aware of suspended cached invalidation; however, that is because the function is not an invalidation function. It will add items to cache, but does not overwrite it or delete cached items. It will never do this. update_term_cache() operates differently. It nukes the cached items and forces an update. As such, it should respect suspended cached invalidation, right?

#49 @wonderboymusic
10 years ago

tollmanz: good stuff, waiting for feedback from @nacin

This ticket was mentioned in IRC in #wordpress-dev by DrewAPicture. View the logs.


10 years ago

#51 @ocean90
10 years ago

  • Keywords 4.1-early added
  • Milestone changed from 4.0 to Future Release

#52 @boonebgorges
9 years ago

  • Keywords commit 4.1-early removed
  • Milestone changed from Future Release to 4.1

21760.2.patch does the following:

  • refresh for current trunk (whitespace was preventing from applying)
  • I like tollmanz's suggestion https://core.trac.wordpress.org/ticket/21760#comment:48 that cached values get returned through the filters, so it's implemented here
  • I also think he's correct that cache invalidation preferences should be respected in update_term_cache(), so it's in there too
  • The change in update_term_cache() that forces 'last_changed' to be incremented caused a failing test related to get_terms(). It turns out that get_terms() is calling update_term_cache() whenever you call it with 'fields=all'. This looks like a bug to me, since the terms are properly cached after the queries are run later in the function (and that bit of code goes back to Ye Olden Days of WP). I've removed it.

With some caching systems, the eviction policy is a bit problematic for an incrementor system (e.g., APC fills the cache until full, them dumps everything). We need to be really careful about how we do this to make sure it will consistently work across object caches. Ideally, when we change the incrementor, we delete the old one or overwrite it.

FWIW, I think this is only the default behavior. I'm pretty sure that if you set a non-zero ttl, only expired items are dumped when the cache fills. More generally, while I totally agree that we should constantly be mindful of the realities of various caching systems, it's also reasonable to expect that the people implementing those systems will be cognizant of the requirements of the particular backend they've chosen, and will configure it accordingly.

#53 @wonderboymusic
9 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 29915:

Cache get_term_by() calls:

  • Add a helper function, wp_get_last_changed(), to retrieve a last-modified timestamp by cache group
  • When caching a term, also make cache entries for slug and name via slug:{$term_id} and name:{$term_id} keys in the $taxonomy:$last_changed bucket that reference the term_id
  • In clean_term_cache() and update_term_cache(), respect $_wp_suspend_cache_invalidation
  • Original term cache entries maintain BC

Adds unit tests.

Props wonderboymusic, tollmanz, boonebgorges.
Fixes #21760.

#54 @wonderboymusic
9 years ago

My description of those cache keys was wrong....

slug:{$term->slug} and name:md5( $term->name )

Last edited 9 years ago by wonderboymusic (previous) (diff)

#55 @nacin
9 years ago

This looks great, but I'm reopening for some potential tweaks:

  • wp_get_last_changed() is really weird. I am not against helpers but in this case it seems to do too many things and is generally confusing to see. It gets the value but sometimes sets the value? The name is wrong. As it stands I'd eliminate it.
  • This suddenly adds new and arbitrary groups. That's not really good. We should have individual buckets for slugs and names (as done in users), potentially with generic names (like 'termslugs' and 'termnames' ).
  • Why the md5? Key length? Whitespace?

#56 @nacin
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

#57 follow-up: @wonderboymusic
9 years ago

Why the md5? Key length? Whitespace?

Whitespace is not allowed in Memcached keys

wp_get_last_changed() is mostly for DRY - open to suggestions on that one, I couldn't think of a better name

#58 in reply to: ↑ 57 @nacin
9 years ago

Replying to wonderboymusic:

Why the md5? Key length? Whitespace?

Whitespace is not allowed in Memcached keys

This isn't our problem, per [27300]. Plenty of other areas of core could be affected by this. For example, who is to say a slug can't have a space?

@nacin
9 years ago

I don't see how this is any less DRY. It is certainly more obvious as to what is happening.

#59 @boonebgorges
9 years ago

#28743 was marked as a duplicate.

@tollmanz
9 years ago

#60 @tollmanz
9 years ago

21760.12.diff combines the changes @nacin made with his other suggestions, which include:

  • Removing md5 hashes (seems to be consistent with how similar strings are handled in WP)
  • Attempts to use a more sane grouping convention. It borrows from the user caching and adds the incrementor to the key
  • I also noticed that the incrementor was only used with the name and slug caches. I added the incrementor to the term object cache as well, but perhaps I am missing some reason why this isn't a good idea.


#61 @wonderboymusic
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 30073:

Adjust caching for get_term_by() calls:

  • Remove md5 hashes for term name cache keys
  • Remove the namespace for the keys for names and slugs and add them to the group names
  • Remove wp_get_last_changed(), which @nacin hated


Props tollmanz.
Fixes #21760.

#62 @boonebgorges
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This has broken many things. $ phpunit --group taxonomy

#63 @boonebgorges
9 years ago

In 30080:

Remove call to wp_get_last_changed() from unit tests.

This function was removed in [30073].

See #21760.

#64 @boonebgorges
9 years ago

[30080] remove the fatal error so that at least the tests can continue to run. A number are still failing.

#65 @tollmanz
9 years ago

I ran the whole unit test suite before adding these changes because I was worried about such breakage. I saw no failures, but I think this was because I was running them wrong. When running phpunit --group taxonomy, does it run different tests than what is run with phpunit? I know it will run a smaller subset, but wondering if it also includes ones that wouldn't be run with phpunit alone.

#66 @boonebgorges
9 years ago

Ah, it probably skipped these tests because the ticket was open. $ phpunit --group 21760 should do it.

@tollmanz
9 years ago

#67 @tollmanz
9 years ago

21760.13.diff adds the cache group change to one more place that was missed in the previous patch. Additionally, it updates the unit tests to use the cache group change that is used in the update. The update allows the tests pass for this ticket and for the tests in general.

Time to riot!

#68 @boonebgorges
9 years ago

Let the rioting commence.

#69 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 30108:

Clean up get_term_by() caching.

  • Fix cache key/group modification that was missed in [30073].
  • Update unit tests to reflect new key/group format.

Props tollmanz.
Fixes #21760.

#70 @boonebgorges
9 years ago

In 30112:

Clean up cache invalidation suspension global in unit tests.

This fixes a test that was introduced in [30073] which was polluting later
tests.

See #21760.

This ticket was mentioned in Slack in #core by jeremyfelt. View the logs.


9 years ago

#72 follow-up: @nacin
9 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This most likely broke something on .com (whether they know about it yet or not) and in some plugins that mess directly with term caches. But it's not just that.

First, the pedantic stuff I've previously covered: We cannot change the group names like this. If we could, we could have abandoned 'post_tag', 'category', and $taxonomy a long time ago. We can likely change $taxonomy to "taxonomy-$taxonomy" or just 'taxonomy' or 'term' (proposed either here or elsewhere).

(I wrote two years ago on this ticket: "we'll need to keep the standard category, post_tag, and link_category buckets, but we should be able to get away with prefixing the buckets of any custom taxonomy.")

Here's how I would expect last_changed to be used:

  • The term 1, name Apple, slug apple, is saved.
  • A term object is cached with key = 1, value = object, group = 'term' (or similar)
  • A term slug is cached with key = apple, value = 1 (ID), group = 'termslugs' (or similar)
  • A term name is cached with key = 'Apple', value = 1 (ID), group = 'termnames' (or similar)
  • The key last_changed for the group 'term' is set to "0.51245500 1418627718" (microtime())
  • get_terms() caches its queries. When doing so, it includes the value of last_changed in its cache key

Later, this term is updated:

  • The caches for 1 (group term), apple (group termslugs), Apple (group termnames) are deleted
  • The new object, slug, and name are set into cache
  • last_changed is bumped, and get_terms()'s cache is invalidated

As it stands now, whenever any term is edited, every term's cache gets completely wiped out. That seems like it's worse than it was previously. What am I missing?

Additionally, what's in trunk now actually makes a lot of code redundant or useless. What's the point of wp_cache_delete($term->term_id, $term->taxonomy); when that's no longer the name of the group, and any change invalidates everything everywhere? Does clean_term_cache() even work anymore? It doesn't look like it does. That's a problem and it's going to break things.

At this point, I think we need to revert this from the branch and try again in 4.2.

This ticket was mentioned in Slack in #core by nacin. View the logs.


9 years ago

#74 in reply to: ↑ 72 @ericmann
9 years ago

Replying to nacin:

This most likely broke something on .com (whether they know about it yet or not) and in some plugins that mess directly with term caches. But it's not just that.

Breaking anything at this stage feels like a sub-optimal situation. I agree with the overall sentiment that we should roll this back, take a step away, and revisit early in 4.2.

Here's how I would expect last_changed to be used: ...

The notes in this workflow actually pan out to some great tests to make sure the code is fairly robust, doesn't break anything, and does exactly what we want it to do. Additionally, we can make sure that the changes to cache don't introduce any breaking changes that affect large installations like .com.

At this point, I think we need to revert this from the branch and try again in 4.2.

Agree. Really wanted to see this ship in 4.1, but it doesn't feel ready.

#75 @boonebgorges
9 years ago

nacin and ericmann - This sounds sensible to me. The workflow nacin describes sounds like a narrower use of the last_changed iterator, which IMO is a good thing - we should only be invalidating with a sledge hammer when we can't do it with a laser beam.

This ticket was mentioned in Slack in #core by boone. View the logs.


9 years ago

#77 @boonebgorges
9 years ago

In 30900:

Remove caching for get_term_by() calls.

The new cache group scheme causes term invalidation to be overly broad, so
that busting the cache for one term will bust the cache for all terms in the
taxonomy. We'll have another go at more focused use of the 'last_changed'
incrementor in a future release.

Reverts [29915], [30073], [30080], [30108], [30112].
See #21760.

#78 @tollmanz
9 years ago

Bummer that we didn't catch this earlier :( Thanks for removing this boonebgorges. Hopefully we can figure out a more reasonable patch for 4.2.

#79 @johnbillion
9 years ago

  • Keywords commit fixed-major added

#80 @johnbillion
9 years ago

In 30914:

Remove caching for get_term_by() calls.

The new cache group scheme causes term invalidation to be overly broad, so
that busting the cache for one term will bust the cache for all terms in the
taxonomy. We'll have another go at more focused use of the 'last_changed'
incrementor in a future release.

Reverts [29915], [30073], [30080], [30108], [30112].

Merges [30900] to the 4.1 branch.

See #21760.

#81 @johnbillion
9 years ago

  • Keywords needs-patch 2nd-opinion added; has-patch commit fixed-major removed
  • Milestone changed from 4.1 to Future Release

#82 @boonebgorges
9 years ago

In 30954:

Update individual term caches in get_terms().

This was removed in [29915] as part of the attempt to add cache support to
get_term_by(). When that support was removed in [30900], it was not properly
restored.

This changeset includes a unit test to verify that the cache is properly primed
for terms found in get_terms(), as well as tests to verify the other cache
setting that was touched by [30900].

Fixes #30749. See #21760.

#83 @nacin
9 years ago

In 30959:

Update individual term caches in get_terms().

Merges [30954] to the 4.1 branch.

props boonebgorges.
fixes #30749. see #21760.

This ticket was mentioned in Slack in #core by ericmann. View the logs.


9 years ago

#85 @dd32
9 years ago

#27572 was marked as a duplicate.

This ticket was mentioned in Slack in #core by boone. View the logs.


8 years ago

#87 @boonebgorges
8 years ago

Brainstorm: Why not make get_term_by() a wrapper for get_terms()? The latter function has 'name' and 'slug' params, and 'term_id' and 'term_taxonomy_id' would be trivial to add. More importantly, get_terms() queries are already fairly heavily cached.

The biggest concern would be performance. get_terms() adds lots of CPU cycles, its cache strategy is perhaps less efficient (parameter hashes as keys), and its SQL queries may not be as optimized. That being said, it may be worth a small performance hit on uncached queries in exchange for cache coverage of get_term_by(), as well as the increase in DRYness.

#88 @boonebgorges
8 years ago

#36950 was marked as a duplicate.

#90 @spacedmonkey
8 years ago

  • Focuses performance added
  • Keywords has-patch needs-unit-tests added; needs-patch removed

The latest patch that I added is pretty simple. It creates the caches when first requesting get_term_by. Invalidation happens when the term cache clean is called. I think this is a solvable problem...

#91 @spacedmonkey
8 years ago

Decided to go another direction with this patch. It now uses get_terms which has has pretty good caching in it. It makes the code pretty easy to read and highly maintainable.

This ticket was mentioned in Slack in #core-multisite by spacedmonkey. View the logs.


8 years ago

This ticket was mentioned in Slack in #core by boone. View the logs.


8 years ago

#94 @ocean90
7 years ago

@pento and I are looking into #meta1829. We noticed that o2 does several uncached term queries. Some of them could be reduced by caching term calls statically. The others are caused by get_term_by().

The latest patch 21760.4.patch by @spacedmonkey looks sane to me. It makes get_term_by() a wrapper for get_term() which uses WP_Term_Query and its query cache. @boonebgorges suggested this idea in comment:87 and I think it's worth a try. I applied the patch on my dotorg sandbox and the number of queries went down by ~50%.

Looking at the current test coverage, Tests_Term_GetTermBy has a few tests but I think we should extend them, especially with tests for slashed data and something for sanitize_title().

@boonebgorges What do you think? Happy to own this for 4.7.

#95 @pento
7 years ago

It looks like the tests reverted in [30900] will provide additional coverage, with some tweaking. (Different cache keys, terms are now WP_Terms instead of stdClasses.)

Last edited 7 years ago by pento (previous) (diff)

#96 @boonebgorges
7 years ago

  • Keywords 2nd-opinion removed
  • Milestone changed from Future Release to 4.7
  • Owner changed from wonderboymusic to ocean90
  • Status changed from reopened to assigned

@ocean90 Thumbs up from me.

@pento I'd prefer to see those tests rewritten so that they don't directly reach into the cache - as you note, cache keys etc are very brittle. Butyes, many of them can be repurposed. Tests for invalidation should just call get_term_by() a second time:

$before = get_term_by( 'slug', 'foo' );
wp_update_term( $before->term_id, array(
    'name' => 'Bar',
) );
$after = get_term_by( 'slug', 'foo' );
$this->assertSame( 'Bar', $after->name );

#97 @ocean90
7 years ago

I'm using this pull request to continue working on the tests.
21760.4.patch has a bug for term_taxonomy_id lookups because the taxonomy argument isn't unset.

@ocean90
7 years ago

#98 follow-up: @ocean90
7 years ago

This is a before/after overview for the SQL queries with 21760.8.patch:

get_term_by( 'slug' )

Before:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE t.slug = 'foo' AND tt.taxonomy = 'category' LIMIT 1

After:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category') AND t.slug = 'foo' LIMIT 1


get_term_by( 'name' )

Before:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE t.name = 'Foo' AND tt.taxonomy = 'category' LIMIT 1

After:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category') AND t.name IN ('Foo') LIMIT 1


get_term_by( 'id' )

Before:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE t.term_id = 4"

After:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE t.term_id = 4"


get_term_by( 'term_taxonomy_id' )

Before:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.term_taxonomy_id = '5' LIMIT 1"

After:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.term_taxonomy_id = 5 LIMIT 1

Last edited 7 years ago by ocean90 (previous) (diff)

@johnjamesjacoby
7 years ago

Same as .5 but swaps 'hide_empty' for 'get' => 'all'

#99 @johnjamesjacoby
7 years ago

21760.6.patch uses 'get' => 'all' for a few reasons:

  • Will return terms without posts
  • Ignores the term hierarchy entirely (both parents & children)
  • Never pads counts, preventing unintended calls to _pad_term_counts()

These will bring back the LIMIT query clause lost in .5.patch.

@ocean90
7 years ago

Same as .6, but removes my debug cruft from .5

#100 follow-up: @ocean90
7 years ago

Thanks @johnjamesjacoby, updated the overview. Going to work on some tests for that now.

#101 in reply to: ↑ 100 @johnjamesjacoby
7 years ago

Replying to ocean90:

Thanks @johnjamesjacoby, updated the overview. Going to work on some tests for that now.

Thanks for doing the bulk of the work. :)

A few things to add, in regards to @ocean90's query analysis:

  • wp_terms.name is indexed, so the additional ORDER BY t.name bit will not negatively impact performance, even though it's not necessary with a LIMIT 1
  • MySQL/MariaDB internally optimizes tt.taxonomy IN ('category') as tt.taxonomy = 'category', so this change will not negatively impact performance either (same with t.name IN ('Foo')).
Last edited 7 years ago by johnjamesjacoby (previous) (diff)

@ocean90
7 years ago

#103 in reply to: ↑ 98 @ocean90
7 years ago

get_term_by( 'id' )

Before:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE t.term_id = 4"

After:

SELECT t.*, tt.* FROM wptests_terms AS t INNER JOIN wptests_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('category') AND t.term_id IN ( 4 ) LIMIT 1

The after query includes a check for the taxonomy which the before query doesn't have.

Before patch: Tests_Term_GetTermBy->test_get_term_by_id, get_term_by, get_term, WP_Term::get_instance
With 21760.7.patch: Tests_Term_GetTermBy->test_get_term_by_id, get_term_by, get_terms, WP_Term_Query->query, WP_Term_Query->get_terms

If someone passes an ID to get_term_by() we can continue short-circuiting the function and use get_term() directly like before. Done in this commit.

@ocean90
7 years ago

#104 @ocean90
7 years ago

  • Keywords needs-testing has-unit-tests added; needs-unit-tests removed

21760.8.patch addresses the feedback from @boonebgorges on the PR and comment:103.

@ocean90
7 years ago

#106 @ocean90
7 years ago

  • Keywords commit added

I did a small benchmark test with this script. These are the results after 100 iterations:

Total

Time Queries Memory
Before 0.4824721813 1900 2.751159668
After (no persistent cache) 0.4861509800 400 3.027343750
After (with persistent cache) 0.4600477219 4 3.149253845

Difference:

Time Queries Memory
After (no persistent cache) +0.003678798676 -1500 +0.2761840820
After (with persistent cache) -0.022424459460 -1896 +0.3980941772


Average

Time Queries Memory
Before 0.004824721813 19 0.02751159668
After (no persistent cache) 0.004861509800 4 0.0302734375
After (with persistent cache) 0.004600477219 0.04 0.03149253845

Difference:

Time Queries Memory
After (no persistent cache) +0.00003678798676 -15 +0.002761840820
After (with persistent cache) -0.00022424459460 -18.96 +0.003980941772

Numbers per iteration can be found in this sheet.

Suggesting 21760.9.patch for commit.

#108 @johnjamesjacoby
7 years ago

The dramatic reduction in number of queries is clearly worth the minuscule memory increase, which makes sense considering it's now keeping objects in the cache (persistent or not.)

+1 to 21760.9.patch also.

#109 @ocean90
7 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 38677:

Taxonomy: Use WP_Term_Query in get_term_by().

WP_Term_Query already supports querying terms by 'slug', 'name', and 'term_taxonomy_id'. Its additional arguments allow us to generate nearly the same SQL queries as before.
This change has one yuge benefit: the term queries are now cached.

Add tests to increase coverage of get_term_by().

Props spacedmonkey, boonebgorges, johnjamesjacoby, pento, ocean90.
Fixes #21760.

#110 @pavelevap
7 years ago

Interesting, I had a mistake in my code using get_term_by( 'ID', ... and it worked until 4.6. Now, I had to change ID to id in latest WP 4.7 beta. I am not sure if it si related somehow to this issue...

#111 @vnsavage
7 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Type changed from enhancement to defect (bug)

@ocean90 - this change can lead to endless loop/stack exhaustion in existing plugins where a filter on get_terms() calls get_term_by() inside. An example of such usage is Jetpack's featured content.

#112 follow-up: @ocean90
7 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed
  • Type changed from defect (bug) to enhancement

@vnsavage Can you please open a new ticket? We can look into it but I'm not sure if we can do anything for this case. Note that this is already fixed in Jetpack. See also https://wordpress.slack.com/archives/core/p1480465372004559.

#113 in reply to: ↑ 112 @barry
7 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

Replying to ocean90:

@vnsavage Can you please open a new ticket? We can look into it but I'm not sure if we can do anything for this case. Note that this is already fixed in Jetpack. See also https://wordpress.slack.com/archives/core/p1480465372004559.

I am not sure why there needs to be a new ticket. This ticket is where the breaking change was committed so it seems best to keep all of the conversation in one place.

I think at least the breaking change part of this commit needs to be reverted. It not only affects every site running the latest release of Jetpack and a supported theme, but also at least the following themes from the .org theme repository:

anderson-lite
courage
dynamic-news-lite
glades
momentous-lite
rubine-lite
smartline-lite

Keep in mind that between auto-updates and host-provided WP core upgrades, lots of sites will be upgraded without user intervention when 4.7 is released. If we knowingly break even a small % of those sites it will severely impact the trust we have built in our so far, pretty awesome, auto-upgrade track record.

Also of note, this triggers a segmentation fault in some php configurations which could impact unrelated requests (if it's able to create memory corruption, etc). Actual impact depends on the specific configuration, but it could be pretty bad.

#114 @dd32
7 years ago

  • Keywords commit removed

Okay, so after looking through this, [38677] has to be reverted from the 4.7 branch, and either something put in place for trunk/4.8 or reverted there too (not immediately though).

Since it's not mentioned, here's a minimal example of the breakage (there are other ways of triggering it, but this is the most direct)

add_filter( 'get_terms', function( $terms ) {

	get_term_by( 'slug', 'foo', 'category' );

	return $terms;
} );

get_term_by( 'slug', 'foobar-ugh', 'category' );

I'll admit that using term functions within term filters seems like a bad idea - something is going to break eventually - unfortunately though, it's worked, and filtering terms has become a very popular thing to do.

There's options for fixing this, the most obvious one is adding an $arg to get_terms() to skip running the get_terms filter - such as $args['suppress_filters'], which would also need to apply to WP_Term_Query IMHO. Doing that feels hacky though.

#115 @pento
7 years ago

+1 to revert in 4.7, add a fix in 4.8. suppress_filters at least matches what folks are used to in WP_Query, but I'm not sure that's a good reason to use it.

#116 @dd32
7 years ago

In 39454:

Revert [38677] from the 4.7 branch.
This avoids fatal errors caused with recursive calling of term functions within the get_terms filter.

See #21760.

#117 @dd32
7 years ago

  • Keywords early needs-patch added; has-patch needs-testing has-unit-tests removed
  • Milestone changed from 4.7 to 4.8

I missed it in the revert, but I retained the tests in tests/phpunit/tests/term/getTermBy.php as they passed before and after, the tests in tests/phpunit/tests/term/cache.php were removed as the function is back to having no caching (4.6 behaviour).

#118 @dd32
7 years ago

Here's some code in one of the themes that triggers it: https://themes.trac.wordpress.org/browser/dynamic-news-lite/1.4.7/inc/featured-content.php#L52

@ocean90 @johnjamesjacoby @boonebgorges I'd appreciate if one of you could look into this ticket and fix it in trunk sooner rather than later, else I'll be forced to revert this from trunk too to fix wp-themes.com previews.

#119 @johnjamesjacoby
7 years ago

I will dedicate time on Monday for this. Thanks @dd32 for wrangling.

@ocean90
7 years ago

#120 @ocean90
7 years ago

  • Keywords has-patch has-unit-tests added; early needs-patch removed

21760.10.patch adds a suppress_filter parameter which is only used for get_terms in get_terms().

#121 @dd32
7 years ago

21760.10.patch looks good to me, I was thinking it should be marked internal or something, but I don't see the harm in it - if you pass it, you're doing so for a reason.

@sstoqnov
7 years ago

Return false if the term value is empty

#122 @sstoqnov
7 years ago

  • Keywords needs-refresh needs-unit-tests added; has-unit-tests removed
  • Severity changed from normal to major

Hi everybody.

I found a small bug, when I was trying to fix this issue. https://github.com/Automattic/jetpack/issues/6607
When we add an empty $value, the response is always the tag with the lowest term_id.

I dig a bit more deeper and I found that WP_Term_query->get_terms method checks is the value is empty and doesn't add the where clause if it's ( https://core.trac.wordpress.org/browser/trunk/src/wp-includes/class-wp-term-query.php#L477 )

Here is an example queries:

When we have value

SELECT t.*, tt.* FROM test_terms AS t INNER JOIN test_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('post_tag') AND t.name IN ('test') LIMIT 1

When the value is empty

SELECT t.*, tt.* FROM test_terms AS t INNER JOIN test_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('post_tag') LIMIT 1

As you can see the AND t.name IN ('test') is missing from second request. 21760.11.patch fix this issue.

#123 @boonebgorges
7 years ago

In 40275:

Don't run 'get_terms' filter when querying for terms within get_term_by().

Historically, it has been possible to call get_term_by() within
a 'get_terms' filter callback. Since get_term_by() was refactored
to use get_terms() internally [38677], callbacks of this nature
have resulted in infinite loops.

As a workaround, we introduce a 'suppress_filter' option to get_terms(),
and use it when calling the function from within get_term_by().

Props ocean90.
See #21760.

#124 @boonebgorges
7 years ago

  • Keywords needs-patch added; has-patch needs-refresh removed

@sstoqnov Thank you for the report and for the patch. You're right that this behavior is incorrect.

Prior to the refactor in [38677], get_term_by( 'name', '', 'post_tag' ) would always return false. Not because of a special case, as you suggest in 21760.11.patch, but because it'd result in a query like SELECT ... WHERE name = '', and it's impossible to create a term (through wp_insert_term(), at least) with an empty string as name or slug.

A blanket empty( $value ) check is probably too broad, though. It'd cause failures in cases like the following: a term has the name '0', and you try to fetch it using get_term_by( 'name', '0', 'post_tag' ). This kind of query is successful prior to [38677]. A more precise logic will probably convert 'name' or 'slug' $value to a string, and reject anything with 0 strlen, but this will need some unit tests. @sstoqnov Fancy writing some? :)

This ticket was mentioned in Slack in #meta by sergey. View the logs.


7 years ago

@sstoqnov
7 years ago

#126 @sstoqnov
7 years ago

  • Keywords has-patch has-unit-tests added; needs-unit-tests needs-patch removed

@boonebgorges Thank you for your comment!
The example case is very specific, but I agree that it's possible.

I've uploaded a new patch including unit tests.
Note that I've changed the WP_Term_Querytoo, because get_terms didn't work when I tried to fetch the terms using '0' string for slug.

Feedback is much welcomed :)

#127 @boonebgorges
7 years ago

@sstoqnov Thanks very much for the excellent patch.

I think this wraps things up for the current ticket.

#128 @boonebgorges
7 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 40293:

Improve querying for terms with falsey names and slugs.

Prior to [38677], get_term_by() would always return false if
an empty string were passed as the queried 'name' or 'slug'. The
refactor to use get_terms() broke this behavior; inappropriately
imprecise empty() checks caused the 'name' or 'slug' clause to be
discarded altogether when fetching terms, resulting in an incorrect
term being returned from the function.

We fix the regression by special-casing truly empty values passed
to get_term_by(), and ensuring that WP_Term_Query is properly
able to handle 0 and '0' term queries.

Props sstoqnov.
Fixes #21760.

This ticket was mentioned in Slack in #core by desrosj. View the logs.


6 years ago

Note: See TracTickets for help on using tickets.