WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 3 years ago

Last modified 22 months ago

#32044 closed defect (bug) (fixed)

is_object_in_term() leaks DB queries

Reported by: joehoyle Owned by: boonebgorges
Milestone: 4.4 Priority: normal
Severity: normal Version: 4.1.1
Component: Taxonomy Keywords: needs-patch needs-unit-tests
Focuses: Cc:

Description

is_object_in_term() should either use the Object Cache'd term relationships and populate the cache for next time, or not use them at all. Currently, is_object_in_term() reads from the cache with get_object_term_cache(), and falls back to wp_get_object_terms() - in that case, it doesn't then update the post-term relationship caches for next time.

I'm not sure if is_object_in_term() is meant to respect object cache'd relationships or not - but right now its doing it half way. I don't quite know the history of the post-term relationships not being all that trusted (which is presumably why there is wp_get_obejct_terms() and get_the_terms() however, it seems just using the get_the_terms instead of wp_get_object_terms with have no regressions in this case, and cause the cache to be primed for 'next time'.

Attached patch does just that.

Attachments (1)

32044.diff (657 bytes) - added by joehoyle 3 years ago.

Download all attachments as: .zip

Change History (18)

@joehoyle
3 years ago

#1 @joehoyle
3 years ago

Of not: this also affects has_term and has_post_format as they use this function.

#2 @boonebgorges
3 years ago

  • Component changed from Cache API to Taxonomy
  • Keywords 4.3-early added
  • Milestone changed from Awaiting Review to Future Release

Good catch. Let's look at this for 4.3. It would be nice to have some unit tests describing the proper cache behavior here.

#3 @boonebgorges
3 years ago

  • Keywords needs-patch needs-unit-tests added; 4.3-early removed
  • Milestone changed from Future Release to 4.3

I've looked a little more into this. A couple takeaways:

  • We can't use get_the_terms(). It assumes that the object type is a post type, while is_object_in_term() should be object_type-agnostic.
  • The easiest thing to do is to tear cache support out of the function :)
  • get_object_term_cache() requires object ids and a taxonomy name. update_object_term_cache() requires object IDs and an *object type*. When we're inside of is_object_in_term(), we don't have access to object types, which is why we can't use update_object_term_cache() to prime the cache at this point.
  • This all points to the fact that our cache support for term relationships is pretty lousy. A proper fix would be to add support to wp_get_object_terms(). This will probably require modifying update_object_term_cache() so that you can pass in an array of taxonomies, rather than relying on get_object_taxonomies(). This will be a fair bit of work, but it will result in broader performance improvements. However, the fact that wp_get_object_terms() does not have cache support is semi-famous in the WP world, so I'll have to do some more research into the back story of why that's the case.

I'm going to move this into the 4.3 milestone for further investigation. Worst case, we should at least make is_object_in_term() more consistent. But maybe we can use this as an excuse to fix term-relationship caches for real.

#4 @joehoyle
3 years ago

@boonebgorges thanks for looking into it! I had overlooked the whole object vs post thing... does anyone actually use that?! Looking at things like register_taxonomy_for_object_type it seems a lot of that stuff can only use post types, not other object types. Given that get_object_term_cache's key is $id, $taxonomy I'm not sure how this works if you were to use ID: 1 for two different object types - as they'll be sharing the same cache keys.

I'm going to move this into the 4.3 milestone for further investigation. Worst case, we should at least make is_object_in_term() more consistent. But maybe we can use this as an excuse to fix term-relationship caches for real.

I think we should at least have is_object_in_term update the relationship caches. In my somewhat limited experience of contributing to core, "use this as an excuse to fix a bigger problem" often means this will be left for some time. I'm +1 on making incremental changes to at least make it not cause queries when it doesn't need to.

#5 @boonebgorges
3 years ago

There are lots of plugins that use taxonomies for things other than post types. 'user' is a common one. (I have written a number that use taxonomies for BuddyPress objects.) is_object_in_term() currently works for these cases (except perhaps in edge cases where an object type not in the term shares an ID with a post that is in the term - this needs testing). Updating the general relationship caches in is_object_in_term() is going to cause weird problems if we do it in isolation. If we don't have a general solution for 4.3, then we should remove the cache check in this function altogether.

#6 follow-up: @joehoyle
3 years ago

Ah, sorry missed that we needed the type to call update_object_term_cache. However, I don't see why we can't do wp_cache_add($object_id, $terms, $taxonomy . '_relationships'); it's going to read that exact cache key via is_object_in_term

#7 in reply to: ↑ 6 @boonebgorges
3 years ago

Replying to joehoyle:

Ah, sorry missed that we needed the type to call update_object_term_cache. However, I don't see why we can't do wp_cache_add($object_id, $terms, $taxonomy . '_relationships'); it's going to read that exact cache key via is_object_in_term

The current behavior is '{$taxonomy}_relationships' will only include relationships with post type objects. Updating the cache in is_object_in_term() will change the behavior by adding non-post-type objects. I'm not sure whether or how this change will manifest itself, but surely it'll cause bugs somewhere, so I want to be conservative about modifying what gets cached.

#8 @joehoyle
3 years ago

Got you - so relationship caches are currently only used for object type == post. Maybe I'll write some tests to expose the cache issue with multiple object types with the same ID getting polluted caches.

#9 @obenland
3 years ago

  • Owner set to boonebgorges
  • Status changed from new to assigned

This ticket was mentioned in Slack in #core by jorbin. View the logs.


3 years ago

#11 @jorbin
3 years ago

  • Milestone changed from 4.3 to Future Release

As Boone pointed out, this has the potential for breaking changes. Therefore it, needed to go in before beta so it gets maximum testing. Punting back to future release.

#12 @wonderboymusic
3 years ago

Bump alert

#13 @boonebgorges
3 years ago

In 34811:

Don't prime term meta cache in is_object_in_term().

Term meta is not necessary in this case, so priming the cache wastes a query.

See #10142, #32044.

#14 @boonebgorges
3 years ago

  • Milestone changed from Future Release to 4.4

Circling back around here. I think the real historical reason why is_object_in_term() doesn't update the object-term cache is because object-term caches are usually update en masse, during update_object_term_cache(). The latter function updates an object's term relationships across all taxonomies.

I racked my brain trying to think of a way to cause a bug by setting the object-term cache in is_object_in_term(), and I couldn't think of anything. For example, there could be a problem if a user and a post had the same ID, and a cache that had been primed for the post contained different terms than belonged to the user. But this would require that a single taxonomy was being shared by users and posts, which causes all sorts of other bugs. (wp_get_object_terms() cannot distinguish between different object types, so the SQL query itself would return incorrect results.)

This points toward better general cache support for all taxonomy relationships - maybe at the level of wp_get_object_terms() - but for now, let's put it here and see if the world explodes.

#15 @boonebgorges
3 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 34812:

Update the taxonomy relationship cache in is_object_in_term().

This function attempts to read from the relationship cache, and uses any data
it finds. If it finds no data, it does a query for the data it needs. Since we
are going to the trouble to query for the relationships, and since we are
already using cached data when available, let's go ahead and cache it for
later use.

Props joehoyle, boonebgorges.
Fixes #32044.

#16 @boonebgorges
22 months ago

In 38277:

In is_object_in_term(), return error object rather than caching it.

This change prevents an error object from being stored in the cache,
and prevents notices from being thrown when plucking term IDs to put
into the relationship cache.

See #32044, #36814.

Props rpayne7264.
Fixes #37721.

#17 @boonebgorges
22 months ago

In 38346:

In is_object_in_term(), return error object rather than caching it.

This change prevents an error object from being stored in the cache,
and prevents notices from being thrown when plucking term IDs to put
into the relationship cache.

See #32044, #36814.

Merges [38277] to the 4.6 branch.

Props rpayne7264.
Fixes #37721.

Note: See TracTickets for help on using tickets.