WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 3 years ago

#37198 closed enhancement (fixed)

Use `WP_Term_Query` for `wp_get_object_terms()`

Reported by: flixos90 Owned by: boonebgorges
Milestone: 4.7 Priority: normal
Severity: normal Version: 4.6
Component: Taxonomy Keywords: has-patch
Focuses: Cc:

Description

As discussed in #35381, WP_Term_Query should be used for wp_get_object_terms() as well. It works almost similar, except for that it requires to pass object IDs and make an additional SQL JOIN with the term_relationships table. In addition to a better structured code, this would also provide significant cache benefits.

Attachments (3)

37198.diff (20.4 KB) - added by flixos90 3 years ago.
37198.2.diff (15.2 KB) - added by boonebgorges 3 years ago.
37198.3.diff (23.6 KB) - added by boonebgorges 3 years ago.

Download all attachments as: .zip

Change History (18)

@flixos90
3 years ago

#1 @flixos90
3 years ago

  • Keywords has-patch added; needs-patch removed

I played around with this to create a first draft for a patch in 37198.diff.

In addition to conditionally adding the necessary JOIN statement (only when object IDs are passed), there were some additional tweaks necessary as several things that used to happen in get_terms() (and now WP_Term_Query) were not applicable / compatible with how wp_get_object_terms() work. Therefore I include checks for whether object_ids is empty or not in several locations to be backwards-compatible.

Also, new values for the fields argument are supported as needed for wp_get_object_terms() (all_with_object_id, tt_ids, slugs).

There is a new private method post_process_terms which is used to ensure that the terms are sanitized if object IDs are passed (wp_get_object_terms() expects sanitized data). This method is not only run after doing the SQL query, but also after retrieving cached results.

#2 @DrewAPicture
3 years ago

  • Milestone changed from Awaiting Review to Future Release

#3 @lukecavanagh
3 years ago

The patch applies cleanly.

This ticket was mentioned in Slack in #core by flixos90. View the logs.


3 years ago

@boonebgorges
3 years ago

#5 follow-up: @boonebgorges
3 years ago

  • Keywords 2nd-opinion added; early removed
  • Milestone changed from Future Release to 4.7

@flixos90 Thank you for the hard work you've done on the first pass, and sorry for the time it took to get around to reviewing it. It's a hard one :)

Your patch is extremely conservative, in the sense that (a) it guarantees 100% backward compatibility, and (b) it attempts not to touch any existing code beyond what must be touched to refactor wp_get_object_terms(). I like this as a general approach :) But in this case, I think that we can simplify our strategy by looking a bit deeper at the back compat issues.

37198.2.diff is a somewhat simplified version of your patch. (It's also sloppy - I left out some documentation and default params. This will be cleaned up.) Here are the important differences:

  • // Verification of term hierarchy must be skipped for wp_get_object_terms(). If I'm reading it correctly, this block in 37198.diff was meant to address Test_Term_WpGetObjectTerms::test_parent(). These tests fail without the block. The reason is that the tests were written incorrectly: they define hierarchical terms, in a taxonomy that is non-hierarchical. By changing the tests, we can avoid the additional logic. This is technically a compatibility break. Currently, if you manually create hierarchical terms in a non-hierarchical taxonomy, and you call wp_get_object_terms() with 'parent', only the object-terms with the correct 'parent' will be returned. With my patch, doing this same thing will result in an empty array being returned. IMO, passing 'parent' when the taxonomy is non-hierarchical is not meaningful, and it's OK to break it - but I would be interested to hear what you think.
  • When building the SELECT clause, I default to selecting t.*, tt.* whenever possible. The performance differences are negligible, and keeping a smaller number of variations in the SQL query means better cache coverage. Note that this part of the patch will need refreshing when the main SELECT clause in WP_Term_Query grabs only the term_id (or term_taxonomy_id) column. See #37189, #34239.
  • I removed the post_process_terms() logic. The sanitization you mention was introduced in [26010] to ensure that integer values are, in fact, integers. But we now have WP_Term for this purpose. So, instead, my patch sends results through get_term() when fields=all_with_object_id, and also adds some integer-casting when fields=ids and some other numeric values. I think this logic can probably be simplified even further with #34239 etc, but for now I think it's clearer to ensure that WP_Term objects are returned.
  • You had a special clause for 'orderby' in wp_get_object_terms() to 'id'. I think this is related to the fact that some of our unit tests passed orderby=t.term_id to wp_get_object_terms(). (You also have some logic in parse_orderby() for this.) But I'm fairly sure that these tests are incorrect: t.term_id has never been officially supported as a value for 'orderby'. It worked in some cases because, since it didn't match a whitelisted 'orderby' value, it fell through to the default - which happened to be term_id. However, I don't think this is something that we must continue to support. I could be convinced otherwise if we found a bunch of uses of orderby=t.term_id in the wild, but even then I'd strongly consider the break. My patch fixes the tests, instead.

So, my approach is a bit more cavalier, but results in a considerably smaller patch. What do you think, @flixos90 ?

#6 in reply to: ↑ 5 @flixos90
3 years ago

Replying to boonebgorges:

Your patch is extremely conservative, in the sense that (a) it guarantees 100% backward compatibility, and (b) it attempts not to touch any existing code beyond what must be touched to refactor wp_get_object_terms(). I like this as a general approach :) But in this case, I think that we can simplify our strategy by looking a bit deeper at the back compat issues.

I'm glad you see it that way. :) I wasn't sure about how backward-compatible it should be, but indeed, especially the post_process_terms() logic in my patch to take care of previous handling in wp_get_object_terms() was kind of painful, so I like handling it in a unified way, plus it's smaller and better readable like you said.

IMO, passing 'parent' when the taxonomy is non-hierarchical is not meaningful, and it's OK to break it - but I would be interested to hear what you think.

I agree, returning an empty result when passing a parent to a non-hierarchical taxonomy is actually what I would expect it to return.

Your other changes make sense to me as well. I'm not exactly sure about whether getting rid of the extra sanitization I did could introduce BC issues, but you probably have a better overview of that. I think we should mention this in a dev-note, and maybe get this in some time soon to see if anything breaks.

One thing I'd like to add back in is only allowing "object_id-related" query vars if object_ids is passed and not empty. On the one hand, providing fields => all_with_object_id doesn't make sense without passing object_ids anyway, but on the other hand we will be more foolproof by preventing it entirely. The same applies to sorting by term order (this can break the SQL query atm), although I don't like how I did it in the first patch (the method is encapsulated otherwise, but for this particular reason accesses a class property - passing a bool like $has_object_ids to the method is structurally better I think).

Are you planning to add documentation and default params back in? You could probably merge a lot of my first patch for that. Looking forward to see a comprehensive patch. :)

@boonebgorges
3 years ago

#7 follow-up: @boonebgorges
3 years ago

Are you planning to add documentation and default params back in? You could probably merge a lot of my first patch for that. Looking forward to see a comprehensive patch. :)

Sorry about this, I got lazy :)

37198.3.diff is a more complete patch. It adds back in the documentation etc. Also:

  • I'm enforcing the 'term_count' thing by changing the value of 'order' to something legal before passing it to parse_orderby()
  • I moved the deduplication logic so that it can be done in a single block (instead of conditionally using array_unique())
  • Added some tests

So, fun fact, this change adds caching to all wp_get_object_terms() calls. Yet we still have a separate {$taxonomy}_relationships cache. We can't easily tear the latter bit out, because it's scattered through the codebase. Plus, it's a more durable cache than the WP_Term_Query cache, because WP_Term_Query is based on last_changed, which will be bumped pretty frequently. Still, having two layers of cache makes me uneasy. I wrote a test that demonstrates that things are working OK. Do you have thoughts about it, @flixos90 ?

#8 in reply to: ↑ 7 @flixos90
3 years ago

Replying to boonebgorges:

  • I'm enforcing the 'term_count' thing by changing the value of 'order' to something legal before passing it to parse_orderby()

I think it's term_order, other than that it appears to be good solution. :)

So, fun fact, this change adds caching to all wp_get_object_terms() calls. Yet we still have a separate {$taxonomy}_relationships cache. We can't easily tear the latter bit out, because it's scattered through the codebase. Plus, it's a more durable cache than the WP_Term_Query cache, because WP_Term_Query is based on last_changed, which will be bumped pretty frequently. Still, having two layers of cache makes me uneasy. I wrote a test that demonstrates that things are working OK. Do you have thoughts about it, @flixos90 ?

That's interesting. I don't think it will be a problem since the caches shouldn't interfere with each other. Also, the {$taxonomy}_relationships cache has a more specific use-case (terms of _one_ taxonomy for _one_ object) and it is actually built using wp_get_object_terms(), so I think we can ignore it here. They both have their use-case.

What we could theoretically do is actually make use of that cache in wp_get_object_terms():

  • check if _one_ object ID and _one_ taxonomy is provided and if $orderby doesn't include term_order
  • if so, check the {$taxonomy}_relationships cache for that object ID using get_object_term_cache()
  • if the cache brings a successful response, we adjust the query arguments:
    • remove the object_ids argument
    • add the term IDs to a include argument
  • send the (possibly modified) arguments to a WP_Term_Query

If any of the above checks fails, we proceed as usual instead. I'm not sure if that actually makes sense performance-wise, but just wanted to mention it as a possibility. When having the relationships cached, this would allow for much simpler queries (and I assume there _are_ many cases where terms for one object ID and one taxonomy are requested.

#9 @boonebgorges
3 years ago

  • Keywords 2nd-opinion removed
  • Owner set to boonebgorges
  • Status changed from new to assigned

I think it's term_order, other than that it appears to be good solution. :)

Whoops :)

That's interesting. I don't think it will be a problem since the caches shouldn't interfere with each other.

Right. For posterity's sake, here's how it'll work in practice. Functions like get_the_terms(), which call get_object_term_cache(), do something like the following:

1. $terms = get_object_term_cache( $post_id, $taxonomy );
2. if ( false === $terms ) {
3.    $terms = wp_get_object_terms( $post_id, $taxonomy );
4.    // then set the {$taxonomy}_relationships cache
5. }

Under normal circumstances, one of the following conditions will hold:

  1. The relationship cache is full, and so is hit on line 1.
  2. The relationship cache is empty, but the cache related to the wp_get_object_terms() call is full, so that cached value is returned and used to populate the {$taxonomy}_relationships cache.
  3. The relationship cache is empty, as is the wp_get_object_terms() cache. The database is hit and the term query cache is populated. This is then used to populate the {$taxonomy}_relationships cache.

What we could theoretically do is actually make use of that cache in wp_get_object_terms()

That's an interesting thought. Essentially, it'd save us a JOIN against wp_term_relationships. Let's explore the possibility in a separate ticket, with some benchmarks. (It'll take a bit of confusing code to make it work right, so it's not worth doing if it doesn't result in visible benefits.)

#10 @boonebgorges
3 years ago

#31105 was marked as a duplicate.

#11 @boonebgorges
3 years ago

In 38667:

Taxonomy: Use WP_Term_Query when querying for object terms.

The new 'object_ids' parameter for WP_Term_Query allows queries for
terms that "belong to" a given object. This change makes it possible
to use WP_Term_Query inside of wp_get_object_terms(), rather than
assembling a SQL query.

The refactor has a couple of benefits:

  • Less redundancy.
  • Better consistency in accepted arguments between the term query functions. See #31105.
  • Less redundancy.
  • Object term queries are now cached. The get_object_term_cache() cache remains, and will be a somewhat less fragile secondary cache in front of the query cache (which is subject to frequent invalidation).
  • Less redundancy.

A small breaking change: Previously, if a non-hierarchical taxonomy had
terms that had a non-zero 'parent' (perhaps because of a direct SQL
query), wp_get_object_terms() would respect the 'parent' argument.
This is in contrast to WP_Term_Query and get_terms(), which have
always rejected 'parent' queries for non-hierarchical taxonomies. For
consistency, the behavior of get_terms() is being applied across the
board: passing 'parent' for a non-hierarchical taxonomy will result in
an empty result set (since the cached taxonomy hierarchy will be empty).

Props flixos90, boonebgorges.
See #37198.

#12 @boonebgorges
3 years ago

#29942 was marked as a duplicate.

This ticket was mentioned in Slack in #core by chriscct7. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


3 years ago

#15 @boonebgorges
3 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

No problems reported, which I interpret as a confirmation that the change is perfect in every conceivable way.

Note: See TracTickets for help on using tickets.