WordPress.org

Make WordPress Core

Opened 15 months ago

Closed 9 months ago

Last modified 9 months ago

#35381 closed defect (bug) (fixed)

Introduce `WP_Term_Query`

Reported by: boonebgorges Owned by:
Milestone: 4.6 Priority: normal
Severity: normal Version:
Component: Taxonomy Keywords: has-patch needs-testing
Focuses: Cc:

Description

WP_Query, WP_User_Query, and WP_Comment_Query parallel each other in a few ways that make them easier to maintain, easier to enhance, and easier for developers to use. At some point in the future - say, when WP can start using PHP traits - we may actually be able to abstract out some of the shared functionality. (RIP WP_Object_Query.)

In the meantime, get_terms() is the black sheep of the family. The more we add onto it, the harder it gets to maintain. See eg #34996. Let's impose some structure on it, so that it looks and feels more like our other object query classes.

Attachments (5)

35381.diff (50.2 KB) - added by flixos90 11 months ago.
WP_Term_Query class
35381.2.diff (48.1 KB) - added by boonebgorges 10 months ago.
35381.3.patch (711 bytes) - added by spacedmonkey 10 months ago.
35381.3.diff (2.4 KB) - added by boonebgorges 10 months ago.
term_id.patch (945 bytes) - added by spacedmonkey 10 months ago.

Download all attachments as: .zip

Change History (40)

#1 @spacedmonkey
11 months ago

Massive +1 to this. There are so many term based uncached functions (https://vip.wordpress.com/documentation/caching/uncached-functions/). It would be a massive performance win, not to mention it would be clear code.

#2 @flixos90
11 months ago

I just finished working on an initial approach for the class. Will post this a little later tonight with some notes.

@flixos90
11 months ago

WP_Term_Query class

#3 @flixos90
11 months ago

  • Keywords has-patch needs-testing dev-feedback added; needs-patch removed

35381.diff is a first take on this class. I used a similar approach like the one we have been using for other more recent query classes (like WP_Comment_Query and now WP_Site_Query) where the main query only queries IDs and then uses get_term() to get objects. My goal for this first patch was to mimic the functionality of the get_terms() function. In a later iteration the class should also be able to be used by wp_get_object_terms() - it isn't yet, but this is just a note to keep in mind for further enhancement.

Some details about the implementation:

  • Backwards compatibility is a major issue here: there were a number of filters in get_terms() that I have now moved into the class while also adding the usual actions and filters that we have in other query classes. My tests so far worked fine, but we should have a detailed look at all the filters and whether they work as intended. I also added a $suppress_filters argument which should probably (when active) disable more filters than it currently does in this implementation.
  • I left the check whether a taxonomy exists (possibly throws WP_Error) and whether there is a hierarchical taxonomy in get_terms(). I think both of them are not explicitly required to run for any term query.
  • I found a bug that is possible to happen even in current versions of WordPress. In version 4.5 we allowed to query terms of any taxonomy - in this case the $taxonomy argument is null. However there are multiple occassions where it is assumed to be an array. In all cases of a foreach loop, I added a check whether $taxonomy is a trueish value (in this case it's always an array). At one point there is a reset() call which obviously will throw a notice at this point if $taxonomy is null. Try the following query for example: get_terms( array( 'name' => 'uncategorized' ) ). We should remove the requirement for passing a taxonomy for other functions (since it's no longer necessary in a lot of cases since term splitting) - for this particular notice, we need to see if we can adjust sanitize_term_field(). Btw, this also makes the documentation slightly incorrect where $taxonomy is stated to always be an array.
  • Besides the $suppress_filters argument, I also added a no_found_rows argument (similar to other query classes) and a $count argument. The latter is already present through $fields = 'count', but I thought we could standardize it, similar to the other query classes. It's a duplicate though, so if that is an issue, maybe we can deprecate $fields = 'count' - or otherwise get rid of $count again if there's no other way.
  • Another problem in my opinion is the get_terms_fields filter. Since this class always queries for IDs (or a count), using the get_terms_fields filter to modify the requested fields in any way will most probably mess up the query - it's basically not of any use at this point. I wonder if we can deprecate/drop it - it does not affect what the query returns anyway with this implementation.
  • I added a private _prime_term_caches() function, similar like we have for posts and comments. This function is used internally by the query class.
Last edited 11 months ago by flixos90 (previous) (diff)

#4 @spacedmonkey
11 months ago

I have been reviewing performance in the WordPress core and noticed that all the terms functions are the worst performing in my tests. All of the terms functions performance a high number sql queries (with joins) and there is no consistent caching method. Many of these functions do not cache at all. I have done a quick search of the code base of functions what query the terms table. Perhaps the internals of these functions can be replaced to use WP_Term_Query. I have ordered this list in order likelyhood they could implement WP_Term_Query.

get_terms
get_term_by
term_exists
wp_get_object_terms
wp_unique_term_slug
_split_shared_term
wp_insert_term
global_terms

Like the WP_Site_Query ticket, the patches should be broken up two. One for the class and the other for the implementation. Otherwise, good work @flixos90

#5 @boonebgorges
11 months ago

@spacedmonkey I agree that it makes sense to move toward centralizing term-related queries, yes. Let's not do that in this specific ticket, but let's keep it in mind to make sure that the architectural decisions made here don't preclude the kind of improvements you're talking about.

@flixos90 This is really excellent. Thanks for working on it.

Running $ phpunit --group taxonomy, a bunch of stuff is failing. Most of the notices appear to be due to some variable names that weren't changed properly, but there also appears to be a bunch of tests related to hierarchical terms that are related to each other. I haven't had a chance to debug this in detail.

Most of your improvements sound worth considering, but I'm concerned about doing them all at once, especially in the absence of unit tests that describe the issues:

I also added a $suppress_filters argument which should probably (when active) disable more filters than it currently does in this implementation.

The only place where we currently have this feature is in WP_Query. I think we need a separate ticket to discuss whether it's a feature that we need here, and if so, how it should behave.

I found a bug that is possible to happen even in current versions of WordPress

This is pretty complicated, and I'm very wary of touching it without tests. I suggest we fix it separately, either before or after WP_Term_Query. I wonder whether your fix is related to the unit test failures we're seeing. (It sounds like the fix might be straightforward enough to do *before* and then include in a refreshed patch here, but I'd like to hear what you think about it.)

Another problem in my opinion is the get_terms_fields filter. Since this class always queries for IDs (or a count), using the get_terms_fields filter to modify the requested fields in any way will most probably mess up the query

Yeah. I'm not a fan of this filter, or 'fields' arguments in general, since they tend to result in cache misses. I would imagine that most people using the filter are doing it to remove fields from the SELECT statement, in which case there are no real compatibility concerns. However, it's possible that people are using it to *add* fields, possibly related to another table (which would be joined in another filter like 'terms_clauses'). This needs research before we make a decision. A search through the plugin repo is probably a good place to start, as is some reading on the backstory behind 'get_terms_fields' and friends. See #9004.

@flixos90 If you're able to debug the unit test issues and perhaps address some of the items above, that would be great. This ticket is something I would like to prioritize for 4.6, and I'll be able to dedicate time to refining the patch, but not until next week.

This ticket was mentioned in Slack in #core by spacedmonkey. View the logs.


11 months ago

This ticket was mentioned in Slack in #core by spacedmonkey. View the logs.


11 months ago

#8 @ocean90
11 months ago

  • Milestone changed from Future Release to 4.6

#9 @boonebgorges
10 months ago

In 37519:

Tests: get_terms() 'search' test should have more precise fixtures..

Without a fixture that does not match the search term, it's possible for the
test to pass even if the search clause isn't built properly.

See #13992, #35381.

#10 follow-up: @boonebgorges
10 months ago

I've spent some time over the last few days with this issue. 35381.2.diff is an attempt to reconcile @flixos90's original patch with a more modest approach.

The largest hurdle in 35381.diff is the modification of the query so that it fetches only term IDs. get_terms() currently needs to fetch full term objects, because it does some post-query processing on them to handle 'child_of' and empty subcategories. See https://core.trac.wordpress.org/browser/tags/4.5.2/src/wp-includes/taxonomy.php?marks=1662-1696#L1644. It was the absence of this logic that was causing many of the unit test failures I described in my comment above.

Querying for term IDs, rather than full term objects, is something I would absolutely like to do. See #34239. But I feel that if we insist on it as a prerequisite for WP_Term_Query, it's going to hold up the current ticket forever. So I've rolled that change out in 35381.2.diff.

I changed a couple of other things from the original patch, too.

  • Pagination/FOUND ROWS stuff has been removed. We don't expose this anywhere in the interface, so I don't think there's a need for it here. If someone wants it in the future, we can talk about it then.
  • Since we're still querying for full term objects, there's no need for _prime_term_caches(). Though see #36814, where it will likely be introduced in the context of get_the_terms().
  • I don't think we need or necessarily want 'suppress_filters'. It's been removed.
  • I moved the 'get_terms' filter so that it runs only in get_terms(). The other filters have to stay where they are, for obvious reasons.
  • Some of the bits of argument parsing that you'd left in get_terms(), I moved to WP_Term_Query. I think we need most of it in every case where we're fetching terms. If you disagree, I'd be interested to hear the reasons.
  • I broke some 'orderby' logic into a new method parse_orderby_meta(). This is mainly to break up the huge parse_orderby() method, which is full of conditional statements. Like you, I left the 'get_terms_orderby' filter before the meta_query orderby stuff is parsed, but I wonder if anything would actually break by moving it afterward - it seems like, in fact, it may fix potential bugs.

Thoughts are welcome. I'll come back to this patch in a few days when I have a clearer head to reassess.

#11 in reply to: ↑ 10 @flixos90
10 months ago

Replying to boonebgorges:

Querying for term IDs, rather than full term objects, is something I would absolutely like to do. See #34239. But I feel that if we insist on it as a prerequisite for WP_Term_Query, it's going to hold up the current ticket forever. So I've rolled that change out in 35381.2.diff.

That makes sense to me as well. But let's deal with that in a later ticket. We should query only IDs at some point and by doing this also in a way align the behavior of this class with the other recent query classes. It will probably still cause some issues with backwards compatibility though, but at least we already have WP_Term_Query then.

  • Some of the bits of argument parsing that you'd left in get_terms(), I moved to WP_Term_Query. I think we need most of it in every case where we're fetching terms. If you disagree, I'd be interested to hear the reasons.

It's only the check for hierarchical taxonomies that you moved right? I can't quite remember why I left it in there, but looking at it now, I agree moving it into the class.

Looking good so far. What should be the next steps here? Can we start thinking about get_the_terms() yet?

#12 @spacedmonkey
10 months ago

Is there a good reason why the results are only cached for a day? See the following line

wp_cache_add( $cache_key, $terms, 'terms', DAY_IN_SECONDS )

If is confusing, a day seems like a random amount of time. The cache key changes on term cache clean anyway, so it should stay in cache forever right?

I also think that the value stored in cache should be just an array of ids. I don't think we should be storing the whole object in there, as it not required and will just use up memory.

#13 @boonebgorges
10 months ago

It's only the check for hierarchical taxonomies that you moved right? I can't quite remember why I left it in there, but looking at it now, I agree moving it into the class.

Yeah. The check for invalid taxonomies stays in get_terms(), because it's the only place where get_terms() currently returns a WP_Error. This way, we can simplify the return value of WP_Term_Query::get_terms() - it'll always be an array.

Is there a good reason why the results are only cached for a day?

[15583] #11431 Since that conversation, I feel like there's been a move in core to take less of an opinionated position on the internals of cache backends. The attitude is that it's really up to the cache, not WP, to set decent eviction policies. I'd be in favor of readdressing this and coming up with some sort of general policy about it, but this ticket is not the place, and we need more justification than the whim of the people in this conversation :)

I also think that the value stored in cache should be just an array of ids. I don't think we should be storing the whole object in there, as it not required and will just use up memory.

See #34239, comment 10, comment 11

Can we start thinking about get_the_terms() yet?

See #36814. Since get_the_terms() uses wp_get_object_terms(), I don't think it's directly related. I do think we should try to adapt WP_Term_Query for use in wp_get_object_terms(), but I think it will be much harder than what we've done here. We should set up a separate ticket for it so that it doesn't hold up the current improvement for 4.6. See #31105, #29942.

What should be the next steps here?

Let's do it.

#14 @boonebgorges
10 months ago

In 37572:

Introduce WP_Term_Query and use in get_terms().

WP_Term_Query is modeled on existing query classes, such as those used
for comments and users. It provides a more consistent structure for generating
term queries, and should make it easier to add new functionality in the future.

Props flixos90, boonebgorges.
See #35381.

#15 @flixos90
10 months ago

@boonebgorges I think you forgot to add the $term_query variable to the filter in 37572. :)

One other thing I'm curious about is why you changed the filter arguments to use $args and $taxonomies instead of relying on the query vars from WP_Term_Query like you did in 35381.2.diff. Or is that only because you missed that line? I thought the way in the patch was better because that way $args was filled with its defaults.

#17 @spacedmonkey
10 months ago

35381.3.patch adds improved caching.

#18 @boonebgorges
10 months ago

35381.3.patch adds improved caching.

See https://core.trac.wordpress.org/ticket/35381#comment:13. There are historical reasons why this cache has an expiration. A new ticket would be an appropriate place to discuss whether and how this can be changed. As for the caching of term IDs only, this patch is going to break the current caching strategy because we expect term objects when wp_cache_get() earlier in the method. In any case, that discussion would be better suited for #34239, which is about splitting the SQL query, but would bear directly on the question of what gets cached.

think you forgot to add the $term_query variable to the filter in 37572. :)

One other thing I'm curious about is why you changed the filter arguments to use $args and $taxonomies instead of relying on the query vars from WP_Term_Query like you did in 35381.2.diff​. Or is that only because you missed that line? I thought the way in the patch was better because that way $args was filled with its defaults.

I had to massage the patch to get it to reapply after some recent documentation changes in trunk. I must have missed this. I will fix it later.

#19 follow-up: @boonebgorges
10 months ago

In 37576:

Pass the proper values to get_terms action.

  • $term_query should be passed.
  • Second and third params should come from the $term_query->query_vars array, so that they're fully parsed.

These changes were missed in [37572].

Props flixos90, sebastian.pisula.
See #35381.
Fixes #36951.

#20 @DrewAPicture
10 months ago

In 37577:

Docs: Link up a reference to the get_terms_args filter in the hook doc for get_terms_defaults.

See #35381. See #36921.

#21 in reply to: ↑ 19 @jtsternberg
10 months ago

These changes were missed in [37572].

Does this, by chance, fix some issues users are experiencing with using get_terms the old way? Related: https://github.com/WebDevStudios/CMB2/issues/653

#22 follow-up: @boonebgorges
10 months ago

  • Keywords dev-feedback removed

@jtsternberg Thanks for bringing this to my attention.

Replying to jtsternberg:

These changes were missed in [37572].

Does this, by chance, fix some issues users are experiencing with using get_terms the old way? Related: https://github.com/WebDevStudios/CMB2/issues/653

No, it's related only to the values passed to the get_terms filter.

To follow up on the linked ticket: The changes in this ticket are designed to be 100% backward-compatible. The changes are completely internal to get_terms(); existing use of get_terms() should remain unchanged.

I don't have complete steps to reproduce, but based on https://github.com/WebDevStudios/CMB2/commit/20d4b9529f8285425a72a5c082016aa950c7660c I'm guessing that the issue is that arguments are being passed as a querystring (hide_empty=0) rather than an array. By the looks of things, this syntax was supported after [37572] only accidentally, because of the use of wp_parse_args(). When the use of wp_parse_args() early in get_terms() was removed in [37576], it uncovered this compatibility break.

35381.3.diff should fix the issue by ensuring that querystring-style arguments are supported, in both legacy syntax (( $taxonomy, $args )) and new syntax (( $args )). I'm using wp_parse_args() to do the string parsing, as it makes the syntax a bit clearer than referring directly to wp_parse_str().

@jtsternberg and/or @ipstenu, could you please test 35381.3.diff to see if it addresses the problem? This should supercede https://github.com/WebDevStudios/CMB2/commit/20d4b9529f8285425a72a5c082016aa950c7660c

#23 in reply to: ↑ 22 @jtsternberg
10 months ago

@jtsternberg and/or @ipstenu, could you please test 35381.3.diff to see if it addresses the problem? This should supercede https://github.com/WebDevStudios/CMB2/commit/20d4b9529f8285425a72a5c082016aa950c7660c

Yes, rolling back CMB2, and testing before/after your patch, it definitely addresses the issue. The debug.log notices are gone and (more importantly), it works as expected: http://b.ustin.co/127D5

#24 @boonebgorges
10 months ago

In 37599:

Ensure that get_terms() can accept querystring-style arguments.

Prior to [37572], arguments passed to get_terms() were passed immediately
through wp_parse_args(), which made it possible to pass arguments as a
querystring (hide_empty=0) rather than an array
(array( 'hide_empty' => false )). [37572] moved default argument parsing
into WP_Term_Query, while assuming that arguments passed to get_terms()
would be formatted as an array.

To provide compatibility, we now parse all args passed to get_terms() into
an array before processing.

See #35381.

This ticket was mentioned in Slack in #core by ipstenu. View the logs.


10 months ago

#26 @ocean90
10 months ago

  • Keywords needs-dev-note added

#27 @boonebgorges
10 months ago

In 37622:

Remove unused variable from get_terms().

Missed in [37572]. See #35381.

Props JustinSainton.
See #36992.

#28 follow-up: @spacedmonkey
10 months ago

I just noticed that wp_term_query can not query by term_taxonomy_id. I believe it should, as it would then allow us to use get_terms to replace the internals of functions like term_exists, get_term_by and the wp_tax_query. See the following lines.

https://github.com/WordPress/WordPress/blob/master/wp-includes/class-wp-tax-query.php#L617-L655

#29 in reply to: ↑ 28 @flixos90
10 months ago

Replying to spacedmonkey:

I just noticed that wp_term_query can not query by term_taxonomy_id. I believe it should, as it would then allow us to use get_terms to replace the internals of functions like term_exists, get_term_by and the wp_tax_query.

Sounds like an idea worth considering! We should probably deal with it in a separate ticket though.

#30 @spacedmonkey
10 months ago

I have added term_taxonomy_id param in #37038 ticket. It is a nice implementation of WP_Term_Query in another key place.

#31 @spacedmonkey
10 months ago

Added a new patch to change the caching for $fields == 'all' to cache only the term_ids. Hopefully this is a better patch than before.

#32 @boonebgorges
9 months ago

  • Resolution set to fixed
  • Status changed from new to closed

It's getting close to the end of the enhancement period, so let's call this done for 4.6. Future improvements may be suggested in separate tickets.

#33 @spacedmonkey
9 months ago

@boonebgorges I added a patch to this to only cache term ids. Did you see that? Is that in scope of this patch or should I create neither ticket?

#34 @boonebgorges
9 months ago

@spacedmonkey Sorry about that - I saw the patch but wasn't sure what it accomplished in the context of this specific ticket. I'm afraid I don't have the bandwidth to manage any more enhancements to WP_Term_Query for 4.6, so if it's an improvement you'd like to see in a future release, please break it off into its own ticket with some explanation. (Or find an existing ticket - I think there are a couple related to what is stored in the get_terms() cache. See https://www.google.com/?gws_rd=ssl#q=site:core.trac.wordpress.org+terms+cache. #34239 may be a place to start.)

#35 @ocean90
9 months ago

  • Keywords needs-dev-note removed
Note: See TracTickets for help on using tickets.