#36949 closed enhancement (fixed)
Term exists should use get_terms internally
Reported by: | spacedmonkey | Owned by: | spacedmonkey |
---|---|---|---|
Milestone: | 6.0 | Priority: | normal |
Severity: | normal | Version: | 2.3 |
Component: | Taxonomy | Keywords: | has-dev-note |
Focuses: | performance | Cc: |
Description
More than WP_Term_Query has been merged, it makes sense to converge to use it in all internal functions to query the terms table. Using WP_Term_Query / get_terms will also add caching to a functions that do not currently have it.
Attachments (5)
Change History (48)
#2
@
9 years ago
- Keywords needs-patch added; has-patch removed
- Milestone changed from Awaiting Review to Future Release
#3
@
8 years ago
- Keywords has-patch added; needs-unit-tests needs-patch removed
Updated the patch so it now passes all but three of the unit tests. All the taxonomy tests pass. The existing tests for term_exists are pretty solid, so not sure how many more tests are needed.
#4
@
8 years ago
36949.3.patch includes some formatting cleanup.
I looked into one of the failures - Tests_Import_Import::test_small_import()
. What's happening here is that the importer suspends cache validation, such that parent-term lookups fail during bulk term import. So here's a real-life instance of the kind of case I hypothesized above. I assume the other failures are similar. We have a couple options available.
- Don't suspend cache invalidation for bulk operations.
- Don't use
term_exists()
during bulk operations. (We'd use some version that skips the cache.) - Don't change the behavior of
term_exists()
It's fairly easy to make the internal changes necessary for 1 and 2, but both are going to cause serious problems for third-party libraries that suspend cache invalidation for imports and other bulk operations.
I think term_exists()
is not a great function anyway, and I wouldn't mind if we moved away from its use in core. If we phased out our own internal uses, we could replace them with functions that leverage the cache, and also demonstrate the proper way to handle bulk operations (perhaps those would continue to use term_exists()
?).
#5
@
8 years ago
Thinking about this a little more. The problem with imports etc is that cache invalidation is being skipped, but the cache is still being accessed for lookups. What we ought to have is something like wp_suspend_cache_access()
(bad name, but you get the idea), which would prevent the cache from being accessed for lookups in places like term_exists()
and get_term_by()
.
#6
@
6 years ago
- Keywords has-unit-tests added
I have updated the patch with 36949.diff.
Now all tests pass with a very simple work around for cached queries on import. Simple disable the caching on import. This means all tests pass. The test coverage is pretty good for term_exists, so I believe we may not need to add anymore tests.
@boonebgorges can you take look at the patch?
#7
@
6 years ago
- Keywords close added
Thanks for the patch, @spacedmonkey. The WP_IMPORTING
check is clever. But it's also ad hoc, which makes me nervous. If wordpress-importer expects term_exists()
to query the database directly, then it's likely that other plugins in the wild are doing the same thing.
I mentioned in https://core.trac.wordpress.org/ticket/36949#comment:4 that we might consider simply leaving term_exists()
as is, but phasing out its use in core. To me, this feels like a much safer option. The fact that term_exists()
accepts IDs or slugs or names means that it's already subject to all sorts of bugs related to similarly named terms (I'm too lazy to search for tickets, but I'm sure you could find lots). This ambiguity is almost always bad, and is the reason why we have get_term_by()
.
As such, I'd much rather spend effort phasing out the internal use of term_exists()
, rather than making the proposed internal change (which has somewhat marginal benefit) and then dealing with the inevitable compatibility cleanups. Does this seems reasonable?
#8
@
6 years ago
Looking there function term_exists
seems to be used by many many plugins, there is 2452 references in the plugin repo at time of writing. I feel like it is unlikely that we could ever get rid of it's use in the wild, as even jetpack and contact form 7 use it.
I think that we should go ahead with using get_terms
, but disable caching by default with a new param like cached
on the term query class. I would also not suppress filters, meaning that, if you want go in and turn on caching, you can do so at your own risk.
#9
@
6 years ago
Agreed that we're never going to be able to get rid of term_exists()
altogether.
The original goal in this ticket - if I understand it correctly - is to improve performance in term_exists()
. If we force caching to be disabled, we're not accomplishing this goal, and we may actually be making it worse, since WP_Term_Query
does some other cache warming. So the motivation is not clear to me. There is, of course, some value - in terms of aesthetics and maintenance - in consolidating on the use of WP_Term_Query
. But the possibility of bugs makes it seem not worth the effort to me.
As you can tell, I personally feel very squeamish about touching this function, especially in the absence of really substantial benefits.
#10
@
6 years ago
There a number of reasons I want this to go into core, here is my reasoning.
- Constancy
term_exists one of the few places in core to not use WP_Term_Query
. Once have this is merged, we have one piece of code that generates all the queries to the terms tables. That makes future changes to terms table much easier, as we need only edit in one place. It would make a ticket like #30262 easier.
- Maintenance
Similar to above, but this should reduce maintenance overhead, as there is less code in play. Any bugs we find, can be fixeding in WP_Term_Query
and other part of code will see the benefit.
- Filters
Using WP_Term_Query
add a host of useful filters, that would allow for better intergrations with plugins. Take #41246 that would make it possible to load terms from external source like elastic search.
- Caching
My idea above, I said that by default, caching would be disable, this doesn't mean if you know what you doing, you can't turn it back on. Take the for example the following snippet.
add_filter( 'get_terms_args', function( $args ) {
// Leave disabled if importing
if( defined('WP_IMPORTING') ) {
return $args;
}
// Leave disabled if using WP CLI
if( defined('WP_CLI') && WP_CLI) {
return $args;
}
$args['cache'] = true;
return $args;
});
This would allow developers to turn off / on caching for term_exists
when they want and put the control in their hands.
#11
@
6 years ago
In 36949.2.diff I added a similar patch with a new cached_results
param added.
#13
@
5 years ago
@spacedmonkey Thanks for continuing to plug away at the ticket. I have reviewed the patch and the cached_results
trick seems like it will probably work, though I must say I'm not thrilled about the ideal of giving developers the ability to bypass the cache on a case basis.
More generally, I worry that this change is going to introduce strange bugs that are difficult to trace for third-party tools that are using term_exists()
in a way that expects it to be uncached and unfiltered. Think, for example, of plugins that filter various parts of WP_Term_Query
- say, to exclude certain terms from being viewed publicly. Currently, these filters do not apply to term_exists()
, but if they suddenly are, I can imagine it causing problems in various ways.
For this reason, I continue to think that this change is unwise. As above https://core.trac.wordpress.org/ticket/36949#comment:9, a better way to consolidate and improve maintainability is to reduce the use of term_exists()
throughout WP.
If another team member thinks that @spacedmonkey is right and I'm wrong, I'd be OK with being outvoted. But then I want those other team members to take responsibility for shepherding the change and cleaning up the (IMO inevitable) compatibility breaks that are reported afterward.
This ticket was mentioned in PR #2161 on WordPress/wordpress-develop by spacedmonkey.
3 years ago
#14
Trac ticket: https://core.trac.wordpress.org/ticket/36949
#15
@
3 years ago
- Milestone changed from Future Release to 6.0
- Owner set to spacedmonkey
- Status changed from new to assigned
Updated the patch to be github PR that can be found here.
This ticket was mentioned in PR #2195 on WordPress/wordpress-develop by spacedmonkey.
3 years ago
#16
Trac ticket: https://core.trac.wordpress.org/ticket/36949
spacedmonkey commented on PR #2161:
3 years ago
#17
This ticket was mentioned in PR #2318 on WordPress/wordpress-develop by spacedmonkey.
3 years ago
#18
Trac ticket:
spacedmonkey commented on PR #2195:
3 years ago
#19
Closing in favour of #2318
#20
@
3 years ago
I have a new PR that has unit tests passing and has a new unit test #2318. It has a nice workaround for import issues, some fixes to some invalid tests and a new unit test. Please can review again.
#21
@
3 years ago
@spacedmonkey Your latest PR looks solid to me from a technical perspective, awesome work!
However, I also acknowledge @boonebgorges's concerns from before that this change may subvert some (to be fair, quite obscure) expectations from developers that the queries here would be uncached, not go through any filtering etc.
I personally am not opposed to this being committed to core, but I'm also not confident enough to commit this myself. I agree that for sure the safer change would be to replace term_exists()
calls with something like !! get_term()
throughout core, and we could potentially even deprecate term_exists()
at some point then.
#22
@
3 years ago
- Keywords needs-dev-note commit added; close removed
I understand the concerns that @boonebgorges has raised and originally I agreed with them. But after working on #54511 and spending 4+ hours reviewing of the taxonomy code to ensure that caches are invalidates, I am confident that this change will not breaking change. The fact that in earlier version of this patch, only 5 tests fail, prove that to me. three of breakages, were tests where raw sql queries were run in tests and the correct clean_term_cache
calls were not added. This is a case should / would not happen in the wild, as performing raw queries without clearing caching after, will result in invalid / corrupted caches anyway.
As for importer, there is a pretty clever workaround using cache_domains
in the patch. But again, imports are a special case. The importer suspends cache invalidation, see this results in the test failures. Which is a very special case.
I have added the keywords, needs-dev-note. There is a small chance their might be breakages in custom migration / importer tools or code that does custom sql queries. The fix is simple, either implement clean_term_cache
function calls in the correct place or do a full object cache flush or simply. I will provide examples in the dev note.
This sound reasonable @flixos90 ?
#23
@
3 years ago
@spacedmonkey As far as I understand @boonebgorges's concerns, it's not only about cache invalidation but also about the expectation that the term_exists()
function has always been using uncached and unfiltered SQL queries, which is now being changed with this patch, so there is certainly risk of breakage. At the same time, it's IMO an obscure expectation that a certain query should be cached - in a way, that would mean we could never add caching to something after introducing an uncached query. Again, a safer approach would be to phase out term_exists()
usage, and maybe even deprecate it. Related question: What does that mean for `post_exists()`? Maybe we should align on one approach for both of these functions before actually committing something.
I agree that writing a dev note with examples will be important if you commit this change.
#24
@
3 years ago
In [38677] get_term_by is changed to use WP_Term_Query. A very similar change to this one. There are other examples were we made function return caches, where they did not before, another example is [37479] and [40921]. Expecting functions internals not to change, is not reasonable IMO. As long as the return values stay the same.
I believe we should leave this function and post_exists()
as is.
The fact it uses the filters, is a benefit and not a downside. For those using something like elastic search to offload database queries, this would be seen as massive upside. CC @adamsilverstein as he as experience with ES.
#25
@
3 years ago
Please take this as an ill-considered rubber-ducking question!!
🦆🦆🦆 Could term_exists bypass the cache if cache invalidation is suspended? 🦆🦆🦆
#26
@
3 years ago
No @peterwilsoncc that is a good question.
It would simple to change the current workaround for bypassing caches, from this.
if ( defined( 'WP_IMPORTING' ) && WP_IMPORTING ) {
$defaults['cache_domain'] = microtime();
}
To
function term_exists( $term, $taxonomy = '', $parent = null ) {
global $_wp_suspend_cache_invalidation;
...
if ( !empty( $_wp_suspend_cache_invalidation ) ) {
$defaults['cache_domain'] = microtime();
}
This logic is a little cleaner and might catch more of the edge cases we are worried about. I once called wp_suspend_cache_invalidation
in a custom migration tool I wrote...
#27
@
3 years ago
I have actioned feedback from @peterwilsoncc. If now uses $_wp_suspend_cache_invalidation
global. This makes the code a lot cleaner and means I could add a unit test.
If now means if you want an uncached term_exists call you can do this.
wp_suspend_cache_invalidation( true );
$term = term_exists( 'thing' );
wp_suspend_cache_invalidation( false );
Which might be useful for devs.
#28
@
3 years ago
- Keywords commit removed
I pushed an intentionally broken test to PR#2318 earlier today to demonstrate a question I had about get_term()
-- ie, the singular. I'll remove the commit keyword for now.
Do yell out if I am highlighting something that is intentionally out of scope.
#29
follow-up:
↓ 30
@
3 years ago
Thanks @peterwilsoncc . It is was a good spot. I have refactor the patch to remove usage of get_term
and now everything uses get_terms
. Unit tests are passing now
#30
in reply to:
↑ 29
@
3 years ago
- Keywords needs-testing added
Replying to spacedmonkey:
Thanks @peterwilsoncc . It is was a good spot. I have refactor the patch to remove usage of
get_term
and now everything usesget_terms
. Unit tests are passing now
Thanks, both for the updated code and taking my push of a broken test to your branch in the spirit intended.
The code looks good to me, I've added the needs-testing keyword on this in the hope some of the testing folk can take a look at your branch before committing.
#31
@
3 years ago
While working on this ticket, I found a weird behaviour. If you pass an int value as the term, parent value is not respected. I have created issue for this behaviour. #55358
#32
@
3 years ago
- Keywords commit added
Marking this as commit ready for the linked pull request, @spacedmonkey are you able to do the honors?
#33
follow-up:
↓ 35
@
3 years ago
- Resolution set to fixed
- Status changed from assigned to closed
In 52921:
#35
in reply to:
↑ 33
;
follow-up:
↓ 36
@
3 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
May I ask to introduce a new filter term_exists_args
?
Indeed each time get_terms()
replaces a direct sql query, it introduces a breaking change. get_terms()
is widely used and can be filtered with terms_clauses
. The issue is that this filter will now be applied to term_exists()
when it was not applied prior to [52921]. Currently there is no way to determine if the call to get_terms()
is done by term_exists()
or not, so we cannot disable the filter accordingly. Unless using an ugly hack with debug_backtrace()
...
A new filter term_exists_args
would allow to work around the breaking chanege in an elegant way.
Related:
#36
in reply to:
↑ 35
@
3 years ago
- Keywords has-patch has-unit-tests needs-testing commit removed
Replying to Chouby:
May I ask to introduce a new filter
term_exists_args
?
I think that makes sense, possibly with the wp_
prefix to be polite to plugins. @spacedmonkey, would you be able to look at this?
I've done a little keyword tidy up to reflect the need for a filter patch.
#37
@
3 years ago
@Chouby @peterwilsoncc
I am happy to add a new filter. But I think the change should be tracked by a new ticket. Any chance we could create a new ticket for this and I will start work on a patch?
#38
@
3 years ago
@spacedmonkey I am happy if you'd prefer that.
@Chouby are you able to create the follow up ticket to ensure you get props when it gets committed?
Thanks. This is a good idea in theory, though I'm not sure how it will pan out in practice. First, there are errors in the patch that are causing invalid SQL in some cases. Second, relying on the cache in
term_exists()
is likely to surface invalidation bugs in places where WordPress usesterm_exists()
internally to avoid duplicates. There needs to be a review of these places. A good place to start is the dozens of failing tests when runningphpunit
with the patch applied :)