Make WordPress Core

Opened 8 years ago

Closed 21 months ago

Last modified 20 months ago

#36949 closed enhancement (fixed)

Term exists should use get_terms internally

Reported by: spacedmonkey's profile spacedmonkey Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.0 Priority: normal
Severity: normal Version: 2.3
Component: Taxonomy Keywords: has-dev-note
Focuses: performance Cc:

Description

More than WP_Term_Query has been merged, it makes sense to converge to use it in all internal functions to query the terms table. Using WP_Term_Query / get_terms will also add caching to a functions that do not currently have it.

Attachments (5)

36949.patch (3.7 KB) - added by spacedmonkey 8 years ago.
36949.2.patch (5.2 KB) - added by spacedmonkey 7 years ago.
36949.3.patch (5.2 KB) - added by boonebgorges 7 years ago.
36949.diff (6.0 KB) - added by spacedmonkey 5 years ago.
36949.2.diff (5.0 KB) - added by spacedmonkey 5 years ago.

Download all attachments as: .zip

Change History (48)

@spacedmonkey
8 years ago

#1 @spacedmonkey
8 years ago

  • Keywords has-patch needs-unit-tests added

#2 @boonebgorges
8 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from Awaiting Review to Future Release

Thanks. This is a good idea in theory, though I'm not sure how it will pan out in practice. First, there are errors in the patch that are causing invalid SQL in some cases. Second, relying on the cache in term_exists() is likely to surface invalidation bugs in places where WordPress uses term_exists() internally to avoid duplicates. There needs to be a review of these places. A good place to start is the dozens of failing tests when running phpunit with the patch applied :)

#3 @spacedmonkey
7 years ago

  • Keywords has-patch added; needs-unit-tests needs-patch removed

Updated the patch so it now passes all but three of the unit tests. All the taxonomy tests pass. The existing tests for term_exists are pretty solid, so not sure how many more tests are needed.

#4 @boonebgorges
7 years ago

36949.3.patch includes some formatting cleanup.

I looked into one of the failures - Tests_Import_Import::test_small_import(). What's happening here is that the importer suspends cache validation, such that parent-term lookups fail during bulk term import. So here's a real-life instance of the kind of case I hypothesized above. I assume the other failures are similar. We have a couple options available.

  1. Don't suspend cache invalidation for bulk operations.
  2. Don't use term_exists() during bulk operations. (We'd use some version that skips the cache.)
  3. Don't change the behavior of term_exists()

It's fairly easy to make the internal changes necessary for 1 and 2, but both are going to cause serious problems for third-party libraries that suspend cache invalidation for imports and other bulk operations.

I think term_exists() is not a great function anyway, and I wouldn't mind if we moved away from its use in core. If we phased out our own internal uses, we could replace them with functions that leverage the cache, and also demonstrate the proper way to handle bulk operations (perhaps those would continue to use term_exists()?).

#5 @boonebgorges
7 years ago

Thinking about this a little more. The problem with imports etc is that cache invalidation is being skipped, but the cache is still being accessed for lookups. What we ought to have is something like wp_suspend_cache_access() (bad name, but you get the idea), which would prevent the cache from being accessed for lookups in places like term_exists() and get_term_by().

@spacedmonkey
5 years ago

#6 @spacedmonkey
5 years ago

  • Keywords has-unit-tests added

I have updated the patch with 36949.diff.

Now all tests pass with a very simple work around for cached queries on import. Simple disable the caching on import. This means all tests pass. The test coverage is pretty good for term_exists, so I believe we may not need to add anymore tests.

@boonebgorges can you take look at the patch?

Last edited 5 years ago by spacedmonkey (previous) (diff)

#7 @boonebgorges
5 years ago

  • Keywords close added

Thanks for the patch, @spacedmonkey. The WP_IMPORTING check is clever. But it's also ad hoc, which makes me nervous. If wordpress-importer expects term_exists() to query the database directly, then it's likely that other plugins in the wild are doing the same thing.

I mentioned in https://core.trac.wordpress.org/ticket/36949#comment:4 that we might consider simply leaving term_exists() as is, but phasing out its use in core. To me, this feels like a much safer option. The fact that term_exists() accepts IDs or slugs or names means that it's already subject to all sorts of bugs related to similarly named terms (I'm too lazy to search for tickets, but I'm sure you could find lots). This ambiguity is almost always bad, and is the reason why we have get_term_by().

As such, I'd much rather spend effort phasing out the internal use of term_exists(), rather than making the proposed internal change (which has somewhat marginal benefit) and then dealing with the inevitable compatibility cleanups. Does this seems reasonable?

#8 @spacedmonkey
5 years ago

Looking there function term_exists seems to be used by many many plugins, there is 2452 references in the plugin repo at time of writing. I feel like it is unlikely that we could ever get rid of it's use in the wild, as even jetpack and contact form 7 use it.

I think that we should go ahead with using get_terms, but disable caching by default with a new param like cached on the term query class. I would also not suppress filters, meaning that, if you want go in and turn on caching, you can do so at your own risk.

#9 @boonebgorges
5 years ago

Agreed that we're never going to be able to get rid of term_exists() altogether.

The original goal in this ticket - if I understand it correctly - is to improve performance in term_exists(). If we force caching to be disabled, we're not accomplishing this goal, and we may actually be making it worse, since WP_Term_Query does some other cache warming. So the motivation is not clear to me. There is, of course, some value - in terms of aesthetics and maintenance - in consolidating on the use of WP_Term_Query. But the possibility of bugs makes it seem not worth the effort to me.

As you can tell, I personally feel very squeamish about touching this function, especially in the absence of really substantial benefits.

#10 @spacedmonkey
5 years ago

There a number of reasons I want this to go into core, here is my reasoning.

  1. Constancy

term_exists one of the few places in core to not use WP_Term_Query. Once have this is merged, we have one piece of code that generates all the queries to the terms tables. That makes future changes to terms table much easier, as we need only edit in one place. It would make a ticket like #30262 easier.

  1. Maintenance

Similar to above, but this should reduce maintenance overhead, as there is less code in play. Any bugs we find, can be fixeding in WP_Term_Query and other part of code will see the benefit.

  1. Filters

Using WP_Term_Query add a host of useful filters, that would allow for better intergrations with plugins. Take #41246 that would make it possible to load terms from external source like elastic search.

  1. Caching

My idea above, I said that by default, caching would be disable, this doesn't mean if you know what you doing, you can't turn it back on. Take the for example the following snippet.

add_filter( 'get_terms_args', function( $args ) {
      // Leave disabled if importing 
      if( defined('WP_IMPORTING') ) {
            return $args;
      }
      
      // Leave disabled if using WP CLI 
      if( defined('WP_CLI') && WP_CLI) {
            return $args;
      }
      
      $args['cache'] = true;

      return $args;
});

This would allow developers to turn off / on caching for term_exists when they want and put the control in their hands.

@spacedmonkey
5 years ago

#11 @spacedmonkey
5 years ago

In 36949.2.diff I added a similar patch with a new cached_results param added.

#12 @spacedmonkey
4 years ago

The latest patch on this seems solid. Any chance of a look at this @boonebgorges

#13 @boonebgorges
4 years ago

@spacedmonkey Thanks for continuing to plug away at the ticket. I have reviewed the patch and the cached_results trick seems like it will probably work, though I must say I'm not thrilled about the ideal of giving developers the ability to bypass the cache on a case basis.

More generally, I worry that this change is going to introduce strange bugs that are difficult to trace for third-party tools that are using term_exists() in a way that expects it to be uncached and unfiltered. Think, for example, of plugins that filter various parts of WP_Term_Query - say, to exclude certain terms from being viewed publicly. Currently, these filters do not apply to term_exists(), but if they suddenly are, I can imagine it causing problems in various ways.

For this reason, I continue to think that this change is unwise. As above https://core.trac.wordpress.org/ticket/36949#comment:9, a better way to consolidate and improve maintainability is to reduce the use of term_exists() throughout WP.

If another team member thinks that @spacedmonkey is right and I'm wrong, I'd be OK with being outvoted. But then I want those other team members to take responsibility for shepherding the change and cleaning up the (IMO inevitable) compatibility breaks that are reported afterward.

#15 @spacedmonkey
23 months ago

  • Milestone changed from Future Release to 6.0
  • Owner set to spacedmonkey
  • Status changed from new to assigned

Updated the patch to be github PR that can be found here.

This ticket was mentioned in PR #2318 on WordPress/wordpress-develop by spacedmonkey.


22 months ago
#18

Trac ticket:

spacedmonkey commented on PR #2195:


22 months ago
#19

Closing in favour of #2318

#20 @spacedmonkey
22 months ago

I have a new PR that has unit tests passing and has a new unit test #2318. It has a nice workaround for import issues, some fixes to some invalid tests and a new unit test. Please can review again.

#21 @flixos90
22 months ago

@spacedmonkey Your latest PR looks solid to me from a technical perspective, awesome work!

However, I also acknowledge @boonebgorges's concerns from before that this change may subvert some (to be fair, quite obscure) expectations from developers that the queries here would be uncached, not go through any filtering etc.

I personally am not opposed to this being committed to core, but I'm also not confident enough to commit this myself. I agree that for sure the safer change would be to replace term_exists() calls with something like !! get_term() throughout core, and we could potentially even deprecate term_exists() at some point then.

#22 @spacedmonkey
22 months ago

  • Keywords needs-dev-note commit added; close removed

I understand the concerns that @boonebgorges has raised and originally I agreed with them. But after working on #54511 and spending 4+ hours reviewing of the taxonomy code to ensure that caches are invalidates, I am confident that this change will not breaking change. The fact that in earlier version of this patch, only 5 tests fail, prove that to me. three of breakages, were tests where raw sql queries were run in tests and the correct clean_term_cache calls were not added. This is a case should / would not happen in the wild, as performing raw queries without clearing caching after, will result in invalid / corrupted caches anyway.

As for importer, there is a pretty clever workaround using cache_domains in the patch. But again, imports are a special case. The importer suspends cache invalidation, see this results in the test failures. Which is a very special case.

I have added the keywords, needs-dev-note. There is a small chance their might be breakages in custom migration / importer tools or code that does custom sql queries. The fix is simple, either implement clean_term_cache function calls in the correct place or do a full object cache flush or simply. I will provide examples in the dev note.

This sound reasonable @flixos90 ?

#23 @flixos90
22 months ago

@spacedmonkey As far as I understand @boonebgorges's concerns, it's not only about cache invalidation but also about the expectation that the term_exists() function has always been using uncached and unfiltered SQL queries, which is now being changed with this patch, so there is certainly risk of breakage. At the same time, it's IMO an obscure expectation that a certain query should be cached - in a way, that would mean we could never add caching to something after introducing an uncached query. Again, a safer approach would be to phase out term_exists() usage, and maybe even deprecate it. Related question: What does that mean for `post_exists()`? Maybe we should align on one approach for both of these functions before actually committing something.

I agree that writing a dev note with examples will be important if you commit this change.

#24 @spacedmonkey
22 months ago

In [38677] get_term_by is changed to use WP_Term_Query. A very similar change to this one. There are other examples were we made function return caches, where they did not before, another example is [37479] and [40921]. Expecting functions internals not to change, is not reasonable IMO. As long as the return values stay the same.

I believe we should leave this function and post_exists() as is.

The fact it uses the filters, is a benefit and not a downside. For those using something like elastic search to offload database queries, this would be seen as massive upside. CC @adamsilverstein as he as experience with ES.

#25 @peterwilsoncc
22 months ago

Please take this as an ill-considered rubber-ducking question!!

🦆🦆🦆 Could term_exists bypass the cache if cache invalidation is suspended? 🦆🦆🦆

#26 @spacedmonkey
22 months ago

No @peterwilsoncc that is a good question.

It would simple to change the current workaround for bypassing caches, from this.

  if ( defined( 'WP_IMPORTING' ) && WP_IMPORTING ) {
     $defaults['cache_domain'] = microtime();
  }

To

   function term_exists( $term, $taxonomy = '', $parent = null ) {
    global $_wp_suspend_cache_invalidation;
    ...
    if ( !empty( $_wp_suspend_cache_invalidation ) ) {
       $defaults['cache_domain'] = microtime();
    }

This logic is a little cleaner and might catch more of the edge cases we are worried about. I once called wp_suspend_cache_invalidation in a custom migration tool I wrote...

#27 @spacedmonkey
22 months ago

I have actioned feedback from @peterwilsoncc. If now uses $_wp_suspend_cache_invalidation global. This makes the code a lot cleaner and means I could add a unit test.

If now means if you want an uncached term_exists call you can do this.

   wp_suspend_cache_invalidation( true );
   $term = term_exists( 'thing' );
   wp_suspend_cache_invalidation( false );

Which might be useful for devs.

#28 @peterwilsoncc
21 months ago

  • Keywords commit removed

I pushed an intentionally broken test to PR#2318 earlier today to demonstrate a question I had about get_term() -- ie, the singular. I'll remove the commit keyword for now.

Do yell out if I am highlighting something that is intentionally out of scope.

#29 follow-up: @spacedmonkey
21 months ago

Thanks @peterwilsoncc . It is was a good spot. I have refactor the patch to remove usage of get_term and now everything uses get_terms. Unit tests are passing now

#30 in reply to: ↑ 29 @peterwilsoncc
21 months ago

  • Keywords needs-testing added

Replying to spacedmonkey:

Thanks @peterwilsoncc . It is was a good spot. I have refactor the patch to remove usage of get_term and now everything uses get_terms. Unit tests are passing now

Thanks, both for the updated code and taking my push of a broken test to your branch in the spirit intended.

The code looks good to me, I've added the needs-testing keyword on this in the hope some of the testing folk can take a look at your branch before committing.

#31 @spacedmonkey
21 months ago

While working on this ticket, I found a weird behaviour. If you pass an int value as the term, parent value is not respected. I have created issue for this behaviour. #55358

#32 @peterwilsoncc
21 months ago

  • Keywords commit added

Marking this as commit ready for the linked pull request, @spacedmonkey are you able to do the honors?

#33 follow-up: @spacedmonkey
21 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 52921:

Taxonomy: Use get_terms instead of a database lookup in term_exists().

Replace raw SQL queries to the terms table, with a call to the get_terms function. Using get_terms means that term_exists is now cached. For developers using term_exists where cache invalidation is disabled, such as importing, a workaround was added to ensure that queries are uncached.

Props Spacedmonkey, boonebgorges, flixos90, peterwilsoncc.
Fixes #36949.

#34 @costdev
21 months ago

Related ticket #53152 is fixed by [52921].

#35 in reply to: ↑ 33 ; follow-up: @Chouby
21 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

May I ask to introduce a new filter term_exists_args?

Indeed each time get_terms() replaces a direct sql query, it introduces a breaking change. get_terms() is widely used and can be filtered with terms_clauses. The issue is that this filter will now be applied to term_exists() when it was not applied prior to [52921]. Currently there is no way to determine if the call to get_terms() is done by term_exists() or not, so we cannot disable the filter accordingly. Unless using an ugly hack with debug_backtrace()...

A new filter term_exists_args would allow to work around the breaking chanege in an elegant way.

Related:

  • [40994] introduced the filter wp_get_object_term_args after wp_get_object_terms() switched to use get_terms() since [38667].
  • #42104 Same issue for transform_query()
  • #54534 Same issue for get_terms_by()

#36 in reply to: ↑ 35 @peterwilsoncc
21 months ago

  • Keywords has-patch has-unit-tests needs-testing commit removed

Replying to Chouby:

May I ask to introduce a new filter term_exists_args?

I think that makes sense, possibly with the wp_ prefix to be polite to plugins. @spacedmonkey, would you be able to look at this?


I've done a little keyword tidy up to reflect the need for a filter patch.

#37 @spacedmonkey
21 months ago

@Chouby @peterwilsoncc

I am happy to add a new filter. But I think the change should be tracked by a new ticket. Any chance we could create a new ticket for this and I will start work on a patch?

#38 @peterwilsoncc
21 months ago

@spacedmonkey I am happy if you'd prefer that.

@Chouby are you able to create the follow up ticket to ensure you get props when it gets committed?

#39 follow-up: @Chouby
21 months ago

@peterwilsoncc @spacedmonkey I opened #55439

#40 in reply to: ↑ 39 @peterwilsoncc
21 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to Chouby:

@peterwilsoncc @spacedmonkey I opened #55439

Thank you, I really appreciate it.

#41 @spacedmonkey
21 months ago

#55439 was committed in [52987].

#42 @spacedmonkey
20 months ago

Dev note ready for review.

Note: See TracTickets for help on using tickets.