WordPress.org

Make WordPress Core

Opened 4 years ago

Last modified 3 months ago

#36949 new enhancement

Term exists should use get_terms internally

Reported by: spacedmonkey Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 2.3
Component: Taxonomy Keywords: has-patch has-unit-tests close
Focuses: performance Cc:

Description

More than WP_Term_Query has been merged, it makes sense to converge to use it in all internal functions to query the terms table. Using WP_Term_Query / get_terms will also add caching to a functions that do not currently have it.

Attachments (5)

36949.patch (3.7 KB) - added by spacedmonkey 4 years ago.
36949.2.patch (5.2 KB) - added by spacedmonkey 3 years ago.
36949.3.patch (5.2 KB) - added by boonebgorges 3 years ago.
36949.diff (6.0 KB) - added by spacedmonkey 18 months ago.
36949.2.diff (5.0 KB) - added by spacedmonkey 18 months ago.

Download all attachments as: .zip

Change History (18)

@spacedmonkey
4 years ago

#1 @spacedmonkey
4 years ago

  • Keywords has-patch needs-unit-tests added

#2 @boonebgorges
4 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from Awaiting Review to Future Release

Thanks. This is a good idea in theory, though I'm not sure how it will pan out in practice. First, there are errors in the patch that are causing invalid SQL in some cases. Second, relying on the cache in term_exists() is likely to surface invalidation bugs in places where WordPress uses term_exists() internally to avoid duplicates. There needs to be a review of these places. A good place to start is the dozens of failing tests when running phpunit with the patch applied :)

#3 @spacedmonkey
3 years ago

  • Keywords has-patch added; needs-unit-tests needs-patch removed

Updated the patch so it now passes all but three of the unit tests. All the taxonomy tests pass. The existing tests for term_exists are pretty solid, so not sure how many more tests are needed.

#4 @boonebgorges
3 years ago

36949.3.patch includes some formatting cleanup.

I looked into one of the failures - Tests_Import_Import::test_small_import(). What's happening here is that the importer suspends cache validation, such that parent-term lookups fail during bulk term import. So here's a real-life instance of the kind of case I hypothesized above. I assume the other failures are similar. We have a couple options available.

  1. Don't suspend cache invalidation for bulk operations.
  2. Don't use term_exists() during bulk operations. (We'd use some version that skips the cache.)
  3. Don't change the behavior of term_exists()

It's fairly easy to make the internal changes necessary for 1 and 2, but both are going to cause serious problems for third-party libraries that suspend cache invalidation for imports and other bulk operations.

I think term_exists() is not a great function anyway, and I wouldn't mind if we moved away from its use in core. If we phased out our own internal uses, we could replace them with functions that leverage the cache, and also demonstrate the proper way to handle bulk operations (perhaps those would continue to use term_exists()?).

#5 @boonebgorges
3 years ago

Thinking about this a little more. The problem with imports etc is that cache invalidation is being skipped, but the cache is still being accessed for lookups. What we ought to have is something like wp_suspend_cache_access() (bad name, but you get the idea), which would prevent the cache from being accessed for lookups in places like term_exists() and get_term_by().

@spacedmonkey
18 months ago

#6 @spacedmonkey
18 months ago

  • Keywords has-unit-tests added

I have updated the patch with 36949.diff.

Now all tests pass with a very simple work around for cached queries on import. Simple disable the caching on import. This means all tests pass. The test coverage is pretty good for term_exists, so I believe we may not need to add anymore tests.

@boonebgorges can you take look at the patch?

Last edited 18 months ago by spacedmonkey (previous) (diff)

#7 @boonebgorges
18 months ago

  • Keywords close added

Thanks for the patch, @spacedmonkey. The WP_IMPORTING check is clever. But it's also ad hoc, which makes me nervous. If wordpress-importer expects term_exists() to query the database directly, then it's likely that other plugins in the wild are doing the same thing.

I mentioned in https://core.trac.wordpress.org/ticket/36949#comment:4 that we might consider simply leaving term_exists() as is, but phasing out its use in core. To me, this feels like a much safer option. The fact that term_exists() accepts IDs or slugs or names means that it's already subject to all sorts of bugs related to similarly named terms (I'm too lazy to search for tickets, but I'm sure you could find lots). This ambiguity is almost always bad, and is the reason why we have get_term_by().

As such, I'd much rather spend effort phasing out the internal use of term_exists(), rather than making the proposed internal change (which has somewhat marginal benefit) and then dealing with the inevitable compatibility cleanups. Does this seems reasonable?

#8 @spacedmonkey
18 months ago

Looking there function term_exists seems to be used by many many plugins, there is 2452 references in the plugin repo at time of writing. I feel like it is unlikely that we could ever get rid of it's use in the wild, as even jetpack and contact form 7 use it.

I think that we should go ahead with using get_terms, but disable caching by default with a new param like cached on the term query class. I would also not suppress filters, meaning that, if you want go in and turn on caching, you can do so at your own risk.

#9 @boonebgorges
18 months ago

Agreed that we're never going to be able to get rid of term_exists() altogether.

The original goal in this ticket - if I understand it correctly - is to improve performance in term_exists(). If we force caching to be disabled, we're not accomplishing this goal, and we may actually be making it worse, since WP_Term_Query does some other cache warming. So the motivation is not clear to me. There is, of course, some value - in terms of aesthetics and maintenance - in consolidating on the use of WP_Term_Query. But the possibility of bugs makes it seem not worth the effort to me.

As you can tell, I personally feel very squeamish about touching this function, especially in the absence of really substantial benefits.

#10 @spacedmonkey
18 months ago

There a number of reasons I want this to go into core, here is my reasoning.

  1. Constancy

term_exists one of the few places in core to not use WP_Term_Query. Once have this is merged, we have one piece of code that generates all the queries to the terms tables. That makes future changes to terms table much easier, as we need only edit in one place. It would make a ticket like #30262 easier.

  1. Maintenance

Similar to above, but this should reduce maintenance overhead, as there is less code in play. Any bugs we find, can be fixeding in WP_Term_Query and other part of code will see the benefit.

  1. Filters

Using WP_Term_Query add a host of useful filters, that would allow for better intergrations with plugins. Take #41246 that would make it possible to load terms from external source like elastic search.

  1. Caching

My idea above, I said that by default, caching would be disable, this doesn't mean if you know what you doing, you can't turn it back on. Take the for example the following snippet.

add_filter( 'get_terms_args', function( $args ) {
      // Leave disabled if importing 
      if( defined('WP_IMPORTING') ) {
            return $args;
      }
      
      // Leave disabled if using WP CLI 
      if( defined('WP_CLI') && WP_CLI) {
            return $args;
      }
      
      $args['cache'] = true;

      return $args;
});

This would allow developers to turn off / on caching for term_exists when they want and put the control in their hands.

#11 @spacedmonkey
18 months ago

In 36949.2.diff I added a similar patch with a new cached_results param added.

#12 @spacedmonkey
3 months ago

The latest patch on this seems solid. Any chance of a look at this @boonebgorges

#13 @boonebgorges
3 months ago

@spacedmonkey Thanks for continuing to plug away at the ticket. I have reviewed the patch and the cached_results trick seems like it will probably work, though I must say I'm not thrilled about the ideal of giving developers the ability to bypass the cache on a case basis.

More generally, I worry that this change is going to introduce strange bugs that are difficult to trace for third-party tools that are using term_exists() in a way that expects it to be uncached and unfiltered. Think, for example, of plugins that filter various parts of WP_Term_Query - say, to exclude certain terms from being viewed publicly. Currently, these filters do not apply to term_exists(), but if they suddenly are, I can imagine it causing problems in various ways.

For this reason, I continue to think that this change is unwise. As above https://core.trac.wordpress.org/ticket/36949#comment:9, a better way to consolidate and improve maintainability is to reduce the use of term_exists() throughout WP.

If another team member thinks that @spacedmonkey is right and I'm wrong, I'd be OK with being outvoted. But then I want those other team members to take responsibility for shepherding the change and cleaning up the (IMO inevitable) compatibility breaks that are reported afterward.

Note: See TracTickets for help on using tickets.