Make WordPress Core

Opened 9 years ago

Closed 9 years ago

#34533 closed defect (bug) (fixed)

Term splitting in `get_term()` can have unexpected effects

Reported by: dlh's profile dlh Owned by: boonebgorges's profile boonebgorges
Milestone: 4.4 Priority: high
Severity: normal Version: 4.4
Component: Taxonomy Keywords: has-patch
Focuses: Cc:

Description

As of [34997], get_term() attempts to split a shared term if it finds a term in a different taxonomy than the one requested. This new feature can cause a slight change to previous behavior and have unexpected effects in some edge cases.

  1. Previously, wp_update_term() would split the term being updated if that term was shared. wp_update_term() calls get_term(), but get_term() splits the term found, not the term requested. If the splitting in get_term() leaves no more shared terms for the updated term ID, then the updated term is no longer split.
  1. If you collect several shared term IDs in advance and try to update them, then after calling wp_update_term() on the first term, the remaining terms might have been split and no longer exist. The first test below tries to show this case.
  1. Calling get_term() with an invalid term ID and taxonomy pairing will cause the term ID to be split after get_term() detects the mismatch. The second test below attempts to show this case. In itself, the behavior might not be "bad," just surprising.

Changing get_term() so that it splits the requested term, rather than the found term, could be a workaround. But in that scenario, the requested term couldn't be split unless its term taxonomy ID was available.

function test_wp_update_term_splits_other_term() {
        global $wpdb;

        register_taxonomy( 'test_tax', 'post' );
        register_taxonomy( 'test_tax_2', 'post' );

        $t1 = wp_insert_term( 'Bar', 'test_tax' );
        $t2 = wp_insert_term( 'Bar', 'test_tax_2' );

        $wpdb->update( $wpdb->term_taxonomy,
                array( 'term_id' => $t1['term_id'] ),
                array( 'term_taxonomy_id' => $t2['term_taxonomy_id'] ),
                array( '%d' ),
                array( '%d' )
        );

        $t2 = wp_update_term( $t1['term_id'], 'test_tax_2', array( 'name' => 'New Bar' ) );
        $this->assertNotInstanceOf( 'WP_Error', $t2, 'Error from first wp_update_term()' );

        $t1 = wp_update_term( $t1['term_id'], 'test_tax', array( 'name' => 'New Bar' ) );
        $this->assertNotInstanceOf( 'WP_Error', $t1, 'Error from second wp_update_term()' );
}

function test_get_invalid_term_splits_shared_term() {
        global $wpdb;

        register_taxonomy( 'test_tax', 'post' );
        register_taxonomy( 'test_tax_2', 'post' );

        $t1 = wp_insert_term( 'Bar', 'test_tax' );
        $t2 = wp_insert_term( 'Bar', 'test_tax_2' );

        $wpdb->update( $wpdb->term_taxonomy,
                array( 'term_id' => $t1['term_id'] ),
                array( 'term_taxonomy_id' => $t2['term_taxonomy_id'] ),
                array( '%d' ),
                array( '%d' )
        );

        $this->assertInstanceOf( 'WP_Term', get_term( $t1['term_id'], 'test_tax' ), 'Error before get_term()' );
        get_term( $t1['term_id'], 'category' );
        $this->assertInstanceOf( 'WP_Term', get_term( $t1['term_id'], 'test_tax' ), 'Error after get_term()' );
}

Attachments (2)

34533.diff (6.3 KB) - added by boonebgorges 9 years ago.
34533.2.diff (7.0 KB) - added by boonebgorges 9 years ago.

Download all attachments as: .zip

Change History (9)

#1 @boonebgorges
9 years ago

  • Keywords has-patch added
  • Milestone changed from Awaiting Review to 4.4
  • Priority changed from normal to high

@dlh and I had a number of helpful talks about this issue at WordCamp NYC last weekend.

In a nutshell: I'd originally introduced the term-splitting fallback in [34997] as a sort of trick. I figured, if a term is still shared at this point, we may as well clean up the mess and split it. But, as noted in the ticket description, this ends up resulting in some funny behavior. Even if we were to make the suggested switch, so that we split the term from the requested taxonomy, we still have problems if someone has a $term_id, uses it to get_term(), and then attempts to do something else with $term_id, since it may have changed in the process of fetching the term.

Given these problems, I'd like to remove the term-splitting fallback. Here's the logic that I think should be in its place. Assume that $term_id is shared between tax1 and tax2.

  • get_term( $term_id, 'tax1' ) should return the matching term
  • get_term( $term_id, 'tax3' ) (invalid tax) should return null, for legacy reasons
  • get_term( $term_id ) should return a WP_Error explaining that $term_id is ambiguous.

See 34533.diff.

Two notes about the suggested implementation:

  1. Terms are now cached according to $term_id only. As such, when a term is shared, we must refrain from caching it. That is, even if get_term( $term_id, 'tax1' ) returns the expected term, it should never load it into the cache.
  2. get_term() can now return a WP_Error object, which it never did in the past. However, it only does it when $taxonomy is omitted. Prior to 4.4, $taxonomy was a required parameter. So there should be no backward compatibility concerns with the new return type.

It'd be great to get another set of eyes on this - especially those of @dlh :-D

@boonebgorges
9 years ago

@boonebgorges
9 years ago

#2 @boonebgorges
9 years ago

I realized after writing the first patch that there could be further cache confusion when there are shared terms. Calling get_term( $term_id, 'foo' ) will populate the cache with the foo term; if it's shared with taxonomy bar, then get_term( $term_id, 'bar' ) will incorrectly fetch the foo term from the cache. 34533.2.diff fixes this.

There is still the possibility for cache entanglement. Specifically, if you don't specify a $taxonomy in get_term(), then you're going to get whatever's in the cache, and if your $term_id is shared between taxonomies, what's in the cache may not be what you wanted. There's not really anything we can do about this, and I think it's reasonable to say that if you want support for the new get_term( $term_id ) syntax, that you need to have run the term-splitting routine in order to guarantee coherent caching. (The only other alternative, without ripping the whole thing out, is to skip single-term caching when 'finished_splitting_shared_terms' is not true. Pretty ugly.)

Cache confusion for single-term caches would be much harder to deal with if we were using those caches in places like get_terms(). Luckily, we are not doing this at the moment, but it's going to be an unpleasant hurdle when we finally do.

#3 @dlh
9 years ago

Sure, I'll test the patch @boonebgorges.

#4 @wonderboymusic
9 years ago

  • Owner set to boonebgorges
  • Status changed from new to assigned

#5 @dlh
9 years ago

On term-splitting, 34533.2 makes sense to me. I'm not as familiar with the current approaches to caching terms in core, so others are probably better-suited to comment about those changes.

One other potential issue is that get_term( $term_id ) could get a term from an invalid taxonomy. Here's a test that fails with the patch:

public function test_get_term_with_only_id_is_error_when_invalid_taxonomy() {
        global $wp_taxonomies;
        $t = self::factory()->term->create( array( 'taxonomy' => 'wptests_tax' ) );

        $this->assertInstanceOf( 'WP_Term', get_term( $t ) );

        // "Unregister" the taxonomy.
        unset( $wp_taxonomies['wptests_tax'] );

        // Test when the term would come from the cache or the DB.
        $this->assertNotInstanceOf( 'WP_Term', get_term( $t ) );
        clean_term_cache( $t, 'terms' );
        $this->assertNotInstanceOf( 'WP_Term', get_term( $t ) );
}

#6 @boonebgorges
9 years ago

Thanks for the feedback, @dlh.

One other potential issue is that get_term( $term_id ) could get a term from an invalid taxonomy

Hrm, yeah. If you pass an invalid $taxonomy to get_term(), it will of course return a WP_Error. What we can definitely do is add a bit of extra logic before bailing from WP_Term::get_instance() with 'ambiguous_term_id', so that a term shared between a valid and an invalid taxonomy will return the term from the valid taxonomy, rather than an error.

However, in the simpler case where you have an *unshared* term in an invalid taxonomy, I'm not sure what the proper behavior should be. It would be easy enough to bail with an invalid_taxonomy error if the located term is from an invalid taxonomy. But then it'll be effectively impossible to pull up that data through the API. That said, I guess it is already impossible to do so? Like, you could have terms in the database that are inaccessible because the taxonomy has been unregistered? So maybe this is the thing to do here too.

#7 @boonebgorges
9 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 35537:

Make get_term() behave more consistently in the context of shared terms.

When WP_Term was introduced in [34997], the $taxonomy parameter for
get_term() was made optional. This meant that, when the optional param was
omitted, get_term() had no way of determining which term was intended when
the term_id was shared between multiple taxonomies. As a (somewhat sneaky) way
of fixing things, get_term() split any shared terms it found. But this could
cause problems with developer expectations: it's not clear why requesting a
term should result in a database update, much less a potential change in the
ID of a term.

In place of this technique, this changeset introduces a number of changes that
make the handling of shared terms a bit less insane:

  • When a taxonomy is provided to get_term(), and a cached term is found matching the term_id, make sure the taxonomy also matches before returning it.
  • When a taxonomy is not provided, ensure that the term is not shared before adding it to the cache.
  • When a term is shared between taxonomies and no taxonomy is provided, return a WP_Error rather than splitting the term.
  • When a term is shared between taxonomies, only one of which is valid, return the term from that taxonomy.

Props boonebgorges, dlh.
Fixes #34533.

Note: See TracTickets for help on using tickets.