Make WordPress Core

Opened 10 years ago

Closed 10 years ago

Last modified 10 years ago

#30335 closed defect (bug) (wontfix)

Splitting shared terms on term update breaks backward compatibility when using an old term_id

Reported by: boonebgorges's profile boonebgorges Owned by:
Milestone: Priority: high
Severity: major Version: 4.1
Component: Taxonomy Keywords: has-patch
Focuses: Cc:

Description

See #5809.

When a shared term is split into two separate terms, one of the terms will have a new term_id. This will cause problems for anyone who has previously stored a term_id, and is using it to look up that term post-split.

An example. Say you have a term with term_id=5 and slug=foo. In wp_term_taxonomy, you have two rows that look like this:

term_id   term_taxonomy_id    taxonomy
---------------------------------------
  5               8            post_tag
  5               9            category

Now, say you run the category 'foo' through wp_update_term(). It will be recognized as a shared term, and will be split into a new row in wp_terms (say, term_id=10).

If you are running a plugin or other customization that has previously stored the term_id 5 to refer to the *category* 'foo', future lookups will not work. Examples:

  1. get_term( 5, 'category' ) will return null.
  2. Building a Walker_CategoryDropdown and passing selected=5 will break the pre-selection.
  3. People who are making direct SELECT (or, god forbid, UPDATE) queries using the term_id will find their queries failing.
  4. And maybe other stuff.

Note that, as far as I can find, none of these things are happening in core.

I'll follow up with some next steps.

Attachments (9)

30335.patch (6.2 KB) - added by boonebgorges 10 years ago.
As pulled out in [30336]
30335.2.patch (10.2 KB) - added by boonebgorges 10 years ago.
30335.3.patch (9.4 KB) - added by boonebgorges 10 years ago.
Backward compatibility for fetching terms by the old term_id
30335.4.patch (9.0 KB) - added by mboynes 10 years ago.
Update for mapping old term ids to new term ids after splitting
30335.5.patch (10.4 KB) - added by mboynes 10 years ago.
Add split term support to wp_dropdown_categories
30335.6.patch (13.4 KB) - added by mboynes 10 years ago.
Update default categories if they get split; add taxonomy to split_shared_term action
30335.7.patch (13.2 KB) - added by mboynes 10 years ago.
Autoload split_terms option, make wp_get_split_term public
30335.8.patch (15.6 KB) - added by mboynes 10 years ago.
Update menu items when terms split
30335.9.patch (7.3 KB) - added by boonebgorges 10 years ago.

Download all attachments as: .zip

Change History (51)

#1 @boonebgorges
10 years ago

On johnbillion's request, we're going to pull term splitting from wp_update_term() until we get this sorted out. https://wordpress.slack.com/archives/core/p1415905418001152

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

#3 @boonebgorges
10 years ago

Regarding potential solutions. Thinking about the possible reasons why someone would store a term_id like this, the most obvious is that they want a simple way to pull up the term at a later date, likely using get_term() (see #1 above). In this case, we can take advantage of the fact that (a) all of our core functions that pull term from the DB either run through term_taxonomy_id, in which case this issue will be irrelevant since splitting doesn't change this value, or they go through get_term(), and (b) when passing a term_id to get_term(), you must also provide a taxonomy name. So here's the idea:

  • When splitting terms, save information about the split somewhere (say, in an option '_split_terms', which is an array of [old_term_id,new_term_id,taxonomy] triplets)
  • In get_term() (and WP_Tax_Query::transform_query(), and maybe when using the 'include'/'exclude' params of get_terms()), do some logic to detect queries for the old term id (maybe only as a fallback if the proper query fails), and do a lookup against the cached split terms data.

I think this would cover a large percentage of the potential fallout from split terms. I'll follow up with a patch.

#4 @johnbillion
10 years ago

  • Severity changed from major to blocker

#5 @boonebgorges
10 years ago

In 30336:

Don't split shared terms on term update.

Splitting shared terms means assigning a new term_id to a given term_taxonomy_id.
It was uncovered that this change could cause problems for sites that have
cached the original term_id somehow - say, in postmeta - since future lookups
using that term_id will now fail.

Removing for 4.1-beta1. We'll look at improvements to backward compatibility
to try to get this back into a later beta.

Props mboynes.
See #30335.

@boonebgorges
10 years ago

As pulled out in [30336]

This ticket was mentioned in Slack in #core by johnbillion. View the logs.


10 years ago

#8 @johnbillion
10 years ago

  • Keywords needs-patch added
  • Severity changed from blocker to major

Following on from some discussion in #core, it's been suggested to un-revert this before beta 1 ships, in order to gauge its impact. If a term is already shared, the benefit of splitting it most likely outweighs the cost.

Additionally, we should fire an action after the term is split, to allow plugins to perform damage mitigation with regard to stored term IDs.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

#10 @boonebgorges
10 years ago

30335.2.patch reinstates the term splitting on wp_update_term(), with the following improvements:

  • As requested, a 'split_shared_term' action is fired at the end of _split_shared_term().
  • Added previously missing support for the reassigning of child terms, plus unit tests.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

#12 @boonebgorges
10 years ago

In 30344:

Reinstate term splitting on term update.

Originally introduced in [30241] and reverted in [30336], term splitting is
back and better than ever. Now with *more unit tests* and *improved treatment
of child terms*!

See #30335.

#13 @boonebgorges
10 years ago

In 30347:

Flush cache for newly created term in _split_shared_term().

The term itself does not have any cached values yet, but in some cases the new
term's taxonomy may need its cached hierarchy to be refreshed as a result of
the term splitting.

Props jorbin.
See #30335.

This ticket was mentioned in Slack in #core by mboynes. View the logs.


10 years ago

@boonebgorges
10 years ago

Backward compatibility for fetching terms by the old term_id

#15 follow-up: @boonebgorges
10 years ago

  • Keywords has-patch 2nd-opinion added; needs-patch removed

30335.3.patch is a first pass at providing backward compatibility for the use of the pre-split term_id in a number of cases. The cases covered are get_term(), WP_Tax_Query (when 'field=term_id'), and get_terms() (when using a param that accepts term IDs, like 'include' or 'parent'). In brief: when a term is split in _split_shared_term(), the affected term_taxonomy_id is stored (in the '_split_terms' option, keyed by the old term_id). In get_term(), if an integer term_id is provided and no term is found in the cache or the DB by that term, the split term store is checked as a fallback. In get_terms() and WP_Tax_Query::transform_query(), a fallback query won't work, so we parse the passed term_ids *before* running the query, and swap out old term IDs with new ones, if found.

The implementation I chose for storing and fetching split term data is designed for performance on large sites, where split terms are most likely to occur. The only thing stored in the '_split_terms' option is term_taxonomy_ids, both to keep the option as small as possible (so it won't overload a cache block). _get_split_term() bails quickly if the passed term_id does not have a match in this array. If it *does* find a match, get_term_by() is run, which as of 4.1 is properly cached. So, on a site with a persistent object cache, no additional database queries should be triggered.

I guess I have a couple concerns/points for discussion:

  • This system has a slight risk of false positives. If term 5 is shared between terms in 'post_tag' and 'category', and then the category is split to term_id=13, a later lookup get_term( 5, 'category' ) will return term 13. It's possible that someone could choose to pass 5 to get_term() in this way without intending to reference the old term_id, and getting back term 13 would be a pretty odd surprise. The chances of this are very small, but worth thinking about (would have to happen with a shared term that's been split - rare to begin with - and where someone just happens to query for a term with that ID, in which case they should be expecting bad (empty) results anyway).
  • I've chosen to provide silent fallback support. But maybe we should throw a _doing_it_wrong()?
  • How many cases does this actually cover? From some discussions with mboynes and a few others who have built custom systems that cache 'term_id', I think it will cover quite a large percentage. If you have written something that has cached term IDs in this way, your eyes on this patch would be most appreciated.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

#17 @mboynes
10 years ago

I think this is a great start and should catch the vast majority of situations where this might be a problem, thanks @boonebgorges! Here are my thoughts after playing around with this:

  • I would propose tweaking this to store the old term ids in multiple options, one per taxonomy. Within each option, store an array of old_tem_id => new_term_id. This makes the lookups even faster and prevents having to loop over each tt_id (let's say you had a very generic term which exists in 10 taxonomies, you might then have to lookup 9 terms before you get the right one). I'm going to add a patch which does this. (Update: added in 30335.4.patch)
  • This is currently going to autoload the option. Do we want it to? I would say no, and I'm going to include a patch which modifies that behavior. (Update: added in 30335.4.patch)
  • Should _get_split_term() be @private? Not that it's actually private, but why aren't plugins and themes welcome to use it?

I gave some more thought on other things we have to consider:

  1. Nav menus. Term IDs for menu items linking to terms get stored in post meta. This update will fix the display of terms (since it uses get_terms() on display), but when a term is deleted, it won't remove menu items referencing the old term ID (more on this below). Should we try to update menu items when terms get split?
  2. The above deletion happens through the 'delete_term' hook, so I dug a little deeper and it looks like '_wp_delete_tax_menu_item' is the only core function hooked into that action. When a term is deleted, should we check if there's an old term ID linked to it and fire 'delete_term' for that term ID too? This would involve faking the 4th param ($deleted_term), since the old term ID won't have a term object. Then, of course, we should remove the old_tem_id => new_term_id reference in the options table.
  3. default_category, default_post_format, default_link_category: I think these should just get updated when a term splits if the old term id is a default.
  4. This was mentioned in the ticket, but should we add a check to wp_dropdown_categories() to see if the 'selected' arg is an old term? (Update: added in 30335.5.patch)
Last edited 10 years ago by mboynes (previous) (diff)

@mboynes
10 years ago

Update for mapping old term ids to new term ids after splitting

@mboynes
10 years ago

Add split term support to wp_dropdown_categories

#18 @boonebgorges
10 years ago

  • Keywords needs-patch added; has-patch 2nd-opinion removed

mboynes - Wow, thanks for your work on this.

I would propose tweaking this to store the old term ids in multiple options, one per taxonomy. Within each option, store an array of old_tem_id => new_term_id

This is a much better idea than what I'd originally done. Big +1.

This is currently going to autoload the option. Do we want it to? I would say no

I know I originally suggested this, but when I wrote my first patch I waffled. autoload is basically irrelevant for sites with a persistent cache, so let's weigh it for sites without. The case for autoload=no is that this is a value that won't be used often on most sites, and it adds some overhead to load it into memory on every pageload, especially if there are lots of split terms. The case for autoload=yes is that the memory overhead is small because split terms are likely to be pretty rare, even on very large sites; and those large sites are the ones most likely to have persistent caching; and on sites where there is no persistent caching, we don't want to cause an additional DB query potentially on every pageload (as will be required by, say, a nav item linking to a split term). On balance, the case for autoload=no seems less persuasive to me, but I'm happy to be swayed.

Should _get_split_term() be @private? Not that it's actually private, but why aren't plugins and themes welcome to use it?

My thinking was that _get_split_term() - or at least the way we use it internally - is as a sort of magical maneuver that we don't really want plugin authors to attempt to duplicate. But now that I'm writing this, I'm reconsidering - there are legitimate instances where plugins would want to do what we're doing here to provide the same kind of backward compatibility. Let's switch it.

Should we try to update menu items when terms get split?

Good call. I'd missed this. Yes, we should.

When a term is deleted, should we check if there's an old term ID linked to it and fire 'delete_term' for that term ID too?

I would say no. In _split_shared_term(), I explicitly decided *not* to fire the term creation actions, because while a row is technically being inserted into the terms table, it's not the case that a "term" is being "created" in any real sense. I'd say that this case is similar. I'd even shy away from deleting the entry from the _split_terms_$taxonomy option, because then there'll be no record of what happened, and in any case it's harmless to have it there, since future term lookups will return false anyway. I'd suggest that we ought to handle cases of term deletion on a case-by-case basis, and encourage plugin authors to do the same, though tbh I think it's going to be extremely rare that it's necessary to do anything on term deletion, since fetching a deleted term is going to return empty whether it was split or not.

default_category, default_post_format, default_link_category: I think these should just get updated when a term splits if the old term id is a default.

Yes. I made a note to do this when I wrote the first patch, but I neglected to check the note :) I'd say this is a case where we want to hook to 'split_shared_term'.

This was mentioned in the ticket, but should we add a check to wp_dropdown_categories() to see if the 'selected' arg is an old term?

Sure, this is a pretty easy win.

I will be able to work on an updated patch within the next few days, but if you beat me to it, then all the better :) Once you and I are pretty satisfied with it, I'd like to run what we've got by nacin and johnbillion to see what they think of the approach. Thanks for your help!!

This ticket was mentioned in Slack in #core by mboynes. View the logs.


10 years ago

@mboynes
10 years ago

Update default categories if they get split; add taxonomy to split_shared_term action

@mboynes
10 years ago

Autoload split_terms option, make wp_get_split_term public

#20 @mboynes
10 years ago

On balance, the case for autoload=no seems less persuasive to me

Hard to argue with your breakdown! I'm in full agreement; re-enabled in 30335.7.patch.

there are legitimate instances where plugins would want to do what we're doing here to provide the same kind of backward compatibility

Agreed; Updated this to wp_get_split_term() and removed the @private tag in 30335.7.patch.

I would say no [about firing delete_term for the old term id when a split term is deleted].

I'm leaning in the same direction, so let's not do anything for now, and see if anyone comes along with a different opinion.

I'd even shy away from deleting the entry from the _split_terms_$taxonomy option, because then there'll be no record of what happened

Great point, I left that alone too.

Going back to menus, I looked into updating these but I'm a little hesitant. In order to identify which menu items need to be updated, we'll have to do a fairly ugly meta query. Using raw SQL to improve efficiency, we're still stuck with something like:

SELECT m1.post_id
FROM wp_postmeta AS m1
	INNER JOIN wp_postmeta AS m2 ON m2.post_id=m1.post_id
	INNER JOIN wp_postmeta AS m3 ON m3.post_id=m1.post_id
WHERE ( m1.meta_key = '_menu_item_type' AND m1.meta_value = 'taxonomy' )
	AND ( m2.meta_key = '_menu_item_object' AND m2.meta_value = 'category' )
	AND ( m3.meta_key = '_menu_item_object_id' AND m3.meta_value = 123 )

Once we've identified the post ID, we can use update_post_meta() to update each found value. I'll play around with this in a massive dataset and submit a patch if it seems like we won't have performance issues.

@mboynes
10 years ago

Update menu items when terms split

#21 @mboynes
10 years ago

30335.8.patch handles the menus. I tested the SQL query against a very large database and the slowest query was still <50ms.

@boonebgorges I think I'm satisfied at this point, so have a look and if you feel the same way, we can reach for outside opinions from johnbillion and nacin. Thanks!

#22 follow-up: @boonebgorges
10 years ago

  • Keywords has-patch added; needs-patch removed

mboynes - This is fantastic. Thanks for your continued work on it.

For anyone arriving late to the conversation, here's a summary of what 30335.8.patch does. When splitting a term, we save (in an option '_split_terms_$taxonomy') an array of split term IDs (old_term_id => new_term_id). Then, in the following places, we detect the presence of pre-split term_ids and do a silent old/new term_id swapout:

  • Calling get_term() with an old term_id
  • Calling get_terms() when using one or more of the parameters that accept term_ids (include, exclude, parent, etc)
  • Using WP_Tax_Query with field=id
  • Calling wp_dropdown_categories() with 'selected' (which expects a term_id)

Furthermore, at the moment of term splitting (the 'split_shared_term' action), we swap out old term ids for new in the following places where WP stores term_ids in the DB:

  • 'default_category', 'default_link_category', 'default_email_category' (wp_options)
  • Nav menu items that go to taxonomy archives (wp_postmeta)

We feel confident that this will provide backward compatibility for a large percentage of cases where old term_ids have been cached, and are looking for feedback.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by johnbillion. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by mboynes. View the logs.


10 years ago

#27 @boonebgorges
10 years ago

In 30492:

Store data about old and new term IDs when shared terms are split.

Props mboynes.
See #30335.

#28 @boonebgorges
10 years ago

In 30493:

Pass the taxonomy of the split tt_id to the 'split_shared_term' filter.

Props mboynes.
See #30335.

#29 @boonebgorges
10 years ago

In 30494:

Improve cleanup of cached term_ids after shared terms are split.

  • If the split term ID is stored as 'default_category', 'default_link_category', or 'default_email_category', update it to the new ID.
  • If the split term ID is associated with a nav menu item, update that piece of postmeta to the new ID.

Props mboynes.
See #30335.

#30 @boonebgorges
10 years ago

30335.9.patch is the backward compatibility proposal from mboynes and myself. The non-controversial bits (cleaning up core instances of term_id cached in the database) have been committed; see [30494].

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

#32 @johnbillion
10 years ago

In 30498:

split_shared_term is an action, not a filter.

See #30335

#33 @johnbillion
10 years ago

In 30499:

Remove whitespace accidentally introduced in r30498

See #30335

This ticket was mentioned in Slack in #core by drew. View the logs.


10 years ago

#35 in reply to: ↑ 15 @husobj
10 years ago

Replying to boonebgorges:

  • I've chosen to provide silent fallback support. But maybe we should throw a _doing_it_wrong()?

Was there any additional thought on wether _doing_it_wrong() should be included?

It may be useful to be able to track down in error logs wether there was an old term_id saved as post meta or an option somewhere?

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

This ticket was mentioned in Slack in #core by mboynes. View the logs.


10 years ago

#38 in reply to: ↑ 22 ; follow-up: @Chouby
10 years ago

Replying to boonebgorges:

We feel confident that this will provide backward compatibility for a large percentage of cases where old term_ids have been cached, and are looking for feedback.

Some positive feedback here. I write such plugin which caches term_ids (and even associations between these term_ids). I just tested with the beta 2 and the action 'split_shared_term' is just perfect to update the cache. One remark though: since I store associations of term_ids, I need to split the associated terms. The best function to do this is to use _split_shared_term. Is there any reason why you want to keep it private?

Version 0, edited 10 years ago by Chouby (next)

#39 in reply to: ↑ 38 @johnbillion
10 years ago

Replying to Chouby:

One remark though: since I store associations of term_ids, I need to split the associated terms. The best function to do this is to use _split_shared_term(). Is there any reason why you want to keep it private?

We are looking at reverting this functionality (once again) for 4.1. If you could join us in the #core channel in WordPress Slack in half an hour from now that would be great.

This ticket was mentioned in Slack in #core by johnbillion. View the logs.


10 years ago

#41 @boonebgorges
10 years ago

  • Milestone 4.1 deleted
  • Resolution set to wontfix
  • Status changed from new to closed

After some discussion, we've decided the following:

  1. Term splitting on wp_update_term() is being removed for 4.1, and is now slated for 4.2. See #5809.
  2. In 4.2, we will not be providing any sort of backward compatibility of the type proposed in this ticket. Our analysis showed that it would cover too few of the actual cases of breakage.

For some background on these decisions, see:

#42 @boonebgorges
10 years ago

In 30585:

Revert shared taxonomy term splitting for 4.1.

This is a revert of [30494], [30492], [30347], and [30334]. The latter
changeset was a revert of [30336], which was a revert of [30241].

Watch for Term Splitting, Version III in version 4.2, coming soon to a
WordPress trunk near you.

See #30335, #5809.

This ticket was mentioned in Slack in #core by boone. View the logs.


10 years ago

Note: See TracTickets for help on using tickets.