#30335 closed defect (bug) (wontfix)
Splitting shared terms on term update breaks backward compatibility when using an old term_id
Reported by: | boonebgorges | Owned by: | |
---|---|---|---|
Milestone: | Priority: | high | |
Severity: | major | Version: | 4.1 |
Component: | Taxonomy | Keywords: | has-patch |
Focuses: | Cc: |
Description
See #5809.
When a shared term is split into two separate terms, one of the terms will have a new term_id. This will cause problems for anyone who has previously stored a term_id, and is using it to look up that term post-split.
An example. Say you have a term with term_id=5
and slug=foo
. In wp_term_taxonomy
, you have two rows that look like this:
term_id term_taxonomy_id taxonomy --------------------------------------- 5 8 post_tag 5 9 category
Now, say you run the category 'foo' through wp_update_term()
. It will be recognized as a shared term, and will be split into a new row in wp_terms
(say, term_id=10
).
If you are running a plugin or other customization that has previously stored the term_id 5 to refer to the *category* 'foo', future lookups will not work. Examples:
get_term( 5, 'category' )
will return null.- Building a
Walker_CategoryDropdown
and passingselected=5
will break the pre-selection. - People who are making direct SELECT (or, god forbid, UPDATE) queries using the
term_id
will find their queries failing. - And maybe other stuff.
Note that, as far as I can find, none of these things are happening in core.
I'll follow up with some next steps.
Attachments (9)
Change History (51)
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
#3
@
10 years ago
Regarding potential solutions. Thinking about the possible reasons why someone would store a term_id
like this, the most obvious is that they want a simple way to pull up the term at a later date, likely using get_term()
(see #1 above). In this case, we can take advantage of the fact that (a) all of our core functions that pull term from the DB either run through term_taxonomy_id
, in which case this issue will be irrelevant since splitting doesn't change this value, or they go through get_term()
, and (b) when passing a term_id
to get_term()
, you must also provide a taxonomy name. So here's the idea:
- When splitting terms, save information about the split somewhere (say, in an option '_split_terms', which is an array of [old_term_id,new_term_id,taxonomy] triplets)
- In
get_term()
(andWP_Tax_Query::transform_query()
, and maybe when using the 'include'/'exclude' params ofget_terms()
), do some logic to detect queries for the old term id (maybe only as a fallback if the proper query fails), and do a lookup against the cached split terms data.
I think this would cover a large percentage of the potential fallout from split terms. I'll follow up with a patch.
This ticket was mentioned in Slack in #core by johnbillion. View the logs.
10 years ago
#8
@
10 years ago
- Keywords needs-patch added
- Severity changed from blocker to major
Following on from some discussion in #core, it's been suggested to un-revert this before beta 1 ships, in order to gauge its impact. If a term is already shared, the benefit of splitting it most likely outweighs the cost.
Additionally, we should fire an action after the term is split, to allow plugins to perform damage mitigation with regard to stored term IDs.
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
#10
@
10 years ago
30335.2.patch reinstates the term splitting on wp_update_term()
, with the following improvements:
- As requested, a 'split_shared_term' action is fired at the end of
_split_shared_term()
. - Added previously missing support for the reassigning of child terms, plus unit tests.
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by mboynes. View the logs.
10 years ago
#15
follow-up:
↓ 35
@
10 years ago
- Keywords has-patch 2nd-opinion added; needs-patch removed
30335.3.patch is a first pass at providing backward compatibility for the use of the pre-split term_id in a number of cases. The cases covered are get_term()
, WP_Tax_Query
(when 'field=term_id'), and get_terms()
(when using a param that accepts term IDs, like 'include' or 'parent'). In brief: when a term is split in _split_shared_term()
, the affected term_taxonomy_id is stored (in the '_split_terms' option, keyed by the old term_id). In get_term()
, if an integer term_id is provided and no term is found in the cache or the DB by that term, the split term store is checked as a fallback. In get_terms()
and WP_Tax_Query::transform_query()
, a fallback query won't work, so we parse the passed term_ids *before* running the query, and swap out old term IDs with new ones, if found.
The implementation I chose for storing and fetching split term data is designed for performance on large sites, where split terms are most likely to occur. The only thing stored in the '_split_terms' option is term_taxonomy_ids, both to keep the option as small as possible (so it won't overload a cache block). _get_split_term()
bails quickly if the passed term_id does not have a match in this array. If it *does* find a match, get_term_by()
is run, which as of 4.1 is properly cached. So, on a site with a persistent object cache, no additional database queries should be triggered.
I guess I have a couple concerns/points for discussion:
- This system has a slight risk of false positives. If term 5 is shared between terms in 'post_tag' and 'category', and then the category is split to term_id=13, a later lookup
get_term( 5, 'category' )
will return term 13. It's possible that someone could choose to pass 5 toget_term()
in this way without intending to reference the old term_id, and getting back term 13 would be a pretty odd surprise. The chances of this are very small, but worth thinking about (would have to happen with a shared term that's been split - rare to begin with - and where someone just happens to query for a term with that ID, in which case they should be expecting bad (empty) results anyway). - I've chosen to provide silent fallback support. But maybe we should throw a
_doing_it_wrong()
? - How many cases does this actually cover? From some discussions with mboynes and a few others who have built custom systems that cache 'term_id', I think it will cover quite a large percentage. If you have written something that has cached term IDs in this way, your eyes on this patch would be most appreciated.
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
#17
@
10 years ago
I think this is a great start and should catch the vast majority of situations where this might be a problem, thanks @boonebgorges! Here are my thoughts after playing around with this:
- I would propose tweaking this to store the old term ids in multiple options, one per taxonomy. Within each option, store an array of
old_tem_id => new_term_id
. This makes the lookups even faster and prevents having to loop over each tt_id (let's say you had a very generic term which exists in 10 taxonomies, you might then have to lookup 9 terms before you get the right one). I'm going to add a patch which does this. (Update: added in 30335.4.patch) - This is currently going to autoload the option. Do we want it to? I would say no, and I'm going to include a patch which modifies that behavior. (Update: added in 30335.4.patch)
- Should
_get_split_term()
be@private
? Not that it's actually private, but why aren't plugins and themes welcome to use it?
I gave some more thought on other things we have to consider:
- Nav menus. Term IDs for menu items linking to terms get stored in post meta. This update will fix the display of terms (since it uses
get_terms()
on display), but when a term is deleted, it won't remove menu items referencing the old term ID (more on this below). Should we try to update menu items when terms get split? - The above deletion happens through the
'delete_term'
hook, so I dug a little deeper and it looks like'_wp_delete_tax_menu_item'
is the only core function hooked into that action. When a term is deleted, should we check if there's an old term ID linked to it and fire'delete_term'
for that term ID too? This would involve faking the 4th param ($deleted_term
), since the old term ID won't have a term object. Then, of course, we should remove theold_tem_id => new_term_id
reference in the options table. default_category
,default_post_format
,default_link_category
: I think these should just get updated when a term splits if the old term id is a default.- This was mentioned in the ticket, but should we add a check to
wp_dropdown_categories()
to see if the 'selected' arg is an old term?
#18
@
10 years ago
- Keywords needs-patch added; has-patch 2nd-opinion removed
mboynes - Wow, thanks for your work on this.
I would propose tweaking this to store the old term ids in multiple options, one per taxonomy. Within each option, store an array of old_tem_id => new_term_id
This is a much better idea than what I'd originally done. Big +1.
This is currently going to autoload the option. Do we want it to? I would say no
I know I originally suggested this, but when I wrote my first patch I waffled. autoload is basically irrelevant for sites with a persistent cache, so let's weigh it for sites without. The case for autoload=no is that this is a value that won't be used often on most sites, and it adds some overhead to load it into memory on every pageload, especially if there are lots of split terms. The case for autoload=yes is that the memory overhead is small because split terms are likely to be pretty rare, even on very large sites; and those large sites are the ones most likely to have persistent caching; and on sites where there is no persistent caching, we don't want to cause an additional DB query potentially on every pageload (as will be required by, say, a nav item linking to a split term). On balance, the case for autoload=no seems less persuasive to me, but I'm happy to be swayed.
Should _get_split_term() be @private? Not that it's actually private, but why aren't plugins and themes welcome to use it?
My thinking was that _get_split_term()
- or at least the way we use it internally - is as a sort of magical maneuver that we don't really want plugin authors to attempt to duplicate. But now that I'm writing this, I'm reconsidering - there are legitimate instances where plugins would want to do what we're doing here to provide the same kind of backward compatibility. Let's switch it.
Should we try to update menu items when terms get split?
Good call. I'd missed this. Yes, we should.
When a term is deleted, should we check if there's an old term ID linked to it and fire 'delete_term' for that term ID too?
I would say no. In _split_shared_term()
, I explicitly decided *not* to fire the term creation actions, because while a row is technically being inserted into the terms table, it's not the case that a "term" is being "created" in any real sense. I'd say that this case is similar. I'd even shy away from deleting the entry from the _split_terms_$taxonomy
option, because then there'll be no record of what happened, and in any case it's harmless to have it there, since future term lookups will return false anyway. I'd suggest that we ought to handle cases of term deletion on a case-by-case basis, and encourage plugin authors to do the same, though tbh I think it's going to be extremely rare that it's necessary to do anything on term deletion, since fetching a deleted term is going to return empty whether it was split or not.
default_category, default_post_format, default_link_category: I think these should just get updated when a term splits if the old term id is a default.
Yes. I made a note to do this when I wrote the first patch, but I neglected to check the note :) I'd say this is a case where we want to hook to 'split_shared_term'.
This was mentioned in the ticket, but should we add a check to wp_dropdown_categories() to see if the 'selected' arg is an old term?
Sure, this is a pretty easy win.
I will be able to work on an updated patch within the next few days, but if you beat me to it, then all the better :) Once you and I are pretty satisfied with it, I'd like to run what we've got by nacin and johnbillion to see what they think of the approach. Thanks for your help!!
This ticket was mentioned in Slack in #core by mboynes. View the logs.
10 years ago
@
10 years ago
Update default categories if they get split; add taxonomy to split_shared_term
action
#20
@
10 years ago
On balance, the case for autoload=no seems less persuasive to me
Hard to argue with your breakdown! I'm in full agreement; re-enabled in 30335.7.patch.
there are legitimate instances where plugins would want to do what we're doing here to provide the same kind of backward compatibility
Agreed; Updated this to wp_get_split_term()
and removed the @private
tag in 30335.7.patch.
I would say no [about firing
delete_term
for the old term id when a split term is deleted].
I'm leaning in the same direction, so let's not do anything for now, and see if anyone comes along with a different opinion.
I'd even shy away from deleting the entry from the _split_terms_$taxonomy option, because then there'll be no record of what happened
Great point, I left that alone too.
Going back to menus, I looked into updating these but I'm a little hesitant. In order to identify which menu items need to be updated, we'll have to do a fairly ugly meta query. Using raw SQL to improve efficiency, we're still stuck with something like:
SELECT m1.post_id FROM wp_postmeta AS m1 INNER JOIN wp_postmeta AS m2 ON m2.post_id=m1.post_id INNER JOIN wp_postmeta AS m3 ON m3.post_id=m1.post_id WHERE ( m1.meta_key = '_menu_item_type' AND m1.meta_value = 'taxonomy' ) AND ( m2.meta_key = '_menu_item_object' AND m2.meta_value = 'category' ) AND ( m3.meta_key = '_menu_item_object_id' AND m3.meta_value = 123 )
Once we've identified the post ID, we can use update_post_meta()
to update each found value. I'll play around with this in a massive dataset and submit a patch if it seems like we won't have performance issues.
#21
@
10 years ago
30335.8.patch handles the menus. I tested the SQL query against a very large database and the slowest query was still <50ms.
@boonebgorges I think I'm satisfied at this point, so have a look and if you feel the same way, we can reach for outside opinions from johnbillion and nacin. Thanks!
#22
follow-up:
↓ 38
@
10 years ago
- Keywords has-patch added; needs-patch removed
mboynes - This is fantastic. Thanks for your continued work on it.
For anyone arriving late to the conversation, here's a summary of what 30335.8.patch does. When splitting a term, we save (in an option '_split_terms_$taxonomy'
) an array of split term IDs (old_term_id => new_term_id
). Then, in the following places, we detect the presence of pre-split term_ids and do a silent old/new term_id swapout:
- Calling
get_term()
with an old term_id - Calling
get_terms()
when using one or more of the parameters that accept term_ids (include, exclude, parent, etc) - Using
WP_Tax_Query
withfield=id
- Calling
wp_dropdown_categories()
with 'selected' (which expects a term_id)
Furthermore, at the moment of term splitting (the 'split_shared_term' action), we swap out old term ids for new in the following places where WP stores term_ids in the DB:
- 'default_category', 'default_link_category', 'default_email_category' (wp_options)
- Nav menu items that go to taxonomy archives (wp_postmeta)
We feel confident that this will provide backward compatibility for a large percentage of cases where old term_ids have been cached, and are looking for feedback.
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by johnbillion. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by mboynes. View the logs.
10 years ago
#30
@
10 years ago
30335.9.patch is the backward compatibility proposal from mboynes and myself. The non-controversial bits (cleaning up core instances of term_id cached in the database) have been committed; see [30494].
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by drew. View the logs.
10 years ago
#35
in reply to:
↑ 15
@
10 years ago
Replying to boonebgorges:
- I've chosen to provide silent fallback support. But maybe we should throw a
_doing_it_wrong()
?
Was there any additional thought on wether _doing_it_wrong()
should be included?
It may be useful to be able to track down in error logs wether there was an old term_id saved as post meta or an option somewhere?
This ticket was mentioned in Slack in #core by boone. View the logs.
10 years ago
This ticket was mentioned in Slack in #core by mboynes. View the logs.
10 years ago
#38
in reply to:
↑ 22
;
follow-up:
↓ 39
@
10 years ago
Replying to boonebgorges:
We feel confident that this will provide backward compatibility for a large percentage of cases where old term_ids have been cached, and are looking for feedback.
Some positive feedback here. I write such plugin which caches term_ids (and even associations between these term_ids). I just tested with the beta 2 and the action 'split_shared_term' is just perfect to update the cache. One remark though: since I store associations of term_ids, I need to split the associated terms. The best function to do this is to use _split_shared_term(). Is there any reason why you want to keep it private?
#39
in reply to:
↑ 38
@
10 years ago
Replying to Chouby:
One remark though: since I store associations of term_ids, I need to split the associated terms. The best function to do this is to use _split_shared_term(). Is there any reason why you want to keep it private?
We are looking at reverting this functionality (once again) for 4.1. If you could join us in the #core channel in WordPress Slack in half an hour from now that would be great.
This ticket was mentioned in Slack in #core by johnbillion. View the logs.
10 years ago
#41
@
10 years ago
- Milestone 4.1 deleted
- Resolution set to wontfix
- Status changed from new to closed
After some discussion, we've decided the following:
- Term splitting on
wp_update_term()
is being removed for 4.1, and is now slated for 4.2. See #5809. - In 4.2, we will not be providing any sort of backward compatibility of the type proposed in this ticket. Our analysis showed that it would cover too few of the actual cases of breakage.
For some background on these decisions, see:
On johnbillion's request, we're going to pull term splitting from
wp_update_term()
until we get this sorted out. https://wordpress.slack.com/archives/core/p1415905418001152