WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 4 years ago

#34544 closed defect (bug) (fixed)

Attaching metadata to shared terms can result in data corruption

Reported by: boonebgorges Owned by: boonebgorges
Milestone: 4.4 Priority: high
Severity: normal Version:
Component: Taxonomy Keywords: has-patch 2nd-opinion
Focuses: Cc:

Description

WP 4.3 splits shared taxonomy terms on a cron job, which means that there may be sites where shared terms still exist after upgrading to 4.4. If a $term_id is shared between two or more taxonomies, you'll get unexpected results if you add_term_meta( $term_id, 'foo', 'bar' ). You could, for example, intend to add metadata to a category that shares a term_id with a post_tag, then have the term split so that the category gets a new term_id, at which point your metadata is associated with the post_tag rather than the category.

After some discussion with @dlh, @aaronjorbin, @mboynes, @dhshredder, I think the correct approach is to bail out of add_term_meta() or update_term_meta() if we detect that the $term_id is shared. A couple of implementation details to decide on:

  1. Checking to see whether a term is shared introduces a non-trivial amount of overhead. We should probably set a flag if we know that the installation has no shared terms, and skip the check if that's true. We already have a 'finished_splitting_shared_terms' flag for wp-cron, but I'm not 100% sure that it'll be reliable in all cases. We may need a flag that's set via a more direct database query, maybe at the end of _split_shared_term().
  2. Currently, *_metadata() functions and their wrappers return false on error. This is not very helpful. When a term is shared, we could throw a _doing_it_wrong(), though this can't be caught programatically. Or we could return a WP_Error, though it'll add yet another kind of return value.

Attachments (1)

34544.diff (6.3 KB) - added by boonebgorges 4 years ago.

Download all attachments as: .zip

Change History (19)

#1 @boonebgorges
4 years ago

Adding @jorbin @aaroncampbell to CC

#2 @jorbin
4 years ago

An additional option worth considering is throwing a catchable exception. This would mean that we don't need to add a new return value ( a bonus), but it would be the first place we throw an exception in WordPress. This would require education of plugin and site developers since an uncaught exception will cause breakage.

For the flag, I agree that this is the right way to go. While it isn't 100% guaranteed to be true ( after it is set, a developer could still manually make shared terms), I think developers doing that need to manage everything themselves.

This ticket was mentioned in Slack in #core-multisite by boone. View the logs.


4 years ago

@boonebgorges
4 years ago

#4 @boonebgorges
4 years ago

  • Keywords has-patch added

34544.diff is an initial patch addressing the issue.

  • I've gone with a WP_Error. It's annoying to add another return value type, but (a) these are new functions, so there are no compatibility concerns, and (b) returning false on error is pretty lousy behavior. In principle, I like @jorbin's idea of throwing catchable exceptions, but it should be part of a larger conversation about WP error reporting.
  • I'm reusing the 'finished_splitting_shared_terms' option introduced as a helper for the cron job in the 4.3 upgrade. New installations will always have it set to 1. Updates where all terms are split will have it set to 1. Updates where terms are not yet split, either because cron is broken or because cron has not had enough time to batch-split all terms, will have it set to 0. As a failsafe for cases where wp-cron isn't running for some reason, I added an extra check to the end of _split_shared_term(); after splitting a term, if we see that we've just split the final shared term, we set the flag to true. (Previously, we were only setting it to true during a cron callback.)

I'd appreciate feedback from one of this ticket's illustrious followers on the approach.

#5 @johnjamesjacoby
4 years ago

  • Keywords 2nd-opinion added

34544.diff gets the job done, but seems backwards to me. It's also possible I don't fully understand the harm that adding metadata to a shared term might cause.

Most of the time terms are not shared across taxonomies, so potentially hitting the database on each metadata add/update is kinda sucky, though there's not really any better way to know for sure.

If the problem is that splitting terms causes metadata to be orphaned, I expect the newly split term to have the same metadata as it's origin. Less of a split; more of a clone? The term splitting code could check for metadata at the origin and copy it to the new term. This approach would allow us to leave the metadata functions unmodified, and guarantee metadata exists as it's intended to after terms are split.

Other, more silly thoughts. How about the edge-case where installations are choosing not to run the term-splitting code, but do want term metadata. Regarding the newly proposed finished_splitting_shared_terms option, would it make more sense to compare to the db_version instead, or to separately version each database table and come up with a naming convention for doing that to other tables in the future?

#6 follow-ups: @boonebgorges
4 years ago

@johnjamesjacoby Thanks for the feedback.

If the problem is that splitting terms causes metadata to be orphaned

Actually, this is the first time I'd actually thought of the metadata being orphaned. My immediate concern is a situation like this. Two taxonomies, cool_band and city; a term Chicago with ID 45, in each taxonomy. If you add_term_meta( 45, 'thumbnail', $photo ), where $photo is a picture of the Sears Tower, then the thumbnail will also apply to the cool_band Chicago. This is clearly incorrect. More briefly: the termmeta functions accept only a $term_id for identifying a term, but when terms are shared, $term_id is insufficient.

make more sense to compare to the db_version instead

During the 4.3 update, we trigger the splitting of shared terms in a separate cron job. db_version is, unfortunately, not a reliable indicator of anything here. finished_splitting_shared_terms is what the 4.3 cron job uses to know when it can stop spawning itself.

#7 follow-up: @nerrad
4 years ago

in the theme of silly thoughts. What about doing this? If a shared term is detected, split it, then add the meta? I guess the potential pitfall to that is maybe still not knowing what specific term (after the split) to apply the meta to?

#8 in reply to: ↑ 7 @boonebgorges
4 years ago

  • Milestone 4.4 deleted
  • Status changed from new to closed

Replying to nerrad:

in the theme of silly thoughts. What about doing this? If a shared term is detected, split it, then add the meta? I guess the potential pitfall to that is maybe still not knowing what specific term (after the split) to apply the meta to?

Yup, that's exactly the issue. See #34533 for some more discussion.

#9 @boonebgorges
4 years ago

  • Milestone set to 4.4

I'm going to go with the proposed solution to get some testing in the next beta. Let's reopen if we find issues.

#10 @boonebgorges
4 years ago

  • Owner set to boonebgorges
  • Resolution set to fixed

In 35515:

Don't allow term meta to be added to shared taxonomy terms.

add_term_meta() and update_term_meta() identify terms by $term_id. In
cases where a term is shared between taxonomies, $term_id is insufficient to
distinguish where the metadata belongs.

When attempting to add/update termmeta on a shared term, a WP_Error object
is returned. This gives developers enough information to decide whether they'd
like to force the term to be split and retry the save, or show an error in the
UI, or whatever.

Props boonebgorges, mboynes, DH-Shredder, jorbin, aaroncampbell.
Fixes #34544.

#11 in reply to: ↑ 6 @dd32
4 years ago

Replying to boonebgorges:

@johnjamesjacoby Thanks for the feedback.

If the problem is that splitting terms causes metadata to be orphaned

Actually, this is the first time I'd actually thought of the metadata being orphaned. My immediate concern is a situation like this. Two taxonomies, cool_band and city; a term Chicago with ID 45, in each taxonomy. If you add_term_meta( 45, 'thumbnail', $photo ), where $photo is a picture of the Sears Tower, then the thumbnail will also apply to the cool_band Chicago. This is clearly incorrect. More briefly: the termmeta functions accept only a $term_id for identifying a term, but when terms are shared, $term_id is insufficient.

I think it's far enough an edgecase that this would be the expected outcome if you add metadata to a shared term that's later split. ie. that is, that the meta is added to the shared term, and meta is duplicated upon eventual split.

Use-cases like the one outlined above are so edgecase that if you're using/relying upon that and still using shared terms, you've got bigger problems that we can't automagically fix.

#12 @dd32
4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

I'm re-opening simply because I don't agree with the direction taken here, and would like to get others thoughts before we lock ourselves into this.

#13 @boonebgorges
4 years ago

This ticket was originally meant to address two parallel issues:

(1) Adding meta to a shared term would cause "false positives", cases where metadata would be attached to an unexpected term_taxonomy
(2) Meta added to a shared term might end up with the "wrong" term after the split

The suggestion by @johnjamesjacoby and @dd32 addresses (2), at the expense of (1) - there'll be false positives both before the split (perhaps unavoidable) and after (difficult to debug). The approach committed in [35515] addresses both (1) and (2), but the trade-off is that metadata is simply not added at all, resulting in unexpected missing data when the proper error checks aren't in place.

Maybe we can think about this terms of data remediation. After [35515], a developer concerned about the possibility of shared terms could include logic like this in their plugin:

$added = add_term_meta( $term_id, 'foo', 'bar' );
if ( is_wp_error( $added ) && 'ambiguous_term_id' === $added->get_error_code() ) {
    $new_term_id = _split_shared_term( $term_id, $term_taxonomy_id );
    add_term_meta( $term_id, 'foo', 'bar' );
}

Let's imagine, on the other hand, that meta added to shared terms is duplicated across siblings at the time of split. The recommended technique for handling split terms is a 'split_shared_term' callback:

function wp34544_split_shared_term( $term_id, $new_term_id, $tt_id, $taxonomy ) {
    // We can pull up a list of metadata that has been copied to the new term...
    $term_meta = get_term_meta( $new_term_id );

    // ... but then what do we do with it?
}
add_action( 'split_shared_term', 'wp34544_split_shared_term', 10, 4 );

At the time of split_shared_term, there's no way to tell whether a given row in termmeta ought to go with the $new_term_id. (We'd need metadata-meta.) So it's not obvious how a dev would clean up bad data if we went with this technique.

#14 in reply to: ↑ 6 @aaroncampbell
4 years ago

I was putting together a longer response with examples, but @boonebgorges got his in first and really hit the same notes I wanted to hit, so I'm really kind of chiming in with a simple "yeah, that".

I think making a simple way for a developer to see the issue and fix it (returning a WP_Error) is a great way forward. Especially since this is new functionality.

The thing I think Boone skipped is that not only can you end up with meta on a term you didn't expect it on if terms are shared, but you can easily stomp on your own meta by accident. In his example of attaching 'photo' meta to Chicago expecting it to be the city not the band, imagine if someone were to see the issue and upload a photo of the band. Meta needs to be attached to a single term, not a (likely unexpected) cluster of them.

I think the fix in [35515] is a good solution to a rather complex problem. It allows data integrity while also eliminating most of the (potential) confusion.

#15 @jorbin
4 years ago

[35515] seems like the right solution to me, as much as I would really like to actually throw an exception.

Looking at the concerns laid out thus far:

Still having shared terms is a bit of an edgecase, but the most likely case where this would occur is from skipping 4.3. Historically, WordPress upgrades have been doable in one step, from any version to any other version. We shouldn't require an intermediate stop of 4.3 (and for term splitting to finish) to upgrade to 4.4 and use term meta. This is absolutely a change that adds complexity, but I think it's worth it to keep updates easy.

For the sites that haven't split terms yet do to a desire to "opt out", I don't think that is an edge case that should be supported. Core needs to have the flexibility to make underlying data changes and the decision was made to split terms.

One unmentioned use case I could image is for plugin that we will call the_awesome_plugin which already implements term meta in some way. The_awesome_plugin contains code for migrating to 4.4 term_meta. A user that jumps over 4.3 and tries to run the_awesome_plugin's 4.4 script (or this script runs automatically) could end up with unintended data (such as naming Boone the city in North Carolina the 2nd best karaoke singer when really you mean to name Boone the developer the 2nd best karaoke singer).

This ticket was mentioned in Slack in #core by boone. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by wonderboymusic. View the logs.


4 years ago

#18 @boonebgorges
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Given the lack of movement, and the general agreement among the last few commenters, and the fact that we're going to RC1 today, I'm going to reclose the ticket. @dd32 and others, if you have an alternative suggestion that addresses the concerns listed above, please feel free to reopen with details.

Note: See TracTickets for help on using tickets.