Make WordPress Core

Opened 7 years ago

Last modified 4 years ago

#38280 assigned enhancement

Make term count for a specific object type available

Reported by: desrosj's profile desrosj Owned by: desrosj's profile desrosj
Milestone: Future Release Priority: normal
Severity: normal Version: 4.7
Component: Taxonomy Keywords: has-patch has-unit-tests needs-refresh
Focuses: Cc:

Description

Term count is the total number of relationships with no context of the object types the relationships belong to.

Currently, the best way to determine the count of a term for a specific object type is to run a WP_Query with a WP_Tax_Query for that term.

$term_count_query = new WP_Query( array(
        'post_type' => 'some-post-type',
        'tax_query' => array(
                array(
                        'taxonomy' => 'some-taxonomy',
                        'field' => 'slug',
                        'terms' => array( 'term-slug' )
                ),
        ),
        'posts_per_page' => 1,
) );
$term_count_for_post_type = $term_count_query->found_posts;

Now that term meta is in core, it would be great if these counts were stored in term meta for easy access.

To start, these counts could be available through a function. To go a step further, an argument could be added to get_terms() and other functions to filter the term counts and reflect the object types terms are being grabbed for.

Attachments (19)

38280.diff (3.6 KB) - added by desrosj 7 years ago.
Stores object type specific counts in term meta when term counts are updated. Adds wp_get_term_count_for_object_type() to retrieve term counts.
38280.2.diff (6.1 KB) - added by desrosj 7 years ago.
Updated ticket adds unit tests, casts get_term_meta() call into an array to prevent PHP notice, improved function Docblock.
38280.3.diff (8.0 KB) - added by desrosj 7 years ago.
38280.4.diff (9.8 KB) - added by desrosj 7 years ago.
38280.5.diff (11.7 KB) - added by desrosj 7 years ago.
Updated patch adds another meta key for tracking the object types that were counted. Also removes all previously counted object types each time to prevent abandoned meta keys.
38280.6.diff (12.1 KB) - added by boonebgorges 7 years ago.
38280.7.diff (10.8 KB) - added by ivankristianto 6 years ago.
Update patch to apply cleanly
38280.8.diff (11.4 KB) - added by ivankristianto 6 years ago.
term_taxonomy_id could be different than term_id in term_relationships table. this patch fix it
38280.9.diff (14.2 KB) - added by desrosj 6 years ago.
38280.10.diff (14.8 KB) - added by boonebgorges 6 years ago.
38280.11.diff (5.3 KB) - added by ivankristianto 6 years ago.
38280.12.diff (16.2 KB) - added by boonebgorges 6 years ago.
38280.13.diff (16.0 KB) - added by desrosj 6 years ago.
38280.14.diff (17.5 KB) - added by boonebgorges 6 years ago.
38280.15.diff (20.9 KB) - added by boonebgorges 6 years ago.
38380.diff (9.7 KB) - added by desrosj 6 years ago.
38280.16.diff (9.7 KB) - added by desrosj 6 years ago.
38280.17.diff (20.3 KB) - added by ivankristianto 6 years ago.
Added back wpGetTermObjectCount test
38280.18.diff (20.4 KB) - added by boonebgorges 6 years ago.

Download all attachments as: .zip

Change History (49)

@desrosj
7 years ago

Stores object type specific counts in term meta when term counts are updated. Adds wp_get_term_count_for_object_type() to retrieve term counts.

@desrosj
7 years ago

Updated ticket adds unit tests, casts get_term_meta() call into an array to prevent PHP notice, improved function Docblock.

@desrosj
7 years ago

#1 @desrosj
7 years ago

Talked this over with @johnbillion at WordCamp US. He suggested single meta entries for each object type. This would unlock the potential for sorting terms by object type counts.

The newest patch reflects this change, and also adds unit tests.

The patch works and I can verify that through testing it and checking the database/displaying counts and function results. The tests all pass when run directly (using phpunit -group 38280 or phpunit -group taxonomy), but for some reason they fail when running all unit tests. A second set of eyes would be appreciated.

#2 @desrosj
7 years ago

Also, the attached patch follows the following methodology:

  • Do not create meta entry if one object type is registered on a taxonomy.
  • If two or more object types are registered, only store a meta entry for object types who's counts are greater than 0.
  • The term count remains the cumulative count for all relationships.

#3 @johnbillion
7 years ago

This is looking good. Some feedback:

  • This function will return 0 for terms that haven't yet had their post count cache populated (ie. all terms that already exist when the site gets upgraded to WP 4.8, for example). Should the term count its posts if the term meta cache doesn't yet exist?
  • The $wpdb->prepare() call added on line 3427 of src/wp-includes/taxonomy.php should properly prepare $type instead of concatenating it into the SQL.
  • There should be a test that asserts that the count is correctly updated when terms are removed from an object with wp_remove_object_terms().

@desrosj
7 years ago

#4 @desrosj
7 years ago

Updated patch to address all three items.

Bullet point one made me think of a couple scenarios that could result in some abandoned meta caches:

  • What if a taxonomy is removed from a post type?
  • What if a post type is removed?

Should we check for these some how?

#5 @desrosj
7 years ago

  • Keywords needs-refresh has-patch has-unit-tests added

#7 @desrosj
7 years ago

  • Keywords needs-refresh removed

Patch actually does not need a refresh. Still having the issue of tests passing when the taxonomy or 38280 group is run, but failing when all tests are run though.

#8 @boonebgorges
7 years ago

@desrosj Nice job on this patch so far.

The comment // When a count for an object type does not exist, if any other meta caches exist we can assume 0. makes me nervous. I guess this is because you're not storing 0s, and the presence a count for another object type means that WP has recently done a relevant count. I worry that this is going to be subject to weird race conditions or data integrity problems. (For example: what if a taxonomy originally assigned to 'post' becomes assigned to array( 'post', 'page' )?) How about some combination of your original array-based approach and the one-entry-per-object-type approach? There'd be a single key containing a list of the object types for which a count exists, but the counts themselves would be stored in their own rows. This way, if $type appears in the array, and if get_term_meta( $term_id, '_wp_object_count_' . $type, true ) returned false for a $type, we can be confident that $type has indeed been counted.

Bullet point one made me think of a couple scenarios that could result in some abandoned meta caches

These situations are rare enough that I wouldn't worry about it too much. As a precaution, _update_post_term_count() could delete_post_meta() for all of a post's existing counts before generating and saving new ones (another argument for having a separate list of the object types that have been counted). This way, the stale data will be removed during the next count.

@desrosj
7 years ago

Updated patch adds another meta key for tracking the object types that were counted. Also removes all previously counted object types each time to prevent abandoned meta keys.

This ticket was mentioned in Slack in #core by desrosj. View the logs.


7 years ago

@boonebgorges
7 years ago

#10 @boonebgorges
7 years ago

38280.6.diff does some formatting cleanup, including some slight reorganization of return statements in wp_get_term_count_for_object_type() to reduce the amount of if indentation.

This reorganization shows that there are some cases in which wp_get_term_count_for_object_type() doesn't return a value at all. Should we be returning 0 as a default value? The fact that these cases didn't come up makes me think we're missing some cases in the unit tests.

Can you explain this logic to me?

 	* If the object type has been counted, and other counts exist in meta, we can 
 	* assume this term has 0 relationships for this object type. 

Why is it necessary to check for *other* counts? Shouldn't it be enough for the type to be listed in _wp_counted_object_types? If there's a scenario where these values might get out of whack, we should (a) have a test that describes it, and/or (b) defend against that scenario :)

The invalidation part is looking pretty good to me.

This ticket was mentioned in Slack in #core-multisite by desrosj. View the logs.


6 years ago

#12 @ivankristianto
6 years ago

  • Keywords needs-refresh added

I failed to merge this patch

@ivankristianto
6 years ago

Update patch to apply cleanly

#13 @ivankristianto
6 years ago

  • Keywords needs-refresh removed

@ivankristianto
6 years ago

term_taxonomy_id could be different than term_id in term_relationships table. this patch fix it

@desrosj
6 years ago

#14 @desrosj
6 years ago

Thanks for the patch refresh, @ivankristianto!

Looks like the changes in 38280.8.diff are breaking the unit tests. After looking into it further, I think those changes should be reverted (which I have done in 38280.9.diff). While the term_id could be different than term_taxonomy_id, the function's inline documentation explicitly specifies List of Term taxonomy IDs.. I also looked at where _update_post_term_count() is called in core and the only function that calls it is wp_update_term_count_now(). This function's documentation also specifies The term_taxonomy_id of terms to update., so I think we can avoid that additional query.

A term ID check may be needed to ensure the correct term ID for the meta function calls, but I need to look into this further. @boonebgorges does anything stick out to you there?

Also in 38280.9.diff:

  • Change version number on new functions to 5.0.0.
  • Added a @since tag to _update_post_term_count() noting the new meta information.
  • If a WP_Error is returned by get_term() return from wp_get_term_count_for_object_type(). Previously, it was just assumed a term would always be returned. null can also be returned for "miscellaneous failures". I was looking to better understand when null would be returned so I could write tests to catch it before adding a return statement for this condition.
  • Removed the else statement at the bottom of _update_post_term_count(). The meta entries deleted in this else are already deleted earlier on in the function.
  • Moved tests for wp_get_term_count_for_object_type() into a separate test class.

Replying to boonebgorges:

Can you explain this logic to me?

 	* If the object type has been counted, and other counts exist in meta, we can 
 	* assume this term has 0 relationships for this object type. 

Why is it necessary to check for *other* counts? Shouldn't it be enough for the type to be listed in _wp_counted_object_types? If there's a scenario where these values might get out of whack, we should (a) have a test that describes it, and/or (b) defend against that scenario :)

I think this was something I missed when previously when adding the _wp_counted_object_types meta cache. I have removed it.

This reorganization shows that there are some cases in which wp_get_term_count_for_object_type() doesn't return a value > at all.

Looking at this further, it seems the only time we would not return a value would be if wp_update_term_count_now() returned a "falsey" value. But, that function always returns true. Wondering if this could ever cause an endless loop in the last conditional of the function (if the meta does not successfully get stored, it may call itself recursively).

I think this is in a great spot now pending the items above. I would like to cut the test methods into smaller ones (and maybe use some fixtures) as well. I couldn't find any tests for wp_update_term_count_now() or _update_post_term_count() either. If we don't add them here, a new ticket may be good for tracking.

#15 @ivankristianto
6 years ago

@desrosj
Thank you for catch that error.
since we need to pass term_taxonomy_id for wp_update_term_count_now
You need to convert term_id in this line:

if ( wp_update_term_count_now( array( $term_id ), $taxonomy ) ) {


Because it will count different terms if the term_id is not equal term_taxonomy_id

Last edited 6 years ago by ivankristianto (previous) (diff)

#16 @boonebgorges
6 years ago

38280.10.diff makes the change pointed out by @ivankristianto, which I was in the process of patching when he posted his comment :)

Fixing this error uncovers some bug that I haven't grasped yet. The logic of wp_get_term_count_for_object_type() is basically: see if we have a cached count, but if not, call wp_update_term_count_now(), and then re-call wp_get_term_count_for_object_type(), the idea being that calling wp_update_term_count_now() should always populate the cached count. But this doesn't appear to be happening in all cases, so that this recursive call ends up with an infinite regress. This appears to happen when the taxonomy is associated with more than one object type, and when term_id and term_taxonomy_id are out of sync, which suggests that there's still some inconsistency there. Try running 38280.10.diff with the entire --group taxonomy and you'll see what I mean.

I made a couple other changes as well:

  • In _update_post_term_count(), you were double-escaping the $object_types (esc_sql() + $wpdb->prepare()). Your rearrangement means that the esc_sql() call is no longer needed, so I've removed it.
  • Rearranged the logic at the end of wp_get_term_count_for_object_type() to make it clearer what's being returned in the last case.
  • In the complex tests, use a custom taxonomy rather than category. Category has all sorts of special cases ('Uncategorized', etc) and, at first, I thought it might be the cause of the bug described above, though perhaps I'm wrong about that.
  • Don't return false from wp_get_term_couont_for_object_type() in case of a bad object type. Seems more informative and more internally consistent to return a WP_Error.

Perhaps @ivankristianto or @desrosj will be able to look at the errors and find what's causing the infinite regress. To me, it's sort of a code smell - ideally, the function wouldn't need to call itself again to get the cached value, though I realize the underlying count functions don't return friendly values.

#17 @ivankristianto
6 years ago

@boonebgorges it's because we use term_taxonomy_id in _update_post_term_count argument. while we do update_term_meta and delete_term_meta which accept term_id
So back to the case where term_id is not equal to term_taxonomy_id issue.

I upload patch 38280.11.diff to query back to get the term_id for this purpose.

But it cause 1 test failed from test_termmeta_cache_should_not_be_primed_when_update_term_meta_cache_is_false which I don't understand why. Can you please help?

#18 @boonebgorges
6 years ago

@ivankristianto Good call, thanks!

The failing test is because _update_post_term_count() now calls get_term_meta(), which populates the (entire) term meta cache for each term. I don't think this is a bug, but it means that the test needs some additional logic to ensure an empty termmeta cache before running its assertion. I've made the change in 38280.12.diff.

I also swapped out your database query with get_term_by( 'term_taxonomy_id' ), and changed the variable names to make it clear where you're referencing a term_taxonomy_id vs a term object/term_id.

@desrosj
6 years ago

#19 follow-up: @desrosj
6 years ago

38280.13.diff removes bool from the wp_get_etrm_count_for_object_type() inline documentation and adds the taxonomy to the first WP_Error that is returned when a taxonomy does not exist. Also fixes a 3 minor alignment and spacing issues in the unit tests to adhere to the coding standards.

@boonebgorges Do you think adding a return value to _update_post_term_count() is realistic? It is a pretty old function (2.3), but adding a return value (or at least a false return when it fails) would solve the infinite loop issue.

#20 in reply to: ↑ 19 @boonebgorges
6 years ago

Replying to desrosj:

38280.13.diff removes bool from the wp_get_etrm_count_for_object_type() inline documentation and adds the taxonomy to the first WP_Error that is returned when a taxonomy does not exist. Also fixes a 3 minor alignment and spacing issues in the unit tests to adhere to the coding standards.

@boonebgorges Do you think adding a return value to _update_post_term_count() is realistic? It is a pretty old function (2.3), but adding a return value (or at least a false return when it fails) would solve the infinite loop issue.

Thank you sir. I don't think it's feasible to change the return value for the update functions, for a few reasons. One is backward compatibility. Another is that taxonomy terms can be attached to things that aren't posts, which means that the count fallback needs to be more resiliant. 38280.14.diff has a test that demonstrates this - the test, along with 38280.13.diff, will trigger the infinite regress.

38280.14.diff has one proof-of-concept way to avoid the regress. Basically: break the call to termmeta into its own reusable function. The function returns false if no meta is found, indicating a count should take place. If it returns false, then we trigger wp_update_term_count_now(), and then fetch from meta again. If it's *still* false, we just return 0 from the function. Upside: No more regress, even for non-post-type objects.

Downside: wp_get_term_count_for_object_type( $term_id, $non_post_type ) will *always* trigger a recount, except when a plugin updates the term-object counts itself. This seems dangerous. So, an alternative technique might be to simply bail out of wp_get_term_count_for_object_type() if the object_type is (a) not found in _wp_counted_object_types (this covers cases where the plugin properly handles term counts), and (b) is not a post type. This 0 return value for non-post-types could be filtered, in case a plugin wants to do something fancy.

I'm thinking that this latter technique might be the simplest and safest, but I'd welcome feedback before writing yet another patch :-D

#21 @ivankristianto
6 years ago

@boonebgorges
I think it's a good idea to bail it early and also the safest.

And I found a potential issue, if the taxonomy has custom update_count_callback , the _update_post_term_count() will never get called. Thus function wp_get_term_count_for_object_type() will never return the correct count.

I propose to move the logic to count terms per object types from _update_post_term_count into reusable function and call it from wp_update_term_count_now instead.

#22 follow-up: @boonebgorges
6 years ago

38280.15.diff is a rethink.

In the case of post types, term-object counts will be regenerated whenever wp_update_term_count_now() is run. This, in turn, is triggered by wp_set_object_terms(), etc. In other words, the data should always exist for terms that are touched after the 5.0 release - that is, those that are created/updated/attached to posts/removed from posts after 5.0. The time when we need to worry about "backfilling" the data is for terms that have *not* been touched since 5.0.

In the case of non-post-types (or post types that have their own custom update_count_callback), there is nothing that core can do when the count is requested, except to fail gracefully.

To accommodate both scenarios with a single approach, I introduced a new action 'wp_no_object_term_count_found'. Core will hook into it for post types, and will run wp_update_term_count_now(). (This covers "legacy" terms.) For non-post-types or post types with custom update_count_callback, plugins can hook to it and populate the object-term counts, and/or use the main filter I've added at the end of the function.

How does this general approach feel to others?

Moving forward, can we think about new names for the function/functionality? term_count_for_object_type() sounds backward - the thing being counted is "objects of a given (object/post) type that have a given term", not "terms that belong to a given (object/post) type". How about wp_get_term_object_count()? I can't think of anything more melodious :)

I also have questions about the "if" conditions where I put the comment are these right?. I know that these are meant to be shortcuts, but I worry that they have the potential to produce corrupted data. For example, the 1 >= count( $taxonomy_object->object_type ) call assumes that plugins have properly registered their taxonomies with object types - but do we enforce this elsewhere? What if the $taxonomy_object->object_type is not the same as $object_type? It seems better to return no data in these cases than to return something potentially incorrect. Can anyone help to think through the possibilities here?

This ticket was mentioned in Slack in #core by davecpage. View the logs.


6 years ago

#24 in reply to: ↑ 22 ; follow-up: @desrosj
6 years ago

Replying to boonebgorges:

How does this general approach feel to others?

I like this approach a lot. It feels a lot more natural. 38280.16.diff adds tests for the new meta retrieval function.

Some questions.

  • Should the priority of core's wp_no_object_term_count_found hook be higher than the default of 10 to ensure it runs before custom hooks?
  • Should wp_get_term_count_for_object_type() have a dynamic hook that allows a filter to only be added for a specific taxonomy? "wp_term_count_for_object_type_$taxonomy", for example.
  • If the second attempt to get the object count after the "no count found" hook fails, should a WP_Error be returned?

Also, I think wp_get_term_count_for_object_type_from_meta() could also return false when an object does not belong to a taxonomy. Object types can be removed from a taxonomy and not have their counts updated for quite some time. Returning false here would trigger a recount and flush the removed object type from the meta cache.

How about wp_get_term_object_count()?

Sounds good to me. I made name changes and tweaked some docblocks to more accurately indicate what is counted in 38280.16.diff.

I also have questions about the "if" conditions where I put the comment are these right?. I know that these are meant to be shortcuts, but I worry that they have the potential to produce corrupted data. For example, the 1 >= count( $taxonomy_object->object_type ) call assumes that plugins have properly registered their taxonomies with object types - but do we enforce this elsewhere? What if the $taxonomy_object->object_type is not the same as $object_type? It seems better to return no data in these cases than to return something potentially incorrect. Can anyone help to think through the possibilities here?

When a taxonomy is registered, no object type validation is performed. So, you could register a foo object type for a taxonomy even if foo objects are not registered. Maybe 1 >= count( $taxonomy_object->object_type ) could be changed to count the number of object types that currently exist instead. This would add a need to compare the number of object types stored in _wp_counted_object_types with the number of active in order to trigger recounting if an object type is removed, but could prevent inaccurate counts when an object type is removed.

I also created a branch on my WordPress Develop fork to show the tests passing. This branch reflects 38280.16.diff.

Version 0, edited 6 years ago by desrosj (next)

@desrosj
6 years ago

@desrosj
6 years ago

@ivankristianto
6 years ago

Added back wpGetTermObjectCount test

#25 in reply to: ↑ 24 @ivankristianto
6 years ago

Replying to desrosj:

Replying to boonebgorges:

I also have questions about the "if" conditions where I put the comment are these right?. I know that these are meant to be shortcuts, but I worry that they have the potential to produce corrupted data. For example, the 1 >= count( $taxonomy_object->object_type ) call assumes that plugins have properly registered their taxonomies with object types - but do we enforce this elsewhere? What if the $taxonomy_object->object_type is not the same as $object_type? It seems better to return no data in these cases than to return something potentially incorrect. Can anyone help to think through the possibilities here?

When a taxonomy is registered, no object type validation is performed. So, you could register a foo object type for a taxonomy even if foo objects are not registered. Maybe 1 >= count( $taxonomy_object->object_type ) could be changed to count the number of object types that currently exist instead. This would add a need to compare the number of object types stored in _wp_counted_object_types with the number of active in order to trigger recounting if an object type is removed, but could prevent inaccurate counts when an object type is removed.

I agree with @desrosj here, $taxonomy_object->object_type could be different and we didn't enforced that. I prefer we remove that condition and go straight get the count from the meta / recount it. It's more safe rather than return false data.

I did add back the unit testing that missing in previous patch 38280.17.diff. And also rename the function name as well.

#26 @boonebgorges
6 years ago

I agree with @desrosj here, $taxonomy_object->object_type could be different and we didn't enforced that. I prefer we remove that condition and go straight get the count from the meta / recount it. It's more safe rather than return false data.

I've made this change in 38280.18.diff. Note that this also means that _wp_counted_object_types and _wp_object_count_{$object_type} is now set for *every* term. This is not ideal, especially when most taxonomies are, in fact, used only with a single object type. So I'm on the fence about it. The alternative strategy would be to stick with the 1 === count( ... ) technique, and urge the importance of proper object-type registration and custom count callbacks for custom taxonomies, somewhere in the documentation. What do others think?

Should the priority of core's wp_no_object_term_count_found hook be higher than the default of 10 to ensure it runs before custom hooks?

I don't see a powerful reason for doing this. It's just as likely - maybe more? - that a developer will want to hook in *before* the core callback, so as to call remove_action().

Should wp_get_term_count_for_object_type() have a dynamic hook that allows a filter to only be added for a specific taxonomy? "wp_term_count_for_object_type_$taxonomy", for example.

Dynamic hooks make the most sense when the generic hook is called so frequently that the numerous calls to call_user_func() cause performance problems. I don't think that's going to be the case for 'wp_term_object_count'. So my preference would be to err on the side of fewer, and more general, hooks, and devs can check $term->taxonomy in their callbacks. Happy to hear counterarguments, though.

If the second attempt to get the object count after the "no count found" hook fails, should a WP_Error be returned?

I guess that would be like:

$count = wp_get_term_object_count_from_meta( $term_id, $object_id );
if ( false === $count ) {
    return new WP_Error( ... );
}

$count = (int) $count;

Is that right? What information would the error provide to a developer? I guess that the object type is not a post type, and the plugin in question didn't properly register a count callback?

This ticket was mentioned in Slack in #core by davecpage. View the logs.


6 years ago

#28 @desrosj
5 years ago

Diving back into this today, I thought of an issue with the conditionals in wp_get_term_object_count(). The first condition checked is 0 === $term->count. If the object types registered to a taxonomy change and the count is 0, a recount will not occur until a term is added or removed from an object. This could result in inaccurate counts,

Maybe we have been thinking about this the wrong way. Maybe, instead of checking the number of object types registered to a taxonomy (which could be the same even if you remove one taxonomy and add another), the object types themselves should be compared. Something like:

$new_types = array_diff( $registered_object_types, $counted_object_types );
$missing_counted_types = array_diff( $counted_object_types, $registered_object_types );

if ( ! empty( $new_types ) || ! empty( $missing_counted_types ) {
    // Object types have changed. Recount.
} elseif ( 0 === $term->count ) {
    // Check for 0 count.
} else {
    $count = wp_get_term_object_count_from_meta( $term_id, $object_type );
}

This would also require _wp_counted_object_types for every term, which I am also undecided on and want to think about a bit more. The only alternative I can think of in the current state is storing a cache as an option when there is only one object type on a taxonomy. But, I think that introduces some needless complexity.

What information would the error provide to a developer?

The scenarios where a count could not be determined, I think, are:

  • An error occurred trying to recount
  • Post type does not exist.
  • A custom callback was not properly implemented.

#29 @desrosj
4 years ago

  • Milestone changed from Awaiting Review to Future Release
  • Owner set to desrosj
  • Status changed from new to assigned

Going to give this another look over and try to get it ready to go.

#30 @desrosj
4 years ago

  • Keywords needs-refresh added
Note: See TracTickets for help on using tickets.