Make WordPress Core

Opened 8 years ago

Last modified 7 months ago

#40351 reopened enhancement

Term post re-counts scale poorly, are common and difficult to avoid

Reported by: mattoperry's profile mattoperry Owned by: whyisjake's profile whyisjake
Milestone: Future Release Priority: normal
Severity: normal Version: 4.8
Component: Taxonomy Keywords: has-patch has-unit-tests early
Focuses: performance Cc:

Description

Under normal conditions whenever a post status transition occurs, _update_term_count_on_transition_post_status attempts to completely recalculate the post counts for each term related to the post by re-counting each term's total number of post relationships.. For sites with large term relationship tables, large numbers of total terms and high numbers of terms per post, this recalculation does not scale well and can lead to wp-admin lag (while saving a post for example) or failed queries.

A typical bad scenario looks like this:

Consider a site with a large term_relationship table and a post with (say) 30 terms assigned to it. When that post is updated, an eventual call to _update_post_term_count will cause that function to execute 60 total queries on some very large tables. 30 are SELECTs:

SELECT COUNT(*) FROM $wpdb->term_relationships, $wpdb->posts WHERE $wpdb->posts.ID = $wpdb->term_relationships.object_id AND post_status = 'publish' AND post_type IN ('" . implode("', '", $object_types ) . "') AND term_taxonomy_id = %d

(This is actually the hopeful case, since if we need to count attachments too, those are added to the above query via a subselect which probably makes things worse.)

Interspersed among the 30 SELECTs are 30 UPDATEs to the tt table, generated by:

$wpdb->update( $wpdb->term_taxonomy, compact( 'count' ), array( 'term_taxonomy_id' => $term ) );

One result of all of this can be failed queries -- typically the SELECTs -- and incorrect term post counts. Even more frequently the issue manifests as slow post updating behavior (the lag a user feels while waiting for a post to save or publish.)

We currently provide two mechanisms (besides just unhooking _update_term_count_on_transition_post_status entirely and not updating term post counts at all) to do something about this problem. The first is to defer post counts, but this just delays the badness until a later call of wp_update_term_count. The second is to define a per-taxonomy custom update callback using update_count_callback.

Neither of these mechanisms is intended to improve performance, and both are obscure. It would be better to by default increment or decrement post counts directly in the tt table in response to posts entering or leaving published (or other countable) stati, and reserve complete recalculations for special occasions (such as when a term is edited.) Using this strategy, the 60 total queries in the example above could -- on publish -- be replaced by a a single query that would look something like:

UPDATE {$wpdb->term_taxonomy} AS tt SET tt.count = tt.count + 1 WHERE tt.term_taxonomy_id IN ( .... )

(where the final list is a list of tt_ids.)

This solution is implemented now as a plugin here: https://github.com/Automattic/lightweight-term-count-update.

Patch based on this on the way soon.

Attachments (10)

cpu-usage.png (50.7 KB) - added by peterwilsoncc 5 years ago.
40351.diff (7.1 KB) - added by rebasaurus 4 years ago.
Porting of the LTCU plugin into core
40351-1.diff (7.2 KB) - added by rebasaurus 4 years ago.
Add $taxonomy arg to countable_status filter
40351.2.patch (7.2 KB) - added by davidbaumwald 4 years ago.
Updated patch for 5.6
40351.3.patch (9.7 KB) - added by lcyh78 4 years ago.
Add the filter to the original count code in _update_post_term_count (used in edit-{taxonomy} screen).
40351.4.patch (9.7 KB) - added by lcyh78 4 years ago.
Updated consolidated patch. $transition was not set on adding new post.
40351.2.diff (34.7 KB) - added by peterwilsoncc 4 years ago.
40351.3.diff (46.6 KB) - added by peterwilsoncc 4 years ago.
Wrapping up the docs.
40351.4.diff (893 bytes) - added by Chouby 4 years ago.
Fixes the $taxonomy parameter for the actions edit_term_taxonomy and edited_term_taxonomy
40351.5.patch (668 bytes) - added by lcyh78 4 years ago.
Alternative fix proposed. Seems problem occurs as parameter $taxonomy is changed to an object a few lines earlier. This does as in wp_set_object_terms and wp_remove_object_terms.

Download all attachments as: .zip

Change History (70)

This ticket was mentioned in Slack in #core by mattoperry. View the logs.


7 years ago

#2 @desrosj
7 years ago

  • Keywords needs-patch needs-unit-tests added

#3 @peterwilsoncc
5 years ago

cpu-usage.png shows the room for improvement on large sites.

Work (a large news organisation) enabled light-weight term counting on March 7 due to CPU alerts firing daily during the wp_scheduled_auto_draft_delete and wp_scheduled_delete cron jobs. CPU use immediately dropped 30%+ during those tasks.

Note: for various reasons such as the number of users, we generate a lot of auto-drafts so these results aren't typical on a standard WordPress site, this is very specific to a high usage site.

At the very least, I think we can introduce some pre-flight hooks to the term counting functions to allow customisation of the database query. I'd suggest preflight filters for wp_update_term_count_now(), _update_post_term_count() and _update_generic_term_count().

#4 @johnbillion
5 years ago

I've also seen the Lightweight Term Count Update plugin reduce processing time from literally minutes when there's a large number of terms, to fractions of a second.

#5 @whyisjake
4 years ago

  • Milestone changed from Awaiting Review to 5.5
  • Owner set to whyisjake
  • Status changed from new to assigned

This ticket was mentioned in Slack in #core by whyisjake. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by david.baumwald. View the logs.


4 years ago

#8 @davidbaumwald
4 years ago

@whyisjake Is this one landable with 5.5 Beta 1 due to be released in just a few days?

@rebasaurus
4 years ago

Porting of the LTCU plugin into core

#9 @rebasaurus
4 years ago

  • Keywords has-patch needs-testing dev-feedback added; needs-patch removed

#10 @johnbillion
4 years ago

  • Keywords early added; dev-feedback removed
  • Milestone changed from 5.5 to 5.6

I think this should go in early in a cycle (unless Jake wants to run with it for 5.5). I'm willing to shepherd this for 5.6 as it has a huge performance impact on large sites.

#11 @johnbillion
4 years ago

cc @mboynes

#12 follow-up: @whyisjake
4 years ago

I'd love to see this land in 5.5, getting some more eyes on it.

#13 @lcyh78
4 years ago

Would it be possible to have the filter countable_status to include $taxonomy as a parameter as it is available with both usages.
Some custom post types plugins wish to count using a different set of statuses than the default and providing the taxonomy being counted will make it easy to implement this requirement with the filter.

Thanks in advance,
Neil James

#14 in reply to: ↑ 12 @peterwilsoncc
4 years ago

Replying to whyisjake:

I'd love to see this land in 5.5, getting some more eyes on it.

I'd rather put it in early in 5.6, it's a substantial change of approach.

In 40351.diff, the existing query is used on the edit terms screen. I can't remember if that is in the original plugin but it would be good to avoid that in core if possible.

@rebasaurus
4 years ago

Add $taxonomy arg to countable_status filter

@davidbaumwald
4 years ago

Updated patch for 5.6

@lcyh78
4 years ago

Add the filter to the original count code in _update_post_term_count (used in edit-{taxonomy} screen).

@lcyh78
4 years ago

Updated consolidated patch. $transition was not set on adding new post.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

#16 follow-up: @hellofromTonya
4 years ago

  • Keywords needs-dev-note added

As discussed during <bug-scrub>, this one will get a Misc Dev Note for the new filter.

#18 @peterwilsoncc
4 years ago

  • Keywords needs-refresh added

I've started a pull request based on 40351.4.patch so the existing tests begin running on the screen.

It's still a WIP but so far I have made a few minor changes:

  • Documented countable_status filter, I suspect it will need a rename to the plural
  • Some tidy up for coding standards
  • Modified maybe_recount_posts_for_term() to allow for developers deferring post counts by switching to wp_update_term_count()
  • There were a few errors in the patch that I am slowly tidying up and the unit tests alert me to them, as best I can tell mostly type casting that needs fixing up

#19 @peterwilsoncc
4 years ago

Further notes on what needs to be added for this to be ready

  • Shortly after deleted_term_relationships and added_term_relationship fire, calls to wp_update_term_count() need to be removed/modified as the proposed patch doesn't defer term counting like the original plugin.
  • Including a term count on edit_term adds a new location in which terms are counted and doesn't seem like it's needed
  • countable_status is new, very useful but unrelated to this feature. It's probably best left for another ticket.

I'm leaving my PR as is at moment (in some need of change) so I can get some others' feedback on it before updating. As many eyes as possible would be dandy, thanks.

This ticket was mentioned in PR #519 on WordPress/wordpress-develop by peterwilsoncc.


4 years ago
#20

  • Keywords has-unit-tests added; needs-unit-tests needs-refresh removed

Completely revised approach to scaling term counting.

  • respects deferring of term counting
  • replaces calls to the current functions with calls to new functions
  • does not introduce new full count points
  • terms added in wp_insert_post() can be double counted (this was happening with the earlier approach too)
  • has a single unit test to demonstrate the bug & prevent passing.

Trac ticket: https://core.trac.wordpress.org/ticket/40351

#21 @peterwilsoncc
4 years ago

I've created a new PR with a different slightly different approach to account for how term counting currently works in Core.

Both approaches are currently a little broken, off-by-one errors, but I think the revised approach is an improvement in terms of maintaining backward compatibility. The new PR is at https://github.com/WordPress/wordpress-develop/pull/519, comments on it would be dandy.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

#23 @peterwilsoncc
4 years ago

In updated pull request #519:

  • unit tests for term counts, they pass on both trunk and the branch now :)
  • register_taxonomy() accepts a new argument update_count_by_callback for incrementing and decrementing term count. For backcompat it calls update_count_callback if it's defined by "count_by" is not
  • A new function _wp_prevent_term_counting() used to prevent double counting when adding terms within wp_insert_post() and wp_publish_post()
  • DB queries are deferred until the final numbers are known when transitioning post counts to reduce the number of UPDATE calls
  • wp_set_object_terms() and wp_remove_object_terms() now check the taxonomies object type to determine whether the counts should be updated immediately
  • All of Core's calls to the older functions have been removed

Questions:

  • Has anything been missed?
  • Should I be checking trashing and deleting posts too in the unit tests?
  • Is _wp_prevent_term_counting() too much of a hack and using a reflection preferable?
  • Has anything been missed (yes, asking this question twice)?

As the definition of early 5.6 is beginning to be stretched some additional eyes on this would be superb.

peterwilsoncc commented on PR #519:


4 years ago
#24

@NeilWJames Thanks for reviewing.

As update_count_callback() is expecting two arguments in current releases of WP, leaving $modify_by out of the call is intentional.

If a developer has defined a custom counting callback, calling that will ensure their counts are correct until they update how they are registering their taxonomy.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

#26 @hellofromTonya
4 years ago

  • Keywords dev-feedback added

Per discussion in today's bug scrub, adding needs dev-feedback.

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


4 years ago

#28 @peterwilsoncc
4 years ago

Following the discussion in last week's 0500 UTC dev chat I have done further testing.

PR updates since:

  • further tests: basically putting in to code the tests cases I thought of.
  • one minor change to actual code to prevent off-by-X errors.
  • typo fix in Taxonomy class

I reached out to Boone Gorges via Slack for additional feedback: while happy with the code he does have some concern that off-by-one error will remain in the database in the event of race conditions due to simultaneous updates or propagation delays across database servers.

I have noticed an issue with taxonomies applying to users or comments: deleting the object does not delete the object/term relationship, thus causing term counts to be out. I have tested this on both trunk and this PR's branch and the problem occurs on both indicating term counts for non-post objects may already be out and can not be handled by this ticket.

I think this is good to go and aim to commit just prior to this week's dev chat in two days. If anything else if found between now and then, it can be removed from the milestone.

@peterwilsoncc
4 years ago

Wrapping up the docs.

#29 @peterwilsoncc
4 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 49141:

Taxonomy: Improve performance of term recounting database queries.

When modifying terms assigned to an object, replace full term recounts with incrementing/decrementing the count as appropriate. This provides a significant performance boost on sites with a high number of term/object relationships and/or posts.

Introduces the functions wp_increment_term_count(), wp_decrement_term_count(), wp_modify_term_count_by() and wp_modify_term_count_by_now() for updating the term count.

Introduces the function _wp_prevent_term_counting() for preventing double counting on posts that are about to transition.

Adds the parameter update_count_by_callback to register_taxonomy() to allow developers to use a custom callback for incrementing or decrementing a term count.

Props boonebgorges, davidbaumwald, hellofromTonya, johnbillion, lcyh78, mattoperry, peterwilsoncc, rebasaurus, whyisjake.
Fixes #40351.

#30 in reply to: ↑ 16 ; follow-up: @peterwilsoncc
4 years ago

  • Keywords needs-dev-note removed

Replying to hellofromTonya:

As discussed during <bug-scrub>, this one will get a Misc Dev Note for the new filter.

As noted in comment 19, I removed the new filter as despite been handy it's off-purpose for this ticket.

I have opened #51517 for the purpose of including additional statuses in term counts.

#31 in reply to: ↑ 30 @hellofromTonya
4 years ago

Replying to peterwilsoncc:

Replying to hellofromTonya:

As discussed during <bug-scrub>, this one will get a Misc Dev Note for the new filter.

As noted in comment 19, I removed the new filter as despite been handy it's off-purpose for this ticket.

I have opened #51517 for the purpose of including additional statuses in term counts.

@peterwilsoncc Do you think the new functions added by this ticket need a dev note?

#32 @whyisjake
4 years ago

Also, are there any considerations for sites that are running the liteweight term counts plugin?

#33 @peterwilsoncc
4 years ago

  • Keywords needs-dev-note added

@hellofromTonya Good point, the new functions and methodology will need a dedicated dev note.

@whyisjake I've opened an issue on the LWTC repository for disabling the plugin in 5.6. It would be dandy if some Automatticians could check for any problems for sites that don't update. As the plugin disables term counts, I don't think there will be.

#34 @ocean90
4 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch needs-testing early has-unit-tests dev-feedback removed
  • Resolution fixed deleted
  • Status changed from closed to reopened

@peterwilsoncc A closure has no access to variables of the parent scope thus the fallback for update_count_by_callback is producing PHP warnings because $args['update_count_callback'] is not set.

#35 @dd32
4 years ago

The PHP Warning is specifically:

wp-includes/class-wp-taxonomy.php:430 - call_user_func() expects parameter 1 to be a valid callback, no array or string given

Adding a use( $args ) to the closure callback will fix it, if that's acceptable for core usage, otherwise the callback will need to run something like $args = get_taxonomy( $taxonomy ).

#36 @davidbaumwald
4 years ago

The use( $args ) should be allowed, as it's used in other parts of core already. For just one example, see https://core.trac.wordpress.org/browser/trunk/src/wp-includes/user.php#L3242

This ticket was mentioned in PR #602 on WordPress/wordpress-develop by peterwilsoncc.


4 years ago
#37

  • Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed

#38 @peterwilsoncc
4 years ago

@ocean90 Good catch, thanks.

I've set up a pull request to fix this (using use) and added some tests for the callbacks to the register_taxonomy tests.

#39 @peterwilsoncc
4 years ago

About to commit pull request 602 after discussing with @dd32 to confirm it would fix the warnings on meta properties.

Something that has occurred to me is that it's not possible to use a custom callback for full counts (the update_count_callback argument) and the default callback for partial counts (the update_count_by_callback argument).

A developer defining either argument with the default callback will cause an infinite loop. This has been the case for as long as update_count_callback has existed and I am unable to see a situation in which a developer would need a custom callback for full but not partial counts.

#40 @peterwilsoncc
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 49171:

Taxonomy: Fix warnings thrown by custom term count callbacks.

Add a use to a closure to avoid an undefined variable throwing a warning. Adds unit tests to ensure the custom callbacks run as expected when defined.

Follow up to [49141].
Props ocean90, dd32.
Fixes #40351.

@Chouby
4 years ago

Fixes the $taxonomy parameter for the actions edit_term_taxonomy and edited_term_taxonomy

#41 @Chouby
4 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

The new function wp_modify_term_count_by_now() fires the actions edit_term_taxonomy and edited_term_taxonomy with a wrong value for the $taxonomy parameter. It sends the WP_Taxonomy object instead of the taxonomy name, as documented for the actions.
I propose 40351.4.diff to fix the issue.

#42 @Chouby
4 years ago

In Polylang we heavily rely on taxonomies, including taxonomy for terms to create languages and translation groups for terms. However [49141] introduces an issue in counting for these taxonomies.

I wrote 2 simple tests to explain what we are doing.

In this first test, we automatically create a term in a taxomomy "term_taxonomy" when we create a tag. This test passes with WordPress 5.6 beta 1:

<?php
function test_wp_set_object_terms_hooked_to_create_term() {
        $args['update_count_callback'] = '_update_generic_term_count';
        register_taxonomy( 'term_taxonomy', 'term', $args );

        add_action( 'create_term', function( $term_id, $tt_id, $taxonomy ) {
                if ( 'post_tag' === $taxonomy ) {
                        $term = wp_insert_term( uniqid(), 'term_taxonomy' );
                        $lang = $term['term_id'];
                        wp_set_object_terms( $term_id, $lang, 'term_taxonomy' );
                }
        }, 10, 3 );

        $tag_id = $this->factory->tag->create();
        $terms = wp_get_object_terms( $tag_id, 'term_taxonomy' );
        $this->assertCount( 1, $terms ); // Check that a term has been assigned to the tag.
        $term = reset( $terms );
        $this->assertEquals( 1, $term->count ); // Check the term count.
}

In this second test, we do the same but the tag is created by wp_insert_post(). This test passes with WordPress 5.5 but fails wiht 5.6 beta 1:

<?php
function test_wp_set_object_terms_hooked_to_create_term_in_wp_insert_post() {
        $args['update_count_callback'] = '_update_generic_term_count';
        register_taxonomy( 'term_taxonomy', 'term', $args );

        add_action( 'create_term', function( $term_id, $tt_id, $taxonomy ) {
                if ( 'post_tag' === $taxonomy ) {
                        $term = wp_insert_term( uniqid(), 'term_taxonomy' );
                        $lang = $term['term_id'];
                        wp_set_object_terms( $term_id, $lang, 'term_taxonomy' );
                }
        }, 10, 3 );

        $post_id = $this->factory->post->create( array( 'tags_input' => array( 'my_tag' ) ) );
        $tags = wp_get_post_tags( $post_id );
        $tag = reset( $tags );
        $terms = wp_get_object_terms( $tag->term_id, 'term_taxonomy' );
        $this->assertCount( 1, $terms ); // Check that a term has been assigned to the tag.
        $term = reset( $terms );
        $this->assertEquals( 1, $term->count ); // Check the term count, fails in WP 5.6.
}

The issue is caused by the call to _wp_prevent_term_counting() by wp_insert_post(). It's ok for all the terms assigned directly by this function, but it breaks the counting of terms assigned in functions hooked to actions fired while the counting is prevented.

I am not sure it would fix all issues but one possible solution for Polylang would be to make _wp_prevent_term_counting() taxonomy dependent. That the term counting would be prevented only for taxonomies handled directly by wp_insert_post() instead of all taxonomies at the same time.

Last edited 4 years ago by Chouby (previous) (diff)

@lcyh78
4 years ago

Alternative fix proposed. Seems problem occurs as parameter $taxonomy is changed to an object a few lines earlier. This does as in wp_set_object_terms and wp_remove_object_terms.

#43 @peterwilsoncc
4 years ago

@Chouby @lcyh78 Thanks for the patches re: sending the taxonomy object instead of the name, I will put that in early next week. I will probably use 40351.4.diff as it matches the code used for full term counts in _update_post_term_count().

@Chouby For the terms created and assigned using the filter, I will create a follow-up ticket as the patch might need a few iterations and unit tests so I think it will be confusing to continue it here. Again, I will action this next week and cc you on the ticket when I do so.

#44 @Chouby
4 years ago

@peterwilsoncc Thanks. Let me know, I will propose there a new test without extra taxonomy but only categories, which tends to demonstrate that preventing counting should be term dependent.

#45 @peterwilsoncc
4 years ago

#51630 created for followup to the issue @Chouby reported in comment:42 above.

#46 @peterwilsoncc
4 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 49316:

Taxonomy: Fix values passed to actions in wp_modify_term_count_by_now().

Replace the WP_Taxonomy object with the taxonomy slug in the values passed to the actions edit_term_taxonomy and edited_term_taxonomy within wp_modify_term_count_by_now().

Follow up to [49141], [49171].
Props Chouby, lcyh78.
Fixes #40351.

#47 @peterwilsoncc
4 years ago

Hopefully nothing further will come up on this but if it does, please open another ticket on the Taxonomy component and cc me in the description for followup.

It's a fairly major change in approach so will be good to track things separately, as is happening in #51630.

This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.


4 years ago

This ticket was mentioned in PR #675 on WordPress/wordpress-develop by peterwilsoncc.


4 years ago
#49

Trac ticket: https://core.trac.wordpress.org/ticket/40351

inherit status checks when adding/removing term counts.

#51 @peterwilsoncc
4 years ago

  • Keywords needs-dev-note removed
  • Milestone changed from 5.6 to Future Release
  • Resolution fixed deleted
  • Status changed from closed to reopened

Reopening with intent to revert, the fix on #51630 is leaving me a little nervous and without it the enhancement is broken. PR to revert is up and will be committed in the next few hours.

#52 @peterwilsoncc
4 years ago

In 49451:

Taxonomy: Revert Light-weight/partial term counts.

Partial revert of [49141], [49171], [49316].

All functional changes are removed, appropriate term counting unit tests are retained.

See #40351.

#53 @peterwilsoncc
3 years ago

  • Keywords needs-patch added; has-patch has-unit-tests removed
  • Milestone changed from Future Release to 5.9

Putting this on 5.9 for wrapping up, it was very close in 5.6 but didn't quite make it.

During the 5.6 I started on a pull request with one potential approach but ended up reverting as I wasn't happy with it. Preventing double counts needs to be more specific than the original code that was reverted but there are neater was to do it than in the PR.

This ticket was mentioned in PR #1601 on WordPress/wordpress-develop by peterwilsoncc.


3 years ago
#54

  • Keywords has-patch has-unit-tests added; needs-patch removed

This reverts commit 088fd3cd390a8006a88810ab43073c6a0ca25e07.

Trac ticket: TBA (leaving blank until ready to link).

#55 follow-up: @johnbillion
3 years ago

@peterwilsoncc Would you be so kind as to summarize the outstanding issues with this change please? Do any of the issues identified affect LTCU (and can therefore be fixed and tested in the plugin first) or only the port for core?

#56 in reply to: ↑ 55 @peterwilsoncc
3 years ago

Replying to johnbillion:

@peterwilsoncc Would you be so kind as to summarize the outstanding issues with this change please?

There's only one issue as far as I am aware.

_wp_prevent_term_counting() was designed to prevent term counting when adding/removing terms during wp_insert_post() and defer the changes to the transition_post_status hook. This is to prevent double counting a change (ie, adding a count of two for a single item).

Unfortunately it proved too blunt as terms added to another object (either a different post, or another object type) went uncounted too. There are some examples in #51630.

The LTCU plugin tracks the individual object IDs for the equivalent functionality. I considered adding something like that but as WP is functional, the function's signature became rather unwieldy.

This ticket was mentioned in Slack in #core by audrasjb. View the logs.


3 years ago

#58 @audrasjb
3 years ago

  • Keywords early added
  • Milestone changed from 5.9 to Future Release

As per today's bug scrub and since tomorrow is WP 5.9 feature freeze, I'm moving this ticket to Future release. Also adding early workflow keyword since this enhancement needs to be properly checked to avoid backwards compatibility issues.

#59 @peterwilsoncc
15 months ago

#56598 was marked as a duplicate.

#60 @yourfluf
7 months ago

This issue is quite a big impediment for large sites.. hopefully as it get can get resolved as my performance is suffering

Note: See TracTickets for help on using tickets.