Make WordPress Core

Opened 15 years ago

Closed 12 years ago

Last modified 5 years ago

#11399 closed defect (bug) (fixed)

clean_post_cache() generates enormous amounts of queries on sites with lots of pages, revisions or attachments

Reported by: denis-de-bernardy's profile Denis-de-Bernardy Owned by: nacin's profile nacin
Milestone: 3.5 Priority: normal
Severity: normal Version: 2.9
Component: Optimization Keywords: has-patch
Focuses: Cc:

Description

It's recursively doing queries such as this one:

SELECT ID FROM www_posts WHERE post_parent = 592

For one thing it's not checking if has already been run for a particular post/page. If a root page gets cleaned up several times during the same page row, we re-do the whole series of pages, attachments, revisions and what not all over again.

It prompted another thought, too: how important is it to keep non-published data in memcached? It seems to me that we could change wp_cache_add(), etc. in such a way that the non-published data remains available over in the posts bucket, without getting shoved into memcached. Were we to do so, we could change the above query so it adds a constraint on post_status. That's a lot less queries.

Attachments (4)

cpc.patch (1.6 KB) - added by jwriteclub 14 years ago.
Changes clean_post_cache to (in theory) reduce database calls
11399.sw.1.diff (610 bytes) - added by simonwheatley 12 years ago.
Don't bother recursing into revisions when cleaning post caches as they won't have changed
11399.sw.2.diff (3.4 KB) - added by simonwheatley 12 years ago.
Don't bother recursing into revisions when cleaning post caches, refactor wp_publish_post as a wrapper for wp_update_post, remove _save_post and associated add_action, don't recurse cleaning_post_caches in wp_delete_post
11399.diff (1.8 KB) - added by nacin 12 years ago.
Removes ancestors cache, as posts cache is plenty. Removes cache invalidation for children, as ancestors are no longer cached against a post.

Download all attachments as: .zip

Change History (32)

#1 @Denis-de-Bernardy
15 years ago

  • Keywords needs-patch added

#2 @hakre
15 years ago

Nice find. Maybe we should differ two cases: Clean the post cache of a certain page or clean the post cache of all pages.

#3 @jwriteclub
14 years ago

I don't think that there is a fix for the issue of recursing, at least not on the face of it.

It might be possible to generate a list of found pages (aka dynamic programming) to help prevent duplicate recursion, but the fundamental problem of recursing could only be "solved" by changing how the heirarchy is represented in the database. Rather than storing "parent" values, left and right node values could be stored (in essence, storing the tree traversal in the DB). Unfortunately, this would represent a major modification to the table, which I don't think makes sense.

@jwriteclub
14 years ago

Changes clean_post_cache to (in theory) reduce database calls

#4 @jwriteclub
14 years ago

  • Keywords has-patch needs-feedback added

I added a simple little patch to generate a list of files which need to be patch cleared, and then execute them at once. It will discontinue if it's already found the id, therefore (in theory) saving some database calls.

HOWEVER, it only exists within the context of outermost call which does NOT include the found array, and, as such, probably won't have much impact unless it is propagated across multiple calls, which would require changing what calls clean_post_cache -- or at least those calls which have the most impact.

#5 @ryan
14 years ago

  • Milestone changed from 3.0 to 3.1

#6 @nacin
14 years ago

  • Milestone changed from Awaiting Triage to Future Release

#7 follow-up: @westi
14 years ago

This isn't very scalable at the moment.

Not sure what the best approach to improving it is.

#8 in reply to: ↑ 7 @simonwheatley
12 years ago

  • Cc simon@… added

Replying to westi:

This isn't very scalable at the moment.

Not sure what the best approach to improving it is.

Could you elaborate on what you don't think is scalable? Is it the passing of the array of already found IDs?

I've run into this problem in an install today. Turning down the number of revisions helps a little.

Separately, I've been playing with a similar implementation (before I found this ticket I whipped up a quick class which did the cache clearance and stored the IDs processed in a property, to avoid re-processing) and I'm actually unconvinced that the number of queries is reduced as there's not that many duplicates in the list to start with. The problem is sheer numbers, and running all the cache clearances in series rather than as a batch.

I think the problem is also exacerbated if you are running an external object cache, as any latency for each request starts to add up.

Basically, I'm left scratching my head here.

#9 @wonderboymusic
12 years ago

A problem for every post type. This was a huge problem for forums in bbPress with 1000s of topics which combined had 100,000K+ replies - PHP would run out of memory because clean_post_cache was stacking 4 queries per post resulting in ~600,000 database queries when you edited a giant forum. JJJ's essence is here: http://bbpress.trac.wordpress.org/browser/branches/plugin/bbp-includes/bbp-core-cache.php?rev=4011

#10 @simonwheatley
12 years ago

  • Keywords needs-patch removed

Part of the problem is that the query to get the post children, which are then iterated over, includes revisions. Given that revisions shouldn't (?) change, perhaps the query could be adjusted to not clear the caches of children which are revisions. Something like the next diff.

In my client's particularly deeply nested hierarchy of this reduced the number of times clean_post_cache was called from approximately 4,500 calls to 500 calls with WP_POST_REVISIONS set to 10 (so it could have been even worse).

@simonwheatley
12 years ago

Don't bother recursing into revisions when cleaning post caches as they won't have changed

#11 follow-up: @simonwheatley
12 years ago

I think the key assertion to be certain of is: "once created, revisions don't change as part of a standard post update". If this is not true, then the patch doesn't hold water.

#12 in reply to: ↑ 11 @simonwheatley
12 years ago

Replying to simonwheatley:

I think the key assertion to be certain of is: "once created, revisions don't change as part of a standard post update". If this is not true, then the patch doesn't hold water.

Brief chat in #wordpress-dev where azaozz confirms that "revisions (should) never change".

#13 follow-up: @nacin
12 years ago

Currently, ancestors are cached directly in the post object. This is really the only reason to look through the entire tree and invalidate caches. While revisions should never change, this would result with polluted data for their ancestors.

There's a plan in #21309 to split ancestors into another cache bucket. This may go a long way to enabling us to drop this.

Second, has anyone noticed that in wp_delete_post() and possibly elsewhere, we call clean_post_cache() on its children after clean_post_cache() would already have done that?

#14 @nacin
12 years ago

  • Milestone changed from Future Release to 3.5

We also double-up with _save_post_hook(). We call clean_post_cache() in wp_insert_post() but again on the 'save_post' hook.

'save_post' is also fired in wp_publish_post(). But is there any reason why wp_publish_post() doesn't just wrap wp_insert_post()? I didn't think so either.

Let's keep calling clear_post_cache() directly in wp_insert_post(), truncate wp_publish_post(), and remove _save_post_hook(). I can't imagine we actually need them in both places; earlier is probably better.

@simonwheatley
12 years ago

Don't bother recursing into revisions when cleaning post caches, refactor wp_publish_post as a wrapper for wp_update_post, remove _save_post and associated add_action, don't recurse cleaning_post_caches in wp_delete_post

#15 in reply to: ↑ 13 ; follow-up: @simonwheatley
12 years ago

Thanks for the feedback.

Replying to nacin:

Currently, ancestors are cached directly in the post object. This is really the only reason to look through the entire tree and invalidate caches. While revisions should never change, this would result with polluted data for their ancestors.

I didn't follow this point, sorry.

11399.sw.2.diff implements other suggestions (refactor wp_publish_post as a wrapper for wp_update_post, remove _save_post and associated add_action, don't recurse cleaning_post_caches in wp_delete_post).

I think there is an issue with revision caches hanging around when posts are deleted in 11399.sw.2.diff. Possibly either manually call clean_post_cache on each revision within wp_delete_post, or have a boolean "hard" parameter for clean_post_cache which includes the revisions if set but defaults to off.

#16 in reply to: ↑ 15 @simonwheatley
12 years ago

Replying to simonwheatley:

I think there is an issue with revision caches hanging around when posts are deleted in 11399.sw.2.diff. Possibly either manually call clean_post_cache on each revision within wp_delete_post, or have a boolean "hard" parameter for clean_post_cache which includes the revisions if set but defaults to off.

False alarm. Turns out this is the product of my fevered imagination, wp_delete_post calls wp_delete_revision on each revision earlier in the function, and this will clean all the post caches.

#18 @nacin
12 years ago

In [21942]:

Have wp_publish_post() wrap wp_insert_post() directly. see #11399.

#19 @nacin
12 years ago

In [21943]:

Call clean_post_cache() in wp_insert_post() after the manual query to change GUID. Remove the second call to clean_post_cache() previously done on the save_post hook. see #11399.

@nacin
12 years ago

Removes ancestors cache, as posts cache is plenty. Removes cache invalidation for children, as ancestors are no longer cached against a post.

#20 @scribu
12 years ago

Regarding [21942], this was attempted at least once in the past and ryan reverted it due to some problem related to cron. Are you sure that problem isn't back again?

Last edited 12 years ago by scribu (previous) (diff)

#21 @nacin
12 years ago

In [21951]:

Don't repeatedly call get_object_taxonomies() in clean_object_term_cache(). see #11399.

#22 @nacin
12 years ago

  • Keywords dev-feedback needs-feedback removed

If [21942] needs to be reverted, we can do that through #21963. See also [4077] #2737 #2715 #4620.

#23 @nacin
12 years ago

  • Owner set to nacin
  • Resolution set to fixed
  • Status changed from new to closed

In [21952]:

Stop cleaning the cache of a post's children. Ancestors are no longer cached against the post object, which means this kind of walking is unnecessary. It is also prohibitively expensive with large hierarchies.

We need to remove post_ancestors non-persistent caching for this. get_post_ancestors() can simply rely on the caching of get_post() instead. Previously, it was a direct query, hence the extra layers of caching and clearing.

Child cache clearing stays in wp_delete_post() as children get a new parent.

fixes #11399.

#24 @nacin
12 years ago

In [21953]:

Properly indent after [21952]. see #11399.

#25 @nacin
12 years ago

In 22357:

clean_post_cache() no longer calls itself recursively. see #11399.

#26 @nacin
12 years ago

In 23206:

Revert [21942] and have wp_publish_post() deal with the database directly. clean_post_cache() is now also called directly due to [21943].

fixes #22944 for trunk.
Unit tests: [1174/tests].

see #11399. see #21963.

#27 follow-up: @DrewAPicture
7 years ago

@nacin Are we to a point where we can formally deprecate _save_post_hook()? You effectively soft-deprecated it in [21943]. Hard-deprecation has been raised in #41121.

#28 in reply to: ↑ 27 @nacin
5 years ago

Replying to DrewAPicture:

@nacin Are we to a point where we can formally deprecate _save_post_hook()? You effectively soft-deprecated it in [21943]. Hard-deprecation has been raised in #41121.

The reason why it was soft-deprecated in [21943] is because it was an internal callback tied to save_post. A deprecated notice felt superfluous, but probably could be added without any trouble.

Note: See TracTickets for help on using tickets.