#11399 closed defect (bug) (fixed)
clean_post_cache() generates enormous amounts of queries on sites with lots of pages, revisions or attachments
Reported by: | Denis-de-Bernardy | Owned by: | nacin |
---|---|---|---|
Milestone: | 3.5 | Priority: | normal |
Severity: | normal | Version: | 2.9 |
Component: | Optimization | Keywords: | has-patch |
Focuses: | Cc: |
Description
It's recursively doing queries such as this one:
SELECT ID FROM www_posts WHERE post_parent = 592
For one thing it's not checking if has already been run for a particular post/page. If a root page gets cleaned up several times during the same page row, we re-do the whole series of pages, attachments, revisions and what not all over again.
It prompted another thought, too: how important is it to keep non-published data in memcached? It seems to me that we could change wp_cache_add(), etc. in such a way that the non-published data remains available over in the posts bucket, without getting shoved into memcached. Were we to do so, we could change the above query so it adds a constraint on post_status. That's a lot less queries.
Attachments (4)
Change History (32)
#3
@
14 years ago
I don't think that there is a fix for the issue of recursing, at least not on the face of it.
It might be possible to generate a list of found pages (aka dynamic programming) to help prevent duplicate recursion, but the fundamental problem of recursing could only be "solved" by changing how the heirarchy is represented in the database. Rather than storing "parent" values, left and right node values could be stored (in essence, storing the tree traversal in the DB). Unfortunately, this would represent a major modification to the table, which I don't think makes sense.
#4
@
14 years ago
- Keywords has-patch needs-feedback added
I added a simple little patch to generate a list of files which need to be patch cleared, and then execute them at once. It will discontinue if it's already found the id, therefore (in theory) saving some database calls.
HOWEVER, it only exists within the context of outermost call which does NOT include the found array, and, as such, probably won't have much impact unless it is propagated across multiple calls, which would require changing what calls clean_post_cache -- or at least those calls which have the most impact.
#7
follow-up:
↓ 8
@
14 years ago
This isn't very scalable at the moment.
Not sure what the best approach to improving it is.
#8
in reply to:
↑ 7
@
12 years ago
- Cc simon@… added
Replying to westi:
This isn't very scalable at the moment.
Not sure what the best approach to improving it is.
Could you elaborate on what you don't think is scalable? Is it the passing of the array of already found IDs?
I've run into this problem in an install today. Turning down the number of revisions helps a little.
Separately, I've been playing with a similar implementation (before I found this ticket I whipped up a quick class which did the cache clearance and stored the IDs processed in a property, to avoid re-processing) and I'm actually unconvinced that the number of queries is reduced as there's not that many duplicates in the list to start with. The problem is sheer numbers, and running all the cache clearances in series rather than as a batch.
I think the problem is also exacerbated if you are running an external object cache, as any latency for each request starts to add up.
Basically, I'm left scratching my head here.
#9
@
12 years ago
A problem for every post type. This was a huge problem for forums in bbPress with 1000s of topics which combined had 100,000K+ replies - PHP would run out of memory because clean_post_cache was stacking 4 queries per post resulting in ~600,000 database queries when you edited a giant forum. JJJ's essence is here: http://bbpress.trac.wordpress.org/browser/branches/plugin/bbp-includes/bbp-core-cache.php?rev=4011
#10
@
12 years ago
- Keywords needs-patch removed
Part of the problem is that the query to get the post children, which are then iterated over, includes revisions. Given that revisions shouldn't (?) change, perhaps the query could be adjusted to not clear the caches of children which are revisions. Something like the next diff.
In my client's particularly deeply nested hierarchy of this reduced the number of times clean_post_cache
was called from approximately 4,500 calls to 500 calls with WP_POST_REVISIONS
set to 10 (so it could have been even worse).
@
12 years ago
Don't bother recursing into revisions when cleaning post caches as they won't have changed
#11
follow-up:
↓ 12
@
12 years ago
I think the key assertion to be certain of is: "once created, revisions don't change as part of a standard post update". If this is not true, then the patch doesn't hold water.
#12
in reply to:
↑ 11
@
12 years ago
Replying to simonwheatley:
I think the key assertion to be certain of is: "once created, revisions don't change as part of a standard post update". If this is not true, then the patch doesn't hold water.
Brief chat in #wordpress-dev where azaozz confirms that "revisions (should) never change".
#13
follow-up:
↓ 15
@
12 years ago
Currently, ancestors are cached directly in the post object. This is really the only reason to look through the entire tree and invalidate caches. While revisions should never change, this would result with polluted data for their ancestors.
There's a plan in #21309 to split ancestors into another cache bucket. This may go a long way to enabling us to drop this.
Second, has anyone noticed that in wp_delete_post() and possibly elsewhere, we call clean_post_cache() on its children after clean_post_cache() would already have done that?
#14
@
12 years ago
- Milestone changed from Future Release to 3.5
We also double-up with _save_post_hook(). We call clean_post_cache() in wp_insert_post() but again on the 'save_post' hook.
'save_post' is also fired in wp_publish_post(). But is there any reason why wp_publish_post() doesn't just wrap wp_insert_post()? I didn't think so either.
Let's keep calling clear_post_cache() directly in wp_insert_post(), truncate wp_publish_post(), and remove _save_post_hook(). I can't imagine we actually need them in both places; earlier is probably better.
@
12 years ago
Don't bother recursing into revisions when cleaning post caches, refactor wp_publish_post as a wrapper for wp_update_post, remove _save_post and associated add_action, don't recurse cleaning_post_caches in wp_delete_post
#15
in reply to:
↑ 13
;
follow-up:
↓ 16
@
12 years ago
Thanks for the feedback.
Replying to nacin:
Currently, ancestors are cached directly in the post object. This is really the only reason to look through the entire tree and invalidate caches. While revisions should never change, this would result with polluted data for their ancestors.
I didn't follow this point, sorry.
11399.sw.2.diff implements other suggestions (refactor wp_publish_post as a wrapper for wp_update_post, remove _save_post and associated add_action, don't recurse cleaning_post_caches in wp_delete_post).
I think there is an issue with revision caches hanging around when posts are deleted in 11399.sw.2.diff. Possibly either manually call clean_post_cache
on each revision within wp_delete_post
, or have a boolean "hard" parameter for clean_post_cache
which includes the revisions if set but defaults to off.
#16
in reply to:
↑ 15
@
12 years ago
Replying to simonwheatley:
I think there is an issue with revision caches hanging around when posts are deleted in 11399.sw.2.diff. Possibly either manually call
clean_post_cache
on each revision withinwp_delete_post
, or have a boolean "hard" parameter forclean_post_cache
which includes the revisions if set but defaults to off.
False alarm. Turns out this is the product of my fevered imagination, wp_delete_post
calls wp_delete_revision
on each revision earlier in the function, and this will clean all the post caches.
@
12 years ago
Removes ancestors cache, as posts cache is plenty. Removes cache invalidation for children, as ancestors are no longer cached against a post.
#20
@
12 years ago
Regarding [21942], this was attempted at least once in the past and ryan reverted it due to some problem related to cron. Are you sure that problem isn't back again?
#23
@
12 years ago
- Owner set to nacin
- Resolution set to fixed
- Status changed from new to closed
In [21952]:
#28
in reply to:
↑ 27
@
5 years ago
Replying to DrewAPicture:
@nacin Are we to a point where we can formally deprecate
_save_post_hook()
? You effectively soft-deprecated it in [21943]. Hard-deprecation has been raised in #41121.
The reason why it was soft-deprecated in [21943] is because it was an internal callback tied to save_post
. A deprecated notice felt superfluous, but probably could be added without any trouble.
Nice find. Maybe we should differ two cases: Clean the post cache of a certain page or clean the post cache of all pages.