#59188 closed defect (bug) (fixed)
WP_Posts_List_Table causes post, postmeta, and term caches to be primed for all pages in the DB on load
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 6.4 | Priority: | normal |
Severity: | normal | Version: | 6.1 |
Component: | Query | Keywords: | has-patch dev-feedback has-unit-tests needs-dev-note |
Focuses: | administration, performance | Cc: |
Description
This behavior primarily triggers if a user is using an external object cache on the second load of /wp-admin/edit.php?post_type=page once the query cache has been populated for id=>parent relationships for pages. It is reproducible on the Pages listing, but will also apply to any custom post type declared as hierarchical.
The WP_Posts_List_Table class has a method called prepare_items which is called when building the table view at /wp-admin/edit.php?post_type=page. This function calls wp_edit_posts_query():
This function checks to see if the post type is hierarchical, and if it is, sets posts_per_page to -1. This is done because it wouldn't be possible to build a hierarchical view of posts with pagination without understanding the relationship between all of the IDs in the database for that post type:
https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/post.php#L1274
Once the query parameters are built, they are passed to the wp function for execution:
https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/post.php#L1283
The wp function then calls the main method on the global $wp object, which runs query_posts, which calls $wp_the_query->query, which invokes get_posts on a WP_Query object.
Once the query is built, WP_Query attempts to see if there are cached results for that query:
https://github.com/WordPress/WordPress/blob/master/wp-includes/class-wp-query.php#L3173
If cached results are found (which would only happen after the query was run once and cached, hence my comment above about why this only triggers on the second load of the pages view with a persistent object cache in play like memcached) then this logic block flows down to where it forks based on what was requested in 'fields', which in this case is not just IDs, it's id=>parent relationships, so it flows into the else and triggers _prime_post_caches for all pages (or any other hierarchical post type) including all post meta and all terms attached to those posts. Although the prime post caches function will only prime caches for posts that aren't already in the cache, if the cache was recently flushed or cache salts or versions rotated, this can be a very expensive operation that can crash a site if it's loading in a significant number of hierarchical posts and/or if those posts have a lot of metadata.
I would recommend marking a query that is looking for id=>parent with a posts_per_page of -1 as not cacheable to prevent this behavior.
Note: I created this ticket during WordCamp US Contributor Day, and was having trouble with Core Trac being overloaded with requests, which inhibited my ability to search for related tickets. If this is a duplicate, I apologize, and will be happy to contribute to related tickets.
Change History (74)
#1
@
19 months ago
- Summary changed from WP_Posts_List_Table causes post, page, and term caches to be primed for all pages in the DB on load to WP_Posts_List_Table causes post, postmeta, and term caches to be primed for all pages in the DB on load
#2
@
19 months ago
- Keywords needs-patch added
- Milestone changed from Awaiting Review to Future Release
This ticket was mentioned in PR #5091 on WordPress/wordpress-develop by @kevinfodness.
19 months ago
#3
- Keywords has-patch added; needs-patch removed
#4
@
19 months ago
- Milestone changed from Future Release to 6.4
Thanks @kevinfodness. I'm pulling this into the 6.4 milestone for review
This ticket was mentioned in PR #5193 on WordPress/wordpress-develop by @spacedmonkey.
19 months ago
#7
Trac ticket: https://core.trac.wordpress.org/ticket/59188
#8
@
19 months ago
- Keywords dev-feedback added
I put together a quick and simple PR here. I think this will fix the issue.
@spacedmonkey commented on PR #5193:
19 months ago
#9
Can you review / test this patch see if fixes the issue @kevinfodness?
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
19 months ago
#11
@
18 months ago
I've testedPR 5193 on a server running a persistent cache.
It appears the pages table has two items causing an excessive amount of caching:
- the list table itself (per the ticket)
- the
wp_dropdown_pages()
call when generating the inline edit interface
When using a persistent cache, the meta data cache is primed on the second request but not the first (possibly a bug in WP_Query
. This is happening both with and without trunk checked out.
I'll keep investigating and try to figure out what is happening.
#12
@
18 months ago
I think the issue here. Per [53941], when fields id
/ id=>parents
were passed, term and meta caches were not primed, even if update_post_term_cache
or update_post_meta_cache
is passed. See this code snippet. Now in WP 6.1+, on the second run term and meta are primed. See this line.
We could consider doing something like this.
if ( ! isset( $q['update_post_term_cache'] ) ) {
if( 'id=>parent' === $q['fields'] ){
$q['update_post_term_cache'] = false;
}else{
$q['update_post_term_cache'] = true;
}
}
if ( ! isset( $q['update_post_meta_cache'] ) ) {
if( 'id=>parent' === $q['fields'] ){
$q['update_post_meta_cache'] = false;
}else{
$q['update_post_meta_cache'] = true;
}
}
This ticket was mentioned in PR #5211 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#13
Trac ticket:
@spacedmonkey commented on PR #5193:
18 months ago
#14
Closing in favour of https://github.com/WordPress/wordpress-develop/pull/5211
@peterwilsoncc commented on PR #5211:
18 months ago
#16
I think it's a good idea to change the defaults for these but I put together some tests based on this branch and am seeing that the options are not being respected so it might need a little work.
I've attached the diff (I had to zip it as GH doesn't accept diff's as uploads).
This ticket was mentioned in PR #5219 on WordPress/wordpress-develop by @peterwilsoncc.
18 months ago
#17
- Keywords has-unit-tests added
Trac ticket:
@peterwilsoncc commented on PR #5211:
18 months ago
#18
@spacedmonkey I've built up on this in PR #5219.
This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.
18 months ago
This ticket was mentioned in PR #5245 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#20
Trac ticket:
@spacedmonkey commented on PR #5219:
18 months ago
#21
@peterwilsoncc @joemcgill I have been wonder if a number of years if we should just make update_post_term_cache
and update_post_meta_cache
work consistently. At the moment, these do not work consistently and there are code paths where do and do not work. I have put together a PR based on this to make this the case.
https://github.com/WordPress/wordpress-develop/pull/5245
Do we think this is a good idea or be a problem?
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
This ticket was mentioned in PR #5253 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#23
Trac ticket: https://core.trac.wordpress.org/ticket/59188
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
#29
@
18 months ago
I attempted to reproduce Kevin's experience while on trunk, but didn't see the reported issue on the second page load of /wp-admin/edit.php?post_type=page
However, I was able to reproduce the issue described in #59319 where the page crashes due to out of memory issue. This was reproducible with a data set of 60,000 pages. While @spacedmonkey's PR to not cache the terms and meta is an improvement, the root of the issue appears to stem from the posts_per_page being set to -1, which causes memory exhaustion with a large data set.
#30
@
18 months ago
@LinSoftware Before, _prime_post_caches
was called, meaning for a site with 60k worth of pages, that would load all the post objects ( including post content ) and post meta and terms. But this code avoids using _prime_post_caches
and just caches the post parents, which is an int.
60k work of ints in an array is large, but I do not believe we would be hitting memory limits.
I believe it maybe impossible to remove the posts_per_page being set to -1, as this might be a breaking change.
Out of interest, are you using a object caching plugin that supports wp_cache_get_multiple?
#31
@
18 months ago
I tested this and it works well for me. I tested it using a facsimile of WordPress VIP's environment, which uses memcached, and supports wp_cache_get_multiple. I did not test it using a huge number of pages, though.
#32
@
18 months ago
I just tested the patch with a sandbox site with 60,000 pages and it worked fine.
#33
@
18 months ago
I am using the same object cache dropin as Kevin which does support wp_cache_get_multiple. I'm not sure why we're getting different results. I'll continue to investigate.
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
18 months ago
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
#36
@
18 months ago
This patch works fine to solve for the issue raised here. The issue I was seeing is unrelated to this change.
This ticket was mentioned in Slack in #core by spacedmonkey. View the logs.
18 months ago
This ticket was mentioned in PR #5370 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#39
Fixes the phpcs issue introduced in https://core.trac.wordpress.org/changeset/56763.
Trac ticket: https://core.trac.wordpress.org/ticket/59188
This ticket was mentioned in PR #5372 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#42
Trac ticket: https://core.trac.wordpress.org/ticket/59188
Props @mukeshpanchal27
@spacedmonkey commented on PR #5253:
18 months ago
#43
@spacedmonkey commented on PR #5253:
18 months ago
#44
@mukeshpanchal27 Feedback actioned in https://github.com/WordPress/wordpress-develop/pull/5372
@spacedmonkey commented on PR #5372:
18 months ago
#47
This ticket was mentioned in PR #5386 on WordPress/wordpress-develop by @peterwilsoncc.
18 months ago
#48
https://core.trac.wordpress.org/ticket/59188
- primes parent id cache in
update_post_cache()
- adds
@since
annotation - renames function for clarity --- it is currently a little unclear and could imply the parent post object is cached in its entirity
- adds parameter to
_prime_post_caches
for priming the post parent ID caches if not already primed.
The docblock for the new parameter needs some work.
#49
@
18 months ago
I initially logged this on the incorrect ticket, so reposting here
I've opened a follow up pull request to address some concerns I have
- I'm not sure the name of the function is clear that it only primes the parent IDs, not the parent post object (I think @spacedmonkey had this question at some point too)
- Include priming of the parent ID cache when priming post objects to avoid additional queries at a later time.
- Include a parameter for caching the parent ID of already cached posts.
Jonny and I have discussed whether all of this is appropriate on the PR so it will need some additional eyes for a decision.
#50
@
18 months ago
I am 100% behind the function name change. Makes sense.
I am not sure about priming the parent cache in update_post_cache. Early versions of this patch did this and I removed it. In my research, id=>parents queries are pretty uncommon. In core it seems to mostly be used on the page list in wp-admin. This new cache is only useful for this type of query. So it is only primed on that type of query. Meaning if you use id=>parents lot the caches will be primed. Loading this cache whenever a post is loaded feels a little wasteful. I will list the reasons.
- Most of post types are not hierarchical, meaning that this cache would not be needed / used.
- This cache is only used in this query. Meaning that the cache would be set and loaded into memory for every post. I know that the cache is only an int, but if you have 100s of posts on a page, this will take up php memory that is not needed.
- Adding another multi add for this cache, adds overhead. This is more calls to external object cache server for on unused cache.
I think that we should consider priming this parent cache, if this cache is used anywhere outside WP_Query. We could consider using it in the following functions.
- wp_get_post_parent_id
- get_post_parent
- get_post_ancestors
Those functions could be converted to us this cache, when id is passed. This would save the need to cache lookup and creation of new WP_Post object. But I think this change should be considered in another ticket, as it is out of the scope of this bug fix.
Does this make sense @peterwilsoncc ?
joemcgill commented on PR #5386:
18 months ago
#51
@peterwilsoncc and @spacedmonkey, it seems like there's consensus here to commit the function name change and the @since
annotation updates, but some disagreement about whether the new cache should be primed or if it should be removed and replaced with updates to the main WP_Query cache, if I'm understanding @peterwilsoncc's comment above correctly:
I'd question the value of introducing the new cache and instead change the shape of the cache in WP_Query calls requesting id=>parent fields to include the post parent
Can you two clarify what is needed to move this forward?
@spacedmonkey commented on PR #5386:
18 months ago
#52
@joemcgill
What we are trying to avoid from the original ticket, was lookuping the whole post object if all you need is a parent id. This is why this new cache was introduced.
So here is something I considered in the original PR.
- We could change all id queries to get id and parent id. Then cache the parent ids in as another key in cache the array. Okay, this would help id=>parent queries, but make id queries slower, as you are requested more from the database. As id queries are FAR more common, I think we should optimize for id queries over the rare id=>parent.
- On the first run on the query, the parent cache is primed if id=>parent. These caches are not expiring. There are only invalided if you update the post or flush the whole cache. Meaning, if you are using object caching, then the first time you load the post list page in wp-admin, the parent will be primed and cache will live forever. If not there a couple of posts that were updated, the new function will do a very small query to get the missing caches.
- Priming this cache when it is not used or needed, takes resources. Adding this priming may negatively affect performance.
How about just updating the post parent cache on wp_insert_post
, that way the value would be updated in cache just once and not primed all the time?
This ticket was mentioned in PR #5420 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#53
PR is based on https://github.com/WordPress/wordpress-develop/pull/5386. It is simplier PR, that just changes the function name and prime the post parent cache on post save.
Trac ticket: https://core.trac.wordpress.org/ticket/59188
#54
@
18 months ago
I have created a scaled down PR https://github.com/WordPress/wordpress-develop/pull/5420 for review.
@spacedmonkey commented on PR #5420:
18 months ago
#55
wp_cache_flush_group( 'posts' )
This is a documented way of clearing caches. You should call clean_post_cache
is the way of cleaning caches.
We could reuse the cache group post and make the id something like this post_parent-123
This ticket was mentioned in PR #5442 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#56
Trac ticket: https://core.trac.wordpress.org/ticket/59188
@spacedmonkey commented on PR #5420:
18 months ago
#57
Closing in favour of https://github.com/WordPress/wordpress-develop/pull/5442
This ticket was mentioned in PR #5443 on WordPress/wordpress-develop by @spacedmonkey.
18 months ago
#58
Break out from https://github.com/WordPress/wordpress-develop/pull/5420.
Trac ticket: https://core.trac.wordpress.org/ticket/59188
@peterwilsoncc commented on PR #5442:
18 months ago
#60
This ticket was mentioned in PR #5447 on WordPress/wordpress-develop by @peterwilsoncc.
18 months ago
#61
Modifies _prime_post_parent_id_caches()
to use a prefixed cache key and the posts
group for caching the post parent IDs. This is to ensure the caches are cleared as part of a call to wp_cache_flush_group( 'posts' )
.
#62
@
18 months ago
The function has been renamed to _prime_post_parent_id_caches()
in [56811] to ensure it gets in to the beta scheduled 24ish hours from now.
Still under discussion:
@spacedmonkey commented on PR #5447:
18 months ago
#63
Couple of tweaks requested here. I don't feel super strongly about this change. I don't think it is needed, but I don't think it hurts anything either.
@spacedmonkey commented on PR #5447:
18 months ago
#64
I also +1 have a util to get generate cache keys. That code is repeated a lot.
This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.
18 months ago
@peterwilsoncc commented on PR #5447:
18 months ago
#66
I've merged in @spacedmonkey's suggestions.
While the individual keys are generated several times, ie post_parent::{$id}
, the collection of keys is only generated once in each file so I'm not sure it's worth committing to a utility function for two repetitions. If such a function becomes necessary then it would be good to answer questions about whether it should be specific or general purpose first.
It's too last minute now but I am thinking it might be possible to add a prefix to _get_non_cached_ids()
to allow for future instances similar to this as WP improves caching for DB queries.
This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.
18 months ago
@spacedmonkey commented on PR #5447:
18 months ago
#68
@peterwilsoncc this is good to commit
@peterwilsoncc commented on PR #5447:
18 months ago
#69
@spacedmonkey Thanks, I've just rebased and will commit it today.
@peterwilsoncc commented on PR #5447:
18 months ago
#71
#72
follow-up:
↓ 73
@
17 months ago
@peterwilsoncc Is this good to close now, or something else needs to be done here? Asking since I saw the other commit with the name change landing.
#73
in reply to:
↑ 72
@
17 months ago
- Resolution set to fixed
- Status changed from reopened to closed
Replying to flixos90:
@peterwilsoncc Is this good to close now, or something else needs to be done here? Asking since I saw the other commit with the name change landing.
The only remaining open question in this ticket is whether the newly introduced post parent ID cache should be primed when a post is saved, See PR #5443.
Personally, I don't see a lot of value in this, given that for most sites, this cache will be non-persistent, and for sites that do have a persistent cache, this DB query would be more efficiently called when reading versus when writing data, where it would only be in the cache when some query is requesting it—and then, only the first time.
I'm going to mark this as closed, and we can always add the post save priming later in a follow-up ticket.
Fixes a bug whereby the Pages list view in the admin (or any other hierarchical post type) performs a query for
id=>parent
with a limit of-1
which results in priming post caches for all posts of that post type, including postmeta and terms, that aren't already in the cache, which can crash sites with an empty cache and a lot of posts/postmeta.