Make WordPress Core

Opened 19 months ago

Closed 17 months ago

Last modified 5 months ago

#59188 closed defect (bug) (fixed)

WP_Posts_List_Table causes post, postmeta, and term caches to be primed for all pages in the DB on load

Reported by: kevinfodness's profile kevinfodness Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.4 Priority: normal
Severity: normal Version: 6.1
Component: Query Keywords: has-patch dev-feedback has-unit-tests needs-dev-note
Focuses: administration, performance Cc:

Description

This behavior primarily triggers if a user is using an external object cache on the second load of /wp-admin/edit.php?post_type=page once the query cache has been populated for id=>parent relationships for pages. It is reproducible on the Pages listing, but will also apply to any custom post type declared as hierarchical.

The WP_Posts_List_Table class has a method called prepare_items which is called when building the table view at /wp-admin/edit.php?post_type=page. This function calls wp_edit_posts_query():

https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/class-wp-posts-list-table.php#L165

This function checks to see if the post type is hierarchical, and if it is, sets posts_per_page to -1. This is done because it wouldn't be possible to build a hierarchical view of posts with pagination without understanding the relationship between all of the IDs in the database for that post type:

https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/post.php#L1274

Once the query parameters are built, they are passed to the wp function for execution:

https://github.com/WordPress/WordPress/blob/master/wp-admin/includes/post.php#L1283

The wp function then calls the main method on the global $wp object, which runs query_posts, which calls $wp_the_query->query, which invokes get_posts on a WP_Query object.

Once the query is built, WP_Query attempts to see if there are cached results for that query:

https://github.com/WordPress/WordPress/blob/master/wp-includes/class-wp-query.php#L3173

If cached results are found (which would only happen after the query was run once and cached, hence my comment above about why this only triggers on the second load of the pages view with a persistent object cache in play like memcached) then this logic block flows down to where it forks based on what was requested in 'fields', which in this case is not just IDs, it's id=>parent relationships, so it flows into the else and triggers _prime_post_caches for all pages (or any other hierarchical post type) including all post meta and all terms attached to those posts. Although the prime post caches function will only prime caches for posts that aren't already in the cache, if the cache was recently flushed or cache salts or versions rotated, this can be a very expensive operation that can crash a site if it's loading in a significant number of hierarchical posts and/or if those posts have a lot of metadata.

I would recommend marking a query that is looking for id=>parent with a posts_per_page of -1 as not cacheable to prevent this behavior.

Note: I created this ticket during WordCamp US Contributor Day, and was having trouble with Core Trac being overloaded with requests, which inhibited my ability to search for related tickets. If this is a duplicate, I apologize, and will be happy to contribute to related tickets.

Change History (74)

#1 @kevinfodness
19 months ago

  • Summary changed from WP_Posts_List_Table causes post, page, and term caches to be primed for all pages in the DB on load to WP_Posts_List_Table causes post, postmeta, and term caches to be primed for all pages in the DB on load

#2 @joemcgill
19 months ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

This ticket was mentioned in PR #5091 on WordPress/wordpress-develop by @kevinfodness.


19 months ago
#3

  • Keywords has-patch added; needs-patch removed

Fixes a bug whereby the Pages list view in the admin (or any other hierarchical post type) performs a query for id=>parent with a limit of -1 which results in priming post caches for all posts of that post type, including postmeta and terms, that aren't already in the cache, which can crash sites with an empty cache and a lot of posts/postmeta.

#4 @joemcgill
19 months ago

  • Milestone changed from Future Release to 6.4

Thanks @kevinfodness. I'm pulling this into the 6.4 milestone for review

#5 @spacedmonkey
19 months ago

  • Owner set to spacedmonkey
  • Status changed from new to assigned

#6 @spacedmonkey
19 months ago

#59319 was marked as a duplicate.

#8 @spacedmonkey
19 months ago

  • Keywords dev-feedback added

I put together a quick and simple PR here. I think this will fix the issue.

@spacedmonkey commented on PR #5193:


19 months ago
#9

Can you review / test this patch see if fixes the issue @kevinfodness?

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


19 months ago

#11 @peterwilsoncc
18 months ago

I've testedPR 5193 on a server running a persistent cache.

It appears the pages table has two items causing an excessive amount of caching:

  • the list table itself (per the ticket)
  • the wp_dropdown_pages() call when generating the inline edit interface

When using a persistent cache, the meta data cache is primed on the second request but not the first (possibly a bug in WP_Query. This is happening both with and without trunk checked out.

I'll keep investigating and try to figure out what is happening.

#12 @spacedmonkey
18 months ago

I think the issue here. Per [53941], when fields id / id=>parents were passed, term and meta caches were not primed, even if update_post_term_cache or update_post_meta_cache is passed. See this code snippet. Now in WP 6.1+, on the second run term and meta are primed. See this line.

We could consider doing something like this.

                if ( ! isset( $q['update_post_term_cache'] ) ) {
                       if( 'id=>parent' === $q['fields'] ){
                         $q['update_post_term_cache'] = false;
                       }else{
                          $q['update_post_term_cache'] = true;
                       }
                }

                if ( ! isset( $q['update_post_meta_cache'] ) ) {
                        if( 'id=>parent' === $q['fields'] ){
                         $q['update_post_meta_cache'] = false;
                       }else{
                          $q['update_post_meta_cache'] = true;
                       }
                }


This ticket was mentioned in PR #5211 on WordPress/wordpress-develop by @spacedmonkey.


18 months ago
#13

Trac ticket:

@peterwilsoncc commented on PR #5211:


18 months ago
#16

I think it's a good idea to change the defaults for these but I put together some tests based on this branch and am seeing that the options are not being respected so it might need a little work.

I've attached the diff (I had to zip it as GH doesn't accept diff's as uploads).

59188-a-tests.diff.zip

This ticket was mentioned in PR #5219 on WordPress/wordpress-develop by @peterwilsoncc.


18 months ago
#17

  • Keywords has-unit-tests added

Trac ticket:

@peterwilsoncc commented on PR #5211:


18 months ago
#18

@spacedmonkey I've built up on this in PR #5219.

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


18 months ago

This ticket was mentioned in PR #5245 on WordPress/wordpress-develop by @spacedmonkey.


18 months ago
#20

Trac ticket:

@spacedmonkey commented on PR #5219:


18 months ago
#21

@peterwilsoncc @joemcgill I have been wonder if a number of years if we should just make update_post_term_cache and update_post_meta_cache work consistently. At the moment, these do not work consistently and there are code paths where do and do not work. I have put together a PR based on this to make this the case.

https://github.com/WordPress/wordpress-develop/pull/5245

Do we think this is a good idea or be a problem?

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

#25 @spacedmonkey
18 months ago

@kevinfodness Is there any chance you could review / test my PR.

#26 @kevinfodness
18 months ago

@spacedmonkey yep, will do!

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

#29 @LinSoftware
18 months ago

I attempted to reproduce Kevin's experience while on trunk, but didn't see the reported issue on the second page load of /wp-admin/edit.php?post_type=page

However, I was able to reproduce the issue described in #59319 where the page crashes due to out of memory issue. This was reproducible with a data set of 60,000 pages. While @spacedmonkey's PR to not cache the terms and meta is an improvement, the root of the issue appears to stem from the posts_per_page being set to -1, which causes memory exhaustion with a large data set.

#30 @spacedmonkey
18 months ago

@LinSoftware Before, _prime_post_caches was called, meaning for a site with 60k worth of pages, that would load all the post objects ( including post content ) and post meta and terms. But this code avoids using _prime_post_caches and just caches the post parents, which is an int.

60k work of ints in an array is large, but I do not believe we would be hitting memory limits.

I believe it maybe impossible to remove the posts_per_page being set to -1, as this might be a breaking change.

Out of interest, are you using a object caching plugin that supports wp_cache_get_multiple?

#31 @kevinfodness
18 months ago

I tested this and it works well for me. I tested it using a facsimile of WordPress VIP's environment, which uses memcached, and supports wp_cache_get_multiple. I did not test it using a huge number of pages, though.

#32 @kevinfodness
18 months ago

I just tested the patch with a sandbox site with 60,000 pages and it worked fine.

#33 @LinSoftware
18 months ago

I am using the same object cache dropin as Kevin which does support wp_cache_get_multiple. I'm not sure why we're getting different results. I'll continue to investigate.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


18 months ago

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

#36 @LinSoftware
18 months ago

This patch works fine to solve for the issue raised here. The issue I was seeing is unrelated to this change.

This ticket was mentioned in Slack in #core by spacedmonkey. View the logs.


18 months ago

#38 @spacedmonkey
18 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56763:

Query: Improve caching behavior for WP_Query when retrieving id=>parent fields

In [53941], the addition of query caching to WP_Query brought about an unintended issue when querying for fields equal to id=>parent. Specifically, on websites with object caching enabled and a substantial number of pages, the second run of this query triggered the _prime_post_caches function for id=>parent. This led to the unnecessary priming of post, meta, and term caches, even when only id and parent information were requested.

This commit addresses this issue by introducing a new function, _prime_post_parents_caches, which primes a dedicated cache for post parents. This cache is primed during the initial query execution. Subsequently, the wp_cache_get_multiple function is employed to retrieve all post parent data in a single object cache request, optimizing performance.

Additionally, this commit extends the coverage of existing unit tests to ensure the reliability of the changes.

Props kevinfodness, joemcgill, peterwilsoncc, LinSoftware, thekt12, spacedmonkey.
Fixes #59188

#40 @hellofromTonya
18 months ago

In 56766:

Query: Fix a PHPCS issue in _prime_post_parents_caches() tests.

Removes an extraneous line break in the _prime_post_parents_caches() tests.

Follow-up to [56763].

Unprops spacedmonkey.
Props mukesh27, costdev.
See #59188.

#41 @spacedmonkey
18 months ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

This ticket was mentioned in PR #5372 on WordPress/wordpress-develop by @spacedmonkey.


18 months ago
#42

Trac ticket: https://core.trac.wordpress.org/ticket/59188

Props @mukeshpanchal27

#45 @spacedmonkey
18 months ago

  • Keywords needs-dev-note added

#46 @spacedmonkey
18 months ago

In 56773:

Docs: Improve documentation for _prime_post_parents_caches().

Add in missing since PHPDoc block for _prime_post_parents_caches function added in [56763].

Props mukesh27.
See #59188

This ticket was mentioned in PR #5386 on WordPress/wordpress-develop by @peterwilsoncc.


18 months ago
#48

https://core.trac.wordpress.org/ticket/59188

  • primes parent id cache in update_post_cache()
  • adds @since annotation
  • renames function for clarity --- it is currently a little unclear and could imply the parent post object is cached in its entirity
  • adds parameter to _prime_post_caches for priming the post parent ID caches if not already primed.

The docblock for the new parameter needs some work.

#49 @peterwilsoncc
18 months ago

I initially logged this on the incorrect ticket, so reposting here

I've opened a follow up pull request to address some concerns I have

  • I'm not sure the name of the function is clear that it only primes the parent IDs, not the parent post object (I think @spacedmonkey had this question at some point too)
  • Include priming of the parent ID cache when priming post objects to avoid additional queries at a later time.
  • Include a parameter for caching the parent ID of already cached posts.

Jonny and I have discussed whether all of this is appropriate on the PR so it will need some additional eyes for a decision.

#50 @spacedmonkey
18 months ago

I am 100% behind the function name change. Makes sense.

I am not sure about priming the parent cache in update_post_cache. Early versions of this patch did this and I removed it. In my research, id=>parents queries are pretty uncommon. In core it seems to mostly be used on the page list in wp-admin. This new cache is only useful for this type of query. So it is only primed on that type of query. Meaning if you use id=>parents lot the caches will be primed. Loading this cache whenever a post is loaded feels a little wasteful. I will list the reasons.

  • Most of post types are not hierarchical, meaning that this cache would not be needed / used.
  • This cache is only used in this query. Meaning that the cache would be set and loaded into memory for every post. I know that the cache is only an int, but if you have 100s of posts on a page, this will take up php memory that is not needed.
  • Adding another multi add for this cache, adds overhead. This is more calls to external object cache server for on unused cache.

I think that we should consider priming this parent cache, if this cache is used anywhere outside WP_Query. We could consider using it in the following functions.

  • wp_get_post_parent_id
  • get_post_parent
  • get_post_ancestors

Those functions could be converted to us this cache, when id is passed. This would save the need to cache lookup and creation of new WP_Post object. But I think this change should be considered in another ticket, as it is out of the scope of this bug fix.

Does this make sense @peterwilsoncc ?

joemcgill commented on PR #5386:


18 months ago
#51

@peterwilsoncc and @spacedmonkey, it seems like there's consensus here to commit the function name change and the @since annotation updates, but some disagreement about whether the new cache should be primed or if it should be removed and replaced with updates to the main WP_Query cache, if I'm understanding @peterwilsoncc's comment above correctly:

I'd question the value of introducing the new cache and instead change the shape of the cache in WP_Query calls requesting id=>parent fields to include the post parent

Can you two clarify what is needed to move this forward?

@spacedmonkey commented on PR #5386:


18 months ago
#52

@joemcgill

What we are trying to avoid from the original ticket, was lookuping the whole post object if all you need is a parent id. This is why this new cache was introduced.

So here is something I considered in the original PR.

  • We could change all id queries to get id and parent id. Then cache the parent ids in as another key in cache the array. Okay, this would help id=>parent queries, but make id queries slower, as you are requested more from the database. As id queries are FAR more common, I think we should optimize for id queries over the rare id=>parent.
  • On the first run on the query, the parent cache is primed if id=>parent. These caches are not expiring. There are only invalided if you update the post or flush the whole cache. Meaning, if you are using object caching, then the first time you load the post list page in wp-admin, the parent will be primed and cache will live forever. If not there a couple of posts that were updated, the new function will do a very small query to get the missing caches.
  • Priming this cache when it is not used or needed, takes resources. Adding this priming may negatively affect performance.

How about just updating the post parent cache on wp_insert_post, that way the value would be updated in cache just once and not primed all the time?

This ticket was mentioned in PR #5420 on WordPress/wordpress-develop by @spacedmonkey.


18 months ago
#53

PR is based on https://github.com/WordPress/wordpress-develop/pull/5386. It is simplier PR, that just changes the function name and prime the post parent cache on post save.

Trac ticket: https://core.trac.wordpress.org/ticket/59188

@spacedmonkey commented on PR #5420:


18 months ago
#55

wp_cache_flush_group( 'posts' )

This is a documented way of clearing caches. You should call clean_post_cache is the way of cleaning caches.

We could reuse the cache group post and make the id something like this post_parent-123

#59 @peterwilsoncc
18 months ago

In 56811:

Query: Rename _prime_post_parents_caches() for clarity.

Change the name of _prime_post_parents_caches() to _prime_post_parent_id_caches() to make it clearer only the parent post ID is cached rather than the entire post parent object.

Follow up to [56763].

Props spacedmonkey, joemcgill, peterwilsoncc.
See #59188.

This ticket was mentioned in PR #5447 on WordPress/wordpress-develop by @peterwilsoncc.


18 months ago
#61

Modifies _prime_post_parent_id_caches() to use a prefixed cache key and the posts group for caching the post parent IDs. This is to ensure the caches are cleared as part of a call to wp_cache_flush_group( 'posts' ).

https://core.trac.wordpress.org/ticket/59188

#62 @peterwilsoncc
18 months ago

The function has been renamed to _prime_post_parent_id_caches() in [56811] to ensure it gets in to the beta scheduled 24ish hours from now.

Still under discussion:

  • Priming the post parent on save, see PR #5443
  • Whether to move the cache to the posts group to ensure the new caches are cleared as part of a call to wp_cache_flush_group( 'posts' ) -- the cache key would require either a prefix or a suffix, see WIP PR #5447

@spacedmonkey commented on PR #5447:


18 months ago
#63

Couple of tweaks requested here. I don't feel super strongly about this change. I don't think it is needed, but I don't think it hurts anything either.

@spacedmonkey commented on PR #5447:


18 months ago
#64

I also +1 have a util to get generate cache keys. That code is repeated a lot.

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


18 months ago

@peterwilsoncc commented on PR #5447:


18 months ago
#66

I've merged in @spacedmonkey's suggestions.

While the individual keys are generated several times, ie post_parent::{$id}, the collection of keys is only generated once in each file so I'm not sure it's worth committing to a utility function for two repetitions. If such a function becomes necessary then it would be good to answer questions about whether it should be specific or general purpose first.

It's too last minute now but I am thinking it might be possible to add a prefix to _get_non_cached_ids() to allow for future instances similar to this as WP improves caching for DB queries.

This ticket was mentioned in Slack in #core-performance by mukeshpanchal27. View the logs.


18 months ago

@spacedmonkey commented on PR #5447:


18 months ago
#68

@peterwilsoncc this is good to commit

@peterwilsoncc commented on PR #5447:


18 months ago
#69

@spacedmonkey Thanks, I've just rebased and will commit it today.

#70 @peterwilsoncc
18 months ago

In 56925:

Query: Cache post parent IDs in posts group.

Move the cache of post parent IDs from the dedicated group post_parents to posts. This maintains backward compatibility for clearing all post object related data by calling wp_cache_flush_group( 'posts' ).

Post parent IDs are now cached with with the prefix post_parent: in the posts group.

Follow up to [56763].

Props spacedmonkey, joemcgill, peterwilsoncc.
See #59188.

#72 follow-up: @flixos90
17 months ago

@peterwilsoncc Is this good to close now, or something else needs to be done here? Asking since I saw the other commit with the name change landing.

#73 in reply to: ↑ 72 @joemcgill
17 months ago

  • Resolution set to fixed
  • Status changed from reopened to closed

Replying to flixos90:

@peterwilsoncc Is this good to close now, or something else needs to be done here? Asking since I saw the other commit with the name change landing.

The only remaining open question in this ticket is whether the newly introduced post parent ID cache should be primed when a post is saved, See PR #5443.

Personally, I don't see a lot of value in this, given that for most sites, this cache will be non-persistent, and for sites that do have a persistent cache, this DB query would be more efficiently called when reading versus when writing data, where it would only be in the cache when some query is requesting it—and then, only the first time.

I'm going to mark this as closed, and we can always add the post save priming later in a follow-up ticket.

This ticket was mentioned in Slack in #core-performance by joemcgill. View the logs.


16 months ago

Note: See TracTickets for help on using tickets.