WordPress.org

Make WordPress Core

#23167 closed task (blessed) (fixed)

Cache incrementors for get_pages

Reported by: nprasath002 Owned by: westi
Milestone: 3.6 Priority: normal
Severity: normal Version: 3.5
Component: Cache API Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

Make use of a caching incrementor and get pages from their individual cache buckets to avoid memory exhaustion errors in php scripts

Attachments (6)

caching_incrementors_for_get_pages.patch (2.4 KB) - added by nprasath002 15 months ago.
23167-ut.diff (2.3 KB) - added by ryan 15 months ago.
23167.diff (1.9 KB) - added by ryan 15 months ago.
Remove the last_changed bumps from wp_insert_post() since clean_post_cache() handles that.
23167-ut.2.diff (3.2 KB) - added by ryan 15 months ago.
23167-ut.3.diff (3.7 KB) - added by ryan 15 months ago.
23167.2.diff (1.4 KB) - added by ryan 15 months ago.
For post.php and comment.php

Download all attachments as: .zip

Change History (23)

comment:1 jkudish15 months ago

  • Cc jkudish added

comment:2 westi15 months ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to 3.6
  • Owner set to westi
  • Status changed from new to reviewing

This looks great it would be great to see a little more detail from the testing you did on this here for other people to understand the issue and how this resolves it.

Some background for everyone:

The basic issue as I recall is that the data cached by get_pages can get very large for large hierarchies because of all the "post" data being cached for each node - we saw an example where the memcache object cache backend had to split the get_pages cache into 62 1 Meg chunks.

The patch addresses this by only caching the IDs as the post objects are already cached.

Some questions for nprasath002:

  • When we don't have an object cache backend are we now doing more queries for get_pages calls, if so how many more - i.e. is it a single extra query or does the query volume scale with the number of pages (From memory the old code fetched all the page info in one query).
  • Can you write some unit-tests for get_pages to show that the result returned from the function has not changed in any way from the expected behaviour?

comment:3 SergeyBiryukov15 months ago

  • Version changed from trunk to 3.5

comment:4 ryan15 months ago

We're thinking alike. #23173

23173 can be an umbrella ticket for all such changes as this.

comment:5 ryan15 months ago

Since wp_insert_post() calls clean_post_cache() it doesn't need to do its own last_changed bumping.

comment:6 ryan15 months ago

Unlike posts, pages are less likely to have a hot cache. Caching just the IDs and then running get_post() on them could result in a lot of extra queries when a persistent cache backend is not being used. We definitely need to investigate that. If we go with caching IDs, we should consider introducing and using wp_cache_get_multi().

comment:7 ryan15 months ago

Looks like caching only IDs will be just fine since get_pages() calls update_post_cache(). This ensures the pages are cached should we do the same get_pages() query later in the same page load. On the next page load the cache is empty when there is no persistent cache so get_pages() does a full pages query and cache set. Query counts are looking fine so far in testing.

Version 1, edited 15 months ago by ryan (previous) (next) (diff)

ryan15 months ago

ryan15 months ago

Remove the last_changed bumps from wp_insert_post() since clean_post_cache() handles that.

ryan15 months ago

ryan15 months ago

comment:9 ryan15 months ago

  • Type changed from enhancement to task (blessed)

comment:10 ryan15 months ago

In 23300:

In get_pages(), cache queries to individual cache buckets instead of storing them in one cached array. Also, store post IDs instead of full objects. This reduces overall memory usage as well as the size of the cache buckets. Use incrementor style passive cache invalidation.

Props nprasath002
see #23167

comment:11 ryan15 months ago

$last_changed = wp_cache_set( 'last_changed', 1, 'posts' );

The default cache returns boolean true from wp_cache_set(). This works out since true is cast to '1'. Some cache backends, however, return void. This is cast to an empty string. If the same query is run again later in the page load it uses a different cache key resulting in another query instead of using the already cached query. The keys look something like this:

get_pages:3577d9c3933b07d4dc753e71a0a49d72:
get_pages:3577d9c3933b07d4dc753e71a0a49d72:1

To accommodate non-compliant caches and to make the code clearer, let's explicitly assign 1 to last_changed.

Last edited 15 months ago by ryan (previous) (diff)

ryan15 months ago

For post.php and comment.php

comment:12 ryan15 months ago

In 23319:

Explicitly set last_changed to 1 instead of the result of wp_cache_set(). Avoids ambiguity and works with cache backends that return void instead of boolean.

see #23167

comment:13 ryan15 months ago

In 23320:

Avoid the appearance of a magic number.

Props nacin
see #23167

comment:14 kurtpayne15 months ago

  • Cc kurtpayne added

r23320 changed 'comment' to 'posts' and now unit test Tests_Comment_Query::test_get_comments_for_post is failing.

comment:15 nacin15 months ago

In 23341:

Use correct cache bucket. Fixes copy-paste error in r23320. props kurtpayne. see #23167.

comment:16 mark-k15 months ago

Resetting $last_changed to 1 is wrong. It is very unlikely but if $last_changed got purged from cache while the content of the buckets remained (it was on different memcache server which failed) you might get a key which points to old data which might still be in the cache.
In other words this way you increase the possibility of key collision.

IMO it is better to use a random initial seed every time it needs to be calculated.

comment:17 ryan14 months ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

#23448 for using time in last_changed.

Note: See TracTickets for help on using tickets.