Make WordPress Core

Opened 11 years ago

Closed 11 years ago

#23167 closed task (blessed) (fixed)

Cache incrementors for get_pages

Reported by: nprasath002's profile nprasath002 Owned by: westi's profile westi
Milestone: 3.6 Priority: normal
Severity: normal Version: 3.5
Component: Cache API Keywords: has-patch needs-unit-tests
Focuses: Cc:

Description

Make use of a caching incrementor and get pages from their individual cache buckets to avoid memory exhaustion errors in php scripts

Attachments (6)

caching_incrementors_for_get_pages.patch (2.4 KB) - added by nprasath002 11 years ago.
23167-ut.diff (2.3 KB) - added by ryan 11 years ago.
23167.diff (1.9 KB) - added by ryan 11 years ago.
Remove the last_changed bumps from wp_insert_post() since clean_post_cache() handles that.
23167-ut.2.diff (3.2 KB) - added by ryan 11 years ago.
23167-ut.3.diff (3.7 KB) - added by ryan 11 years ago.
23167.2.diff (1.4 KB) - added by ryan 11 years ago.
For post.php and comment.php

Download all attachments as: .zip

Change History (23)

#1 @jkudish
11 years ago

  • Cc jkudish added

#2 @westi
11 years ago

  • Keywords needs-unit-tests added
  • Milestone changed from Awaiting Review to 3.6
  • Owner set to westi
  • Status changed from new to reviewing

This looks great it would be great to see a little more detail from the testing you did on this here for other people to understand the issue and how this resolves it.

Some background for everyone:

The basic issue as I recall is that the data cached by get_pages can get very large for large hierarchies because of all the "post" data being cached for each node - we saw an example where the memcache object cache backend had to split the get_pages cache into 62 1 Meg chunks.

The patch addresses this by only caching the IDs as the post objects are already cached.

Some questions for nprasath002:

  • When we don't have an object cache backend are we now doing more queries for get_pages calls, if so how many more - i.e. is it a single extra query or does the query volume scale with the number of pages (From memory the old code fetched all the page info in one query).
  • Can you write some unit-tests for get_pages to show that the result returned from the function has not changed in any way from the expected behaviour?

#3 @SergeyBiryukov
11 years ago

  • Version changed from trunk to 3.5

#4 @ryan
11 years ago

We're thinking alike. #23173

23173 can be an umbrella ticket for all such changes as this.

#5 @ryan
11 years ago

Since wp_insert_post() calls clean_post_cache() it doesn't need to do its own last_changed bumping.

#6 @ryan
11 years ago

Unlike posts, pages are less likely to have a hot cache. Caching just the IDs and then running get_post() on them could result in a lot of extra queries when a persistent cache backend is not being used. We definitely need to investigate that. If we go with caching IDs, we should consider introducing and using wp_cache_get_multi().

#7 @ryan
11 years ago

Looks like caching only IDs will be just fine since get_pages() calls update_post_cache(). This ensures the pages are cached should we do the same get_pages() query later in the same page load. On the next page load the cache is empty when there is no persistent cache so get_pages() does a full pages query and cache set since there are no cached IDs to loop over. Query counts are looking fine so far in testing.

Last edited 11 years ago by ryan (previous) (diff)

@ryan
11 years ago

@ryan
11 years ago

Remove the last_changed bumps from wp_insert_post() since clean_post_cache() handles that.

@ryan
11 years ago

@ryan
11 years ago

#9 @ryan
11 years ago

  • Type changed from enhancement to task (blessed)

#10 @ryan
11 years ago

In 23300:

In get_pages(), cache queries to individual cache buckets instead of storing them in one cached array. Also, store post IDs instead of full objects. This reduces overall memory usage as well as the size of the cache buckets. Use incrementor style passive cache invalidation.

Props nprasath002
see #23167

#11 @ryan
11 years ago

$last_changed = wp_cache_set( 'last_changed', 1, 'posts' );

The default cache returns boolean true from wp_cache_set(). This works out since true is cast to '1'. Some cache backends return void which is cast to an empty string. If the same query is run again later in the page load it uses a different cache key resulting in another query instead of using the already cached query. The keys look like this:

get_pages:3577d9c3933b07d4dc753e71a0a49d72:
get_pages:3577d9c3933b07d4dc753e71a0a49d72:1

To accommodate non-compliant caches and to make the code clearer, let's explicitly assign 1 to last_changed.

Version 0, edited 11 years ago by ryan (next)

@ryan
11 years ago

For post.php and comment.php

#12 @ryan
11 years ago

In 23319:

Explicitly set last_changed to 1 instead of the result of wp_cache_set(). Avoids ambiguity and works with cache backends that return void instead of boolean.

see #23167

#13 @ryan
11 years ago

In 23320:

Avoid the appearance of a magic number.

Props nacin
see #23167

#14 @kurtpayne
11 years ago

  • Cc kurtpayne added

r23320 changed 'comment' to 'posts' and now unit test Tests_Comment_Query::test_get_comments_for_post is failing.

#15 @nacin
11 years ago

In 23341:

Use correct cache bucket. Fixes copy-paste error in r23320. props kurtpayne. see #23167.

#16 @mark-k
11 years ago

Resetting $last_changed to 1 is wrong. It is very unlikely but if $last_changed got purged from cache while the content of the buckets remained (it was on different memcache server which failed) you might get a key which points to old data which might still be in the cache.
In other words this way you increase the possibility of key collision.

IMO it is better to use a random initial seed every time it needs to be calculated.

#17 @ryan
11 years ago

  • Resolution set to fixed
  • Status changed from reviewing to closed

#23448 for using time in last_changed.

Note: See TracTickets for help on using tickets.