Make WordPress Core

Opened 10 years ago

Closed 3 years ago

Last modified 3 years ago

#29894 closed defect (bug) (fixed)

get_terms() isn't caching - duplicate queries generated

Reported by: webgeekconsulting's profile webgeekconsulting Owned by: boonebgorges's profile boonebgorges
Milestone: 4.6 Priority: normal
Severity: normal Version: 4.0
Component: Taxonomy Keywords:
Focuses: performance Cc:

Description

get_terms() doesn't seem to be caching correctly and therefore generates duplicate queries. In my plugin I have the following calls:

wp_dropdown_categories(array(
	'taxonomy' => 'product_category',
	'hide_empty' => 0,
	'show_option_none' => __('Any', 'mp'),
	'name' => 'child_of',
));
$cats = get_terms('product_category', array(
	'hide_empty' => 0,
));
$cats = get_terms('product_category', array(
	'hide_empty' => 0,
));

Yes, I realize I could technically create my own cache for the duplicate get_terms() function, but no matter what I'll still have 2 queries due to using wp_dropdown_categories() and get_terms() in the same request - even though they generate the same query (see below).

SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.term_id ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, wp_dropdown_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, get_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_products_attributes, get_terms

As you can see, the SQL statements are identical and therefore the db should only be queried once.

Attachments (6)

taxonomy.php (128.4 KB) - added by webgeekconsulting 10 years ago.
Patch for get_terms() not caching correctly
taxonomy.php.patch (2.4 KB) - added by webgeekconsulting 10 years ago.
taxonomy.php patch
taxonomy.php.update2.patch (1.2 KB) - added by webgeekconsulting 10 years ago.
taxonomy.php patch 2
taxonomy.php.update3.diff (1.3 KB) - added by webgeekconsulting 10 years ago.
taxonomy.php patch 3
29894.test.patch (1.1 KB) - added by boonebgorges 10 years ago.
taxonomy.php.update4.patch (6.7 KB) - added by webgeekconsulting 10 years ago.
taxonomy.php patch 4

Download all attachments as: .zip

Change History (26)

@webgeekconsulting
10 years ago

Patch for get_terms() not caching correctly

#1 @webgeekconsulting
10 years ago

I've attached a patched version of wp-includes/taxonomy.php. Attempting to hash the functions arguments can be unreliable (passing 0/1 instead of true/false and vice verse). So, instead of serializing and hashing the arguments, we can instead use a hash of the actual SQL statement.

Patch tested and verified working in 4.0.

#2 @webgeekconsulting
10 years ago

  • Keywords has-patch added

#3 follow-up: @knutsp
10 years ago

Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?

@webgeekconsulting
10 years ago

taxonomy.php patch

#4 in reply to: ↑ 3 @webgeekconsulting
10 years ago

Replying to knutsp:

Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?

Done - hopefully did that right. :)

#5 follow-up: @boonebgorges
10 years ago

  • Milestone changed from Awaiting Review to 4.1
  • Owner set to boonebgorges
  • Status changed from new to accepted

+1 for this change - using the SQL for a cache key seems much more sane to me.

Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.

#6 in reply to: ↑ 5 @webgeekconsulting
10 years ago

Replying to boonebgorges:

+1 for this change - using the SQL for a cache key seems much more sane to me.

Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.

I've attached another patch that might rectify any chance of a cache stampede. Basically, keep the existing cache check in place, but as an extra layer of protection add in the second cache check using the hash of the SQL statement as the key.

@webgeekconsulting
10 years ago

taxonomy.php patch 2

#7 follow-up: @boonebgorges
10 years ago

webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)

@webgeekconsulting
10 years ago

taxonomy.php patch 3

#8 in reply to: ↑ 7 @webgeekconsulting
10 years ago

Replying to boonebgorges:

webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)

Let's try this one... taxonomy.php.update3.diff

This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.


10 years ago

#10 follow-up: @boonebgorges
10 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch removed
  • Milestone changed from 4.1 to Future Release

webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.

After further thought, it appears that using merely the SQL to generate a cache key is not going to work. get_terms() returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.

I think we have to go with a more conservative approach here. Let's keep using the $args array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:

$args['hide_empty'] = (bool) $args['hide_empty'];

and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.

I'm attaching a simple unit test that demonstrates the problem as originally reported.

#11 in reply to: ↑ 10 @webgeekconsulting
10 years ago

Replying to boonebgorges:

webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.

After further thought, it appears that using merely the SQL to generate a cache key is not going to work. get_terms() returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.

I think we have to go with a more conservative approach here. Let's keep using the $args array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:

$args['hide_empty'] = (bool) $args['hide_empty'];

and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.

I'm attaching a simple unit test that demonstrates the problem as originally reported.

That's a great point which leads me to another possible solution...

What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.

If that sounds like something that work, I'll upload another patch tomorrow with these changes.

#12 follow-up: @boonebgorges
10 years ago

What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.

Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.

BTW, I got verification from a lead that we don't have to worry about maintaining old cache keys. Feel free to mix it up :)

@webgeekconsulting
10 years ago

taxonomy.php patch 4

#13 in reply to: ↑ 12 @webgeekconsulting
10 years ago

  • Keywords has-patch added; needs-patch removed

Replying to boonebgorges:

Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.

Take a peek at the latest patch taxonomy.php.update4.patch. Instead of caching the terms, we can create a cache of the query object. Then when the query gets executed, select all possible columns allowing the code later on to filter out the fields returned (this is already being done anyways). I think this actually simplifies things quite a bit. :)

What do you think?

#14 @boonebgorges
10 years ago

webgeekconsulting - Thanks for the patch. This is the kind of thing I had in mind, too. A couple thoughts and concerns:

  • Doing SELECT * is going to increase the overhead of calling get_terms() with 'fields' values of 'count', and possibly some of the others that currently query just two or three columns. The numbers may be small, but I'd like to see some benchmarks before moving forward with these kinds of changes. Maybe seed a database with a certain number of tax items, then run a script that runs get_terms() a bunch of times, clearing the cache between each call. Over a couple thousand runs, this'll give a reasonable average of the overhead introduced by the broader SELECT statement.
  • I'd feel much more comfortable making these changes if we had better unit test coverage for the various values of 'fields' with respect to the object cache. http://core.trac.wordpress.org/browser/trunk/tests/phpunit/tests/term/getTerms.php has some tests for 'fields' and some for cache, but none that combine the two. What's needed is tests that (a) call get_terms() to prime the cache, then (b) call get_terms() again to fetch items from the cache, and then (c) verify that the values returned are correct and that the number of database queries has not been incremented.
  • The changes you've proposed here, and the ones I proposed here https://core.trac.wordpress.org/ticket/29894#comment:10 (stricter typecasting for arguments) are not mutually exclusive. It might be possible to do my suggestions for now (which would address your original issue) and then rework the cache internals as you've suggested for a later release, if it seems like the items I've suggested above are too much work to do right now.

If you're able to work on some of these tests and benchmarks, it'll make it much more likely that this stuff'll get in sooner rather than later. :-)

#15 @chriscct7
9 years ago

@boonebgorges how do you feel about doing these changes now?

#16 @boonebgorges
9 years ago

  • Keywords needs-patch added; has-patch removed

Many of the changes were actually handled in [35120], which changed get_terms() so that the cache key is generated using the SQL query. We could still look into the fields changes.

#17 @ctrlaltdelete
7 years ago

Sorry but what would be the solution to the initial problem if any today?

#18 @mukesh27
3 years ago

@boonebgorges Do you feel about doing these changes land in upcoming update?

Is it related to #55837 where Jonny and other folks are working to solve the get_terms query cache issue?

#19 @spacedmonkey
3 years ago

  • Keywords needs-unit-tests needs-patch removed
  • Resolution set to fixed
  • Status changed from accepted to closed

Fixed in [37572].

#20 @SergeyBiryukov
3 years ago

  • Milestone changed from Future Release to 4.6
Note: See TracTickets for help on using tickets.