WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 2 years ago

#29894 accepted defect (bug)

get_terms() isn't caching - duplicate queries generated

Reported by: webgeekconsulting Owned by: boonebgorges
Milestone: Future Release Priority: normal
Severity: normal Version: 4.0
Component: Taxonomy Keywords: needs-unit-tests needs-patch
Focuses: performance Cc:

Description

get_terms() doesn't seem to be caching correctly and therefore generates duplicate queries. In my plugin I have the following calls:

wp_dropdown_categories(array(
	'taxonomy' => 'product_category',
	'hide_empty' => 0,
	'show_option_none' => __('Any', 'mp'),
	'name' => 'child_of',
));
$cats = get_terms('product_category', array(
	'hide_empty' => 0,
));
$cats = get_terms('product_category', array(
	'hide_empty' => 0,
));

Yes, I realize I could technically create my own cache for the duplicate get_terms() function, but no matter what I'll still have 2 queries due to using wp_dropdown_categories() and get_terms() in the same request - even though they generate the same query (see below).

SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.term_id ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, wp_dropdown_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, get_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC
include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_products_attributes, get_terms

As you can see, the SQL statements are identical and therefore the db should only be queried once.

Attachments (6)

taxonomy.php (128.4 KB) - added by webgeekconsulting 3 years ago.
Patch for get_terms() not caching correctly
taxonomy.php.patch (2.4 KB) - added by webgeekconsulting 3 years ago.
taxonomy.php patch
taxonomy.php.update2.patch (1.2 KB) - added by webgeekconsulting 3 years ago.
taxonomy.php patch 2
taxonomy.php.update3.diff (1.3 KB) - added by webgeekconsulting 3 years ago.
taxonomy.php patch 3
29894.test.patch (1.1 KB) - added by boonebgorges 3 years ago.
taxonomy.php.update4.patch (6.7 KB) - added by webgeekconsulting 3 years ago.
taxonomy.php patch 4

Download all attachments as: .zip

Change History (22)

@webgeekconsulting
3 years ago

Patch for get_terms() not caching correctly

#1 @webgeekconsulting
3 years ago

I've attached a patched version of wp-includes/taxonomy.php. Attempting to hash the functions arguments can be unreliable (passing 0/1 instead of true/false and vice verse). So, instead of serializing and hashing the arguments, we can instead use a hash of the actual SQL statement.

Patch tested and verified working in 4.0.

#2 @webgeekconsulting
3 years ago

  • Keywords has-patch added

#3 follow-up: @knutsp
3 years ago

Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?

@webgeekconsulting
3 years ago

taxonomy.php patch

#4 in reply to: ↑ 3 @webgeekconsulting
3 years ago

Replying to knutsp:

Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?

Done - hopefully did that right. :)

#5 follow-up: @boonebgorges
3 years ago

  • Milestone changed from Awaiting Review to 4.1
  • Owner set to boonebgorges
  • Status changed from new to accepted

+1 for this change - using the SQL for a cache key seems much more sane to me.

Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.

#6 in reply to: ↑ 5 @webgeekconsulting
3 years ago

Replying to boonebgorges:

+1 for this change - using the SQL for a cache key seems much more sane to me.

Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.

I've attached another patch that might rectify any chance of a cache stampede. Basically, keep the existing cache check in place, but as an extra layer of protection add in the second cache check using the hash of the SQL statement as the key.

@webgeekconsulting
3 years ago

taxonomy.php patch 2

#7 follow-up: @boonebgorges
3 years ago

webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)

@webgeekconsulting
3 years ago

taxonomy.php patch 3

#8 in reply to: ↑ 7 @webgeekconsulting
3 years ago

Replying to boonebgorges:

webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)

Let's try this one... taxonomy.php.update3.diff

This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.


3 years ago

#10 follow-up: @boonebgorges
3 years ago

  • Keywords needs-patch needs-unit-tests added; has-patch removed
  • Milestone changed from 4.1 to Future Release

webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.

After further thought, it appears that using merely the SQL to generate a cache key is not going to work. get_terms() returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.

I think we have to go with a more conservative approach here. Let's keep using the $args array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:

$args['hide_empty'] = (bool) $args['hide_empty'];

and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.

I'm attaching a simple unit test that demonstrates the problem as originally reported.

#11 in reply to: ↑ 10 @webgeekconsulting
3 years ago

Replying to boonebgorges:

webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.

After further thought, it appears that using merely the SQL to generate a cache key is not going to work. get_terms() returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.

I think we have to go with a more conservative approach here. Let's keep using the $args array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:

$args['hide_empty'] = (bool) $args['hide_empty'];

and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.

I'm attaching a simple unit test that demonstrates the problem as originally reported.

That's a great point which leads me to another possible solution...

What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.

If that sounds like something that work, I'll upload another patch tomorrow with these changes.

#12 follow-up: @boonebgorges
3 years ago

What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.

Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.

BTW, I got verification from a lead that we don't have to worry about maintaining old cache keys. Feel free to mix it up :)

@webgeekconsulting
3 years ago

taxonomy.php patch 4

#13 in reply to: ↑ 12 @webgeekconsulting
3 years ago

  • Keywords has-patch added; needs-patch removed

Replying to boonebgorges:

Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.

Take a peek at the latest patch taxonomy.php.update4.patch. Instead of caching the terms, we can create a cache of the query object. Then when the query gets executed, select all possible columns allowing the code later on to filter out the fields returned (this is already being done anyways). I think this actually simplifies things quite a bit. :)

What do you think?

#14 @boonebgorges
3 years ago

webgeekconsulting - Thanks for the patch. This is the kind of thing I had in mind, too. A couple thoughts and concerns:

  • Doing SELECT * is going to increase the overhead of calling get_terms() with 'fields' values of 'count', and possibly some of the others that currently query just two or three columns. The numbers may be small, but I'd like to see some benchmarks before moving forward with these kinds of changes. Maybe seed a database with a certain number of tax items, then run a script that runs get_terms() a bunch of times, clearing the cache between each call. Over a couple thousand runs, this'll give a reasonable average of the overhead introduced by the broader SELECT statement.
  • I'd feel much more comfortable making these changes if we had better unit test coverage for the various values of 'fields' with respect to the object cache. http://core.trac.wordpress.org/browser/trunk/tests/phpunit/tests/term/getTerms.php has some tests for 'fields' and some for cache, but none that combine the two. What's needed is tests that (a) call get_terms() to prime the cache, then (b) call get_terms() again to fetch items from the cache, and then (c) verify that the values returned are correct and that the number of database queries has not been incremented.
  • The changes you've proposed here, and the ones I proposed here https://core.trac.wordpress.org/ticket/29894#comment:10 (stricter typecasting for arguments) are not mutually exclusive. It might be possible to do my suggestions for now (which would address your original issue) and then rework the cache internals as you've suggested for a later release, if it seems like the items I've suggested above are too much work to do right now.

If you're able to work on some of these tests and benchmarks, it'll make it much more likely that this stuff'll get in sooner rather than later. :-)

#15 @chriscct7
2 years ago

@boonebgorges how do you feel about doing these changes now?

#16 @boonebgorges
2 years ago

  • Keywords needs-patch added; has-patch removed

Many of the changes were actually handled in [35120], which changed get_terms() so that the cache key is generated using the SQL query. We could still look into the fields changes.

Note: See TracTickets for help on using tickets.