#29894 closed defect (bug) (fixed)
get_terms() isn't caching - duplicate queries generated
Reported by: | webgeekconsulting | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 4.6 | Priority: | normal |
Severity: | normal | Version: | 4.0 |
Component: | Taxonomy | Keywords: | |
Focuses: | performance | Cc: |
Description
get_terms() doesn't seem to be caching correctly and therefore generates duplicate queries. In my plugin I have the following calls:
wp_dropdown_categories(array( 'taxonomy' => 'product_category', 'hide_empty' => 0, 'show_option_none' => __('Any', 'mp'), 'name' => 'child_of', ));
$cats = get_terms('product_category', array( 'hide_empty' => 0, ));
$cats = get_terms('product_category', array( 'hide_empty' => 0, ));
Yes, I realize I could technically create my own cache for the duplicate get_terms() function, but no matter what I'll still have 2 queries due to using wp_dropdown_categories() and get_terms() in the same request - even though they generate the same query (see below).
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.term_id ASC include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, wp_dropdown_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_categories_attributes, get_categories, get_terms
SELECT t.*, tt.* FROM hf9v_terms AS t INNER JOIN hf9v_term_taxonomy AS tt ON t.term_id = tt.term_id WHERE tt.taxonomy IN ('product_category') ORDER BY t.name ASC include('wp-admin/admin-footer.php'), do_action('in_admin_footer'), call_user_func_array, MP_Shortcode_Builder->display_short_code_form, call_user_func, MP_Shortcode_Builder->display_mp_list_products_attributes, get_terms
As you can see, the SQL statements are identical and therefore the db should only be queried once.
Attachments (6)
Change History (26)
#1
@
10 years ago
I've attached a patched version of wp-includes/taxonomy.php. Attempting to hash the functions arguments can be unreliable (passing 0/1 instead of true/false and vice verse). So, instead of serializing and hashing the arguments, we can instead use a hash of the actual SQL statement.
Patch tested and verified working in 4.0.
#3
follow-up:
↓ 4
@
10 years ago
Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?
#4
in reply to:
↑ 3
@
10 years ago
Replying to knutsp:
Are you able to attach a patch or diff, not just the complete file, so we can see what's changed?
Done - hopefully did that right. :)
#5
follow-up:
↓ 6
@
10 years ago
- Milestone changed from Awaiting Review to 4.1
- Owner set to boonebgorges
- Status changed from new to accepted
+1 for this change - using the SQL for a cache key seems much more sane to me.
Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.
#6
in reply to:
↑ 5
@
10 years ago
Replying to boonebgorges:
+1 for this change - using the SQL for a cache key seems much more sane to me.
Changing the key will cause existing persistent caches to become defunct. In certain situations this could cause a cache stampede, but that's a very remote possibility with a small item like this.
I've attached another patch that might rectify any chance of a cache stampede. Basically, keep the existing cache check in place, but as an extra layer of protection add in the second cache check using the hash of the SQL statement as the key.
#7
follow-up:
↓ 8
@
10 years ago
webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)
#8
in reply to:
↑ 7
@
10 years ago
Replying to boonebgorges:
webgeekconsulting - This sounds like a promising idea, but taxonomy.php.update2.patch looks like it was incorrectly generated. Can you have another shot? (And ideally, generate your patch from the root of your WP installation?)
Let's try this one... taxonomy.php.update3.diff
This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.
10 years ago
#10
follow-up:
↓ 11
@
10 years ago
- Keywords needs-patch needs-unit-tests added; has-patch removed
- Milestone changed from 4.1 to Future Release
webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.
After further thought, it appears that using merely the SQL to generate a cache key is not going to work. get_terms()
returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.
I think we have to go with a more conservative approach here. Let's keep using the $args
array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:
$args['hide_empty'] = (bool) $args['hide_empty'];
and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.
I'm attaching a simple unit test that demonstrates the problem as originally reported.
#11
in reply to:
↑ 10
@
10 years ago
Replying to boonebgorges:
webgeekconsulting - Thanks - I'm not sure that update3.diff is the entire patch you meant to attach, but I can see enough to understand what you're suggesting.
After further thought, it appears that using merely the SQL to generate a cache key is not going to work.
get_terms()
returns different kinds of values depending on the value of the 'fields' parameter, but these different values come from the same database query. Using the SQL alone to create a cache key isn't fine-grained enough.
I think we have to go with a more conservative approach here. Let's keep using the
$args
array to generate the cache key, but let's typecast all of the cache keys as necessary. Eg:
$args['hide_empty'] = (bool) $args['hide_empty'];and so on, with all of the relevant arguments. However, I don't think we can safely make these kinds of changes without more complete unit tests, because there are places in the function where loose typing is expected - see eg https://core.trac.wordpress.org/browser/tags/4.0/src/wp-includes/taxonomy.php#L1290.
I'm attaching a simple unit test that demonstrates the problem as originally reported.
That's a great point which leads me to another possible solution...
What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.
If that sounds like something that work, I'll upload another patch tomorrow with these changes.
#12
follow-up:
↓ 13
@
10 years ago
What if we were to create an object cache of the underlying query itself? Then, repeat calls to get_terms() that would use the same query, but use the data differently (e.g. the fields parameter) would use use the object cache instead of another query against the database.
Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.
BTW, I got verification from a lead that we don't have to worry about maintaining old cache keys. Feel free to mix it up :)
#13
in reply to:
↑ 12
@
10 years ago
- Keywords has-patch added; needs-patch removed
Replying to boonebgorges:
Yeah, I was thinking of something similar while looking at this. It's pretty poor caching strategy to store the same data in the cache 5 or 6 separate times (for different values of 'fields'). If we can fix that, it'd be a bonus.
Take a peek at the latest patch taxonomy.php.update4.patch. Instead of caching the terms, we can create a cache of the query object. Then when the query gets executed, select all possible columns allowing the code later on to filter out the fields returned (this is already being done anyways). I think this actually simplifies things quite a bit. :)
What do you think?
#14
@
10 years ago
webgeekconsulting - Thanks for the patch. This is the kind of thing I had in mind, too. A couple thoughts and concerns:
- Doing
SELECT *
is going to increase the overhead of callingget_terms()
with 'fields' values of 'count', and possibly some of the others that currently query just two or three columns. The numbers may be small, but I'd like to see some benchmarks before moving forward with these kinds of changes. Maybe seed a database with a certain number of tax items, then run a script that runsget_terms()
a bunch of times, clearing the cache between each call. Over a couple thousand runs, this'll give a reasonable average of the overhead introduced by the broaderSELECT
statement. - I'd feel much more comfortable making these changes if we had better unit test coverage for the various values of 'fields' with respect to the object cache. http://core.trac.wordpress.org/browser/trunk/tests/phpunit/tests/term/getTerms.php has some tests for 'fields' and some for cache, but none that combine the two. What's needed is tests that (a) call
get_terms()
to prime the cache, then (b) callget_terms()
again to fetch items from the cache, and then (c) verify that the values returned are correct and that the number of database queries has not been incremented. - The changes you've proposed here, and the ones I proposed here https://core.trac.wordpress.org/ticket/29894#comment:10 (stricter typecasting for arguments) are not mutually exclusive. It might be possible to do my suggestions for now (which would address your original issue) and then rework the cache internals as you've suggested for a later release, if it seems like the items I've suggested above are too much work to do right now.
If you're able to work on some of these tests and benchmarks, it'll make it much more likely that this stuff'll get in sooner rather than later. :-)
#16
@
9 years ago
- Keywords needs-patch added; has-patch removed
Many of the changes were actually handled in [35120], which changed get_terms()
so that the cache key is generated using the SQL query. We could still look into the fields
changes.
#18
@
3 years ago
@boonebgorges Do you feel about doing these changes land in upcoming update?
Is it related to #55837 where Jonny and other folks are working to solve the get_terms
query cache issue?
Patch for get_terms() not caching correctly