Opened 14 years ago
Closed 10 years ago
#18105 closed enhancement (fixed)
Improve JOIN efficency of tax_query
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 4.1 | Priority: | normal |
Severity: | normal | Version: | |
Component: | Query | Keywords: | |
Focuses: | Cc: |
Description
Use of multiple items in a tax_query creates an inner join for each entry. This seems unnecessary.
Example query:
array( 'tax_query' => array( 'relation' => 'OR', array('taxonomy' => 'tax1', 'field' => 'slug', 'terms' => 'term1'), array('taxonomy' => 'tax2', 'field' => 'slug', 'terms' => 'term2'), ) )
Resulting SQL:
SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) INNER JOIN wp_term_relationships AS tt1 ON (wp_posts.ID = tt1.object_id) WHERE 1=1 AND ... AND (wp_term_relationships.term_taxonomy_id IN (XXX) OR tt1.term_taxonomy_id IN (YYY) ) ...
Each new entry in the array creates another inner join on the term_relationships table, along with an associated select argument.
However, the term_relationships table is defined with these main keys:
CREATE TABLE $wpdb->term_taxonomy ( term_taxonomy_id bigint(20) unsigned NOT NULL auto_increment, term_id bigint(20) unsigned NOT NULL default 0, taxonomy varchar(32) NOT NULL default '', ... PRIMARY KEY (term_taxonomy_id), UNIQUE KEY term_id_taxonomy (term_id,taxonomy),
This basically means that the term_taxonomy_id is a unique ID for the row, and it's pointing to a unique relationship between some term and some taxonomy. Essentially, term_relationships defines what terms are in what taxonomies.
Therefore, the extra inner join and selection are entirely unnecessary. All references to a term in a taxonomy are unique. This isn't individual terms we're looking for in this query, but term+taxonomy.
An equivalent SQL statement to the one given before would be this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) WHERE 1=1 AND ... AND (wp_term_relationships.term_taxonomy_id IN (XXX, YYY) ) ...
This is equivalent because both XXX and YYY point to a specific term+taxonomy combination, not just to a term.
For the similar relation = AND case (instead of relation = OR), this query would be equivalent:
SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) WHERE 1=1 AND ... AND (wp_term_relationships.term_taxonomy_id IN (XXX) AND (wp_term_relationships.term_taxonomy_id IN (YYY) ) ...
We get a second statement in the where clause, but note the lack of a second inner join and lack of referencing it. This greatly improves performance.
Only one inner join is ever actually required for any number of items in the tax_query, thanks to the pre-lookups being performed to find the proper term_relationship entries.
Similar enhancements might be available for the meta table, however that is more complex and may be more difficult to improve.
Attachments (1)
Change History (10)
#2
@
14 years ago
I may be wrong about the AND case. However there's certainly room for improvement in at least some of these cases.
#3
@
13 years ago
I've also noticed that 'meta_query' behaves exactly the same way. I believe they both work from the same get_sql function.
#8
@
10 years ago
- Component changed from Database to Query
- Keywords needs-patch removed
- Milestone changed from Awaiting Review to 4.1
- Owner set to boonebgorges
- Status changed from new to accepted
Resurrecting this one. marcus.downing is correct that we can't do it in all places, but otto42 is right that we can make improvements. Here's a summary of the situation in WP_Tax_Query:
- Operators 'NOT IN', 'AND', 'EXISTS', and 'NOT EXISTS' are transformed into subqueries, so the question of table joins is moot.
- That leaves us with operator 'IN'. There are two cases:
- IN clauses joined by OR. In these cases, only one JOIN is required (this is otto42's original situation)
- IN clauses joined by AND. These require a separate JOIN. (FWIW, the situation is switched for NOT IN: AND can share a join, while OR cannot. But, as noted above, we're not using joins for NOT IN.)
So: when IN clauses are joined by OR, we should avoid extraneous JOINs. There are two general strategies. One is to handle this early: IN (3,4,5) OR IN (6,7,8)
is logically equivalent to IN (3,4,5,6,7,8)
, so we could detect the params passed to WP_Tax_Query
are not optimal, and then combine them. This ends up being harder than it sounds, because a tax query can include clauses with different 'fields', which can only be combined after transforming them all into term_taxonomy_ids
. This is inefficent in some cases.
The other strategy is to do the following when building the SQL for an individual clause:
- Before creating a JOIN, check to see whether any sibling clauses have already created a compatible JOIN. ("Sibling" is a first-order clause under the scope of the same relation. "Compatible" means that it can share a join - both are IN, and they're connected by OR.) If so, no need to join again - use the sibling's table alias.
- If no compatible sibling is found, then create a JOIN. Store the table alias, so that later compatible siblings will find it.
A patch is incoming that does all of these things. It's abstracted just a little more than it has to be, because we're going to be able to do something very similar (and with greater effect!) in WP_Meta_Query
- but, as zuzya notes, meta queries are more complicated in this regard. See #24093.
I don't believe this equivalence is true in all cases. It applies in the case listed, when the relationship is 'OR' and each taxonomy's condition is 'IN'. But it doesn't apply when the relationship is 'AND' or a taxonomy's condition is 'NOT IN'.
In the cases it does apply to, this change provides a substantial performance improvement.