Opened 23 months ago

Last modified 12 months ago

#18105 new enhancement

Improve JOIN efficency of tax_query

Reported by: Otto42 Owned by:
Priority: normal Milestone: Awaiting Review
Component: Database Version:
Severity: normal Keywords: needs-patch
Cc: 24-7@…, wordpress.trac@…

Description

Use of multiple items in a tax_query creates an inner join for each entry. This seems unnecessary.

Example query:

array(
  'tax_query' => array(
    'relation' => 'OR',
    array('taxonomy' => 'tax1', 'field' => 'slug', 'terms' => 'term1'),
    array('taxonomy' => 'tax2', 'field' => 'slug', 'terms' => 'term2'),
  )
)

Resulting SQL:

SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts  
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) 
INNER JOIN wp_term_relationships AS tt1 ON (wp_posts.ID = tt1.object_id) 
WHERE 1=1 AND
...
AND (wp_term_relationships.term_taxonomy_id IN (XXX) 
  OR tt1.term_taxonomy_id IN (YYY) ) 
...

Each new entry in the array creates another inner join on the term_relationships table, along with an associated select argument.

However, the term_relationships table is defined with these main keys:

CREATE TABLE $wpdb->term_taxonomy (
 term_taxonomy_id bigint(20) unsigned NOT NULL auto_increment,
 term_id bigint(20) unsigned NOT NULL default 0,
 taxonomy varchar(32) NOT NULL default '',
...
 PRIMARY KEY  (term_taxonomy_id),
 UNIQUE KEY term_id_taxonomy (term_id,taxonomy),

This basically means that the term_taxonomy_id is a unique ID for the row, and it's pointing to a unique relationship between some term and some taxonomy. Essentially, term_relationships defines what terms are in what taxonomies.

Therefore, the extra inner join and selection are entirely unnecessary. All references to a term in a taxonomy are unique. This isn't individual terms we're looking for in this query, but term+taxonomy.

An equivalent SQL statement to the one given before would be this:

SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts  
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) 
WHERE 1=1 AND
...
AND (wp_term_relationships.term_taxonomy_id IN (XXX, YYY) ) 
...

This is equivalent because both XXX and YYY point to a specific term+taxonomy combination, not just to a term.

For the similar relation = AND case (instead of relation = OR), this query would be equivalent:

SELECT SQL_CALC_FOUND_ROWS wp_posts.* FROM wp_posts  
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) 
WHERE 1=1 AND
...
AND (wp_term_relationships.term_taxonomy_id IN (XXX) 
AND (wp_term_relationships.term_taxonomy_id IN (YYY) ) 
...

We get a second statement in the where clause, but note the lack of a second inner join and lack of referencing it. This greatly improves performance.

Only one inner join is ever actually required for any number of items in the tax_query, thanks to the pre-lookups being performed to find the proper term_relationship entries.

Similar enhancements might be available for the meta table, however that is more complex and may be more difficult to improve.

Attachments (1)

18105.patch (871 bytes) - added by jakub.tyrcha 23 months ago.

Download all attachments as: .zip

Change History (8)

I don't believe this equivalence is true in all cases. It applies in the case listed, when the relationship is 'OR' and each taxonomy's condition is 'IN'. But it doesn't apply when the relationship is 'AND' or a taxonomy's condition is 'NOT IN'.

In the cases it does apply to, this change provides a substantial performance improvement.

I may be wrong about the AND case. However there's certainly room for improvement in at least some of these cases.

I've also noticed that 'meta_query' behaves exactly the same way. I believe they both work from the same get_sql function.

  • Cc dominicparisi@… added
  • Cc 24-7@… added
  • Cc wordpress.trac@… added
  • Cc dominicparisi@… removed
Note: See TracTickets for help on using tickets.