Make WordPress Core

Opened 3 years ago

Last modified 3 years ago

#52559 new defect (bug)

When doing JOINs for meta queries the meta_key should be in the ON clause whenever possible.

Reported by: herregroen's profile herregroen Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version:
Component: Query Keywords: needs-unit-tests
Focuses: performance Cc:

Description

Currently when doing JOINs for meta queries the meta_key is only included in the ON clause if the meta compare is NOT EXISTS. In all other cases only the post_id is included.

This means that the meta key is only filtered after the JOIN is done.

Case in point, a meta_query that with 5 different keys generates the following query:

SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts
LEFT JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id )
LEFT JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id )
LEFT JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id )
LEFT JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id )
LEFT JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id AND mt4.meta_key = 'events_time_frame_end' )
WHERE 1=1  AND ( 
  ( wp_postmeta.meta_key = 'events_date_till' AND wp_postmeta.meta_value >= '20210217' ) 
  AND 
  ( 
    ( mt1.meta_key = 'events_date_till' AND mt1.meta_value > '20210217' ) 
    OR 
    ( 
      ( mt2.meta_key = 'events_date_till' AND mt2.meta_value = '20210217' ) 
      AND 
      ( mt3.meta_key = 'events_time_frame_end' AND mt3.meta_value >= '14:59:19' )
    ) 
    OR 
    mt4.post_id IS NULL
  )
) AND wp_posts.post_type = 'events' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'dp-rewrite-republish') GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC LIMIT 0, 10;

This query takes 60 seconds on very well provisioned database with posts that each have 50 postmeta rows in a database with 50 million total postmeta rows.

In contrast the following query:

SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts
LEFT JOIN wp_postmeta ON ( wp_posts.ID = wp_postmeta.post_id AND wp_postmeta.meta_key = 'events_date_till' )
LEFT JOIN wp_postmeta AS mt1 ON ( wp_posts.ID = mt1.post_id AND mt1.meta_key = 'events_date_till' )
LEFT JOIN wp_postmeta AS mt2 ON ( wp_posts.ID = mt2.post_id AND mt2.meta_key = 'events_date_till' )
LEFT JOIN wp_postmeta AS mt3 ON ( wp_posts.ID = mt3.post_id AND mt3.meta_key = 'events_time_frame_end' )
LEFT JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id AND mt4.meta_key = 'events_time_frame_end' )
WHERE 1=1  AND ( 
  ( wp_postmeta.meta_value >= '20210217' ) 
  AND 
  ( 
    ( mt1.meta_value > '20210217' ) 
    OR 
    ( 
      ( mt2.meta_value = '20210217' ) 
      AND 
      ( mt3.meta_value >= '14:59:19' )
    ) 
    OR 
    mt4.post_id IS NULL
  )
) AND wp_posts.post_type = 'events' AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'acf-disabled' OR wp_posts.post_status = 'dp-rewrite-republish') GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC LIMIT 0, 10;

Takes 400ms and returns identical results.

I believe the cause here is that because in the first query filtering is only done after the join and each event has 50 postmeta fields MySQL is first creating a virtual table with 504 ( 4 because the last JOIN does include the meta_key ) rows joined resulting in 60.000.000 total that are then filtered back down.

Adding the meta_key condition to the ON clause means we're only joining 1 ( or a small number in the case of the meta_key existing in multiple rows ) rows for each JOIN meaning the virtual table being created is significantly smaller thus having much better query performance.

At the very least when there is any LEFT JOIN occurring then ALL meta_key conditions should be moved in the ON clause. Although it may be simpler to always have the meta_key condition in the ON clause.

Change History (2)

#1 @desrosj
3 years ago

  • Component changed from General to Query
  • Keywords needs-unit-tests added

#2 @jeroendk
3 years ago

I encountered this issue as well for a post meta table with (only) 200k records, for now, I "fixed" it by adding an extra index to the wp_psotmeta table:

create index post_meta_combined_idx on wp_postmeta(post_id, meta_key)

Query times are now normal again.

Note: See TracTickets for help on using tickets.