Make WordPress Core

Opened 4 years ago

Last modified 15 months ago

#49278 new enhancement

Improve meta query

Reported by: jillebehm's profile jillebehm Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 5.3.2
Component: Query Keywords: has-patch dev-feedback early early-like-actually-early has-unit-tests
Focuses: performance Cc:

Description

When having a couple of Meta Query statements in WP_Query the query becomes very slow. I think this is because of the way the JOINs are created.

Currently the JOINs are only done on the Post ID. The JOIN can become enormous, which means that filtering (the WHERE part) will take a lot of time.

I checked /wp-includes/class-wp-meta-query.php and posted the code between line 557 and 573 .

// JOIN clauses for NOT EXISTS have their own syntax.
			if ( 'NOT EXISTS' === $meta_compare ) {
				$join .= " LEFT JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				if ( 'LIKE' === $meta_compare_key ) {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key LIKE %s )", '%' . $wpdb->esc_like( $clause['key'] ) . '%' );
				} else {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key'] );
				}

				// All other JOIN clauses.
			} else {
				$join .= " INNER JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';
				$join .= " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column )";
			}

Apparantly when using the 'NOT EXISTS' compare the 'AND $alias.meta_key' part is added to the JOIN, but when NOT using the 'NOT EXISTS' compare this part is not there.
This means that when NOT using the 'NOT EXISTS' compare the a lot of data is joined in the temporary data set. I played with this part a bit and when adding the 'AND $alias.meta_key' part to those JOINs as well it sped up my query a lot. My query went from 38 seconds to 0.01 seconds with the same results.

My 'test' code:

// JOIN clauses for NOT EXISTS have their own syntax.
			if ( 'NOT EXISTS' === $meta_compare ) {
				$join .= " LEFT JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				if ( 'LIKE' === $meta_compare_key ) {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key LIKE %s )", '%' . $wpdb->esc_like( $clause['key'] ) . '%' );
				} else {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key'] );
				}

				// All other JOIN clauses.
			} else {
				$join .= " INNER JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				$valid_compares = array(
                    '=',
                    '!=',
                    '>',
                    '>=',
                    '<',
                    '<=',
                    'IN',
                    'NOT IN',
                    'EXISTS',
                );
				if( in_array($meta_compare, $valid_compares ) && !empty($clause['key']) && 'LIKE' !== $meta_compare_key ) {
                    $join .= $wpdb->prepare( " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key']);
                }
				else {
                    $join .= " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column )";
                }
			}

I'm not really sure if this works in all cases (with all compare/compare_key variations), but I think it would be good to check it out (on Github I've seen that the last improvements here have been done at least 2 years ago).

For now I 'solved' my slow query by parsing the JOIN and WHERE on the filter 'get_meta_sql' and add the 'AND' part in the JOIN.

Below the query that gets created before and after the changes.

Query before (38 seconds):

SELECT SQL_CALC_FOUND_ROWS  riff19_posts.ID FROM riff19_posts  INNER JOIN riff19_postmeta ON ( riff19_posts.ID = riff19_postmeta.post_id )  INNER JOIN riff19_postmeta AS mt1 ON ( riff19_posts.ID = mt1.post_id )  INNER JOIN riff19_postmeta AS mt2 ON ( riff19_posts.ID = mt2.post_id )  INNER JOIN riff19_postmeta AS mt3 ON ( riff19_posts.ID = mt3.post_id ) JOIN riff19_icl_translations wpml_translations
							ON riff19_posts.ID = wpml_translations.element_id
								AND wpml_translations.element_type = CONCAT('post_', riff19_posts.post_type)  WHERE 1=1  AND ( 
  ( riff19_postmeta.meta_key = 'pinplugin_event_start_date' AND CAST(riff19_postmeta.meta_value AS DATE) < '2020-01-23' ) 
  OR 
  ( 
    ( 
      ( mt1.meta_key = 'pinplugin_event_start_date' AND CAST(mt1.meta_value AS DATE) = '2020-01-23' ) 
      AND 
      mt2.meta_key = 'pinplugin_event_start_time' 
      AND 
      ( mt3.meta_key = 'pinplugin_event_end_time' AND CAST(mt3.meta_value AS TIME) <= '17:19:19' )
    )
  )
) AND riff19_posts.post_type = 'event' AND (riff19_posts.post_status = 'publish' OR riff19_posts.post_status = 'acf-disabled' OR riff19_posts.post_status = 'private') AND ( ( ( wpml_translations.language_code = 'nl' OR 0 ) AND riff19_posts.post_type  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) OR riff19_posts.post_type  NOT  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) GROUP BY riff19_posts.ID ORDER BY riff19_posts.menu_order, CAST(riff19_postmeta.meta_value AS DATE) DESC, CAST(mt2.meta_value AS TIME) DESC, CAST(mt3.meta_value AS TIME) DESC LIMIT 0, 12

Query after (0.0028 seconds):

SELECT SQL_CALC_FOUND_ROWS  riff19_posts.ID FROM riff19_posts  INNER JOIN riff19_postmeta ON ( riff19_posts.ID = riff19_postmeta.post_id AND riff19_postmeta.meta_key = 'pinplugin_event_start_date') INNER JOIN riff19_postmeta AS mt1 ON ( riff19_posts.ID = mt1.post_id AND mt1.meta_key = 'pinplugin_event_start_date') INNER JOIN riff19_postmeta AS mt2 ON ( riff19_posts.ID = mt2.post_id AND mt2.meta_key = 'pinplugin_event_start_time') INNER JOIN riff19_postmeta AS mt3 ON ( riff19_posts.ID = mt3.post_id AND mt3.meta_key = 'pinplugin_event_end_time') JOIN riff19_icl_translations wpml_translations
							ON riff19_posts.ID = wpml_translations.element_id
								AND wpml_translations.element_type = CONCAT('post_', riff19_posts.post_type)  WHERE 1=1  AND ( 
  ( riff19_postmeta.meta_key = 'pinplugin_event_start_date' AND CAST(riff19_postmeta.meta_value AS DATE) < '2020-01-23' ) 
  OR 
  ( 
    ( 
      ( mt1.meta_key = 'pinplugin_event_start_date' AND CAST(mt1.meta_value AS DATE) = '2020-01-23' ) 
      AND 
      mt2.meta_key = 'pinplugin_event_start_time' 
      AND 
      ( mt3.meta_key = 'pinplugin_event_end_time' AND CAST(mt3.meta_value AS TIME) <= '17:18:05' )
    )
  )
) AND riff19_posts.post_type = 'event' AND (riff19_posts.post_status = 'publish' OR riff19_posts.post_status = 'acf-disabled' OR riff19_posts.post_status = 'private') AND ( ( ( wpml_translations.language_code = 'nl' OR 0 ) AND riff19_posts.post_type  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) OR riff19_posts.post_type  NOT  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) GROUP BY riff19_posts.ID ORDER BY riff19_posts.menu_order, CAST(riff19_postmeta.meta_value AS DATE) DESC, CAST(mt2.meta_value AS TIME) DESC, CAST(mt3.meta_value AS TIME) DESC LIMIT 0, 12

Change History (29)

This ticket was mentioned in Slack in #core-media by joedolson. View the logs.


3 years ago

#2 @markparnell
3 years ago

I've independently come to the same conclusion here. Particularly on large sites (i.e. millions of rows in the meta table), this makes a massive difference to performance. I've run some quick tests and got similar results to @jillebehm - with the additional clause in the JOIN the returned data is the same, but the query runs several orders of magnitude quicker.

Would love for someone with more SQL (and WP_Meta_Query) knowledge than me to take a look, but from where I'm sitting this appears to be a simple change that could drastically improve performance for a lot of sites.

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


3 years ago

#5 @SergeyBiryukov
3 years ago

  • Keywords needs-patch needs-unit-tests added; dev-feedback removed

Hi there, welcome to WordPress Trac! Thanks for the report, sorry it took a while for someone to get back to you.

A patch and some unit tests would be great to move the ticket forward.

This ticket was mentioned in PR #1229 on WordPress/wordpress-develop by markparnell.


3 years ago
#6

  • Keywords has-patch added; needs-patch removed

#7 @markparnell
3 years ago

  • Keywords needs-patch added; has-patch removed

Just added a patch which applies the discussed changes. As I flagged above it will definitely need someone with more background knowledge than me here - we may need a more nuanced approach as there are quite possibly edge cases I've not considered.

I also have no idea how to go about setting up unit tests for this, or even what we would actually be testing - there are already tests in place for meta queries, but I don't know whether they cover this sufficiently.

peterwilsoncc commented on PR #1229:


3 years ago
#8

Thanks @markparnell!

I've run the actions on this PR and am noticing a few failures in the PHP unit tests in Tests_Query_MetaQuery. Most of them test the returned posts so it looks like they're meaningful (rather than a result of the query changing).

The coding standards check is also failing on two lines but if you don't want to worry about that until the functionality is sorted out, that's cool.

markparnell commented on PR #1229:


3 years ago
#9

Thanks Peter. Looks like I'm going to need to try and work out how to run the test suite on my local dev so I can investigate those errors properly. :-)

markparnell commented on PR #1229:


3 years ago
#10

I've resolved all of the test failures now. I think I've also fixed the CS issues, though I wasn't sure how to run those locally so I may have missed something.

#11 @markparnell
3 years ago

  • Keywords has-patch added; needs-patch removed

PR updated to resolve all the test failures. I'm not entirely enamoured with the resulting code - particularly having to pass additional parameters between methods - so if anyone has any suggestions for a better approach that would be very welcome.

This ticket was mentioned in Slack in #core by markparnell. View the logs.


3 years ago

#13 @markparnell
3 years ago

  • Keywords dev-feedback added; needs-unit-tests removed
  • Milestone changed from Awaiting Review to 5.8

Per the discussion in devchat, moving this to 5.8 for more visibility and adding dev-feedback in the hope that someone with solid SQL knowledge can take a look.

I've also removed the needs-unit-tests flag as there are quite a number of existing unit tests (which pass), though I recognise it's possible that additional tests may be desirable.

This ticket was mentioned in Slack in #core by antpb. View the logs.


3 years ago

#15 @markparnell
3 years ago

  • Milestone changed from 5.8 to 5.9

Since feature freeze for 5.8 is imminent, this isn't going to get the review it needs before then. Moving to 5.9 and will push for it to get some attention early in the next cycle.

This ticket was mentioned in Slack in #core by markparnell. View the logs.


3 years ago

This ticket was mentioned in Slack in #core by markparnell. View the logs.


2 years ago

#18 @audrasjb
2 years ago

  • Keywords early added

As per today's bugscrub, let's move it to next cycle and mark it early.

#19 @hellofromTonya
2 years ago

  • Milestone changed from 5.9 to Future Release

Query changes by design need careful consideration and plenty of soak time to ensure nothing breaks. With 5.9 feature freeze ~2 days away, it's too late in the release cycle for a change to query. Punting this out of 5.9. As the next milestone is not yet available, moving to Future Release. Once available, feel free to move into the next release.

#20 @hellofromTonya
2 years ago

  • Keywords early-like-actually-early added

Whoopsie forgot to mark it for really really early in the cycle.

#21 @markparnell
2 years ago

  • Milestone changed from Future Release to 6.0

Per comment:19 moving this to 6.0 now that the milestone exists.

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

#23 @costdev
2 years ago

  • Keywords has-unit-tests added

Per the discussion in the bug scrub:

  • I'm adding the needs-unit-tests keyword so that this can be evaluated by the test team.
  • @craigfrancis has kindly offered to take a look at the patch when he's available.
  • While this ticket already has the performance focus, it may be worth reaching out in #performance to get eyes on this ticket when the approach has been given feedback.

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


2 years ago

#25 @craigfrancis
2 years ago

As we have a Dev Chat today, I'll summarise my conversation with Mark so far:

  • The current patch does not change anything if the query uses OR (ref $this->has_or_relation), which does not help @jillebehm (rough example). This is because find_compatible_table_alias() will try to re-use an existing alias, which would have specified the meta_key from the first condition.
  • In my own basic tests, which does not represent real data, I was able to get similar improvements by changing the field to varchar(191) (ref #33885), or running OPTIMIZE TABLE, because the updated INDEX info would lead to a much better query execution plan.
  • I have a small concern about how the get_meta_sql filter/hook might be affected; e.g. a plugin that modifies the 'join' string (e.g. to apply their own version of this fix), but I think they should expect changes.

This ticket was mentioned in Slack in #core by costdev. View the logs.


2 years ago

#27 @peterwilsoncc
2 years ago

  • Milestone changed from 6.0 to Future Release

This was discussed in a bug scrub today.

Due to the complexity of the task and the patch still requiring some work, it was decided to move it off the 6.0 milestone.

#28 @rjasdfiii
15 months ago

CAST(mt1.meta_value AS DATE) = '2020-01-23'

For MySQL and for DATE or DATETIME, that can be simplified to

mt1.meta_value = '2020-01-23'

This change will allow an index, even INDEX(32) to be of some use for optimization. Cf, the plugin: https://wordpress.org/plugins/index-wp-mysql-for-speed/

Caveat, if meta_value is actually a DATETIME and the intent is to truncate to DATE, then the CAST (or equivalent) is needed. The following would be better, but probably not worth considering:

mt1.meta_value LIKE '2020-01-23%'

#29 @rjasdfiii
15 months ago

Re: pinplugin_event_start_date and pinplugin_event_start_time

It is almost always better (in MySQL) to use a single column instead of two columns for a DATETIME. (I don't know the details of the plugin, so I can't be more specific than to suggest you consider this change.)

(updated comment...)

Yes, I think that would simplify the JOIN, which seems to be where the first version of the query was inefficient.

Last edited 15 months ago by rjasdfiii (previous) (diff)
Note: See TracTickets for help on using tickets.