Make WordPress Core

Opened 5 years ago

Last modified 3 weeks ago

#49278 reopened enhancement

Improve meta query

Reported by: jillebehm's profile jillebehm Owned by:
Milestone: Priority: normal
Severity: normal Version: 5.3.2
Component: Query Keywords: has-patch dev-feedback early early-like-actually-early has-unit-tests needs-refresh
Focuses: performance Cc:

Description

When having a couple of Meta Query statements in WP_Query the query becomes very slow. I think this is because of the way the JOINs are created.

Currently the JOINs are only done on the Post ID. The JOIN can become enormous, which means that filtering (the WHERE part) will take a lot of time.

I checked /wp-includes/class-wp-meta-query.php and posted the code between line 557 and 573 .

// JOIN clauses for NOT EXISTS have their own syntax.
			if ( 'NOT EXISTS' === $meta_compare ) {
				$join .= " LEFT JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				if ( 'LIKE' === $meta_compare_key ) {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key LIKE %s )", '%' . $wpdb->esc_like( $clause['key'] ) . '%' );
				} else {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key'] );
				}

				// All other JOIN clauses.
			} else {
				$join .= " INNER JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';
				$join .= " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column )";
			}

Apparantly when using the 'NOT EXISTS' compare the 'AND $alias.meta_key' part is added to the JOIN, but when NOT using the 'NOT EXISTS' compare this part is not there.
This means that when NOT using the 'NOT EXISTS' compare the a lot of data is joined in the temporary data set. I played with this part a bit and when adding the 'AND $alias.meta_key' part to those JOINs as well it sped up my query a lot. My query went from 38 seconds to 0.01 seconds with the same results.

My 'test' code:

// JOIN clauses for NOT EXISTS have their own syntax.
			if ( 'NOT EXISTS' === $meta_compare ) {
				$join .= " LEFT JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				if ( 'LIKE' === $meta_compare_key ) {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key LIKE %s )", '%' . $wpdb->esc_like( $clause['key'] ) . '%' );
				} else {
					$join .= $wpdb->prepare( " ON ($this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key'] );
				}

				// All other JOIN clauses.
			} else {
				$join .= " INNER JOIN $this->meta_table";
				$join .= $i ? " AS $alias" : '';

				$valid_compares = array(
                    '=',
                    '!=',
                    '>',
                    '>=',
                    '<',
                    '<=',
                    'IN',
                    'NOT IN',
                    'EXISTS',
                );
				if( in_array($meta_compare, $valid_compares ) && !empty($clause['key']) && 'LIKE' !== $meta_compare_key ) {
                    $join .= $wpdb->prepare( " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column AND $alias.meta_key = %s )", $clause['key']);
                }
				else {
                    $join .= " ON ( $this->primary_table.$this->primary_id_column = $alias.$this->meta_id_column )";
                }
			}

I'm not really sure if this works in all cases (with all compare/compare_key variations), but I think it would be good to check it out (on Github I've seen that the last improvements here have been done at least 2 years ago).

For now I 'solved' my slow query by parsing the JOIN and WHERE on the filter 'get_meta_sql' and add the 'AND' part in the JOIN.

Below the query that gets created before and after the changes.

Query before (38 seconds):

SELECT SQL_CALC_FOUND_ROWS  riff19_posts.ID FROM riff19_posts  INNER JOIN riff19_postmeta ON ( riff19_posts.ID = riff19_postmeta.post_id )  INNER JOIN riff19_postmeta AS mt1 ON ( riff19_posts.ID = mt1.post_id )  INNER JOIN riff19_postmeta AS mt2 ON ( riff19_posts.ID = mt2.post_id )  INNER JOIN riff19_postmeta AS mt3 ON ( riff19_posts.ID = mt3.post_id ) JOIN riff19_icl_translations wpml_translations
							ON riff19_posts.ID = wpml_translations.element_id
								AND wpml_translations.element_type = CONCAT('post_', riff19_posts.post_type)  WHERE 1=1  AND ( 
  ( riff19_postmeta.meta_key = 'pinplugin_event_start_date' AND CAST(riff19_postmeta.meta_value AS DATE) < '2020-01-23' ) 
  OR 
  ( 
    ( 
      ( mt1.meta_key = 'pinplugin_event_start_date' AND CAST(mt1.meta_value AS DATE) = '2020-01-23' ) 
      AND 
      mt2.meta_key = 'pinplugin_event_start_time' 
      AND 
      ( mt3.meta_key = 'pinplugin_event_end_time' AND CAST(mt3.meta_value AS TIME) <= '17:19:19' )
    )
  )
) AND riff19_posts.post_type = 'event' AND (riff19_posts.post_status = 'publish' OR riff19_posts.post_status = 'acf-disabled' OR riff19_posts.post_status = 'private') AND ( ( ( wpml_translations.language_code = 'nl' OR 0 ) AND riff19_posts.post_type  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) OR riff19_posts.post_type  NOT  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) GROUP BY riff19_posts.ID ORDER BY riff19_posts.menu_order, CAST(riff19_postmeta.meta_value AS DATE) DESC, CAST(mt2.meta_value AS TIME) DESC, CAST(mt3.meta_value AS TIME) DESC LIMIT 0, 12

Query after (0.0028 seconds):

SELECT SQL_CALC_FOUND_ROWS  riff19_posts.ID FROM riff19_posts  INNER JOIN riff19_postmeta ON ( riff19_posts.ID = riff19_postmeta.post_id AND riff19_postmeta.meta_key = 'pinplugin_event_start_date') INNER JOIN riff19_postmeta AS mt1 ON ( riff19_posts.ID = mt1.post_id AND mt1.meta_key = 'pinplugin_event_start_date') INNER JOIN riff19_postmeta AS mt2 ON ( riff19_posts.ID = mt2.post_id AND mt2.meta_key = 'pinplugin_event_start_time') INNER JOIN riff19_postmeta AS mt3 ON ( riff19_posts.ID = mt3.post_id AND mt3.meta_key = 'pinplugin_event_end_time') JOIN riff19_icl_translations wpml_translations
							ON riff19_posts.ID = wpml_translations.element_id
								AND wpml_translations.element_type = CONCAT('post_', riff19_posts.post_type)  WHERE 1=1  AND ( 
  ( riff19_postmeta.meta_key = 'pinplugin_event_start_date' AND CAST(riff19_postmeta.meta_value AS DATE) < '2020-01-23' ) 
  OR 
  ( 
    ( 
      ( mt1.meta_key = 'pinplugin_event_start_date' AND CAST(mt1.meta_value AS DATE) = '2020-01-23' ) 
      AND 
      mt2.meta_key = 'pinplugin_event_start_time' 
      AND 
      ( mt3.meta_key = 'pinplugin_event_end_time' AND CAST(mt3.meta_value AS TIME) <= '17:18:05' )
    )
  )
) AND riff19_posts.post_type = 'event' AND (riff19_posts.post_status = 'publish' OR riff19_posts.post_status = 'acf-disabled' OR riff19_posts.post_status = 'private') AND ( ( ( wpml_translations.language_code = 'nl' OR 0 ) AND riff19_posts.post_type  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) OR riff19_posts.post_type  NOT  IN ('post','page','attachment','wp_block','location','person','news','blog','case','service','event','vacancy','whitepaper' )  ) GROUP BY riff19_posts.ID ORDER BY riff19_posts.menu_order, CAST(riff19_postmeta.meta_value AS DATE) DESC, CAST(mt2.meta_value AS TIME) DESC, CAST(mt3.meta_value AS TIME) DESC LIMIT 0, 12

Change History (34)

This ticket was mentioned in Slack in #core-media by joedolson. View the logs.


4 years ago

#2 @markparnell
4 years ago

I've independently come to the same conclusion here. Particularly on large sites (i.e. millions of rows in the meta table), this makes a massive difference to performance. I've run some quick tests and got similar results to @jillebehm - with the additional clause in the JOIN the returned data is the same, but the query runs several orders of magnitude quicker.

Would love for someone with more SQL (and WP_Meta_Query) knowledge than me to take a look, but from where I'm sitting this appears to be a simple change that could drastically improve performance for a lot of sites.

This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


4 years ago

#5 @SergeyBiryukov
4 years ago

  • Keywords needs-patch needs-unit-tests added; dev-feedback removed

Hi there, welcome to WordPress Trac! Thanks for the report, sorry it took a while for someone to get back to you.

A patch and some unit tests would be great to move the ticket forward.

This ticket was mentioned in PR #1229 on WordPress/wordpress-develop by markparnell.


4 years ago
#6

  • Keywords has-patch added; needs-patch removed

#7 @markparnell
4 years ago

  • Keywords needs-patch added; has-patch removed

Just added a patch which applies the discussed changes. As I flagged above it will definitely need someone with more background knowledge than me here - we may need a more nuanced approach as there are quite possibly edge cases I've not considered.

I also have no idea how to go about setting up unit tests for this, or even what we would actually be testing - there are already tests in place for meta queries, but I don't know whether they cover this sufficiently.

peterwilsoncc commented on PR #1229:


4 years ago
#8

Thanks @markparnell!

I've run the actions on this PR and am noticing a few failures in the PHP unit tests in Tests_Query_MetaQuery. Most of them test the returned posts so it looks like they're meaningful (rather than a result of the query changing).

The coding standards check is also failing on two lines but if you don't want to worry about that until the functionality is sorted out, that's cool.

markparnell commented on PR #1229:


4 years ago
#9

Thanks Peter. Looks like I'm going to need to try and work out how to run the test suite on my local dev so I can investigate those errors properly. :-)

markparnell commented on PR #1229:


4 years ago
#10

I've resolved all of the test failures now. I think I've also fixed the CS issues, though I wasn't sure how to run those locally so I may have missed something.

#11 @markparnell
4 years ago

  • Keywords has-patch added; needs-patch removed

PR updated to resolve all the test failures. I'm not entirely enamoured with the resulting code - particularly having to pass additional parameters between methods - so if anyone has any suggestions for a better approach that would be very welcome.

This ticket was mentioned in Slack in #core by markparnell. View the logs.


4 years ago

#13 @markparnell
4 years ago

  • Keywords dev-feedback added; needs-unit-tests removed
  • Milestone changed from Awaiting Review to 5.8

Per the discussion in devchat, moving this to 5.8 for more visibility and adding dev-feedback in the hope that someone with solid SQL knowledge can take a look.

I've also removed the needs-unit-tests flag as there are quite a number of existing unit tests (which pass), though I recognise it's possible that additional tests may be desirable.

This ticket was mentioned in Slack in #core by antpb. View the logs.


4 years ago

#15 @markparnell
4 years ago

  • Milestone changed from 5.8 to 5.9

Since feature freeze for 5.8 is imminent, this isn't going to get the review it needs before then. Moving to 5.9 and will push for it to get some attention early in the next cycle.

This ticket was mentioned in Slack in #core by markparnell. View the logs.


4 years ago

This ticket was mentioned in Slack in #core by markparnell. View the logs.


3 years ago

#18 @audrasjb
3 years ago

  • Keywords early added

As per today's bugscrub, let's move it to next cycle and mark it early.

#19 @hellofromTonya
3 years ago

  • Milestone changed from 5.9 to Future Release

Query changes by design need careful consideration and plenty of soak time to ensure nothing breaks. With 5.9 feature freeze ~2 days away, it's too late in the release cycle for a change to query. Punting this out of 5.9. As the next milestone is not yet available, moving to Future Release. Once available, feel free to move into the next release.

#20 @hellofromTonya
3 years ago

  • Keywords early-like-actually-early added

Whoopsie forgot to mark it for really really early in the cycle.

#21 @markparnell
3 years ago

  • Milestone changed from Future Release to 6.0

Per comment:19 moving this to 6.0 now that the milestone exists.

This ticket was mentioned in Slack in #core by costdev. View the logs.


3 years ago

#23 @costdev
3 years ago

  • Keywords has-unit-tests added

Per the discussion in the bug scrub:

  • I'm adding the needs-unit-tests keyword so that this can be evaluated by the test team.
  • @craigfrancis has kindly offered to take a look at the patch when he's available.
  • While this ticket already has the performance focus, it may be worth reaching out in #performance to get eyes on this ticket when the approach has been given feedback.

This ticket was mentioned in Slack in #core by craigfrancis. View the logs.


3 years ago

#25 @craigfrancis
3 years ago

As we have a Dev Chat today, I'll summarise my conversation with Mark so far:

  • The current patch does not change anything if the query uses OR (ref $this->has_or_relation), which does not help @jillebehm (rough example). This is because find_compatible_table_alias() will try to re-use an existing alias, which would have specified the meta_key from the first condition.
  • In my own basic tests, which does not represent real data, I was able to get similar improvements by changing the field to varchar(191) (ref #33885), or running OPTIMIZE TABLE, because the updated INDEX info would lead to a much better query execution plan.
  • I have a small concern about how the get_meta_sql filter/hook might be affected; e.g. a plugin that modifies the 'join' string (e.g. to apply their own version of this fix), but I think they should expect changes.

This ticket was mentioned in Slack in #core by costdev. View the logs.


3 years ago

#27 @peterwilsoncc
3 years ago

  • Milestone changed from 6.0 to Future Release

This was discussed in a bug scrub today.

Due to the complexity of the task and the patch still requiring some work, it was decided to move it off the 6.0 milestone.

#28 @rjasdfiii
2 years ago

CAST(mt1.meta_value AS DATE) = '2020-01-23'

For MySQL and for DATE or DATETIME, that can be simplified to

mt1.meta_value = '2020-01-23'

This change will allow an index, even INDEX(32) to be of some use for optimization. Cf, the plugin: https://wordpress.org/plugins/index-wp-mysql-for-speed/

Caveat, if meta_value is actually a DATETIME and the intent is to truncate to DATE, then the CAST (or equivalent) is needed. The following would be better, but probably not worth considering:

mt1.meta_value LIKE '2020-01-23%'

#29 @rjasdfiii
2 years ago

Re: pinplugin_event_start_date and pinplugin_event_start_time

It is almost always better (in MySQL) to use a single column instead of two columns for a DATETIME. (I don't know the details of the plugin, so I can't be more specific than to suggest you consider this change.)

Version 0, edited 2 years ago by rjasdfiii (next)

#30 @flixos90
3 weeks ago

  • Keywords needs-refresh added
  • Milestone Future Release deleted
  • Resolution set to maybelater
  • Status changed from new to closed

There hasn't been any activity on this ticket for a few years, and the PR was opened almost 4 years ago without having moved forward since.

Moving it forward is going to require very specific domain knowledge to make it happen. FWIW I personally feel not comfortable approving SQL changes like this due to my lack of familiarity with its performance nuances.

Aside: It would be great to iterate on the ticket summary to clarify what it is about, since "Improve meta query" is extremely vague.

Due to the lack of traction, I'm going to close this as maybelater for now. If you see this ticket and are interested in working on it, please reopen.

#31 @rjasdfiii
3 weeks ago

  • Resolution maybelater deleted
  • Status changed from closed to reopened

#32 @audrasjb
3 weeks ago

@rjasdfiii could you please add some thoughts about why this ticket is reopened after comment:30 by @flixos90? Thanks :)

#33 @rjasdfiii
3 weeks ago

Looking at the report again [belatedly!], I have to agree that moving the AND to the ON clause is important and "required":

Bad:

    INNER JOIN  riff19_postmeta AS mt3  ON ( riff19_posts.ID = mt3.post_id )
    WHERE ( blah-blah )
            OR ( this-and-that
                   AND  ( mt3.meta_key = 'pinplugin_event_end_time' )
               )

Good:

    INNER JOIN  riff19_postmeta AS mt3  ON ( riff19_posts.ID = mt3.post_id
                  AND  mt3.meta_key = 'pinplugin_event_end_time'

)

Explanation: The only "meta_key" of interest for "mt3" is 'pinplugin_event_end_time'. This is what the "Good" version says. The "Bad" version tests the meta_key only some of the time.

(Similarly for the other JOINs to postmeta.)

Putting the test in the ON emphasizes my point. This produces identical results, but is less clear:

    INNER JOIN  riff19_postmeta AS mt3  ON ( riff19_posts.ID = mt3.post_id )
    WHERE mt3.meta_key = 'pinplugin_event_end_time'
      AND ( blah-blah )
            OR ( this-and-that
                   AND  ( mt3.meta_key = 'pinplugin_event_end_time' )  -- now redundant
               )

With the Good version, plus the improved indexes, the lookup in mt3 can use the combined index on post_id and meta_key, hence the dramatic speedup.

#34 @rjasdfiii
3 weeks ago

@audrasjb - I reopened because I feel I have "very specific domain knowledge" in MySQL to justify the change and explain it. See comment:33 .

Note: See TracTickets for help on using tickets.