Make WordPress Core

Opened 11 years ago

Closed 10 years ago

#24093 closed defect (bug) (fixed)

WP_Meta_Query is inefficient when referencing the same keys in "OR" query

Reported by: sc0ttkclark's profile sc0ttkclark Owned by: boonebgorges's profile boonebgorges
Milestone: 4.1 Priority: normal
Severity: normal Version: 3.5.1
Component: Query Keywords: meta-query has-patch
Focuses: performance Cc:

Description

This is part of the chain of thought from #19729

The corresponding fix for #19729 was done in [22103]

I believe this fix didn't take things far enough. For example, a meta_query that does a lot of comparing on the same meta_key, will have a JOIN and a WHERE for each query array. It shouldn't have to re-join the postmeta table on each query array though, since they are based on the same meta_key.

I'll attach a suggested patch that will only join if it needs to, and use the previous alias for the first meta_key joined table if it's already been joined.

Attachments (6)

24093.patch (1.8 KB) - added by sc0ttkclark 11 years ago.
Patch to join postmeta for meta_query only as needed
24093.1.patch (1.0 KB) - added by kevinfodness 11 years ago.
Patch for meta.php to improve query performance
24093.2.patch (2.2 KB) - added by vprat 10 years ago.
Fixes SQL errors (does not yet fix the order by meta_value problem)
24093.3.patch (2.9 KB) - added by vprat 10 years ago.
Fixes the problem with ordering by meta_value
24093.4.patch (2.5 KB) - added by vprat 10 years ago.
Patch for the latest WordPress 4.0 version
24093.5.patch (23.2 KB) - added by boonebgorges 10 years ago.

Download all attachments as: .zip

Change History (45)

@sc0ttkclark
11 years ago

Patch to join postmeta for meta_query only as needed

#1 @sc0ttkclark
11 years ago

  • Version set to 3.5.1

#2 @F J Kaiser
11 years ago

Stumbled upon this patch/ticket while updating Codex on WP_Meta_Query. Nice patch! Finally.

Just one note: If you use a meta_query array

$query_args = array( 'meta_query' => array(
	'relation' => 'OR',
	array(
		'key' => 'foo_key',
		// 'value' => 'foo',
		// 'compare' => 'LIKE',
	),
	array(
		'key' => 'bar_key',
	),
) );
$meta_query = new WP_Meta_Query();
$meta_query->parse_query_vars( $query_args );
$mq_sql = $meta_query->get_sql(
	'post',
	$wpdb->posts,
	'ID',
	null
);

that only searches for two key/value pairs (so two subarrays), then it will still do a join on the native table names like

INNER JOIN wp_postmeta ON wp_post.ID = wp_postmeta.post_id

and then do an additional

INNER JOIN wp_postmeta AS mt{$i} ON (wp_posts.ID = {$alias}{$i}.post_id)

so the result will look like:

'join' => string ' INNER JOIN wp_postmeta ON wp_posts.ID = wp_postmeta.post_id INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)'

#3 @sc0ttkclark
11 years ago

Thanks for the feedback, I'll have a look and update the patch accordingly.

#4 @F J Kaiser
11 years ago

@sc0ttkclark Please keep in mind that the $alias will be used in all returning WHERE clauses, so that needs consideration as well.

#5 @sc0ttkclark
11 years ago

For sure, will look this patch over again and see what it'll take to bring it to it's original intention.

#6 @wonderboymusic
11 years ago

  • Keywords needs-refresh added; has-patch needs-testing dev-feedback removed

#7 @wonderboymusic
11 years ago

  • Keywords meta-query added

#8 @wonderboymusic
11 years ago

  • Keywords needs-unit-tests added

#9 @nacin
11 years ago

#26281 was marked as a duplicate.

#10 @kevinfodness
11 years ago

I wrote a patch that uses subqueries instead of JOIN operations and seems to be significantly more efficient. It should work fine for both OR and AND operations. I tested the patch on a relatively large site running WordPress 3.8.

I'd be interested to get the opinions of core contributors and see the results of unit tests to determine if this is a workable solution.

I posted a detailed writeup on my blog, here: http://kevinfodness.com/2014/01/21/performance-patch-for-wp_meta_query-in-wordpress-core/

The patch itself can be found here: http://kevinfodness.com/wp-content/uploads/2014/01/wp_meta_query.patch

@kevinfodness
11 years ago

Patch for meta.php to improve query performance

#11 @kevinfodness
11 years ago

The previous patch I submitted was git formatted - per guidelines I converted it to an SVN formatted patch and attached it to this issue.

#12 @vprat
11 years ago

  • Keywords has-patch added

Hi,

any update about that issue or plans to incorporate the fix in core for 3.8.x?

#13 @Yourdigihands
11 years ago

  • Keywords needs-patch added; has-patch removed

I installed the patch and for me it works very good. I hope this update will be incorporated into the next WP release.

#14 @Mr.Defi
10 years ago

kevinfodness Your modification works fantastic when You need to query thru multiply meta_keys but it breaks order_by meta_key statment.

Here is my $args fot WP_Query

$args = array(
	'post_type' => 'szkolenia',
	'posts_per_page' => -1,
	'meta_key' => 'crs_date_from',
	'orderby'  => 'meta_value',
	'order' => 'ASC'
);

This is how mysql query looks after Your patch:

SELECT wp_posts.* FROM wp_posts  WHERE 1=1  AND wp_posts.post_type = 'szkolenia' AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 18 AND wp_posts.post_status = 'private') AND (wp_postmeta.meta_key = 'crs_date_from' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC

Also this is an error from mysql:

#1054 - Unknown column 'wp_postmeta.meta_key' in 'where clause'

and before patch:

SELECT wp_posts.* FROM wp_posts  INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1  AND wp_posts.post_type = 'szkolenia' AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 18 AND wp_posts.post_status = 'private') AND (wp_postmeta.meta_key = 'crs_date_from' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC
Last edited 10 years ago by Mr.Defi (previous) (diff)

#15 @wonderboymusic
10 years ago

  • Milestone changed from Awaiting Review to 4.0

I want to look closer at the performance benefits here

#16 @wonderboymusic
10 years ago

  • Focuses performance added

#17 @vprat
10 years ago

I need badly this fix so I have given it another look. Starting from the second patch, I have managed to fix the DB errors that have been mentionned by Mr.Defi.

So here is my patched patch (see above, it is called 24093.2.patch)

@vprat
10 years ago

Fixes SQL errors (does not yet fix the order by meta_value problem)

@vprat
10 years ago

Fixes the problem with ordering by meta_value

#18 @vprat
10 years ago

  • Keywords has-patch added; needs-patch removed

Hi,

I have managed to fix the orderby meta issue.

So to sum up what I did:

  • Always alias the meta table in the query, starting at mt0. Before the patch, the first occurence was not aliased and the following ones were aliased starting at mt1
  • In the WP_Query class, for ordering, we simply use the mt0 alias instead of the main meta table name.
  • The rest is pretty similar to what had been proposed by the original patchers.

Let me know if there is any test case that fails and how to reproduce it, I'll try to fix the issue. I would love that patch to make its way into WP 4.0.

#19 @ve9gra
10 years ago

This "bug" reared its ugly head after using a plugin by vprat. Using mySQL profiling I found the impact to be crippling after adding more than a few groups to a user. And keep in mind that this install is very small... like around 15 posts. I'm sure a site with more posts would get impacted much earlier as adding multiple INNER JOIN statements blow up the tempDB exponentially.

You can see the thread with my investigation here - http://customer-area.marvinlabs.com/support/topic/user-in-multiple-groups-causes-page-hang-and-100-cpu/

I would really like to have this merged into core as soon as possible. I've tested the patch on 3.9.1 with no issues - or at least none that I could find.

#20 @helen
10 years ago

  • Milestone changed from 4.0 to Future Release

Sounds like we still need patch review, unit tests, and profiling data.

#21 @vprat
10 years ago

Come on, that ticket is more than a year old and is derived from a 3 year old ticket that had an incomplete fix.

Profiling data is pretty simple. A query with 6 meta query elements will simply never complete without the patch. When having 4 meta query elements, as said, it goes from a few minutes to a few milliseconds.

Unit tests have been provided with the ticket referenced in the ticket of origin. I guess they are still valid.

I was starting to consider dedicating more time to core fixes but if each patch I submit has to stay around for 3 years before it gets reviewed and committed, I'll just pass :(

#22 @helen
10 years ago

I would love if somebody could review the patch here (not my wheelhouse), and if we could get unit tests into the same patch for overall review. We are still too late for 4.0, though.

@vprat
10 years ago

Patch for the latest WordPress 4.0 version

#23 @sarukuku
10 years ago

It would be good to get this finally in. I happened to stuble on this when I was looking for an answer to a problem I have with a dynamically generated meta_query. When the amount of the meta_query arrays exceeds nine my local development server and my staging server both just freeze. The DB query seems to run indefinitely and I need to restart the MySQL server to get the site operational again.

EDIT 1: I can confirm that vprat's latest patch fixes the issue on my development and test server.

EDIT 2: The patch doesn't seem to break anything but cases WP to show errors on the admin side, example below.

WordPress database error: [Unknown column 'mt1.meta_key' in 'where clause']
SELECT wp_posts.ID FROM wp_posts LEFT JOIN wp_postmeta AS mt0 ON (wp_posts.ID = mt0.post_id AND mt0.meta_key = '_searchwp_last_index') WHERE 1=1 AND wp_posts.post_type IN ('post', 'page', 'attachment', 'person', 'product', 'service_provider', 'q_and_a') AND ((wp_posts.post_status = 'publish' OR wp_posts.post_status = 'inherit')) AND ( mt0.post_id IS NULL AND mt1.meta_key = '_searchwp_skip' ) GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC
Last edited 10 years ago by sarukuku (previous) (diff)

#24 @sarukuku
10 years ago

  • Keywords dev-feedback added
  • Type changed from enhancement to defect (bug)

There's no mention of a limit in the WP Meta Query class reference. The meta_query implementation creates so bad SQL that it actually can cripple the whole server with only 9 compares. I think the type of this should be changed to bug.

In general I also feel that it's weird that the meta_query implementation differs so much from the taxonomy_query implementation (f.ex. missing the support for multi level query with different compares).

Last edited 10 years ago by sarukuku (previous) (diff)

#25 @boonebgorges
10 years ago

See #29560 for an extended set of unit tests that any change in SQL syntax has to pass.

A few thoughts about the patches here:

  1. Core team - this SQL is run through the 'get_meta_sql' filter, and changing to subqueries will break many plugins that make funky use of this filter. Is there a policy in place about backward compatibility for filters on SQL statements? (My take is: if the benefits of changing are big enough, make the change, but document it and warn developers.)
  1. Even if we don't switch to subqueries, one conservative step we can take, which will address at least part of the problem, is to do better initial parsing of the meta_query argument, combining redundant args into more efficient chunks. If a plugin dev passes the following meta_query:
'relation' => 'OR',
array(
    'key' => 'foo',
    'value' => 'bar',
),
array(
    'key' => 'foo',
    'value' => 'barry',
),

we can safely translate it to the following and save a table join:

array(
    'key' => 'foo',
    'value' => array( 'bar', 'barry', )
    'compare' => 'IN',
),
  1. More detailed benchmarks about the switch to subqueries would make the case for a switch much more persuasive. The post linked to by ve9gra in https://core.trac.wordpress.org/ticket/24093#comment:19 is a good start, but ideally we'd have a script for generating test data (maybe a wp-cli subcommand) for the core team to test with, and/or more detailed specs about server environment, the matrix of queries tested, etc.
Last edited 10 years ago by boonebgorges (previous) (diff)

#26 @vprat
10 years ago

@sarukuku - "The patch doesn't seem to break anything but cases WP to show errors on the admin side"

Can you please specify some steps to reproduce this? I would be happy to fix it.

@boonebgorges -

  1. It is always somehow risky for a plugin author to play with SQL directly instead of using the WP core functions. So I feel these cases should be adressed by plugin authors (and this is why I submitted a patch to improve WP Core instead of patching my plugin with such a filter).
  1. Your workaround fixes a very tiny part of the problem. Does not work for anything else than = operator for instance (LIKE...)
  1. This is very easy to reproduce. You simply need to issue a WP_Query with at least 5 clauses in the meta query. Even if you only have a few posts (10/20) in the database it will blow (will not even complete). With the patch, the query takes a few milliseconds in all cases.

This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.


10 years ago

#28 @sarukuku
10 years ago

@vprat - It seems to be only the WPSearch plugin (https://searchwp.com/) that is confused about the changes in the patch on the environment I tested the it. Plain WP installation doesn't seem to have any problems with the patch as far as I can see.

tl;dr
Patch works, random plugin was confused and caused the error.

#29 @pento
10 years ago

The unit tests in #29560 are great for this ticket - they're showing a bunch of bugs in 24093.4.patch that need fixing.

I'm not particularly fussed about us changing the SQL that goes through the get_meta_sql filter (I'm in favour of deprecating all filters on SQL, but that's a different discussion). We should definitely at least warn developers that it's changing, though.

@vprat: Could you upload your test cases to this ticket, so that we're all looking at the same test data?

#30 @pento
10 years ago

It's worth noting that this will need testing on different versions of MySQL (at least the latest major versions from 5.0 to 5.7) - various optimisations for both JOINs and sub-queries have been introduced over the years, we can't assume that performance on the latest version is indicative of performance on all versions.

#31 @boonebgorges
10 years ago

I want to continue looking into the possibility of subqueries. But, for the time being, the idea behind sc0ttclark's initial 24093.patch could, I think, be implemented right away. The idea has become more complicated now that the logic of WP_Meta_Query is recursive [29887] - using a single JOIN for multiple clauses only works if 'relation=OR', but the recursive logic means that there could be any number of relations in a query tree. I think it can look something like this:

  1. In get_sql_for_clause(), tirst time a meta_key is referenced, create a new alias and JOIN clause.
  2. Store the alias in an array keyed by the meta_key (so far, it's like in your patch)
  3. The next time you see a clause that has a meta_key that already has an alias, check to see whether the $parent_query has OR as a relation. If so, use the existing alias.

This will break down when you get several generations deep, though. If you have a query like this, I don't think you can combine the 'foo' and 'foo1' tables in this kind of query:

array(
    'relation' => 'AND',
    array(
        'relation' => 'OR',
        array(
            'key' => 'foo',
            'value' => 'bar',
        ),
        array(
            'key' => 'foo1',
            'value' => 'bar',
        ),
    ),
    array(
        'relation' => 'OR',
        array(
            'key' => 'foo',
            'value' => 'baz',
        ),
        array(
            'key' => 'foo1',
            'value' => 'baz',
        ),
    )
)

Clearly, this will need some thought, and some unit tests, and maybe some compromises regarding how many JOINS we can eliminate in case of multiply-nested queries. Who is up for a Really Fun Time writing this patch?

Version 0, edited 10 years ago by boonebgorges (next)

#32 @boonebgorges
10 years ago

  • Keywords needs-patch added; needs-refresh has-patch dev-feedback removed
  • Owner set to boonebgorges
  • Status changed from new to accepted

I've been thinking about this some more, and I think I've come up with a technique for consolidating JOINs that is along the lines of what I've laid out above. See #18105 [29902] for how it works in WP_Tax_Query. The check for "compatible aliases" will be more complicated here. I'll write some unit tests and see what I can do about a patch.

This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.


10 years ago

#34 @boonebgorges
10 years ago

  • Keywords has-patch added; needs-unit-tests needs-patch removed
  • Milestone changed from Future Release to 4.1

24093.5.patch is an attempt at removing most of the unnecessary joins from WP_Meta_Query. The basic logic is this:

  • Positive operators (those that look for a match rather than non-matches - such as IN and LIKE) that appear under the scope of an OR can share a single table join. Negative operators (!=, NOT IN, etc) connected by AND can share a table join as long as the clauses are looking at the same keys. (I think there are more cases than just this where sharing a join is possible, but I want to be conservative.)
  • So: in each call to get_sql_for_clause(), the JOIN logic goes like this. Does this clause have any siblings that are already doing a join and are compatible with this clause (by the rules listed above)? If so, use that alias. If not, create a JOIN clause, and store the alias for use by later siblings.

So that's the first thing 24093.5.patch does. As I was writing this part of the patch - which is largely a port of [29902] - I realized that most of the rat's nest of WP_Meta_Query::get_sql_for_clause() was a result of past developers attempting, in patchwork ways, to avoid unneccessary table joins. (See, for example, the old 'key only' queries.) With the new centralized logic for finding "compatible aliases", it became possible to rewrite the SQL generation logic of WP_Meta_Query so that it actually makes sense.

I've run some preliminary performance tests. I created a database with about 25000 posts and about 50,000 pseudo-random rows in wp_postmeta. Here are the queries I ran (never mind the nonsense words :) ):

	$q1 = new WP_Query( array(
		'meta_query' => array(
			'relation' => 'OR',
			array(
				'key' => 'horse',
			),
			array(
				'key' => 'nice',
				'value' => 'boo',
			),
			array(
				'key' => 'crhrcb',
				'value' => array( 'horse', 'boo', 'nice' ),
				'compare' => 'NOT IN',
			),
			array(
				'key' => 'flim',
				'value' => array( 'horse', 'boo', 'nice' ),
				'compare' => 'IN',
			),
		),
	) );

With 24093.5.patch, the query goes from 4 INNER JOINs to 2 (NOT IN can't share the join), which cuts execution time roughly in half.

It'd be great to have someone look over the logic above and the logic of the patch to make sure I'm not missing something obvious. I'm fairly confident that what I'm proposing here is solid, and it's definitely a big performance win (even if we continue to explore additional avenues for performance improvements in WP_Meta_Query).

#35 @sc0ttkclark
10 years ago

This looks awesome @boonebgorges -- and brings up another idea that could be useful to plugins, adding a filter to the return of find_compatible_table_alias would be grand. This could allow a plugin, such as Pods, to pick up a meta fields that it covers and stores in a separate table and return a custom alias, then hook in later in the joins clauses to add it's join there. Other than that, this is the bees knees and probably wings too.

#36 @boonebgorges
10 years ago

Thanks so much for having a look, sc0ttkclark! The filter idea sounds fine to me.

#37 @boonebgorges
10 years ago

In 29940:

Overhaul SQL generating logic in WP_Meta_Query to avoid unnecessary table joins.

The logic used to generate clause SQL in WP_Meta_Query is somewhat arcane,
stemming mostly from an ongoing effort to eliminate costly table joins when
they are not necessary. By systematizing the process of looking for shareable
joins - as was done in WP_Tax_Query [29902] - it becomes possible to simplify
the construction of SQL queries in get_sql_for_clause(). Moreover, the
simplified logic is actually considerably better at identifying shareable
joins, such that certain uses of WP_Meta_Query will see joins reduced by 50%
or more.

Includes integration tests for a representative cross-section of the query
clause combinations that result in shared table aliases.

Props boonebgorges, sc0ttkclark.
See #24093.

#38 @boonebgorges
10 years ago

In 29953:

Remove redundant table alias check in WP_Meta_Query.

Also adds documentation for 'meta_query_find_compatible_table_alias' filter.

See #24093.

#39 @boonebgorges
10 years ago

  • Resolution set to fixed
  • Status changed from accepted to closed

In the interest of having a ticket that we can close against the 4.1 milestone (and because the original ticket was about OR queries, which have been largely addressed by [29940]), I'd like to move discussion of the subquery/AND issue over to #30044. Thanks, everyone.

Note: See TracTickets for help on using tickets.