Opened 11 years ago
Closed 10 years ago
#24093 closed defect (bug) (fixed)
WP_Meta_Query is inefficient when referencing the same keys in "OR" query
Reported by: | sc0ttkclark | Owned by: | boonebgorges |
---|---|---|---|
Milestone: | 4.1 | Priority: | normal |
Severity: | normal | Version: | 3.5.1 |
Component: | Query | Keywords: | meta-query has-patch |
Focuses: | performance | Cc: |
Description
This is part of the chain of thought from #19729
The corresponding fix for #19729 was done in [22103]
I believe this fix didn't take things far enough. For example, a meta_query that does a lot of comparing on the same meta_key, will have a JOIN and a WHERE for each query array. It shouldn't have to re-join the postmeta table on each query array though, since they are based on the same meta_key.
I'll attach a suggested patch that will only join if it needs to, and use the previous alias for the first meta_key joined table if it's already been joined.
Attachments (6)
Change History (45)
#2
@
11 years ago
Stumbled upon this patch/ticket while updating Codex on WP_Meta_Query. Nice patch! Finally.
Just one note: If you use a meta_query
array
$query_args = array( 'meta_query' => array( 'relation' => 'OR', array( 'key' => 'foo_key', // 'value' => 'foo', // 'compare' => 'LIKE', ), array( 'key' => 'bar_key', ), ) ); $meta_query = new WP_Meta_Query(); $meta_query->parse_query_vars( $query_args ); $mq_sql = $meta_query->get_sql( 'post', $wpdb->posts, 'ID', null );
that only searches for two key/value pairs (so two subarrays), then it will still do a join on the native table names like
INNER JOIN wp_postmeta ON wp_post.ID = wp_postmeta.post_id
and then do an additional
INNER JOIN wp_postmeta AS mt{$i} ON (wp_posts.ID = {$alias}{$i}.post_id)
so the result will look like:
'join' => string ' INNER JOIN wp_postmeta ON wp_posts.ID = wp_postmeta.post_id INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)'
#4
@
11 years ago
@sc0ttkclark Please keep in mind that the $alias
will be used in all returning WHERE
clauses, so that needs consideration as well.
#5
@
11 years ago
For sure, will look this patch over again and see what it'll take to bring it to it's original intention.
#10
@
11 years ago
I wrote a patch that uses subqueries instead of JOIN operations and seems to be significantly more efficient. It should work fine for both OR and AND operations. I tested the patch on a relatively large site running WordPress 3.8.
I'd be interested to get the opinions of core contributors and see the results of unit tests to determine if this is a workable solution.
I posted a detailed writeup on my blog, here: http://kevinfodness.com/2014/01/21/performance-patch-for-wp_meta_query-in-wordpress-core/
The patch itself can be found here: http://kevinfodness.com/wp-content/uploads/2014/01/wp_meta_query.patch
#11
@
11 years ago
The previous patch I submitted was git formatted - per guidelines I converted it to an SVN formatted patch and attached it to this issue.
#12
@
11 years ago
- Keywords has-patch added
Hi,
any update about that issue or plans to incorporate the fix in core for 3.8.x?
#13
@
11 years ago
- Keywords needs-patch added; has-patch removed
I installed the patch and for me it works very good. I hope this update will be incorporated into the next WP release.
#14
@
11 years ago
kevinfodness Your modification works fantastic when You need to query thru multiply meta_keys but it breaks order_by meta_key statment.
Here is my $args fot WP_Query
$args = array( 'post_type' => 'szkolenia', 'posts_per_page' => -1, 'meta_key' => 'crs_date_from', 'orderby' => 'meta_value', 'order' => 'ASC' );
This is how mysql query looks after Your patch:
wp_posts.* FROM wp_posts WHERE 1=1 AND wp_posts.post_type = 'szkolenia' AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 18 AND wp_posts.post_status = 'private') AND (wp_postmeta.meta_key = 'crs_date_from' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC
Also this is an error from mysql:
#1054 - Unknown column 'wp_postmeta.meta_key' in 'where clause'
and before patch:
wp_posts.* FROM wp_posts INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1 AND wp_posts.post_type = 'szkolenia' AND (wp_posts.post_status = 'publish' OR wp_posts.post_author = 18 AND wp_posts.post_status = 'private') AND (wp_postmeta.meta_key = 'crs_date_from' ) GROUP BY wp_posts.ID ORDER BY wp_postmeta.meta_value ASC
#15
@
10 years ago
- Milestone changed from Awaiting Review to 4.0
I want to look closer at the performance benefits here
#17
@
10 years ago
I need badly this fix so I have given it another look. Starting from the second patch, I have managed to fix the DB errors that have been mentionned by Mr.Defi.
So here is my patched patch (see above, it is called 24093.2.patch)
#18
@
10 years ago
- Keywords has-patch added; needs-patch removed
Hi,
I have managed to fix the orderby meta issue.
So to sum up what I did:
- Always alias the meta table in the query, starting at mt0. Before the patch, the first occurence was not aliased and the following ones were aliased starting at mt1
- In the WP_Query class, for ordering, we simply use the mt0 alias instead of the main meta table name.
- The rest is pretty similar to what had been proposed by the original patchers.
Let me know if there is any test case that fails and how to reproduce it, I'll try to fix the issue. I would love that patch to make its way into WP 4.0.
#19
@
10 years ago
This "bug" reared its ugly head after using a plugin by vprat. Using mySQL profiling I found the impact to be crippling after adding more than a few groups to a user. And keep in mind that this install is very small... like around 15 posts. I'm sure a site with more posts would get impacted much earlier as adding multiple INNER JOIN statements blow up the tempDB exponentially.
You can see the thread with my investigation here - http://customer-area.marvinlabs.com/support/topic/user-in-multiple-groups-causes-page-hang-and-100-cpu/
I would really like to have this merged into core as soon as possible. I've tested the patch on 3.9.1 with no issues - or at least none that I could find.
#20
@
10 years ago
- Milestone changed from 4.0 to Future Release
Sounds like we still need patch review, unit tests, and profiling data.
#21
@
10 years ago
Come on, that ticket is more than a year old and is derived from a 3 year old ticket that had an incomplete fix.
Profiling data is pretty simple. A query with 6 meta query elements will simply never complete without the patch. When having 4 meta query elements, as said, it goes from a few minutes to a few milliseconds.
Unit tests have been provided with the ticket referenced in the ticket of origin. I guess they are still valid.
I was starting to consider dedicating more time to core fixes but if each patch I submit has to stay around for 3 years before it gets reviewed and committed, I'll just pass :(
#22
@
10 years ago
I would love if somebody could review the patch here (not my wheelhouse), and if we could get unit tests into the same patch for overall review. We are still too late for 4.0, though.
#23
@
10 years ago
It would be good to get this finally in. I happened to stuble on this when I was looking for an answer to a problem I have with a dynamically generated meta_query. When the amount of the meta_query arrays exceeds nine my local development server and my staging server both just freeze. The DB query seems to run indefinitely and I need to restart the MySQL server to get the site operational again.
EDIT 1: I can confirm that vprat's latest patch fixes the issue on my development and test server.
EDIT 2: The patch doesn't seem to break anything but cases WP to show errors on the admin side, example below.
WordPress database error: [Unknown column 'mt1.meta_key' in 'where clause'] SELECT wp_posts.ID FROM wp_posts LEFT JOIN wp_postmeta AS mt0 ON (wp_posts.ID = mt0.post_id AND mt0.meta_key = '_searchwp_last_index') WHERE 1=1 AND wp_posts.post_type IN ('post', 'page', 'attachment', 'person', 'product', 'service_provider', 'q_and_a') AND ((wp_posts.post_status = 'publish' OR wp_posts.post_status = 'inherit')) AND ( mt0.post_id IS NULL AND mt1.meta_key = '_searchwp_skip' ) GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC
#24
@
10 years ago
- Keywords dev-feedback added
- Type changed from enhancement to defect (bug)
There's no mention of a limit in the WP Meta Query class reference. The meta_query implementation creates so bad SQL that it actually can cripple the whole server with only 9 compares. I think the type of this should be changed to bug.
In general I also feel that it's weird that the meta_query implementation differs so much from the taxonomy_query implementation (f.ex. missing the support for multi level query with different compares).
#25
@
10 years ago
See #29560 for an extended set of unit tests that any change in SQL syntax has to pass.
A few thoughts about the patches here:
- Core team - this SQL is run through the 'get_meta_sql' filter, and changing to subqueries will break many plugins that make funky use of this filter. Is there a policy in place about backward compatibility for filters on SQL statements? (My take is: if the benefits of changing are big enough, make the change, but document it and warn developers.)
- Even if we don't switch to subqueries, one conservative step we can take, which will address at least part of the problem, is to do better initial parsing of the meta_query argument, combining redundant args into more efficient chunks. If a plugin dev passes the following meta_query:
'relation' => 'OR', array( 'key' => 'foo', 'value' => 'bar', ), array( 'key' => 'foo', 'value' => 'barry', ),
we can safely translate it to the following and save a table join:
array( 'key' => 'foo', 'value' => array( 'bar', 'barry', ) 'compare' => 'IN', ),
- More detailed benchmarks about the switch to subqueries would make the case for a switch much more persuasive. The post linked to by ve9gra in https://core.trac.wordpress.org/ticket/24093#comment:19 is a good start, but ideally we'd have a script for generating test data (maybe a wp-cli subcommand) for the core team to test with, and/or more detailed specs about server environment, the matrix of queries tested, etc.
#26
@
10 years ago
@sarukuku - "The patch doesn't seem to break anything but cases WP to show errors on the admin side"
Can you please specify some steps to reproduce this? I would be happy to fix it.
@boonebgorges -
- It is always somehow risky for a plugin author to play with SQL directly instead of using the WP core functions. So I feel these cases should be adressed by plugin authors (and this is why I submitted a patch to improve WP Core instead of patching my plugin with such a filter).
- Your workaround fixes a very tiny part of the problem. Does not work for anything else than = operator for instance (LIKE...)
- This is very easy to reproduce. You simply need to issue a WP_Query with at least 5 clauses in the meta query. Even if you only have a few posts (10/20) in the database it will blow (will not even complete). With the patch, the query takes a few milliseconds in all cases.
This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.
10 years ago
#28
@
10 years ago
@vprat - It seems to be only the WPSearch plugin (https://searchwp.com/) that is confused about the changes in the patch on the environment I tested the it. Plain WP installation doesn't seem to have any problems with the patch as far as I can see.
tl;dr
Patch works, random plugin was confused and caused the error.
#29
@
10 years ago
The unit tests in #29560 are great for this ticket - they're showing a bunch of bugs in 24093.4.patch that need fixing.
I'm not particularly fussed about us changing the SQL that goes through the get_meta_sql
filter (I'm in favour of deprecating all filters on SQL, but that's a different discussion). We should definitely at least warn developers that it's changing, though.
@vprat: Could you upload your test cases to this ticket, so that we're all looking at the same test data?
#30
@
10 years ago
It's worth noting that this will need testing on different versions of MySQL (at least the latest major versions from 5.0 to 5.7) - various optimisations for both JOINs and sub-queries have been introduced over the years, we can't assume that performance on the latest version is indicative of performance on all versions.
#31
@
10 years ago
I want to continue looking into the possibility of subqueries. But, for the time being, the idea behind sc0ttclark's initial 24093.patch could, I think, be implemented right away. The idea has become more complicated now that the logic of WP_Meta_Query is recursive [29887] - using a single JOIN for multiple clauses only works if 'relation=OR', but the recursive logic means that there could be any number of relations in a query tree. I think it can look something like this:
- In get_sql_for_clause(), tirst time a meta_key is referenced, create a new alias and JOIN clause.
- Store the alias in an array keyed by the meta_key (so far, it's like in your patch)
- The next time you see a clause that has a meta_key that already has an alias, check to see whether the $parent_query has OR as a relation. If so, use the existing alias.
This will break down when you get several generations deep, though. If you have a query like this, I don't think you can combine the 'foo' and 'foo1' tables, even though the $parent_query has relation=OR:
array( 'relation' => 'AND', array( 'relation' => 'OR', array( 'key' => 'foo', 'value' => 'bar', ), array( 'key' => 'foo1', 'value' => 'bar', ), ), array( 'relation' => 'OR', array( 'key' => 'foo', 'value' => 'baz', ), array( 'key' => 'foo1', 'value' => 'baz', ), ) )
Clearly, this will need some thought, and some unit tests, and maybe some compromises regarding how many JOINS we can eliminate in case of multiply-nested queries. Who is up for a Really Fun Time writing this patch?
#32
@
10 years ago
- Keywords needs-patch added; needs-refresh has-patch dev-feedback removed
- Owner set to boonebgorges
- Status changed from new to accepted
I've been thinking about this some more, and I think I've come up with a technique for consolidating JOINs that is along the lines of what I've laid out above. See #18105 [29902] for how it works in WP_Tax_Query
. The check for "compatible aliases" will be more complicated here. I'll write some unit tests and see what I can do about a patch.
This ticket was mentioned in IRC in #wordpress-dev by boonebgorges. View the logs.
10 years ago
#34
@
10 years ago
- Keywords has-patch added; needs-unit-tests needs-patch removed
- Milestone changed from Future Release to 4.1
24093.5.patch is an attempt at removing most of the unnecessary joins from WP_Meta_Query. The basic logic is this:
- Positive operators (those that look for a match rather than non-matches - such as
IN
andLIKE
) that appear under the scope of an OR can share a single table join. Negative operators (!=
,NOT IN
, etc) connected by AND can share a table join as long as the clauses are looking at the same keys. (I think there are more cases than just this where sharing a join is possible, but I want to be conservative.) - So: in each call to
get_sql_for_clause()
, the JOIN logic goes like this. Does this clause have any siblings that are already doing a join and are compatible with this clause (by the rules listed above)? If so, use that alias. If not, create a JOIN clause, and store the alias for use by later siblings.
So that's the first thing 24093.5.patch does. As I was writing this part of the patch - which is largely a port of [29902] - I realized that most of the rat's nest of WP_Meta_Query::get_sql_for_clause()
was a result of past developers attempting, in patchwork ways, to avoid unneccessary table joins. (See, for example, the old 'key only' queries.) With the new centralized logic for finding "compatible aliases", it became possible to rewrite the SQL generation logic of WP_Meta_Query
so that it actually makes sense.
I've run some preliminary performance tests. I created a database with about 25000 posts and about 50,000 pseudo-random rows in wp_postmeta. Here are the queries I ran (never mind the nonsense words :) ):
$q1 = new WP_Query( array( 'meta_query' => array( 'relation' => 'OR', array( 'key' => 'horse', ), array( 'key' => 'nice', 'value' => 'boo', ), array( 'key' => 'crhrcb', 'value' => array( 'horse', 'boo', 'nice' ), 'compare' => 'NOT IN', ), array( 'key' => 'flim', 'value' => array( 'horse', 'boo', 'nice' ), 'compare' => 'IN', ), ), ) );
With 24093.5.patch, the query goes from 4 INNER JOIN
s to 2 (NOT IN can't share the join), which cuts execution time roughly in half.
It'd be great to have someone look over the logic above and the logic of the patch to make sure I'm not missing something obvious. I'm fairly confident that what I'm proposing here is solid, and it's definitely a big performance win (even if we continue to explore additional avenues for performance improvements in WP_Meta_Query).
#35
@
10 years ago
This looks awesome @boonebgorges -- and brings up another idea that could be useful to plugins, adding a filter to the return of find_compatible_table_alias would be grand. This could allow a plugin, such as Pods, to pick up a meta fields that it covers and stores in a separate table and return a custom alias, then hook in later in the joins clauses to add it's join there. Other than that, this is the bees knees and probably wings too.
#36
@
10 years ago
Thanks so much for having a look, sc0ttkclark! The filter idea sounds fine to me.
Patch to join postmeta for meta_query only as needed