Make WordPress Core

Opened 13 years ago

Last modified 5 years ago

#19738 reopened enhancement

meta_query should check for wildcards when using LIKE operator

Reported by: ejdanderson's profile ejdanderson Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.2
Component: Query Keywords:
Focuses: Cc:

Description

This seems intentional, but when using the LIKE operator in a meta query, the % wildcards are automatically added to the start and end of the value when generating the SQL.

I think a more robust way of doing this would be to only add said wildcards if one cannot be found within the value itself.

Change History (13)

#1 follow-up: @scribu
13 years ago

  • Keywords 2nd-opinion added; needs-patch removed

I'm on the fence on this one. I don't think we expose MySQL's '%' wildcards anywhere ATM.

#2 in reply to: ↑ 1 @ejdanderson
13 years ago

Replying to scribu:

I'm on the fence on this one. I don't think we expose MySQL's '%' wildcards anywhere ATM.

I dont think so either, there are a few places in the code that act in a similar way get_terms() for example, although in that case its a search parameter and seems appropriate.

I can see why exposing this is a bad idea, we don't want user input to be able to set SQL wildcards.

There have been numerous times though when I need to filter in custom SQL for a LIKE operator. A large majority of these have been to remove the leading '%'. Maybe a better solution is a flag to describe where '%' should be set - preceding, trailing, or both.

#3 @SpaceMacGyver
12 years ago

  • Cc SpaceMacGyver added

While I appreciate the need to abstract the SQL away from the query, as ejdanderson suggests a flag to control this would indeed be the appropriate means of handling this. As a stopgap measure, though, a filter on get_meta_sql to modify the WHERE clause should hopefully address this (example code provided below to remove the first % from the LIKE condition):

function remove_sql_wildcard_prefix($q)
{
        $q['where'] = preg_replace("/(LIKE ')%(.*?%')/", "$1$2", $q['where']);
        return $q;
}
add_filter('get_meta_sql', 'remove_sql_wildcard_prefix');

(disclaimer: the provided code is not yet tested, but in theory I think it works)

#4 @SergeyBiryukov
11 years ago

#23919 was marked as a duplicate.

#5 @wonderboymusic
11 years ago

  • Keywords 2nd-opinion removed
  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

This might need better docs, but anyone doing wildcard matching for a meta_query should catch the error the first time they debug it.

#6 @SergeyBiryukov
11 years ago

#25431 was marked as a duplicate.

#7 @Will Brownsberger
10 years ago

Space's suggestion was helpful for me. To get the full benefit, the following additional changes seemed necessary:

(1) Add an index to wp_postmeta on the first 25 positions of the meta_key and meta_value columns. This alone results in a huge improvement for like searches with only a trailing wildcard in direct sql queries. However, it makes no difference when querying through the wp_query object.
(2) I found that WP_Query meta casts the longtext metavalues as char like so: CAST(wp_postmeta.meta_value AS CHAR). Apparently, this prevents the MYSQL optimizer from recognizing the opportunity to use the added index. If the CAST wrapper is removed, the queries use the index and improve dramatically.

I have a set of queries against a large set of meta_values and have enhanced the filter suggested above by adding the line below:. I add the filter just before the big queries and then remove so as not to interfere with other queries:

      
function remove_sql_wildcard_prefix($q) {
$q['where'] = preg_replace("/(LIKE ')%(.*?%')/", "$1$2", $q['where']);
$q["where"] = preg_replace("/(CAST\()(.*?)(.meta_value)( AS CHAR\))/", "$2$3", $q["where"] );

return $q;
}
add_filter('get_meta_sql', 'remove_sql_wildcard_prefix');

These changes resulted in a 250x acceleration in my context -- large meta value set for custom posts.

Last edited 10 years ago by Will Brownsberger (previous) (diff)

#8 follow-up: @boonebgorges
10 years ago

  • Milestone set to Future Release
  • Resolution wontfix deleted
  • Status changed from closed to reopened

I was thinking about this just the other day, so I was happy to see this ticket updated to see I wasn't the only one. I'm reopening for further consideration. I think that recognizing '%' in 'value' might pose backward compatibility problems, but there are other ways of tackling this (like another argument 'like_type' => 'both|front|back').

#9 in reply to: ↑ 8 @Will Brownsberger
10 years ago

Replying to boonebgorges:

I was thinking about this just the other day, so I was happy to see this ticket updated to see I wasn't the only one. I'm reopening for further consideration. I think that recognizing '%' in 'value' might pose backward compatibility problems, but there are other ways of tackling this (like another argument 'like_type' => 'both|front|back').

Boone, do you see risks in removing the Char type cast? I'm sure I'm creating problems for myself somehow, but the improvement is huge.

#10 @boonebgorges
10 years ago

Will - No, I don't think there's a particular danger, as long as you're only using operators like IN and =. (Things like BETWEEN and > will break if you require numeric cast.) It's actually a bit odd that we do this cast at all in the case of a default data type - I think we should look into skipping it.

The bigger change is placing an index on the meta_value column, which could cause some serious performance issues on write actions at scale. But I think this is something that would be nice to benchmark a bit.

#11 @Will Brownsberger
10 years ago

Boone, thank you for the feedback on this.

Seems like WP is optimized for searching for posts using the taxomony framework and then just grabbing meta information about the post as additional data. A search based on meta information is not a priority for normal blogging and not well supported by WP -- especially a search in which meta information is the only screen in the search so that the database has to plow through a lot of postmeta records. I am doing a CRM plugin so the number of records being stored as special post-types is much greater than the number of regular posts and the meta searching bogs down.

I was initially trying to write entirely within WP's data structure, but I concluded that if I was going to be making changes to WP's database to support my plugin, it would do less violence to the framework and involve less risk of conflict simply to add a table to the database with its own well optimized index. This forced a rewrite of all the data structure of the application, but probably for the better. It also seems better for performance all round not to clutter the post and postmeta tables with a lot of records that could be effectively partioned in a separate table.

/w.

Last edited 10 years ago by Will Brownsberger (previous) (diff)

#12 @boonebgorges
10 years ago

Will - Yes to everything you've said here. WP_Meta_Query is provided for convenience, not a recommendation :) I'm happy for WP to take whatever steps are reasonable and necessary to make meta queries perform better, but in the end, we're pretty much hamstrung by the database schema. People who need super high performance queries on a very large dataset should either use WP's taxonomies or roll their own system, like you've done. Thanks for the feedback :)

This ticket was mentioned in Slack in #core by sergey. View the logs.


9 years ago

Note: See TracTickets for help on using tickets.