WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 5 months ago

#19738 reopened enhancement

meta_query should check for wildcards when using LIKE operator

Reported by: ejdanderson Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 3.2
Component: Query Keywords:
Focuses: Cc:

Description

This seems intentional, but when using the LIKE operator in a meta query, the % wildcards are automatically added to the start and end of the value when generating the SQL.

I think a more robust way of doing this would be to only add said wildcards if one cannot be found within the value itself.

Change History (12)

comment:1 follow-up: @scribu3 years ago

  • Keywords 2nd-opinion added; needs-patch removed

I'm on the fence on this one. I don't think we expose MySQL's '%' wildcards anywhere ATM.

comment:2 in reply to: ↑ 1 @ejdanderson3 years ago

Replying to scribu:

I'm on the fence on this one. I don't think we expose MySQL's '%' wildcards anywhere ATM.

I dont think so either, there are a few places in the code that act in a similar way get_terms() for example, although in that case its a search parameter and seems appropriate.

I can see why exposing this is a bad idea, we don't want user input to be able to set SQL wildcards.

There have been numerous times though when I need to filter in custom SQL for a LIKE operator. A large majority of these have been to remove the leading '%'. Maybe a better solution is a flag to describe where '%' should be set - preceding, trailing, or both.

comment:3 @SpaceMacGyver3 years ago

  • Cc SpaceMacGyver added

While I appreciate the need to abstract the SQL away from the query, as ejdanderson suggests a flag to control this would indeed be the appropriate means of handling this. As a stopgap measure, though, a filter on get_meta_sql to modify the WHERE clause should hopefully address this (example code provided below to remove the first % from the LIKE condition):

function remove_sql_wildcard_prefix($q)
{
        $q['where'] = preg_replace("/(LIKE ')%(.*?%')/", "$1$2", $q['where']);
        return $q;
}
add_filter('get_meta_sql', 'remove_sql_wildcard_prefix');

(disclaimer: the provided code is not yet tested, but in theory I think it works)

comment:4 @SergeyBiryukov2 years ago

#23919 was marked as a duplicate.

comment:5 @wonderboymusic21 months ago

  • Keywords 2nd-opinion removed
  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

This might need better docs, but anyone doing wildcard matching for a meta_query should catch the error the first time they debug it.

comment:6 @SergeyBiryukov19 months ago

#25431 was marked as a duplicate.

comment:7 @Will Brownsberger6 months ago

Space's suggestion was helpful for me. To get the full benefit, the following additional changes seemed necessary:

(1) Add an index to wp_postmeta on the first 25 positions of the meta_key and meta_value columns. This alone results in a huge improvement for like searches with only a trailing wildcard in direct sql queries. However, it makes no difference when querying through the wp_query object.
(2) I found that WP_Query meta casts the longtext metavalues as char like so: CAST(wp_postmeta.meta_value AS CHAR). Apparently, this prevents the MYSQL optimizer from recognizing the opportunity to use the added index. If the CAST wrapper is removed, the queries use the index and improve dramatically.

I have a set of queries against a large set of meta_values and have enhanced the filter suggested above by adding the line below:. I add the filter just before the big queries and then remove so as not to interfere with other queries:

      
function remove_sql_wildcard_prefix($q) {
$q['where'] = preg_replace("/(LIKE ')%(.*?%')/", "$1$2", $q['where']);
$q["where"] = preg_replace("/(CAST\()(.*?)(.meta_value)( AS CHAR\))/", "$2$3", $q["where"] );

return $q;
}
add_filter('get_meta_sql', 'remove_sql_wildcard_prefix');

These changes resulted in a 250x acceleration in my context -- large meta value set for custom posts.

Last edited 6 months ago by Will Brownsberger (previous) (diff)

comment:8 follow-up: @boonebgorges6 months ago

  • Milestone set to Future Release
  • Resolution wontfix deleted
  • Status changed from closed to reopened

I was thinking about this just the other day, so I was happy to see this ticket updated to see I wasn't the only one. I'm reopening for further consideration. I think that recognizing '%' in 'value' might pose backward compatibility problems, but there are other ways of tackling this (like another argument 'like_type' => 'both|front|back').

comment:9 in reply to: ↑ 8 @Will Brownsberger6 months ago

Replying to boonebgorges:

I was thinking about this just the other day, so I was happy to see this ticket updated to see I wasn't the only one. I'm reopening for further consideration. I think that recognizing '%' in 'value' might pose backward compatibility problems, but there are other ways of tackling this (like another argument 'like_type' => 'both|front|back').

Boone, do you see risks in removing the Char type cast? I'm sure I'm creating problems for myself somehow, but the improvement is huge.

comment:10 @boonebgorges6 months ago

Will - No, I don't think there's a particular danger, as long as you're only using operators like IN and =. (Things like BETWEEN and > will break if you require numeric cast.) It's actually a bit odd that we do this cast at all in the case of a default data type - I think we should look into skipping it.

The bigger change is placing an index on the meta_value column, which could cause some serious performance issues on write actions at scale. But I think this is something that would be nice to benchmark a bit.

comment:11 @Will Brownsberger5 months ago

Boone, thank you for the feedback on this.

Seems like WP is optimized for searching for posts using the taxomony framework and then just grabbing meta information about the post as additional data. A search based on meta information is not a priority for normal blogging and not well supported by WP -- especially a search in which meta information is the only screen in the search so that the database has to plow through a lot of postmeta records. I am doing a CRM plugin so the number of records being stored as special post-types is much greater than the number of regular posts and the meta searching bogs down.

I was initially trying to write entirely within WP's data structure, but I concluded that if I was going to be making changes to WP's database to support my plugin, it would do less violence to the framework and involve less risk of conflict simply to add a table to the database with its own well optimized index. This forced a rewrite of all the data structure of the application, but probably for the better. It also seems better for performance all round not to clutter the post and postmeta tables with a lot of records that could be effectively partioned in a separate table.

/w.

Last edited 5 months ago by Will Brownsberger (previous) (diff)

comment:12 @boonebgorges5 months ago

Will - Yes to everything you've said here. WP_Meta_Query is provided for convenience, not a recommendation :) I'm happy for WP to take whatever steps are reasonable and necessary to make meta queries perform better, but in the end, we're pretty much hamstrung by the database schema. People who need super high performance queries on a very large dataset should either use WP's taxonomies or roll their own system, like you've done. Thanks for the feedback :)

Note: See TracTickets for help on using tickets.