Opened 14 years ago
Closed 14 years ago
#14722 closed enhancement (wontfix)
use MATCH/AGAINST instead of LIKE when querying for matching terms posts
Reported by: | pgentoo | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 3.0.1 |
Component: | Performance | Keywords: | |
Focuses: | Cc: |
Description
In query.php, in &get_posts(), the generated sql is suboptimal. I maintain several large wordpress installations (just shy of 200k posts each) and query times search for posts to display are taking around 6 seconds (on a dedicated mysql VM). I looked into this and the majority of the time shown in the profiler is from reading records from disk, due to a full table scan. I see that the _posts table already has a multicolumn FULLTEXT index on the post_title and post_content columns.
My recommendation is to update the PHP to generate MATCH() ... AGAINST() queries instead of LIKE queries, so that the existing FULLTEXT index can be leveraged.
I've pulled some test queries from my mysqldumpslow log, and made the suggested changes, which drops query times from >6s to ~.2s.
Here is an example modified query:
SELECT SQL_CALC_FOUND_ROWS wp_rlu4532_posts.* FROM wp_rlu4532_posts WHERE 1=1 AND (((wp_rlu4532_posts.post_title LIKE '%york%') OR (wp_rlu4532_posts.post_content LIKE '%york%')) AND ((wp_rlu4532_posts.post_title LIKE '%peace%') OR (wp_rlu4532_posts.post_content LIKE '%peace%')) AND ((wp_rlu4532_posts.post_title LIKE '%festival%') OR (wp_rlu4532_posts.post_content LIKE '%festival%')) AND ((wp_rlu4532_posts.post_title LIKE '%returns%') OR (wp_rlu4532_posts.post_content LIKE '%returns%')) AND ((wp_rlu4532_posts.post_title LIKE '%to%') OR (wp_rlu4532_posts.post_content LIKE '%to%')) AND ((wp_rlu4532_posts.post_title LIKE '%rowntree%') OR (wp_rlu4532_posts.post_content LIKE '%rowntree%')) AND ((wp_rlu4532_posts.post_title LIKE '%park%') OR (wp_rlu4532_posts.post_content LIKE '%park%')) OR (wp_rlu4532_posts.post_title LIKE '%york peace festival returns to rowntree park%') OR (wp_rlu4532_posts.post_content LIKE '%york peace festival returns to rowntree park%')) AND (wp_rlu4532_posts.post_password = ) AND wp_rlu4532_posts.post_type = 'post' AND (wp_rlu4532_posts.post_status = 'publish') ORDER BY wp_rlu4532_posts.post_date DESC LIMIT 0, 5;
SELECT SQL_CALC_FOUND_ROWS wp_rlu4532_posts.* FROM wp_rlu4532_posts WHERE
MATCH (post_title, post_content) AGAINST ('york')
AND
MATCH (post_title, post_content) AGAINST ('peace')
AND
MATCH (post_title, post_content) AGAINST ('festival')
AND
MATCH (post_title, post_content) AGAINST ('returns')
AND
MATCH (post_title, post_content) AGAINST ('york')
AND
MATCH (post_title, post_content) AGAINST ('to')
AND
MATCH (post_title, post_content) AGAINST ('rowntree')
AND
MATCH (post_title, post_content) AGAINST ('park')
AND
MATCH (post_title, post_content) AGAINST ('york peace festival returns to rowntree park')
AND (wp_rlu4532_posts.post_password = ) AND wp_rlu4532_posts.post_type = 'post' AND (wp_rlu4532_posts.post_status = 'publish') ORDER BY wp_rlu4532_posts.post_date DESC LIMIT 0, 5;
Change History (12)
#1
@
14 years ago
- Component changed from General to Performance
- Keywords needs-patch added
- Milestone changed from Awaiting Review to 3.1
#3
@
14 years ago
scribu, here's a patch against the 3.0 branch (although untested). Please give it a shot. :)
Index: query.php =================================================================== --- query.php (revision 15549) +++ query.php (working copy) @@ -1851,12 +1851,12 @@ $searchand = ''; foreach( (array) $q['search_terms'] as $term ) { $term = addslashes_gpc($term); - $search .= "{$searchand}(($wpdb->posts.post_title LIKE '{$n}{$term}{$n}') OR ($wpdb->posts.post_content LIKE '{$n}{$term}{$n}'))"; + $search .= "{$searchand} MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST('{$term}')"; $searchand = ' AND '; } $term = esc_sql($q['s']); if ( empty($q['sentence']) && count($q['search_terms']) > 1 && $q['search_terms'][0] != $q['s'] ) - $search .= " OR ($wpdb->posts.post_title LIKE '{$n}{$term}{$n}') OR ($wpdb->posts.post_content LIKE '{$n}{$term}{$n}')"; + $search .= " OR MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST('{$term}')"; if ( !empty($search) ) { $search = " AND ({$search}) ";
#4
@
14 years ago
- Keywords needs-patch removed
- Milestone changed from 3.1 to Awaiting Review
Yeah, I've done something similar on my localhost, but got an SQL error because...
There is no FULLTEXT index on any column in any table in the WordPress schema. The one you have was probably added by a plugin.
Speaking of which, it's pretty easy to change the generated query from a plugin, using the 'posts_search' filter.
#6
@
14 years ago
- Cc pgentoo added
yeah it must be by a plugin. I see a fulltext index called "posts_related" on the table, which indexes both of the columns in question.
I would suggest changing to a large VARCHAR and indexing it, but the low minimum requirements for wordpress (mysql 4.1.2) make that not feasible because of the 255 character limit on old versions.
Are you guys against using FULLTEXT indexes in WordPress? Or just haven't added any to date?
#7
@
14 years ago
if you're not willing to add a FULLTEXT index to the schema, can you point me to an example of how to implement the posts_search filter?
#8
@
14 years ago
Something like this (not tested):
function alter_search($sql, $wp_query) { global $wpdb; $search = $wp_query->get('s'); return $wpdb->prepare(" AND MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST(%s)", $search ); } add_filter('posts_search', 'alter_search', 10, 2);
#9
follow-up:
↓ 10
@
14 years ago
I would suggest changing to a large VARCHAR and indexing it, but the low minimum requirements for wordpress (mysql 4.1.2) make that not feasible because of the 255 character limit on old versions.
Are you guys against using FULLTEXT indexes in WordPress? Or just haven't added any to date?
I don't think we have a policy against it. It's just that we would have to make sure it would work on a lot of different server setups.
#10
in reply to:
↑ 9
@
14 years ago
Replying to scribu:
I don't think we have a policy against it. It's just that we would have to make sure it would work on a lot of different server setups.
Maybe something to consider going forward if resources allow...
#11
@
14 years ago
Justin Shreve worked on a new search API for WordPress last year as part of his Google Summer of Code project.
Results are in #10667. In the end, there was a single filter added -- posts_search -- and then a few pretty nice plugins IIRC, one of which supports MySQL FULLTEXT (which is not available everywhere).
Pretty sure this is thus a wontfix based on previous discussions and the availability of plugins to do this.
I should have known there was a better syntax for this.