Make WordPress Core

Opened 14 years ago

Closed 14 years ago

#14722 closed enhancement (wontfix)

use MATCH/AGAINST instead of LIKE when querying for matching terms posts

Reported by: pgentoo's profile pgentoo Owned by:
Milestone: Priority: normal
Severity: normal Version: 3.0.1
Component: Performance Keywords:
Focuses: Cc:

Description

In query.php, in &get_posts(), the generated sql is suboptimal. I maintain several large wordpress installations (just shy of 200k posts each) and query times search for posts to display are taking around 6 seconds (on a dedicated mysql VM). I looked into this and the majority of the time shown in the profiler is from reading records from disk, due to a full table scan. I see that the _posts table already has a multicolumn FULLTEXT index on the post_title and post_content columns.

My recommendation is to update the PHP to generate MATCH() ... AGAINST() queries instead of LIKE queries, so that the existing FULLTEXT index can be leveraged.

I've pulled some test queries from my mysqldumpslow log, and made the suggested changes, which drops query times from >6s to ~.2s.

Here is an example modified query:

SELECT SQL_CALC_FOUND_ROWS wp_rlu4532_posts.* FROM wp_rlu4532_posts WHERE 1=1 AND (((wp_rlu4532_posts.post_title LIKE '%york%') OR (wp_rlu4532_posts.post_content LIKE '%york%')) AND ((wp_rlu4532_posts.post_title LIKE '%peace%') OR (wp_rlu4532_posts.post_content LIKE '%peace%')) AND ((wp_rlu4532_posts.post_title LIKE '%festival%') OR (wp_rlu4532_posts.post_content LIKE '%festival%')) AND ((wp_rlu4532_posts.post_title LIKE '%returns%') OR (wp_rlu4532_posts.post_content LIKE '%returns%')) AND ((wp_rlu4532_posts.post_title LIKE '%to%') OR (wp_rlu4532_posts.post_content LIKE '%to%')) AND ((wp_rlu4532_posts.post_title LIKE '%rowntree%') OR (wp_rlu4532_posts.post_content LIKE '%rowntree%')) AND ((wp_rlu4532_posts.post_title LIKE '%park%') OR (wp_rlu4532_posts.post_content LIKE '%park%')) OR (wp_rlu4532_posts.post_title LIKE '%york peace festival returns to rowntree park%') OR (wp_rlu4532_posts.post_content LIKE '%york peace festival returns to rowntree park%')) AND (wp_rlu4532_posts.post_password = ) AND wp_rlu4532_posts.post_type = 'post' AND (wp_rlu4532_posts.post_status = 'publish') ORDER BY wp_rlu4532_posts.post_date DESC LIMIT 0, 5;

SELECT SQL_CALC_FOUND_ROWS wp_rlu4532_posts.* FROM wp_rlu4532_posts WHERE

MATCH (post_title, post_content) AGAINST ('york')
AND
MATCH (post_title, post_content) AGAINST ('peace')
AND
MATCH (post_title, post_content) AGAINST ('festival')
AND
MATCH (post_title, post_content) AGAINST ('returns')
AND
MATCH (post_title, post_content) AGAINST ('york')
AND
MATCH (post_title, post_content) AGAINST ('to')
AND
MATCH (post_title, post_content) AGAINST ('rowntree')
AND
MATCH (post_title, post_content) AGAINST ('park')
AND
MATCH (post_title, post_content) AGAINST ('york peace festival returns to rowntree park')
AND (wp_rlu4532_posts.post_password = ) AND wp_rlu4532_posts.post_type = 'post' AND (wp_rlu4532_posts.post_status = 'publish') ORDER BY wp_rlu4532_posts.post_date DESC LIMIT 0, 5;

Change History (12)

#1 @scribu
14 years ago

  • Component changed from General to Performance
  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to 3.1

I should have known there was a better syntax for this.

#2 @scribu
14 years ago

  • Type changed from defect (bug) to enhancement

#3 @pgentoo
14 years ago

scribu, here's a patch against the 3.0 branch (although untested). Please give it a shot. :)

Index: query.php
===================================================================
--- query.php   (revision 15549)
+++ query.php   (working copy)
@@ -1851,12 +1851,12 @@
                        $searchand = '';
                        foreach( (array) $q['search_terms'] as $term ) {
                                $term = addslashes_gpc($term);
-                               $search .= "{$searchand}(($wpdb->posts.post_title LIKE '{$n}{$term}{$n}') OR ($wpdb->posts.post_content LIKE '{$n}{$term}{$n}'))";
+                               $search .= "{$searchand} MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST('{$term}')";
                                $searchand = ' AND ';
                        }
                        $term = esc_sql($q['s']);
                        if ( empty($q['sentence']) && count($q['search_terms']) > 1 && $q['search_terms'][0] != $q['s'] )
-                               $search .= " OR ($wpdb->posts.post_title LIKE '{$n}{$term}{$n}') OR ($wpdb->posts.post_content LIKE '{$n}{$term}{$n}')";
+                               $search .= " OR MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST('{$term}')";

                        if ( !empty($search) ) {
                                $search = " AND ({$search}) ";

#4 @scribu
14 years ago

  • Keywords needs-patch removed
  • Milestone changed from 3.1 to Awaiting Review

Yeah, I've done something similar on my localhost, but got an SQL error because...

There is no FULLTEXT index on any column in any table in the WordPress schema. The one you have was probably added by a plugin.

Speaking of which, it's pretty easy to change the generated query from a plugin, using the 'posts_search' filter.

#6 @pgentoo
14 years ago

  • Cc pgentoo added

yeah it must be by a plugin. I see a fulltext index called "posts_related" on the table, which indexes both of the columns in question.

I would suggest changing to a large VARCHAR and indexing it, but the low minimum requirements for wordpress (mysql 4.1.2) make that not feasible because of the 255 character limit on old versions.

Are you guys against using FULLTEXT indexes in WordPress? Or just haven't added any to date?

#7 @pgentoo
14 years ago

if you're not willing to add a FULLTEXT index to the schema, can you point me to an example of how to implement the posts_search filter?

#8 @scribu
14 years ago

Something like this (not tested):

function alter_search($sql, $wp_query) {
  global $wpdb;

  $search = $wp_query->get('s');

  return $wpdb->prepare(" AND MATCH($wpdb->posts.post_title, $wpdb->posts.post_content) AGAINST(%s)", $search );
}
add_filter('posts_search', 'alter_search', 10, 2);

#9 follow-up: @scribu
14 years ago

I would suggest changing to a large VARCHAR and indexing it, but the low minimum requirements for wordpress (mysql 4.1.2) make that not feasible because of the 255 character limit on old versions.

Are you guys against using FULLTEXT indexes in WordPress? Or just haven't added any to date?

I don't think we have a policy against it. It's just that we would have to make sure it would work on a lot of different server setups.

#10 in reply to: ↑ 9 @pgentoo
14 years ago

Replying to scribu:

I don't think we have a policy against it. It's just that we would have to make sure it would work on a lot of different server setups.

Maybe something to consider going forward if resources allow...

#11 @nacin
14 years ago

Justin Shreve worked on a new search API for WordPress last year as part of his Google Summer of Code project.

Results are in #10667. In the end, there was a single filter added -- posts_search -- and then a few pretty nice plugins IIRC, one of which supports MySQL FULLTEXT (which is not available everywhere).

Pretty sure this is thus a wontfix based on previous discussions and the availability of plugins to do this.

#12 @scribu
14 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from new to closed

That's that, then.

Note: See TracTickets for help on using tickets.