Opened 12 years ago
Last modified 11 months ago
#24837 new enhancement
querying optimization for category AND searching
Reported by: |
|
Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 3.6 |
Component: | Taxonomy | Keywords: | has-patch needs-testing |
Focuses: | performance | Cc: |
Description (last modified by )
querying category with an AND for two term_id's plus having a 50,000 + posts site causes the query produced by get_sql in wp-includes/taxonomy.php to take excessive amount of time.
On my site the current query in trunk takes 200 + seconds and the optimizes query takes 100 to 200 milliseconds.
Sub selects in mysql are not optimized on the server very well.
Example code to utilize query
query_posts(array('category__and' => array($term0_id,$term1_id)));
affected code starts on line 724 of wp-includes/taxonomy.php
Potential problems.
- I simply added the temp table to the $join variable not knowing if this will cause problems on other queries
- Clean up of the temporary tables may be needed after query is executed with a persistent connection but in my case they can be left there as mysql will garbage clean them after script disconnects.
Attached is a patch to wp-includes/taxonomy.php
Attachments (1)
Change History (20)
#5
@
12 years ago
before
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND ( ( SELECT COUNT(1) FROM wp_term_relationships WHERE term_taxonomy_id IN (5,453) AND object_id = wp_posts.ID ) = 2 ) AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 20
after I split it up into many queries that mysql can handle easier. My guess is mysql does not cache a sub queries result. This creates a temporary table (mysql could do this automatically but that's another story, sub queries are well known to be poor in mysql)
//create the temporary tables create temporary table $table ( object_id bigint(20) unsigned, term_taxonomy_id bigint(20) unsigned) create temporary table $table1 ( object_id bigint(20) unsigned, c int ) // collect all the objects + terms together into temp table insert into $table select object_id , term_taxonomy_id from wp_term_relationships where term_taxonomy_id IN ($terms) // insert the counts of how many objects that have all $terms insert into $table1 select object_id,count(*) from $table group by object_id // drop the first temp table that was use to collect the data DROP TABLE $table // add the second table to the where query $join .= " , $table1 AS tt "; $where = " tt.c = $num_terms AND tt.object_id = wp_posts.ID " }} This would make the finial query (I did this manually so hopefully I got it right) {{ SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts , $table1 AS tt WHERE 1=1 AND ( tt.c = $num_terms AND tt.object_id = wp_posts.ID ) AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 20
The concerns I have is that the second temp table is currently not if persistent connections are used. This would cause problems over time if the same connect was used many times either due to high volume or the connection being open for a long term.
So a hook is needed to drop the second temporary table after the sql query is completed
OR
a better query that does not require temporary tables
#6
@
12 years ago
It may require a greater rewrite, but I imagine there is also an alternative solution for this that does not require temporary tables?
#7
@
12 years ago
There may be other ways of doing this that gets mysql to handle this internally but I have not as yet found it yet. If someone knows a sql super star I'd like to hear alternatives, my mysql is quite basic.
Portability comes into question as wordpress uses a wide range of quries, this pattern optimizes sub queries works across the board for earlier 4.x mysql as well.
These types of quires have been on the mysql todo list for quite some time now, a decade or more. Latest promise is mysql 6.0, there is some that has been completed for 5.6 but it's far from being done.
Word press will likely still be run on plenty of 5.x mysql installs for quite some time so any sub queries in the code are land mines for users who happen to create a larger data set. In my case it was 120000 records in wp_term_relationships , 60000 posts.
For low end virtual machines or shared hosting the threshold would be lower.
Here is more info on sub queries
http://www.mysqlperformanceblog.com/2010/10/25/mysql-limitations-part-3-subqueries/
#8
@
12 years ago
When ordering by post_date
- probably the most effective optimization is forcing an index: FROM wp_posts USE INDEX (type_status_date)
. This will force the join type to be range
instead of index
. On a query with GROUP BY
and ORDER BY
, the index forcing, for me, has made queries faster un-cached than the cached queries without it.
Subqueries are sometimes rewritten internally by MySQL to perform worse than JOINs.
#9
@
12 years ago
I took a look and the query was all ready using index type_status_date. My feeling is that the query is simply too complex for mysql. Two group by's and a sub query. Mysql is just trying to sort out simple sub queries in the current unstable db.
+----+--------------------+-----------------------+-------+--------------------------+------------------+---------+--------------------------+-------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-----------------------+-------+--------------------------+------------------+---------+--------------------------+-------+--------------------------+ | 1 | PRIMARY | wp_posts | index | type_status_date | type_status_date | 140 | NULL | 47456 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | wp_term_relationships | ref | PRIMARY,term_taxonomy_id | PRIMARY | 8 | test.wp_posts.ID | 1213 | Using where; Using index | +----+--------------------+-----------------------+-------+--------------------------+------------------+---------+--------------------------+-------+--------------------------+
#10
@
12 years ago
no, you want to force the index so that type
becomes range
instead of index
. http://dev.mysql.com/doc/refman/5.1/en/explain-output.html#jointype_range
#11
@
12 years ago
I have not figure out away to use a range index I used FROM wp_posts USE INDEX (type_status_date)
in the query in this manner to no luck. The documentation seams to imply that the range index only works with constants.
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts use index ( type_status_date) WHERE 1=1 AND ( ( SELECT COUNT(1) FROM wp_term_relationships WHERE term_taxonomy_id IN (5,453) AND object_id = wp_posts.ID ) = 2 ) AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 20;
And explain indicates type index.
The tests where done on mysql 5.5.31
#12
@
11 years ago
This may not be the exact same query, but for something we were doing recently we had to create a new compound index that includes ID and post_type (http://screencast.com/t/dGDNzoSQceQk)
Then used FROM wp_posts USE INDEX (post_type_id)
to force the new index to be used. It eliminated the temp table and improved query performances from 6 seconds to 0.32 seconds. Using mysql 5.5.30
#13
@
10 years ago
I just ran into a very slow query (13-25s) just dealing with the simple category page. Upon further analysis, I found that mysql isn't using the type_status_date
index and PRIMARY (when it does, the query runs a lot faster and says "using index" in both the query and subquery), but rather PRIMARY and term_taxonomy_id
, which is a LOT slower.
At the very least, FORCE or USE INDEX should be added here to force optimization and make it run faster.
I'm playing with adding another index right now.
The query in question btw (FORCE INDEX should be added after FROM wp_posts
:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts INNER JOIN wp_term_relationships ON ( wp_posts.ID = wp_term_relationships.object_id ) WHERE ( wp_term_relationships.term_taxonomy_id IN (3, 5201, 9699, 11025, 20317) ) AND wp_posts.post_type = 'post' AND ( wp_posts.post_status = 'publish' OR wp_posts.post_status = 'private' ) GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 10;
#14
@
10 years ago
WP Query is extremely complicated. What's the easiest way to insert such a simple USE INDEX directive into the SQL call? There are many filters - what do the pros recommend? @SergeyBiryukov?
#15
@
10 years ago
I was wrong - USE INDEX(type_status_date)
here doesn't actually resolve the issue, and the query still runs for a very long time.
I've been trying various index approaches but so far I haven't been able to get this thing to run fast.
Please provide an
svn diff
if possible, rather than a standarddiff
.