#10964 closed enhancement (duplicate)
Improving query_posts performance
Reported by: | buch0090 | Owned by: | |
---|---|---|---|
Milestone: | Priority: | normal | |
Severity: | normal | Version: | 2.8.4 |
Component: | Performance | Keywords: | has-patch needs-testing |
Focuses: | Cc: |
Description
We have a blog with over 80k posts and 750k records in post meta table. Noticed several inefficient queries that slowed down the home page of a theme with several widgets using the query_posts function.
Two examples:
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;
and
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts INNER JOIN wp_1_term_relationships ON (wp_1_posts.ID = wp_1_term_relationships.object_id) INNER JOIN wp_1_term_taxonomy ON (wp_1_term_relationships.term_taxonomy_id = wp_1_term_taxonomy.term_taxonomy_id) WHERE 1=1 AND wp_1_term_taxonomy.taxonomy = 'category' AND wp_1_term_taxonomy.term_id IN ('21', '49', '53', '488', '91', '107', '111', '136', '202', '209', '490') AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') GROUP BY wp_1_posts.ID ORDER BY wp_1_posts.post_date DESC LIMIT 0, 4
Changed wp-includes>query.php get_posts function.
Starting at line 2251...
BELOW CODE MODIFIED BY DAVE BUCHANAN, SPLIT INTO TWO QUERIES /
if ( !empty($limits) )
$found_rows = 'SQL_CALC_FOUND_ROWS';
/ FIRST SELECT JUST IDs
$tmp = " SELECT $distinct $wpdb->posts.id FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits";
$tmp_q = $wpdb->get_col($tmp);
$tmp_ids = implode($tmp_q,',');
/ NOW NORMAL SELECT WHERE ID IS IN FIRST QUERY LIST
$this->request = " SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE $wpdb->posts.id IN ($tmp_ids) $groupby $orderby ";
if ( !$qsuppress_filters? )
$this->request = apply_filters('posts_request', $this->request);
$this->posts = $wpdb->get_results($this->request);
Let me know if you require further information or anything.
Thanks,
Dave Buchanan
Dolan Media Company
Attachments (18)
Change History (126)
#1
@
15 years ago
First post was a little jumbled, here are code changes..
In wp-includes/query.php, starting at line 2251...
BELOW CODE MODIFIED BY DAVE BUCHANAN, SPLIT INTO TWO QUERIES
if ( !empty($limits) )
$found_rows = 'SQL_CALC_FOUND_ROWS';
/ FIRST SELECT JUST IDs
$tmp = " SELECT $distinct $wpdb->posts.id FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits";
$tmp_q = $wpdb->get_col($tmp);
$tmp_ids = implode($tmp_q,',');
/ NOW NORMAL SELECT WHERE ID IS IN FIRST QUERY LIST
$this->request = " SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE $wpdb->posts.id IN ($tmp_ids) $groupby $orderby ";
if ( !$qsuppress_filters? )
$this->request = apply_filters('posts_request', $this->request);
$this->posts = $wpdb->get_results($this->request);
#2
@
15 years ago
Won't that break paging since SQL_CALC_FOUND_ROWS is done on a query limited by IN rather than LIMIT?
#4
@
15 years ago
Ryan,
My bad, it does break the part for displaying the page links...IE. 1 2 3 ...100 101. I will look into this, perhaps it was premature for me to post this, was just excited to contribute, thanks.
Scribu,
The main reason is eliminating the giant table scans where it returns all fields in posts table for all rows.
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;
Becomes...
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.ID FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;
then..
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE wp_1_posts.id IN ( [LIST OF IDS FROM PREV QUERY]) ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;
#5
follow-up:
↓ 6
@
15 years ago
So, you're first selecting the IDs and then getting all fields for those ids, instead of returning all the fields only for the desired posts directly.
That doesn't seem a good optimisation since the SELECT clause is evaluated last, after the rows have been filtered. Or does this have something to do with indexes?
Do you have any performance tests? If it's indeed faster, it should be even faster if you make it into a subquery (WP 2.9 requires MySQL 4.1 => subqueries allowed).
#6
in reply to:
↑ 5
;
follow-up:
↓ 10
@
15 years ago
Replying to scribu:
So, you're first selecting the IDs and then getting all fields for those ids, instead of returning all the fields only for the desired posts directly.
That doesn't seem a good optimisation since the SELECT clause is evaluated last, after the rows have been filtered. Or does this have something to do with indexes?
Do you have any performance tests? If it's indeed faster, it should be even faster if you make it into a subquery (WP 2.9 requires MySQL 4.1 => subqueries allowed).
Load time of uncached home page that utilizes about 15 calls to query_posts function went down considerably. From about a minute to under 10 seconds.
Not sure I'm following you concerning subquery, you would still do a select of all fields/all rows THEN do a subquery? This fix only has the post ID when selecting all rows...then a second query only selects from the IDs of the first query.
#9
@
15 years ago
there's another potential issue, I think... if a plugin adds a having clause on posts_request that uses calculated fields rather than raw field names, then this optimization may very well end up breaking the query.
#10
in reply to:
↑ 6
@
15 years ago
Replying to buch0090:
Not sure I'm following you concerning subquery, you would still do a select of all fields/all rows THEN do a subquery? This fix only has the post ID when selecting all rows...then a second query only selects from the IDs of the first query.
I meant put the first query as a subquery of the second. That way, you don't have so much back-and-forth between the DB and PHP.
#11
@
15 years ago
@Buch0090: Please try the attached patch. It:
- Implements your suggestion while ensuresing that pagination keeps working; and
- Maintains backward compatibility for plugins that use query-related hooks (example)
@Ryan: I was initially leaning towards introducing new hooks, but backward compatibility considerations led me to rethink the whole idea and pass an extra argument instead: I'm added true for the quicker query that actually fetches rows, and false for the one that only fetches IDs. That way, plugins can potentially fetch if the second argument that gets passed is not empty and bail if they want.
The order_by clause was left unchanged: if a plugin hooks into that, it might be that it's adding table joins or calculated fields with a having clause, and the order clause would end up the same in both queries. as for the limit clause, it's not needed for the quick query, so no point in filtering that one twice.
#13
follow-up:
↓ 16
@
15 years ago
Ok I've modified my original version so paging works...(see attached) please let me know if anybody runs into other issues. my site isn't using custom fields for get posts. and definitely isn't using the below issue Denis pointed out..
...
there's another potential issue, I think... if a plugin adds a having clause on
posts_request that uses calculated fields rather than raw field names, then this
optimization may very well end up breaking the query.
#16
in reply to:
↑ 13
;
follow-up:
↓ 17
@
15 years ago
- Keywords needs-patch reporter-feedback added; query_posts performance removed
Replying to buch0090:
Ok I've modified my original version so paging works...(see attached)
We need this in form of a real patch, otherwise unable to review. Please provide a patch.
#19
@
15 years ago
Hey all,
My bad for losing track of this change. This is my first post to trac. Thanks Ryan for doing the patch. Ryan your changes work perfect, much improved from my sloppiness.
Anything else that I can help with here?
Thanks.
#22
@
15 years ago
I have this on our staging environment and looks good. here's an example of 3 queries generated by a custom widget to display recent posts...I'll check in next week, plan on having this on production by Monday, using WPMU 2.8.5.
Time: 0.018220901489258
Query: SELECT SQL_CALC_FOUND_ROWS wp_1_posts.ID FROM wp_1_posts WHERE 1=1 AND wp_1_posts.ID IN (136362,136280,136334,136392,136292,136355,136282) AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish' OR wp_1_posts.post_status = 'private') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts
# Time: 0.00015687942504883
Query: SELECT FOUND_ROWS()
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts
# Time: 0.00025701522827148
Query: SELECT wp_1_posts.* FROM wp_1_posts WHERE 1 = 1 AND wp_1_posts.ID IN ( 136392,136355,136334,136282,136280 ) ORDER BY wp_1_posts.post_date DESC
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts
#23
@
15 years ago
FYI, this has worked fine on production. No long queries (similar to before) or anything after 1 week.
#25
@
15 years ago
Current patch has issues with taxonomy/term queries.
a query like
WP_Query( 'taxonomy=person&term=dave' );
Causes an SQL error [Unknown column 'p2.post_status' in 'where clause']
This is caused because on line 1940
$post_status_join = true;
Then lines 2119 - 2123
if ( $post_status_join ) { $join .= " LEFT JOIN $wpdb->posts AS p2 ON ($wpdb->posts.post_parent = p2.ID) "; foreach ( $statuswheres as $index => $statuswhere ) $statuswheres[$index] = "($statuswhere OR ($wpdb->posts.post_status = 'inherit' AND " . str_replace($wpdb->posts, 'p2', $statuswhere) . "))"; }
The $where is passed on to the quick request but the join is not and so we end up trying to query a column from a table we haven't joined to.
I haven't patched as I am not totally sure why the taxonomy code block sets $post_status_join = true;, my solution would be simply to remove line 1940
#26
@
15 years ago
Also breaks querying by post_meta
WP_Query( 'meta_key=score&orderby=meta_value' );
Causes [Unknown column 'wp_postmeta.meta_value' in 'order clause']
Again because the quick_request is being passed $order but not $join
#29
@
15 years ago
- Keywords has-patch added; needs-patch removed
- Milestone changed from Future Release to 3.1
Attached patch fixes the $orderby issues by using the order of the ID's from the first query as the orderby in the second quick_request.
This fixes previously mentioned issues with orderby meta_value and by extension ordering by a column from a table that was in the original join. I can't reproduce the other issue I was having taxonomy queries so perhaps that was caused by something else.
Anyhow like @buch0090 we are running this on production on digitaltrends.com where it has significantly reduced the number of slow queries we get from uncached pages.
Is there anyway this can be pulled back into the 3.0 release as it did have a patch before the deadline, its just that it needed tweaking?
#30
@
15 years ago
- Keywords early added
Is there anyway this can be pulled back into the 3.0 release as it did have a patch before the deadline, its just that it needed tweaking?
I'd lean no -- changing functionality of query_posts() this late in the dev cycle is asking for trouble. Marking for early 3.1 consideration for now.
#31
@
15 years ago
Understandable.
Worth mentioning that this has been tested in a production environment on a large WordPress site (millions of page views a month) and we have so far seen nothing but performance improvements.
Also if it wasn't clear before patch query.5.diff was updated to work against current trunk.
Would love to see this in 3 (pretty please).
#32
follow-up:
↓ 37
@
15 years ago
Hey, just wanted to add that we have had this on one of our WPMU production sites for about 11 weeks now. No problems reported thus far. And we've seen tremendous reductions for long running queries and temp table sizes. The site has over 80k records in posts table and 750k in post meta.
Without this fix the site would probably tilt under high loads.
But I totally understand that this is late in the game and you are pretty much under code freeze. Just wanted to update, good to see it has worked for you as well willmot.
#34
in reply to:
↑ 33
@
15 years ago
- Cc popokolok added
- Version changed from 2.8.4 to 2.9.2
Hello, first of all I would like to say that this is a supreme work you guys did here, I was looking to fix this for quite a while now... However, I got a bit confused really and got lost in these patches and diff's up there - could anyone please post here or refer me to the most up-to-date complete and tested query.php file - Buch0090 and Willmot you both said you tested this and it works, could you please refer me to the file you use? Also, I use WP 2.9.2 - will this work then?
Thanks in advance!
#36
@
15 years ago
Yes, how do I patch the 2.9.2 query.php with this fix? Sorry for any cluelessness.... Appreciate any pointers.
#37
in reply to:
↑ 32
@
15 years ago
Could someone please post a complete, patched query.php? I'm running 2.9.2 and would greatly appreciate a fix for this; my install goes down at least once a day because of slow queries from query_post.
Also, shouldn't this be upgraded to "major"? I imagine a lot of WordPress installs out there are suffering from this same problem.
#38
@
14 years ago
Any word on if there's a version of the above patch for 3.0? A quick look at the diff above and query.php in 3.0 looks like that diff won't apply directly. Just thought I'd ask before attempting to work those changes into 3.0 in my setup.
Thanks
#39
@
14 years ago
- Cc rowan@… added
Hi guys,
I've been looking at this patch after we reviewed our slow query logs and found by far the biggest culprit to be these queries. The principle seems sound - manual testing of the split to two queries shows a marked performance improvement.
We're still on 2.9.2 (lots of custom plugins, and just waiting on further 3.0.x checks in-house) so I looked at both 10964.4.diff and query.5.diff and created a diff against 2.9.2 so we could test it.
A couple of comments on those previous patches:
10964.4.diff:
- The third argument to apply_filters when applying post-paging filters to the various query placeholders appears to be to differentiate between the standard and quick versions. On line 2254 after applying the patch, I think the "true" should be "false".
query.5.diff:
- In the same block of post-paging filters (!), on line 2379, the filter arguments are malformed and appear to be missing a leading "array( ".
- Line 2391 is using another malformed filter argument setup - see "$array( quick_distinct, &$this )".
When testing, we found a problem which is a little more serious, and should probably block the release of this for the time being. Basically, the $distinct, $fields, $join, $where and $groupby query arguments are split into two in this patch - one for the quick request, one for the standard request. However, they're all initialised blank at the top of the function, presumably so that they're just available for filtering - but certain query setups can modify some of these placeholders, and not the quick version. This can result in the two queries diverging; this also causes E_USER_NOTICES (from broken queries) if other variables which are NOT split by query ($orderby or $limits) are modified at the same time, especially when they apply to a new $join.
I propose a quick fix for this would be to move the instantiation of $quick_join, $quick_where, and $quick_groupby to just before the "Apply post-paging filters on where and join" block, and set them to copy the values of $join, $where and $groupby respectively. This maintains separate post-paging filters and caching-plugin filters, but ensures that the rest of the function logic is applied to both new queries.
I'm happy to submit patches for 2.9.2 (just for people patching old installations) and trunk using this new approach if people think this is the right route to take.
#42
@
14 years ago
(query.5.diff has another serious problem - it leaves the standard postcount block beneath the modified code, so I think post counts are all off in it.)
I've gone through the various patches here, and made some more fixes; we've published this to a couple of our live servers, and it's made a huge difference to slow queries being logged (and reduced page generation time a bit too).
I'll attach two patches - one against 2.9.2 (which is what we're on, and what I'm testing properly), and one against trunk (which uses the same approaches, and which I've confirmed at least runs!).
What's changed in these patches from the ones above?
- The problems I've mentioned above regarding typos, syntax errors, and duplicated code have been fixed.
- I like the idea of the extra argument being available for the filter hooks, so that's now available in the patches against trunk.
- Problems with non-LIMITed queries have been resolved
- Problems with certain query setups - meta setups, post statuses, author names, certain category queries - are resolved. This includes the problem that willmot describes above on 25th Feb, but couldn't subsequently reproduce.
- Problems with any queries involving GROUP BY have been fixed.
As I've mentioned, applying the 2.9.2 patch is looking really good for us. Some SQL_CALC_FOUND_ROWS still seem to be appearing - for example a LIMIT 0,1 which uses a filesort, I suspect unnecessarily - but this does improve the situation dramatically.
(For those above wondering how to apply the patch against 2.9.2 - copy the patch into your wordpress install folder, then on the command line type "patch -p0 < query.php.2.9.2.patch".)
I still have a few doubts about the setup of these patches, particularly the split between quick_placeholder and placeholder variables. Denis, in terms of compatibility, what's the best way to structure these? In my current version, the four quick_ equivalent variables are initially copied from their equivalent variables, before being passed into the filter hooks. Is this still backwards compatible?
It also results in the "fast" query looking like "SELECT * FROM … $quick_where AND $wpdb->posts.ID IN (…)". The query optimiser should mean this isn't significantly slower than just a posts.ID IN query, but the $where clause will by default be copied into $quick_where...
#43
@
14 years ago
Thanks,
I'm running the trunk patch now and so far so good. I'm not seeing any odd behavior at any rate. I'm still seeing SQL_CALC_FOUND_ROWS in queries but they would appear to be reduced and performance seems to be up a bit.
Thanks for the patch and I'll provide suggestions should I have any once I've had time to dig a little deeper.
#44
@
14 years ago
I happened to be writing a plugin which had to hook into the posts_fields filter today, and so spotted that all recent versions of this patch use an incorrect hardcoded fieldlist ("*") for the final request, instead of the normal filterable values. I'll upload updated versions of my patches above which address this.
#46
@
14 years ago
- Cc Klark0 added
Experiencing this problem on a site with 50k posts. How to apply this patch against 3.0.1? or not possible yet?
#47
@
14 years ago
@Klark0 - it looks like 3.01 is possible to patch with the "query.php.trunk.patch" patch. Assuming you're on a *nix server, copy it into the root of your WordPress installation, and then run:
patch -p0 < query.php.trunk.patch
(For safety's sake, you may wish to back up wp-includes/query.php first, so that you can swap the file back if any problems occur).
For the record, we've been using the 2.9.2 patch without a problem for the last few weeks - it's made a big difference.
#48
@
14 years ago
Yup, I have it running on my production site (running 3.0.1) for a day and so far it has improved performance greatly. Server load averaged at 1.2 all day, usually it's like 5.
There's still some slow queries though as someone mentioned before, but not nearly as many.
examples:
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) INNER JOIN wp_term_taxonomy ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id) WHERE 1=1 AND wp_term_taxonomy.taxonomy = 'category' AND wp_term_taxonomy.term_id IN ('77', '1', '58', '63', '64') AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 1; SELECT SQL_CALC_FOUND_ROWS wp_posts.ID FROM wp_posts WHERE 1=1 AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') ORDER BY wp_posts.post_date DESC LIMIT 0, 15;
@
14 years ago
I gave query.php.trunk.patch a test drive and noticed it broke some of my filters. Here is a modified patch based on the same revision that passes the WP_Query object reference to the applied filters. The patch from rowanbeentje removed the &$this
from both the posts_request
and found_posts
filters.
#52
@
14 years ago
- Keywords has-patch tested early removed
- Version changed from 2.8.4 to 3.0.1
great work, I was kicked out of different web hosting just because of the "CPU throttling" and this issue.
For applying a fix to my 3.0.1 I should only download the query.php from the main
http://core.trac.wordpress.org/browser/trunk/wp-includes/query.php
or should patch manually using query.php.r15490.patch ?
Also how can I patch using this file?
Sorry for the laziness :/
#53
@
14 years ago
- Keywords has-patch tested early added
- Version changed from 3.0.1 to 2.8.4
Adding back keywords.
No need to change version.
#54
follow-up:
↓ 55
@
14 years ago
- Keywords needs-refresh added; has-patch tested early removed
- Milestone changed from Awaiting Triage to Future Release
I don't see why we're setting $quick_blah to $blah, then passing both through the same filters... Why aren't we passing through the filters *once*, then duplicating the variables?
#55
in reply to:
↑ 54
@
14 years ago
Replying to nacin:
I don't see why we're setting $quick_blah to $blah, then passing both through the same filters... Why aren't we passing through the filters *once*, then duplicating the variables?
I believe it's so filters can apply different behaviour to quick and full - note the extra argument being passed to the filters.
I have no use for this myself, but I can envisage some potential uses for filters who want to only apply themselves to the full version to avoid the speed hit on counts where results wouldn't be used anyway.
#56
@
14 years ago
Okay -- I didn't see the extra argument.
Patch will need refresh against trunk. You will note that trunk has a singular filter for all clauses. I would rather see one of those for the quick_ clauses, versus the same filter applied over and over.
#57
follow-up:
↓ 58
@
14 years ago
Hello,
I was just curious if there was a patch against 3.0.4 at the moment?
Thanks
#58
in reply to:
↑ 57
@
14 years ago
Replying to shockdiode:
The last patch was against [15490], which is close to 3.0.1 by the looks of it, so you can probably apply that patch to 3.0.4.
#59
@
14 years ago
- Type changed from enhancement to feature request
- Version changed from 2.8.4 to 3.0.4
Please someone, explain, how to apply this patch? how to install it?
#60
@
14 years ago
- Type changed from feature request to enhancement
Tutorials can be found on the first page of trac:
#61
@
14 years ago
- Keywords has-patch added; needs-refresh removed
- Version changed from 3.0.4 to 2.8.4
#63
in reply to:
↑ 62
@
14 years ago
- Keywords needs-testing added
Replying to mozzer69:
Do we have to patch with the latest attachment only ?
Either of the latest two patches should work, though I've not been able to test them thoroughly, they're based off the other patches.
#65
follow-up:
↓ 66
@
14 years ago
If it had made it, this ticket would have been closed as fixed. Notice instead that it's set for "Future Release".
#66
in reply to:
↑ 65
@
14 years ago
scribu, thanks for the reply. New to the Trac system, didn't know how soon things get updated.
#67
follow-up:
↓ 68
@
14 years ago
Hello again,
Please I need clear comment...
I use wordpress from http://svn.automattic.com/wordpress/trunk/ with TortoiseSVN and I have few questions:
- As I understand, when I download and install updated version from /trunk/ it is wordpress new version (for example 3.1) but with a lot of patches and bug fixes and it is totally different from normal wordpress (for example 3.1) from here? (http://wordpress.org/download/)
- My site has 14000 + posts and 500-600 users online (in the evenings) and I had serious problems with site loading (which is described in the first post above). my question is, when I installed version from /trunk/ does It already have all this updates and patches, which are given here in this ticket, or should I install them manually? (for example, is patch "query.php.r15490.patch" already installed in http://svn.automattic.com/wordpress/trunk/ last version?
my site is now hosted on private hosting with 4 CPU-s and 8 GB of memory and sometimes it's loading 50-90 seconds...
Wish you best
#68
in reply to:
↑ 67
@
14 years ago
Replying to bakoline:
- 3.1 is taken from a specific revision of trunk, r17483, so it's not too different from trunk at the moment (the difference in revisions is inflated by changes being applied twice). The only patch to have been committed since looks like the one for #16622. See the Timeline to see what's been committed when.
- Trunk only has patches which have been committed by the core committers (see scribu's comment above), so you need to apply this patch manually for now. As mentioned above, I've not got an install with heavy load to test this with, so please mention any issues (my patches are based on the previous patches however, with a bug fix added).
#70
follow-up:
↓ 73
@
14 years ago
- Version changed from 2.8.4 to 3.1
Hello ALL,
I'm having the same issue on my site, using Wordpress 3.1 with 23k of post. We're using the original query.php from the original 3.1 package
My question is, what patch should i use to fix this problem?, should i edit the file query.php of 3.1 or i should download the first query.php of this ticket and start modifications? is working ok on 3.1?
I edited my original query.php 3.1 with the latests patch query.php.r17389.diff and query.php.r17389.nacin.diff but we lost the pagination
If anyone is using a fixed query.php and is working in Wordpress 3.1, can share the file?
Greetings!
#71
@
14 years ago
- Version changed from 3.1 to 2.8.4
Please do not change the version. The version is used to track when the bug was introduced/reported.
#73
in reply to:
↑ 70
@
14 years ago
Replying to kraciboy:
I had a look at the patch again and it turns out that I forgot to remove a block of code that the other patches had, so paging should be fixed in the latest one.
All patches are against SVN trunk (or should be), so it'll apply cleanly against the trunk revision I wrote it against (r17522) and may work against others. The original query.php
in this ticket will be based on an old version of WordPress, so don't use that.
#74
@
14 years ago
People, this ticket is a mess.
If you want to see it in core at some point, you need to answer these two questions:
1) How much of a performance gain does this really give us? Some clear benchmarks would really help.
2) How can we prove it doesn't break plugins that alter the WP_Query SQL?
#75
@
14 years ago
I haven't got the time or resources to answer either of scribu's questions, so I leave it to those who need this patch to resolve them.
#76
@
14 years ago
- Cc TonyTheGreat added
1) Lots of real world posts lack of true benchmarking it seems. http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ is an old example of a case with some quick testing.
You take the example at the start if it's not sitting in memory it's going to be slow. This is a problem with the larger datasets. You use a count then a select you don't see the same sort of performance degradation with larger datasets due to the index lookup for the row count.
We do web hosting and obviously a lot of our customers run wordpress. 95% of all slow queries (10s+) are wordpress and it's use of SQL_CALC_FOUND_ROWS queries. All with datasets where post amounts are around 5000 or more.
These queries are very problematic for users still using a shared environment. It's unlikely their queries will always be in memory. So as a result a lot more of the page loads are going to see slow load times from these queries not being in memory.
For those not in shared environments they can still see issues if there is a lot of changes in the posts table causing it to be flushed from memory.
2) I really cannot comment on the problems it may cause. Right now though leaving this problem is making wordpress look bad. So needs to be addressed if it's going to require some plugins to make some changes so be it. Longer it sits the more plugins it can potentially affect when it is changed.
#77
@
14 years ago
If I understand this patch correctly, what it does is:
- Fetch *all* the post ids and then
- Fetch the rest of the rows using those IDs.
This can cause segfaults if there are really many posts. If not all posts are fetched, then counting is off, so this approach doesn't seem too good.
If SQL_CALC_FOUND_ROWS is the problem, then why not do this:
- Do the query, without SQL_CALC_FOUND_ROWS
- Do the query, with COUNT(*) and no limits, to get the total post count.
This is how we handle term and comment count, and I've seen no complaints.
More importantly, this would be a lot easier on plugins.
#78
follow-up:
↓ 79
@
14 years ago
Please test simple.10964.diff. Posting back specific timings along with as many details as possible would be awesome.
#79
in reply to:
↑ 78
@
14 years ago
Replying to scribu:
Please test simple.10964.diff. Posting back specific timings along with as many details as possible would be awesome.
I don't have time to write a patch at the moment, but I would like to suggest checking if the found_posts_query was empty before running the get_var. I've been playing around with caching the found posts results and have been seeing major performance improvements.
#80
@
14 years ago
Empty queries are automatically ignored by $wpdb. Notice that the main query isn't checked either.
#81
@
14 years ago
While I originated this ticket, I apologize that I'm not sure what is needed now in order to get this into core...And like others, I'm strapped for time but I can make time given a clear direction.
As far as performance gains, we have 25 WPMU 2.9.2 sites running all with query_posts patched. Without this query fix our sites would never stay up...Each 'blog' averages between 15k to 120k posts and 50k to 750k post meta records. What other benchmarks are you looking for? I'm unsure how far back our graphs go...
All of our sites have had this patch for over 6 to 12 months.
@scribu I'm definitely open to alternatives if it provides equal performance improvements.
#82
@
14 years ago
@scribu I'm definitely open to alternatives if it provides equal performance improvements.
Well, then please test the latest patch (simple.10964.diff) and let us know how it compares to the previous patch and to an unpatched install. Sharing those graphs (anonymized preferably) would be nice.
#84
@
14 years ago
scribu's patch can be taken a step further and lazy load the found posts query result. For the most part, the found posts result is only needed for the main loop where paging is involved, yet, most developers don't know to add the 'no_found_rows' parameter to every single WP_Query request where they don't need paging. It would require creating a magic get() method for backwards compatibility, but it can throw a deprecated warning and switch to a real getter implementation.
#86
@
14 years ago
- Cc martin.widmann@… added
@scribu: I wanted to give some feedback on the patch you submitted. I tried it out as we are currently facing some very strange issues related to the SQL_CALC_FOUND_ROWS
which caused high load on the servers.
The problem with the patch is that count(*)
and group by
when used together can create more than one rows of result, making the returned number of rows incorrect. In that case you'd have to count the rows returned.
Because there's no easy way for me to fix this in core I created a couple of filters to add to a mu-plugin:
add_filter( 'posts_clauses', 'dh_store_last_post_clauses', 10, 2 ); add_filter( 'query', 'dh_patch_10964', 1 ); function dh_store_last_post_clauses( $clauses, $wp_query ) { global $last_post_clauses; $last_post_clauses = $clauses; return $clauses; } function dh_patch_10964( $query ) { if ( strpos( $query, 'SQL_CALC_FOUND_ROWS' ) ) { add_filter( 'found_posts_query', 'dh_patch_10964_phase2', 10, 2 ); $query = str_replace( 'SQL_CALC_FOUND_ROWS', '', $query ); } return $query; } function dh_patch_10964_phase2( $query, $wp_query ) { global $wpdb, $last_post_clauses; remove_filter( 'found_posts_query', __FUNCTION__ ); if ( is_array( $last_post_clauses ) ) { $where = ''; $groupby = ''; $orderby = ''; $join = ''; $pieces = array( 'where', 'groupby', 'join', 'orderby', 'distinct', 'fields', 'limits' ); foreach ( $pieces as $piece ) $$piece = isset( $last_post_clauses[ $piece ] ) ? $last_post_clauses[ $piece ] : ''; if ( !empty($groupby) ) $groupby = 'GROUP BY ' . $groupby; if ( !empty( $orderby ) ) $orderby = 'ORDER BY ' . $orderby; $query = "SELECT count(ID) FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby"; if ( !empty($groupby) ) { $counts = $wpdb->get_results( $query, ARRAY_N ); $rows = sizeof( $counts ); $query = "SELECT $rows cnt"; } } return $query; }
It seems a little bit hack-ish to me though.
Explanation what it does:
- I store the last post_clauses to the
$last_post_clauses
global in order to have access to this data in the count statement - I look at each
query
if it contains theSQL_CALC_FOUND_ROWS
command and remove it from there after registering a filter forfound_posts_query
- in the
found_posts_query
filter I check the clauses and create the count statement. If there's a groupby I execute the query there and return a helper sql statement that just returns the number of items instead of counting. This way the normal behavior is tricked and the stuff works.
#87
@
14 years ago
Ah, I forgot about GROUP BY. Thanks for pointing that out.
So, does this hack yield better performance than using SQL_CALC_FOUND_ROWS after all?
#88
@
14 years ago
Acutally it doesn't help once a GROUP BY is involved. With no GROUP BY it's about 4-5 times less time consuming on the db, though. Therefore it still would be a good thing to do. You'd need to check for $groupby being st though.
As the speed increase for unfiltered queries (e.g. the loop on the frontpage or the edit posts page in the admin) is substantial, I'd consider applying a modified version nevertheless.
My mu-plugin now looks like this:
add_filter( 'posts_clauses', 'dh_store_last_post_clauses', 10, 2 ); add_filter( 'query', 'dh_patch_10964', 1 ); function dh_store_last_post_clauses( $clauses, $wp_query ) { global $last_post_clauses; $last_post_clauses = $clauses; return $clauses; } function dh_patch_10964( $query ) { global $last_post_clauses; if ( is_array( $last_post_clauses) && empty( $last_post_clauses['groupby'] ) ) { if ( strpos( $query, 'SQL_CALC_FOUND_ROWS' ) ) { add_filter( 'found_posts_query', 'dh_patch_10964_phase2', 10, 2 ); $query = str_replace( 'SQL_CALC_FOUND_ROWS', '', $query ); } } return $query; } function dh_patch_10964_phase2( $query, $wp_query ) { global $wpdb, $last_post_clauses; remove_filter( 'found_posts_query', __FUNCTION__ ); if ( is_array( $last_post_clauses ) ) { $where = ''; $groupby = ''; $orderby = ''; $join = ''; $pieces = array( 'where', 'groupby', 'join', 'orderby', 'distinct', 'fields', 'limits' ); if ( empty($groupby) ) { foreach ( $pieces as $piece ) $$piece = isset( $last_post_clauses[ $piece ] ) ? $last_post_clauses[ $piece ] : ''; if ( !empty( $orderby ) ) $orderby = 'ORDER BY ' . $orderby; $query = "SELECT count(ID) FROM $wpdb->posts $join WHERE 1=1 $where $orderby"; } } return $query; }
I took the freedom to create a patch with the changes.
#89
@
14 years ago
simple_modified.10964.diff looks like a winner.
Tempted to move to 3.2 milestone, but I think it needs more testing.
Also, as you demonstrated, it can be done without modifying core.
#90
@
14 years ago
I didn't like the idea of queries with a group by taking a different path when handling the found rows than other queries.
With initial tests, however, I'm actually seeing worse results with the simple*.10964.diff versions than current core, though, the fact that they open up the ability to cache found rows queries across multiple pages and even lazy load them, I think it may work out better in the long run.
#91
@
13 years ago
The last version of this patch, I applied 9 months ago worked beautifully. My MYSQL loads went way down on a site with 75,000 posts, 500,000 page views per day.
Now I've had to upgrade to 3.2 and I completely forgot that I was running the patch. (I'm not a website admin by trade, and i am learning all this as I go)
I applied simple_modified.10964.diff and its not working as the old patch did. Loads are still up and it spikes when editing or publishing a new post and when the homepage is being generated for a logged in user.
Just wanted to give my experience with it and sorry i can't provide technical details you guys might need.
So I guess I know I have to look back at the old patch and manually apply it.
Thanks.
#92
@
13 years ago
d_kc, do you have the ability to get some performance data to share? I attempted to, but wasn't getting much difference in performance between core, query.php.r15490.patch, and the latest patch. I have a feeling that this depends heavily on server resources. On our old cluster we were getting huge performance gains using query.php.r15490.patch, but we don't see a difference on our new cluster.
Based on our previous results, I feel like the original query.php.r15490.patch query is going to have the best performance on a good percentage of systems, but can no longer produce the performance data to show it.
#93
@
13 years ago
i gauge performance on whether my server crashes or not. :D
With the old patch, speed was great. According to W3 total cache, initial page build time was 4 - 10 second max. Although as scribu said above, there were segment faults which I learned to live with for the last year or so just for the performance gains. Server loads were constant between 1.5 - 3
With the latest patches the segment faults seem to be fixed, but page build time goes as high 44 seconds and sometimes not all. Server load is contantly at 8-10.
Seriously considering splitting our site into two installs of wordpress. Only the last 50 posts get the majority of traffic, while the remaining 75k are 4 years worth of archives. Its insane that Wordpress has to go through all those 75k posts, to display data for these few 50.
#94
@
13 years ago
i know alot of people are following this. so i just want to share that my site is stable now on 3.2 with simple_modified.10964.diff and also this patch -> https://core.trac.wordpress.org/ticket/16706
#95
@
13 years ago
I'm on Wordpress 3.2.1 with over 14k posts and having major problem as described above that the mysqld server goes up to 100% CPU and sometimes crashes. There was no problem on 2.9.2 which I upgraded from a few weeks back.
- Could someone point me to a correct query.php without me having to understand how to make diff etc to the original file?
- What is the status of this ticket, will it be fixed for the next version?
Cheers,
Jens
#96
@
13 years ago
The status of this ticket is that there's no clear data on which approach is best, only anecdotal evidence.
I tend to like the approach in simple_groupby_modified.10964.diff, but it could be cleaned up a little.
#97
@
13 years ago
@scribu: Thanks for the reply, I applied the patch to my query.php and it seems to work kind of. The site feels more snappier but I can see that my mysqld process is very high (100%). If I run mytop I see that there is a few "Query SELECT wp_posts.* FROM wp_posts JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta....." and the same number of "Sleep" with the same time. The good thing (just tried a few minutes is that the server don't hang and I get an 502.
Update: well, I now get 504 Gateway Time-out instead. So it did not work that well, hmm
Update 2: Here is one of the query that seem to be the problem, I think the problem is related to the
AND meta_value IN ('Distillery','Bottling')
SELECT wp_posts.* FROM wp_posts JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1 AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') AND wp_posts.ID IN (
SELECT post_id
FROM wp_postmeta
WHERE meta_key = 'bottler'
AND meta_value IN ('Distillery','Bottling')
GROUP BY post_id
HAVING COUNT(*) >= 2
) GROUP BY wp_posts.ID HAVING COUNT(*) = 1 ORDER BY wp_posts.post_date DESC LIMIT 40, 10
#98
@
13 years ago
Had a go at simplifying simple_groupby_modified.10964.diff by reducing duplicated code and similar variable names.
#99
@
13 years ago
- Cc milez added
- Version changed from 2.8.4 to 3.1
Thanks kawauso I have applied simple_groupby_modified_simpler.10964.diff but seems to have no effect on my large database. Mysqld still hogging CPU on hits to index.php. Can you confirm your patch will also work on WP v3.1 ? Or perhaps someone knows what version patch will work for v3.1?
#101
@
13 years ago
- Cc graffics added
Okay I have been studying this thread for a couple days now and having trouble finding out witch patch goes with which version of Wordpress. Can anyone clearly state which patch above works the best with what version?
#104
@
13 years ago
I also got some performance issues on a site.
I wanted to share some tests i have made.
The query is :
SELECT SQL_CALC_FOUND_ROWS posts.* FROM posts INNER JOIN term_relationships ON (posts.ID = term_relationships.object_id) WHERE 1=1 AND ( term_relationships.term_taxonomy_id IN (9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55) ) AND posts.post_type = 'post' AND (posts.post_status = 'publish') GROUP BY posts.ID ORDER BY posts.post_date DESC LIMIT 0, 10;
SQL_CALC_FOUND_ROWS is not really the problem, the problem is that mysql is creating a temporary table with 200k rows on disk.
When you do the same request with
SELECT SQL_CALC_FOUND_ROWS posts.ID
instead of
SELECT SQL_CALC_FOUND_ROWS posts.*
You can see that the temporary table is created in Ram instead of disk.
Because posts table contains columns with text fields, mysql always create temporary tables with posts.* on disk.
- You can verify this, by doing before and after your query :
mysql> show global status like 'Created_tmp_disk_tables';
For this test to be effective :
- add SQL_NO_CACHE to your query to ensure you're not using query_cache.
- ensure tmp_table_size and max_heap_table_size are sized to permit temporary table to fit in ram.
- Of course don't do that on loaded mysqld server, you should be the only one using the server for this test to be valid
You'll see that Created_tmp_disk_tables value will not increase using posts.ID instead of posts.*
#105
@
13 years ago
- Cc asannad added
Attached( query.php_2012-01-05.patch ) is the patch against trunk that I have been using in our live site. I have used the idea(and older patches) of selecting the ids first and then getting all fields for those ids. The major problem is the write IO because of the temporary table thats getting created, attached is the graph( Write-IO-Graph.gif ) showing write IO with patch and without patch.
wp-includes/query.php