WordPress.org

Make WordPress Core

Opened 5 years ago

Closed 2 years ago

#10964 closed enhancement (duplicate)

Improving query_posts performance

Reported by: buch0090 Owned by:
Milestone: Priority: normal
Severity: normal Version: 2.8.4
Component: Performance Keywords: has-patch needs-testing
Focuses: Cc:

Description

We have a blog with over 80k posts and 750k records in post meta table. Noticed several inefficient queries that slowed down the home page of a theme with several widgets using the query_posts function.

Two examples:
SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;

and

SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts INNER JOIN wp_1_term_relationships ON (wp_1_posts.ID = wp_1_term_relationships.object_id) INNER JOIN wp_1_term_taxonomy ON (wp_1_term_relationships.term_taxonomy_id = wp_1_term_taxonomy.term_taxonomy_id) WHERE 1=1 AND wp_1_term_taxonomy.taxonomy = 'category' AND wp_1_term_taxonomy.term_id IN ('21', '49', '53', '488', '91', '107', '111', '136', '202', '209', '490') AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') GROUP BY wp_1_posts.ID ORDER BY wp_1_posts.post_date DESC LIMIT 0, 4

Changed wp-includes>query.php get_posts function.

Starting at line 2251...

BELOW CODE MODIFIED BY DAVE BUCHANAN, SPLIT INTO TWO QUERIES /

if ( !empty($limits) )

$found_rows = 'SQL_CALC_FOUND_ROWS';

/ FIRST SELECT JUST IDs
$tmp = " SELECT $distinct $wpdb->posts.id FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits";
$tmp_q = $wpdb->get_col($tmp);
$tmp_ids = implode($tmp_q,',');
/ NOW NORMAL SELECT WHERE ID IS IN FIRST QUERY LIST
$this->request = " SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE $wpdb->posts.id IN ($tmp_ids) $groupby $orderby ";
if ( !$qsuppress_filters? )

$this->request = apply_filters('posts_request', $this->request);


$this->posts = $wpdb->get_results($this->request);

Let me know if you require further information or anything.

Thanks,
Dave Buchanan
Dolan Media Company

Attachments (18)

query.php (51.2 KB) - added by buch0090 5 years ago.
wp-includes/query.php
10964.diff (6.9 KB) - added by Denis-de-Bernardy 5 years ago.
query.2.php (70.0 KB) - added by buch0090 4 years ago.
10964.2.diff (1.8 KB) - added by ryan 4 years ago.
query.2.php diffed against the 2.8 branch
10964.4.diff (6.9 KB) - added by ryan 4 years ago.
10964.diff tweaked to fix queries without limits
query.5.diff (7.3 KB) - added by willmot 4 years ago.
Fixes orderby issues
query.php.2.9.2.patch (7.4 KB) - added by rowanbeentje 4 years ago.
Updated patch against 2.9.2
query.php.trunk.patch (8.8 KB) - added by rowanbeentje 4 years ago.
Updated patch against trunk (r15490)
query.php.r15490.patch (9.0 KB) - added by jeffstieler 4 years ago.
I gave query.php.trunk.patch a test drive and noticed it broke some of my filters. Here is a modified patch based on the same revision that passes the WP_Query object reference to the applied filters. The patch from rowanbeentje removed the &$this from both the posts_request and found_posts filters.
query.php.r17389.diff (9.8 KB) - added by kawauso 3 years ago.
Patch against r17389, haven't implemented nacin's suggestion, fixed !$post_ids check
query.php.r17389.nacin.diff (6.5 KB) - added by kawauso 3 years ago.
Implement clause filter against r17389
query.php.r17522.diff (7.2 KB) - added by kawauso 3 years ago.
Patch against r17522, removes code block that broke paging
simple.10964.diff (1.3 KB) - added by scribu 3 years ago.
simple_modified.10964.diff (1.4 KB) - added by mwidmann 3 years ago.
simple_groupby_modified.10964.diff (1.7 KB) - added by prettyboymp 3 years ago.
simple_groupby_modified_simpler.10964.diff (1.6 KB) - added by kawauso 3 years ago.
Simplified version of simple_groupby_modified
query.php_2012-01-05.patch (11.6 KB) - added by asannad 2 years ago.
Write-IO-Graph.gif (26.5 KB) - added by asannad 2 years ago.

Download all attachments as: .zip

Change History (125)

buch00905 years ago

wp-includes/query.php

comment:1 buch00905 years ago

First post was a little jumbled, here are code changes..

In wp-includes/query.php, starting at line 2251...

BELOW CODE MODIFIED BY DAVE BUCHANAN, SPLIT INTO TWO QUERIES

if ( !empty($limits) )

$found_rows = 'SQL_CALC_FOUND_ROWS';

/ FIRST SELECT JUST IDs
$tmp = " SELECT $distinct $wpdb->posts.id FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby $limits";
$tmp_q = $wpdb->get_col($tmp);
$tmp_ids = implode($tmp_q,',');

/ NOW NORMAL SELECT WHERE ID IS IN FIRST QUERY LIST
$this->request = " SELECT $found_rows $distinct $fields FROM $wpdb->posts $join WHERE $wpdb->posts.id IN ($tmp_ids) $groupby $orderby ";

if ( !$qsuppress_filters? )
$this->request = apply_filters('posts_request', $this->request);

$this->posts = $wpdb->get_results($this->request);

comment:2 ryan5 years ago

Won't that break paging since SQL_CALC_FOUND_ROWS is done on a query limited by IN rather than LIMIT?

comment:3 scribu5 years ago

Might I ask why those two separate queries would be faster than a single query?

comment:4 buch00905 years ago

Ryan,
My bad, it does break the part for displaying the page links...IE. 1 2 3 ...100 101. I will look into this, perhaps it was premature for me to post this, was just excited to contribute, thanks.

Scribu,
The main reason is eliminating the giant table scans where it returns all fields in posts table for all rows.

SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;

Becomes...

SELECT SQL_CALC_FOUND_ROWS wp_1_posts.ID FROM wp_1_posts WHERE 1=1 AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;

then..

SELECT SQL_CALC_FOUND_ROWS wp_1_posts.* FROM wp_1_posts WHERE wp_1_posts.id IN ( [LIST OF IDS FROM PREV QUERY]) ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10;

comment:5 follow-up: scribu5 years ago

So, you're first selecting the IDs and then getting all fields for those ids, instead of returning all the fields only for the desired posts directly.

That doesn't seem a good optimisation since the SELECT clause is evaluated last, after the rows have been filtered. Or does this have something to do with indexes?

Do you have any performance tests? If it's indeed faster, it should be even faster if you make it into a subquery (WP 2.9 requires MySQL 4.1 => subqueries allowed).

comment:6 in reply to: ↑ 5 ; follow-up: buch00905 years ago

Replying to scribu:

So, you're first selecting the IDs and then getting all fields for those ids, instead of returning all the fields only for the desired posts directly.

That doesn't seem a good optimisation since the SELECT clause is evaluated last, after the rows have been filtered. Or does this have something to do with indexes?

Do you have any performance tests? If it's indeed faster, it should be even faster if you make it into a subquery (WP 2.9 requires MySQL 4.1 => subqueries allowed).

Load time of uncached home page that utilizes about 15 calls to query_posts function went down considerably. From about a minute to under 10 seconds.

Not sure I'm following you concerning subquery, you would still do a select of all fields/all rows THEN do a subquery? This fix only has the post ID when selecting all rows...then a second query only selects from the IDs of the first query.

comment:7 willmot5 years ago

  • Cc willmot added

comment:8 joehoyle5 years ago

  • Cc joehoyle added

comment:9 Denis-de-Bernardy5 years ago

there's another potential issue, I think... if a plugin adds a having clause on posts_request that uses calculated fields rather than raw field names, then this optimization may very well end up breaking the query.

comment:10 in reply to: ↑ 6 scribu5 years ago

Replying to buch0090:

Not sure I'm following you concerning subquery, you would still do a select of all fields/all rows THEN do a subquery? This fix only has the post ID when selecting all rows...then a second query only selects from the IDs of the first query.

I meant put the first query as a subquery of the second. That way, you don't have so much back-and-forth between the DB and PHP.

Denis-de-Bernardy5 years ago

comment:11 Denis-de-Bernardy5 years ago

@Buch0090: Please try the attached patch. It:

  1. Implements your suggestion while ensuresing that pagination keeps working; and
  2. Maintains backward compatibility for plugins that use query-related hooks (example)

@Ryan: I was initially leaning towards introducing new hooks, but backward compatibility considerations led me to rethink the whole idea and pass an extra argument instead: I'm added true for the quicker query that actually fetches rows, and false for the one that only fetches IDs. That way, plugins can potentially fetch if the second argument that gets passed is not empty and bail if they want.

The order_by clause was left unchanged: if a plugin hooks into that, it might be that it's adding table joins or calculated fields with a having clause, and the order clause would end up the same in both queries. as for the limit clause, it's not needed for the quick query, so no point in filtering that one twice.

comment:12 Denis-de-Bernardy5 years ago

my above patch needs more work. it caused #10969 for some reason.

comment:13 follow-up: buch00904 years ago

Ok I've modified my original version so paging works...(see attached) please let me know if anybody runs into other issues. my site isn't using custom fields for get posts. and definitely isn't using the below issue Denis pointed out..

...

there's another potential issue, I think... if a plugin adds a having clause on
posts_request that uses calculated fields rather than raw field names, then this
optimization may very well end up breaking the query.

buch00904 years ago

comment:14 ryan4 years ago

  • Milestone changed from 2.9 to 3.0

comment:15 mattrude4 years ago

  • Cc m@… added

comment:16 in reply to: ↑ 13 ; follow-up: hakre4 years ago

  • Keywords needs-patch reporter-feedback added; query_posts performance removed

Replying to buch0090:

Ok I've modified my original version so paging works...(see attached)

We need this in form of a real patch, otherwise unable to review. Please provide a patch.

comment:17 in reply to: ↑ 16 scribu4 years ago

Replying to hakre:

Replying to buch0090:

Ok I've modified my original version so paging works...(see attached)

We need this in form of a real patch, otherwise unable to review. Please provide a patch.

To that end, read this.

comment:18 ryan4 years ago

See also #10576

ryan4 years ago

query.2.php diffed against the 2.8 branch

ryan4 years ago

10964.diff tweaked to fix queries without limits

comment:19 buch00904 years ago

Hey all,
My bad for losing track of this change. This is my first post to trac. Thanks Ryan for doing the patch. Ryan your changes work perfect, much improved from my sloppiness.

Anything else that I can help with here?

Thanks.

comment:20 scribu4 years ago

  • Keywords has-patch added; needs-patch reporter-feedback removed

comment:21 scribu4 years ago

You can do some testing on ryan's patch.

comment:22 buch00904 years ago

I have this on our staging environment and looks good. here's an example of 3 queries generated by a custom widget to display recent posts...I'll check in next week, plan on having this on production by Monday, using WPMU 2.8.5.

Time: 0.018220901489258
Query: SELECT SQL_CALC_FOUND_ROWS wp_1_posts.ID FROM wp_1_posts WHERE 1=1 AND wp_1_posts.ID IN (136362,136280,136334,136392,136292,136355,136282) AND wp_1_posts.post_type = 'post' AND (wp_1_posts.post_status = 'publish' OR wp_1_posts.post_status = 'private') ORDER BY wp_1_posts.post_date DESC LIMIT 0, 10
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts

# Time: 0.00015687942504883
Query: SELECT FOUND_ROWS()
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts

# Time: 0.00025701522827148
Query: SELECT wp_1_posts.* FROM wp_1_posts WHERE 1 = 1 AND wp_1_posts.ID IN ( 136392,136355,136334,136282,136280 ) ORDER BY wp_1_posts.post_date DESC
Call from: require, require_once, include, dynamic_sidebar, call_user_func_array, WP_Widget->display_callback, dmc_top_stories_simple_widget->widget, query_posts, WP_Query->query, WP_Query->get_posts

comment:23 buch00904 years ago

FYI, this has worked fine on production. No long queries (similar to before) or anything after 1 week.

comment:24 scribu4 years ago

  • Keywords tested added

comment:25 willmot4 years ago

Current patch has issues with taxonomy/term queries.

a query like

WP_Query( 'taxonomy=person&term=dave' );

Causes an SQL error [Unknown column 'p2.post_status' in 'where clause']

This is caused because on line 1940

$post_status_join = true;

Then lines 2119 - 2123

if ( $post_status_join ) {
	$join .= " LEFT JOIN $wpdb->posts AS p2 ON ($wpdb->posts.post_parent = p2.ID) ";
	foreach ( $statuswheres as $index => $statuswhere )
		$statuswheres[$index] = "($statuswhere OR ($wpdb->posts.post_status = 'inherit' AND " . str_replace($wpdb->posts, 'p2', $statuswhere) . "))";
}

The $where is passed on to the quick request but the join is not and so we end up trying to query a column from a table we haven't joined to.

I haven't patched as I am not totally sure why the taxonomy code block sets $post_status_join = true;, my solution would be simply to remove line 1940

comment:26 willmot4 years ago

Also breaks querying by post_meta

WP_Query( 'meta_key=score&orderby=meta_value' );

Causes [Unknown column 'wp_postmeta.meta_value' in 'order clause']

Again because the quick_request is being passed $order but not $join

comment:27 scribu4 years ago

  • Keywords needs-patch added; has-patch tested removed

comment:28 nacin4 years ago

  • Milestone changed from 3.0 to Future Release

willmot4 years ago

Fixes orderby issues

comment:29 willmot4 years ago

  • Keywords has-patch added; needs-patch removed
  • Milestone changed from Future Release to 3.1

Attached patch fixes the $orderby issues by using the order of the ID's from the first query as the orderby in the second quick_request.

This fixes previously mentioned issues with orderby meta_value and by extension ordering by a column from a table that was in the original join. I can't reproduce the other issue I was having taxonomy queries so perhaps that was caused by something else.

Anyhow like @buch0090 we are running this on production on digitaltrends.com where it has significantly reduced the number of slow queries we get from uncached pages.

Is there anyway this can be pulled back into the 3.0 release as it did have a patch before the deadline, its just that it needed tweaking?

comment:30 nacin4 years ago

  • Keywords early added

Is there anyway this can be pulled back into the 3.0 release as it did have a patch before the deadline, its just that it needed tweaking?

I'd lean no -- changing functionality of query_posts() this late in the dev cycle is asking for trouble. Marking for early 3.1 consideration for now.

comment:31 willmot4 years ago

Understandable.

Worth mentioning that this has been tested in a production environment on a large WordPress site (millions of page views a month) and we have so far seen nothing but performance improvements.

Also if it wasn't clear before patch query.5.diff was updated to work against current trunk.

Would love to see this in 3 (pretty please).

comment:32 follow-up: buch00904 years ago

Hey, just wanted to add that we have had this on one of our WPMU production sites for about 11 weeks now. No problems reported thus far. And we've seen tremendous reductions for long running queries and temp table sizes. The site has over 80k records in posts table and 750k in post meta.

Without this fix the site would probably tilt under high loads.

But I totally understand that this is late in the game and you are pretty much under code freeze. Just wanted to update, good to see it has worked for you as well willmot.

comment:33 follow-up: scribu4 years ago

  • Keywords tested added

comment:34 in reply to: ↑ 33 popokolok4 years ago

  • Cc popokolok added
  • Version changed from 2.8.4 to 2.9.2

Hello, first of all I would like to say that this is a supreme work you guys did here, I was looking to fix this for quite a while now... However, I got a bit confused really and got lost in these patches and diff's up there - could anyone please post here or refer me to the most up-to-date complete and tested query.php file - Buch0090 and Willmot you both said you tested this and it works, could you please refer me to the file you use? Also, I use WP 2.9.2 - will this work then?

Thanks in advance!

comment:35 michaelh4 years ago

  • Version changed from 2.9.2 to 2.8.4

No need to change the version.

comment:36 Rand HOPPE4 years ago

Yes, how do I patch the 2.9.2 query.php with this fix? Sorry for any cluelessness.... Appreciate any pointers.

comment:37 in reply to: ↑ 32 lumilux4 years ago

Could someone please post a complete, patched query.php? I'm running 2.9.2 and would greatly appreciate a fix for this; my install goes down at least once a day because of slow queries from query_post.

Also, shouldn't this be upgraded to "major"? I imagine a lot of WordPress installs out there are suffering from this same problem.

comment:38 shockdiode4 years ago

Any word on if there's a version of the above patch for 3.0? A quick look at the diff above and query.php in 3.0 looks like that diff won't apply directly. Just thought I'd ask before attempting to work those changes into 3.0 in my setup.

Thanks

comment:39 rowanbeentje4 years ago

  • Cc rowan@… added

Hi guys,

I've been looking at this patch after we reviewed our slow query logs and found by far the biggest culprit to be these queries. The principle seems sound - manual testing of the split to two queries shows a marked performance improvement.

We're still on 2.9.2 (lots of custom plugins, and just waiting on further 3.0.x checks in-house) so I looked at both 10964.4.diff and query.5.diff and created a diff against 2.9.2 so we could test it.

A couple of comments on those previous patches:

10964.4.diff:

  • The third argument to apply_filters when applying post-paging filters to the various query placeholders appears to be to differentiate between the standard and quick versions. On line 2254 after applying the patch, I think the "true" should be "false".

query.5.diff:

  • In the same block of post-paging filters (!), on line 2379, the filter arguments are malformed and appear to be missing a leading "array( ".
  • Line 2391 is using another malformed filter argument setup - see "$array( quick_distinct, &$this )".

When testing, we found a problem which is a little more serious, and should probably block the release of this for the time being. Basically, the $distinct, $fields, $join, $where and $groupby query arguments are split into two in this patch - one for the quick request, one for the standard request. However, they're all initialised blank at the top of the function, presumably so that they're just available for filtering - but certain query setups can modify some of these placeholders, and not the quick version. This can result in the two queries diverging; this also causes E_USER_NOTICES (from broken queries) if other variables which are NOT split by query ($orderby or $limits) are modified at the same time, especially when they apply to a new $join.

I propose a quick fix for this would be to move the instantiation of $quick_join, $quick_where, and $quick_groupby to just before the "Apply post-paging filters on where and join" block, and set them to copy the values of $join, $where and $groupby respectively. This maintains separate post-paging filters and caching-plugin filters, but ensures that the rest of the function logic is applied to both new queries.

I'm happy to submit patches for 2.9.2 (just for people patching old installations) and trunk using this new approach if people think this is the right route to take.

comment:40 shockdiode4 years ago

  • Cc shockdiode added

comment:41 shockdiode4 years ago

I'd be more than happy to test a patch to trunk.

comment:42 rowanbeentje4 years ago

(query.5.diff has another serious problem - it leaves the standard postcount block beneath the modified code, so I think post counts are all off in it.)

I've gone through the various patches here, and made some more fixes; we've published this to a couple of our live servers, and it's made a huge difference to slow queries being logged (and reduced page generation time a bit too).

I'll attach two patches - one against 2.9.2 (which is what we're on, and what I'm testing properly), and one against trunk (which uses the same approaches, and which I've confirmed at least runs!).

What's changed in these patches from the ones above?

  • The problems I've mentioned above regarding typos, syntax errors, and duplicated code have been fixed.
  • I like the idea of the extra argument being available for the filter hooks, so that's now available in the patches against trunk.
  • Problems with non-LIMITed queries have been resolved
  • Problems with certain query setups - meta setups, post statuses, author names, certain category queries - are resolved. This includes the problem that willmot describes above on 25th Feb, but couldn't subsequently reproduce.
  • Problems with any queries involving GROUP BY have been fixed.

As I've mentioned, applying the 2.9.2 patch is looking really good for us. Some SQL_CALC_FOUND_ROWS still seem to be appearing - for example a LIMIT 0,1 which uses a filesort, I suspect unnecessarily - but this does improve the situation dramatically.

(For those above wondering how to apply the patch against 2.9.2 - copy the patch into your wordpress install folder, then on the command line type "patch -p0 < query.php.2.9.2.patch".)

I still have a few doubts about the setup of these patches, particularly the split between quick_placeholder and placeholder variables. Denis, in terms of compatibility, what's the best way to structure these? In my current version, the four quick_ equivalent variables are initially copied from their equivalent variables, before being passed into the filter hooks. Is this still backwards compatible?

It also results in the "fast" query looking like "SELECT * FROM … $quick_where AND $wpdb->posts.ID IN (…)". The query optimiser should mean this isn't significantly slower than just a posts.ID IN query, but the $where clause will by default be copied into $quick_where...

comment:43 shockdiode4 years ago

Thanks,

I'm running the trunk patch now and so far so good. I'm not seeing any odd behavior at any rate. I'm still seeing SQL_CALC_FOUND_ROWS in queries but they would appear to be reduced and performance seems to be up a bit.

Thanks for the patch and I'll provide suggestions should I have any once I've had time to dig a little deeper.

comment:44 rowanbeentje4 years ago

I happened to be writing a plugin which had to hook into the posts_fields filter today, and so spotted that all recent versions of this patch use an incorrect hardcoded fieldlist ("*") for the final request, instead of the normal filterable values. I'll upload updated versions of my patches above which address this.

rowanbeentje4 years ago

Updated patch against 2.9.2

rowanbeentje4 years ago

Updated patch against trunk (r15490)

comment:45 mikeschinkel4 years ago

  • Cc mikeschinkel@… added

comment:46 Klark04 years ago

  • Cc Klark0 added

Experiencing this problem on a site with 50k posts. How to apply this patch against 3.0.1? or not possible yet?

comment:47 rowanbeentje4 years ago

@Klark0 - it looks like 3.01 is possible to patch with the "query.php.trunk.patch" patch. Assuming you're on a *nix server, copy it into the root of your WordPress installation, and then run:

patch -p0 < query.php.trunk.patch

(For safety's sake, you may wish to back up wp-includes/query.php first, so that you can swap the file back if any problems occur).

For the record, we've been using the 2.9.2 patch without a problem for the last few weeks - it's made a big difference.

comment:48 Klark04 years ago

Yup, I have it running on my production site (running 3.0.1) for a day and so far it has improved performance greatly. Server load averaged at 1.2 all day, usually it's like 5.

There's still some slow queries though as someone mentioned before, but not nearly as many.
examples:

SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) INNER JOIN wp_term_taxonomy ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id)  WHERE 1=1  AND wp_term_taxonomy.taxonomy = 'category'  AND wp_term_taxonomy.term_id IN ('77', '1', '58', '63', '64')  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 1;


SELECT SQL_CALC_FOUND_ROWS  wp_posts.ID FROM wp_posts  WHERE 1=1  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish')  ORDER BY wp_posts.post_date DESC LIMIT 0, 15;


jeffstieler4 years ago

I gave query.php.trunk.patch a test drive and noticed it broke some of my filters. Here is a modified patch based on the same revision that passes the WP_Query object reference to the applied filters. The patch from rowanbeentje removed the &$this from both the posts_request and found_posts filters.

comment:49 prettyboymp4 years ago

  • Cc mpretty@… added

comment:50 seanosh4 years ago

  • Cc seanosh added

comment:51 rowanbeentje4 years ago

Good catch, Jeff - thanks for that :)

comment:52 phpquery4 years ago

  • Keywords has-patch tested early removed
  • Version changed from 2.8.4 to 3.0.1

great work, I was kicked out of different web hosting just because of the "CPU throttling" and this issue.

For applying a fix to my 3.0.1 I should only download the query.php from the main
http://core.trac.wordpress.org/browser/trunk/wp-includes/query.php
or should patch manually using query.php.r15490.patch ?
Also how can I patch using this file?

Sorry for the laziness :/

comment:53 willmot4 years ago

  • Keywords has-patch tested early added
  • Version changed from 3.0.1 to 2.8.4

Adding back keywords.

No need to change version.

comment:54 follow-up: nacin3 years ago

  • Keywords needs-refresh added; has-patch tested early removed
  • Milestone changed from Awaiting Triage to Future Release

I don't see why we're setting $quick_blah to $blah, then passing both through the same filters... Why aren't we passing through the filters *once*, then duplicating the variables?

comment:55 in reply to: ↑ 54 rowanbeentje3 years ago

Replying to nacin:

I don't see why we're setting $quick_blah to $blah, then passing both through the same filters... Why aren't we passing through the filters *once*, then duplicating the variables?

I believe it's so filters can apply different behaviour to quick and full - note the extra argument being passed to the filters.

I have no use for this myself, but I can envisage some potential uses for filters who want to only apply themselves to the full version to avoid the speed hit on counts where results wouldn't be used anyway.

comment:56 nacin3 years ago

Okay -- I didn't see the extra argument.

Patch will need refresh against trunk. You will note that trunk has a singular filter for all clauses. I would rather see one of those for the quick_ clauses, versus the same filter applied over and over.

comment:57 follow-up: shockdiode3 years ago

Hello,

I was just curious if there was a patch against 3.0.4 at the moment?

Thanks

comment:58 in reply to: ↑ 57 kawauso3 years ago

Replying to shockdiode:
The last patch was against [15490], which is close to 3.0.1 by the looks of it, so you can probably apply that patch to 3.0.4.

comment:59 bakoline3 years ago

  • Type changed from enhancement to feature request
  • Version changed from 2.8.4 to 3.0.4

Please someone, explain, how to apply this patch? how to install it?

comment:60 scribu3 years ago

  • Type changed from feature request to enhancement

Tutorials can be found on the first page of trac:

http://core.trac.wordpress.org/#HowtoSubmitPatches

kawauso3 years ago

Patch against r17389, haven't implemented nacin's suggestion, fixed !$post_ids check

kawauso3 years ago

Implement clause filter against r17389

comment:61 kawauso3 years ago

  • Keywords has-patch added; needs-refresh removed
  • Version changed from 3.0.4 to 2.8.4

comment:62 follow-up: mozzer693 years ago

Do we have to patch with the latest attachment only ?

comment:63 in reply to: ↑ 62 kawauso3 years ago

  • Keywords needs-testing added

Replying to mozzer69:

Do we have to patch with the latest attachment only ?

Either of the latest two patches should work, though I've not been able to test them thoroughly, they're based off the other patches.

comment:64 obragblog3 years ago

Wondering if this made it into 3.1?

comment:65 follow-up: scribu3 years ago

If it had made it, this ticket would have been closed as fixed. Notice instead that it's set for "Future Release".

comment:66 in reply to: ↑ 65 obragblog3 years ago

scribu, thanks for the reply. New to the Trac system, didn't know how soon things get updated.

comment:67 follow-up: bakoline3 years ago

Hello again,

Please I need clear comment...

I use wordpress from http://svn.automattic.com/wordpress/trunk/ with TortoiseSVN and I have few questions:

  1. As I understand, when I download and install updated version from /trunk/ it is wordpress new version (for example 3.1) but with a lot of patches and bug fixes and it is totally different from normal wordpress (for example 3.1) from here? (http://wordpress.org/download/)
  1. My site has 14000 + posts and 500-600 users online (in the evenings) and I had serious problems with site loading (which is described in the first post above). my question is, when I installed version from /trunk/ does It already have all this updates and patches, which are given here in this ticket, or should I install them manually? (for example, is patch "query.php.r15490.patch" already installed in http://svn.automattic.com/wordpress/trunk/ last version?

my site is now hosted on private hosting with 4 CPU-s and 8 GB of memory and sometimes it's loading 50-90 seconds...

Wish you best

comment:68 in reply to: ↑ 67 kawauso3 years ago

Replying to bakoline:

  1. 3.1 is taken from a specific revision of trunk, r17483, so it's not too different from trunk at the moment (the difference in revisions is inflated by changes being applied twice). The only patch to have been committed since looks like the one for #16622. See the Timeline to see what's been committed when.
  1. Trunk only has patches which have been committed by the core committers (see scribu's comment above), so you need to apply this patch manually for now. As mentioned above, I've not got an install with heavy load to test this with, so please mention any issues (my patches are based on the previous patches however, with a bug fix added).

comment:69 pkirk3 years ago

  • Cc pkirk added

comment:70 follow-up: kraciboy3 years ago

  • Version changed from 2.8.4 to 3.1

Hello ALL,

I'm having the same issue on my site, using Wordpress 3.1 with 23k of post. We're using the original query.php from the original 3.1 package

My question is, what patch should i use to fix this problem?, should i edit the file query.php of 3.1 or i should download the first query.php of this ticket and start modifications? is working ok on 3.1?

I edited my original query.php 3.1 with the latests patch query.php.r17389.diff and query.php.r17389.nacin.diff but we lost the pagination

If anyone is using a fixed query.php and is working in Wordpress 3.1, can share the file?

Greetings!

Last edited 3 years ago by kraciboy (previous) (diff)

comment:71 sivel3 years ago

  • Version changed from 3.1 to 2.8.4

Please do not change the version. The version is used to track when the bug was introduced/reported.

comment:72 kraciboy3 years ago

uops! sorry was a mistake =)

kawauso3 years ago

Patch against r17522, removes code block that broke paging

comment:73 in reply to: ↑ 70 kawauso3 years ago

Replying to kraciboy:

I had a look at the patch again and it turns out that I forgot to remove a block of code that the other patches had, so paging should be fixed in the latest one.

All patches are against SVN trunk (or should be), so it'll apply cleanly against the trunk revision I wrote it against (r17522) and may work against others. The original query.php in this ticket will be based on an old version of WordPress, so don't use that.

comment:74 scribu3 years ago

People, this ticket is a mess.

If you want to see it in core at some point, you need to answer these two questions:

1) How much of a performance gain does this really give us? Some clear benchmarks would really help.

2) How can we prove it doesn't break plugins that alter the WP_Query SQL?

comment:75 kawauso3 years ago

I haven't got the time or resources to answer either of scribu's questions, so I leave it to those who need this patch to resolve them.

comment:76 TonyTheGreat3 years ago

  • Cc TonyTheGreat added

1) Lots of real world posts lack of true benchmarking it seems. http://www.mysqlperformanceblog.com/2007/08/28/to-sql_calc_found_rows-or-not-to-sql_calc_found_rows/ is an old example of a case with some quick testing.

You take the example at the start if it's not sitting in memory it's going to be slow. This is a problem with the larger datasets. You use a count then a select you don't see the same sort of performance degradation with larger datasets due to the index lookup for the row count.

We do web hosting and obviously a lot of our customers run wordpress. 95% of all slow queries (10s+) are wordpress and it's use of SQL_CALC_FOUND_ROWS queries. All with datasets where post amounts are around 5000 or more.

These queries are very problematic for users still using a shared environment. It's unlikely their queries will always be in memory. So as a result a lot more of the page loads are going to see slow load times from these queries not being in memory.

For those not in shared environments they can still see issues if there is a lot of changes in the posts table causing it to be flushed from memory.

2) I really cannot comment on the problems it may cause. Right now though leaving this problem is making wordpress look bad. So needs to be addressed if it's going to require some plugins to make some changes so be it. Longer it sits the more plugins it can potentially affect when it is changed.

comment:77 scribu3 years ago

If I understand this patch correctly, what it does is:

  1. Fetch *all* the post ids and then
  2. Fetch the rest of the rows using those IDs.

This can cause segfaults if there are really many posts. If not all posts are fetched, then counting is off, so this approach doesn't seem too good.

If SQL_CALC_FOUND_ROWS is the problem, then why not do this:

  1. Do the query, without SQL_CALC_FOUND_ROWS
  2. Do the query, with COUNT(*) and no limits, to get the total post count.

This is how we handle term and comment count, and I've seen no complaints.

More importantly, this would be a lot easier on plugins.

Last edited 3 years ago by scribu (previous) (diff)

scribu3 years ago

comment:78 follow-up: scribu3 years ago

Please test simple.10964.diff. Posting back specific timings along with as many details as possible would be awesome.

comment:79 in reply to: ↑ 78 prettyboymp3 years ago

Replying to scribu:

Please test simple.10964.diff. Posting back specific timings along with as many details as possible would be awesome.

I don't have time to write a patch at the moment, but I would like to suggest checking if the found_posts_query was empty before running the get_var. I've been playing around with caching the found posts results and have been seeing major performance improvements.

comment:80 scribu3 years ago

Empty queries are automatically ignored by $wpdb. Notice that the main query isn't checked either.

Last edited 3 years ago by scribu (previous) (diff)

comment:81 buch00903 years ago

While I originated this ticket, I apologize that I'm not sure what is needed now in order to get this into core...And like others, I'm strapped for time but I can make time given a clear direction.

As far as performance gains, we have 25 WPMU 2.9.2 sites running all with query_posts patched. Without this query fix our sites would never stay up...Each 'blog' averages between 15k to 120k posts and 50k to 750k post meta records. What other benchmarks are you looking for? I'm unsure how far back our graphs go...

All of our sites have had this patch for over 6 to 12 months.

@scribu I'm definitely open to alternatives if it provides equal performance improvements.

comment:82 scribu3 years ago

@scribu I'm definitely open to alternatives if it provides equal performance improvements.

Well, then please test the latest patch (simple.10964.diff) and let us know how it compares to the previous patch and to an unpatched install. Sharing those graphs (anonymized preferably) would be nice.

comment:84 prettyboymp3 years ago

scribu's patch can be taken a step further and lazy load the found posts query result. For the most part, the found posts result is only needed for the main loop where paging is involved, yet, most developers don't know to add the 'no_found_rows' parameter to every single WP_Query request where they don't need paging. It would require creating a magic get() method for backwards compatibility, but it can throw a deprecated warning and switch to a real getter implementation.

comment:85 scribu3 years ago

That's a pretty neat idea and should be considered separately: #17195

comment:86 mwidmann3 years ago

  • Cc martin.widmann@… added

@scribu: I wanted to give some feedback on the patch you submitted. I tried it out as we are currently facing some very strange issues related to the SQL_CALC_FOUND_ROWS which caused high load on the servers.

The problem with the patch is that count(*) and group by when used together can create more than one rows of result, making the returned number of rows incorrect. In that case you'd have to count the rows returned.

Because there's no easy way for me to fix this in core I created a couple of filters to add to a mu-plugin:

add_filter( 'posts_clauses', 'dh_store_last_post_clauses', 10, 2 );
add_filter( 'query', 'dh_patch_10964', 1 );
function dh_store_last_post_clauses( $clauses, $wp_query ) {
	global $last_post_clauses;
	$last_post_clauses = $clauses;
	return $clauses;
}

function dh_patch_10964( $query ) {
	if ( strpos( $query, 'SQL_CALC_FOUND_ROWS' ) ) {
		add_filter( 'found_posts_query', 'dh_patch_10964_phase2', 10, 2 );
		$query = str_replace( 'SQL_CALC_FOUND_ROWS', '', $query );
	}
	return $query;
}

function dh_patch_10964_phase2( $query, $wp_query ) {
	global $wpdb, $last_post_clauses;
	remove_filter( 'found_posts_query', __FUNCTION__ ); 
	
	if ( is_array( $last_post_clauses ) ) {
		$where = '';
		$groupby = '';
		$orderby = '';
		$join = '';
		
		$pieces = array( 'where', 'groupby', 'join', 'orderby', 'distinct', 'fields', 'limits' );
		foreach ( $pieces as $piece )
			$$piece = isset( $last_post_clauses[ $piece ] ) ? $last_post_clauses[ $piece ] : '';
	
		if ( !empty($groupby) )
			$groupby = 'GROUP BY ' . $groupby;
		if ( !empty( $orderby ) )
			$orderby = 'ORDER BY ' . $orderby;
		$query = "SELECT count(ID) FROM $wpdb->posts $join WHERE 1=1 $where $groupby $orderby";

		if ( !empty($groupby) ) {
			$counts = $wpdb->get_results( $query, ARRAY_N );
			$rows = sizeof( $counts );
			$query = "SELECT $rows cnt";
		}
	}
	return $query;
}

It seems a little bit hack-ish to me though.

Explanation what it does:

  1. I store the last post_clauses to the $last_post_clauses global in order to have access to this data in the count statement
  2. I look at each query if it contains the SQL_CALC_FOUND_ROWS command and remove it from there after registering a filter for found_posts_query
  3. in the found_posts_query filter I check the clauses and create the count statement. If there's a groupby I execute the query there and return a helper sql statement that just returns the number of items instead of counting. This way the normal behavior is tricked and the stuff works.

comment:87 scribu3 years ago

Ah, I forgot about GROUP BY. Thanks for pointing that out.

So, does this hack yield better performance than using SQL_CALC_FOUND_ROWS after all?

Last edited 3 years ago by scribu (previous) (diff)

comment:88 mwidmann3 years ago

Acutally it doesn't help once a GROUP BY is involved. With no GROUP BY it's about 4-5 times less time consuming on the db, though. Therefore it still would be a good thing to do. You'd need to check for $groupby being st though.

As the speed increase for unfiltered queries (e.g. the loop on the frontpage or the edit posts page in the admin) is substantial, I'd consider applying a modified version nevertheless.

My mu-plugin now looks like this:

add_filter( 'posts_clauses', 'dh_store_last_post_clauses', 10, 2 );
add_filter( 'query', 'dh_patch_10964', 1 );
function dh_store_last_post_clauses( $clauses, $wp_query ) {
	global $last_post_clauses;
	$last_post_clauses = $clauses;
	return $clauses;
}

function dh_patch_10964( $query ) {
	global $last_post_clauses;
	
	if ( is_array( $last_post_clauses) && empty( $last_post_clauses['groupby'] ) ) {
		if ( strpos( $query, 'SQL_CALC_FOUND_ROWS' ) ) {
			add_filter( 'found_posts_query', 'dh_patch_10964_phase2', 10, 2 );
			$query = str_replace( 'SQL_CALC_FOUND_ROWS', '', $query );
		}
	}
	return $query;
}

function dh_patch_10964_phase2( $query, $wp_query ) {
	global $wpdb, $last_post_clauses;
	remove_filter( 'found_posts_query', __FUNCTION__ ); 
	
	if ( is_array( $last_post_clauses ) ) {
		$where = '';
		$groupby = '';
		$orderby = '';
		$join = '';
		
		$pieces = array( 'where', 'groupby', 'join', 'orderby', 'distinct', 'fields', 'limits' );
		
		if ( empty($groupby) ) {

			foreach ( $pieces as $piece )
				$$piece = isset( $last_post_clauses[ $piece ] ) ? $last_post_clauses[ $piece ] : '';

			if ( !empty( $orderby ) )
				$orderby = 'ORDER BY ' . $orderby;
			$query = "SELECT count(ID) FROM $wpdb->posts $join WHERE 1=1 $where $orderby";
		}
	}
	return $query;
}

I took the freedom to create a patch with the changes.

comment:89 scribu3 years ago

simple_modified.10964.diff looks like a winner.

Tempted to move to 3.2 milestone, but I think it needs more testing.

Also, as you demonstrated, it can be done without modifying core.

Last edited 3 years ago by scribu (previous) (diff)

comment:90 prettyboymp3 years ago

I didn't like the idea of queries with a group by taking a different path when handling the found rows than other queries.

With initial tests, however, I'm actually seeing worse results with the simple*.10964.diff versions than current core, though, the fact that they open up the ability to cache found rows queries across multiple pages and even lazy load them, I think it may work out better in the long run.

comment:91 d_kc3 years ago

The last version of this patch, I applied 9 months ago worked beautifully. My MYSQL loads went way down on a site with 75,000 posts, 500,000 page views per day.

Now I've had to upgrade to 3.2 and I completely forgot that I was running the patch. (I'm not a website admin by trade, and i am learning all this as I go)

I applied simple_modified.10964.diff and its not working as the old patch did. Loads are still up and it spikes when editing or publishing a new post and when the homepage is being generated for a logged in user.

Just wanted to give my experience with it and sorry i can't provide technical details you guys might need.

So I guess I know I have to look back at the old patch and manually apply it.

Thanks.

comment:92 prettyboymp3 years ago

d_kc, do you have the ability to get some performance data to share? I attempted to, but wasn't getting much difference in performance between core, query.php.r15490.patch, and the latest patch. I have a feeling that this depends heavily on server resources. On our old cluster we were getting huge performance gains using query.php.r15490.patch, but we don't see a difference on our new cluster.

Based on our previous results, I feel like the original query.php.r15490.patch query is going to have the best performance on a good percentage of systems, but can no longer produce the performance data to show it.

comment:93 d_kc3 years ago

i gauge performance on whether my server crashes or not. :D

With the old patch, speed was great. According to W3 total cache, initial page build time was 4 - 10 second max. Although as scribu said above, there were segment faults which I learned to live with for the last year or so just for the performance gains. Server loads were constant between 1.5 - 3

With the latest patches the segment faults seem to be fixed, but page build time goes as high 44 seconds and sometimes not all. Server load is contantly at 8-10.

Seriously considering splitting our site into two installs of wordpress. Only the last 50 posts get the majority of traffic, while the remaining 75k are 4 years worth of archives. Its insane that Wordpress has to go through all those 75k posts, to display data for these few 50.

comment:94 d_kc3 years ago

i know alot of people are following this. so i just want to share that my site is stable now on 3.2 with simple_modified.10964.diff and also this patch -> https://core.trac.wordpress.org/ticket/16706

comment:95 JediSthlm3 years ago

I'm on Wordpress 3.2.1 with over 14k posts and having major problem as described above that the mysqld server goes up to 100% CPU and sometimes crashes. There was no problem on 2.9.2 which I upgraded from a few weeks back.

  • Could someone point me to a correct query.php without me having to understand how to make diff etc to the original file?
  • What is the status of this ticket, will it be fixed for the next version?

Cheers,
Jens

comment:96 scribu3 years ago

The status of this ticket is that there's no clear data on which approach is best, only anecdotal evidence.

I tend to like the approach in simple_groupby_modified.10964.diff, but it could be cleaned up a little.

comment:97 JediSthlm3 years ago

@scribu: Thanks for the reply, I applied the patch to my query.php and it seems to work kind of. The site feels more snappier but I can see that my mysqld process is very high (100%). If I run mytop I see that there is a few "Query SELECT wp_posts.* FROM wp_posts JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta....." and the same number of "Sleep" with the same time. The good thing (just tried a few minutes is that the server don't hang and I get an 502.

Update: well, I now get 504 Gateway Time-out instead. So it did not work that well, hmm

Update 2: Here is one of the query that seem to be the problem, I think the problem is related to the
AND meta_value IN ('Distillery','Bottling')

SELECT wp_posts.* FROM wp_posts JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id) WHERE 1=1 AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') AND wp_posts.ID IN (

SELECT post_id
FROM wp_postmeta
WHERE meta_key = 'bottler'
AND meta_value IN ('Distillery','Bottling')
GROUP BY post_id
HAVING COUNT(*) >= 2

) GROUP BY wp_posts.ID HAVING COUNT(*) = 1 ORDER BY wp_posts.post_date DESC LIMIT 40, 10

Last edited 3 years ago by JediSthlm (previous) (diff)

kawauso3 years ago

Simplified version of simple_groupby_modified

comment:98 kawauso3 years ago

Had a go at simplifying simple_groupby_modified.10964.diff by reducing duplicated code and similar variable names.

comment:99 milez3 years ago

  • Cc milez added
  • Version changed from 2.8.4 to 3.1

Thanks kawauso I have applied simple_groupby_modified_simpler.10964.diff but seems to have no effect on my large database. Mysqld still hogging CPU on hits to index.php. Can you confirm your patch will also work on WP v3.1 ? Or perhaps someone knows what version patch will work for v3.1?

Version 0, edited 3 years ago by milez (next)

comment:100 scribu3 years ago

  • Version changed from 3.1 to 2.8.4

Please leave the version property alone.

comment:101 graffics3 years ago

  • Cc graffics added

Okay I have been studying this thread for a couple days now and having trouble finding out witch patch goes with which version of Wordpress. Can anyone clearly state which patch above works the best with what version?

comment:102 MZAWeb2 years ago

  • Cc MZAWeb added

comment:103 sheatsb2 years ago

  • Cc sheatsb added

comment:104 bear_beavis2 years ago

I also got some performance issues on a site.
I wanted to share some tests i have made.

The query is :

SELECT SQL_CALC_FOUND_ROWS  posts.* FROM posts  INNER JOIN term_relationships ON (posts.ID = term_relationships.object_id) WHERE 1=1  AND ( term_relationships.term_taxonomy_id IN (9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55) ) AND posts.post_type = 'post' AND (posts.post_status = 'publish') GROUP BY posts.ID ORDER BY posts.post_date DESC LIMIT 0, 10;

SQL_CALC_FOUND_ROWS is not really the problem, the problem is that mysql is creating a temporary table with 200k rows on disk.

When you do the same request with

SELECT SQL_CALC_FOUND_ROWS  posts.ID

instead of

SELECT SQL_CALC_FOUND_ROWS  posts.*

You can see that the temporary table is created in Ram instead of disk.
Because posts table contains columns with text fields, mysql always create temporary tables with posts.* on disk.

  • You can verify this, by doing before and after your query :
    mysql> show global status like 'Created_tmp_disk_tables';
    

For this test to be effective :

  • add SQL_NO_CACHE to your query to ensure you're not using query_cache.
  • ensure tmp_table_size and max_heap_table_size are sized to permit temporary table to fit in ram.
  • Of course don't do that on loaded mysqld server, you should be the only one using the server for this test to be valid

You'll see that Created_tmp_disk_tables value will not increase using posts.ID instead of posts.*

Last edited 2 years ago by bear_beavis (previous) (diff)

asannad2 years ago

comment:105 asannad2 years ago

  • Cc asannad added

Attached( query.php_2012-01-05.patch ) is the patch against trunk that I have been using in our live site. I have used the idea(and older patches) of selecting the ids first and then getting all fields for those ids. The major problem is the write IO because of the temporary table thats getting created, attached is the graph( Write-IO-Graph.gif ) showing write IO with patch and without patch.

comment:106 sbressler2 years ago

  • Cc sbressler@… added

comment:107 ryan2 years ago

  • Milestone Future Release deleted
  • Resolution set to duplicate
  • Status changed from new to closed

Let's move the conversation to #18536.

Note: See TracTickets for help on using tickets.