Make WordPress Core

Opened 13 years ago

Closed 12 years ago

Last modified 12 years ago

#18536 closed task (blessed) (fixed)

Improve performance of WP_Query core

Reported by: cheald's profile cheald Owned by: ryan's profile ryan
Milestone: 3.4 Priority: normal
Severity: normal Version:
Component: Performance Keywords:
Focuses: Cc:

Description (last modified by nacin)

WP_Query, by default, selects wp_posts.* when building post queries. This can result in extremely large result sets when used with a large wp_posts table, resulting in MySQL being forced to use a temp table for sorting, which leads to all sorts of performance problems. SELECT * is generally considered to be a code smell as is, and is commonly known to cause problems at scale.

This patch breaks default WP_Query queries into two parts - one to select a list of IDs with the given selection criteria, and a second to actually select the posts once the IDs have been determined. This vastly reduces the amount of data that has to be processed by the database, resulting in fewer context switches and vastly improved performance.

I have implemented this in a production system with a wp_posts table with over 450k rows and reduced query times by two orders of magnitude. Database CPU burn is down by 90% and context switches have been drastically reduced.

This should be of some benefit even in small cases, since smaller datasets will result in faster sorts. The overhead of the second query is minimal. In large cases, the performance benefits range from "good" to "dramatic".

I have run the Wordpress PHPUnit test suite against the patch, and it did not break any tests. The suite, when I checked it out, had multiple failures present, but no new ones were introduced by this change.

Attachments (11)

query.patch (2.5 KB) - added by cheald 13 years ago.
Query performance patch
18536.diff (2.7 KB) - added by ryan 13 years ago.
18536.2.diff (2.2 KB) - added by scribu 13 years ago.
18536.3.diff (3.7 KB) - added by scribu 13 years ago.
18536.4.diff (4.1 KB) - added by scribu 13 years ago.
introduce set_found_posts()
18536.5.diff (5.4 KB) - added by prettyboymp 13 years ago.
introduce _prime_post_caches as separate function
18536.6.diff (5.9 KB) - added by scribu 13 years ago.
advanced-post-cache.php (7.4 KB) - added by ryan 13 years ago.
Advanced Post Cache plugin, for back compat testing
18536.ref_array.patch (1.2 KB) - added by scribu 13 years ago.
18536-ancestors.diff (1.0 KB) - added by ryan 13 years ago.
18536.post-ancestors.diff (585 bytes) - added by duck_ 12 years ago.

Download all attachments as: .zip

Change History (105)

@cheald
13 years ago

Query performance patch

#1 @nacin
13 years ago

Impressive. Related: #10964.

Unfortunately, not many of our unit tests cover WP_Query. After some bad experiences in version 3.1 when we added taxonomy queries, we're not likely to significantly alter WP_Query again (for both this ticket and #10964, among otherS) until we have them. Building some unit tests for WP_Query would be quite the undertaking, but it'd be one that would move along tickets like this.

That said, this does look relatively simple. The main concern would be seeing how plugins that hook deeply into WP_Query might be affected.

Another concern would be figuring out where old and new intersect with regards to performance. I imagine for simple queries or smaller datasets, the current version would be more performant.

What's one of your queries that isn't performing well? If you're doing an interesting join or sorting or querying by an unindexed field, you might be doing a tablesort that you should work to avoid. Can we see some EXPLAIN results?

#2 @cheald
13 years ago

The problem is indeed the tablesorts. This patch helps keep our queries out of tablesort territory by reducing the amount of data that the DB has to chew on. Here's a chunk from our slow query log:

# User@Host: prod[prod] @  [192.168.100.162]
# Thread_id: 213317196  Schema: prod  Last_errno: 0  Killed: 0
# Query_time: 0.542673  Lock_time: 0.000094  Rows_sent: 2  Rows_examined: 47995  Rows_affected: 0  Rows_read: 9108
# Bytes_sent: 10754  Tmp_tables: 1  Tmp_disk_tables: 1  Tmp_table_sizes: 50918492
# InnoDB_trx_id: 676FA764
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: Yes
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 10030
SET timestamp=1302277759;
SELECT SQL_CALC_FOUND_ROWS  wp_posts.* FROM wp_posts  INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) INNER JOIN wp_term_taxonomy ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id)  WHERE 1=1  AND wp_posts.ID NOT IN (332000,583467,583371,579313,583657,583185) AND wp_term_taxonomy.taxonomy = 'post_tag'  AND wp_term_taxonomy.term_id IN ('1862', '82', '19567', '553', '135', '43', '1174', '1100', '5819', '16796', '119', '74')  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 2;

And here's the EXPLAIN:

+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table                 | type   | possible_keys                        | key              | key_len | ref                                    | rows | Extra                                                     |
+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | wp_term_taxonomy      | range  | PRIMARY,term_id_taxonomy,taxonomy    | term_id_taxonomy | 106     | NULL                                   |   12 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | wp_term_relationships | ref    | PRIMARY,term_taxonomy_id             | term_taxonomy_id | 8       | prod.wp_term_taxonomy.term_taxonomy_id |   15 | Using index                                               |
|  1 | SIMPLE      | wp_posts              | eq_ref | PRIMARY,post_status,type_status_date | PRIMARY          | 8       | prod.wp_term_relationships.object_id   |    1 | Using where                                               |
+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+

Nothing too fancy going on there,
The really interesting thing there is the temp table size - 50,918,492 bytes - just to get 2 posts! "Tmp_table_on_disk" explains the performance pretty easily - making the context switch to write the temp table isn't trivial!

Here's another:

# User@Host: prod[prod] @  [192.168.100.162]
# Thread_id: 213303997  Schema: prod  Last_errno: 0  Killed: 0
# Query_time: 0.909574  Lock_time: 0.000117  Rows_sent: 2  Rows_examined: 33733  Rows_affected: 0  Rows_read: 6785
# Bytes_sent: 11761  Tmp_tables: 1  Tmp_disk_tables: 1  Tmp_table_sizes: 40452708
# InnoDB_trx_id: 676ED8D4
# QC_Hit: No  Full_scan: No  Full_join: No  Tmp_table: Yes  Tmp_table_on_disk: Yes
# Filesort: Yes  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 7863
SET timestamp=1302277444;
SELECT SQL_CALC_FOUND_ROWS  wp_posts.* FROM wp_posts  INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id) INNER JOIN wp_term_taxonomy ON (wp_term_relationships.term_taxonomy_id = wp_term_taxonomy.term_taxonomy_id)  WHERE 1=1  AND wp_posts.ID NOT IN (249519,583467,583371,579313,583657,583185) AND wp_term_taxonomy.taxonomy = 'post_tag'  AND wp_term_taxonomy.term_id IN ('20747', '2009', '82', '131', '553', '135', '1156', '732', '1100', '119')  AND wp_posts.post_type = 'post' AND (wp_posts.post_status = 'publish') GROUP BY wp_posts.ID ORDER BY wp_posts.post_date DESC LIMIT 0, 2;

Explained:

+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+
| id | select_type | table                 | type   | possible_keys                        | key              | key_len | ref                                    | rows | Extra                                                     |
+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | wp_term_taxonomy      | range  | PRIMARY,term_id_taxonomy,taxonomy    | term_id_taxonomy | 106     | NULL                                   |   10 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | wp_term_relationships | ref    | PRIMARY,term_taxonomy_id             | term_taxonomy_id | 8       | prod.wp_term_taxonomy.term_taxonomy_id |   15 | Using where; Using index                                  |
|  1 | SIMPLE      | wp_posts              | eq_ref | PRIMARY,post_status,type_status_date | PRIMARY          | 8       | prod.wp_term_relationships.object_id   |    1 | Using where                                               |
+----+-------------+-----------------------+--------+--------------------------------------+------------------+---------+----------------------------------------+------+-----------------------------------------------------------+

All the patch does is reduce the amount of data that MySQL has to manage to perform the sort - with small dataset sizes, this isn't likely to have much of an impact, but once the size of your potential resultset (with included post content) passes your query into filesort territory, the differences become *big*.

#3 @DrewAPicture
13 years ago

  • Cc xoodrew@… added

#4 @scribu
13 years ago

I think the major problem here isn't unit tests for WP_Query args, but plugin compatibility. There are a lot of plugins that use the SQL filters - posts_fields, posts_where etc.

Last edited 13 years ago by scribu (previous) (diff)

#5 @scribu
13 years ago

  • Keywords close added

Speaking of those filters, it's possible to use them to achieve the behavior you want without modifying Core. See #10964 for an example of this.

As such, suggest closing as duplicate.

#7 follow-up: @cheald
13 years ago

One possible solution might be as simple as a config switch; the administrator could be warned that the switch may break some plugins. It's certainly not a perfect solution, but for people with large installs, it might be a pretty significant quality-of-experience improvement.

My first draft of this fix implemented query rewrites as a plugin using the posts_request and posts_results filters, but it's a hack of a solution, and seems like the sort of thing that belongs in the core anyhow. https://github.com/cheald/wp_fasterquery/blob/master/filter.php is the plugin, but it's very clearly tailored specifically to our needs and install, and may not work well for others.

While it's possible to fix this with a plugin, it feels like the fact that it can happen at all is a somewhat self-limiting design issue that could use some more formal resolution. If the team disagrees, that's certainly fine, but I felt I'd be remiss if I didn't at least make an attempt! :)

#8 @aaroncampbell
13 years ago

  • Cc aaroncampbell added

#9 @anointed
13 years ago

  • Cc anointed added

@ryan
13 years ago

#10 @ryan
13 years ago

18536.diff modifies $fields after the posts_fields filter runs and avoids the second query if the first comes back empty.

#11 @ryan
13 years ago

Some caching plugins return an empty query string from the posts_request filter. With the current query code, this causes get_results() to return null and skip the query. The plugin then fills in the posts array by fetching the posts from cache and returning them through the posts_results filter. This fails with 18536.diff applied. The get_col() call in the expand_ids block receives an empty query string. get_col() uses last_result when an empty query is passed. This results in some totally whacky stuff ending up in the SELECT * query in the expand IDs block.

#12 @ryan
13 years ago

A quick fix would be to save the query string before calling the filter and use that with get_col() if the filtered query string is empty. But, who knows what other back compat problems await. I'd still like to give this a go, but I think we have hold off until 3.4 to give plugin authors time to test and accommodate.

#13 follow-up: @scribu
13 years ago

  • Keywords close removed

So this is basically what was suggested in #10964 i.e. split the query in two, right?

PS: apply_filters_ref_array() is not necessary there. Yay for PHP5.

#14 in reply to: ↑ 13 @ryan
13 years ago

Replying to scribu:

So this is basically what was suggested in #10964 i.e. split the query in two, right?

Yes. That ticket is all over the place though.

If there's a hook registered for posts_request, then don't do the expand_ids block. Create two new request filters for the two queries in the expand_ids block. That way plugins have a clear way forward. Sort of like Denis' suggestion in 10964, but I think new filters will handle back compat better than adding a flag to posts_request.

#15 @scribu
13 years ago

Another patch was submitted in #19608

The problem was again the temporary table size.

Last edited 13 years ago by scribu (previous) (diff)

#16 @scribu
13 years ago

  • Component changed from Query to Performance
  • Keywords early added

#17 @scribu
13 years ago

  • Milestone changed from Awaiting Review to 3.4

I think it's pretty obvious that we should give this a try for WP 3.4.

#18 @scribu
13 years ago

With regards to back-compat, I agree with ryan that it's better to not fire some of the old filters anymore and introduce new filters. That way, if a plugin wants to be compatible with WP < 3.4, it can just keep the old callbacks without any modification and add new code for WP 3.4+.

#19 @nacin
13 years ago

  • Description modified (diff)

Removing the screenshot (dead link). I do remember the graph looking mighty impressive, though.

#20 @ryan
13 years ago

Duped #10964 to this ticket to consolidate the conversation.

#21 @kawauso
13 years ago

  • Cc kawauso added

#22 follow-up: @prettyboymp
13 years ago

I was thinking about this ticket in combination with some others like #19726 and wondering if we should look into create a separate function that that would fill in the posts array from the passed in ID's that would first check cache for each post_id before trying to get it's data from the DB. This would allow the second query to be avoided in a lot of cases where an object cache was available. The order of the posts could easily be rebuilt from the order of the post_ids passed in.

#23 @prettyboymp
13 years ago

  • Cc mpretty@… added

#24 in reply to: ↑ 22 @nacin
13 years ago

Replying to prettyboymp:

I was thinking about this ticket in combination with some others like #19726 and wondering if we should look into create a separate function that that would fill in the posts array from the passed in ID's that would first check cache for each post_id before trying to get it's data from the DB. This would allow the second query to be avoided in a lot of cases where an object cache was available. The order of the posts could easily be rebuilt from the order of the post_ids passed in.

Essentially replicates some of the work of the advanced caching plugin. It's definitely a good idea.

#25 @azizur
13 years ago

  • Cc azizur added

#26 @mwidmann
13 years ago

  • Cc martin.widmann@… added

#27 @ocean90
13 years ago

  • Cc ocean90 added

#28 @scribu
13 years ago

  • Keywords needs-patch added; has-patch removed

Patch is stale.

@scribu
13 years ago

#29 @scribu
13 years ago

  • Keywords has-patch added; needs-patch removed

Refreshed: 18536.2.diff

#30 @scribu
13 years ago

I noticed that the query that fetches the rest of the rows contains the $orderby fragment again. This would obviously break when the orderby refers to any table other than wp_posts, for example when ordering by a meta key.

Going to use an approach similar to WP_User_Query: fetch only the non-cached posts and then populate the posts array according to the order of the ids found.

#31 @scribu
13 years ago

Oh, unlike WP_User_Query, WP_Query has a 'cache_results' flag, which might get in the way.

#32 @scribu
13 years ago

18536.3.diff does what I said in 30. Still don't know how to handle 'cache_results' => false.

#33 @scribu
13 years ago

I also noticed an inconsistency:

On the one hand, we have this in get_post():

if ($filter != 'raw')
	$_post = sanitize_post($_post, $filter);

On the other, we call sanitize_post( $post, 'raw' ) in WP_Query.

#34 @scribu
13 years ago

Nevermind, sanitize_post( $post, 'raw' ) is applied in get_post() as well, before the if, which just prevents a duplicate sanitization.

#35 @NA1
13 years ago

  • Cc NA1 added

#36 in reply to: ↑ 7 @darioj
13 years ago

Very good patch, it helped a lot reducing disk tmp tables, locking and key write requests.
It was applied in production but we had to add the if $ids empty check to the find_posts_from_ids() funtion:

function find_posts_from_ids($posts) {
  global $wpdb, $saved_posts_count, $last_order_clause;
  $ids = array();
  if(is_array($posts)) {
    foreach($posts as $post) { $ids[] = $post->ID; }
  }
  $saved_posts_count = $wpdb->get_var( 'SELECT FOUND_ROWS()' );
  if(is_array($ids) && $ids[0] != '') {
     return $wpdb->get_results("SELECT * FROM {$wpdb->posts} WHERE ID IN (" . implode(",", $ids) . ") $last_order_clause;");
  }
}

Cheers
Dario

#37 follow-up: @scribu
13 years ago

darioj: There is no find_posts_from_ids(). The patch you should be testing is 18536.4.diff.

Last edited 13 years ago by scribu (previous) (diff)

@scribu
13 years ago

#38 @scribu
13 years ago

18536.4.diff:

  • introduce set_found_posts() method to prevent code duplication
  • use array_map() instead of manual foreach
Last edited 13 years ago by scribu (previous) (diff)

@scribu
13 years ago

introduce set_found_posts()

#39 in reply to: ↑ 37 @darioj
13 years ago

Replying to scribu:
Scribu: sorry for the misunderstanding, I've implemented on 2.9.2 cheald's plugin, as soon as I can I'll give a try to your latest patch on 3.2.1 and report back.

#40 @scribu
13 years ago

  • Version 3.2.1 deleted

#41 @ryan
13 years ago

I think the use of _get_non_cached_ids() allows us to ignore cache_results=>false while honoring the primary intent.

#42 follow-ups: @ryan
13 years ago

As a possible performance improvement for sites without a persistent object cache, do the get_post() array_map() only if $non_cached_ids and $ids don't have the same count? If they're the same just use $fresh_posts.

Last edited 13 years ago by ryan (previous) (diff)

#43 @ryan
13 years ago

This looks like a good foundation. Time to fork the request filters and worry with plugin compat?

#44 @ryan
13 years ago

  • Type changed from enhancement to task (blessed)

#45 in reply to: ↑ 42 @ryan
13 years ago

Replying to ryan:

As a possible performance improvement for sites without a persistent object cache, do the get_post() array_map() only if $non_cached_ids and $ids don't have the same count? If they're the same just use $fresh_posts.

Although we'd still need something to restore order since orderby had to be dropped for the second query. So, probably not worth the worry. Regardless, we can hash out with further optimizations later.

#46 in reply to: ↑ 42 @scribu
13 years ago

Replying to ryan:

As a possible performance improvement for sites without a persistent object cache, do the get_post() array_map() only if $non_cached_ids and $ids don't have the same count? If they're the same just use $fresh_posts.

Even with a non-persistent object cache, get_post() shouldn't trigger any new queries, since update_post_cache() is called before it.

#47 @scribu
13 years ago

  • Keywords commit needs-testing added

So, I'm pretty happy with the current patch, as the first step.

It seems to work fine with one of my plugins that uses the posts_clauses filter pretty heavily.

#48 @MZAWeb
13 years ago

  • Cc MZAWeb added

#49 @darioj
13 years ago

I think you really should! Forgive me if this might look inappropriate, but the improvements I'm seeing are so drastic that I just can't resist.

The following table shows the difference of two xhprof runs, one with the original wp-3.2.1 code and the other with 18536.4.diff applied.

Run #wp-3.2.1.search.origRun #wp-3.2.1.search.18536.4DiffDiff%
Number of Function Calls218,806235,94817,1427.8%
Incl. Wall Time (microsec)2,461,044895,115-1,565,929-63.6%
Incl. CPU (microsecs)840,872886,86545,9935.5%
Incl. MemUse (bytes)29,871,44030,596,296724,8562.4%
Incl. PeakMemUse (bytes)30,065,56830,727,160661,5922.2%

This is really faster, my plugins all work :))

@prettyboymp
13 years ago

introduce _prime_post_caches as separate function

#50 @prettyboymp
13 years ago

So far it looks great. I modified the attachment:ticket:18536:18536.4.diff in attachment:ticket:18536:18536.5.diff to pull out the section that primed the post cache into it's own function since this can be used in other places in core.

I hope to test this on one of our larger sites later today.

#51 follow-up: @nacin
13 years ago

I don't think _prime_post_caches() has a use case outside of WP_Query. If you wish to prime post caches elsewhere, simply use WP_Query. (For example, we do this for nav menus.)

#52 in reply to: ↑ 51 @prettyboymp
13 years ago

Replying to nacin:

I don't think _prime_post_caches() has a use case outside of WP_Query. If you wish to prime post caches elsewhere, simply use WP_Query. (For example, we do this for nav menus.)

Yes, but doing it this way still causes the first query to get the posts.ID of the posts that were already in the posts_in clause.

Take for instance the update_post_thumbnail_cache() function. Since we already have the post thumbnail ids, there is no reason to run the ID query just to get the same IDs which will be fed into the primer. The same goes in the nav menus. I also use a similar function a lot in my own code any time I need to show a user sorted list of posts.

It would be hard to avoid this extra query to get the id's you already told it unless you specifically tell WP_Query that you won't be using the sorted result.

#53 @nacin
13 years ago

That brings up a good point. A query such as WP_Query( 'post__in' => array( 5, 15 ) ) will look rather silly: It will first grab all IDs when we already have them. Perhaps a further optimization would be that, given a simple $where and a big enough $limit, we can avoid making the query all together, and simply populate the post objects and order them in PHP. But I'm perhaps getting a bit ahead of things. Ryan might have a better idea on how we can abstract this out so existing prime attempts are not adversely affected, and there is a canonical way to prime a cache.

#54 follow-up: @market4android
13 years ago

  • Cc market4android added

I just tried to apply the 18536.5.diff to my live installation of wordpress 3.3.1 and it failed. Here's my shell's output:

(Note: I used 18536.5.diff but I removed the .5 after I uploaded the file to my server for mind's sake.)

$ patch < 18536.diff
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/post.php
|===================================================================
|--- wp-includes/post.php       (revision 19788)
|+++ wp-includes/post.php       (working copy)
--------------------------
Patching file post.php using Plan A...
Hunk #1 succeeded at 5328 (offset 8 lines).
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/functions.php
|===================================================================
|--- wp-includes/functions.php  (revision 19788)
|+++ wp-includes/functions.php  (working copy)
--------------------------
Patching file functions.php using Plan A...
Hunk #1 succeeded at 3699 with fuzz 2.
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/query.php
|===================================================================
|--- wp-includes/query.php      (revision 19788)
|+++ wp-includes/query.php      (working copy)
--------------------------
Patching file query.php using Plan A...
Hunk #1 succeeded at 1952 (offset 1 line).
Hunk #2 succeeded at 2600 (offset 2 lines).
Hunk #3 failed at 2634.
Hunk #4 succeeded at 2668 (offset 1 line).
Hunk #5 succeeded at 2771 (offset 2 lines).
1 out of 5 hunks failed--saving rejects to query.php.rej
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/formatting.php
|===================================================================
|--- wp-includes/formatting.php (revision 19788)
|+++ wp-includes/formatting.php (working copy)
--------------------------
Patching file formatting.php using Plan A...
Hunk #1 succeeded at 3018 with fuzz 2 (offset -1 lines).
patch: **** misordered hunks! output would be garbled

When I tried to undo the patch that I applied, here's the output :

$ patch -R < 18536.diff
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/post.php
|===================================================================
|--- wp-includes/post.php       (revision 19788)
|+++ wp-includes/post.php       (working copy)
--------------------------
Patching file post.php using Plan A...
Hunk #1 succeeded at 5328 (offset 8 lines).
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/functions.php
|===================================================================
|--- wp-includes/functions.php  (revision 19788)
|+++ wp-includes/functions.php  (working copy)
--------------------------
Patching file functions.php using Plan A...
Unreversed (or previously applied) patch detected!  Ignore -R? [y] n
Apply anyway? [n] y
Hunk #1 failed at 3699.
1 out of 1 hunks failed--saving rejects to functions.php.rej
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/query.php
|===================================================================
|--- wp-includes/query.php      (revision 19788)
|+++ wp-includes/query.php      (working copy)
--------------------------
Patching file query.php using Plan A...
Hunk #1 succeeded at 1952 (offset 1 line).
Hunk #2 succeeded at 2599 (offset 2 lines).
Hunk #3 failed at 2628.
Hunk #4 succeeded at 2630 (offset -15 lines).
Hunk #5 succeeded at 2756 (offset 2 lines).
1 out of 5 hunks failed--saving rejects to query.php.rej
Hmm...  The next patch looks like a unified diff to me...
The text leading up to this was:
--------------------------
|Index: wp-includes/formatting.php
|===================================================================
|--- wp-includes/formatting.php (revision 19788)
|+++ wp-includes/formatting.php (working copy)
--------------------------
Patching file formatting.php using Plan A...
Hunk #1 succeeded at 3018 with fuzz 2 (offset -1 lines).
patch: **** misordered hunks! output would be garbled

#55 in reply to: ↑ 54 @kawauso
13 years ago

Replying to market4android:

I just tried to apply the 18536.5.diff to my live installation of wordpress 3.3.1 and it failed.

Patches are done against trunk, which is sufficiently different from 3.3.1 now that you'd need to patch that manually.

#56 @tabbymarie
13 years ago

Just coming in from a fresh experience of upgrading a production site to the latest (3.3.1), there is also the use of SQL_CALC_FOUND_ROWS in the user.php that was introduced recently (I haven't checked the revision history of it just yet). This should utilize a different approach as well. Is that easy to add to this patch?

#57 @scribu
13 years ago

@tabbymarie: This ticket is focusing on WP_Query. Open a new ticket for WP_User_Query.

#58 follow-up: @jeffstieler
13 years ago

  • Cc jeff@… added

Testing 18536.5.diff, it breaks in the situation that a plugin has filtered 'posts_request' and changed the selected fields.

It would be difficult to check for fields after this filter.. should we expect a filter on 'posts_request' to know the pitfalls and modify the query accordingly so it can still be optimized?

It looks like filtering 'post_fields' or 'posts_fields_request' in addition to 'posts_request' could shortcut the $expand_ids functionality..

#59 in reply to: ↑ 58 @scribu
13 years ago

  • Keywords commit removed

Replying to jeffstieler:

It would be difficult to check for fields after this filter.. should we expect a filter on 'posts_request' to know the pitfalls and modify the query accordingly so it can still be optimized?

I think so. Using posts_request is inherently fragile, so the code in question should be very resilient.

On the other hand, maybe it's worth introducing an 'expand_ids' query var that forces $expand_ids = false.

#60 @ryan
13 years ago

I think avoiding expand_ids if a posts_request filter is present is safest for 3.4. There are a number of caching plugins that will break in bizarre fashions with expand_ids. Like we talked about in comment:14 and #10964.

#61 @scribu
13 years ago

18536.6.diff:

  • check if the query was altered in any way through the 'posts_request' filter before expanding ids
  • introduce 'posts_request_ids' filter
  • use _prime_post_caches() in update_post_thumbnail_cache().

I don't have an elegant solution for preventing the useless SELECT ID WHERE ID IN (...) query.

Last edited 13 years ago by scribu (previous) (diff)

@scribu
13 years ago

@ryan
13 years ago

Advanced Post Cache plugin, for back compat testing

#62 follow-up: @ryan
13 years ago

I don't think checking if posts_request changing the query is sufficient. See advanced-post-cache.php, which does not change the query, but uses it in ways that will break with the new query style.

#63 @aaroncampbell
13 years ago

I would say we could check has_filter( 'posts_request' );, but there are other plugins like the debug bar extender that use it too.

#64 @market4android
13 years ago

Do we know whether this will work with plugins like W3 Total Cache? Sorry, please forgive me and don't flame, I'm not huge on knowledge of edits like this and the WP core. It's why I'm here, to learn :)

#65 @scribu
13 years ago

Do we know whether this will work with plugins like W3 Total Cache?

The best way to find out is to try it.

#66 in reply to: ↑ 62 ; follow-up: @scribu
13 years ago

Replying to ryan:

See advanced-post-cache.php, which does not change the query, but uses it in ways that will break with the new query style.

But it does change it:

  if ( $uncached_post_ids )
    return "SELECT * FROM $wpdb->posts WHERE ID IN(" . join( ',', array_map( 'absint', $uncached_post_ids ) ) . ")";
  return '';

#67 in reply to: ↑ 66 ; follow-up: @ryan
13 years ago

Replying to scribu:

Replying to ryan:

See advanced-post-cache.php, which does not change the query, but uses it in ways that will break with the new query style.

But it does change it:

Okay. Maybe we run with that patch as is for now. We can bring in has_filter() if we find a plugin that requires it.

Looking at the other posts_* filters, I don't think we need to split any of those. Seems like they still apply just fine to the new queries.

#68 in reply to: ↑ 67 @scribu
13 years ago

Replying to ryan:

Looking at the other posts_* filters, I don't think we need to split any of those. Seems like they still apply just fine to the new queries.

Yep, everything besides 'posts_request' is unaffected.

#69 @ryan
13 years ago

In [19918]:

Split the main WP_Query posts query into two queries to avoid temp tables. Leverage cache to avoid second query in persistent cache environments. Props scribu, cheald, prettyboymp. see #18536

#70 @ryan
13 years ago

Warning: Parameter 2 to a_posts_request_cb() expected to be a reference, value given in /.../wp-includes/plugin.php on line 170

Seeing that due to the change from apply_filters_ref_array() to apply_filters().

#71 @scribu
13 years ago

Darn, guess we'll have to change it back then.

#72 @scribu
13 years ago

  • Keywords needs-patch added; has-patch needs-testing removed

#73 @scribu
13 years ago

  • Keywords has-patch added; needs-patch removed

#74 @ryan
13 years ago

In [20024]:

Return to apply_filters_ref_array() for back compat with callbacks that specify a ref for the args. Props scribu. see #18536

#75 @milez
13 years ago

  • Cc milez added

Really glad you guys are working on this and hope to have it down for 3.4. My site has been completely crippled by this slow query since we crossed about 100k posts.

Currently I run 3.1. My question is: if I want to test this right now, first I'll need to update my install to trunk version (3.4) then I install the patch: 18536.6.diff. Is that correct? Or are we better served to wait for 3.4 at this point?

#76 @dd32
13 years ago

first I'll need to update my install to trunk version (3.4) then I install the patch: 18536.6.diff. Is that correct?

Since [19918] (2 weeks ago) it's been in trunk, so installing the latest trunk SVN onto the server would allow you to test it. although it should work, just realise there's a lot of things still in development which aren't quite finished yet.

#77 @johnjamesjacoby
13 years ago

Something in [19918] broke post ancestors, specifically for bbPress 2.0 and 2.1.

Previously, forums, topics, and replies (all custom post types) would come with their ancestors pre-populated so they could be walked up through their forums to root. Now sometimes there are no ancestors, so things like breadcrumbs are showing inaccurate results.

I've only been able to track it down to being inconsistent ancestor results in the cache; I haven't come up with the best place to drop in a fix. Most likely a call to _get_post_ancestors( $_post ); is needed somewhere.

#78 @scribu
13 years ago

  • Keywords needs-patch added; early has-patch removed

#79 @scribu
13 years ago

Related: #16574

#80 @scribu
13 years ago

16574.4.diff should fix the problem found in bbPress.

#81 @ryan
13 years ago

In 3.3 WP_Query::get_posts(), get_post_status() called get_post() called _get_post_ancestors() before the caches were primed. This meant a post object with ancestors made it into the cache. In 3.4, the caches are primed without ancestors before get_post_status() -> get_post() -> _get_post_ancestors() is called. Since get_post() does cache adds, not sets, the first add without the ancestors wins. We could fix this by doing a cache set in get_post() if the ancestors property is not set in the cached version. I'd rather not mess with that right now, however. A simpler alternative that should suffice for the time being is to call _get_post_ancestors() from get_post_ancestors() if the ancestors property is not set. This could result in the query being run on every _get_post_ancestors() call in the situation where the cached object does not have ancestors, but I can live with that. In 3.5 we can contemplate a proper WP_Post object with lazy ancestors fetching and improved caching.

#82 @nacin
13 years ago

If we pass $this->posts[0]->ID instead of the $this->posts[0] object to get_post_status(), the extra query will be avoided. (The _get_post_ancestors() call is caused by an object with empty(filter) getting into get_post(). The lack of a query is caused by an ID getting into get_post() for a post object that is already cached.)

This avoids the unnecessary ancestors query, at least until get_post_ancestors() gets called.

However, subsequent calls to get_post_ancestors() will result in subsequent queries for the same data, as get_post_status() is no longer putting ancestors into the cache. Perhaps we need to call _get_post_ancestors() for is_singular(), as that's what was happening already with 3.3, and that's where ancestors are most likely to be used.

#83 @scribu
13 years ago

Wouldn't plugins using $post->ancestors directly still be broken, even with 18536-ancestors.diff?

http://core.trac.wordpress.org/ticket/16574#comment:15

#84 @nacin
13 years ago

Nope. $post->ancestors is only sometimes set, and (under 3.3) it is never set for a post object that comes from WP_Query. It *is*, however, set for the post object in cache (but not $wp_query->posts[0]) when is_singular, due to get_post_status() caching the object (as described above).

#85 @ryan
13 years ago

In [20171]:

Call _get_post_ancestors() from get_post_ancestors() if the ancestors property is not set in the post object. Works around situations where ancestors is not set in the cached version of the post object. see #18536

#86 @johnjamesjacoby
13 years ago

Confirming fixed bbPress in r20171. Cheers, friends.

#87 @ryan
13 years ago

  • Resolution set to fixed
  • Status changed from new to closed

#88 @duck_
12 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

In 3.3 the $posts array in WP_Query can contain a post object with the ancestors property set. This is because get_post_status() is working on an object (reference) and that object gets the ancestors property set.

I think that for 3.4 we should remove "->ID", see http://wordpress.org/support/topic/nest-page-ancestors-in-34-r3. There are still fewer queries run on a child page in 3.4 than 3.3 even with it removed. Then we can re-add it for 3.5 with lazy loading of ancestors.

#89 @ryan
12 years ago

  • Owner set to ryan
  • Resolution set to fixed
  • Status changed from reopened to closed

In [21073]:

Pass a post object instead of ID to help preserve ancestors. Props duck_. fixes #18536

#90 @kevinB
12 years ago

  • Cc kevin@… added

Can [21073] please be included in the 3.4 release? Passing the post ID breaks plugin "teaser" functionality, which uses the 'posts_results' filter to spoof post_status to 'publish' (while replacing post_content with a placeholder).

#91 @scribu
12 years ago

It should be included in 3.4-RC3 (it was commited to trunk).

#92 @kevinB
12 years ago

The 3.4-RC3.zip I downloaded this morning still has post ID being passed for the status check:

// Check post status to determine if post should be displayed.
if ( !empty($this->posts) && ($this->is_single || $this->is_page) ) {
    $status = get_post_status($this->posts[0]->ID);

#93 @nacin
12 years ago

It was included in 3.4-RC4 and will be included in 3.4 final.

#94 @mikeschinkel
12 years ago

  • Cc mikeschinkel@… added
  • Keywords needs-patch removed
Note: See TracTickets for help on using tickets.