Make WordPress Core

Opened 13 years ago

Closed 10 years ago

Last modified 6 years ago

#19866 closed enhancement (wontfix)

Allow specifications of any wp_posts field(s) in WP_Query

Reported by: bigdawggi's profile bigdawggi Owned by:
Milestone: Priority: normal
Severity: normal Version:
Component: Query Keywords: has-patch close
Focuses: Cc:

Description

Hi guys, I'm considering writing a patch to be able to pull specific fields from a WP_Query but wanted to see if there was already something in the works on another ticket. I searched trac and hackers list (via its Google Group) and surprisingly didn't see anything.

Here's the current implementation which limits people to only querying for certain fields: wp-includes/query.php:2033

		switch ( $q['fields'] ) {
			case 'ids':
				$fields = "$wpdb->posts.ID";
				break;
			case 'id=>parent':
				$fields = "$wpdb->posts.ID, $wpdb->posts.post_parent";
				break;
			default:
				$fields = "$wpdb->posts.*";
		}

Stems from a situation where we needed all posts of a certain type -- potentially thousands -- but really only required post_title and ID fields, so didn't want all that memory overhead.

Attachments (2)

19866.diff (1.2 KB) - added by wonderboymusic 11 years ago.
19866.2.diff (2.8 KB) - added by wonderboymusic 11 years ago.

Download all attachments as: .zip

Change History (30)

#1 in reply to: ↑ description @DrewAPicture
13 years ago

  • Cc xoodrew@… added

Replying to bigdawggi:

Stems from a situation where we needed all posts of a certain type -- potentially thousands -- but really only required post_title and ID fields, so didn't want all that memory overhead.

If there's a performance improvement to be had by being able to specialize queries, this would come in really handy. +1

#3 @MikeSchinkel
11 years ago

  • Cc mike@… added

#4 @wonderboymusic
11 years ago

  • Keywords has-patch added
  • Milestone changed from Awaiting Review to 3.7

Yes, yes. With my patch, if you pass an array to fields, it will grab only those columns

$query = new WP_Query( array( 'fields' => array( 'post_parent', 'guid' ) ) );
print_r( $query->posts );
exit();

#5 follow-up: @MikeSchinkel
11 years ago

A huge +1 on this. A question though; how will this affect the post cache?

#6 in reply to: ↑ 5 ; follow-up: @nacin
11 years ago

Replying to MikeSchinkel:

A huge +1 on this. A question though; how will this affect the post cache?

Pretty terribly / it'll completely ignore it. Which is why I am not a huge fan of this — it is ripe for abuse and misunderstanding. This would not come in "handy" as much as cause people to do really stupid things and force new queries whenever WordPress goes "yo, I need a proper post object". Maybe direct queries is better advice for stuff like this?

#7 in reply to: ↑ 6 ; follow-up: @MikeSchinkel
11 years ago

Replying to nacin:

Pretty terribly / it'll completely ignore it. Which is why I am not a huge fan of this — it is ripe for abuse and misunderstanding. ... Maybe direct queries is better advice for stuff like this?

How do you feel about maybe modularizing WPQuery->get_posts() such that we would be able to get parts of a query more easily; specifically the WHERE and JOIN clauses without having to duplicate so much SQL logic?

I know it would be a huge undertaking, but every time I have to debug into that function (which is often) I think of how much easier it would be to understand the behavior in specific areas if I didn't have to trace through 2000+ lines of code.

On a related note it should would be nice if the components of the SQL queries were modeled as objects instead of as string fragments so that modifying them during hooks wouldn't be so fragile. I spent an hour and wrote a proof of concept to illustrate what I mean: https://gist.github.com/mikeschinkel/6577595.

I'd be happy to create new tickets for either of these if there's any chance of consideration.

#8 @helen
11 years ago

  • Milestone changed from 3.7 to Awaiting Review

Doesn't sound like there's a real consensus for direction, if any, just yet.

#9 @wonderboymusic
11 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to maybelater
  • Status changed from new to closed

If someone wants queries that are deliberately not cached, they can accomplish this through filters or their own SQL query. I don't think we should encourage this through WP_Query directly.

#10 in reply to: ↑ 7 ; follow-up: @helen
10 years ago

Replying to MikeSchinkel:

How do you feel about maybe modularizing WPQuery->get_posts() such that we would be able to get parts of a query more easily; specifically the WHERE and JOIN clauses without having to duplicate so much SQL logic?

Filtering fully formed HTML and SQL statements/fragments is one of my least favorite things to do. It sounds nice - #24684 is very much related to that.

#11 in reply to: ↑ 10 @MikeSchinkel
10 years ago

Replying to helen:

Filtering fully formed HTML and SQL statements/fragments is one of my least favorite things to do. It sounds nice - #24684 is very much related to that.

Thoughts on how to move forward?

#12 @bigdawggi
10 years ago

Regarding caching of results, is there a difference in specifying the fields as a query argument or using the "posts_fields" or "posts_fields_request" filters that are already available? I'm thinking this is another way to accomplish the same thing in a more-direct manner than using the filter system -- and if they're both equally good/bad for caching, then why not allow it?

#13 @sc0ttkclark
10 years ago

  • Resolution maybelater deleted
  • Status changed from closed to reopened

Would be great if we made this possible. Losing out on everything WP_Query can provide for query building (meta_query, tax_query, search, etc), simply because it doesn't prime caches for the posts returned (exactly like any non-default 'fields' currently does in core).

I think we should encourage this through WP_Query, let's not make people write more custom SQL that is anything but backwards compatible, pretty please?

Last edited 10 years ago by sc0ttkclark (previous) (diff)

#14 @boonebgorges
10 years ago

A slightly better approach here is to do something like the following (assuming split_the_query): Do the SELECT ID query as per usual. Then, once we have an array of matched post IDs, fill the posts objects only with the fields requested in the 'fields' array. And for each of those, we would first check the cache, to avoid doing an additional query if possible. So, fetching ID+post_title would result in two queries: SELECT ID WHERE... and SELECT post_title WHERE IN ($ids).

This is slightly better because it doesn't ignore the cache when populating the post objects. But it's still not great because (a) it doesn't *update* the cache, so subsequent versions of the same query will still require hitting the DB (though perhaps we could avoid this by always querying for the full object if a persistent cache is found?), and (b) as nacin notes, trying to use $query->posts objects alongside any other WP APIs (like template loops) is going to result in database hits to populate the full post object.

#15 follow-up: @sc0ttkclark
10 years ago

Fetching 'fields' => 'ids' or 'id>parent' seems to have the same issues you're speaking of. Would it be better to de-couple the tweaks you're speaking of from this ticket, and allow this ticket to move forward with the current caveats that we all accept with 'ids' and 'id>parent'?

#16 @SergeyBiryukov
10 years ago

  • Milestone set to Awaiting Review

#17 in reply to: ↑ 15 @boonebgorges
10 years ago

  • Keywords close added

Replying to sc0ttkclark:

Fetching 'fields' => 'ids' or 'id>parent' seems to have the same issues you're speaking of. Would it be better to de-couple the tweaks you're speaking of from this ticket, and allow this ticket to move forward with the current caveats that we all accept with 'ids' and 'id>parent'?

'fields => ids' is not really the same. The ID lookup in WP_Query is never cached. If 'split_the_query', a SELECT ID query is run, and post data is filled in from the cache; otherwise SELECT * is run, and the post cache is ignored. It's a special case. 'fields = id=>parent' does run up against the problems I've laid out. But the fact that we're doing something less than optimal in one case doesn't mean we should start doing it in all cases.

I'm not sure that it's desirable to "move forward with the current caveats". If the proposed implementation is similar to 'id=>parent' - we modify the SELECT clause and then bail early - then our API will be implicitly suggesting that you save some overhead by fetching only the fields you need, which will not actually be true, since it'll totally miss the cache.

There will still be cases (though I'd guess they'll be relatively rare) where it's measurably better to fetch fewer post fields. It seems to me that the best advice in these cases is to do a WP_Query with 'field=ids', and then pass the found IDs, in PHP, to a direct SQL query that gets exactly what you want. That way, you have access to all the tools of WP_Query *and* you avoid the SELECT * overhead. This will not perform as well as SELECT field_1, field_2... at very large scales (like, say, during migration scripts), but in those cases you probably ought to be writing your own SQL anyway.

This ticket was mentioned in Slack in #core by sc0ttkclark. View the logs.


10 years ago

#19 @sc0ttkclark
10 years ago

Had a chat in Slack with Boone about this a bit more. The consensus seems to be that the use-case for 'fields' would be too small to offset the cache-bypassing effects and bad developer practices it would open the flood gates to, for use-cases which really shouldn't be using this new enhancement.

I am undecided yet on how to move this forward, aside from someone stepping in to provide benchmarks with and without persistent cache, on the performance of the different 'fields' options, versus using the current work arounds.

#20 @boonebgorges
10 years ago

  • Milestone Awaiting Review deleted
  • Resolution set to wontfix
  • Status changed from reopened to closed

Here's my takeaway from the above: The cases in which SELECTing only a subset of the fields is substantially more performant are rare. In those cases, devs can still use WP_Query to get the matched post IDs, and then pass them to custom SQL that SELECTs only the desired fields. As far as our API is concerned, we should be encouraging best practices, which will almost always mean 'fields=all' or 'fields=ids'.

Based on this reading, I'm reclosing this ticket. If someone can generate benchmarks that show that this analysis way off the mark, please feel free to reopen with details.

#21 @MikeSchinkel
10 years ago

I know this was just closed, but it might be worth considering one more option before putting this to bed completely.

I would say the biggest issue is loading longtext fields, followed by loading text fields, when we only need a few of the smaller fields. Maybe there could be options for "fields"=>"no-longtext" and "fields"=>"no-text", the latter of which would omit both text and longtext fields.

Any chance for this to be considered?

#22 @sc0ttkclark
10 years ago

The only thing holding any progress on this back is lack of performance tests. Being able to show how it performs versus ids specifically would be great help IMO.

#23 follow-up: @boonebgorges
10 years ago

MikeSchinkel - What sc0ttkclark said. More specifically, I'd only be convinced by benchmarks that demonstrate a real-life situation where excluding longtext and text fields significantly decreases query overhead, in a way that couldn't easily be avoided by using fields=ids.

#24 in reply to: ↑ 23 @MikeSchinkel
10 years ago

Replying to boonebgorges:

MikeSchinkel - What sc0ttkclark said. More specifically, I'd only be convinced by benchmarks that demonstrate a real-life situation where excluding longtext and text fields significantly decreases query overhead, in a way that couldn't easily be avoided by using fields=ids.

Understood. I unfortunately don't have the bandwidth at the moment to generate said benchmarks.

But point of note, I myself have not been concerned about performance per se but instead memory usage. For example, let's assume a query loads 1000 WP Post records with the average size of a post being 2941 characters (as from my blog) that is almost 3Mb of memory to load said data, which seems like quit a lot for one query on one page load.

Yes, we can code it so that we use two queries (I already do that) or maybe 1000 posts are rarely needed (although I have had code reviews where they required 'posts_per_page' to be set to 999, just in case) but the point being that using two queries or that much memory seems to be overkills when all you want are just ID, post_title, post_name and maybe post_parent which I would guest is by for the most common need, at least in my use cases.

Just sayin...

#25 @boonebgorges
10 years ago

I understand the memory issue and I'm sympathetic. However, cases where you *legitimately* need 1000 posts are exceedingly rare, and we should not optimize for cases where people are _doing_it_wrong(). In real life cases, where you might pull 10 or 20 or even 100 posts, the memory overhead is close to negligible.

More importantly, any immediate performance gain that you get from cherry-picking post fields is immediately offset when you introduce a persistent cache, because, for reasons discussed above, these kinds of queries cannot interact properly with the cache (at least, not without a rewrite of the way that WP_Query primes and accesses post caches).

#26 @boonebgorges
9 years ago

#33519 was marked as a duplicate.

#27 @ocean90
6 years ago

#33519 was marked as a duplicate.

#28 @ocean90
6 years ago

#43636 was marked as a duplicate.

Note: See TracTickets for help on using tickets.