WordPress.org

Make WordPress Core

Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#36687 closed feature request (fixed)

Feature to override WP_Query to provide results from other source

Reported by: jpdavoutian Owned by: jpdavoutian
Milestone: 4.6 Priority: normal
Severity: normal Version:
Component: Query Keywords: has-patch
Focuses: Cc:
PR Number:

Description

Currently the only way to get results is to query the database.
We are currently working on a solution to utilize a solr server to retrieve post ids but there is no way to use WP_Query and return the results without actually hitting the database which defeats the purpose of having a solr server.

The only solution right now is to implement all functionality we need by rewriting most of the widgets, plugins that query wordpress, to query solr in a totally custom way. This means that we cannot have a plugin that works universally for all to benefit from it without reinventing the wheel every time.

It would be great to have an option in wp-includes/query.php to say that we have the results on hand, and don't go to the database.
This is something that could be implemented early in wp-includes/query.php with an action and if $q contains lets say a key 'exit', then assume that we have posts and other needed variables filled in.

I could provide a code sample if you see any interest in a feature like that.

Regards

Jean-Paul

Attachments (4)

query.php.patch (4.6 KB) - added by jpdavoutian 4 years ago.
36687.diff (5.8 KB) - added by boonebgorges 4 years ago.
36687.2.diff (2.7 KB) - added by dd32 3 years ago.
Example of the issue and a (extremely lazy) unit test to demonstrate it
36687.3.diff (2.6 KB) - added by spacedmonkey 3 years ago.

Download all attachments as: .zip

Change History (35)

#1 @boonebgorges
4 years ago

  • Keywords reporter-feedback added

Hi @jpdavoutian - Thanks for the ticket, and welcome to WordPress Trac!

It would be helpful to hear more about how you think this feature would be implemented, and how you'd use it. There are a number of plugins doing something similar to what you've suggested, and they're "tricking" WP_Query into fetching content from elsewhere using the posts_request and the_posts filters. See eg https://github.com/10up/ElasticPress/blob/develop/classes/class-ep-wp-query-integration.php#L55 Can you say more about how your approach would differ, and what advantages it would offer beyond these existing techniques?

#2 @jpdavoutian
4 years ago

  • Keywords reporter-feedback removed

Thanks for the quick reply.

Yes, I have seen the plugins you suggest.
There are a couple of issues with them.

First of all they deal only with search results and not the wordpress queries in general.

Secondly, you can see that the posts_request filter actually is returning an sql statement that is dummy.
https://github.com/10up/ElasticPress/blob/develop/classes/class-ep-wp-query-integration.php#L336
Although this approach is clever, it actually hits the database for nothing.

But the most serious issue in my opinion is that it does not allow to intercept queries when we request only post ids. You simply cannot do it.
Many plugins make use of such queries and there is no way these queries to be served from solr or any other similar service.

By having one more filter in wp-includes/query.php and a few more checks in code, we could actually generalize the use of search services without affecting or rewriting other plugins.
I have placed an action in wp-includes/query.php, line 3548
$this->is_external_query = apply_filters_ref_array( 'posts_query_external', array( $this->request, &$this ) );
and later on I just check the $this->is_external_query to if I need to count posts or other stuff.

I'm already working on such a case (woocommerce layered navigation) and have great success till now and the performance improvement is huge.
There are of course some minor issues I haven't solved yet and maybe cases I can't even think of, since my experience in wordpress is limited (that's why I haven't uploaded any code yet).

#3 @boonebgorges
4 years ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

Thanks for the thoughtful response, @jpdavoutian. I have similar thoughts about the limitations of the approaches used by plugins like the one I linked to.

A filter like the one you've suggested seems appropriate to me, though you're right that there will probably be some auxiliary issues caused by skipping the MySQL query. When it's in a workable state, would you mind sharing your modifications so that we can have a closer look at the approach?

#4 @tlovett1
4 years ago

Regarding only getting post ids, there is a ticket for that limitation as it affects ElasticPress: https://core.trac.wordpress.org/ticket/35476

I'd be a huge proponent of a filter for overriding the posts query entirely for use with external services.

Last edited 4 years ago by tlovett1 (previous) (diff)

#5 @jpdavoutian
4 years ago

@tlovett1 The solution you mention is very nice and allows you to bypass the database query when using the fields option but does nothing for the rest.
I already have a solution at hand and have tested it and it seems to be working fine.
Soon I'll upload a patch for review.

#6 @jpdavoutian
4 years ago

  • Keywords needs-patch removed

I have just added a patch.
In order to make this work I also have another custom plugin that makes use of these changes.
The plugin is based on the excellent plugin from Pantheon (https://github.com/pantheon-systems/solr-power).
If you need it I can send it, but I think there is point to create a public repository for it since no one will be able to use it unless they have the specific patch.

This is my first attempt to change wordpress core files, so please be gentle if there is something I've done wrong :)

#7 @DrewAPicture
4 years ago

  • Keywords has-patch added

@boonebgorges
4 years ago

#8 @boonebgorges
4 years ago

  • Keywords 2nd-opinion added

Thanks for the patch, @jpdavoutian ! It looks like a step in the right direction.

To my mind, a query parameter like 'external' doesn't seem like the best way to do this. First of all, 'external' is not really descriptive of what's happening here: it's really about disabling WP_Query's database query. In your case, this is useful because you're getting your posts from an *external* source, but that seems to me to be a step removed. In any case, regardless of the name, a boolean flag like 'external' feels weird because presumably you will want to provide the posts themselves in the $this->posts array. Which means that you'd have to hook to pre_get_posts or something like that. So my feeling is that this feature should be implemented as a filter, since you'll have to use a filter callback no matter what.

See 36687.diff for my iteration on query.php.patch. I've changed the name to posts_pre_query. Filter it and return anything other than null to bypass the database queries. I've left most of the logic in the fields=ids etc blocks, with the expectation that if you filtering posts_pre_query, you'll check fields first and be sure to return the correct kind of data.

In order to make the improvement fully useful, I think it's necessary to have a more direct way to override WP's set_found_posts() logic - presumably, this info is going to come from the same place as your posts array. I've implemented this by adding a bail condition to set_found_posts(). We could consider something more sophisticated than this, but it does the trick.

Here's a sample implementation:

function wp36687_do_external_posts_query( $posts, WP_Query $query ) {
    $response = wp_remote_get( 'https://my-remote-data-store/foo/bar' );
    
    if ( 200 === wp_remote_response_code( $response ) ) {
        $response_body = wp_remote_retrieve_body( $response );
        
        // Assuming the API response is a JSON object containing `post_ids` and an int `found_posts`
        $response_body = json_decode( $response_body );

        $posts = $response_body->post_ids;
        $query->found_posts = $response_body->found_posts;
    }

    return $posts;
}
add_filter( 'posts_pre_query', 'wp36687_do_external_posts_query', 10, 2 );

Questions:

  • Does this seem like a sufficiently general technique for using external data in WP_Query? Are there pieces that need to be flexible that are not, using this technique? Use cases that aren't covered?
  • Is posts_pre_query a reasonable filter name?

#9 @tlovett1
4 years ago

Patch looks good Boone. The only thing that would make this more useful is not requiring WP_Query to return WP_Posts. Right now all objects returned from posts_pre_query are wrapped in WP_Post.

#10 @jpdavoutian
4 years ago

@boonebgorges I have to confess that the variable names I used are not very descriptive and sorry for not having any comments in the changes. I forgot completely about them.

As for the query parameter. It is not about disabling WP_Query's database query but to avoid making the external call. For example when indexing posts for the solr server we need to explicitly use the database. Otherwise we cannot read the posts. That's why I think that we should have the option there. The default should be that we want make the external call since the cases that we require explicit database use are less. Maybe a better name and comments in code will help.

As for the set_found_posts(). You are right that the number of posts comes with the results, even with paging. So there is no reason to make an extra call if we have this number on hand. Besides from settings the $query->found_posts I cannot figure out any other way.

#11 @boonebgorges
4 years ago

  • Milestone changed from Future Release to 4.6

The only thing that would make this more useful is not requiring WP_Query to return WP_Posts. Right now all objects returned from posts_pre_query are wrapped in WP_Post.

This isn't quite right. The way 36687.diff is written, the fields parameter is still respected, and your posts_pre_query callback should be cognizant of fields. In other words, if fields=ids, your posts_pre_query callback can return an array of IDs, and it'll work as you'd expect; if you want it to return full post objects, you can either build pseudo-post objects in your callback, or return IDs and let WP fetch the full posts from the database (array_map( 'get_post', $this->posts )).

As for the query parameter. It is not about disabling WP_Query's database query but to avoid making the external call. For example when indexing posts for the solr server we need to explicitly use the database. Otherwise we cannot read the posts. That's why I think that we should have the option there. The default should be that we want make the external call since the cases that we require explicit database use are less.

When I said that it's about "disabling the database query", I don't mean disabling the database altogether. Obviously, your external appliance has to get the data from somewhere. I just mean that what the flag really does is prevents this specific instance of WP_Query from connecting to the database to get post IDs. I'm still not convinced that an option will add anything of value - even if there is something like an external or dont_query_database flag, you'll still have to provide the posts data somehow; and since that "somehow" will likely be with a filter callback, it makes sense to use *only* a filter callback instead of having a two-part solution.

Besides from settings the $query->found_posts I cannot figure out any other way.

It'd be possible to add either a query_var that lets you set an explicit found_posts value,
or a specific filter that bypasses the found_posts query (much like posts_pre_query - posts_pre_set_found_posts or something like that). But they would fundamentally do the same thing as 36687.diff, so all things being equal, I'd go with the more minimal solution.

This ticket was mentioned in Slack in #core by chriscct7. View the logs.


3 years ago

#13 @chriscct7
3 years ago

  • Owner set to jpdavoutian
  • Status changed from new to assigned

#14 @boonebgorges
3 years ago

Would like feedback from @jpdavoutian to make sure I'm not crazy before moving forward :) (see comment 11)

#15 follow-up: @jpdavoutian
3 years ago

@boonebgorges I have tested the solution and it works really great.

My only concern is about bypassing pre_posts_query when reindexing content. I have worked out a solution for my case using the filter and a class variable. Maybe not the ideal solution but it was the only one that I could think of.
I'm not sure how other plugins could make any use of this in case they need the original query.

Last edited 3 years ago by jpdavoutian (previous) (diff)

#16 in reply to: ↑ 15 ; follow-up: @boonebgorges
3 years ago

Replying to jpdavoutian:

@boonebgorges I have tested the solution and it works really great.

My only concern is about bypassing pre_posts_query when reindexing content. I have worked out a solution for my case using the filter and a class variable. Maybe not the ideal solution but it was the only one that I could think of.
I'm not sure how other plugins could make any use of this in case they need the original query.

Thanks for the feedback. Can you say more about this concern? I'm not sure what you mean by "bypassing pre_posts_query when reindexing content".

#17 in reply to: ↑ 16 @jpdavoutian
3 years ago

Replying to boonebgorges:

Replying to jpdavoutian:

@boonebgorges I have tested the solution and it works really great.

My only concern is about bypassing pre_posts_query when reindexing content. I have worked out a solution for my case using the filter and a class variable. Maybe not the ideal solution but it was the only one that I could think of.
I'm not sure how other plugins could make any use of this in case they need the original query.

Thanks for the feedback. Can you say more about this concern? I'm not sure what you mean by "bypassing pre_posts_query when reindexing content".

Well, there is this code from the solr-power plugin I mentioned earlier that does the indexing of posts:

        $args  = array(
          'post_type'    => apply_filters( 'solr_post_types', get_post_types( array( 'exclude_from_search' => false ) ) ),
          'post_status'  => 'publish',
          'fields'     => 'ids',
          'posts_per_page' => absint( $limit ),
          'offset'     => absint( $prev )
        );
        $query   = new WP_Query( $args );
        $postids = $query->posts;

It gets all post ids of specific types in order to send these posts to solr server.
If we have the posts_pre_query in place then we will get an empty result since nothing is initially indexed.

My first attempt was to have an option in $args to force a database query instead of querying the solr server:

        $args  = array(
          'external'     => FALSE,

Since there is no such option now, I raise a flag before calling the query and check for this flag in my posts_pre_query filter.
An other solution could be to add the filter only after we have completed indexing.

So, my concern is not how I can handle such a case within my plugin were I have set the filter and I can easily bypass it, but how a third party plugin could tell the system not to go through the posts_pre_query without being aware of my plugin.

#18 @boonebgorges
3 years ago

So, my concern is not how I can handle such a case within my plugin were I have set the filter and I can easily bypass it, but how a third party plugin could tell the system not to go through the posts_pre_query without being aware of my plugin.

Thanks for the clarification. I think I understand the concern. This hook is essentially a forced override: it allows plugin A to prevent WP from firing its normal query. You are suggesting that perhaps there should be an override for this override, in the form of a WP_Query param or something like that, so that plugin B can tell plugin A not to bypass the query.

I'm not sure such an override is wise, or helpful. What happens when plugin A wants to tell plugin B that, no, *really*, I want to override WP_Query? Should there be an override of the override of the override?

The use cases for pre_posts_query are varied, and the interaction between third-party plugins using it will almost always have to be very specific to the two plugins involved. To take your example, say you're using something like what I suggested above to feed external posts into the query results. And let's say that you have a plugin that occasionally needs "fresh" results from the WP database, for the purpose of refreshing the search index. In that case, I would whitelist those specific index queries:

function wp36687_do_external_posts_query( $posts, WP_Query $query ) {
    if ( isset( $_GET['get_fresh_posts'] ) ) {
        return $posts;
    }
    
    ...

You probably want something more sophisticated than this, but you get the idea: Like anywhere else in WordPress, when you use an action or filter hook, you have to be sure that your callback only fires in the proper context. What that context is will differ from case to case, such that I don't think WP can make the decision ahead of time.

My inclination is to go with 36687.diff for 4.6. If plugin authors find that they're having a hard time overriding the override, we can revisit in a future release. Does that seem reasonable to you @jpdavoutian ?

#19 @jpdavoutian
3 years ago

The way you put it @boonebgorges, I couldn't agree more. Lets go with what you propose.

#20 @boonebgorges
3 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 37692:

Query: Allow plugins to supply post results instead of having WP_Query fetch them from the database.

Returning a non-null value from the new posts_pre_query filter will cause
WP_Query to skip its database query, so that posts data can be provided from
elsewhere. This is useful in cases where post data may be mirrored in a
separate location, such as an external search application.

Developers should note that the WP_Query properties generally used to
calculate pagination - specifically, found_posts and max_num_pages, which
are determined by default in set_found_posts() - must be provided explicitly
when using the posts_pre_query filter; since WP_Query will not be
contacting the database, it will have no access to SELECT FOUND_ROWS().
The WP_Query instance is passed to posts_pre_query by reference, so that
these properties can be set manually if needed.

Props jpdavoutian, tlovett1.
Fixes #36687.

#21 @dd32
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

@boonebgorges This change has broken at least one implementation of a plugin inserting it's own posts data.

In this case, it's a Jetpack Search implementation - https://meta.trac.wordpress.org/browser/sites/trunk/wordpress.org/public_html/wp-content/plugins/plugin-directory/libs/site-search/jetpack-search.php

It's a bit convoluted to follow, but the basic principle is that it hooks into the_posts to add the posts and found_posts to add the found posts counts. However, the issue is that previously $this->set_found_posts( $q, $limits ); was run regardless of if $this->posts actually contained any results, but it now is only run on line 3656 if data was retrieved from the database.

It seems like set_found_posts has some strange logic in it ( is_array( $this->posts ) && ! $this->posts ) which works around this case, so I think the safest bet might be to always call the method regardless of if posts were found.

@dd32
3 years ago

Example of the issue and a (extremely lazy) unit test to demonstrate it

#22 @jadpm
3 years ago

The last change has broken simple queries counters. On a vanilla new install with the SVN trunk version of WordPress, no plugins and default theme, running a simple query sets max_num_pages to 1 even when it shoudl be another value.

Consider a clean site with just 16 posts and the following query added to the footer of the theme:

<?php
$ek_query_args = array(
        'post_type'             => 'post',
        'posts_per_page'        => 5
);
$ek_query = new WP_Query( $ek_query_args );

In that case, $ek_query->max_num_pages should be 4. With the latest changes on query.php it defaults to 1. Reverting to the very previous version of that file states the right 4 number of pages. Note that there is no (at least external) filter on posts_pre_query whatsoever.

Note that this also applies to native builtin queries. The Homepage set as the blog page, with a setting of 2 posts per page and 16 posts on the site is displaying obviously no pagination controls on TwentySixteen, since, well, the page counter has become 1.

Please revert and test properly before commiting again.

This ticket was mentioned in Slack in #core by jadpm. View the logs.


3 years ago

#24 @boonebgorges
3 years ago

  • Keywords 2nd-opinion removed

@dd32 Thanks for the patch and the detailed explanation. I should've known better than to try to be parsimonious.

Please revert and test properly before commiting again.

@jadpm I am unable to reproduce the issue as you've described it. If you're still experiencing the problem after the upcoming commit to address the issue as described by @dd32, please post here with more details, including plugins (I know you said "vanilla", but just to be sure), and info about your WP environment - particularly, whether you have any object caching enabled.

#25 @boonebgorges
3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 37712:

Query: After [37692], don't skip set_found_posts() when no posts are found.

The 'found_posts' filter must continue to run for plugins manipulating post
results via filter.

Props dd32.
Fixes #36687.

#26 @jadpm
3 years ago

I could not replicate the very same issue on a new and clean installation. However, I am getting erratic outcomes on a fairly complex plugin that does custom queries which used to work properly before the changeset 37692 and it is still not working with the latests SVN trunk version. Note that it does work properly on 4.5.2 and on SVN before that changeset.

I did manage to replicate a small erratic behavior that might suggest that there are indeed side effects on those changes.

Queries executed on AJAX calls are getting wrong found_posts and max_num_pages properties. On a local site I have, with just three posts, I am getting a count of 4 found posts, which is obviously wrong. The queries on my plugin, on the other hand, are getting 0 post founds.

To replicate this small test case:

  • Install a SVN trunk copy of WordPress
  • Create 3 posts. I could reproduce the erratic behavior with also 5, 6 and so on posts (the magic number seems to be 4...)

Now you need an AJAX calback to generate a query:

<?php
function ek_query_callback() {
        $ek_args = array(
                'post_type'             => 'post',
                'posts_per_page'        => 2
        );
        $ek_query = new WP_Query( $ek_args );
        return '<pre>' . print_r($ek_query, true) . '</pre>';
}

add_action( 'wp_ajax_ek_get_posts', 'ek_get_posts' );

function ek_get_posts() {
        $data = ek_query_callback();
        echo $data;
        die();
}

And a place to fire the AJAX. Note that ajaxurl is not available by default on the frontend, so I hijacked the localization for a frontend script on the active theme to add it:

jQuery( document ).on( 'click', '.js-ek-get-posts', function( e ) {
	e.preventDefault();
	var data = {
		action: 'ek_get_posts'
	};
	jQuery.ajax({
		type:		"GET",
		url:		ajaxurl,
		data:		data
	}).done( function( response ) {
		alert(response);
	});
});

Create a page and add a fake link:

<a href="#" class="js-ek-get-posts">Get posts</a>

When you click the link, an alert should display the query data. Mind the found_posts and max_number_pages values. I am getting 4 and 2 for 3 posts.

Some odd combinations:

  • posts_per_page = 1 and 7 posts produces 2 found posts and 2 max pages.
  • posts_per_page = 2 and 7 posts produces 4 found posts and 2 max pages.
  • posts_per_page = 3 and 7 posts produces 6 found posts and 2 max pages.
  • posts_per_page = 4 and 7 posts produces 8 found posts and 2 max pages.

As you can see, the number of found posts is random and sometimes higher than the actual number of existing posts, and the number of max pages seems to get stuck.

So we do have a problem here.

If the test case is too complex I can craft a Duplicator package with minimum changes in the TwentyFifteen theme to pack the AJAX calback, the JS for the frontend and the posts and pages used to test this. I can also pack this all as a standalone plugin if needed.

Note that this is working properly, all of it, on 4.5.2 and on SVN before the first commit on this ticket.

#27 @boonebgorges
3 years ago

  • Resolution fixed deleted
  • Status changed from closed to reopened

@jadpm Thanks very much for the details.

There is indeed a bug: moving the call to set_found_posts() meant that, in some cases, another query was being run between the main $this->request and SELECT FOUND_ROWS(): the query was triggered by _prime_post_caches(). As such, SELECT FOUND_ROWS() was matching the found rows from the cache-priming query rather than the main query. These are the "random" results you're seeing.

set_found_posts() has to go back immediately after the main queries have taken place. I'll enforce this with a test or two.

#28 @boonebgorges
3 years ago

  • Resolution set to fixed
  • Status changed from reopened to closed

In 37721:

Query: set_found_posts() must run immediately after main query.

If not run immediately after, the SELECT FOUND_ROWS() query might refer to
a different query, such as the one used to populate the post cache for a split
query.

Introduced in [37692].

Fixes #36687.

#29 @jadpm
3 years ago

Works fine now, thanks for finding the root cause. I spent some time trying to but did not manage to see the reason.

@spacedmonkey
3 years ago

#30 @spacedmonkey
3 years ago

In 36687.3.diff I refactored how the fields=>ids call is passed. It also now does a split query. This removes the early return and pass the query through the posts_request_ids and posts_results filters.

#31 @romulodl
3 years ago

About the solution, should not be better to create a filter called use_wp_query just after pre_get_posts function and add an early return in case of being false?

The proposed solution is ok, but IMO it would be very confusing for a dev navigate on which filter needs to be hooked if you want to return from another source database or if you're caching the results (which was my pain creating a post cache plugin for memcache.)

Note: See TracTickets for help on using tickets.