Opened 3 years ago
Closed 2 months ago
#56992 closed defect (bug) (fixed)
The Loop displays incorrect data for queries started with `fields => 'id=>parent'`.
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 6.8 | Priority: | normal |
Severity: | normal | Version: | 3.1 |
Component: | Query | Keywords: | has-patch has-unit-tests |
Focuses: | performance | Cc: |
Description
Using the Loop on custom WP_Query
objects started with fields => 'id=>parent'
will set the global post object incorrectly. Instead of containing all data, the global post object will only contain the post's ID and parent fields. All other fields are the default value.
The following test demonstrates the issue:
<?php public function test_the_loop_when_querying_post_parents_only() { $parent = self::factory()->post->create( array( 'post_type' => 'page' ) ); $child = self::factory()->post->create( array( 'post_type' => 'page', 'post_parent' => $parent ) ); $query = new WP_Query( array( 'fields' => 'id=>parent', 'post_type' => 'page', 'page_id' => $child, ) ); $query->the_post(); $global_post = get_post( null, ARRAY_A ); $specific_post = get_post( $child, ARRAY_A ); $this->assertEqualSetsWithIndex( $global_post, $specific_post ); }
The cause of the error is within WP_Query::next_post()
which assumes the WP_Query::$posts
property contains full post objects which isn't nessesarily the cause if a sub-set of fields is requested. See the source code.
Due to #56948 the test above will currently throw an error on trunk but it will demonstrate the problem if you check out the 6.0 branch.
I suspect this has been an issue since the fields parameter was introduced in #14777 for version 3.1 but I am unable to test that far back due to PHP versions I have available on my local config.
Change History (53)
This ticket was mentioned in PR #7985 on WordPress/wordpress-develop by @juzar.
5 months ago
#1
- Keywords has-patch has-unit-tests added; needs-patch needs-unit-tests removed
#2
@
5 months ago
@peterwilsoncc, Yes the global post object set is wrong if used ids=>parent
for fields, it creates an empty WP_Post object and assigns that to the global post.
I have fixed the issue in this PR: https://github.com/WordPress/wordpress-develop/pull/7985
with unit test added as above.
This ticket was mentioned in Slack in #core by audrasjb. View the logs.
4 months ago
4 months ago
#5
@joemcgill, I agree with you, that would be more consistent. I have updated the logic to return WP_Post
object.
This ticket was mentioned in Slack in #core-test by oglekler. View the logs.
4 months ago
#7
@
4 months ago
- Keywords dev-feedback 2nd-opinion added
@juzar I've been reviewing the code and testing your approach
But @oglekler raised an interesting concern I'm looking into right now: The test seems to be failing because it is expected to work with the minimal data structure, but your implementation seems to be trying to convert it to full post objects, creating unnecessary overhead and potentially changing the expected data structure.
As you said, in the original test, you can see on one side, you populated the child post with full data, on the other side, just the bare minimal data was retrieved, and the rest was fed with default data as you pointed out in your first post in the test. This is why the test is failing, but it should fail that specific test as expected, as next_post()
should not retrieve full data by default, as it may generate a performance issue in the future.
I need a 2nd opinion on this, maybe @joemcgill?
#9
@
3 months ago
@SirLouen Are you able to clarify the concerns were discussing with @oglekler? I think you're worried about the change is return type for WP_Query::next_post()
but I might be missing something. Is that correct?
Re the performance concern: When WP_Query::the_post()
is called, the cache is primed for all posts returned by the query. The same isn't true for WP_Query::next_post()
although it is intended that when traversing the loop the former is called. If it proves to be a performance issue, it might be possible to prime the cache it the next_post
method too.
@SirLouen commented on PR #7985:
3 months ago
#10
@joemcgill just to make sure, have you read the trac info, before approving these changes?
#11
follow-up:
↓ 12
@
3 months ago
@peterwilsoncc maybe priming the cache in the next_post is worth, but maybe with a use case you can prove why the test test_the_loop_when_querying_post_parents_only
. Have you found an scenario where you need more info required by next_post
?
Once you can get id=>parent
for the part ID, can you just query the ID to get the information?
Can you suggest an scenario where actually retrieving the whole info from next post could be useful (instead of simply being able to query the next post). I'm assuming that the least information picked, the better (specially because the next post is queried for each post in the loop so as we were commenting this could be a performance issue).
Also, I don't really understand what you mean with priming the cache and not sure if still, fetching the whole info from next post, and caching, would be a great idea either.
Without an example of why this could be useful, I'm clueless of why this change is worthy, and despite the patch solves the issue, I'm not sure, if, in first place, there is a issue.
#12
in reply to:
↑ 11
;
follow-up:
↓ 13
@
3 months ago
Replying to SirLouen:
@peterwilsoncc maybe priming the cache in the next_post is worth, but maybe with a use case you can prove why the test
test_the_loop_when_querying_post_parents_only
. Have you found an scenario where you need more info required bynext_post
?
Once you can get
id=>parent
for the part ID, can you just query the ID to get the information?
I had the expectation and the actual reversed in the test. It should read $this->assertEqualSetsWithIndex( $specific_post, $global_post );
Running the tests with the expected value first shows the issue correctly:
Failed asserting that two arrays are equal. --- Expected +++ Actual @@ @@ 'ID' => 5 'ancestors' => Array (...) 'comment_count' => '0' - 'comment_status' => 'closed' + 'comment_status' => 'open' 'filter' => 'raw' - 'guid' => 'http://example.org/?page_id=5' + 'guid' => '' 'menu_order' => 0 'page_template' => '' - 'ping_status' => 'closed' + 'ping_status' => 'open' 'pinged' => '' 'post_author' => '0' 'post_category' => Array () - 'post_content' => 'Post content 0000023' + 'post_content' => '' 'post_content_filtered' => '' - 'post_date' => '2025-02-26 22:47:51' - 'post_date_gmt' => '2025-02-26 22:47:51' - 'post_excerpt' => 'Post excerpt 0000023' + 'post_date' => '0000-00-00 00:00:00' + 'post_date_gmt' => '0000-00-00 00:00:00' + 'post_excerpt' => '' 'post_mime_type' => '' - 'post_modified' => '2025-02-26 22:47:51' - 'post_modified_gmt' => '2025-02-26 22:47:51' - 'post_name' => 'post-title-0000023' + 'post_modified' => '0000-00-00 00:00:00' + 'post_modified_gmt' => '0000-00-00 00:00:00' + 'post_name' => '' 'post_parent' => 4 'post_password' => '' 'post_status' => 'publish' - 'post_title' => 'Post title 0000023' - 'post_type' => 'page' + 'post_title' => '' + 'post_type' => 'post' 'tags_input' => Array () 'to_ping' => '' )
Can you suggest an scenario where actually retrieving the whole info from next post could be useful (instead of simply being able to query the next post). I'm assuming that the least information picked, the better (specially because the next post is queried for each post in the loop so as we were commenting this could be a performance issue).
Also, I don't really understand what you mean with priming the cache and not sure if still, fetching the whole info from next post, and caching, would be a great idea either.
Without an example of why this could be useful, I'm clueless of why this change is worthy, and despite the patch solves the issue, I'm not sure, if, in first place, there is a issue.
The issue is that WP_Query::the_post()
is used to populate the global post object while running the standard loop on a secondary query.
In the current form, the code populates the global post incompletely so running code to get post data while in the loop will return empty values for get_the_title()
, get_the_content()
and get_the_excerpt()
.
For most other function calls to get the global post objects data, the value will be incorrect. comments_open()
returns false
instead of true
, etc.
#13
in reply to:
↑ 12
@
3 months ago
Replying to peterwilsoncc:
I had the expectation and the actual reversed in the test. It should read
$this->assertEqualSetsWithIndex( $specific_post, $global_post );
Running the tests with the expected value first shows the issue correctly:
The issue is that
WP_Query::the_post()
is used to populate the global post object while running the standard loop on a secondary query.
In the current form, the code populates the global post incompletely so running code to get post data while in the loop will return empty values for
get_the_title()
,get_the_content()
andget_the_excerpt()
.
For most other function calls to get the global post objects data, the value will be incorrect.
comments_open()
returnsfalse
instead oftrue
, etc.
I'm not sure if I'm not understanding you, or I'm not explaining well.
Your test is made ad-hoc to proof some behavior that I'm not 100% sure if it's the expected behavior, maybe it is, but im not sure.
More specifically here: https://github.com/WordPress/wordpress-develop/pull/7985/files#diff-1d050668621cbc3028e027d15e68438f4d879dd6e80ba98f9c9f69b9fb5469d0R3728
You have to issue an extra query with get_post to fetch the full of it.
But theoretically, with 'id=>parent' you are just saying that it returns an associative array of post IDs and their parent IDs not the whole data. In $global_post we should only find that, right?
When you only need the ID, retrieving the full post object is unnecessary and consumes more resources and if you are working with large datasets, it also consumes more memory.
Again, perhaps I'm wrong, but theoretically this is the expected behavior of fields
from my understanding.
This ticket was mentioned in PR #8418 on WordPress/wordpress-develop by @peterwilsoncc.
3 months ago
#14
Trac ticket: Core-56992
#15
@
3 months ago
@SirLouen It's always possible I am not explaining well :)
But theoretically, with 'id=>parent' you are just saying that it returns an associative array of post IDs and their parent IDs not the whole data. In $global_post we should only find that, right?
That limited set of data is accessible via $query->posts
without the need to start the loop. My reading of how WP_Query
is used is that it's expected that data is populated with the sub-set when the fields
parameter is provided.
When in the loop, it's expected that the functions get_the_title()
, get_the_content()
and a large number of others return the respective data for the the global post object (ie, calling the functions without specifying a post ID).
My expectation once I start the loop by calling ::the_post()
is that the global post will always contain the complete set of post data and that calling get_post()
and get_post( $current_global_post_id )
return the same thing.
I'm working on a PR and pushing some tests to hopefully demonstrate what I see as the problem.
@joemcgill commented on PR #7985:
3 months ago
#16
@Juzar10 thanks for the ping. I approved the PR before the ongoing conversation the Trac ticket and am happy to wait for you and @peterwilsoncc to refine the approach.
#17
@
3 months ago
I've created an alternative pull request to address the concerns @SirLouen raises.
- the return value of
WP_Query::next_post()
is unchanged - Incomplete data in the loop is addressed within
WP_Query::the_post()
- Includes the follow up to [54771] that was assigned to be worked on for this ticket
What changes:
- Secondary loops ensure the global post is fully populated, allowing
get_the_title()
,get_the_content(),
get_post()`, etc, etc to work as documented when called without a parameter. - The post and author cache is correctly primed when starting the loop for
parent=>id
queries
I've written up unit tests for the changes and pushed them prior to the source changes to demonstrate the issue I was seeing when in the loop.
This ticket was mentioned in Slack in #core by peterwilsoncc. View the logs.
3 months ago
#19
@
3 months ago
- Owner set to peterwilsoncc
- Resolution set to fixed
- Status changed from new to closed
In 59919:
@peterwilsoncc commented on PR #8418:
3 months ago
#20
@peterwilsoncc commented on PR #7985:
3 months ago
#21
This was the basis for the follow up PR https://github.com/WordPress/wordpress-develop/pull/8418 which ended up being merged, thank you!
#22
@
3 months ago
Changeset [59919] may have changed the behavior of the post preview, resulting in e2e test failures in the Gutenberg repository.
Could you take a look at what's reported here?
#23
@
3 months ago
- Keywords needs-patch added; has-patch has-unit-tests dev-feedback removed
- Resolution fixed deleted
- Status changed from closed to reopened
Reopening to address the regression reported on Gutenberg side: https://github.com/WordPress/gutenberg/issues/69403
#24
@
3 months ago
I anticipate that we might need to add an is_preview
check here and grab the unsaved post object:
This ticket was mentioned in PR #8460 on WordPress/wordpress-develop by @peterwilsoncc.
3 months ago
#25
- Keywords has-patch has-unit-tests added; needs-patch removed
Follow up to earlier commit after it broke preview of autosaves.
Trac ticket: https://core.trac.wordpress.org/ticket/56992
#26
@
3 months ago
I've created a follow up pull request to fix the issue with previews:
- adds tests for previewing both draft and published posts with autosaves
- modifies
WP_Query::the_post()
to useWP_Query::$posts
in their original, filtered, filtered form if the full post objects were queried - continues to populate the post object in full for partial field queries, parent/child relationships of IDs.
The failing tests can be seen in the action runs against the first commit.
@Mamaduka commented on PR #8460:
3 months ago
#27
I can confirm that failing Gutenberg e2e tests (test/e2e/specs/editor/various/preview.spec.js) are passing when this patch is applied.
@peterwilsoncc commented on PR #8460:
3 months ago
#28
@Mamaduka Thanks George, I'll commit on that basis.
#30
@
2 months ago
- Focuses performance added
- Keywords dev-feedback added
- Resolution fixed deleted
- Status changed from closed to reopened
I left some days ago because, as I said, I was observing several performance concerns I want to address here, and now I see you have progressed and merged into trunk.
I would like to reopen this ticket because I've observed some issues
Running last two patches: https://github.com/WordPress/wordpress-develop/pull/8460 and https://github.com/WordPress/wordpress-develop/pull/8418
After running some performance tests, they have shown these results:
Dataset: 1000 posts ------------------- Iterations: 10 Time: Modified is slower by 16.97% Queries: Same number of queries Iterations: 100 Time: Modified is slower by 40.9% Queries: Same number of queries Iterations: 1000 Time: Modified is slower by 40.82% Queries: Same number of queries
- Modified is the new version
As I asked some days ago, in my first post, I would like to see a use-case where this reported change makes sense (not only test-cases but a real use-case).
Here is the performance testing plugin in case anyone wants to check the testing code
https://github.com/SirLouen/the-post-performance-test
Note that the performance worsens the large the dataset becomes
I've figured out a use-case where I can showcase using WP_Query with fields => 'id=>parent'
without needing the full data, in a hierarchical page navigation menu. I'm going to design a little plugin to showcase this and tomorrow I will upload it here.
#31
follow-up:
↓ 32
@
2 months ago
@SirLouen
The issue is that the global post is only partially populated when the fields are set to id=>parent
. If you call WP_Query()
with the fields parameter set to id
or all
, the global post is populated correctly.
If it just affected global $post
, I might be fine with that but it affects all the functions that are documented to the effect that "within the loop, this function will return the value of the [thing] of the global post" This includes:
get_the_content()
get_the_excerpt()
get_the_title()
get_the_time()
- most other functions used in the loop and prefixed
get_
orthe_
I understand your concern but I can't be convinced that these functions should return incorrect data within the loop for one of the three fields
parameters but not the others. The return value for each was empty.
This is a real world case as these assertions were failing prior to this change for the id=>parent
fields parameter but neither of the others.
I'm leaving this ticket open to review the performance issues you raise.
For id=>parent
some performance impact is expected when starting the loop, as the post object are now queried to ensure the functions above return the correct data.
If accessing the parent child relationships via WP_Query::$posts
no impact is expected as the full post objects will not be queried.
#32
in reply to:
↑ 31
@
2 months ago
Replying to peterwilsoncc:
If it just affected
global $post
, I might be fine with that but it affects all the functions that are documented to the effect that "within the loop, this function will return the value of the [thing] of the global post" This includes:
get_the_content()
get_the_excerpt()
get_the_title()
get_the_time()
- most other functions used in the loop and prefixed
get_
orthe_
I'm doing myself a lot of questions with this topic. And one of the biggest questions that comes to my mind is: Isn't a workaround, without touching the core code, to achieve the same results?
Because the baseline is performant by default, but your "upgrade" hits significantly the performance. If someone is currently using the performant version (which is the mostly likely since the new version did not exist for ages), he will have his site affected by this patch.
But if there is a workaround for your use-case that doesn't require the core "upgrade" then your site will be impacted, but the rest of the sites will remain untouched.
When I'm asking for a use-case, I'm asking for a plugin, either an existing or a new one, that clearly requires this new upgrade and cannot achieve the result because of the lacking functionality.
Here is my plugin, that is using the current version before this patch. It doesn't require the "upgrade" and it works as expected https://github.com/SirLouen/hierarchical-page-navigator
Just remember that now it's being impacted by a 40% extra overhead with the current patch with no gains in functionality.
A workaround that comes to my mind, to populate template functions with the original version could be done using setup_postdata
<?php $query = new WP_Query([ 'post_type' => 'page', 'fields' => 'id=>parent', // Whatever else you want add to the query ]); while ($query->have_posts()) { $query->the_post(); global $post; // And here you get full post object only when really needed if (is_numeric($post) || $post instanceof stdClass) { $post = get_post($post instanceof stdClass ? $post->ID : $post); setup_postdata($post); } // Now the template functions should work correctly the_title(); }
#33
follow-up:
↓ 34
@
2 months ago
- Keywords has-patch has-unit-tests removed
I'm doing myself a lot of questions with this topic. And one of the biggest questions that comes to my mind is: Isn't a workaround, without touching the core code, to achieve the same results?
You're right, there is a work-around without touching WP Core but that's not really satisfactory. Requiring developers to work around a bug is not a good approach, as many developers will not know to work around it.
Because the baseline is performant by default, but your "upgrade" hits significantly the performance. If someone is currently using the performant version (which is the mostly likely since the new version did not exist for ages), he will have his site affected by this patch.
It looks like there can be some performance improvements in how the code determines whether posts need to have their caches primed. I'll work up a patch.
In terms of testing, I think extending WP_Query
with the old method for testing will provide data that's closer to the real world use (see below), so I will test with that and using Yoast's content faker.
<?php class WP_Query_Old extends WP_Query { /** * Sets up the current post. * * Retrieves the next post, sets up the post, sets the 'in the loop' * property to true. * * @since 1.5.0 * * @global WP_Post $post Global post object. */ public function the_post() { global $post; if ( ! $this->in_the_loop ) { // Only prime the post cache for queries limited to the ID field. $post_ids = array_filter( $this->posts, 'is_numeric' ); // Exclude any falsey values, such as 0. $post_ids = array_filter( $post_ids ); if ( $post_ids ) { _prime_post_caches( $post_ids, $this->query_vars['update_post_term_cache'], $this->query_vars['update_post_meta_cache'] ); } $post_objects = array_map( 'get_post', $this->posts ); update_post_author_caches( $post_objects ); } $this->in_the_loop = true; $this->before_loop = false; if ( -1 == $this->current_post ) { // Loop has just started. /** * Fires once the loop is started. * * @since 2.0.0 * * @param WP_Query $query The WP_Query instance (passed by reference). */ do_action_ref_array( 'loop_start', array( &$this ) ); } $post = $this->next_post(); $this->setup_postdata( $post ); } }
#34
in reply to:
↑ 33
@
2 months ago
Replying to peterwilsoncc:
You're right, there is a work-around without touching WP Core but that's not really satisfactory. Requiring developers to work around a bug is not a good approach, as many developers will not know to work around it.
I was here using the Yoast's content faker as you recommended, and I noticed something:
After issuing the WP_Query with id=>parent
I could perfectly retrieve title, URL, content and time with get_the_title
, get_permalink
, get_post
and get_the_time
respectively
Environment
- WordPress: 6.7.2
- PHP: 8.2.27
- Server: Apache/2.4.62 (Debian)
- Database: mysqli (Server: 11.4.5-MariaDB-ubu2404 / Client: mysqlnd 8.2.27)
- Browser: Chrome 134.0.0.0
- OS: Windows 10/11
- Theme: Twenty Twenty-Five 1.0
- MU Plugins: None activated
- Plugins:
- Hierarchical Page Navigator 1.0.1
- Test Reports 1.2.0
This is the modified function in my plugin (link to code in my previous answer), that includes all this:
<?php private function display_page_tree($parent_id, $page_children, $current_depth, $max_depth) { if ($max_depth > 0 && $current_depth > $max_depth) { return; } if (isset($page_children[$parent_id])) { echo '<ul class="page-level-' . ($current_depth - 1) . '">'; foreach ($page_children[$parent_id] as $page_id) { $page_title = get_the_title($page_id); $page_url = get_permalink($page_id); $page_the_content = get_post($page_id)->post_content; $page_time = get_the_time('Y-m-d H:i:s', $page_id); echo '<li class="page-item page-item-' . $page_id . '">'; echo '<a href="' . esc_url($page_url) . '">' . esc_html($page_title) . '</a>'; echo '<p>' . esc_html($page_the_content) . '</p>'; echo '<p>' . esc_html($page_time) . '</p>'; // Recursively display children $this->display_page_tree($page_id, $page_children, $current_depth + 1, $max_depth); echo '</li>'; } echo '</ul>'; } }
So, adding on top of all the comments in my previous messages, now I can't see the problem you are stating in this report. With the ID you have access to all the resources for each post. Forcefully bringing all the info doesn't bring any additional value except for an unnecessary performance impact as I reported in my tests.
I think someone else should come, read this conversation, test and add a 2nd-opinion.
This ticket was mentioned in Slack in #core-test by sirlouen. View the logs.
2 months ago
#36
follow-up:
↓ 37
@
2 months ago
I'll ping @joemcgill (one of the WordPress 6.8 tech leads) for a second opinion but I am very confident that he to will want the loop to work as documented and without requiring developers work around the bug.
#37
in reply to:
↑ 36
@
2 months ago
Replying to peterwilsoncc:
I'll ping @joemcgill (one of the WordPress 6.8 tech leads) for a second opinion but I am very confident that he to will want the loop to work as documented and without requiring developers work around the bug.
Have you checked my last reply(ies)?
Give me feedback, I've been doing a ton of testable code to validate my vision on why this patch should not progress
Currently, I don't even feel that there is a need of a workaround. I believe that the tests you submitted are incomplete. Tomorrow I'm going to focus on them to have a final verdict.
And I would like to understand why you think this patch is so important. Are you building some feature in your company and finding some blocker? How did this come to your mind one day out of the blue?
I would also ping @oglekler that was who brought me all these doubts about this patch.
This ticket was mentioned in PR #8495 on WordPress/wordpress-develop by @peterwilsoncc.
2 months ago
#38
- Keywords has-patch has-unit-tests added
Trac ticket: https://core.trac.wordpress.org/ticket/56992
#39
@
2 months ago
I've got some time before going to bed to review the main test (The test I believe is more conflictive).
This is my version, I have changed:
- The 3
get_the_*
with the post ID (self::$page_child_ids[0]
), as a parameter, to get the expected result. If we have this parameter available why dont we use it? Its not a workaround, its available there. - I have removed the
assertSameSetsWithIndex
because obviously, this test is only meant to validate your hyphotesis that WP_Query withid=>parent
is not fully popualted by default, but its really not testing anything.
<?php /** * Ensure that a secondary loop populates the global post completely regardless of the fields parameter. * * @ticket 56992 * * @dataProvider data_the_loop_fields * * @param string $fields Fields parameter for use in the query. */ public function test_the_loop_populates_the_global_post_completely( $fields ) { $query = new WP_Query( array( 'fields' => $fields, 'post_type' => 'page', 'page_id' => self::$page_child_ids[0], ) ); $this->assertNotEmpty( $query->posts, 'The query is expected to return results' ); // Start the loop. $query->the_post(); // Get the global post and specific post. $global_post = get_post(); $specific_post = get_post( self::$page_child_ids[0], ARRAY_A ); // Use the post ID parameter for these functions $this->assertNotEmpty( get_the_title( self::$page_child_ids[0] ), 'The title is expected to be populated.' ); $this->assertNotEmpty( get_the_content( null, false, self::$page_child_ids[0] ), 'The content is expected to be populated.' ); $this->assertNotEmpty( get_the_excerpt( self::$page_child_ids[0] ), 'The excerpt is expected to be populated.' ); // $this->assertSameSetsWithIndex( $specific_post, $global_post->to_array(), 'The global post is expected to be fully populated.' ); }
I've tested this with the 6.7.1-alpha-59396-src
and got: OK (4 tests, 16 assertions)
as expected
The rest of the tests I should take a better look, but at this point, I believe that this test alone is the main pillar of this whole report.
I also believe that priming the cache is something that anyone could be doing in their app if they find it relevant for performance purposes, and is what the other test feel to be about (checking if the number of queries match the priming fact). The last tests seem to be related with the side effects of this patch, inflicted in the editor so I think I don't need to go into that part.
This ticket was mentioned in Slack in #core-test by sirlouen. View the logs.
2 months ago
#41
@
2 months ago
- Keywords 2nd-opinion removed
@peterwilsoncc thanks for the ping. I've re-read conversation here and tested the reported bug prior to any of the changes that have been committed here, just to make sure I understand the issues trying to be addressed. Here's my current understanding of the situation:
The WP_Query
methods that are used in "the loop", specifically ::the_post()
and ::next_post()
and have long assumed that the WP_Query::post
property (when set) and the global $post variable will be hydrated with a WP_Post
object so that template tag functions like get_the_title()
, get_the_excerpt()
, get_the_content()
, etc. will work as expected.
When a custom WP_Query()
is created with fields=>'all'
or fields=>'ids'
is passed to the loop, then things work as expected, but if fields=>'id=>parent'
is used, then the correct WP_Post
data does not get properly hydrated. I agree that this seems like unexpected behavior based on the way these methods are documented. It's possible that someone is relying on this undocumented behavior, but my opinion is that it is more predictable, and less buggy to ensure that these methods work as documented.
The following unit test can be used, and manipulated to observe this issue:
<?php /** * @group test */ public function test_the_loop_when_querying_post_parents_only() { global $post; $parent = self::factory()->post->create( array( 'post_type' => 'page' ) ); $child = self::factory()->post->create( array( 'post_type' => 'page', 'post_parent' => $parent, ) ); $query = new WP_Query( array( // Manipulate this arg to see how the template tags are affected. 'fields' => 'id=>parent', 'post_type' => 'page', 'page_id' => $child, ) ); $query->the_post(); var_dump( array( 'title' => get_the_title(), 'excerpt' => get_the_excerpt(), 'content' => get_the_content(), 'date' => get_the_date(), ) ); $this->assertInstanceOf( 'WP_Post', $post ); }
As for the performance concerns that @SirLouen has raised, I agree that the normal intent of generating a WP_Query
using fields='id=>parent'
is designed to be a more performant way to query hierarchical relationships with posts in order to avoid needing to traverse the entire posts table. Even so, if that query is then passed into a loop, users should be able to expect that the post data is hydrated with the full post data. There are already many places where we have already put in place cache priming to ensure that the full data for posts and parent posts are loaded efficiently in a single DB query (example 1, example 2). If the changes made to resolve the bug can be made more efficient by priming caches in a different way, then I think that's worth handling in a follow-up ticket if needed.
As a side note, @SirLouen I appreciate the energy that you've put into reviewing and testing this functionality at a very deep level. However, some of your comments in this conversation have come across as accusatory and unproductive. @peterwilsoncc is a longstanding contributor to this project who is well respected for having good judgement and a deep knowledge of some of the application's most complex code. Asking for feedback in support of your own understanding is very welcome, but please refrain from assuming there is some ulterior motive.
#42
@
2 months ago
Replying to @joemcgill
As a side note, @SirLouen I appreciate the energy that you've put into reviewing and testing this functionality at a very deep level. However, some of your comments in this conversation have come across as accusatory and unproductive. @peterwilsoncc is a longstanding contributor to this project who is well respected for having good judgement and a deep knowledge of some of the application's most complex code. Asking for feedback in support of your own understanding is very welcome, but please refrain from assuming there is some ulterior motive.
This is a problem I've been raising lately for the test team improvement. In fact, I see it the other way around, not a bad thing to having an ulterior motive (i.e., use-case), and promoting it. Not showcasing a real-life (or somewhat real-life) use-case is what sometimes makes things difficult to understand/view, debug or test and think around the box for the Testing team and keeps judgment exclusive for (citing your words) members "with a deep knowledge of some of the application's most complex code".
What I'm a little dubious about how healthy could be for a big foss project like this, and I'm not gonna lie here, is the "you cook and you eat your code" philosophy. I was not planning to bring this here to a report, but you mentioned it. What I mean with this, is that patches are peer-reviewed between committers, without another governing body (testing team), interceding in the process. But if we want to talk more deeply about this, I think we should raise it slack maybe in either #core-test or #core channels, please ping me there and I will be happy to discuss further.
But let's go back to the issue here:
I was reading the official docs and according to them these are the expected behaviors:
https://developer.wordpress.org/reference/classes/wp_query/
When using the fields
parameter, we are going to expect a WP_Query that returns only stdClass
objects with the ID of post parent:
Return Fields Parameter Set return values. fields (string) – Which fields to return. There are three options: 'all' – Return all fields (default). 'ids' – Return an array of post IDs. 'id=>parent' – Return an array of stdClass objects with ID and post_parent properties. Passing anything else will return all fields (default) – an array of post objects.
Here you're forcing a full hydration as if all
property was set in all cases, which will make underperformant all the use-cases where id=>parent
were taking advantage (as I've been proposing in my examples).
It's possible that someone is relying on this undocumented behavior, but my opinion is that it is more predictable, and less buggy to ensure that these methods work as documented.
Now I'm asking, where is this documented? Because I've been browsing the docs for a while and I can't find traces of this.
UPDATE
I've been running several more tests with another custom plugin I've built and now I think I see where the issue is
According to the docs it says about: WP_Query::the_post()
Retrieves the next post, sets up the post, sets the 'in the loop' property to true.
The key part is picking up the "sets up the post". It all started in the WP_Query::next_post()
method, with the iterate current post index
and this has been tricking me for the whole time as a massive performance issue. I noticed that many sites were using the previous implementation, "benefiting" of this "unexpected behaviour" with the fact that since full posts were not retrieved, even after the initialization of the loop, you could just retrieve single elements like get_the_title
and get_the_permalink
for example for listings within the loop, with some little gains in performance as I showed in my tests
So we can consider that the expected behavior is that global post should be completely hydrated, despite that the WP_Query only received the stdClass with the ID for performance purposes. A very tricky scenario. I had to build all the code by myself to get this picture and compare the two scenarios. A little frustrating and particularly hard to test (without blindly accepting the proposed unit-tests, obviously).
This ticket was mentioned in Slack in #core-test by sirlouen. View the logs.
2 months ago
#44
follow-up:
↓ 45
@
2 months ago
I've put together a PR to change how it's determined whether the method needs to walk the array of posts for cache warming of posts and author caches.
Over 1000 iterations of a large loop using various fields settings, the performance regression for "all" is recovered.
Below are the average values for all versions of PHP used in the test suite recorded using MariaDB 11.4. The number of queries is for each iteration. Data for each version of PHP can be found in this gist.
Fields | 6.7 Time | 6.7 Queries | Trunk Time | Trunk Queries | Patch time | Patch Queries | Trunk - 6.7 | Patch - 6.7 | Patch - trunk |
---|---|---|---|---|---|---|---|---|---|
all | 4.364121467 | 4 | 5.077857733 | 4 | 4.390091121 | 4 | 0.7137362659 | 0.02596965432 | -0.6877666116 |
ids | 5.359244257 | 5 | 5.227709651 | 5 | 5.20280087 | 5 | -0.1315346062 | -0.1564433873 | -0.02490878105 |
id, parent | 1.43522051 | 1 | 5.324584454 | 5 | 5.331013381 | 5 | 3.889363945 | 3.895792872 | 0.006428927183 |
The performance hit for id, parent
is only present if a developer starts the loop. If they don't start the loop then it's performance shouldn't change.
The source code changes are in a PR. The tesst suite changes won't go in as I've hacked them to dump data, only run tests I needed and introduced a very flakey test to get the timings.
The code in comment 32 is a good example of the improvements this change makes, on a test install running the 6.7 code with twenty twenty one it makes 19 extra database queries than the version of the code in trunk.
#45
in reply to:
↑ 44
@
2 months ago
- Keywords dev-feedback removed
Replying to peterwilsoncc:
I've put together a PR to change how it's determined whether the method needs to walk the array of posts for cache warming of posts and author caches.
Have you found that
// Ensure a full post object is available if ( $post instanceof stdClass ) { // stdClass indicates that a partial post object was queried $post = get_post( $post->ID ); } elseif ( is_numeric( $post ) ) { // Numeric indicates that only post IDs were queried $post = get_post( $post ); }
Is somehow different from the other version published in 8418 patch?
if ( ! $post instanceof WP_Post ) { if ( 'ids' === $this->query_vars['fields'] ) { // Post IDs queried. $post = get_post( $post ); } elseif ( isset( $post->ID ) ) { // Partial object (stdClass) queried. $post = get_post( $post->ID ); } }
From what I found over my previous tests, there were only 3 options:
- WP_Post on
all
- A numeric ID, for
ids
- The stdClass for
id=>parent
.
Anyway, I think that the same in a way or another way, the improvement here is just to check for WP_Post to avoid priming cache in just that scenario.
Again, using the same implementation as the patch just before this one, the results are barely identical and excluding the array_reduce
from the ids
doesn't seem to make a difference with such test battery.
Fields | 6.7 Time | 6.7 Queries | 8495 Patch Time | 8495 Patch Queries | This Comment Patch time | This Comment Patch Queries |
---|---|---|---|---|---|---|
all | 3.2470240592956543 | 4 | 3.2879478931427 | 4 | 3.2534329891204834 | 4 |
ids | 3.8917858600616455 | 5 | 3.8529629707336426 | 5 | 3.8428380489349365 | 5 |
id, parent | 1.0311799049377441 | 1 | 3.907115936279297 | 5 | 3.9453020095825195 | 5 |
This is the function with the WP_Post
addition
<?php public function the_post() { global $post; if ( ! $this->in_the_loop ) { if ( $this->posts && $this->posts[0] instanceof WP_Post ) { $post_objects = $this->posts; } else { // Get post IDs to prime incomplete post objects $post_ids = array_reduce( $this->posts, function ( $carry, $post ) { if ( is_numeric( $post ) && $post > 0 ) { // Query for post ID $carry[] = $post; } if ( is_object( $post ) && isset( $post->ID ) ) { // Query for object, either partial WP_Post or stdClass $carry[] = $post->ID; } return $carry; }, array() ); if ( $post_ids ) { _prime_post_caches( $post_ids, $this->query_vars['update_post_term_cache'], $this->query_vars['update_post_meta_cache'] ); } $post_objects = array_map( 'get_post', $post_ids ); } update_post_author_caches( $post_objects ); } $this->in_the_loop = true; $this->before_loop = false; if ( -1 === $this->current_post ) { // Loop has just started /** * Fires once the loop is started. * * @since 2.0.0 * * @param WP_Query $query The WP_Query instance (passed by reference). */ do_action_ref_array( 'loop_start', array( &$this ) ); } $post = $this->next_post(); // Ensure a full post object is available if ( $post instanceof stdClass ) { // stdClass indicates that a partial post object was queried $post = get_post( $post->ID ); } elseif ( is_numeric( $post ) ) { // Numeric indicates that only post IDs were queried $post = get_post( $post ); } // Set up the global post object for the loop $this->setup_postdata( $post ); }
Both versions pass the query unit-tests anyway, so given that both the original version with the conditional for WP_Post
and the last in 8495 seem to be identical in performance, functionality and correctness, just choose whatever you prefer.
#46
follow-up:
↓ 47
@
2 months ago
Honestly, the second part was playing around with micro-optimisations to see if comparing a string would be faster than type checking.
The main focus has been avoiding looping through the posts in the first section when it's known their complete objects or IDs. I'll get PR#8495 back in a normal (ie, running a full, unhacked test suite).
#47
in reply to:
↑ 46
@
2 months ago
Replying to peterwilsoncc:
Honestly, the second part was playing around with micro-optimisations to see if comparing a string would be faster than type checking.
Ok, now I see at this point that you were trying to micro-optimize the all
field in exchange for the other variants.
Yesterday I was thinking: If this has been like this for a good time, as I said, performance impact could occur on those who were not needed to fully hydrate the global post as I showed in my example.
As @joemcgill pointed out, I completely overlooked the fact that "by default" the global post is always expected to be hydrated.
But here is the catch: what if we introduced a new parameter in the_post
, that permits overriding this whole behavior on demand? This could return the performance impact for those who wanted to keep it as-is and keep it compliant with my examples.
#48
follow-up:
↓ 49
@
2 months ago
Yesterday I was thinking: If this has been like this for a good time, as I said, performance impact could occur on those who were not needed to fully hydrate the global post as I showed in my example.
With the changes in the WIP pull request, the performance impact is only affecting starting the loop for queries that are initiated with a partial data set, ie id, parent
. This is expected as the extra database queries are what fixes the bug.
I don't think there is any benefit of adding a new parameter to bypass the bug fix as the partial data remains available via the posts property. The change in this PR makes the loop consistent with how it now works with id
and all
.
#49
in reply to:
↑ 48
@
2 months ago
Replying to peterwilsoncc:
I don't think there is any benefit of adding a new parameter to bypass the bug fix as the partial data remains available via the posts property. The change in this PR makes the loop consistent with how it now works with
id
andall
.
If you check your comment 44 or my comment 45, in all the performance results, all
is outperforming both ids
and id=>parent
and I can't be indifferent to this.
On the paper, everything looks right. But in my mind, both ids
and id=>parent
should have been the outperforming options (because of their optimal nature) but it's not the case, doing one obvious extra query each one.
For example, back in the day: https://core.trac.wordpress.org/ticket/56948#comment:6
People still think that ids
is the performing option
And also its interesting to see that you play around the idea of id=>parent
back in the day https://core.trac.wordpress.org/ticket/56948#comment:12
But it did not progress (By reading it, I'm wondering why)
So my question is: Why not simply, and always, querying the entire posts if you we are ultimately planning to call the_post
?
Maybe a _doing_it_wrong
could have been interesting in this case?
#50
follow-up:
↓ 51
@
2 months ago
@SirLouen You've asked for a second opinion, received a second opinion and continue to prosecute this. I understand your argument but both Joe and I are in agreement that if a user starts the loop, the loop should work as intended.
#56948 made that happen for IDs, and this issue was logged to make that happen for id, parent
.
Calling WP_Query
with a request for partial fields WILL result in a performance issue if you subsequently start the loop. That's the fix. No one is recommending that developers set up the loop without all
fields, but we have decided that we need to accommodate the situation.
What I am happy to continue working on and discussing is this: the original commit impacted fields => all
when starting the loop and this ought to be avoided. There is a work in progress PR for this fix, and once I'm happy it's working as intended I'll mark it ready for review and get in touch with some other committers to review it.
#51
in reply to:
↑ 50
@
2 months ago
Replying to peterwilsoncc:
@SirLouen You've asked for a second opinion, received a second opinion and continue to prosecute this. I understand your argument but both Joe and I are in agreement that if a user starts the loop, the loop should work as intended.
What I'm saying is the other way around: Explicitly encouraging developers on using all
because of the clear performance impact (or discouraging on not using all
however you prefer)
#52
@
2 months ago
@SirLouen passing something other than "all" to fields
is used for different purposes than to set up a loop. However, there may be cases where someone has performed a query in that way and then subsequently passed off the query to a loop. I'm not in support of throwing a _doing_it_wrong
warning in this case, as the software cannot know the intent the developer had in doing things this way and throwing a warning would be presumptive, at best.
The next_post function assumes a WP_Post object, which isn't always the case. When a standard object with ID and parent is provided, setup_postdata creates an empty WP_Post object with just the ID.
This PR resolves the issue by passing only the post ID from next_post when the object is not a WP_Post.
Trac ticket: https://core.trac.wordpress.org/ticket/56992