Make WordPress Core

Opened 17 months ago

Closed 14 months ago

Last modified 13 months ago

#58599 closed defect (bug) (fixed)

WP_Query->get_posts: cache updated also when query is set to be not cacheable

Reported by: saulirajala's profile saulirajala Owned by: spacedmonkey's profile spacedmonkey
Milestone: 6.4 Priority: normal
Severity: normal Version: 6.2
Component: Query Keywords: has-patch has-unit-tests needs-dev-note
Focuses: performance Cc:

Description

Seems that there is a bug in if-statement in class-wp-query.php get_posts() function: https://github.com/WordPress/wordpress-develop/blob/trunk/src/wp-includes/class-wp-query.php#L3520-L3522

Can't see any reason why this if-statement doesn't have also && $id_query_is_cacheable-check like earlier if ( $q['cache_results'] ) -checks have in the same function.

This can cause weird problems especially, when you have activated some persistent object cache solution in your site and you have modified the $fields variable for example via posts_fields_request -hook.. I have tested only with wp-redis, but would imagine this is an issue also with other persistent object cache solutions and maybe even without one.

### How to reproduce the issue?

  1. Fresh WP 6.2.2 and twentytwentythree theme installed
  2. Install and configure wp-redis (1.4.2)
  3. Add following php-snippet (for example as mu-plugin)
<?php
$args = [
        'post_type'        => 'page',
        'post_status'      => 'publish',
        'posts_per_page'   => 1,
        'suppress_filters' => false,
];

add_filter( 'posts_fields_request', 'foo_allowed_fields', 20, 2 );
$pages = get_posts( $args );
remove_filter( 'posts_fields_request', 'foo_allowed_fields', 20 );

function foo_allowed_fields( $fields, $wp_query ) {
        global $wpdb;


        $allowed_fields = [
                'ID',
                'post_title',
                'post_name',
                'post_parent',
                'post_type',
                'guid',
                'menu_order',
        ];

        $new_fields = '';
        foreach ( $allowed_fields as $i => $field ) {
                $new_fields .= "{$wpdb->prefix}posts.{$field}";

                if ( $i + 1 < count( $allowed_fields ) ) {
                        $new_fields .= ', ';
                }
        }

        return $new_fields;
}

  1. Run wp cache flush
  2. Load the webpage couple of times. Some weird things happens. For example I see Comments form in frontpage even though I have deactivated comments feature in the site. In our other site permalink structure brake causing multiple 404 responses.

Attachments (1)

58599.diff (541 bytes) - added by saulirajala 17 months ago.

Download all attachments as: .zip

Change History (27)

This ticket was mentioned in PR #4675 on WordPress/wordpress-develop by @saulirajala.


17 months ago
#1

  • Keywords has-patch added

@saulirajala
17 months ago

#2 @spacedmonkey
17 months ago

Reviewing this ticket and the patch. It feels like a breaking change. I am not sure this ticket could ever get committed. I could be talked out of this.

#3 @saulirajala
16 months ago

Hmm, what would this break? What is the use case where we would want the query to be uncacheable in all the other if-statements in the get_posts()-function, but in the last one we want it to be cacheable? Isn't the code comment in rows 3138-3149 "Random queries are expected to have unpredictable results and cannot be cached…If $fields has been modified by the posts_fields, posts_fields_request, post_clauses or posts_clauses_request filters, then caching is disabled to prevent caching collisions." true also in the last if-statement? If query is set to be uncacheable, then it should be treated as one consistently throughout the function. And thus update_post_cache() shouldn't be executed, when $id_query_is_cacheable is false.

I understand that the situation where I am, can be a rare edge case (using persistent object cache and modifying the query with posts_fields_request-hook). But I would still argue that this patch is a fix to a bug and not a breaking change. To my knowledge the breaking change was introduced in r55035. $this->posts has wrong field values for posts when we have modified the query via posts_fields_request -hook. For example, if the foo_allowed_fields -hook function (mentioned above) is enabled, $this->posts would have posts that has default values of `WP_Post` object for fields like post_date, ping_status, comment_status, post_modified and post_password. This can cause a serious problems on the site using persistent object cache since the wrong values are saved persistently for each post. Now that I think about this, this can even lead to a somekind of security issue, since for post that are password protect can now have the value in the persistent object cache, where post_password is empty string and thus the page will not be password protected.

#4 follow-up: @peterwilsoncc
16 months ago

  • Version changed from 6.2.2 to 6.2

I think this is a bug introduced in [55035] / git 5a497225b2869 for #57373.

Prior to the change, the IDs would be used to prime the cache and an additional query made to the database. After the change the returned post data was used to prime the cache.

To ensure the cache contains all the expected data when the fields are modified, I think the code would need to be something like this:

<?php
if ( $q['cache_results'] ) {
        if ( $id_query_is_cacheable ) {
                update_post_caches( $this->posts, $post_type, $q['update_post_term_cache'], $q['update_post_meta_cache'] );
        } else {
                $post_ids = wp_list_pluck( $this->posts, 'ID' );
                _prime_post_caches( $post_ids, $q['update_post_term_cache'], $q['update_post_meta_cache'] );
        }
}

#5 in reply to: ↑ 4 ; follow-up: @saulirajala
16 months ago

Replying to peterwilsoncc:

To ensure the cache contains all the expected data when the fields are modified, I think the code would need to be something like this:

<?php
if ( $q['cache_results'] ) {
        if ( $id_query_is_cacheable ) {
                update_post_caches( $this->posts, $post_type, $q['update_post_term_cache'], $q['update_post_meta_cache'] );
        } else {
                $post_ids = wp_list_pluck( $this->posts, 'ID' );
                _prime_post_caches( $post_ids, $q['update_post_term_cache'], $q['update_post_meta_cache'] );
        }
}

Hmm, you are probably right. That change would at least preserve the old behaviour (before r33035) for non-cacheable queries.

I still find it weird, that the query we explicitly set as non-cacheable, is cached after all. It seems to be an contradictory behaviour potentially leading to more nasty cache issues.

But what do you say @spacedmonkey: Should I update the patch according to the suggestion by @peterwilsoncc?

#6 in reply to: ↑ 5 @peterwilsoncc
16 months ago

Replying to saulirajala:

I still find it weird, that the query we explicitly set as non-cacheable, is cached after all. It seems to be an contradictory behaviour potentially leading to more nasty cache issues.

I agree it's a little weird but there's some history to it. cache_results was only ever meant to refer to the main query, the one that determines which post IDs to get for the loop, so it would be better named cache_main_query_results. As it had been around for a while (but not doing anything) we were kind of stuck with it. ("we" being those of us that worked on #22176.)

Individual post objects had been cached regardless of the setting for a very long time.

#8 @mukesh27
16 months ago

  • Milestone changed from Awaiting Review to Future Release

This ticket was discussed in the recent Performance bug scrub.

moving to Future Release for consideration and waiting further feedback from @peterwilsoncc and @spacedmonkey.

Additional props to @joemcgill

#9 follow-up: @spacedmonkey
16 months ago

So looking into this issue, I have some thoughts.

This issue seems unrelated to the changes to the WP_Query that have happened in the last couple of releases. The problem I see is, if you filter the fields that are requested in the database call, these fields are then passed to the get_post and then cached in object caching. This means if you do not request the post content for example, the post object is created and saved into object cache without the post content. This is a problem, as the sites with and without preseiant object cache would have incorrect value saved in the object cache.

Looking at the code, I think a change like this might fix the detailed issue, to ensure that these values are not stored in cache. Also want to ensure that meta and term are still primed and no new database queries are added.

   if ( $this->posts ) {
        $this->post_count = count( $this->posts );

        /** @var WP_Post[] */
        $this->posts = array_map( 'get_post', $this->posts );

        if ( $q['cache_results'] && $id_query_is_cacheable ) {
                update_post_cache( $this->posts );
        }

        $post_ids = wp_list_pluck( $this->posts, 'ID' );

        if ( $q['update_post_meta_cache'] ) {
                update_postmeta_cache( $post_ids );
        }

        if ( $q['update_post_term_cache'] ) {
                $post_types = array_map( 'get_post_type', $this->posts );
                $post_types = array_unique( $post_types );
                update_object_term_cache( $post_ids, $post_types );
        }

This code snippet also fixes a long standing issue, IMO where passing cache_results is passed and term and meta caches were not primed.

What do you think of this code snippet @peterwilsoncc @saulirajala ?

#10 in reply to: ↑ 9 @saulirajala
15 months ago

  • Version changed from 6.2 to 6.2.2

Replying to spacedmonkey:

What do you think of this code snippet @peterwilsoncc @saulirajala ?

To my mind this seems to be a good solution. It fixes the original issue I was having with persisted object cache.

#11 @spacedmonkey
15 months ago

  • Milestone changed from Future Release to 6.4
  • Owner set to spacedmonkey
  • Status changed from new to assigned
  • Version changed from 6.2.2 to 6.2

This ticket was mentioned in PR #5134 on WordPress/wordpress-develop by @spacedmonkey.


14 months ago
#12

  • Keywords has-unit-tests added

#13 @spacedmonkey
14 months ago

PR is ready for review @mukesh27 @peterwilsoncc .

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


14 months ago

@spacedmonkey commented on PR #5134:


14 months ago
#15

@felixarntz I loved your feedback. I have actioned all of it and improved comment and tests. Ready for review. again.

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


14 months ago

#17 @spacedmonkey
14 months ago

Feedback as been actioned, awaiting feedback from @peterwilsoncc .

This ticket was mentioned in Slack in #core-performance by spacedmonkey. View the logs.


14 months ago

@spacedmonkey commented on PR #5134:


14 months ago
#20

After some great feedback from @felixarntz and @peterwilsoncc. I agree this PR has problems. I think a new approach is needed. I am going to close this PR for now, in favour of https://github.com/WordPress/wordpress-develop/pull/5251

#22 @spacedmonkey
14 months ago

  • Resolution set to fixed
  • Status changed from assigned to closed

In 56656:

Query: Improved handling of filtered queries in WP_Query.

The WP_Query class enables developers to customize queries using filters like posts_fields_request, posts_request, and the_posts, which can modify both the queried fields and retrieved post objects. In some cases with these filters, incomplete or invalid post objects lacking essential data may arise. To address this, if any of these filters are active during a query, the get_posts method now avoids caching post objects with the usual update_post_caches function call, opting for a call to _prime_post_caches instead. This may occasionally trigger new database queries to prime the post data cache. While this enhancement may result in rare additional database queries, it ensures that invalid post objects aren't cached, prioritizing data consistency and integrity in filtered query scenarios.

Props saulirajala, spacedmonkey, flixos90, mukesh27, peterwilsoncc.
Fixes #58599.

#26 @spacedmonkey
13 months ago

  • Keywords needs-dev-note added
Note: See TracTickets for help on using tickets.