WordPress.org

Make WordPress Core

Opened 3 years ago

Last modified 2 years ago

#35075 assigned defect (bug)

Comment cache ignores custom query vars

Reported by: jason_the_adams Owned by: boonebgorges
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Comments Keywords: has-patch 4.6-early
Focuses: performance Cc:

Description

It's currently very possible to add custom query vars to the get_comments function, for themes and plugins to extend. A problem occurs, however, if the function is being used and a custom query var is being used, but no standard ones are changing. This is because it caches the first query, and every subsequent query is then assumed to be the same since custom query vars are ignored.

I first thought about just using the entire query_vars array for caching, but decided there's probably a reason this wasn't done. So instead I decided it would make the most sense to make a filter for the cache keys so a plugin/theme can add its own keys that should be considered for caching.

Attachments (8)

35075.patch (800 bytes) - added by jason_the_adams 3 years ago.
35075.2.patch (1.1 KB) - added by jason_the_adams 3 years ago.
Includes inline docs for filter
35075.test.patch (1.1 KB) - added by jason_the_adams 2 years ago.
35075.3.patch (673 bytes) - added by jason_the_adams 2 years ago.
35075.4.patch (1.9 KB) - added by jason_the_adams 2 years ago.
35075.test.2.patch (3.1 KB) - added by jason_the_adams 2 years ago.
35075.diff (4.5 KB) - added by boonebgorges 2 years ago.
35075.2.diff (3.7 KB) - added by boonebgorges 2 years ago.

Download all attachments as: .zip

Change History (36)

This ticket was mentioned in Slack in #core by jasontheadams. View the logs.


3 years ago

#2 @SergeyBiryukov
3 years ago

  • Keywords needs-docs added
  • Milestone changed from Awaiting Review to 4.5

New filters need to be documented as per the documentation standards.

@jason_the_adams
3 years ago

Includes inline docs for filter

#3 @jason_the_adams
3 years ago

@SergeyBiryukov Thanks! I assume you meant the code docs. Does that suffice?

#4 follow-up: @boonebgorges
3 years ago

  • Keywords needs-docs removed

A filter for this purpose doesn't seem like the most straightforward way to address the issue. It seems to me that we ought to be using all query vars in all cases - better for the cache to be too fine-grained than too coarse-grained. That said, the current technique for constructing a cache key is used in a number of places through WP (get_terms() for example). Before we make any changes, it'd be nice for someone to find the changeset(s) where the technique was introduced, to get a better understanding of the potential ramifications of using all query vars to build the key.

#5 in reply to: ↑ 4 @SergeyBiryukov
3 years ago

Replying to boonebgorges:

it'd be nice for someone to find the changeset(s) where the technique was introduced, to get a better understanding of the potential ramifications of using all query vars to build the key.

Introduced in [7738] for taxonomies, [9511] for comments, [10070] for pages (comment was added in [23300]).

#6 @jason_the_adams
3 years ago

@SergeyBiryukov Bah! I just found it! I'm still putting the changeset in here to prove I did it too: [9511] :)

#7 @jason_the_adams
3 years ago

So since there was no reason given for limiting the cache hash to only the defaults, what about just using the entire query_var array?

$key = md5( serialize( $this->query_vars ) );

#8 follow-up: @boonebgorges
3 years ago

Given that the only identifiable annotation on any of these changesets is [7738], I'm going to say the magic word - @mdawaffe - to see if he could hold court with Mnemosyne and tell us whether the change was in the name of "efficiency", or whether it fixed an actual bug.

#9 in reply to: ↑ 8 @mdawaffe
2 years ago

Replying to boonebgorges:

Given that the only identifiable annotation on any of these changesets is [7738], I'm going to say the magic word - @mdawaffe - to see if he could hold court with Mnemosyne and tell us whether the change was in the name of "efficiency", or whether it fixed an actual bug.

Cache keys should not depend on arbitrary user entered data (only on the user entered data we allow). I'm pretty sure that, at the time, ->query_vars could get polluted by arbitrary $_GET data, which decreased cache hits and duplicated information stored in the cache.

If it's no longer the case that ->query_vars can be polluted like that (via $_GET or something else), it seems fine to remove that whitelisting. A brief look at core suggests that core does not allow arbitrary keys in anywhere. Obviously, plugins and themes may be doing strange things.

#10 @jason_the_adams
2 years ago

Seems like any protection of the ->query_vars should happen higher up the chain, not at the point of caching. I'd like to believe the ->query_vars aren't being arbitrarily injected with the $_GET superglobal. I dug around and I'm not seeing, at least for comments, anywhere that this would be happening. The query string itself is never being passed to get_comments.

Should I throw together a patch that removes the restrictions and caches the entire array for comments, taxonomies, and pages?

#11 follow-up: @boonebgorges
2 years ago

  • Keywords needs-patch added

Thanks, @mdawaffe.

Should I throw together a patch that removes the restrictions and caches the entire array for comments, taxonomies, and pages?

OK. It would also be nice to have unit tests demonstrating that the cache is sensitive to custom query vars.

#12 in reply to: ↑ 11 ; follow-up: @jason_the_adams
2 years ago

Replying to boonebgorges:

OK. It would also be nice to have unit tests demonstrating that the cache is sensitive to custom query vars.

I went to write the test and, interestingly enough, I found an "inverse" test in tests/comment/query.php: test_comment_cache_key_should_ignore_custom_params which relates to #22400. I see that @wonderboymusic added the wp_array_slice_assoc in [28460].

This tests asserts that only one query is performed when a custom query_var is introduced, when I would assert that there should be two. What's the protocol in this case? Should I modify/remove the existing test?

Last edited 2 years ago by jason_the_adams (previous) (diff)

#13 in reply to: ↑ 12 @boonebgorges
2 years ago

Replying to jason_the_adams:

Replying to boonebgorges:

OK. It would also be nice to have unit tests demonstrating that the cache is sensitive to custom query vars.

I went to write the test and, interestingly enough, I found an "inverse" test in tests/comment/query.php: test_comment_cache_key_should_ignore_custom_params which relates to #22400. I see that @wonderboymusic added the wp_array_slice_assoc in [28460].

This tests asserts that only one query is performed when a custom query_var is introduced, when I would assert that there should be two. What's the protocol in this case? Should I modify/remove the existing test?

This is why we have regression tests - to know when we are changing behavior :)

In this case, we are changing the behavior on purpose. So we should change the test too.

[28460] was related to @wonderboymusic's quest to eradicate the use of extract(). We can fix this in a way that won't reintroduce extract().

#14 @jason_the_adams
2 years ago

@boonebgorges Added a test patch that modifies the test to reflect what we're doing. It passes if you apply the 35075.3 patch (which only changes comments, so far). Thoughts?

#15 @boonebgorges
2 years ago

Thanks, @jason_the_adams - The patch looks good. For the test:

  • Don't remove the existing @ticket annotation. Just add another one for this ticket.
  • Let's change the method name to something that makes more sense. Like: test_comment_cache_key_should_not_ignore_custom_query_vars().
  • Can you please fix the indentation? :)
  • Don't bother testing $start_num + 2. This makes the test fragile. Just assertNotEquals().

With these changes, please go ahead and work up patches for the other places where we do this.

#16 @jason_the_adams
2 years ago

Thanks for the direction, @boonebgorges

Here's what we've got:

  • Since we're using assertNotEquals I'm doing two assertions. Otherwise the issue wouldn't reveal itself as the first query would already have changed the number of queries
  • Added the change and tests for the other two places
  • Indentation is now lovelified :)
Last edited 2 years ago by jason_the_adams (previous) (diff)

@boonebgorges
2 years ago

#17 @boonebgorges
2 years ago

  • Keywords has-patch added; needs-patch removed

Thanks, @jason_the_adams! 35075.diff combines the tests and code changes, and makes a few minor modifications.

I feel like this change is correct - it's better for the cache to be too fine-grained than too coarse-grained; in each of these cases, we have filters that allow the queries to be modified based on custom parameters, but the current caching schema makes them practically impossible to use reliably. But I also have a nagging feeling that I'm missing something. Could I get another opinion on it, maybe from @dd32 or @nacin ?

#18 @nacin
2 years ago

So, I'm almost positive that the reason for this code is that often these arguments are passed through from higher level functions.

See [7738] for example. This is get_terms(). Now look at WP_Terms_List_Table. The $args that go into get_terms() are actually multi-purpose. The context can be generically explained in the following example: $args is passed down the stack on purpose, so some template function accepts $args which ends up being used directly in a lower-level function to determine what to fetch, and then other $args (ignored by the lower-level function) actually define templating.

In practice, this is not a terribly smart pattern, as it allows a future API addition in the lower-level function to accidentally be stomped by a well-intentioned higher-level function that already had that argument name.

I'm struggling to find uses of this pattern, but I know it exists in many places. I'll also claim (probably incorrectly) that I've never implemented this pattern anywhere. I'll also claim (probably incorrectly) that there are some good use cases for this, such as when we *want* to automatically pick up API changes over time, since the higher-level function is little more than a wrapper.

It's not "dependency injection", it's "argument injection"...

I am *sure* there is a practical reason why each of these calls were done this way. It just might take a lot of spelunking to figure out why, to ascertain what would break, and to determine how to avoid breaking it.

#19 @nacin
2 years ago

ack 'get_posts\(\s*\$' will likely match some possible situations.

#20 follow-up: @jason_the_adams
2 years ago

@nacin To be perfectly honest, I'm having a hard time figuring out what you mean in relation to the scope of this ticket. It sounds like you're talking about the $args as a whole, within the context of the function, while this ticket relates only to the $args as relate to the caching of the query. They're not affected for the entirety of the function, but only to be serialized and hashed.

Perhaps some light on how I even stumbled on this will help:

I'm involved in a project/plugin called Piklist, which attempts to extend WordPress for enterprise-level development. In the current beta, support for object (e.g. comments, users, and posts) relationships was added. I had finished adding in support for a relate_query (modeled after meta_query) query_var for these objects, to be able to find, for example, posts that belong to another post and user (again, just an example).

I was writing the unit tests for this, and found that my 2nd get_comments weren't working. After some digging, I realized it was because it was caching the result of the first query and giving it back for the second. Reason being: the relate_query query_var wasn't recognized by the caching method. And it would not be at all unrealistic for someone to do the following:

<?php
$p1 = get_comments(array(
  'relate_query' => array(
    array(
      'relate' => 'belongs',
      'scope' => 'user',
      'id' => 3
    )
  )
));

$p2= get_comments(array(
  'relate_query' => array(
    array(
      'relate' => 'belongs',
      'scope' => 'user',
      'id' => 4
    )
  )
));

This isn't a problem with get_posts and get_users, as they cache differently. It's only the comments, terms, and pages that have this issue, which also presents inconsistency.

Hope this helps in better presenting the issue. :)

#21 in reply to: ↑ 20 @nacin
2 years ago

Replying to jason_the_adams:

@nacin To be perfectly honest, I'm having a hard time figuring out what you mean in relation to the scope of this ticket. It sounds like you're talking about the $args as a whole, within the context of the function, while this ticket relates only to the $args as relate to the caching of the query. They're not affected for the entirety of the function, but only to be serialized and hashed.

Sorry, I never bothered to close the loop there: My point is I suspect that this patch as written will suddenly cause a *lot* of queries to be cache misses.

#22 @nacin
2 years ago

Also, any object relationships should probably be leveraged using Posts 2 Posts, as that's likely what would be pulled into core if we did so (and are more likely than not going to do at some point).

#23 @jason_the_adams
2 years ago

@nacin I definitely grant you that, which is why I originally proposed this as filter to open the cache keys up to additional query_vars. In any case, it's a balance of caching enough, but not caching inappropriately. If custom query_vars are going to be supported, they can't be ignored in caching.

Also, any object relationships should probably be leveraged using Posts 2 Posts, as that's likely what would be pulled into core if we did so (and are more likely than not going to do at some point).

That's a whole other topic, which I'll refrain from going down too much, so suffice to say the Piklist team goes to great lengths to not break anything in core, while also not necessarily being limited by core. It largely attempts to draw out features in core that are largely unnoticed and extend it's capabilities. P2P is a bit limiting for our intentions, but if something like P2P was added to core, we'd make adjustments to play nicely with it as well. :)

@boonebgorges
2 years ago

#24 @boonebgorges
2 years ago

@nacin Thanks for chiming in. Your comment articulates my vague discomfort.

The underlying problem here is that we're deciding which params should be used when building the cache key by looking at the $defaults array, but custom params are not included in that array. So another strategy is to let plugin developers "register" their custom params, by passing the defaults through a filter. See 35075.2.diff.

I admit that this approach is not super straightforward from a developer's point of view. That's why 35075.2.diff adds a detailed description of the issue in the docblock of each affected function.

As long as we continue to allow arbitrary args to be passed to the function, and allow them to affect the query via various filters, I think this is about as close as we can get to building an "API" for "registering" "official" parameters. Thoughts?

#25 @jorbin
2 years ago

  • Owner set to boonebgorges
  • Status changed from new to assigned

I think this might be something that would make more sense to happen early in a release, but going to let @boonebgorges make the call if we should still try to get something in to 4.5.

#26 @boonebgorges
2 years ago

  • Keywords 4.6-early added
  • Milestone changed from 4.5 to Future Release

#27 @chriscct7
2 years ago

  • Version trunk deleted

This ticket was mentioned in Slack in #core by jasontheadams. View the logs.


2 years ago

Note: See TracTickets for help on using tickets.