Make WordPress Core

Opened 2 years ago

Last modified 11 months ago

#55463 assigned enhancement

Add a limit to _prime_*_cache functions

Reported by: spacedmonkey's profile spacedmonkey Owned by: spacedmonkey's profile spacedmonkey
Milestone: Future Release Priority: normal
Severity: normal Version:
Component: Cache API Keywords: good-first-bug has-patch 2nd-opinion
Focuses: performance Cc:

Description

The _prime_*_cache functions, like _prime_post_cache, run a simple query to prime caches. This uses an IN clause to get all the objects. Since we know the number of objects we expect returned (we have an array of IDs), we can add a simple LIMIT clause to improve the performance of the query.

Example:

$non_cached_ids = array_map( 'intval', $non_cached_ids ); // Cast to ints, as the IDs are interpolated unescaped.
$fresh_posts    = $wpdb->get_results( sprintf( "SELECT $wpdb->posts.* FROM $wpdb->posts WHERE ID IN (%s) LIMIT %d", implode( ',', $non_cached_ids ), count( $non_cached_ids ) ) );

Attachments (6)

Screenshot 2022-03-25 at 12.19.15.png (199.3 KB) - added by spacedmonkey 2 years ago.
Before change
Screenshot 2022-03-25 at 12.20.14.png (221.5 KB) - added by spacedmonkey 2 years ago.
After change
55463.diff (4.3 KB) - added by elifvish 2 years ago.
Screenshot 2022-04-12 at 11.52.23.png (17.9 KB) - added by spacedmonkey 2 years ago.
After patch
Screenshot 2022-04-12 at 11.53.17.png (16.6 KB) - added by spacedmonkey 2 years ago.
Before patch
optimizer_trace.png (62.6 KB) - added by soulseekah 11 months ago.

Download all attachments as: .zip

Change History (15)

#1 @elifvish
2 years ago

  • Keywords has-patch added; needs-patch removed

#2 @spacedmonkey
2 years ago

  • Keywords commit added
  • Milestone changed from Awaiting Review to 6.0
  • Owner set to spacedmonkey
  • Status changed from new to assigned

Patch looks good.

#3 @peterwilsoncc
2 years ago

Are you able to run a test looping this 1000 times and making sure the query is indeed quicker?

I can see the speed difference in your local testing but that could just be random variations. It seems odd to me that the query wouldn't be optimised by the server given the ID is a primary unique key.

Probably best to check with 10 IDs as that's typical of a front end request.

#4 @spacedmonkey
2 years ago

@peterwilsoncc Good call out.

I wrote a little test script. The site has 250 pages; I used get_posts to fetch 20 of them. See attached screenshots.

The results are pretty clear cut.

The reason this is faster is that adding a limit lets the query finish as soon as its criteria and the limit are met. So if you are looking for 20 IDs and limit to 20, the query finishes once it reaches 20 rows. Without a limit, the database keeps scanning rows.
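One way to check whether the LIMIT actually changes the execution plan is to compare EXPLAIN output for both forms of the query. This is a sketch against a default wp_posts table; the IDs are placeholders:

```sql
-- Without a LIMIT: MySQL should do a range scan on the PRIMARY key.
EXPLAIN SELECT wp_posts.* FROM wp_posts
WHERE ID IN (101, 102, 103, 104, 105);

-- With a LIMIT matching the number of IDs.
EXPLAIN SELECT wp_posts.* FROM wp_posts
WHERE ID IN (101, 102, 103, 104, 105)
LIMIT 5;
```

If both plans report the same type (range), key (PRIMARY) and rows values, the server already stops after the matched rows and the LIMIT changes nothing.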

#5 @peterwilsoncc
2 years ago

I'm not seeing any benefit here. (Sorry, I couldn't quite read the images you shared.)

Running 10,000 queries of the new query vs the old query isn't showing any difference to the second or third decimal place.

Patched          Unpatched
2.602267742157   2.5758287906647
2.6104681491852  2.6174831390381
3.0410692691803  3.05641913414
2.5572915077209  2.5297293663025
2.6810057163239  2.6585900783539
2.4571781158447  2.4542315006256

This is just using the changed $wpdb->get_results() call with the respective queries.

The test code I was using is available as a gist, ignore the reference to caching -- obviously this isn't an issue with direct database queries -- it was in there for testing an earlier ticket.

The database has 1,100 test posts in it; I was querying IDs 35-44 for some test runs, 1035-1044 for others.

#6 follow-up: @spacedmonkey
2 years ago

In my tests of 1000 requests, the difference was:

Before - 0.379
After - 0.366

I am also seeing improvements on single requests.

Before - 0.0005
After - 0.0004

I saw similar results in multiple test runs. The improvement is not as much as I hoped, but it is something.

Limits normally do improve the performance of a query, as they allow queries to stop early. But looking at this thread, it seems the improvement only comes when the query is against a non-indexed field.

TL;DR: this patch is an improvement, but not as much as hoped.
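To illustrate the indexed vs. non-indexed distinction, here is a hypothetical sketch against a default wp_posts table (post_title has no index in the default schema; the values are placeholders):

```sql
-- ID is the PRIMARY key: the range scan touches only matching rows,
-- so there is essentially nothing left for a LIMIT to cut short.
SELECT * FROM wp_posts WHERE ID IN (101, 102, 103) LIMIT 3;

-- post_title is not indexed: this forces a full table scan, and the
-- LIMIT lets the scan stop as soon as 3 matching rows are found.
SELECT * FROM wp_posts WHERE post_title IN ('Foo', 'Bar', 'Baz') LIMIT 3;
```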

#7 in reply to: ↑ 6 @peterwilsoncc
2 years ago

  • Keywords commit removed

Replying to spacedmonkey:

I saw similar results in multiple test runs. The improvement is not as much as I hoped, but it is something.

Limits normally do improve the performance of a query, as they allow queries to stop early. But looking at this thread, it seems the improvement only comes when the query is against a non-indexed field.

I suspect this is why my tests were sometimes showing the patched version as faster, sometimes the unpatched version.

I'm inclined to leave it out until you and I are both sure there is a gain. Mainly for the sake of code clarity.

#8 @spacedmonkey
2 years ago

  • Milestone changed from 6.0 to Future Release

Moving to Future Release until we can decide if this should go ahead.

#9 @soulseekah
11 months ago

  • Keywords 2nd-opinion added

I was unable to reproduce any difference between having a limit and not having a limit whatsoever.

The query plan (EXPLAIN) and the query trace (information_schema.OPTIMIZER_TRACE) are identical, apart from the rows_estimation step in join_optimization, where the range_analysis table_scan estimated cost is actually higher for the query with a LIMIT clause (even though the range scan is not performed).

See screenshot above.
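For anyone wanting to reproduce this, the trace can be captured per session (MySQL 5.6+; table name assumed to be the default wp_posts, IDs are placeholders):

```sql
SET optimizer_trace = 'enabled=on';
SELECT wp_posts.* FROM wp_posts WHERE ID IN (101, 102, 103) LIMIT 3;
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';
```

Run the same steps without the LIMIT clause and diff the two TRACE values.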

Any benchmarks done using microtime() inside PHP itself will vary wildly depending on kernel scheduling (other tasks running). If you run the two versions over 6-12 hours on an idle machine (preferably bare-metal hardware, as VPSes tend to be noisy as well), the results should converge.
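A way to take PHP scheduling noise out of the measurement entirely is to time the queries server-side. This is a sketch using MySQL's session profiling (deprecated but still available; performance_schema is the modern alternative), with placeholder IDs:

```sql
SET profiling = 1;
SELECT wp_posts.* FROM wp_posts WHERE ID IN (101, 102, 103);
SELECT wp_posts.* FROM wp_posts WHERE ID IN (101, 102, 103) LIMIT 3;
SHOW PROFILES;  -- per-query durations as measured by the server
SET profiling = 0;
```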

Note: See TracTickets for help on using tickets.