Opened 2 years ago
Last modified 4 weeks ago
#55463 assigned enhancement
Add a limit to _prime_*_cache functions
| Reported by: | spacedmonkey | Owned by: | spacedmonkey |
|---|---|---|---|
| Milestone: | Future Release | Priority: | normal |
| Severity: | normal | Version: | |
| Component: | Cache API | Keywords: | good-first-bug has-patch 2nd-opinion |
| Focuses: | performance | Cc: | |
Description
The _prime_*_cache functions, like _prime_post_cache(), run a simple query to prime caches. The query uses an IN clause to fetch all of the objects. Since we already know how many objects we want returned (we have an array of IDs), we can add a simple LIMIT to the query to improve its performance.
Example:
$fresh_posts = $wpdb->get_results( sprintf( "SELECT $wpdb->posts.* FROM $wpdb->posts WHERE ID IN (%s) LIMIT %d", implode( ',', $non_cached_ids ), count( $non_cached_ids ) ) );
Attachments (6)
Change History (16)
#2
@
2 years ago
- Keywords commit added
- Milestone changed from Awaiting Review to 6.0
- Owner set to spacedmonkey
- Status changed from new to assigned
Patch looks good.
#3
@
2 years ago
Are you able to run a test looping this 1000 times and making sure the query is indeed quicker?
I can see the speed difference in your local testing, but that could just be random variation. It seems odd to me that the query wouldn't be optimised by the server, given that ID is the primary key.
Probably best to check with 10 IDs as that's typical of a front end request.
#4
@
2 years ago
@peterwilsoncc Good call out.
I wrote a little test script. The test site has 250 pages, and I used get_posts() to fetch 20 of them. See attached screenshots.
The results are pretty clear cut.
The reason this is faster is that adding a limit lets the query finish as soon as its criteria and the limit are both met. So if you are looking for 20 IDs and limit to 20, the query finishes once it reaches 20 rows. Without a limit, the database keeps scanning rows.
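A quick way to check this early-exit theory is to compare the two query plans directly, assuming a MySQL server and the default wp_posts table (the table name and IDs here are illustrative):

```sql
-- Plan without a limit: the server matches every ID in the IN list.
EXPLAIN SELECT wp_posts.* FROM wp_posts WHERE ID IN (1, 2, 3);

-- Plan with a limit: if the limit helps, the plans or row
-- estimates should differ between the two statements.
EXPLAIN SELECT wp_posts.* FROM wp_posts WHERE ID IN (1, 2, 3) LIMIT 3;
```

If both statements report the same access type and row estimate, any gain from the LIMIT is likely to be marginal.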
#5
@
2 years ago
I'm not seeing any benefit here. (Sorry, I couldn't quite read the images you shared.)
Running 10,000 queries of the new query vs the old query isn't showing any difference to the second or third decimal place.
| Patched | Unpatched |
|---|---|
| 2.602267742157 | 2.5758287906647 |
| 2.6104681491852 | 2.6174831390381 |
| 3.0410692691803 | 3.05641913414 |
| 2.5572915077209 | 2.5297293663025 |
| 2.6810057163239 | 2.6585900783539 |
| 2.4571781158447 | 2.4542315006256 |
This is just using the changed $wpdb->get_results() call with the respective queries.
The test code I was using is available as a gist; ignore the reference to caching (obviously caching isn't an issue with direct database queries) -- it was in there for testing an earlier ticket.
The database has 1,100 test posts in it; I was fetching IDs 35-44 for some test runs and 1035-1044 for others.
#6
follow-up:
↓ 7
@
2 years ago
In my tests of 1,000 requests, the difference was:
Before - 0.379
After - 0.366
I am also seeing improvements on single requests:
Before - 0.0005
After - 0.0004
I saw similar results in multiple test runs. The improvement is not as much as I hoped, but it is something.
Limits normally do improve the performance of a query, as they allow it to stop scanning early. But looking at this thread, it seems the improvement only comes when the query is against a non-indexed field.
TL;DR: this patch is an improvement, but not by as much as hoped.
#7
in reply to:
↑ 6
@
2 years ago
- Keywords commit removed
Replying to spacedmonkey:

I saw similar results in multiple test runs. The improvement is not as much as I hoped, but it is something.
Limits normally do improve the performance of a query, as they allow it to stop scanning early. But looking at this thread, it seems the improvement only comes when the query is against a non-indexed field.
I suspect this is why my tests were sometimes showing the patched version as faster and sometimes the unpatched version.
I'm inclined to leave it out until you and I are both sure there is a gain, mainly for the sake of code clarity.
#8
@
2 years ago
- Milestone changed from 6.0 to Future Release
Moving to Future Release until we can decide whether this should go ahead.
#9
@
14 months ago
- Keywords 2nd-opinion added
I was unable to reproduce any difference whatsoever between having a limit and not having one.
The query planner (EXPLAIN) and the query trace (information_schema.OPTIMIZER_TRACE) are exactly the same, apart from the rows_estimation step in join_optimization, where the range_analysis table_scan estimated cost is actually higher (even though the range scan is not performed) for a query with a LIMIT clause.
See screenshot above.
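For anyone wanting to reproduce the trace comparison locally, something like the following works on MySQL 5.6+ (the table name and IDs here are illustrative):

```sql
SET optimizer_trace = 'enabled=on';

-- Run the statement under investigation; swap the LIMIT in and
-- out between runs to compare traces.
SELECT wp_posts.* FROM wp_posts WHERE ID IN (35, 36, 37) LIMIT 3;

-- The TRACE column holds the JSON trace, including the
-- join_optimization / rows_estimation steps mentioned above.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET optimizer_trace = 'enabled=off';
```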
Any benchmarks done using microtime() inside PHP itself will vary wildly depending on kernel scheduling (other tasks running). If you run the two versions over 6-12 hours on an idle machine (preferably bare-metal hardware, as VPSes tend to be noisy as well), the results should converge.
Before change