#19879 closed task (blessed) (fixed)
Better caching for get_dirsize
Reported by: |
|
Owned by: |
|
---|---|---|---|
Milestone: | 5.6 | Priority: | normal |
Severity: | normal | Version: | 3.3.1 |
Component: | Filesystem API | Keywords: | has-patch has-unit-tests commit has-dev-note |
Focuses: | multisite, performance | Cc: |
Description
In a multisite install, when trying to determine whether a site has exceeded its storage quota, WordPress will scan through a blog's upload directory and sum up the file sizes, by running filesize
against each one. With a large number of files, this can significantly slow down the upload process or certain portions of the Dashboard.
get_dirsize
has transient caching in place but this is a single cache entry for all folders. It might be better if WordPress has a separate cache entry for each folder and was invalidated based on context so that get_dirsize does not need to be run constantly on older, unchanged directories as frequently.
Attachments (9)
Change History (63)
#4
in reply to:
↑ description
@
13 years ago
- Cc kpayne@… added
- Type changed from defect (bug) to enhancement
My first thought would be to change recurse_dirsize()
like so:
} elseif (is_dir($path)) { $handlesize = recurse_dirsize( $path ); if ($handlesize > 0) $size += $handlesize; }
to:
} elseif (is_dir($path)) { $handlesize = get_dirsize( $path ); if ($handlesize > 0) $size += $handlesize; }
This change would check the cache as it's doing the recursive iteration. This is very optimistic, though. Cache misses would result in a database SELECT and UPDATE/INSERT. For most sites, this change would probably slow things down. Further, this could result in a lot of extra rows in the db that wouldn't serve a useful purpose.
#5
@
13 years ago
See http://mu.trac.wordpress.org/ticket/1175 for the origins of the get_dirsize() caching.
#8
follow-ups:
↓ 9
↓ 10
@
12 years ago
- Keywords 2nd-opinion close added
I think (???) this is a non issue now.
In [21387], get_space_used()
was introduced to allow for an alternate method of checking used space to work via filter. I believe this takes care of the original issue.
When that was introduced, it started passing the directory to check via wp_upload_dir()
rather than BLOGUPLOADDIR
, which theoretically should mean nice things for checking directory sizes in smaller pieces.
Most likely worksforme, but further input is welcome.
#9
in reply to:
↑ 8
@
12 years ago
Replying to jeremyfelt:
When that was introduced, it started passing the directory to check via
wp_upload_dir()
rather thanBLOGUPLOADDIR
Related: [21823] (first I tried to find that in the original changeset, [21387]).
#10
in reply to:
↑ 8
@
12 years ago
Replying to jeremyfelt:
In [21387],
get_space_used()
was introduced to allow for an alternate method of checking used space to work via filter. I believe this takes care of the original issue.
While it allows someone to do their own caching, this is still a bit lame.
One thing we could do would be to remove the timeout. The dirsize_cache transient is invalidated in wp_handle_upload() (though not wp_handle_sideload()) and wp_delete_attachment(). If we're confident in this, we should be able to use the transient as long as it exists.
It doesn't help with repeated invalidations followed by repeated recursive directory searches, but it'll at least lessen the number of times the directory is recursed for no reason at all.
When that was introduced, it started passing the directory to check via
wp_upload_dir()
rather thanBLOGUPLOADDIR
, which theoretically should mean nice things for checking directory sizes in smaller pieces.
BLOGUPLOADDIR
contained a path to a site's individual /files/
folder, rather than all sites. So no change here.
#12
@
11 years ago
- Keywords close removed
- Milestone changed from Awaiting Review to Future Release
We should see what we can do with this in 3.9.
This ticket was mentioned in Slack in #feature-respimg by jeremyfelt. View the logs.
9 years ago
#17
@
9 years ago
Supplied patch removes transient timeout as suggested by @nacin. However, instead of purging the entire cache the dirsize_cache
transient now caches each directory like so:
<?php Array ( [wp-content/uploads/2013/09] => 2082994 [wp-content/uploads/2013] => 2082994 [wp-content/uploads/2014/01] => 2082994 [wp-content/uploads/2014] => 2082994 [wp-content/uploads/2015/04] => 486873 [wp-content/uploads/2015/10] => 2919469 [wp-content/uploads/2015/11] => 34562447 [wp-content/uploads/2015] => 37968789 [wp-content/uploads] => 42142973 )
When wp_handle_upload()
or wp_delete_attachment()
are called only the directories that are effected are purged. For example if we upload wp-content/uploads/2015/11/test.png
the _transient_dirsize_cache
becomes:
<?php Array ( [wp-content/uploads/2013/09] => 2082994 [wp-content/uploads/2013] => 2082994 [wp-content/uploads/2014/01] => 2082994 [wp-content/uploads/2014] => 2082994 [wp-content/uploads/2015/04] => 486873 [wp-content/uploads/2015/10] => 2919469 )
This ensures that only the changed directories are checked when get_dirsize
is called. Still needs unit tests, but wanted to get feedback before doing any more work on this patch.
This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.
8 years ago
#20
@
4 years ago
Just wanted to push the importance of this feature.
We already replaced the default WordPress PHP calculation with OS native features ("du" on Linux) which reduced the amount of time massively last year. Sadly now we moved our uploads to NFS (aka "a slow filesystem") and the upload folder grew even more over time. So we are back to quota calculation times of >30s.
As some sites see regular uploads which resets the transient frequently throughout the day, the impact of this is massive to the WP-Admin backend, as well as to REST API calls to the /media/ endpoint.
Calculating this on a per month base and only on a "as needed" base is a perfect way to address this issue permanently! Then the calculation times should go down instantly independet of filesystem and way to measure the space used (OS tools or default PHP).
So this is a definite +1 for approaching this soon.
Anything we can do to support?
This ticket was mentioned in Slack in #core-multisite by janthiel. View the logs.
4 years ago
#22
@
4 years ago
Just FYI on how much this has an impact on the file upload:
Currently on a huge site on NFS with 25G of files the Quota processing would take >30 seconds(!) to calculate the whole site. This time is required for each upload to finish besides the actual upload processing.
Calculating only the one required folder would reduce this to <1s.
This time also adds on top to wp-admin dashboard calls if there is no cache present.
Here are some basic stats using Unix file system tools. The actual PHP quota calculation will be even worse.
-bash-4.2$ cd wp-content/uploads/sites/25/
-bash-4.2$ time du -sh ./
25G ./
real 0m31.757s
user 0m0.522s
sys 0m7.391s
-bash-4.2$ cd wp-content/uploads/sites/25/2020/09/
-bash-4.2$ time du -sh ./
7,9G ./
real 0m0.898s
user 0m0.068s
sys 0m0.250s
This ticket was mentioned in Slack in #core by thelmachido. View the logs.
4 years ago
#24
@
4 years ago
- Milestone set to 5.6
Milestoning for visibility, seems reasonable at first glance but not at all my area of expertise.
@
4 years ago
Added new filter to allow for custom dir size calculation without losing the new caching
#25
@
4 years ago
I just added another patch file. It includes one additional filter which will alow one to change the calculation of the dirsize without losing the new caching benefits.
Currently one can only use the pre_get_space_used
filter which then required to either duplicate large parts of the actual get_dirsize
and recurse_dirsize
functions or lose the benefits of the new caching.
The new filter allows for custom code to run per directory and the result will fill the dirsize_cache. Yet the previous existing filter pre_get_space_used
can still be used to implement a complete own space calculation which might or might not rely on the dirsize_cache. So there will be complete backward compatibility for those who implemented their own logic.
Particular use cases for the new filter are:
- Calling CDN APIs to get the actual usage of the requested directory
- Implement custom logic benefiting from deep integration into native operating system tools for much faster calculations (like
du
on unix)
@
4 years ago
New filter comment wrongly stated to expect the dir size in megabytes and not in bytes
@
4 years ago
"Final" patch proposition; Renamed cache invalidation method and added cache invalidation to wp_upload_bits which did not invalidate the cache before acording to the Unit Tests
@
4 years ago
Updated function comment from copied test case to what the function really does. Missed that in the first place.
This ticket was mentioned in Slack in #core-multisite by hellofromtonya. View the logs.
4 years ago
#29
@
4 years ago
This code should be backward compatible without breaking anything.
Still there are some plugins which might be affected by this. Because up until now the only way to implement any way of custom space calculation was using the pre_get_space_used
filter.
This forced one to re-implement the dirsize caching oneself which will not be compatible with the new code. If there is a custom implementation around the pre_get_space_used
filter working with the dirsize_cache
transient, this patch won't break anything but the new optimized caching would not kick in. So the worst case is that there is no performance gain. This should be well documented and communicated so that plugin authors can adapt.
If a plugin does anything around get_dirsize
, recurse_dirsize
the pre_get_space_used
filter or the dirsize_cache
transient, it might be affected and requires changes to fully leverage this optimizations. Those plugin authors should consider moving to the new calculate_current_dirsize
filter instead.
I reached out to some of the Media Offloading Plugin vendors and asked for some feedback based on their codebase.
#30
@
4 years ago
@janthiel, thanks for getting in touch.
WP Offload Media does use the pre_get_space_used
filter. It queries its own data to get size information for files that have been offloaded and removed from the server, and then adds that to the total retrieved from a call to get_dirsize
for the uploads basedir.
From the sounds of things that would not be broken by this proposed change?
Would it however cause performance issues?
#31
@
4 years ago
@ianmjones Thanks for getting back to this so quickly :-)
From what you describe, this change will only affect you in all the positive ways it should. The call to get_dirsize
will not change in any way. But the caching under the hood of it will be much more scalable for growing upload folders. So as long as you don't handle the dirsize_cache
by yourself, everything will continue without any change required on your end.
Performancewise: On "new" or small sites you will most probably not notice anything at all. But for larger ones or those running on NFS you should notice the increased performance as well.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
4 years ago
This ticket was mentioned in PR #608 on WordPress/wordpress-develop by helen.
4 years ago
#33
This mirrors these two patches over for Actions testing purposes:
- https://core.trac.wordpress.org/attachment/ticket/19879/19879_Better-dirsize-caching-and-invalidation.patch
- https://core.trac.wordpress.org/attachment/ticket/19879/19879_Unit_Tests_comment-fix.patch
Trac ticket: https://core.trac.wordpress.org/ticket/19879
#34
@
4 years ago
I mirrored the two latest patches onto GitHub to allow the tests to run: https://github.com/WordPress/wordpress-develop/pull/608
You'll find some PHPCS errors in there :) the tests seem to pass though. I'd like to do a little more asking around about people who use these filters but otherwise I think we can attempt this for the 5.6 beta.
This ticket was mentioned in Slack in #core by hellofromtonya. View the logs.
4 years ago
#36
@
4 years ago
- Keywords needs-testing added
Notes from today's scrub:
Per @helen:
I did not get any feedback from anybody that they’re actually interacting with any hooks in the area, so it’s probably a matter of committing and seeing if anything breaks at this point…
Per @SergeyBiryukov:
I'm not sure
normalize_dirsize_cache_path()
is needed for just two instances, I would look into usingwp_normalize_path()
and getting rid ofDIRECTORY_SEPARATOR
, as we don't use that anywhere else for similar actions. I can take a closer look and test on Windows.
#37
@
4 years ago
- Owner set to helen
- Resolution set to fixed
- Status changed from new to closed
In 49212:
#39
@
4 years ago
- Resolution fixed deleted
- Status changed from closed to reopened
- Type changed from enhancement to task (blessed)
Reopening for some documentation fixes and testing.
- I believe this should be renamed to
clean_dirsize_cache()
, for consistency with other functions of a similar purpose:clean_bookmark_cache()
clean_category_cache()
clean_comment_cache()
clean_site_details_cache()
clean_network_cache()
clean_blog_cache()
clean_post_cache()
clean_attachment_cache()
clean_object_term_cache()
clean_term_cache()
clean_taxonomy_cache()
clean_user_cache()
- As noted above, I'm not sure
normalize_dirsize_cache_path()
is needed for just two instances, seems a bit too specific for just two lines of code. tests/multisite/dirsizeCache.php
file name should match the name of the function being tested.
This ticket was mentioned in PR #686 on WordPress/wordpress-develop by helen.
4 years ago
#40
Updates per https://core.trac.wordpress.org/ticket/19879#comment:39
Trac ticket: https://core.trac.wordpress.org/ticket/19879
#41
@
4 years ago
@SergeyBiryukov I agree with all of the above, there are other instances of us doing the same PHP that normalize_dirsize_cache_path()
is doing anyway so it doesn't make sense to name it for some usage but not all. Opened https://github.com/WordPress/wordpress-develop/pull/686 for review on those changes and to make sure tests pass.
This ticket was mentioned in Slack in #core by daisyo. View the logs.
4 years ago
#49
@
4 years ago
It looks like the difference in transient structure can sometimes cause fatals on PHP 8. #51913 has details. Please add thoughts if you have any.
We worked around this bug by making sure that upload_space_check_disabled was set; this resulted in an immediate speed-up.