WordPress.org

Make WordPress Core

Opened 5 years ago

Last modified 10 months ago

#19879 new enhancement

Better caching for get_dirsize

Reported by: batmoo Owned by:
Milestone: Future Release Priority: normal
Severity: normal Version: 3.3.1
Component: Filesystem API Keywords: has-patch dev-feedback needs-unit-tests
Focuses: multisite, performance Cc:

Description

In a multisite install, when trying to determine whether a site has exceeded its storage quota, WordPress will scan through a blog's upload directory and sum up the file sizes, by running filesize against each one. With a large number of files, this can significantly slow down the upload process or certain portions of the Dashboard.

get_dirsize has transient caching in place but this is a single cache entry for all folders. It might be better if WordPress has a separate cache entry for each folder and was invalidated based on context so that get_dirsize does not need to be run constantly on older, unchanged directories as frequently.

Attachments (1)

19879.diff (5.0 KB) - added by A5hleyRich 10 months ago.

Download all attachments as: .zip

Change History (19)

#1 @DJPaul
5 years ago

  • Cc djpaul@… added

#2 @danielbachhuber
5 years ago

  • Cc wordpress@… added

#3 @DJPaul
5 years ago

We worked around this bug by making sure that upload_space_check_disabled was set; this resulted in an immediate speed-up.

#4 in reply to: ↑ description @kurtpayne
5 years ago

  • Cc kpayne@… added
  • Type changed from defect (bug) to enhancement

My first thought would be to change recurse_dirsize() like so:

} elseif (is_dir($path)) {
	$handlesize = recurse_dirsize( $path );
	if ($handlesize > 0)
		$size += $handlesize;
}

to:

} elseif (is_dir($path)) {
	$handlesize = get_dirsize( $path );
	if ($handlesize > 0)
		$size += $handlesize;
}

This change would check the cache as it's doing the recursive iteration. This is very optimistic, though. Cache misses would result in a database SELECT and UPDATE/INSERT. For most sites, this change would probably slow things down. Further, this could result in a lot of extra rows in the db that wouldn't serve a useful purpose.

#5 @jamescollins
5 years ago

See http://mu.trac.wordpress.org/ticket/1175 for the origins of the get_dirsize() caching.

#6 @nacin
5 years ago

In [19747]:

Remove copy-pasted comment from upload_is_user_over_quota()'s phpdoc. It came from recurse_dirsize(). see #19879.

#8 follow-ups: @jeremyfelt
3 years ago

  • Keywords 2nd-opinion close added

I think (???) this is a non issue now.

In [21387], get_space_used() was introduced to allow for an alternate method of checking used space to work via filter. I believe this takes care of the original issue.

When that was introduced, it started passing the directory to check via wp_upload_dir() rather than BLOGUPLOADDIR, which theoretically should mean nice things for checking directory sizes in smaller pieces.

Most likely worksforme, but further input is welcome.

#9 in reply to: ↑ 8 @SergeyBiryukov
3 years ago

Replying to jeremyfelt:

When that was introduced, it started passing the directory to check via wp_upload_dir() rather than BLOGUPLOADDIR

Related: [21823] (first I tried to find that in the original changeset, [21387]).

#10 in reply to: ↑ 8 @nacin
3 years ago

Replying to jeremyfelt:

In [21387], get_space_used() was introduced to allow for an alternate method of checking used space to work via filter. I believe this takes care of the original issue.

While it allows someone to do their own caching, this is still a bit lame.

One thing we could do would be to remove the timeout. The dirsize_cache transient is invalidated in wp_handle_upload() (though not wp_handle_sideload()) and wp_delete_attachment(). If we're confident in this, we should be able to use the transient as long as it exists.

It doesn't help with repeated invalidations followed by repeated recursive directory searches, but it'll at least lessen the number of times the directory is recursed for no reason at all.

When that was introduced, it started passing the directory to check via wp_upload_dir() rather than BLOGUPLOADDIR, which theoretically should mean nice things for checking directory sizes in smaller pieces.

BLOGUPLOADDIR contained a path to a site's individual /files/ folder, rather than all sites. So no change here.

#11 @wpdavis
3 years ago

Related: #26135

#12 @jeremyfelt
3 years ago

  • Keywords close removed
  • Milestone changed from Awaiting Review to Future Release

We should see what we can do with this in 3.9.

#13 @jeremyfelt
3 years ago

  • Component changed from Multisite to Filesystem
  • Focuses multisite added

#14 @jeremyfelt
3 years ago

  • Focuses performance added

#15 @chriscct7
12 months ago

  • Keywords needs-patch added; 2nd-opinion removed

This ticket was mentioned in Slack in #feature-respimg by jeremyfelt. View the logs.


12 months ago

@A5hleyRich
10 months ago

#17 @A5hleyRich
10 months ago

Supplied patch removes transient timeout as suggested by @nacin. However, instead of purging the entire cache the dirsize_cache transient now caches each directory like so:

<?php
Array
(
    [wp-content/uploads/2013/09] => 2082994
    [wp-content/uploads/2013] => 2082994
    [wp-content/uploads/2014/01] => 2082994
    [wp-content/uploads/2014] => 2082994
    [wp-content/uploads/2015/04] => 486873
    [wp-content/uploads/2015/10] => 2919469
    [wp-content/uploads/2015/11] => 34562447
    [wp-content/uploads/2015] => 37968789
    [wp-content/uploads] => 42142973
)

When wp_handle_upload() or wp_delete_attachment() are called only the directories that are effected are purged. For example if we upload wp-content/uploads/2015/11/test.png the _transient_dirsize_cache becomes:

<?php
Array
(
    [wp-content/uploads/2013/09] => 2082994
    [wp-content/uploads/2013] => 2082994
    [wp-content/uploads/2014/01] => 2082994
    [wp-content/uploads/2014] => 2082994
    [wp-content/uploads/2015/04] => 486873
    [wp-content/uploads/2015/10] => 2919469
)

This ensures that only the changed directories are checked when get_dirsize is called. Still needs unit tests, but wanted to get feedback before doing any more work on this patch.

#18 @A5hleyRich
10 months ago

  • Keywords has-patch dev-feedback needs-unit-tests added; needs-patch removed
Note: See TracTickets for help on using tickets.