WordPress.org

Make WordPress Core

Opened 3 years ago

Closed 2 years ago

Last modified 2 years ago

#34359 closed defect (bug) (fixed)

Cache output of `wp_upload_dir()` to improve performance

Reported by: mikeschroder Owned by: azaozz
Milestone: 4.5 Priority: normal
Severity: normal Version:
Component: Media Keywords: needs-testing has-patch early
Focuses: multisite, performance Cc:

Description

While testing the Responsive Images support for core, we found that in some cases wp_upload_dir(), through its get_options() calls, made up approximately 1/3 of the runtime, sometimes up more than 300ms on pages that had 50+ images.

@joemcgill notes that we can reduce this almost entirely by caching the output -- at the very least per page-view, and optimally, in the object cache.

Initially chatted with @jeremyfelt about this in RespImg, and agrees with the need and seems to think the approach is sane.

Attachments (6)

34359.patch (9.3 KB) - added by azaozz 3 years ago.
34359.2.patch (1.0 KB) - added by kovshenin 2 years ago.
34359.3.patch (10.5 KB) - added by azaozz 2 years ago.
34359.4.patch (10.7 KB) - added by azaozz 2 years ago.
34359.5.patch (10.8 KB) - added by azaozz 2 years ago.
34359.6.patch (1.0 KB) - added by azaozz 2 years ago.

Download all attachments as: .zip

Change History (71)

#1 @mikeschroder
3 years ago

  • Owner set to DH-Shredder
  • Status changed from new to assigned

#3 @boonebgorges
3 years ago

+1. I've noticed the overhead in the past, too.

We should be careful to cache before filters. Lots of plugins (like BuddyPress) filter wp_upload_dir() conditionally - based on current context, current user, etc - and we need to make sure it's still possible to filter in this way.

#4 @johnbillion
3 years ago

With persistent caching we'll need to be careful not to pollute the upload dir URL where a site is using different domain names and/or different schemes for the home URL, site URL, and admin area. Tests should reflect this.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


3 years ago

#6 follow-up: @azaozz
3 years ago

Looks like wp_upload_dir() is intended to be called (mostly) in wp-admin, when saving or deleting an uploaded file, etc. It would be nice to cache all of it but that may prove difficult or bring edge cases.

As far as I see responsive images only need $uploads_dir['baseurl']. Maybe only this can be cached for now, so there is only one call to wp_upload_dir() per front-end page load.

This ticket was mentioned in Slack in #feature-respimg by joemcgill. View the logs.


3 years ago

#8 in reply to: ↑ 6 @mikeschroder
3 years ago

Replying to azaozz:

As far as I see responsive images only need $uploads_dir['baseurl']. Maybe only this can be cached for now, so there is only one call to wp_upload_dir() per front-end page load.

If we can avoid the call completely, that's likely fine for responsive images. If it's going to be cached for responsive images, though, I'd definitely prefer to see it in wp_upload_dir(), so it improves performance beyond just responsive image support.

This ticket was mentioned in Slack in #feature-respimg by joemcgill. View the logs.


3 years ago

@azaozz
3 years ago

#10 @azaozz
3 years ago

  • Keywords has-patch needs-testing needs-unit-tests added; needs-patch removed

In 34359.patch:

  • Cache the $uploads array before filtering.
  • Test different upload paths (and create upload sub-directories) only once after filtering.
  • If an error occurs when creating new upload sub-directory, cache the error separately and add it to the returned data.

#11 follow-up: @azaozz
3 years ago

Caching the output from wp_upload_dir() has to be tied to the optional $time param. When it is used, the subdir key in the return array can vary. In 34359.patch the cache key also contains the year/month portion from $time.

The cache is in the default bucket which is changed when switch_to_blog() runs.

We want to cache the $uploads array (that contains all the data) before the 'upload_dir filter. However there is a test (wp_mkdir_p( $filtered_path )) if the uploads sub-directories exist after that filter. We don't want to run this test every time as it is slow. To do that a second cache key is used to hold the paths that have been tested and the eventual error messages if creating the upload sub-directories has failed.

There are several options used in calculating the upload path and URL. Currently my only concern is that in theory caching can fail if a plugin intercepts one of these options and returns different values depending on some external factor, instead of using the upload_dir filter.

More testing especially on older multisite installs and with plugins like BuddyPress will be very appreciated.

This ticket was mentioned in Slack in #core-multisite by azaozz. View the logs.


3 years ago

This ticket was mentioned in Slack in #buddypress by azaozz. View the logs.


3 years ago

#14 in reply to: ↑ 11 @imath
3 years ago

Replying to azaozz:

More testing especially on older multisite installs and with plugins like BuddyPress will be very appreciated.

Hi @azaozz

Just tested your patch with BuddyPress trunk :

  • regular WordPress
  • WordPress Multisite (having BuddyPress activated on the network or not)
  • I've also run the BuddyPress phpunit testsuite regular/multisite.

All tests are ok and everything seems to work fine for BuddyPress uploads (avatars & cover images).

This ticket was mentioned in Slack in #core by azaozz. View the logs.


3 years ago

#16 @joemcgill
3 years ago

If/when this is committed, we should also remove _wp_upload_dir_baseurl() from wp_calculate_image_srcset() (See #34430).

#17 @mikeschroder
3 years ago

@azaozz Thanks for writing this up!

I think that this late in the release, it's unwise to push a change like this to persistent cache.

I'd still support a change that only caches it per page/WordPress-load, in a static variable, or otherwise, but think we should be careful when it comes to persistent cache.

One things that stands out at first is that, unless I'm missing it, it doesn't seem to cache per blog ID.

This ticket was mentioned in Slack in #core by wonderboymusic. View the logs.


3 years ago

#19 @mikeschroder
3 years ago

  • Keywords needs-patch added; has-patch removed
  • Milestone changed from 4.4 to Future Release

As I didn't have time to write up/test an alternate approach and it's RC1 time, thinking the best thing to do here is to punt to the next release, and commit early next cycle with a complete solution, like the one proposed by @azaozz.

I really like this idea and ticket, and still want to see it happen.

#20 @azaozz
3 years ago

  • Keywords needs-patch removed

...it doesn't seem to cache per blog ID.

It adds the cache to the default "bucket" which is keyed by blog ID. The same happens in the persistent cache as far as I know.

This function is actually (a bit) safer than _wp_upload_dir_baseurl() as the filter is run every time on the pre-cached data. I know it sounds weird to store some image sub-sizes in one location and other in another location, but there might be plugins doing that and depending on the upload_dir filter to select the proper path/URL.

Thinking it is better to commit this patch and remove _wp_upload_dir_baseurl() completely. Can add a simple way to disable the caching for eventual edge cases on multisite installs.

Last edited 3 years ago by azaozz (previous) (diff)

#21 @azaozz
3 years ago

  • Keywords has-patch added

@kovshenin
2 years ago

#22 @kovshenin
2 years ago

A slightly different approach in 34359.2.patch, non-persistent caching, much like _wp_upload_dir_baseurl().

@azaozz
2 years ago

#23 @azaozz
2 years ago

  • Keywords early added; needs-unit-tests removed
  • Milestone changed from Future Release to 4.5

In 34359.3.patch:

  • Non-persistent cache like in 34359.2.patch.
  • Cache the data before the upload_dir filter and run the filter every time on the cached data.
  • Use separate static var for caching the tested paths. This means we will usually run wp_mkdir_p() only once per load.
  • Introduce wp_get_upload_dir() for "light weight" use, when a file is not being uploaded.
  • Add cache refresh option to wp_upload_dir() and use it in the tests.
  • Fix the test a bit: consolidate multiple calls to gmstrftime().
Last edited 2 years ago by azaozz (previous) (diff)

@azaozz
2 years ago

#24 follow-up: @azaozz
2 years ago

34359.4.patch is identical to 34359.3.patch except the tested paths are stored in persistent cache. We are testing absolute paths, no point to repeat the same test on every page load, the result is not going to change :)

#25 in reply to: ↑ 24 ; follow-up: @mikeschroder
2 years ago

Replying to azaozz:

34359.4.patch is identical to 34359.3.patch except the tested paths are stored in persistent cache. We are testing absolute paths, no point to repeat the same test on every page load, the result is not going to change :)

I could see this happening if WordPress is scaled horizontally, and initial saves happen on whatever local web machine, or localized network storage might be used for that particular part of the cluster, but still using a shared pool of object cache. In practice, I don't know how common this is, but it's certainly a configuration possibility.

#26 in reply to: ↑ 25 @azaozz
2 years ago

Replying to mikeschroder:

Was thinking more about avoiding the wp_mkdir_p() call on (very) busy sites with persistent cache. That seems to be the slowest part of wp_upload_dir() and is needed only once a month when uploading a file.

We have to keep calling it so we are 100% backwards compatible. If a theme uses wp_upload_dir() on the front end on a busy site, or if there is a gallery in the post, it is called many thousands of times every day for no good reason :)

Last edited 2 years ago by azaozz (previous) (diff)

This ticket was mentioned in Slack in #core by azaozz. View the logs.


2 years ago

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


2 years ago

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


2 years ago

#30 @markoheijnen
2 years ago

I'm still mixed about using a (persistent) cache solution but I guess there is no better way.

#31 @azaozz
2 years ago

The persistent cache is only used to store absolute paths that have been tested (or created) with wp_mkdir_p(). On multisite these are still stored in each site's bucket.

Keep in mind that in this context wp_mkdir_p() needs to run only once per month to create the next /year/month sub-dirs. The rest of the time (thousands or millions of runs) it is useless :)

#32 @markoheijnen
2 years ago

I get that. It's a shame we can't move wp_mkdir_p() away. I guess we can't get away with it to only call it in the admin?

#33 @azaozz
2 years ago

Front-end uploads? Not common but there are few plugins that use them.

We can try using is_user_logged_in() and change the default $create_dir = false. That will speed it up a little, not sure how significant.

#34 follow-up: @azaozz
2 years ago

Thinking more about this: doing $create_dir = is_user_logged_in() is a nice optimization especially for sites without persistent cache. It's worth adding it.

Can't think of any cases where a not-logged-in user will need to check/create the uploads sub-dirs.

@azaozz
2 years ago

#35 @azaozz
2 years ago

34359.5.patch is the same as 34359.4.patch except it uses is_user_logged_in() to determine whether to run wp_mkdir_p() by default.

#36 @markoheijnen
2 years ago

is_user_logged_in() is a nice one. We could change the default value of $create_dir to null. (currently PHPDocs also needs a change). Also if I'm correct, refresh_cache is pure for unit testing right?

#37 in reply to: ↑ 34 ; follow-ups: @mikeschroder
2 years ago

Replying to azaozz:

Thinking more about this: doing $create_dir = is_user_logged_in() is a nice optimization especially for sites without persistent cache. It's worth adding it.

Can't think of any cases where a not-logged-in user will need to check/create the uploads sub-dirs.

So, I think this is a good idea. One thought, though: Would contact/support forms be affected by that, for things like attaching documents? (also perhaps anonymous image/video uploads, although probably less frequently)

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


2 years ago

#39 in reply to: ↑ 37 @markoheijnen
2 years ago

Replying to mikeschroder:

Replying to azaozz:

Thinking more about this: doing $create_dir = is_user_logged_in() is a nice optimization especially for sites without persistent cache. It's worth adding it.

Can't think of any cases where a not-logged-in user will need to check/create the uploads sub-dirs.

So, I think this is a good idea. One thought, though: Would contact/support forms be affected by that, for things like attaching documents? (also perhaps anonymous image/video uploads, although probably less frequently)

Plugins I have seen only using the base and upload it into their own directory.

#40 in reply to: ↑ 37 @azaozz
2 years ago

Replying to mikeschroder:

Would contact/support forms be affected by that, for things like attaching documents? (also perhaps anonymous image/video uploads, although probably less frequently)

Right, it is safer to always check/create the uploads dir, even when the user is not logged in.

In this case 34359.4.patch is ready to go. After that we can change all calls to wp_upload_dir() that are not in uploading context to use wp_get_upload_dir().

#41 @jeremyfelt
2 years ago

I've had 34359.4.patch in production (multisite, > 1000 sites) for a couple days now without any complaints or noticeable issues.

#42 follow-up: @mikeschroder
2 years ago

Ran tests on a box with Memcached with 34359.4.patch; didn't notice issues.

However, have this failing unit test:

Tests_Post_Attachments::test_wp_get_attachment_url_should_force_https_when_administering_over_https_and_siteurl_is_https
Failed asserting that two strings are identical.
--- Expected
+++ Actual
@@ @@
-https://example.org/wp-content/uploads/2016/02/test-image-1.jpg
+http://example.org/wp-content/uploads/2016/02/test-image-1.jpg

/vagrant/www/wordpress-develop/tests/phpunit/tests/post/attachments.php:467

#43 in reply to: ↑ 42 ; follow-ups: @joemcgill
2 years ago

Replying to mikeschroder:

However, have this failing unit test:

I believe this may a flaw in our unit tests for #15928 and #32112. Essentially, we're changing the scheme of the 'site_url' during the course of the tests in order to check the URL created by wp_get_attachment_image() in different contexts. However, caching the uploads directory is effectively negating those changes. We should probably rewrite those tests so we filter the output of wp_uploads() instead of changing the site_url() option during the same set of operations.

However, I wonder if anyone is filtering the site URL upstream from wp_uploads() based on some context (e.g. is_admin(), etc.), because that will no longer take affect if we're caching the output of the uploads directory. This may be another case for not using a persistent cache.

#44 in reply to: ↑ 43 @azaozz
2 years ago

Replying to joemcgill:

I wonder if anyone is filtering the site URL upstream from wp_uploads() based on some context (e.g. is_admin(), etc.)

It's not "impossible", but it is the wrong way to do it. That would mean either using the pre_option_* filter or updating the option (writing to the DB) every time. The "right way" is to use the upload_dir filter.

This may be another case for not using a persistent cache.

Right, this part is not in persistent cache.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


2 years ago

#46 in reply to: ↑ 43 @boonebgorges
2 years ago

Replying to joemcgill:

Replying to mikeschroder:

However, have this failing unit test:

I believe this may a flaw in our unit tests for #15928 and #32112. Essentially, we're changing the scheme of the 'site_url' during the course of the tests in order to check the URL created by wp_get_attachment_image() in different contexts. However, caching the uploads directory is effectively negating those changes. We should probably rewrite those tests so we filter the output of wp_uploads() instead of changing the site_url() option during the same set of operations.

Or we manually bust the cache in those tests.

#47 @joemcgill
2 years ago

I take it all back—sorta.

Running the failing test in isolation was also causing the above test to fail, so I spent some more time debugging and the issue is that the WP_UnitTestCase::setUp() method calls wp_upload_dir() in WP_UnitTestCase::scan_user_uploads(), which is what warms the static cache inside wp_upload_dir(). All of this happens before we have a chance to modify the scheme of the 'siteurl' option for the purposes of our tests.

So, in short, the WP_UnitTestCase::setUp() method is polluting the static cache inside wp_uploads_dir(). We may need a way to make sure a static cache value is _not_ set in certain instances, rather than only being able to force a refresh on a subsequent run.

As an aside, we should probably refresh the whole 15928 group of tests so they're not each uploading the same data.

This ticket was mentioned in Slack in #core-images by joemcgill. View the logs.


2 years ago

#49 @mikeschroder
2 years ago

  • Owner changed from mikeschroder to azaozz

#50 @mikeschroder
2 years ago

Chatted with @azaozz on this -- Fine with a commit with unit tests passing.

This ticket was mentioned in Slack in #core by joemcgill. View the logs.


2 years ago

#52 @azaozz
2 years ago

In 36565:

Improve the performance of wp_upload_dir():

  • Cache the output in non-persistent cache.
  • Cache the result from wp_mkdir_p() in persistent cache (when present).
  • Introduce wp_get_upload_dir() for use when not uploading files. It is equivalent to wp_upload_dir() but does not check for the existence or create the upload directory.
  • Change tests to use the non-cached _wp_upload_dir(). They change options on the fly (should never be used in production) to simulate different environments.
  • Introduce _upload_dir_no_subdir() and _upload_dir_https() to facilitate testing. These use the proper upload_dir filter to simulate different environments.

Props kovshenin, azaozz.
See #34359.

#53 @azaozz
2 years ago

Leaving open for now in case we want to change some function names or add more tests.

#54 @azaozz
2 years ago

In 36569:

Replace wp_upload_dir() with the new wp_get_upload_dir() in all cases where a file is not being uploaded. Deprecate _wp_upload_dir_baseurl(), and replace it with wp_get_upload_dir().

See #34359.

@azaozz
2 years ago

#55 @azaozz
2 years ago

Found another edge case: if wp-content is temporarily unavailable (perhaps on very slow NFS share, etc.) we cache the wp_mkdir_p() error in persistent cache. Then we return that error until the cache expires.

These errors are very rare. Thinking it is better if we don't cache wp_mkdir_p()/file system errors so if there is some sort of temporary outage we don't block uploads needlessly.

In 34359.6.patch: change the persistent caching of tested paths to only cache them when no FS errors.

#56 @azaozz
2 years ago

In 36628:

In wp_upload_dir() do not cache error from wp_mkdir_p() when a directory cannot be created. Keep trying to create the dirs. This happens mostly in file upload context.

See #34359.

#57 follow-up: @mikeschroder
2 years ago

Is there anything else left here?

#58 in reply to: ↑ 57 @azaozz
2 years ago

  • Resolution set to fixed
  • Status changed from assigned to closed

Replying to mikeschroder:

Seems to be working properly. Feel free to reopen if any bugs or regressions.

#59 @DrewAPicture
2 years ago

In 36925:

Docs: Improve DocBlock syntax for wp_get_upload_dir(), introduced in [36565].

See #34359. See #35986.

#60 follow-up: @barry
2 years ago

  • Resolution fixed deleted
  • Severity changed from normal to major
  • Status changed from closed to reopened

This broke things on WordPress.com and elsewhere. I don't think it's safe to cache the results of wp_mkdir_p() in a persistent cache, but caching it non-persistently seems fine. I think we need to go back to the approach in https://core.trac.wordpress.org/attachment/ticket/34359/34359.3.patch, storing the results in a non-persistent cache.

#61 @swissspidy
2 years ago

@barry Could you create a patch with the suggested rollback for trunk? Best in a new ticket that we can move to the 4.5.1 milestone for a better overview.

#62 in reply to: ↑ 60 @ocean90
2 years ago

  • Resolution set to fixed
  • Severity changed from major to normal
  • Status changed from reopened to closed

Replying to barry:

This broke things on WordPress.com and elsewhere. I don't think it's safe to cache the results of wp_mkdir_p() in a persistent cache, but caching it non-persistently seems fine. I think we need to go back to the approach in https://core.trac.wordpress.org/attachment/ticket/34359/34359.3.patch, storing the results in a non-persistent cache.

Moved into #36621.

#63 @ocean90
2 years ago

In 37285:

Media: Don't cache the results of wp_mkdir_p() in a persistent cache.

To improve the performance of wp_upload_dir() the result of wp_mkdir_p() was stored in a persistent cache, introduced in [36565]. But this becomes an issue when WordPress is scaled horizontally. You may end up caching a value for a server where the directory doesn't exist which will prevent further uploads on other servers because of the persistent cache.
The fix is to use a non-persistent cache.

Props azaozz, ocean90.
See #34359.
Fixes #36621.

#64 @ocean90
2 years ago

In 37286:

Media: Don't cache the results of wp_mkdir_p() in a persistent cache.

To improve the performance of wp_upload_dir() the result of wp_mkdir_p() was stored in a persistent cache, introduced in [36565]. But this becomes an issue when WordPress is scaled horizontally. You may end up caching a value for a server where the directory doesn't exist which will prevent further uploads on other servers because of the persistent cache.
The fix is to use a non-persistent cache.

Merge of [37285] to the 4.5 branch.

Props azaozz, ocean90.
See #34359.
Fixes #36621.

This ticket was mentioned in Slack in #core-multisite by jeremyfelt. View the logs.


2 years ago

Note: See TracTickets for help on using tickets.