Make WordPress Core

Opened 20 months ago

Last modified 5 months ago

#54221 new defect (bug)

_transient_dirsize_cache is set to autoload=yes and kills db performance if it grows (20MB in our case)

Reported by: archon810's profile archon810 Owned by:
Milestone: Future Release Priority: normal
Severity: major Version: 2.8
Component: Database Keywords: needs-patch
Focuses: administration, performance Cc:

Description (last modified by SergeyBiryukov)

Hi there, folks. After months of perf issues on a fairly large site (10s of millions of monthly PVs), I finally dug in properly and discovered an absolutely massive _transient_dirsize_cache option which was set to autoload=yes... currently sitting at 207221 entries worth 20MB of data. That's 20MB every

SELECT option_name, option_value FROM wp_options WHERE autoload = 'yes'

loads each time. The query takes 1s+ on 32GB RAM 8-CPU db servers.

From the looks of it, it was added in 5.6.0 https://core.trac.wordpress.org/ticket/19879#comment:52.

For now, I set the transient to autoload=no, which considering its potential size, it should be set by default to no.

If this functionality is only used in multisite and only in some specific quota codepaths, it should definitely be set to autoload=no.

I'm also wondering if shortcircuitting it with

add_filter( "transient_dirsize_cache", function() {return; } );

and deleting the transient to avoid such a huge 20MB db entry would have any side effects. Or if it'd be a better idea to disable the cache altogether?

See https://wordpress.slack.com/archives/C02RQBWTW/p1633393189369400.

Attachments (1)

54221.diff (1.7 KB) - added by wpgurudev 5 months ago.

Download all attachments as: .zip

Change History (33)

#1 @SergeyBiryukov
20 months ago

  • Description modified (diff)

#2 @iandunn
20 months ago

  • Focuses multisite performance added
  • Version changed from 5.8.1 to 2.8

Thanks for reporting this!

I'm assuming the following about your site:

  • it's not a Multisite instance
  • it doesn't use object caching
  • the site with the 20mb dirsize_cache has 207k folders in its upload directory
  • sites with fewer upload subfolders have smaller dirsize_cache values, which are proportional to the # of folders

Is all of that correct?

At first glance, it does look like dirsize_cache can grow indefinitely based on the # of folders. The transient isn't set with an expiration, which seems intentional/correct, given the purpose. It also seems like it shouldn't be autoloaded, since it's only used on the Dashboard, file upload/deletion, and Site Health.

It looks like set_transient() sets $autoload to yes for this situation, and it doesn't allow the caller to override that. This doesn't seem directly related to r49744; it looks like it goes all the way back to when the concept of expiration was added to transients in r10603.

I don't see a ticket for that commit, though, or any documentation in the code. It's not obvious to me why autoload would be tied to expiration; it seems like it should be tied whether the average performance is better when it's loaded w/ all the options, or only when needed. Maybe expiration was thought to be a good proxy for that?

dirsize_cache was added for Multisite instances when Multisite was merged to Core in r12603. It was repurposed for Site Health (both single- and Multisite instances) in r45104.

#3 @archon810
20 months ago

Hi Ian,

Correct, we're not using multisite, don't have object caching turned on via a plugin of some sort (though considering something like W3 Total Cache), and have probably over 2x that amount of folders by my calculation in the uploads dir.

It's now been a day since I've checked on the transient and set it to autload=no, and its size is still the same 207221 elements. I sort of expected it to move by now, given that there are hundreds such directories created daily, so I'm not sure what's up with that, or if it'll flip back to autoload=yes next time it decides to update. When does that happen, by the way? How frequently or in what scenarios?

Based on your other observations, what is the plan to fix this permanently? Should we add autoload=no support to set_transient?

#4 @archon810
20 months ago

I also looked at why our mysqldump backups broke a couple of days ago:

Error 2020: Got packet bigger than 'max_allowed_packet' when dumping table `wp_options` at row: 459

Now I understand why... the option is probably so big that it actually breaks mysqldump or mysql.

This got me thinking, and not that we're in danger here yet, but if this field reaches 4GB, it'll actually hit the longtext max. WP will probably explode long before then though.

#5 @iandunn
20 months ago

if it'll flip back to autoload=yes next time it decides to update. When does that happen, by the way? How frequently or in what scenarios?

I think it will if the transient is deleted and recreated, but it's possible that'll never happen in practice.

Since it doesn't have an expiration, it shouldn't get cleared when upgrading WP. Since there's no object cache, it won't get cleared when rebooting the server. Since it's not Multisite, clean_dirsize_cache() won't be called.

To be safe, you could do something like this until a fix is available:

  1. Add a pre_set_transient_dirsize_cache callback that sets the value to an empty array
  2. Delete the transient

That would probably result Site Health, the Dashboard, and file upload pages showing wrong data, but IIRC some of that is only for Multisite.


what is the plan to fix this permanently? Should we add autoload=no support to set_transient?

I'm guessing this only affects sites w/ a very large number of subfolders, so it may be a bit of an edge case, and therefore a lower priority. I'm curious to hear what others think, though.

I think there are two problems:

  1. set_transient() makes sketchy assumptions about whether or not a transient should be autoloaded. Adding a new $autoload param to the function could allow us to specify one in cases like this. I'm guessing there are other transients that are only used on a few pages despite being autoloaded. More research into the original intent, benchmarks to verify assumptions, etc might be a good next step.
  2. dirsize_cache can become unwieldy on sites with tons of folders. Given its limited usage, one option would be to just disable it on those sites (ala wp_is_large_network). I'm assuming that the admins on those types of sites will usually capable of doing something custom if they actually need the functionality. Alternatively, the value could be sharded across multiple options, moved onto disk, etc. There could be better ideas too.

have probably over 2x that amount of folders ... its size is still the same 207221 elements [even though] there are hundreds such directories created daily
the option is probably so big that it actually breaks mysqldump

Huh, maybe those are related? Maybe there's an error when trying to add more entries to the value too? I'd assume MySQL could handle that just fine, but maybe there's a config option that limits it?

#6 follow-up: @iandunn
20 months ago

Another idea that might be a better solution for the dirsize_cache expansion: instead of storing the raw size of every folder, we could just store the total size (for each site in Multisite). AFAIK that's the only thing we need, and storing the individual sizes is for performance.

Instead of calculating the size synchronously (but with the data partially cached), we could do a full calculation asynchronously, and cache the result. That way the user always sees cached data, which is faster and avoids storing megabytes of data in an option.

It would get stale when files are uploaded/removed, but we could hook into those processes and manually add/subtract the file size from the cached total size. That should keep the size accurate enough until the cron job runs again.

That would need some more research to make sure it'd meet all the current requirements that dirsize_cache serves, including #19879 and #46645.

#7 in reply to: ↑ 6 @SergeyBiryukov
20 months ago

Replying to iandunn:

Instead of calculating the size synchronously (but with the data partially cached), we could do a full calculation asynchronously, and cache the result. That way the user always sees cached data, which is faster and avoids storing megabytes of data in an option.

That sounds like a good approach.

This ticket was mentioned in Slack in #core-test by boniu91. View the logs.


20 months ago

#9 @archon810
19 months ago

Hi, any updates on the proposed solution? Thanks.

#10 @vsamoletov
19 months ago

Hi all!

Just faced the same issue!

Mine is smaller, though:

mysql> SELECT option_name, length(option_value) AS option_value_length FROM wp_options WHERE autoload='yes' ORDER BY option_value_length DESC LIMIT 200;

+---------------------------------------------------------------+---------------------+
| option_name                                                   | option_value_length |
+---------------------------------------------------------------+---------------------+
| _transient_dirsize_cache                                      |              515393 |

However, this is the biggest wp_option!

Last edited 19 months ago by vsamoletov (previous) (diff)

#11 @seanconklin
18 months ago

I have a client site with _transient_dirsize_cache size 1,222,157 -- by far the largest row in the table.

#12 @SergeyBiryukov
18 months ago

  • Keywords needs-patch added
  • Milestone changed from Awaiting Review to Future Release

#13 @jk_gabba
16 months ago

  • Focuses administration added; multisite removed

I'm having a similar issue, got a huge uploads directory, so the _transient_dirsize_cache stands at 1379775 which is obviously slowing down the DB a load, is it safe to clear the record, or change the autoload, or should we wait for the patch?

#14 @archon810
16 months ago

So far changing it to autoload=no has worked, and the option hasn't reverted to yes.

Ours is now up to 1,019,740 entries, 5-fold from when I originally reported the issue. It's a total waste of resources.

There should be a way to turn this off and avoid the strain on resources entirely, especially since our uploads dir sits on block storage with gluster on top, which is much slower than local disk access.

Last edited 16 months ago by archon810 (previous) (diff)

#15 @seedsca
16 months ago

Interesting read, I have a client with:
_transient_dirsize_cache 1344451

However only 551 directories in uploads:
[~/public_html/wp-content/uploads]# ls -lR | grep "^d" | wc -l
551

Strangely, I see all directories inside public_html listed. And when I pipe to wc this number is higher yet not the 1344451...
wp option get _transient_dirsize_cache | wc -l
11367

Will set to autoload=no since it's not a multisite...

Last edited 16 months ago by seedsca (previous) (diff)

#16 @rachelposey
16 months ago

I also have the same issue on a brand new site so there can not be that much on it!

It is the biggest option! (1161500)

Following this thread for support.

#17 @rachelposey
16 months ago

I also have the same issue on a brand new site so there can not be that much on it!

It is the biggest option! (1161500)

Following this thread for support.

#18 @archon810
11 months ago

Any updates? Oddly, I'm still seeing the same 1,019,740 entries in the _transient_dirsize_cache option as 5 months ago. It stopped growing, it seems, but still occupies a massive amount of space.

If I empty it, will it not grow anymore? Did any recent versions of WP change some logic, or is it not growing for some other reason?

#19 follow-up: @seedsca
11 months ago

@iandunn should the version be 2.8? Seems like the previous 5.8.2 or newer is affected... At least that is what I was running when I noticed the problem.

#20 @seanconklin
11 months ago

I deleted this wp_option row weeks ago and it hasn't returned with the latest version of WordPress. In scanning the wp-includes folder I see dirsize being called for Multisite as well as WP REST API upload functions. It does not appear to be triggering from the Media Library nor the Site Health tool.

Here's a clue I ran into with the client who had this issue. I could tell they used to have a Subdirectory installation (where WordPress is served from a folder within the site root), because there was an entire copy of WordPress plugins and all in a sub-folder that was no longer bring used. So the dirsize cache grew large based on all of those folders being left behind.

Object caching is enabled and working fine now. WP Engine recommends an 800K limit on autoload size and this site used to be several times that! They cap the buffer size at 1MB. No wonder object caching was turned off until we fixed this. Working great now.

#21 in reply to: ↑ 19 @iandunn
11 months ago

should the version be 2.8?

I think it's supposed to represent the version where the bug was introduced.

This doesn't seem directly related to r49744; it looks like it goes all the way back to when the concept of expiration was added to transients in r10603. ... dirsize_cache was added for Multisite instances when Multisite was merged to Core in r12603.

(comment:2)

#22 @emilymoran84
11 months ago

Hello, I hope it's okay for me to "chime in". I have also found this in my database.

I am not running multisite and do not have Object Cache happening.

Mine is at a bit over 0.8MB, which is still a lot, pushing my overall wp_options "autoload" well over the recommended limit.

From my table:

_transient_dirsize_cache	823714

I'm afraid to delete it because I'm unsure what will happen, but I'm really needing to reduce my autoload and this is by far the largest entry.

Is is safe to delete? I read through this thread but am still unclear what to do. Thank you for your time.

Last edited 11 months ago by emilymoran84 (previous) (diff)

#23 @pixluser
11 months ago

Same bug for me too… +1MB…

Some news about that?

#24 @emilymoran84
11 months ago

Thought I'd update. I deleted this transient on 2022-07-01. It was at the size of 823714 bytes

Today (2022-07-13) I noticed it was back, though I'm unsure when it came back. It was 588465 bytes this time.

I've deleted it again, but would love for this to stop being created. Again, I am not running multisite and do not have Object Cache happening.

Thank you!

#25 @iandunn
8 months ago

Related: #55969 would add an $autoload param to set_transient().

#26 @separatereality
8 months ago

Same problem here. Is there a plan to fix it?
There simply should not be an entry that grows potentially indefinitely.

#27 @flixos90
8 months ago

@iandunn @SergeyBiryukov Acknowledging that the size of this transient is a problem, my take here would be that the even larger issue is that it is being autoloaded, thus affecting every basically every request.

How about we break the work into two parts?

  1. Change the autoload behavior so that this value is not autoloaded. That could happen as part of this ticket, since that is also the primary point of the ticket based on the title.
  2. Open a separate ticket to further explore how the calculation can be optimized.

For 1.: How about we change set_transient() to determine the autoload value based on a certain threshold of the value? I'm not exactly sure which value that should be, but I think it's fair to say that once a value goes beyond a certain threshold in size, it should never be autoloaded as it would notably slow down that entire query.

Related: #55969 would add an $autoload param to set_transient().

I would advise against this for now. While today this could be useful, options and transients misusing (or probably, for the most part, developers not thinking about) autoloading and what that means is a large problem in general. We are currently exploring an overhaul of that entire approach via https://github.com/WordPress/performance/issues/526, and part of that would potentially be to deprecate that parameter. Probably the biggest mistake with it was that it's optional - resulting in many people to ignore it and maybe not even knowing it exists. Anyway, long story short, I'm not convinced adding that parameter to set_transient() is the right answer.

#28 @annemaciver
7 months ago

Question:

(NOT using Multisite; YES using object cache)

Is there a downside to disabling the _transient_dirsize_cache autoload?

I'm getting a caution in Site Health that I have 1293 autoloaded options (size: 1 MB) in the options table, which could slow my site. The largest (though under 800KB) is _transient_dirsize_cache. I do have Redis/ object cache.

What is the potential consequence of disabling that autoload? Any reason not to?

Thanks!

#29 @giuse
7 months ago

Hello, I wonder why we need to store all the sizes of all directories. Why don't we store just the total size? It would be a number, not a so huge array.
I don't see anywhere in the core the need to know all the sizes, but only the global size.
When you upload or delete a file you can update that single number, and that's it.
Maybe you can calculate again the total size every time you upload or delete a file, but I would never save all the details in a transient. For what reason?
Probably, I missing something. Where do you need so much detailed information?

As I see in the core files the function get_dirsize() is used only wp-includes/ms-functions.php.

Here are the lines of code where it is used:

<?php
$upload_dir = wp_upload_dir();
$space_used = get_dirsize( $upload_dir['basedir'] ) / MB_IN_BYTES;

Then the function get_dirsizes() calls the function recurse_dirsize() which is defined in wp-includes/functions.php. Inside this last function, we have get_transient( 'dirsize_cache' ).
In no other place in the core, do I see anything that needs to get the transient 'dirsize_cache'.

So we have this huge option only to easily get the size of the upload directory? One for one single value?

Are we boating the database and slowing down every page because of information that nobody needs? And this option is even autoloaded?

And this is still not fixed?

If I'm not missing anything I would only store the total size of the upload directory, and nothing else.

Or if someone knows that in some situations we need all the details, please let us know.

For me it's not a matter of autoloading or not autoloading, it's a matter of whether we need or we don't need all the details. And if we need them, they must not be autoloaded, no doubt about that. This is a serious issue for performance and should be very urgent.

#30 @stinkyweezle
6 months ago

Still an issue on 6.1.1 Just found this field in one of our sites has grown to nearly 1MB.

#31 @wpgurudev
5 months ago

I think we have two options here to tackle the first issue according to @flixos90 's comment:27

  1. Change the transient _transient_dirsize_cache autoload option to no in WordPress update.
  1. Add a temporray fix in WP using query filter so that whenever wp_load_alloptions function's query run to load options with autoload set to yes, we can exclude certain options from loading.

@wpgurudev
5 months ago

#32 @wpgurudev
5 months ago

I think the second approach would be useful because of following reasons.

  1. There can be some other options like _transient_dirsize_cache which can have performance impacts.
  1. There need to be a general provision to filter out such options from alloptions query which can hamper the performance.
  1. Changing the transient autoload value to no can have some unknown side-effects.

Adding the patch here which adds a function wp_filter_autoloaded_options and a filter exlude_autoloaded_options to filter out the undesirable options from alloptions query.

Note: See TracTickets for help on using tickets.