Opened 15 months ago

Last modified 9 months ago

#23256 new defect (bug)

Mitigate Plugins SVN Exponential Storage Growth

Reported by: bpetty
Owned by:
Milestone: WordPress.org
Priority: normal
Severity: major
Version:
Component: WordPress.org site
Keywords:
Focuses:
Cc:

Description

We all know Subversion is terribly inefficient with storage compared to most modern version control systems. However, I've noticed a pretty serious problem in the plugins SVN repo that goes well beyond this. Please forgive me if some details are slightly off: anything running WP.org infrastructure is mostly a black box that no one outside of Automattic or Audrey has any insight into (and I'm lucky to have found this problem in the first place).

If you take a look at the attached graph (plugins-svn-repo-growth.png), you can see that the plugins repository's disk usage is growing exponentially regardless of the rate at which commits come in (and that rate has always been growing too, which makes this worse). I am assuming the repository uses FSFS, though this would still be a problem if it were using BDB. All SVN repositories suffer from this weakness if used the same way the plugins SVN repo is being used.

The problem is that every SVN commit stores the node IDs of every sibling node of every parent node of all nodes that have changed in that revision. This means that a single commit to /myplugin/trunk/readme.txt contains references to all files and directories (and their related revisions) in the /myplugin/trunk directory, references to the branches, tags, and trunk nodes in /myplugin, and finally references to every directory in the root node (/), which means every single plugin in the repository.
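To make the effect concrete, here is a minimal sketch of that behaviour as a toy model. It is not SVN's actual on-disk format: the fixed 64 bytes per directory entry, the function names, and the growth rate of one new plugin per 20 commits are all assumptions chosen only for illustration.

```python
# Toy model, NOT SVN's real storage format: each commit re-stores the full
# entry list of every directory on the path from the root to the changed file.

BYTES_PER_ENTRY = 64  # hypothetical average size of one directory entry


def commit_overhead(dir_sizes, bytes_per_entry=BYTES_PER_ENTRY):
    """Bytes of directory listings written by a single commit.

    dir_sizes holds the number of entries in each directory on the path,
    e.g. [plugins_in_root, entries_in_plugin_dir, entries_in_trunk].
    """
    return sum(dir_sizes) * bytes_per_entry


def total_overhead(n_commits, plugins_at):
    """Total listing bytes over n_commits while the root directory grows.

    plugins_at maps a commit index to the number of plugins in the root
    at that point (a hypothetical growth function).
    """
    return sum(commit_overhead([plugins_at(i), 3, 10]) for i in range(n_commits))


# A one-line readme.txt change in a root that holds 50,000 plugins:
print(commit_overhead([50_000, 3, 10]))  # 50,013 entries * 64 bytes = 3200832

# Assume the root gains one plugin every 20 commits, then double the history.
small = total_overhead(100_000, lambda i: i // 20)
large = total_overhead(200_000, lambda i: i // 20)
print(large / small)  # close to 4 in this toy model
```

In this model the cost of a commit depends on how many plugins exist rather than on how much was changed, so total overhead grows much faster than the number of commits.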

Since the root node is related to any changed node in every single commit, and the list of plugins is constantly growing, most of the repository is overhead: the repository is somewhere around 450GB right now, but the actual data in the repo, including the full history, is only about 30GB. You can confirm this with a simple dump of the repository. The other 420GB or so is space wasted entirely on SVN overhead.

If nothing is done in the next two years, the SVN repository is expected to double in size to about 900GB, and its performance will quickly degrade as the server takes longer to read revisions and the filesystem cache can no longer be used effectively (which I suspect is already the case now). In another four years, we could be looking at a 2TB Subversion repository in which every single commit has to write about 8MB to disk, even if it's a one-line change.

I know that any solution to this is going to take years to fully implement, mostly because I believe it will require plugin SVN URLs to change during a migration at some point. However, at the least, we should be heading off this problem by creating new plugin submissions in their own repositories rather than as new directories in the current plugins SVN repo. This would at least stop the exponential growth of the plugins repository, extending its lifespan significantly.

Attachments (4)

plugins-svn-repo-growth.png (21.2 KB) - added by bpetty 15 months ago.
svn-1.6-pack-sizes.png (22.2 KB) - added by bpetty 10 months ago.
svn-1.8-pack-sizes.png (22.6 KB) - added by bpetty 10 months ago.
total-repo-size-by-date.png (21.5 KB) - added by bpetty 10 months ago.

Change History (13)

comment:1 SergeyBiryukov 15 months ago

  • Milestone changed from Awaiting Review to WordPress.org

comment:2 johnbillion 15 months ago

  • Cc johnbillion added

comment:3 DrewAPicture 15 months ago

  • Cc DrewAPicture added

comment:4 kovshenin 15 months ago

  • Cc kovshenin added

comment:5 nacin 15 months ago

I've confirmed that the systems team has been cognizant of this for some time. (Years, even.) The short-term goal is to keep throwing hardware at the problem and to see what can be done to squeeze out performance.

The longer-term goal (maybe 2013, maybe longer) is to start splitting up this behemoth repository in some way. (Git has been mentioned, but is by no means a magic bullet here. The upgrade and mirroring process would be a lot of work and fairly painful.) There are no concrete plans right now, and few have the bandwidth at the moment (babies, health, weddings, 3.6, other projects, and such), but it is something that we can/should at least draw up plans for later this year.

comment:6 bpetty 15 months ago

For what it's worth, within the next month or two I should have automatically updated git-svn mirrors available for any plugin in the repo, for anyone who wants one (it's not quite as difficult as you may be thinking). But you're right, that's not really the solution.

As mentioned before though, I think the first action item for this is to at least build out the changes needed to push new plugins into their own SVN repos. It's a very clear step, and one that would still make a big difference if done as early as possible. I'm surprised this hasn't been in progress already if this has been known about for years. Regardless of what happens with the plugin repo in the next 5 years, it's obvious that it needs to continue supporting Subversion (even if git support is added in at some point), and that this is how it should have been implemented originally.

If no one else has the time to do this, I'm available to work on it myself if necessary.

comment:7 bpetty 10 months ago

@nacin and I chatted a bit about this today.

Notable things to bring back:

  • The new directory deltas feature in Subversion 1.8 appears to resolve this, and will likely be the solution here.
  • When the plugins SVN repo is upgraded to use the directory deltas, it's important to remember to inspect the SVNInMemoryCacheSize setting as it will be used more heavily.

comment:8 bpetty 10 months ago

I've confirmed that Subversion 1.8 with directory deltas enabled resolves this issue.

Total size of the plugins SVN repo in 1.8:

... drumroll ...

16GB

Not bad for a repository that takes up 769GB in 1.6 or 1.7, and surprisingly good compared to the 67GB dump.
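For a sense of scale, the ratios between those three figures can be worked out as follows. This is a small sketch that uses only the sizes quoted above; the dictionary keys and the helper name are made-up labels, not anything SVN reports.

```python
# Sizes quoted in this comment, in GB. The keys are illustrative labels only.
sizes_gb = {
    "fsfs_1_6_or_1_7": 769,
    "dump": 67,
    "fsfs_1_8_dir_deltas": 16,
}


def shrink_factor(before, after):
    """How many times smaller `after` is than `before`."""
    return sizes_gb[before] / sizes_gb[after]


print(round(shrink_factor("fsfs_1_6_or_1_7", "fsfs_1_8_dir_deltas"), 1))  # 48.1
print(round(shrink_factor("dump", "fsfs_1_8_dir_deltas"), 1))  # 4.2
```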

Curious how this affects the pack sizes? Read on...

Here's pack sizes with 1.6:

http://core.trac.wordpress.org/raw-attachment/ticket/23256/svn-1.6-pack-sizes.png

Here's pack sizes with 1.8 and directory deltas enabled:

http://core.trac.wordpress.org/raw-attachment/ticket/23256/svn-1.8-pack-sizes.png

And finally, here's how the repository growth looks by date:

http://core.trac.wordpress.org/raw-attachment/ticket/23256/total-repo-size-by-date.png

comment:9 lkraav 9 months ago

  • Cc leho@… added