Make WordPress Core

Opened 9 years ago

Closed 6 years ago

Last modified 6 years ago

#23256 closed defect (bug) (maybelater)

Mitigate Plugins SVN Exponential Storage Growth

Reported by: bpetty Owned by:
Milestone: Priority: normal
Severity: major Version:
Component: Site Keywords:
Focuses: Cc:


We all know Subversion is terribly inefficient with storage compared to most modern version control systems, but I've noticed a pretty serious problem in the plugins SVN repo that goes well beyond this. Please forgive me if some details are slightly off; the infrastructure is mostly a black box that no one outside of Automattic or Audrey has any insight into (and I'm lucky to have found this problem in the first place).

If you take a look at the attached graph, you can see that the plugins repository's disk usage is growing exponentially regardless of the rate at which commits come in (and that rate has always been growing too, which makes this worse). I am assuming the repository is using FSFS, though this would still be a problem if it were using BDB. All SVN repositories suffer from this weakness if used the same way the plugins SVN repo is being used.

The problem is that every SVN commit stores off node IDs of every sibling node of every parent node of all nodes that have changed in that revision. This means that a single commit to /myplugin/trunk/readme.txt contains references to all files and directories (and their related revision) in the /myplugin/trunk directory, the references to the branches, tags, and trunk nodes in /myplugin, and finally references to every directory in the root node (/) which means every single plugin in the repository.
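To make the description above concrete, here is a minimal sketch in Python of a toy model, not of Subversion's actual storage code. It assumes that each ancestor directory of a changed node gets a full new listing naming all of its children, as the paragraph describes. The names `make_repo` and `entries_rewritten`, and the two-file `trunk` layout, are hypothetical and exist only for illustration.

```python
def make_repo(plugin_count):
    """Build a toy repository tree: nested dicts are directories, None is a file."""
    return {
        f"plugin{i}": {
            "branches": {},
            "tags": {},
            "trunk": {"readme.txt": None, "plugin.php": None},
        }
        for i in range(plugin_count)
    }


def entries_rewritten(tree, changed_path):
    """Count directory entries re-recorded when the node at changed_path changes.

    Assumption (from the ticket text, not from Subversion's source): every
    ancestor directory of the changed node is written out again, and each
    listing references all of that directory's children.
    """
    parts = [p for p in changed_path.split("/") if p]
    node = tree
    total = 0
    for name in parts:
        total += len(node)  # the whole listing of this directory is re-recorded
        node = node[name]
        if not isinstance(node, dict):
            break  # reached the changed file itself
    return total


if __name__ == "__main__":
    for count in (1_000, 10_000):
        repo = make_repo(count)
        print(count, entries_rewritten(repo, "plugin0/trunk/readme.txt"))
```

In this model the cost of a one-line change to a readme is dominated by the root listing, so it scales with the number of plugins in the repository rather than with the size of the change.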

Since the root node is an ancestor of every changed node in every single commit, and the list of plugins is constantly growing, the overhead keeps growing too: even though the repository is somewhere around 450GB right now, the actual data in the repo, including the full history, is only about 30GB. You can confirm this with a simple dump of the repository. The other 420GB or so is space wasted entirely on SVN overhead.

If nothing is done in the next two years, the SVN repository is expected to double in size to about 900GB, and its performance will quickly degrade as the server takes longer to read revisions and the filesystem cache can no longer be used (which I suspect is already the case now). Another four years, and we could be looking at a 2TB Subversion repository with every single commit being required to write about 8MB to disk even if it's a one-line change.
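A rough way to see why the overhead outpaces the commit rate is to add up the root listing written by each commit as the plugin count grows. The sketch below is a simplified model only: `total_root_listing_bytes`, the constant number of commits per newly added plugin, and the per-entry byte size are all hypothetical placeholders, not measurements of the real repository.

```python
def total_root_listing_bytes(final_plugin_count, commits_per_new_plugin, bytes_per_entry):
    """Sum the root-listing bytes written while the repo grows to final_plugin_count.

    Assumptions (illustrative only): plugins are added one at a time, a fixed
    number of commits lands between additions, and every commit writes one
    root listing entry per plugin that exists at that moment.
    """
    total = 0
    for plugins_so_far in range(1, final_plugin_count + 1):
        per_commit = plugins_so_far * bytes_per_entry
        total += per_commit * commits_per_new_plugin
    return total


if __name__ == "__main__":
    small = total_root_listing_bytes(1_000, 10, 50)
    large = total_root_listing_bytes(2_000, 10, 50)
    print(small, large, large / small)
```

Even with a constant commit rate, doubling the number of plugins roughly quadruples the accumulated overhead in this model; a commit rate that is itself growing makes the curve steeper still.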

I know that any solution to this is going to take years to fully implement, mostly because I believe it will most likely require plugin SVN URLs to change during a migration at some point. However, at the least, we should be heading off this problem by starting new plugin submissions in their own repositories rather than creating new directories for them in the current plugins SVN repo. This would at least stop the exponential growth of the plugins repository, extending its lifespan significantly.

Attachments (4)

plugins-svn-repo-growth.png (21.2 KB) - added by bpetty 9 years ago.
svn-1.6-pack-sizes.png (22.2 KB) - added by bpetty 9 years ago.
svn-1.8-pack-sizes.png (22.6 KB) - added by bpetty 9 years ago.
total-repo-size-by-date.png (21.5 KB) - added by bpetty 9 years ago.


Change History (18)

#1 @SergeyBiryukov
9 years ago

  • Milestone changed from Awaiting Review to

#2 @johnbillion
9 years ago

  • Cc johnbillion added

#3 @DrewAPicture
9 years ago

  • Cc DrewAPicture added

#4 @kovshenin
9 years ago

  • Cc kovshenin added

#5 @nacin
9 years ago

I've confirmed that the systems team has been cognizant of this for some time. (Years, even.) The short-term goal is to keep throwing hardware at the problem and to see what can be done to squeeze out performance.

The longer-term goal (maybe 2013, maybe longer) is to start splitting up this behemoth repository in some way. (Git has been mentioned, but is by no means a magic bullet here. The upgrade and mirroring process would be a lot of work and fairly painful.) There are no concrete plans right now, and few have the bandwidth at the moment (babies, health, weddings, 3.6, other projects, and such), but it is something that we can/should at least draw up plans for later this year.

#6 @bpetty
9 years ago

For what it's worth, within the next month or two I should have automatically updated git-svn mirrors up for any plugin in the repo, for anyone who wants one (it's not quite as difficult as you may be thinking), but you're right, that's not really the solution.

As mentioned before though, I think the first action item for this is to at least build out the changes needed to push new plugins into their own SVN repos. It's a very clear step, and one that would still make a big difference if done as early as possible. I'm surprised this hasn't been in progress already if this has been known about for years. Regardless of what happens with the plugin repo in the next 5 years, it's obvious that it needs to continue supporting Subversion (even if git support is added in at some point), and that this is how it should have been implemented originally.

If everyone else doesn't have the time to do this, I'm available to work on this myself if necessary.

#7 @bpetty
9 years ago

@nacin and I chatted a bit about this today.

Notable things to bring back:

  • The new directory deltas feature in 1.8 appears to resolve this, and will likely be the solution here.
  • When the plugins SVN repo is upgraded to use the directory deltas, it's important to remember to inspect the SVNInMemoryCacheSize setting as it will be used more heavily.

#8 @bpetty
9 years ago

I've confirmed that Subversion 1.8 with directory deltas enabled resolves this issue.

Total size of the plugins SVN repo in 1.8:

... drumroll ...


Not bad for a 769GB repository in 1.6 or 1.7, and surprisingly good compared to the 67GB dump.

Curious how this affects the pack sizes? Read on...

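Here's pack sizes with 1.6 (see attachment svn-1.6-pack-sizes.png):

Here's pack sizes with 1.8 and directory deltas enabled (see attachment svn-1.8-pack-sizes.png):

And finally, here's how the repository growth looks by date (see attachment total-repo-size-by-date.png):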

#9 @lkraav
9 years ago

  • Cc leho@… added

#10 @cfoellmann
7 years ago

  • Keywords close added

#11 @wonderboymusic
6 years ago

  • Milestone deleted
  • Resolution set to maybelater
  • Status changed from new to closed

No activity here.

#12 follow-up: @DrewAPicture
6 years ago

  • Keywords close removed

@samuelsidler: Do you know if there is a corresponding meta ticket for this? Seems like it should be handled there anyway.

#13 in reply to: ↑ 12 @samuelsidler
6 years ago

Replying to DrewAPicture:

@samuelsidler: Do you know if there is a corresponding meta ticket for this? Seems like it should be handled there anyway.

There is currently no meta ticket on file for this.

#14 @dd32
6 years ago

FWIW our systems team will update it when the time comes, with or without a ticket. Resource usage isn't a significant concern at present.
