Mitigate Plugins SVN Exponential Storage Growth
|Reported by:||bpetty||Owned by:|
We all know Subversion is terribly inefficient with storage compared to most modern version control systems, however I've noticed a pretty serious problem that goes way beyond this in the plugins SVN repo. Please forgive me if some details are slightly off as obviously anything running WP.org infrastructure is mostly a black box that no-one outside of Automattic or Audrey has any insight into (and I'm lucky to have found this problem in the first place).
If you take a look at the attached graph, you can see that the plugins repository is growing exponentially by disk usage regardless of what rate commits are coming in (but that has always been growing too, making this worse). I am assuming the repository is using FSFS, though this is actually still a problem if it were using BDB. All SVN repositories suffer from this weakness if used the same way the plugins SVN repo is being used.
The problem is that every SVN commit stores off node IDs of every sibling node of every parent node of all nodes that have changed in that revision. This means that a single commit to /myplugin/trunk/readme.txt contains references to all files and directories (and their related revision) in the /myplugin/trunk directory, the references to the branches, tags, and trunk nodes in /myplugin, and finally references to every directory in the root node (/) which means every single plugin in the repository.
Since the root node is related to any changed node in every single commit, and the list of plugins is constantly growing, this means that even though the repository is somewhere around 450GB right now, the actual data in the repo, including the full history, is only about 30GB. You can confirm with a simple dump of the repository. The other 420GB or so is entirely wasted space by SVN overhead.
If nothing is done in the next two years, the SVN repository is expected to double in size to about 900GB, and it’s performance will quickly degrade as the server takes longer to read revisions and the filesystem cache can no longer be used (which I suspect is already the case now). Another four years, and we could be looking at a 2TB Subversion repository with every single commit being required to write about 8MB to disk even if it's a one line change.
I know that any solution to this is going to take years to fully implement mostly because I believe this is going to require plugin SVN URLs to change during a migration at some point most likely. However, at the least, we should be heading up this problem by getting new plugin submissions started in their own repository rather than creating new directories for them in the current plugins SVN repo. This would at least stop the exponential growth of the plugins repository, extending it's lifespan significantly.