When writing a large code base for a project, you need a way to manage the changes that occur to your source files. Tools for doing this are called ‘source revision control systems’, and common examples are git, svn, bzr, and cvs, among many others. Any revision control system worth its salt will help manage what is written, allow many users to collaboratively upload code, keep incremental versions of the code base to look back upon, and do all of these tasks in a reasonable amount of computational time. However, there is a fundamental divide in source control systems, namely those that are “distributed” versus those that are “centralized”.
Centralized systems have a master code base, kept on a server. The user can download the source code, make changes, and when the changes are ready to be published, ‘commit’ them, which means they are uploaded back to the central server and recorded. All developers work from one master, centralized repository, and any time you want to ‘save’ what you’ve done by committing it, the master code base changes on the main server. Common centralized systems are svn and cvs.
Distributed systems still have a working code base on a server, and the user downloads the source code, just like in centralized systems. However, instead of handing you only a working copy that is tied to the server, a distributed system makes a distinct, complete copy of the code base, history included, and puts it on your machine. This allows you to commit your work to your local computer; you don’t have to save it to a central server for everyone to see. When you want to publish your work, you can ‘push’ your changes back to the development server. The most popular distributed system is git, and there are some up-and-coming ones out there like bazaar (bzr).
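To make the commit-versus-push distinction concrete, here is a minimal toy sketch in Python. The class names (`CentralServer`, `CentralizedCheckout`, `DistributedClone`) are made up for illustration and do not mirror how svn or git work internally; the sketch also assumes nobody else changes the server between cloning and pushing.

```python
# Toy model only: hypothetical classes, not the real internals of svn or git.

class CentralServer:
    """The shared history that every developer can see."""
    def __init__(self):
        self.history = []


class CentralizedCheckout:
    """Centralized style: every commit lands on the server immediately."""
    def __init__(self, server):
        self.server = server

    def commit(self, message):
        self.server.history.append(message)


class DistributedClone:
    """Distributed style: commits stay local until they are pushed."""
    def __init__(self, server):
        self.server = server
        self.history = list(server.history)  # private, complete copy

    def commit(self, message):
        self.history.append(message)  # recorded locally only

    def push(self):
        # Publish the commits the server does not have yet.
        # Assumes the server has not moved on since the clone was made.
        new_commits = self.history[len(self.server.history):]
        self.server.history.extend(new_commits)


# Centralized: the tiny commit is public right away.
svn_like = CentralServer()
CentralizedCheckout(svn_like).commit("added a one line comment")

# Distributed: two local commits, invisible to others until the push.
git_like = CentralServer()
clone = DistributedClone(git_like)
clone.commit("helper function works")
clone.commit("second helper works")
seen_before_push = list(git_like.history)  # what other developers see so far
clone.push()
```

In this sketch `seen_before_push` is still empty, while `svn_like.history` already contains the one-line commit; only after `clone.push()` does the shared history pick up the local work.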
The differences between the two are subtle, and even negligible if the project has only one or two developers. Once you get many developers, though, the differences become profound. As you might have guessed from the title, I like the ‘distributed’ source code systems much better. Why?
- I like to “save” my work as often as possible – Maybe it goes back to my middle school days of whacking Ctrl-S at the end of every sentence when using MS Word. Furthermore, as all good developers should be, I’m self-conscious about what others think of the code I write. At any rate, any time I do anything, no matter how small, I like to save my work. With a centralized system, I don’t commit as often because I don’t want the other developers to see tiny little commits like “added a one line comment”, or things like that. Furthermore, when building an algorithm, a lot of little parts have to come together (e.g. writing helper functions and things like that). I like to save every time I get ‘a little part’ working, and oftentimes I don’t have enough “little parts” working just yet to have the full algorithm, and I’m not ready to send my “little parts” out into the world by themselves. With distributed systems, I can save as often as I like, and only publish the important things.
- The crazy ideas – Sometimes, they happen. The idea pops into your head, “Oh, let’s tear down this central part of the program to rebuild it better!” Personally, I love getting these ideas; it’s where the most valuable work gets done. This process always takes a while, and many changes will be made before the teardown/rebuild has a chance at working. With a centralized system, you can’t easily save any intermediate progress between having the crazy idea and getting the crazy idea working. If you commit a bunch of half-formed revisions to the central server, other developers will get angry, because you just broke code they needed for their work. With a distributed system, however, your changes can be committed just to your local tree. The other developers can continue their work, and you can save as much as you want. When your bigger, better algorithm is complete (and working 😀 ) you can just push the major milestone changes out, and no one’s work gets interrupted, so no one gets angry.
- Branching – Both centralized and distributed systems have methods of ‘branching’ the code. This is a way of splitting off from the main code base, usually to implement something experimental. I’m not gonna lie, the differences here aren’t earth-shattering. Anecdotally, it’s always made more sense to me to be able to branch a code base that exists on your own machine than one that is shared with everyone. Also anecdotally, branching code from centralized systems has always given me a headache, but branching code from distributed systems has always been quick and easy.
- Helping out the little guys – When you are just starting out doing serious development work for OSS projects, you don’t start off with the ability to commit anything. You have to prove your mettle before the developers will give you the ability to go about making changes to the project. With a centralized server, this leaves you, as a little guy in the project, out in the cold. You can’t commit anything back without asking an official dev to do so on your behalf, so the centralized revision control system doesn’t help you out very much. With a distributed one though, since you have your own copy of the master branch, you can download the source, branch it, change it, and commit to it locally. It’s actually a very useful tool, even without the ability to push your changes back to the master branch. This lets developers who are starting out with a project make real use of source control until they gain the ability to commit or push back to the master base.
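The workflow that runs through the points above (branch locally, commit every little part, publish only the milestone) can be sketched with another toy model. `LocalRepo` and its methods are hypothetical names invented for this illustration, not git's or bzr's real data structures, and the final "push" is just a list copy.

```python
# Hypothetical toy model of a local repository; for illustration only.

class LocalRepo:
    def __init__(self, published):
        # branch name -> list of commit messages;
        # "master" starts as a private copy of the published history
        self.branches = {"master": list(published)}
        self.current = "master"

    def branch(self, name):
        """Start a new local branch from the current one and switch to it."""
        self.branches[name] = list(self.branches[self.current])
        self.current = name

    def commit(self, message):
        """Record a commit on the current local branch only."""
        self.branches[self.current].append(message)

    def squash_into(self, target, summary):
        """Represent the work on the current branch as one summary commit on target."""
        self.branches[target].append(summary)
        self.current = target


published = ["initial import"]  # what everyone else sees
repo = LocalRepo(published)

repo.branch("crazy-idea")
for step in ["tear out old core", "helper one works", "helper two works"]:
    repo.commit(step)  # saved locally; nobody else is affected

visible_during_work = list(published)  # shared history is unchanged so far

repo.squash_into("master", "rebuild core of the program")
published[:] = repo.branches["master"]  # stand-in for the 'push'
```

Note that in git itself a push normally sends every commit on the branch, so condensing the small ones into a single milestone is a separate step (often called squashing); the sketch folds the two together for brevity.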
So that’s why I gravitate strongly towards git or bzr over tools like svn. I am by no means a git or svn guru, and I’m sure a master of these tools could nitpick my article like there’s no tomorrow. I’m just a coder who taught himself these tools, and I’m recounting the personal experiences that led me to like distributed systems much better than centralized ones.
These days, the two biggest projects I hack on are VLC and Compiz Fusion. Compiz, as far as I know, has always used git, and VLC recently migrated from svn to git. For little personal projects, I use bzr (mainly because launchpad.net will host my code for free then 🙂 ). I’m real glad that these projects use distributed source code revision tools, and hope more projects I occasionally hack on (***coughffmpegcough***) will consider switching. Whatever method you use to keep track of your work though, happy hacking! 😀