Subversion via Git

Subversion is a lot better than no version control system at all, but for those accustomed to distributed version control systems like Git or Mercurial, it can be painful to lose cheap and flexible branching, intelligent merging and rebasing, and Git’s snappy operations when you need to work on code in a Subversion repository, perhaps for a legacy project or for an organisation with established repositories.

Fortunately, there’s an excellent compromise available in using the git-svn wrapper, which allows you to treat a Git repository as a working copy of a Subversion repository. This works transparently, meaning that others using the regular svn client on your team won’t have any difficulty working with your commits or branches, but your private Git workflow can meanwhile be as simple or complex as you like.

Installing

If you’re on Debian or Ubuntu, you can install the git-svn wrapper with:

# apt-get install git-svn

If you’re installing Git from source, it’s included in the default installation.

Cloning (Checkout)

To check out a copy of the trunk of the repository, you can clone it directly:

$ git svn clone svn://server.network/project/trunk project

This will provide you with a working copy of the repository’s trunk in the form of a Git repository, complete with a git log history imported from the Subversion commits. If you want to include tracking for the repository’s branches and tags as well, you can specify the paths to them on the command line:

$ git svn clone svn://server.network/project project \
    --trunk trunk --branches branches --tags tags

Typing git branch -r in the resulting repository will then list these as remote-tracking branches. Note that git-svn imports Subversion tags as remote branches under a tags/ namespace rather than as native Git tags, so git tag on its own won’t list them. If the Subversion repository has the standard layout of trunk, branches, and tags directories, you can pass --stdlayout (or -s) as a shortcut for the above:

$ git svn clone svn://server.network/project project --stdlayout

Fetching (Updating)

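To download the most recent revisions from the Subversion repository into your Git “working copy”, use:

$ git svn fetch

This only updates the remote-tracking branches. To bring your current branch up to date as well, much as svn update would for a Subversion working copy, use git svn rebase, which fetches and then replays your unsent local commits on top of the new revisions. It refuses to run if you have local changes in your repository that have not yet been committed, so you need to set them aside temporarily. Git’s stash function works well for this:

$ git stash
$ git svn rebase
$ git stash apply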
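
The stash, update, restore sequence above can be wrapped in a small shell helper so that the stash is only created when there is actually something to set aside. This is a minimal sketch rather than anything git-svn provides; the function name with_clean_tree is my own invention, and it only considers changes to tracked files.

```shell
# Run a command with a clean working tree, stashing local changes to
# tracked files first and restoring them afterwards.
# "with_clean_tree" is a hypothetical helper name.
with_clean_tree() {
    stashed=0
    # Non-empty porcelain output means tracked files have changes.
    if [ -n "$(git status --porcelain --untracked-files=no)" ]; then
        git stash || return 1
        stashed=1
    fi
    # Run the requested command and remember its exit status.
    "$@"
    rc=$?
    # Restore the stashed changes even if the command failed.
    if [ "$stashed" -eq 1 ]; then
        git stash pop || return 1
    fi
    return "$rc"
}

# Usage (inside a git-svn clone):
#   with_clean_tree git svn rebase
```

Unlike git stash apply, git stash pop drops the stash entry once it has been restored cleanly, so repeated runs don’t leave old stashes lying around.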

Committing

You can commit to your Git repository as normal with git commit. When you’re ready to send these commits to the Subversion repository, you can do so with:

$ git svn dcommit

As part of this, dcommit rewrites your local commits so that the output of git log shows which Subversion revision each one corresponds to.
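
The information noted in git log takes the form of a git-svn-id line at the end of each commit message, giving the repository URL, the revision number, and the repository UUID. As a rough sketch, assuming that format, the revision number can be picked out of a message with sed; the helper name svn_rev_from_message is hypothetical.

```shell
# Print the Subversion revision number recorded in a commit message that
# is read from standard input.
# "svn_rev_from_message" is a hypothetical helper name.
svn_rev_from_message() {
    # A git-svn-id line looks like:
    #   git-svn-id: <repository URL>@<revision> <repository UUID>
    # Leading whitespace is allowed, since git log indents messages.
    sed -n 's/^[[:space:]]*git-svn-id: .*@\([0-9][0-9]*\) .*$/\1/p'
}

# Usage (inside a git-svn clone):
#   git log -1 --format=%B | svn_rev_from_message
```

The git svn find-rev subcommand performs this kind of lookup for you, so a helper like this is mostly useful when all you have is log text.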

Branching

If you want to create a new branch in the Subversion repository itself, which git-svn will then track, you can do this:

$ git svn branch experimental

This needs a clone that was set up with branch tracking, as with --branches or --stdlayout above. It creates the branch on the Subversion side; it doesn’t switch your working tree to it. If you don’t have any need to add the branch you’re creating to the Subversion repository, you can just use the usual git branch to keep a branch restricted to your Git repository. You should only need the git svn branch facility if others might need to use that branch before you merge it.

Merging

While it may be possible to conduct merges within the Subversion repository from the git-svn client, I think this particular task is best done using the usual svn merge tool. If you’re up to reading how this works in man git-svn, you should go for it, but I don’t think that getting a handle on the rules for how these merges are run is worth the effort in most cases, and given how brittle Subversion can be, it’s likely not worth the risk of breaking things.

Properties

Aside from the above recommendation about using the native svn merge to conduct merges of Subversion branches, another limitation of the git-svn client is that it doesn’t apply Subversion properties like the very useful svn:ignore to your Git working tree, and its support for setting them is limited. As a result, after the clone Git won’t know which file patterns a traditional Subversion working copy would be set up to ignore. You can emulate this by writing the ignore patterns to one of two places: a .gitignore file, which Git tracks like any other file, or .git/info/exclude, which stays private to your clone:

$ git svn show-ignore >> .gitignore
$ git svn show-ignore >> .git/info/exclude

Empty directories

Finally, Subversion and Git differ in how they treat empty directories. As a result, you may create an empty directory in a Git repository with intent to commit it into the Subversion repository, and find yourself unable to do so. The workaround here is to place some sort of file into the directory and commit that; a README file explaining the directory’s purpose seems to be a sensible choice.
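
The placeholder workaround can be automated. What follows is a minimal sketch, assuming a find that supports the -empty test (GNU and BSD versions do, though it isn’t part of POSIX); the function name fill_empty_dirs and the README text are my own choices.

```shell
# Add a placeholder README to every empty directory below the given path
# (default: the current directory), skipping Git's own .git directory.
# "fill_empty_dirs" and the README text are hypothetical choices.
fill_empty_dirs() {
    root=${1:-.}
    # Prune .git so Git's internal directories are never touched.
    find "$root" -name .git -prune -o -type d -empty -print |
    while IFS= read -r dir; do
        printf 'Placeholder so that Git tracks this directory.\n' > "$dir/README"
        printf 'Added %s\n' "$dir/README"
    done
}

# Usage, from the top of the working tree:
#   fill_empty_dirs .
```

You’d still want to replace the generic text with a real explanation of each directory’s purpose before committing.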

These limitations, among others, show that the mapping from Subversion’s functionality to Git’s isn’t perfect, but for those who find working with Subversion a bit painful or imprecise, the functionality available in git-svn can certainly help, until such time as you’re able to convince your repository’s host to migrate everything to a more modern and capable revision control system.

Managing dot files with Git

Managing configuration files in your home directory on a POSIX system can be a pain when you often work on more than one machine, or when you accidentally remove a useful option or delete a file. Keeping your configuration files in a version control system solves both problems: it lets you track the changes you make, and easily deploy them on other machines. In this case, I’m going to show you how to do it with Git, but in principle there’s no reason most of this couldn’t work with Subversion or Mercurial.

Choosing which files to version

A good way to start is to take a look at the dot files and dot directories you have storing your carefully crafted configurations, and figure out for which of them it would be most important to track changes and to be able to rapidly deploy on remote systems. I use the following criteria:

  • Compatibility: Is the configuration likely to work on all or most of the systems on which you’re going to use it? If you’re going to check out your cutting edge .vimrc file on a remote Debian Sarge machine that hasn’t been updated since 2006, you might find that a lot of it doesn’t work properly. In some cases, you can add conditionals to configuration files so that they only load the option if it’s actually available. Similarly, you might not want to copy your .bashrc to all of your machines if you use a wide variety of them.
  • Transferability: Are you going to want exactly the same behaviour this file configures on all of your remote systems? If your .gitconfig file includes a personal handle or outside e-mail address, it might not be appropriate for you to clone that onto your work servers, since it’ll end up in commits you do from work.
  • Mutability: Are you going to be the only agent that updates this configuration, or will programs change it as well, for example to store cached file references? This can make updating a pain.
  • Privacy: If you’re going to put the file on GitHub or any other public repository service, does it contain private information? You probably shouldn’t put anything with API keys, SSH keys, or database credentials out in the ether.
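
The conditionals mentioned under Compatibility might look like this in a shell startup file such as .bashrc. This is only a sketch: the helper name have, the sr alias, and the choice of editor are illustrative assumptions, not part of my actual configuration.

```shell
# Fragment for a shared .bashrc: only enable settings whose supporting
# program exists on this machine. "have" is a hypothetical helper name.
have() {
    command -v "$1" >/dev/null 2>&1
}

# Prefer Vim as the editor where it is installed, else fall back to vi.
if have vim; then
    EDITOR=vim
else
    EDITOR=vi
fi
export EDITOR

# Only define the alias when GNU Screen is actually present.
if have screen; then
    alias sr='screen -dR'
fi
```

The same file can then be checked out on old and new machines alike, with each one quietly skipping whatever it can’t support.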

With these criteria applied, it turns out there are configurations for three programs that I really want to be able to maintain easily across servers: my Vim configuration, my Git configuration, and my GNU Screen configuration.

Creating the repository

To start, we’ll create a directory called .dotfiles to hold all our configuration, and initialise it as an empty Git repository.

$ mkdir .dotfiles
$ cd .dotfiles
$ git init

Then we’ll move the configuration files we want to track into it, and drop symbolic links to them from where they used to be, so that the applications concerned still read them correctly.

$ cd
$ mv .vim .dotfiles/vim
$ mv .vimrc .dotfiles/vimrc
$ mv .screenrc .dotfiles/screenrc
$ mv .gitconfig .dotfiles/gitconfig
$ ln -s .dotfiles/vim .vim
$ ln -s .dotfiles/vimrc .vimrc
$ ln -s .dotfiles/screenrc .screenrc
$ ln -s .dotfiles/gitconfig .gitconfig

Next, we drop into the .dotfiles directory, add everything to the staging area, and commit it:

$ cd .dotfiles
$ git add *
$ git commit -m "First commit of dotfiles."

And that’s it, we’ve now got all four of those configurations tracked in our local Git repository.

Using a remote repository

With that done, if you want to take the next step of having a central location where you can always get your configuration from any machine with an internet connection, you can set up a repository for your dot files on GitHub, with a free account. The instructions for doing this on GitHub itself are great, so just follow them for your existing repository. On my machine, the results look like this:

$ git remote add origin git@github.com:tejr/dotfiles.git
$ git push -u origin master

Note that I’m pushing using a public key setup, which you can arrange in the SSH Public Keys section of your GitHub account settings.

With this done, if you update your configuration at any time, first add and commit the changes to your local repository, and then all you need to do to update the GitHub version as well is:

$ git push

Cloning onto another machine

Having done this, when you’re working with a new machine onto which you’d like to clone your configuration, you clone the repository from GitHub, and delete any existing versions of those files in your home directory to replace them with symbolic links into your repository, like so:

$ git clone git@github.com:tejr/dotfiles.git .dotfiles
$ rm -r .vim .vimrc .screenrc .gitconfig
$ ln -s .dotfiles/vim .vim
$ ln -s .dotfiles/vimrc .vimrc
$ ln -s .dotfiles/screenrc .screenrc
$ ln -s .dotfiles/gitconfig .gitconfig
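
If you do this on more than a couple of machines, the linking step is worth scripting. Here is a minimal sketch, assuming the same four names as above; the function name link_dotfiles and the .bak suffix are hypothetical, and it moves existing files aside instead of deleting them.

```shell
# Link each tracked configuration into the home directory, moving any
# existing file or directory aside first. "link_dotfiles" and the ".bak"
# suffix are hypothetical choices.
link_dotfiles() {
    home=${1:-$HOME}
    for name in vim vimrc screenrc gitconfig; do
        target="$home/.$name"
        # Skip entries that are already symbolic links.
        if [ -L "$target" ]; then
            continue
        fi
        # Keep a backup rather than deleting an existing configuration.
        if [ -e "$target" ]; then
            mv "$target" "$target.bak"
        fi
        # The link target is relative to the home directory.
        ln -s ".dotfiles/$name" "$target"
    done
}

# Usage, after cloning into ~/.dotfiles:
#   link_dotfiles
```

Because existing links are skipped, running it a second time does nothing, which makes it safe to keep in the repository and rerun after each clone.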

Finally, if you come back to use this machine later after you’ve tweaked these configuration files a bit and pushed them to GitHub, you can update them by just running a pull:

$ git pull

Making things easier

This takes a lot of annoyances out of my day, as I know that on any machine on which I frequently work, all I need to do is drop to my .dotfiles directory and run a git pull to get the most recent version of my configurations. It’s a lot better than manually running scp or rsync calls to keep things up to date.

Committing part of a file

One of the advantages that Git has over Subversion and CVS is the use of its index as a staging area, which turns out to be a much more flexible model than Subversion’s. One of the things that always annoyed me about Subversion was that there seemed to be no elegant way to commit only some of your changes to a particular tracked file. Subversion deals only in files in the working copy, and if you want to commit changes to a file, you have to commit all the changes in that file, even if they’re not related.

Where Subversion falls short

As an example, suppose you’re making changes to a working copy of a Subversion repository called myproject, and you’ve made a few changes to the main file, myproject.php; on one line, you’ve fixed a bug caused by getting the parameters for htmlentities() in the wrong order. On another, near the head of the file, you’ve changed a php.ini setting to allow the script to run for a long time. Here’s what the output of svn status and svn diff might look like in this case:

$ svn status
M myproject.php

$ svn diff
Index: myproject.php
================================================================
--- myproject.php (revision 2)
+++ myproject.php (working copy)
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */
@@ -120,7 +122,7 @@
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Under Subversion, unless you move files around, you can’t commit only one of these changes; you need to commit both. This isn’t really the end of the world, since you could include a commit message describing both things you changed:

$ svn commit -m "Allowed longer runtime, fixed parameter order bug"
Transmitting file data .
Committed revision 3.

But if you’re finicky like me, and you’d prefer to think of commits as grouping semantically related changes as much as possible, it would be much better to be able to commit these two changes separately, and this is where Git’s use of an index shines.

Git’s method

Let’s work with the same project again, but this time as a Git repository. We’ll make the same changes again, and view the output of git status and git diff:

$ git status
# On branch master
# Changes not staged for commit:
#
# modified: myproject.php
#
no changes added to commit

$ git diff
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */
@@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So far, so good. Now when we run git add myproject.php to stage the changes in the index ready for commit, by default it does the same thing Subversion does, putting all of the changes in that file into the staging area. That’s probably fine in most cases, but today we want to commit one change, and then the other. The most basic way to do this is using Git’s --patch option.

The --patch option can be added to git add, and to some other Git commands such as git reset and git checkout, to prompt you explicitly about staging or not staging each changed section of the file, which Git calls a hunk. In our case, the process of including only the first change would look something like this:

$ git add --patch myproject.php
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */
Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
@@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? n

This done, if you compare the output of git diff --staged and git diff, you’ll notice that there are changes staged ready for commit in the file, and also changes that are not staged that we can commit separately later:

$ git diff --staged
diff --git a/myproject.php b/myproject.php
index 7c20f21..4bb2362 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So your staging area is all ready with just that one change in it, and all you need to do is type git commit with an appropriate message:

$ git commit -m "Allowed longer runtime"
[master 19d9068] Allowed longer runtime
1 files changed, 2 insertions(+), 0 deletions(-)

And the other change you made is still there, waiting to be staged and committed whenever you see fit:

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Other methods

Because Git’s index can be manipulated with its lower-level tools very easily, you can treat the differences between your changes and the index like any other diff task. This means more advanced tools like Fugitive for Vim can be even better for seeing changesets in individual files as you stage them for commit. Check out Drew Neil’s Vimcast series on Fugitive if you’re interested in doing this; it’s quite an in-depth series of videos, but very much worth watching if you’re a Vim user who wants to understand and use Git to its fullest, and you really value precision and clarity in your commits.