Subversion via Git

Subversion is a lot better than no version control system at all, but for those accustomed to distributed version control systems like Git or Mercurial, it can be pretty painful to lose features like cheap and flexible branching, intelligent merging and rebasing, and the snappy operations of Git when the need arises to work on code in a Subversion repository, perhaps for a legacy project or for an organisation with established repositories.

Fortunately, there’s an excellent compromise available in using the git-svn wrapper, which allows you to treat a Git repository as a working copy of a Subversion repository. This works transparently, meaning that others using the regular svn client on your team won’t have any difficulty working with your commits or branches, but your private Git workflow can meanwhile be as simple or complex as you like.

Installing

If you’re on Debian or Ubuntu, you can install the git-svn wrapper with:

# apt-get install git-svn

If you’re installing Git from source, it’s included in the default installation.

Cloning (Checkout)

To check out a copy of the trunk of the repository, you can clone it directly:

$ git svn clone svn://server.network/project/trunk project

This will provide you with a working copy of the repository’s trunk in the form of a Git repository, complete with a git log history imported from the Subversion commits. If you want to include tracking for the repository’s branches and tags as well, you can specify the paths to them on the command line:

$ git svn clone svn://server.network/project project \
    --trunk trunk --branches branches --tags tags

Typing git branch and git tag in the resulting repository will then show these as available branches and tags, as if they’d been created in Git. If the Subversion repository has a standard layout of folders, with branches, tags, and trunk, you can pass the --stdlayout or -s as a shortcut for the above:

$ git svn clone svn://server.network/project project --stdlayout

Fetching (Updating)

To pull the most recent changes from the Subversion repository into your Git “working copy”, use:

$ git svn fetch

If you have local changes in your repository that have not yet been committed, you may be prompted to temporarily cache them while you run the fetch operation. Git’s stash function works well for this:

$ git stash
$ git svn fetch
$ git stash apply

Committing

You can commit to your Git repository as normal with git commit. When you’re ready to send these commits to the Subversion repository, you can do so with:

$ git svn dcommit

This will also note information about the Subversion commit in the output of git log.

Branching

If you want to create a new branch in Git that tracks a similarly created branch in the Subversion repository, you can do this:

$ git svn branch experimental

It’s useful to note that if you don’t have any need to add the branch you’re creating to the Subversion repository, you can just use the usual git branch to keep a branch restricted to your Git repository. You should only need the git svn branch facility if others might need to use that branch before you merge it.

Merging

While it may be possible to conduct merges within the Subversion repository from the git-svn client, I think this particular task is probably best done using the usual svn merge tool. If you’re up to reading how this works in man git-svn, you should go for it, but I don’t think that getting a handle on the complexity of the rules for how these merges are run is really worth the effort in most cases, and given how brittle Subversion can be it’s likely not worth the risk of breaking things.

Properties

Aside from the above recommendation about using the native svn merge to conduct merges of Subversion branches, another limitation of the git-svn client is it ignores Subversion properties like the very useful svn:ignore, and furthermore doesn’t provide any way to set them. As a result, after the clone Git won’t know which file patterns a traditional Subversion working copy would be set up to ignore. You can emulate this by writing the properties file to a .gitignore or the .git/info/exclude files:

$ git svn show-ignore >> .gitignore
$ git svn show-ignore >> .git/info/exclude

Empty directories

Finally, Subversion and Git differ in how they treat empty directories. As a result, you may create an empty directory in a Git repository with intent to commit it into the Subversion repository, and find yourself unable to do so. The workaround here is to place some sort of file into the directory and commit that; a README file explaining the directory’s purpose seems to be a sensible choice.

These limitations, among others, show that the mapping from Subversion’s functionality to Git’s isn’t perfect, but for those who find working with Subversion a bit painful or imprecise, the functionality available in git-svn can certainly help, until such time as you’re able to convince your repository’s host to migrate everything to a more modern and capable revision control system.

Committing part of a file

One of the advantages that Git has over Subversion and CVS is the use of its index as a staging area, which turns out to be a much more flexible model than Subversion. One of the things that always annoyed me about Subversion was that there seemed to be no elegant way to only commit only some of your changes to a particular tracked file. Subversion deals only in files in the working copy, and if you want to commit changes to a file, you have to commit all the changes in that file, even if they’re not related.

Where Subversion falls short

As an example, suppose you’re making changes to a working copy of a Subversion repository called myproject, and you’ve made a few changes to the main file, myproject.php; on one line, you’ve fixed a bug caused by getting the parameters for htmlentities() in the wrong order. On another, near the head of the file, you’ve changed a php.ini setting to allow the script to run for a long time. Here’s what the output of svn status and svn diff might look like in this case:

$ svn status
M myproject.php

$ svn diff
Index: myproject.php
================================================================
--- myproject.php (revision 2)
+++ myproject.php (working copy)
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */
 @@ -120,7 +122,7 @@
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Under Subversion, unless you move files around, you can’t commit only one of these changes; you need to commit both. This isn’t really the end of the world, since you could include a commit message describing both things you changed:

$ svn commit -m "Allowed longer runtime, fixed parameter order bug"
Transmitting file data .
Committed revision 3.

But if you’re finicky like me, and you’d prefer to think of commits as grouping semantically related changes as much as possible, it would be much better to be able to commit these two changes separately, and this is where Git’s use of an index shines.

Git’s method

Let’s work with the same project again, but this time as a Git repository. We’ll make the same changes again, and view the output of git status and git diff:

$ git status
# On branch master
# Changes not staged for commit:
#
# modified: myproject.php
#
no changes added to commit

$ git diff
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */
 @@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So far, so good. Now when we run git add myproject.php to stage the changes in the index ready for commit, by default it does the same thing Subversion does, putting all of the changes in that file into the staging area. That’s probably fine in most cases, but today we want to commit one change, and then the other. The most basic way to do this is using Git’s --patch option.

The --patch option can be added to git add, and to some other Git commands concerned with manipulating the index as well, to explicitly prompt you about staging or not staging different sections of the file, that it terms hunks. In our case, the process of including only the first change would look something like this:

$ git add --patch myproject.php
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */
 Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
 @@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? n

This done, if you compare the output of git diff --staged and git diff, you’ll notice that there are changes staged ready for commit in the file, and also changes that are not staged that we can commit separately later:

$ git diff --staged
diff --git a/myproject.php b/myproject.php
index 7c20f21..4bb2362 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So your staging area is all ready with just that one change in it, and all you need to do is type git commit with an appropriate message:

$ git commit -m "Allowed longer runtime"
[master 19d9068] Allowed longer runtime
1 files changed, 2 insertions(+), 0 deletions(-)

And the other change you made is still there, waiting to be staged and committed whenever you see fit:

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Other methods

Because Git’s index can be manipulated with its lower-level tools very easily, you can treat the differences between your changes and the index like any other diff task. This means more advanced tools like Fugitive for Vim can be even better for seeing changesets in individual files as you stage them for commit. Check out Drew Neil’s Vimcast series on Fugitive if you’re interested in doing this; it’s quite an in-depth series of videos, but very much worth watching if you’re a Vim user who wants to understand and use Git to its fullest, and you really value precision and clarity in your commits.