Unix as IDE: Revisions

Version control is now seen as an indispensable part of professional software development, and GUI IDEs like Eclipse and Visual Studio have embraced it and included support for industry standard version control systems in their products. Modern version control systems trace their lineage back to Unix concepts from programs such as diff and patch however, and there are plenty of people who will insist that the best way to use a version control system is still at a shell prompt.

In this last article in the Unix as an IDE series, I’ll follow the evolution of common open-source version control systems from the basic concepts of diff and patch, among the very first version control tools.

diff, patch, and RCS

A central concept for version control systems has been that of the unified diff, a file expressing in human and computer readable terms a set of changes made to a file or files. The diff command was first released by Douglas McIlroy in 1974 for the 5th Edition of Unix, so it’s one of the oldest commands still in regular use on modern systems.

A unified diff, the most common and interoperable format, can be generated by comparing two versions of a file with the following syntax:

$ diff -u example.{1,2}.c
--- example.c.1    2012-02-15 20:15:37.000000000 +1300
+++ example.c.2    2012-02-15 20:15:57.000000000 +1300
@@ -1,8 +1,9 @@
 #include <stdio.h>
+#include <stdlib.h> 

 int main (int argc, char* argv[]) { printf("Hello, world!\n");
-    return 0;
+    return EXIT_SUCCESS; }

In this example, the second file has a header file added, and the call to return changed to use the standard EXIT_SUCCESS rather than a literal 0 as the return value for main(). Note that the output for diff also includes metadata such as the filename that was changed and the last modification time of each of the files.

A primitive form of version control for larger code bases was thus for developers to trade diff output, called patches in this context, so that they could be applied to one another’s code bases with the patch tool. We could save the output from diff above as a patch like so:

$ diff -u example.{1,2}.c > example.patch

We could then send this patch to a developer who still had the old version of the file, and they could automatically apply it with:

$ patch example.1.c < example.patch

A patch can include diff output from more than one file, including within subdirectories, so this provides a very workable way to apply changes to a source tree.

The operations involved in using diff output to track changes were sufficiently regular that for keeping in-place history of a file, the Source Code Control System and the Revision Control System that has pretty much replaced it were developed. RCS enabled “locking” files so that they could not be edited by anyone else while “checked out” of the system, paving the way for other concepts in more developed version control systems.

RCS retains the advantage of being very simple to use. To place an existing file under version control, one need only type ci <filename> and provide an appropriate description for the file:

$ ci example.c
example.c,v  <--  example.c
enter description, terminated with single '.' or end of file:
NOTE: This is NOT the log message!
>> example file
>> .
initial revision: 1.1
done

This creates a file in the same directory, example.c,v, that will track the changes. To make changes to the file, you check it out, make the changes, then check it back in:

$ co -l example.c
example.c,v  -->  example.c
revision 1.1 (locked)
done
$ vim example.c
$ ci -u example.c
example.c,v  <--  example.c
new revision: 1.2; previous revision: 1.1
enter log message, terminated with single '.' or end of file:
>> added a line
>> .
done

You can then view the history of a project with rlog:

$ rlog example.c

RCS file: example.c,v
Working file: example.c
head: 1.2
branch:
locks: strict
access list:
symbolic names:
keyword substitution: kv
total revisions: 2; selected revisions: 2
description:
example file
----------------------------
revision 1.2
date: 2012/02/15 07:39:16;  author: tom;  state: Exp;  lines: +1 -0
added a line
----------------------------
revision 1.1
date: 2012/02/15 07:36:23;  author: tom;  state: Exp;
Initial revision
=============================================================================

And get a patch in unified diff format between two revisions with rcsdiff -u:

$ rcsdiff -u -r1.1 -r1.2 ./example.c 
===================================================================
RCS file: ./example.c,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- ./example.c 2012/02/15 07:36:23 1.1
+++ ./example.c 2012/02/15 07:39:16 1.2
@@ -4,6 +4,7 @@
 int main (int argc, char* argv[])
 {
     printf("Hello, world!\n");
+    printf("Extra line!\n");
     return EXIT_SUCCESS;
 }

It would be misleading to imply that simple patches were now in disuse as a method of version control; they are still very commonly used in the forms above, and also figure prominently in both centralised and decentralised version control systems.

CVS and Subversion

To handle the problem of resolving changes made to a code base by multiple developers, centralized version systems were developed, with the Concurrent Versions System (CVS) developed first and the slightly more advanced Subversion later on. The central feature of these systems are using a central server that contains the repository, from which authoritative versions of the codebase at any particular time or revision can be retrieved. These are termed working copies of the code.

For these systems, the basic unit of the systems remained the changeset, and the most common way to represent these to the user was in the archetypal diff format used in earlier systems. Both systems work by keeping records of these changesets, rather than the actual files themselves from state to state.

Other concepts introduced by this generation of systems were of branching projects so that separate instances of the same project could be worked on concurrently, and then merged into the mainline, or trunk with appropriate testing and review. Similarly, the concept of tagging was introduced to flag certain revisions as representing the state of a codebase at the time of a release of the software. The concept of the merge was also introduced; reconciling conflicting changes made to a file manually.

Git and Mercurial

The next generation of version control systems are distributed or decentralized systems, in which working copies of the code themselves contain a complete history of the project, and are hence not reliant on a central server to contribute to the project. In the open source, Unix-friendly environment, the standout systems are Git and Mercurial, with their client programs git and hg.

For both of these systems, the concept of communicating changesets is done with the operations push, pull and merge; changes from one repository are accepted by another. This decentralized system allows for a very complex but tightly controlled ecosystem of development; Git was originally developed by Linus Torvalds to provide an open-source DVCS capable of managing development for the Linux kernel.

Both Git and Mercurial differ from CVS and Subversion in that the basic unit for their operations is not changesets, but complete files (blobs) saved using compression. This makes finding the log history of a single file or the differences between two revisions of a file slightly more expensive, but the output of git log --patch still retains the familiar unified diff output for each revision, some forty years after diff was first being used:

commit c1e5559ddb09f8d02b989596b0f4100ad1aab422
Author: Tom Ryder <tom@sanctum.geek.nz>
Date:   Thu Feb 2 01:14:21 2012

Changed my mind about this one.

diff --git a/vim/vimrc b/vim/vimrc index cfbe8e0..65a3143 100644
--- a/vim/vimrc
+++ b/vim/vimrc
@@ -47,10 +47,6 @@
 set shiftwidth=4
 set softtabstop=4
 set tabstop=4

-" Heresy
-inoremap <C-a> <Home>
-inoremap <C-e> <End>
-
 " History
 set history=1000

The two systems have considerable overlap in functionality and even in command set, and the question of which to use provokes considerable debate. The best introductions I’ve seen to each are Pro Git by Scott Chacon, and Hg Init by Joel Spolsky.

Conclusion

This is the last post in the Unix as IDE series; I’ve tried to offer a rapid survey of the basic tools available just within a shell on Linux for all of the basic functionality afforded by professional IDEs. At points I’ve had to be not quite as thorough as I’d like in explaining certain features, but to those unfamiliar to development on Linux machines this will all have hopefully given some idea of how comprehensive a development environment the humble shell can be, and all with free, highly mature, and standard software tools.

This entry is part 7 of 7 in the series Unix as IDE.

Unix as IDE: Editing

The text editor is the core tool for any programmer, which is why choice of editor evokes such tongue-in-cheek zealotry in debate among programmers. Unix is the operating system most strongly linked with two enduring favourites, Emacs and Vi, and their modern versions in GNU Emacs and Vim, two editors with very different editing philosophies but comparable power.

Being a Vim heretic myself, here I’ll discuss the indispensable features of Vim for programming, and in particular the use of Linux shell tools called from within Vim to complement the editor’s built-in functionality. Some of the principles discussed here will be applicable to those using Emacs as well, but probably not for underpowered editors like Nano.

This will be a very general survey, as Vim’s toolset for programmers is enormous, and it’ll still end up being quite long. I’ll focus on the essentials and the things I feel are most helpful, and try to provide links to articles with a more comprehensive treatment of the topic. Don’t forget that Vim’s :help has surprised many people new to the editor with its high quality and usefulness.

Filetype detection

Vim has built-in settings to adjust its behaviour, in particular its syntax highlighting, based on the filetype being loaded, which it happily detects and generally does a good job at doing so. In particular, this allows you to set an indenting style conformant with the way a particular language is usually written. This should be one of the first things in your .vimrc file.

if has("autocmd")
  filetype on
  filetype indent on
  filetype plugin on
endif

Syntax highlighting

Even if you’re just working with a 16-color terminal, just include the following in your .vimrc if you’re not already:

syntax on

The colorschemes with a default 16-color terminal are not pretty largely by necessity, but they do the job, and for most languages syntax definition files are available that work very well. There’s a tremendous array of colorschemes available, and it’s not hard to tweak them to suit or even to write your own. Using a 256-color terminal or gVim will give you more options. Good syntax highlighting files will show you definite syntax errors with a glaring red background.

Line numbering

To turn line numbers on if you use them a lot in your traditional IDE:

set number

You might like to try this as well, if you have at least Vim 7.3 and are keen to try numbering lines relative to the current line rather than absolutely:

set relativenumber

Tags files

Vim works very well with the output from the ctags utility. This allows you to search quickly for all uses of a particular identifier throughout the project, or to navigate straight to the declaration of a variable from one of its uses, regardless of whether it’s in the same file. For large C projects in multiple files this can save huge amounts of otherwise wasted time, and is probably Vim’s best answer to similar features in mainstream IDEs.

You can run :!ctags -R on the root directory of projects in many popular languages to generate a tags file filled with definitions and locations for identifiers throughout your project. Once a tags file for your project is available, you can search for uses of an appropriate tag throughout the project like so:

:tag someClass

The commands :tn and :tp will allow you to iterate through successive uses of the tag elsewhere in the project. The built-in tags functionality for this already covers most of the bases you’ll probably need, but for features such as a tag list window, you could try installing the very popular Taglist plugin. Tim Pope’s Unimpaired plugin also contains a couple of useful relevant mappings.

Calling external programs

There are two major methods of calling external programs during a Vim session:

  • :!<command> — Useful for issuing commands from within a Vim context particularly in cases where you intend to record output in a buffer.
  • :shell — Drop to a shell as a subprocess of Vim. Good for interactive commands.

A third, which I won’t discuss in depth here, is using plugins such as Conque to emulate a shell within a Vim buffer. Having tried this myself and found it nearly unusable, I’ve concluded it’s simply bad design. From :help design-not:

Vim is not a shell or an Operating System. You will not be able to run a shell inside Vim or use it to control a debugger. This should work the other way around: Use Vim as a component from a shell or in an IDE.

Lint programs and syntax checkers

Checking syntax or compiling with an external program call (e.g. perl -c, gcc) is one of the calls that’s good to make from within the editor using :! commands. If you were editing a Perl file, you could run this like so:

:!perl -c %

/home/tom/project/test.pl syntax OK

Press Enter or type command to continue

The % symbol is shorthand for the file loaded in the current buffer. Running this prints the output of the command, if any, below the command line. If you wanted to call this check often, you could perhaps map it as a command, or even a key combination in your .vimrc file. In this case, we define a command :PerlLint which can be called from normal mode with \l:

command PerlLint !perl -c %
nnoremap <leader>l :PerlLint<CR>

For a lot of languages there’s an even better way to do this, though, which allows us to capitalise on Vim’s built-in quickfix window. We can do this by setting an appropriate makeprg for the filetype, in this case including a module that provides us with output that Vim can use for its quicklist, and a definition for its two formats:

:set makeprg=perl\ -c\ -MVi::QuickFix\ %
:set errorformat+=%m\ at\ %f\ line\ %l\.
:set errorformat+=%m\ at\ %f\ line\ %l

You may need to install this module first via CPAN, or the Debian package libvi-quickfix-perl. This done, you can type :make after saving the file to check its syntax, and if errors are found, you can open the quicklist window with :copen to inspect the errors, and :cn and :cp to jump to them within the buffer.

Vim quickfix working on a Perl file

Vim quickfix working on a Perl file

This also works for output from gcc, and pretty much any other compiler syntax checker that you might want to use that includes filenames, line numbers, and error strings in its error output. It’s even possible to do this with web-focused languages like PHP, and for tools like JSLint for JavaScript. There’s also an excellent plugin named Syntastic that does something similar.

Reading output from other commands

You can use :r! to call commands and paste their output directly into the buffer with which you’re working. For example, to pull a quick directory listing for the current folder into the buffer, you could type:

:r!ls

This doesn’t just work for commands, of course; you can simply read in other files this way with just :r, like public keys or your own custom boilerplate:

:r ~/.ssh/id_rsa.pub
:r ~/dev/perl/boilerplate/copyright.pl

Filtering output through other commands

You can extend this to actually filter text in the buffer through external commands, perhaps selected by a range or visual mode, and replace it with the command’s output. While Vim’s visual block mode is great for working with columnar data, it’s very often helpful to bust out tools like column, cut, sort, or awk.

For example, you could sort the entire file in reverse by the second column by typing:

:%!sort -k2 -r

You could print only the third column of some selected text where the line matches the pattern /vim/ with:

:'<,'>!awk '/vim/ {print $3}'

You could arrange keywords from lines 1 to 10 in nicely formatted columns like:

:1,10!column -t

Really any kind of text filter or command can be manipulated like this in Vim, a simple interoperability feature that expands what the editor can do by an order of magnitude. It effectively makes the Vim buffer into a text stream, which is a language that all of these classic tools speak.

Built-in alternatives

It’s worth noting that for really common operations like sorting and searching, Vim has built-in methods in :sort and :grep, which can be helpful if you’re stuck using Vim on Windows, but don’t have nearly the adaptability of shell calls.

Diffing

Vim has a diffing mode, vimdiff, which allows you to not only view the differences between different versions of a file, but also to resolve conflicts via a three-way merge and to replace differences to and fro with commands like :diffput and :diffget for ranges of text. You can call vimdiff from the command line directly with at least two files to compare like so:

$ vimdiff file-v1.c file-v2.c
Vim diffing a .vimrc file

Vim diffing a .vimrc file

Version control

You can call version control methods directly from within Vim, which is probably all you need most of the time. It’s useful to remember here that % is always a shortcut for the buffer’s current file:

:!svn status
:!svn add %
:!git commit -a

Recently a clear winner for Git functionality with Vim has come up with Tim Pope’s Fugitive, which I highly recommend to anyone doing Git development with Vim. There’ll be a more comprehensive treatment of version control’s basis and history in Unix in Part 7 of this series.

The difference

Part of the reason Vim is thought of as a toy or relic by a lot of programmers used to GUI-based IDEs is its being seen as just a tool for editing files on servers, rather than a very capable editing component for the shell in its own right. Its own built-in features being so composable with external tools on Unix-friendly systems makes it into a text editing powerhouse that sometimes surprises even experienced users.

This entry is part 3 of 7 in the series Unix as IDE.

Committing part of a file

One of the advantages that Git has over Subversion and CVS is the use of its index as a staging area, which turns out to be a much more flexible model than Subversion. One of the things that always annoyed me about Subversion was that there seemed to be no elegant way to only commit only some of your changes to a particular tracked file. Subversion deals only in files in the working copy, and if you want to commit changes to a file, you have to commit all the changes in that file, even if they’re not related.

Where Subversion falls short

As an example, suppose you’re making changes to a working copy of a Subversion repository called myproject, and you’ve made a few changes to the main file, myproject.php; on one line, you’ve fixed a bug caused by getting the parameters for htmlentities() in the wrong order. On another, near the head of the file, you’ve changed a php.ini setting to allow the script to run for a long time. Here’s what the output of svn status and svn diff might look like in this case:

$ svn status
M myproject.php

$ svn diff
Index: myproject.php
================================================================
--- myproject.php (revision 2)
+++ myproject.php (working copy)
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
  * Open main class.
  */
 @@ -120,7 +122,7 @@
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Under Subversion, unless you move files around, you can’t commit only one of these changes; you need to commit both. This isn’t really the end of the world, since you could include a commit message describing both things you changed:

$ svn commit -m "Allowed longer runtime, fixed parameter order bug"
Transmitting file data .
Committed revision 3.

But if you’re finicky like me, and you’d prefer to think of commits as grouping semantically related changes as much as possible, it would be much better to be able to commit these two changes separately, and this is where Git’s use of an index shines.

Git’s method

Let’s work with the same project again, but this time as a Git repository. We’ll make the same changes again, and view the output of git status and git diff:

$ git status
# On branch master
# Changes not staged for commit:
#
# modified: myproject.php
#
no changes added to commit

$ git diff
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */
 @@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So far, so good. Now when we run git add myproject.php to stage the changes in the index ready for commit, by default it does the same thing Subversion does, putting all of the changes in that file into the staging area. That’s probably fine in most cases, but today we want to commit one change, and then the other. The most basic way to do this is using Git’s --patch option.

The --patch option can be added to git add, and to some other Git commands concerned with manipulating the index as well, to explicitly prompt you about staging or not staging different sections of the file, that it terms hunks. In our case, the process of including only the first change would look something like this:

$ git add --patch myproject.php
diff --git a/myproject.php b/myproject.php
index 7c20f21..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */
 Stage this hunk [y,n,q,a,d,/,j,J,g,e,?]? y
 @@ -120,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }
Stage this hunk [y,n,q,a,d,/,K,g,e,?]? n

This done, if you compare the output of git diff --staged and git diff, you’ll notice that there are changes staged ready for commit in the file, and also changes that are not staged that we can commit separately later:

$ git diff --staged
diff --git a/myproject.php b/myproject.php
index 7c20f21..4bb2362 100644
--- a/myproject.php
+++ b/myproject.php
@@ -1,5 +1,7 @@
 <?php
+ini_set("max_execution_time", 300);
+
 /**
 * Open main class.
 */

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

So your staging area is all ready with just that one change in it, and all you need to do is type git commit with an appropriate message:

$ git commit -m "Allowed longer runtime"
[master 19d9068] Allowed longer runtime
1 files changed, 2 insertions(+), 0 deletions(-)

And the other change you made is still there, waiting to be staged and committed whenever you see fit:

$ git diff
diff --git a/myproject.php b/myproject.php
index 4bb2362..c149190 100644
--- a/myproject.php
+++ b/myproject.php
@@ -122,7 +122,7 @@ class MyProject
 public function dumpvalue($value)
 {
-    print htmlentities($value, "UTF-8", ENT_COMPAT);
+    print htmlentities($value, ENT_COMPAT, "UTF-8");
 }

Other methods

Because Git’s index can be manipulated with its lower-level tools very easily, you can treat the differences between your changes and the index like any other diff task. This means more advanced tools like Fugitive for Vim can be even better for seeing changesets in individual files as you stage them for commit. Check out Drew Neil’s Vimcast series on Fugitive if you’re interested in doing this; it’s quite an in-depth series of videos, but very much worth watching if you’re a Vim user who wants to understand and use Git to its fullest, and you really value precision and clarity in your commits.