GNU/Linux Crypto: Backups

This entry is part 8 of 10 in the series GNU/Linux Crypto.

While having local backups for quick restores is important, such as on a USB disk or spare hard drive, it’s equally important to have a backup offsite from which you can restore your important documents if, for example, your office was burgled or burned down, losing both your workstation and backup media.

The easiest way to do this for most people is with a storage provider, offering convenient access to bulk storage of suitable size maintained on another company’s systems for a relatively modest price or even for free, such as the Ubuntu One service, or Microsoft’s offering, Skydrive. The best storage providers will also encrypt the data on their own servers, whether or not they have access.

Trusting a company with all your data and the encryption thereof is risky, particularly given recent revelations of corporate collusion with the NSA, and privacy-conscious users should prefer the security of encrypting the backups before they go up onto the provider’s servers. The provider may implement closed and/or symmetric encryption mechanisms of their own, which may or may not be trustworthy. For very strong personal encryption, as established, we can use our GnuPG setup to encrypt files before we put them up there:

$ tar -cf docsbackup-"$(date +%Y-%m-%d)".tar $HOME/Documents
$ gpg --encrypt docsbackup-2013-07-27.tar
$ scp docsbackup-2013-07-27.tar.gpg user@backup.example.com:docsbackup

The problem with encrypting whole files before we put them up for storage is that for even modestly sized data, performing entire backups and uploading all of the files together every time can cost a lot of bandwidth. Similarly, we’d like to be able to restore our personal files as they were on a specific date, in case of bad backups or accidental deletion, but without storing every file on every backup day, which may end up requiring far too much space.

Incremental backups

Normally, the solution is to use an incremental backup system, meaning after first uploading your files in their entirety to the backup system, successive backups upload only the changes, storing them in a retrievable and space-efficient format. Systems like Dirvish, a free Perl frontend to rsync(1), allow this.

Unfortunately, Dirvish doesn’t encrypt the files or changesets it stores. What’s needed is an incremental backup solution that efficiently calculates and stores changes in files on a remote server, and also encrypts them. Duplicity, a Python tool built around librsync, excels at this, and can use our GnuPG asymmetric key setup for the file encryption. It’s available in Debian-derived systems in the duplicity package. Note that, as before, a GnuPG key setup with an agent is required for this to work.

Usage

We can get an idea of how duplicity(1) works by asking it to start a backup vault on our local machine. It uses much the same source destination argument as tools like rsync or scp:

$ cd
$ duplicity --encrypt-key tom@sanctum.geek.nz Documents file://docsbackup

It’s important to specify --encrypt-key, because otherwise duplicity(1) will use symmetric encryption with a passphrase rather than a public key, which is considerably less secure. Specify the email address corresponding to the public keypair you would like to use for the encryption.

This performs a full encrypted backup of the directory, returning the following output:

Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
No signatures found, switching to full backup.
--------------[ Backup Statistics ]--------------
StartTime 1374903081.74 (Sat Jul 27 17:31:21 2013)
EndTime 1374903081.75 (Sat Jul 27 17:31:21 2013)
ElapsedTime 0.01 (0.01 seconds)
SourceFiles 4
SourceFileSize 142251 (139 KB)
NewFiles 4
NewFileSize 142251 (139 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 4
RawDeltaSize 138155 (135 KB)
TotalDestinationSizeChange 138461 (135 KB)
Errors 0
-------------------------------------------------

You’ll note you were not prompted for your passphrase to do this. Remember, encrypting files with your public key does not require a passphrase; the whole idea is that anyone can encrypt using your key without needing your permission.

Checking the created directory docsbackup, we find three new files within it, all three of them encrypted:

$ ls -1 docsbackup
duplicity-full.20130727T053121Z.manifest.gpg
duplicity-full.20130727T053121Z.vol1.difftar.gpg
duplicity-full-signatures.20130727T053121Z.sigtar.gpg

The vol1.difftar.gpg file contains the actual data stored; the other two files contain metadata about the backup’s contents, for use to calculate differences the next time the backup runs.

If we make a small change to a file in the directory being backed up, and run the same command again, we note that the backup has been performed incrementally, and only the changes (the new file) have been saved:

$ duplicity --encrypt-key tom@sanctum.geek.nz Documents file://docsbackup
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sat Jul 27 17:34:33 2013
--------------[ Backup Statistics ]--------------
StartTime 1374903396.52 (Sat Jul 27 17:36:36 2013)
EndTime 1374903396.52 (Sat Jul 27 17:36:36 2013)
ElapsedTime 0.01 (0.01 seconds)
SourceFiles 5
SourceFileSize 142255 (139 KB)
NewFiles 2
NewFileSize 4100 (4.00 KB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 2
RawDeltaSize 4 (4 bytes)
TotalDestinationSizeChange 753 (753 bytes)
Errors 0
-------------------------------------------------

We also find three new files in the docsbackup directory containing the new data:

$ ls -1 docsbackup
duplicity-full.20130727T053433Z.manifest.gpg
duplicity-full.20130727T053433Z.vol1.difftar.gpg
duplicity-full-signatures.20130727T053433Z.sigtar.gpg
duplicity-inc.20130727T053433Z.to.20130727T053636Z.manifest.gpg
duplicity-inc.20130727T053433Z.to.20130727T053636Z.vol1.difftar.gpg
duplicity-new-signatures.20130727T053433Z.to.20130727T053636Z.sigtar.gpg

Note that the new files have the prefix duplicity-inc- or duplicity-new-, denoting them as incremental backups and not full ones.

Note that in order to keep track of what files have already been backed up, duplicity(1) stores metadata in ~/.cache/duplicity, as well as storing them along with the backup. This allows us to let our backup processes run unattended, rather than having to put in our passphrase to read the metadata on the remote server before performing an incremental backup. Of course, if we lose our cached files, that’s OK; we can read the ones out of the backup vault by supplying our passphrase on request for decryption.

Remote backups

If you have SSH or even just SCP/SFTP access to your storage provider’s servers, not much has to change to make duplicity(1) store the files up there instead:

$ duplicity --encrypt-key tom@sanctum.geek.nz Documents sftp://user@backup.example.com:docsbackup

Your backups will then be sent over an SSH link to the directory docsbackup on the system backup.example.com, with username user. In this way, not only is all the data protected in transmission, it’s stored encrypted on the remote server; it never sees your plaintext data. All anyone with access to your backups can see is their approximate size, the dates they were made, and (if you publish your public key) the user ID on the GnuPG key used to encrypt them.

If you’re using the ssh-agent(1) program to store your decrypted private keys, you won’t even have to enter a passphrase for that.

The duplicity(1) frontend supports other methods of uploading to different servers, too, including the boto backend for S3 Amazon Web Services, the gdocs backend for Google Docs, and httplib2 or oauthlib for Ubuntu One.

If you like, you can also sign your backups to make sure they haven’t been tampered with at the time of restoration, by changing --encrypt-key to --encrypt-sign-key. Note that this will require your passphrase.

Restoring

Restoring from a duplicity(1) backup volume is much the same, but with the arguments reversed:

$ duplicity sftp://user@backup.example.com:docsbackup docsrestore
Synchronizing remote metadata to local cache...
GnuPG passphrase:
Copying duplicity-full-signatures.20130727T053433Z.sigtar.gpg to local cache.
Copying duplicity-full.20130727T053433Z.manifest.gpg to local cache.
Copying duplicity-inc.20130727T053433Z.to.20130727T053636Z.manifest.gpg to local cache.
Copying duplicity-new-signatures.20130727T053433Z.to.20130727T053636Z.sigtar.gpg to local cache.
Last full backup date: Sat Jul 27 17:34:33 2013

Note that this time you are asked for your passphrase. This is because restoring the backup requires decrypting the data and possibly the signatures in the backup vault. After doing this, the complete set of documents from the time of your most recent incremental backup will be available in docsrestore.

Using this incremental system also allows you to restore your data in the state in the last backup before a given time. For example, to retrieve my ~/Documents directory as it was three days ago, I might run this:

$ duplicity --time 3D \
    sftp://user@backup.example.com:docsbackup \
    docsrestore

You can extend this to only restore particular files for large vaults, if you only need a particular file from the vault:

$ duplicity --time 3D \
    --file-to-restore private/eff.txt \
    sftp://user@backup.example.com:docsbackup \
    docsrestore

Automating

You should run your first full backup interactively to make sure it’s doing exactly what you need, but once you’re confident that everything is working correctly, you can set up a simple Bash script to run incremental backups for you. Here’s an example script, saved in $HOME/.local/bin/backup-remote:

#!/usr/bin/env bash

# Run keychain to recognise any agents holding decrypted keys we might need
# (optional, depending on your SSH key setup)
eval "$(keychain --eval --quiet)"

# Specify directory to back up, GnuPG key ID, and remote username and
# hostname
keyid=tom@sanctum.geek.nz
local=/home/tom/Documents
remote=sftp://user@backup.example.com/docbackups

# Run backup with duplicity
/usr/bin/duplicity --encrypt-key "$keyid" -- "$local" "$remote"

The line with keychain is optional, but will be necessary if you’re using an SSH key with a passphrase on it; you’ll also need to have authenticated with ssh-agent at least once. See the earlier article on SSH/GPG agents for details on this setup.

Don’t forget to make the script executable:

$ chmod +x ~/.local/bin/backup-remote

You can then have cron(8) call this for you every week, running it as your user, by editing your user crontab(5) file:

$ crontab -e

The following line would run this script every morning, beginning at 6.00am:

0 6 * * *   ~/.local/bin/backup-remote

Tips

A few general best practices apply to this, consistent with the Tao of Backup:

  • Check that your backups completed; either have the output of the cron script mailed to you, or log it to a file that you check at least occasionally to make sure your backups are working. I highly recommend using an email message, and including error output:

    MAILTO=you@example.com
    0 6 * * *   ~/.local/bin/backup-remote 2>&1
    
  • Run backups to your local servers too; this might prevent your backup provider from reading your files, but it won’t save them from being accidentally deleted.

  • Don’t forget to occasionally test-restore your backups to make sure they’re working correctly. It’s also wise to use duplicity verify on them occasionally, particularly if you don’t back up every day:

    $ duplicity verify sftp://user@remote.example.com/docbackups Documents
    Local and Remote metadata are synchronized, no sync needed.
    Last full backup date: Sat Jul 27 17:34:33 2013
    GnuPG passphrase:
    Verify complete: 2195 files compared, 0 differences found.
    
  • This incremental system means that you’ll likely only have to make full backups once, so you should back up too much data rather than too little; if you can spare the bandwidth and have the space, backing up your entire computer isn’t really that extreme.

  • Try not to depend too much on your remote backups; see them as a last resort, and work securely and with local backups as much as you can. Certainly, never rely on backups as a version control system; use Git for that.

Bash history expansion

Setting the Bash option histexpand allows some convenient typing shortcuts using Bash history expansion. The option can be set with either of these:

$ set -H
$ set -o histexpand

It’s likely that this option is already set for all interactive shells, as it’s on by default. The manual, man bash, describes these features as follows:

-H  Enable ! style history substitution. This option is on
    by default when the shell is interactive.

You may have come across this before, perhaps to your annoyance, in the following error message that comes up whenever ! is used in a double-quoted string, or without being escaped with a backslash:

$ echo "Hi, this is Tom!"
bash: !": event not found

If you don’t want the feature and thereby make ! into a normal character, it can be disabled with either of these:

$ set +H
$ set +o histexpand

History expansion is actually a very old feature of shells, having been available in csh before Bash usage became common.

This article is a good followup to Better Bash history, which among other things explains how to include dates and times in history output, as these examples do.

Basic history expansion

Perhaps the best known and most useful of these expansions is using !! to refer to the previous command. This allows repeating commands quickly, perhaps to monitor the progress of a long process, such as disk space being freed while deleting a large file:

$ rm big_file &
[1] 23608
$ du -sh .
3.9G    .
$ !!
du -sh .
3.3G    .

It can also be useful to specify the full filesystem path to programs that aren’t in your $PATH:

$ hdparm
-bash: hdparm: command not found
$ /sbin/!!
/sbin/hdparm

In each case, note that the command itself is printed as expanded, and then run to print the output on the following line.

History by absolute index

However, !! is actually a specific example of a more general form of history expansion. For example, you can supply the history item number of a specific command to repeat it, after looking it up with history:

$ history | grep expand
 3951  2012-08-16 15:58:53  set -o histexpand
$ !3951
set -o histexpand

You needn’t enter the !3951 on a line by itself; it can be included as any part of the command, for example to add a prefix like sudo:

$ sudo !3850

If you include the escape string \! as part of your Bash prompt, you can include the current command number in the prompt before the command, making repeating commands by index a lot easier as long as they’re still visible on the screen.

History by relative index

It’s also possible to refer to commands relative to the current command. To subtitute the second-to-last command, we can type !-2. For example, to check whether truncating a file with sed worked correctly:

$ wc -l bigfile.txt
267 bigfile.txt
$ printf '%s\n' '11,$d' w | ed -s bigfile.txt
$ !-2
wc -l bigfile.txt
10 bigfile.txt

This works further back into history, with !-3, !-4, and so on.

Expanding for historical arguments

In each of the above cases, we’re substituting for the whole command line. There are also ways to get specific tokens, or words, from the command if we want that. To get the first argument of a particular command in the history, use the !^ token:

$ touch a.txt b.txt c.txt
$ ls !^
ls a.txt
a.txt

To get the last argument, add !$:

$ touch a.txt b.txt c.txt
$ ls !$
ls c.txt
c.txt

To get all arguments (but not the command itself), use !*:

$ touch a.txt b.txt c.txt
$ ls !*
ls a.txt b.txt c.txt
a.txt  b.txt  c.txt

This last one is particularly handy when performing several operations on a group of files; we could run du and wc over them to get their size and character count, and then perhaps decide to delete them based on the output:

$ du a.txt b.txt c.txt
4164    a.txt
5184    b.txt
8356    c.txt
$ wc !*
wc a.txt b.txt c.txt
16689    94038  4250112 a.txt
20749   117100  5294592 b.txt
33190   188557  8539136 c.txt
70628   399695 18083840 total
$ rm !*
rm a.txt b.txt c.txt

These work not just for the preceding command in history, but also absolute and relative command numbers:

$ history 3
 3989  2012-08-16 16:30:59  wc -l b.txt
 3990  2012-08-16 16:31:05  du -sh c.txt
 3991  2012-08-16 16:31:12  history 3
$ echo !3989^
echo -l
-l
$ echo !3990$
echo c.txt
c.txt
$ echo !-1*
echo c.txt
c.txt

More generally, you can use the syntax !n:w to refer to any specific argument in a history item by number. In this case, the first word, usually a command or builtin, is word 0:

$ history | grep bash
 4073  2012-08-16 20:24:53  man bash
$ !4073:0
man
What manual page do you want?
$ !4073:1
bash

You can even select ranges of words by separating their indices with a hyphen:

$ history | grep apt-get
 3663  2012-08-15 17:01:30  sudo apt-get install gnome
$ !3663:0-1 purge !3663:3
sudo apt-get purge gnome

You can include ^ and $ as start and endpoints for these ranges, too. 3* is a shorthand for 3-$, meaning “all arguments from the third to the last.”

Expanding history by string

You can also refer to a previous command in the history that starts with a specific string with the syntax !string:

$ !echo
echo c.txt
c.txt
$ !history
history 3
 4011  2012-08-16 16:38:28  rm a.txt b.txt c.txt
 4012  2012-08-16 16:42:48  echo c.txt
 4013  2012-08-16 16:42:51  history 3

If you want to match any part of the command line, not just the start, you can use !?string?:

$ !?bash?
man bash

Be careful when using these, if you use them at all. By default it will run the most recent command matching the string immediately, with no prompting, so it might be a problem if it doesn’t match the command you expect.

Checking history expansions before running

If you’re paranoid about this, Bash allows you to audit the command as expanded before you enter it, with the histverify option:

$ shopt -s histverify
$ !rm
$ rm a.txt b.txt c.txt

This option works for any history expansion, and may be a good choice for more cautious administrators. It’s a good thing to add to one’s .bashrc if so.

If you don’t need this set all the time, but you do have reservations at some point about running a history command, you can arrange to print the command without running it by adding a :p suffix:

$ !rm:p
rm important-file

In this instance, the command was expanded, but thankfully not actually run.

Substituting strings in history expansions

To get really in-depth, you can also perform substitutions on arbitrary commands from the history with !!:gs/pattern/replacement/. This is getting pretty baroque even for Bash, but it’s possible you may find it useful at some point:

$ !!:gs/txt/mp3/
rm a.mp3 b.mp3 c.mp3

If you only want to replace the first occurrence, you can omit the g:

$ !!:s/txt/mp3/
rm a.mp3 b.txt c.txt

Stripping leading directories or trailing files

If you want to chop a filename off a long argument to work with the directory, you can do this by adding an :h suffix, kind of like a dirname call in Perl:

$ du -sh /home/tom/work/doc.txt
$ cd !$:h
cd /home/tom/work

To do the opposite, like a basename call in Perl, use :t:

$ ls /home/tom/work/doc.txt
$ document=!$:t
document=doc.txt

Stripping extensions or base names

A bit more esoteric, but still possibly useful; to strip a file’s extension, use :r:

$ vi /home/tom/work/doc.txt
$ stripext=!$:r
stripext=/home/tom/work/doc

To do the opposite, to get only the extension, use :e:

$ vi /home/tom/work/doc.txt
$ extonly=!$:e
extonly=.txt

Quoting history

If you’re performing substitution not to execute a command or fragment but to use it as a string, it’s likely you’ll want to quote it. For example, if you’ve just found through experiment and trial and error an ideal ffmpeg command line to accomplish some task, you might want to save it for later use by writing it to a script:

$ ffmpeg -f alsa -ac 2 -i hw:0,0 -f x11grab -r 30 -s 1600x900 \
> -i :0.0+1600,0 -acodec pcm_s16le -vcodec libx264 -preset ultrafast \
> -crf 0 -threads 0 "$(date +%Y%m%d%H%M%S)".mkv 

To make sure all the escaping is done correctly, you can write the command into the file with the :q modifier:

$ echo '#!/usr/bin/env bash' >ffmpeg.sh
$ echo !ffmpeg:q >>ffmpeg.sh

In this case, this will prevent Bash from executing the command expansion "$(date ... )", instead writing it literally to the file as desired. If you build a lot of complex commands interactively that you later write to scripts once completed, this feature is really helpful and saves a lot of cutting and pasting.

Thanks to commenter Mihai Maruseac for pointing out a bug in the examples.