Back That Thing Up

Introduction

Three Sundays ago, my primary Mac OS X hard drive failed. Those of you who follow me on Twitter got something of a play-by-play as I discovered the depth of the failure: I got home to the Spinning Pinwheel of Death (SPOD), and quickly discovered that my computer would not wake from the screensaver or boot. However, I didn’t panic. Why? Because I have what I believe to be a relatively robust backup system for home use.

I can’t stress enough how important regular backups are. Data loss is one of my personal nightmares (well, that, and Lego or Andrle loss), since most of my life (professional and personal) is on the computer. Among other things, I’d lose every picture I’ve ever taken since freshman year of college, every homework assignment I’ve written on the computer since late 6th grade (when we got our first Mac), not to mention substantial configuration work and those precious saved games.

I sit atop what I call the Backup Tripod: regular clones to an external disk stored off-site, hourly incremental backups to a local disk or local network storage, and as-needed on-save synchronization to cloud storage. I’m sure there are many other articles out there that recommend a particular strategy, but this is my solution for Macs. I even convinced my parental units to use a similar setup. I’ll go into detail on what solutions I use and why (as well as recovery strategy) for each below the cut.

I can’t emphasize enough how important data backup is for the typical modern power user.

Offsite Clones

Why do an offsite clone? First, this is primarily to solve the “your apartment burned down” problem. You want to be able to recover as much of your data as possible, all at once, but you might be okay in that situation with a little bit of recent data loss. The latter can be mitigated, of course, by using the other two legs of the tripod in combination.

I’ve been using the excellent Carbon Copy Cloner for several years to make cloned backups of my primary hard drive. It’s a Mac-specific solution that has seen regular improvements over that time, including a migration to launchd as part of updating for Mac OS X 10.5. That migration enables some features I use, such as “initiate backup on mount”, which triggers my standard clone job as soon as my external drive is plugged in. Cloning the actual disk is of course only half of the solution: you are also responsible for storing your backup drive in a safe location, and for bringing it home for regular incremental updates, to minimize the annoyance during a recovery.

I store my home hard drive in a locked drawer in my office at work; my parents keep an old iPod with important files in their safe deposit box. I should also admit at this point that, for this particular failure, my clone backup was almost 4 months old, which made recovery a fair bit more annoying than it would have been had I kept my clone more up-to-date.
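One way to avoid a four-month-old clone is to have something nag you when the clone gets stale. Here is a minimal Python sketch of that idea. It assumes your clone job touches a marker file when it finishes; the marker file, the 30-day threshold, and the function names are all hypothetical illustrations, not features of Carbon Copy Cloner.

```python
import os
import time

# Hypothetical threshold: how old a clone may get before it counts as stale.
MAX_CLONE_AGE_DAYS = 30


def clone_age_days(marker_path, now=None):
    """Return the age, in days, of the marker file's last modification."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(marker_path)) / 86400.0


def clone_is_stale(marker_path, max_age_days=MAX_CLONE_AGE_DAYS, now=None):
    """True if the marker is missing or older than max_age_days."""
    if not os.path.exists(marker_path):
        # No marker at all: assume no clone has ever completed.
        return True
    return clone_age_days(marker_path, now) > max_age_days
```

Run from a scheduled job, a check like `clone_is_stale("/path/to/marker")` could trigger whatever reminder you are least likely to ignore.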

Incremental Local Backups

Why do incremental backups? This is primarily to smooth out the frequency of your backups: now instead of a backup only as recent as the last time you remembered to bring your clone drive home, you can have one as recent as a few minutes ago. The other major advantage is that, when configured properly, they are completely automated. For the most part, you shouldn’t even care that they’re set up; they’ll just run silently in the background, and you’ll only notice their existence when you need them to perform a recovery. Note that in this context, by “local” I mean “in the same building”, not necessarily on the same machine, since some network solutions are viable here.

Being a Mac junkie, I use Apple’s Time Machine. Given my experience with this recent near-loss of data, I can safely say that for me, this is now Apple’s most important product ever (no matter how magical and revolutionary the iPad may be, it hasn’t saved my bacon… yet). Since my primary computer is a Mac Pro desktop, I have Time Machine configured to back up to a second internal drive. My parental units, since they both have laptops, use a Time Capsule as their backup system (and primary wireless router).

I can’t recommend a particular incremental backup solution for Windows or Linux from personal experience, but TSOR suggests TimeVault for Ubuntu. At work they use a network version of IronMountain Connected Backup, which is at least the fourth Windows backup solution IT has tried since I started working there. So far it seems to have staying power, and I assume it’s working, but a running backup causes a massive I/O performance hit on my Windows XP development machine, making it essentially unusable until the backup finishes. (I should add that in my experience, Time Machine does not cause a performance hit, even when playing a processor-intensive 3D game like World of Warcraft. I believe this has to do with Apple’s implementation using FSEvents, although Siracusa’s review of Leopard explains the underpinnings of Time Machine better than I can.)

One of the major caveats of incremental backups, as I learned the hard way during this particular recovery, is that they are bad at dealing with single large files that change often: in other words, databases of various sorts, as well as writeable disk images. Examples include the iTunes Music Library file and the Mac OS 9 disk image used by the SheepShaver PowerPC emulator. These files can cause two problems. The first is that they may cause your backup to fail due to file locking (especially if the program using the file is currently running). The second is that, since most incremental backup systems operate at file granularity, they have to back up the entire file even if only a small portion of its data has changed. The latter leads to your incremental backup disk filling up with many full copies of the same file.
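To make the second problem concrete, here is a rough Python sketch of file-level change detection. It is not how Time Machine is implemented; it is a simplified stand-in that compares each file's size and modification time between two snapshots, and the function names are made up for illustration. It shows how a file that changed by a single byte still costs its full size in the next backup.

```python
import os


def snapshot(root):
    """Map relative path -> (size, mtime_ns) for every file under root."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            state[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return state


def bytes_to_copy(previous, current):
    """Total size of files that are new or changed since `previous`.

    A file-level backup copies each such file whole, no matter how
    little of its content actually changed.
    """
    return sum(
        size
        for path, (size, mtime_ns) in current.items()
        if previous.get(path) != (size, mtime_ns)
    )
```

With this model, flipping one byte in a multi-gigabyte disk image makes `bytes_to_copy` report the whole image, and it does so again at every snapshot in which the image was touched.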

This is where careful management of your incremental backup’s ignore list comes in: I had both of these files ignored in Time Machine, so I had to restore them from my clone drive. More on that in the Recovery section below.

Cloud Storage

Why use cloud storage? Or, in this case, a better question might be “Why not use a cloud backup service?”.

I’ll answer the second first: I don’t trust them. Maybe this is some bizarre geek control freak paranoia, but I just don’t believe that their service can actually deliver what they promise while simultaneously being sufficiently secure. There are at least three parts to this: the first is that I don’t believe that the service will necessarily be available when I need it (whether during backup or recovery), the second is that I don’t believe they’ll actually delete my data completely when I ask, and the third is that I don’t believe they can completely protect my data from an attack on their systems. To some extent, these factors are all true of existing cloud services I rely on (such as GMail), but that data is a lot less critical or private than the full contents of my hard drive (and, in the case of e-mail, I keep a full local cache). I haven’t used any of the major cloud backup services out there for full backups, so I don’t want to bash any of them by name, but I would argue that they all suffer from these potential problems.

That brings us to my (admittedly limited) use of the Dropbox service. I only use the free 2 GB level of service, and I use it primarily for temporary file transfer to my iPhone or iPad. However, for a handful of important files that I’m actively working on, I use it as a sort of temporary remote version control. That is, as I reach a point in a particular document that I want to make sure I don’t lose, I’ll save the file locally and then copy it to my Dropbox, generally not overwriting the previous copy. That means if I need to “restore”, I just retrieve the appropriate copy from my Dropbox. When I’m done, I delete that particular working folder.
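The “copy without overwriting” habit is easy to script. The following is a minimal Python sketch, assuming the Dropbox folder is just an ordinary local directory that the Dropbox client syncs. The `save_version` name and the timestamp naming scheme are illustrative choices, not anything Dropbox itself provides.

```python
import os
import shutil
import time


def save_version(src, dest_dir, now=None):
    """Copy src into dest_dir under a timestamped name; never overwrite.

    `now` is a Unix timestamp; it defaults to the current time.
    """
    os.makedirs(dest_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    base, ext = os.path.splitext(os.path.basename(src))
    candidate = os.path.join(dest_dir, f"{base}-{stamp}{ext}")
    counter = 1
    # If a copy with this timestamp already exists, add a numeric suffix.
    while os.path.exists(candidate):
        candidate = os.path.join(dest_dir, f"{base}-{stamp}-{counter}{ext}")
        counter += 1
    shutil.copy2(src, candidate)  # copy2 also preserves modification time
    return candidate
```

Each call leaves a new file behind, so “restoring” is just a matter of picking the copy with the timestamp you want; deleting the working folder when the project is done keeps the 2 GB quota in check.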

Recovery

Since I refer to this setup as the Backup Tripod, there must be a platform supported by the three legs described above: quick and relatively painless recovery of your lost data. How do you use these three levels of backup to restore your data after a disk failure? To a large extent this depends on the type of failure you experienced, and how broad it was.

If you’ve accidentally deleted a single file, or somehow produced a corrupted version, then pulling a previous copy from either your cloud storage or your hourly local backup is probably a sufficient level of recovery, particularly if the first iteration of the file was created since your last offsite backup. (Depending on what you’re working on, there may also be a place where a distributed version control system like git can play a role.) Deciding how many past versions to keep over the course of a work session is up to your discretion (and the importance of the file). Of course, if the “single file” is one of the files described above as being difficult to back up incrementally, you will probably have to recover the version from your clone drive. Again, regular backups here are critical, in order to make recovery possible.

If you’ve lost the entire drive, you have two options, assuming you were using Time Machine as your incremental backup solution: you can restore from the last successful hourly backup, or you can restore from your clone (obviously both require purchasing a new drive and replacing the old one). Most likely you need to do a combination of the two: restore from the incremental backup first, to get the most recent version of various files, and then manually restore any files you keep on the ignore list such as virtual machine images or databases, plus any “high priority” files you have stored “in the cloud”.
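The manual step is easier if you know exactly which files the incremental restore left out. As a rough Python sketch (the function names and the idea of comparing two mounted directory trees are assumptions for illustration), you can list every file that exists on the clone but not on the freshly restored volume:

```python
import os


def relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def missing_after_restore(restored_root, clone_root):
    """Files present on the clone but absent from the restored volume."""
    return sorted(relative_files(clone_root) - relative_files(restored_root))
```

The result is only a candidate list: anything on your ignore list should show up in it, but so will files you deleted on purpose after the clone was made, so review it before copying anything back.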

For my particular recovery, in the case of the aforementioned VM image, this was fine, since I hadn’t used it since my last complete clone; in the case of my music library, I had to reimport a number of apps from the backed up iTunes data folders and eliminate duplicates and stale versions. This was mostly a task of synchronizing the state of the iTunes Library with the fully backed up on-disk state. Annoying, but avoidable/minimized by keeping more recent clones than I had. This was also a case where I didn’t need the cloud storage leg at all, since the failure occurred while I wasn’t home, so I wasn’t actively working on an important file that I wouldn’t have wanted to lose if I had a local disaster.

Incidentally, my hard drives would probably be the first thing out of my apartment in the event of a fire. <.<

Conclusion

Those are the three legs of the Backup Tripod, balancing manual vs. automatic, offsite vs. local, and complete vs. incremental. I claim you need all of these features to ensure a complete, up-to-date backup under a variety of failure conditions.

If you follow the general gist of this guide, and keep good backup habits, there’s a reasonable chance that a complete hard drive failure will be nothing more than a minor inconvenience. You’ll have to order a new drive and wait for it to arrive (or pick one up at your favorite tech shop), and then most of the process is waiting several hours while your computer does the restore, during which time you can get outside or something. Maybe a little bit of manual file restoration. All in all, annoying, but not the massive tragedy that losing all your home data would be (and a hell of a lot cheaper than hiring a data recovery expert).

Seriously. Back. Up. DO EET NAOW!!!

(And while you’re at it, make sure the IT guys at your work are on top of this.)

Comments

One response to “Back That Thing Up”

  1. Leo

    Very cool suggestions. As a Linux egghead, I’ve been actively using a remote server powered by rdiff-backup to keep my extensive music, photography and .doc collection alive and well. I’m now looking at also doing further off-site backup (thinking of rsync.net which supports rdiff-backup) as uploading 50 GB over a DSL connection to another is taking me the better part of the week ;-)

    Will keep you posted on what I find.
