Server Backups to S3

Backing Up Your Server, Locally and Remotely

After my recent hard drive failure, improving my backup strategy was high on my list. My server has two drives, so I wanted two things: a local backup on the second drive for quick recovery if the primary drive failed, and a remote backup in case the whole server tanked. It also had to be cost-effective and automated.

First I had to pick a remote backup option. I looked at my hosting provider's backup service and at renting a second server, but both were too expensive. I have about 200 GB to back up, so costs add up pretty fast. Finally I settled on Amazon’s S3 service. It’s affordable, it’s outside my data center, it’s fast (I get about 10 Mbit/s to and from S3), and it should be pretty reliable.

Once I’d decided on S3, I started looking at software to handle the backup. Initially I tried backup-manager, but I ran into several issues with it. First, its S3 upload can’t handle files larger than 2 GB, so you can’t use the tar archiver. You have to use the dar archiver instead, and backup-manager doesn’t do incremental backups with dar. It also seemed a bit flaky overall: the S3 uploads would fail without any useful error messages, the dar archiving burned a ton of CPU, and so on. So I gave up on it.

I ended up rolling my own. I have a bash script that backs everything up to the secondary hard drive using rsync, and then backs that copy up to S3 using the s3cmd utility. You can’t back up to S3 directly if the directory you’re backing up contains files that may disappear during the S3 sync. For instance, if you’re backing up your users’ home directories, s3cmd sync first builds a list of all the files it needs to upload, and then transfers them to Amazon. If some of those files no longer exist by the time it gets to them (deleted e-mail messages, for example), s3cmd sync fails with an error. The same goes for log files, spool files, blog and gallery cache files, and so on. So even if you don’t have a secondary drive, you need to rsync your files to a staging area before you sync them up to S3.
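
A minimal bash sketch of this two-stage idea follows. It is not the actual script from this post. The staging path (`/mnt/backup`), the bucket name (`s3://example-backup-bucket`), the source list and the function names are hypothetical placeholders. The sketch also assumes s3cmd has already been set up with your AWS credentials (for example via `s3cmd --configure`).

```bash
#!/bin/bash
# Two-stage backup sketch: rsync live data to a staging area (e.g. the second
# drive), then sync that now-stable copy up to S3 with s3cmd.
# ASSUMPTIONS (hypothetical placeholders, adjust for your system):
#   STAGING_DIR - mount point of the backup drive / staging area
#   S3_BUCKET   - an existing bucket that s3cmd is configured to access
#   SOURCES     - space-separated directories to back up (no spaces in paths)
STAGING_DIR="${STAGING_DIR:-/mnt/backup}"
S3_BUCKET="${S3_BUCKET:-s3://example-backup-bucket}"
SOURCES="/home /etc /var/www"

# Stage 1: copy each source into the staging area.
# -a preserves permissions, ownership and timestamps; -R (--relative) keeps
# the full source path, so /home ends up under $STAGING_DIR/home.
stage_locally() {
    local src
    for src in $SOURCES; do
        rsync -aR "$src" "$STAGING_DIR/" || return 1
    done
}

# Stage 2: upload the staging area. Files here only change when stage 1 runs,
# so nothing vanishes between s3cmd's file scan and its upload.
push_to_s3() {
    s3cmd sync "$STAGING_DIR/" "$S3_BUCKET/"
}

# Only push to S3 if the local stage succeeded.
run_backup() {
    stage_locally && push_to_s3
}
```

To use it, you would append a call to `run_backup` and schedule the script with cron. The design choice is that s3cmd only ever scans the staging copy, which stays put between rsync runs, so disappearing mail or cache files can no longer break the upload.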

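I also wanted a bit of a safety net in case I accidentally deleted a file. So rsync only propagates deletions to the backup drive every other Sunday, and on the alternate Sundays the S3 sync propagates deletions up to S3. That means any file I delete still exists somewhere for at least a week: it stays on the backup drive until the next rsync deletion pass (up to two weeks later), and then stays on S3 until the following Sunday.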
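
The alternating-Sunday schedule comes down to deciding, each day, whether deletions may propagate and to which copy. Below is a small bash sketch of that decision. It assumes GNU `date` (for `-d`), and `delete_mode` is a hypothetical helper name. Which Sunday counts as the "local" one is an arbitrary choice here.

```bash
#!/bin/bash
# Decide which stage may propagate deletions on a given date (YYYY-MM-DD).
# Prints "local" on one Sunday, "remote" on the next, and "none" on other days.
# ASSUMPTION: the starting parity is arbitrary; it is fixed by counting whole
# weeks since the Unix epoch, so the alternation never skips or repeats a
# week across year boundaries. Requires GNU date.
delete_mode() {
    local day_of_week days weeks
    day_of_week=$(date -u -d "$1" +%u) || return 1   # 1=Mon ... 7=Sun
    if [ "$day_of_week" -ne 7 ]; then
        echo none
        return 0
    fi
    days=$(( $(date -u -d "$1" +%s) / 86400 ))       # whole days since 1970-01-01
    weeks=$(( days / 7 ))                            # +1 from one Sunday to the next
    if [ $(( weeks % 2 )) -eq 0 ]; then
        echo local    # e.g. add rsync --delete on this Sunday
    else
        echo remote   # e.g. add s3cmd sync --delete-removed on this Sunday
    fi
}
```

A nightly job could call `delete_mode "$(date +%F)"`. It would add rsync's `--delete` option when the result is `local` and `s3cmd sync`'s `--delete-removed` option when it is `remote`, and otherwise sync without deletions. Counting weeks since the epoch avoids the glitch you would get from ISO week-number parity in years with 53 ISO weeks.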

The rest of this post explains how the backups work and includes the bash script I’m using.