Backing Up Your Server, Locally and Remotely

After my recent hard drive failure, improving my backup strategy was high on my list. My server has two drives, so what I wanted was a local backup on the second drive for quick recovery if the primary drive failed, and a remote backup somewhere in case the whole server tanked. It also had to be cost-effective and automated.

First I had to pick a remote backup option. I looked at the backup options from my hosting provider, and at getting a second server, but both were too expensive. I have about 200 GB to back up, so costs add up fast. I finally settled on Amazon’s S3 service. It’s affordable, it’s outside of my data center, it’s fast (I get about 10 Mbit/sec to and from S3), and it should be pretty reliable.

Once I’d decided on S3, I started looking at software to handle the backup. Initially I tried backup-manager, but I ran into several issues with it. First, its S3 upload can’t handle a file larger than 2 GB, so you can’t use the tar archiver; you have to use the dar archiver, which doesn’t handle incrementals. It also seemed a bit flaky overall: the S3 uploads would fail without any useful error messages, the dar archiving burned a ton of CPU, and so on. So I gave up on it.

I ended up rolling my own. I have a bash script that backs everything up to the secondary hard drive using rsync, and then backs that up to S3 using the s3cmd utility. You can’t back up to S3 directly if the directory you’re backing up contains files that may disappear during the S3 sync. For instance, if you’re backing up your users’ home directories, the s3cmd sync will first catalog all the files it needs to back up, and then sync them up to Amazon. If some of those files no longer exist by then (such as e-mail messages that have been deleted), the s3cmd sync fails with an error. The same goes for log files, spool files, blog and gallery cache files, and so on. So even if you don’t have a secondary drive, you need to rsync your files over to a backup area before you sync them up to S3.
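
In outline, each directory makes the trip in two hops: rsync copies it to a quiet staging area, and s3cmd then pushes that stable copy up to S3. A minimal sketch (the paths and bucket name here are placeholders, not the real ones from the script):

```shell
# Hop 1: rsync the live tree into a staging area nothing else touches
rsync -avh /home/ /data/backup/home
# Hop 2: sync the stable staging copy up to S3
s3cmd sync /data/backup/home/ s3://mybucket-backup/home/
```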

I also wanted a bit of a safety net in case I accidentally delete a file. So the rsync only syncs deletes over every other Sunday, and on the alternate Sundays the S3 sync propagates deletes up to S3. That means any file I delete still exists somewhere for one to two weeks.
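
The scheduling rule can be sketched as a small function (the function name and labels are mine, for illustration; the script itself uses expr). Note the 10# prefix: in pure bash arithmetic, a zero-padded week number like 08 would otherwise be read as invalid octal.

```shell
#!/bin/bash
# Illustration only: pick which flavor of backup runs on a given day.
# day: 0-6 from `date +%w` (0 = Sunday); week: 00-53 from `date +%W`.
choose_mode() {
    local day=$1 week=$2
    if [ "$day" -ne 0 ]; then
        echo "normal"        # weekdays: non-destructive, locally and to S3
    elif [ $((10#$week % 2)) -eq 0 ]; then
        echo "local-clean"   # even Sundays: deletes reach the local backup
    else
        echo "s3-clean"      # odd Sundays: deletes reach S3
    fi
}

choose_mode "$(date +%w)" "$(date +%W)"
```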

Read on to see how the backups work and to get the bash script I’m using.

Getting Started

First you’ll need to sign up for an Amazon S3 account. If you have an Amazon account already, this is very easy. You can start at the Amazon Web Services page, and use the sign up button on the right of the page.

Next you’ll need to download and install the s3cmd tool. Get the latest version, as it fixes several bugs. Then run it with the --configure option to set up your S3 account information.

s3cmd --configure

Then you’ll need to create a bucket (basically a directory) in your S3 account to back up into. Bucket names have to be unique across all of S3, not just your account, so pick something like “mydomain-backup” instead of “backup”.

s3cmd mb s3://mydomain-backup

You’ll also need to create the backup directories used by the script; you can see them below.
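
With the paths the script uses, that one-time setup could look like this (run as root, and adjust the paths to match any edits you make to the script):

```shell
# Staging area on the secondary drive, one subdirectory per backed-up tree
mkdir -p /data/backup/etc /data/backup/opt /data/backup/home /data/backup/var /data/backup/usr
# Dump directories for the database and Subversion backups
mkdir -p /var/backups/mysql /var/backups/postgres /var/backups/svn
```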

Now you’re ready to go.

What The Backup Script Does

My script, backup.sh, is below, but I’ll walk through it here first:

Line 2: set the HOME environment variable. This should point to the home directory of the user that ran s3cmd --configure, which now contains the “.s3cfg” file with the S3 account information needed to run the s3cmd sync. You’ll need this for the script to run correctly via cron.

Line 5: we’re defining a function to back up to my secondary drive (/data/). I’m backing up /etc, /opt, /home, /var, and /usr. You should change this to reflect the directories you want to back up, and change the destination directories to wherever you want the local backup to live. This function is non-destructive, i.e. it doesn’t sync deletes over.

Line 14: defining a function to back up from the local backup to S3. You’ll need to change the local backup path, and the S3 bucket name and paths.

Line 23: this is the same as the function on line 5, except this DOES sync deleted files over.

Line 32: this is the same as the function on line 14, except this DOES sync deleted files over.

Line 41: this function does a full dump of your MySQL databases. You need to put in your root or admin-level username and password here on line 43. The function also keeps the latest 14 days worth of backups, in case you need to roll your database back to a specific point.

Line 50: this function does a full dump of your postgres databases. The function also keeps the latest 14 days worth of backups, in case you need to roll your database back to a specific point.

Line 58: this function does a full dump of your Subversion repository. The function also keeps the latest 14 days worth of backups, in case you need to roll your repository back to a specific point.

Line 67: we’re done defining functions; now it’s time to actually back up the databases and the Subversion repository.

Line 72: now we export the list of all your installed packages (on Debian). This list will make rebuilding a new server much easier. If you aren’t using Debian, just delete this.

Line 77: get the current day of the week. If it’s Sunday (0), then we’ll sync deletes either to the local backup or to S3.

Line 81: if the week of the year number is even, then we run the local clean function, which syncs deletes into the local backup, and the normal S3 backup.

Line 87: if the week of the year number is odd, then we run the normal local backup, and the S3 clean function, which syncs deletes up to the S3 backup.

Line 92: if it’s not a Sunday, just do the normal backups, locally and to S3.

I have this script called by cron every morning.
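
A crontab entry for that could look something like the following (the 4:30 a.m. time and the script path are just examples; edit root’s crontab with crontab -e):

```shell
# run the backup every morning at 4:30, logging output for troubleshooting
30 4 * * * /root/backup.sh >> /var/log/backup.log 2>&1
```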

The Script

#!/bin/bash
export HOME=/home/$USERNAME

# backup to the secondary drive
function backup_local {
    rsync -avh /etc/ /data/backup/etc
    rsync -avh /opt/ /data/backup/opt
    rsync -avh /home/ /data/backup/home
    rsync -avh /var/ /data/backup/var
    rsync -avh /usr/ /data/backup/usr
}

# backup to S3
function backup_s3 {
    s3cmd sync /data/backup/etc/ s3://digitalsanctuary-backup/etc/
    s3cmd sync /data/backup/opt/ s3://digitalsanctuary-backup/opt/
    s3cmd sync /data/backup/home/ s3://digitalsanctuary-backup/home/
    s3cmd sync /data/backup/var/ s3://digitalsanctuary-backup/var/
    s3cmd sync /data/backup/usr/ s3://digitalsanctuary-backup/usr/
}

# backup to the secondary drive with deletes
function backup_local_clean {
    rsync -avh --delete /etc/ /data/backup/etc
    rsync -avh --delete /opt/ /data/backup/opt
    rsync -avh --delete /home/ /data/backup/home
    rsync -avh --delete /var/ /data/backup/var
    rsync -avh --delete /usr/ /data/backup/usr
}

# backup to S3 with deletes
function backup_s3_clean {
    s3cmd sync --delete-removed /data/backup/etc/ s3://digitalsanctuary-backup/etc/
    s3cmd sync --delete-removed /data/backup/opt/ s3://digitalsanctuary-backup/opt/
    s3cmd sync --delete-removed /data/backup/home/ s3://digitalsanctuary-backup/home/
    s3cmd sync --delete-removed /data/backup/var/ s3://digitalsanctuary-backup/var/
    s3cmd sync --delete-removed /data/backup/usr/ s3://digitalsanctuary-backup/usr/
}

# backup mysql
function backup_mysql {
    # Backing up each database...
    mysqldump -u$USERNAME -p$PASSWORD --opt --all-databases | bzip2 > /var/backups/mysql/dump_`date "+%Y%m%d"`.bz2

    # Removing backups older than fourteen days...
    find /var/backups/mysql -mtime +14 -exec rm -f {} \;
}

# backup postgres
function backup_postgres {
    # Backing up each database...
    su - postgres -c 'pg_dumpall -o | bzip2 > /var/backups/postgres/pgdumpall_`date "+%Y%m%d"`.bz2'

    # Removing backups older than fourteen days...
    find /var/backups/postgres -mtime +14 -exec rm -f {} \;
}

# backup subversion
function backup_svn {
    svnadmin dump /var/svn > /var/backups/svn/svn_`date "+%Y%m%d"`.dump

    # Removing backups older than fourteen days... 
    find /var/backups/svn -mtime +14 -exec rm -f {} \;
}

# do the application backups
backup_mysql
backup_postgres
backup_svn

# export the list of installed apt packages
dpkg -l > /var/backups/apt-installed.txt

# do the file backups

# if it's Sunday, then either clean the local backup, or the S3 backup
DY=`date +%w`
if [ $DY = 0 ]; then
    WK=`date +%W`
    ON_WK=`expr $WK % 2`
    if [ $ON_WK = 0 ]; then
        # On even weeks, clean out the local backup
        backup_local_clean
        backup_s3
    else
        # On odd weeks, clean out the S3 backup
        backup_local
        backup_s3_clean
    fi
else
    # Normal Backup
    backup_local
    backup_s3
fi