Server Backups to S3

Backing Up Your Server, Locally and Remotely

After my recent hard drive failure, improving my backup strategy was high on my list. My server has two drives, so I wanted a local backup on the second drive for quick recovery if the primary drive failed, and a remote backup in case the whole server tanked. It had to be cost-effective and automated.

First I had to pick a remote backup option. I looked at backup options from my hosting provider and at getting a second server, but both were too expensive. I have about 200 GB to back up, so things add up pretty fast. Finally I settled on Amazon’s S3 service. It’s affordable, it’s outside of my data center, it’s fast (I get about 10 Mbit/sec to and from S3), and it should be pretty reliable.

Once I’d decided on S3, I started looking at software to handle the backup. Initially I tried backup-manager, but I ran into several issues with it. First, its S3 upload can’t handle a file larger than 2 GB, so you can’t use the tar archiver. You have to use the dar archiver instead, which doesn’t handle incrementals. It also seemed a bit flaky overall: the S3 uploads would fail without any useful error messages, the dar archiving burned a ton of CPU, and so on. So I gave up on it.

I ended up rolling my own. I have a bash script that backs everything up to the secondary hard drive using rsync, and then backs that copy up to S3 using the s3cmd utility. You can’t back up to S3 directly if the directory you’re backing up contains files that may disappear during the S3 sync. For instance, if you’re backing up your users’ home directories, the s3cmd sync first catalogs all the files it needs to back up, and then uploads them to Amazon. If some of those files no longer exist by then (such as e-mail messages that have been deleted), the s3cmd sync fails with an error. The same goes for log files, spool files, blog and gallery cache files, and so on. So even if you don’t have a secondary drive, you need to rsync your files over to a backup area before you sync them up to S3.
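
The staging copy can hit a milder version of the same problem: on a busy /var or /home, files can vanish while rsync is running, and rsync reports that with exit code 24 ("partial transfer due to vanished source files") after copying everything else. Here is a minimal sketch of a wrapper that treats that one code as a warning, assuming you are happy to ignore vanished files. The name rsync_tolerant is hypothetical and is not part of the script in this post.

```shell
#!/bin/bash
# Hypothetical wrapper around rsync (not part of the original script).
# Exit code 24 means "partial transfer due to vanished source files",
# which is expected when copying live mail spools, logs, or caches.
rsync_tolerant() {
    rsync "$@"
    local rc=$?
    if [ "$rc" -eq 24 ]; then
        echo "rsync: some source files vanished during the copy (ignored)" >&2
        return 0
    fi
    # Any other exit code is passed through unchanged.
    return "$rc"
}

# Example call (commented out so the sketch has no side effects):
# rsync_tolerant -avh /home/ /data/backup/home
```

This only matters if something checks the exit status of the rsync step, for example a cron wrapper that mails you on failure.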

I also wanted a bit of a safety net, in case I accidentally delete a file. So I have the rsync only sync over deletes every other Sunday, and on alternate Sundays I have the S3 sync propagate deletes up to S3. That means that any file I delete exists somewhere for 1-2 weeks.
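
The alternating schedule is easier to see in isolation. The following is only a sketch of the decision, assuming the same `date +%w` and `date +%W` values the script uses. The function name pick_backup_mode and the words it prints are made up for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: decide which pair of backups to run.
# $1 = day of week as printed by `date +%w` (0 = Sunday)
# $2 = week of year as printed by `date +%W` (00-53)
pick_backup_mode() {
    local day=$1 week=$2
    if [ "$day" = 0 ]; then
        # expr reads "08" and "09" as decimal numbers;
        # bash's $(( )) would reject them as invalid octal.
        if [ "$(expr "$week" % 2)" = 0 ]; then
            echo "local_clean s3"    # even week: deletes reach the local copy
        else
            echo "local s3_clean"    # odd week: deletes reach S3
        fi
    else
        echo "local s3"              # any other day: nothing is deleted
    fi
}

pick_backup_mode "$(date +%w)" "$(date +%W)"
```

Because the local copy and the S3 copy are cleaned on different Sundays, a deleted file is still present in at least one of them until both cleanups have run.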

Read on to see how the backups work and to get the bash script I’m using.

Getting Started

First you’ll need to sign up for an Amazon S3 account. If you have an Amazon account already, this is very easy. You can start at the Amazon Web Services page, and use the sign up button on the right of the page.

Next you’ll need to download and install the s3cmd tool. Get the latest version, as it fixes several bugs. Then run it with the --configure option to set up your S3 account information.

s3cmd --configure
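
s3cmd --configure writes its settings to a .s3cfg file in the home directory of the user who ran it. Below is a small sketch for confirming that the file is where a later, non-interactive run will look for it. The helper name have_s3cfg is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: report whether a readable .s3cfg exists
# in the given home directory.
have_s3cfg() {
    local home_dir=$1
    if [ -r "$home_dir/.s3cfg" ]; then
        echo "found $home_dir/.s3cfg"
    else
        echo "no readable .s3cfg in $home_dir" >&2
        return 1
    fi
}

have_s3cfg "$HOME" || echo "run s3cmd --configure as this user first"
```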

Then you’ll need to create a bucket, basically a directory, in your S3 account to back up into. Bucket names have to be unique across all of S3, not just your account, so pick something like “mydomain-backup” instead of “backup”.

s3cmd mb s3://mydomain-backup

You’ll also need to create the backup directories used by the script: /data/backup for the local copies, plus /var/backups/mysql, /var/backups/postgres, and /var/backups/svn for the dumps.
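
A minimal sketch of that setup step, using the paths from my script. The function name make_backup_dirs and its optional prefix argument are hypothetical, added only so the sketch can be tried in a scratch directory before touching the real paths.

```shell
#!/bin/bash
# Create the directories the backup script writes to.
# $1 is an optional prefix (hypothetical, for a trial run);
# leave it empty to create the real paths, which normally requires root.
make_backup_dirs() {
    local prefix=${1:-}
    local d
    for d in data/backup/etc data/backup/opt data/backup/home \
             data/backup/var data/backup/usr \
             var/backups/mysql var/backups/postgres var/backups/svn; do
        mkdir -p "$prefix/$d"
    done
}

# Trial run in a throwaway directory; call make_backup_dirs with no
# argument to create the real directories.
scratch=$(mktemp -d)
make_backup_dirs "$scratch"
find "$scratch" -mindepth 3 -type d | sort
```

One thing to watch: the Postgres dump is written by a shell running as the postgres user, so /var/backups/postgres has to be writable by that user (for example by making postgres its owner).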

Now you’re ready to go.

What The Backup Script Does

My script, backup.sh, is below, but I’ll walk through it here first:

Line 2: set up the HOME directory. This should point to the home directory of the user who ran s3cmd --configure, which now contains a “.s3cfg” file with the S3 information needed to run the s3cmd sync. You’ll need this for the script to run correctly via cron.

Line 5: we’re defining a function to back up to my secondary drive (/data/). I’m backing up /etc, /opt, /home, /var, and /usr. You should change this to reflect the directories you want to back up. You should also change the destination directories to wherever you want the local backup to be. This function is non-destructive, i.e. it doesn’t sync deletes over.

Line 14: defining a function to back up from the local backup to S3. You’ll need to change the local backup path, and the S3 bucket name and paths.

Line 23: this is the same as the function on line 5, except this DOES sync deleted files over.

Line 32: this is the same as the function on line 14, except this DOES sync deleted files over.

Line 41: this function does a full dump of your MySQL databases. You need to put in your root or admin-level username and password on line 43. The function also keeps the latest 14 days’ worth of backups, in case you need to roll your database back to a specific point.

Line 50: this function does a full dump of your Postgres databases. The function also keeps the latest 14 days’ worth of backups, in case you need to roll your database back to a specific point.

Line 59: this function does a full dump of your Subversion repository. The function also keeps the latest 14 days’ worth of backups, in case you need to roll your repository back to a specific point.

Line 67: we’re done defining functions, now it’s time to actually back up the databases and Subversion repositories.

Line 72: now we export out the list of all your apt-get installed packages (on Debian). This list will make rebuilding a new server much easier. If you aren’t using Debian, just delete this.

Line 77: get the current day of the week. If it’s Sunday (0), then we’ll either do a sync of deletes to the local backup, or to S3.

Line 81: if the week of the year number is even, then we run the local clean function, which syncs deletes into the local backup, and the normal S3 backup.

Line 87: if the week of the year number is odd, then we run the normal local backup, and the S3 clean function, which syncs deletes up to the S3 backup.

Line 92: if it’s not a Sunday, just do the normal backups, locally and to S3.
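
The 14-day retention in the three dump functions relies on find’s -mtime test, which counts age in whole days, so +14 matches files that are at least 15 days old. Here is a small sketch to try that behaviour on a scratch directory. The function name prune_old_dumps is hypothetical, and the added -type f (so that only regular files are removed) is my assumption rather than something the script does.

```shell
#!/bin/bash
# Delete regular files under $1 whose modification time is more than
# $2 whole days in the past. Hypothetical helper; the script itself
# calls find directly and without -type f.
prune_old_dumps() {
    local dir=$1 days=$2
    find "$dir" -type f -mtime +"$days" -exec rm -f {} \;
}

# Trial run: one fresh file and one that looks 20 days old.
scratch=$(mktemp -d)
touch "$scratch/dump_new.bz2"
touch -d '20 days ago' "$scratch/dump_old.bz2"   # GNU touch syntax
prune_old_dumps "$scratch" 14
ls "$scratch"
```

After the trial run only dump_new.bz2 is left in the scratch directory.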

I have this script called by cron every morning.
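
For reference, a cron entry for this could look like the line built below. This is only a sketch: the 04:30 schedule, the /root/backup.sh location, and the log file path are all assumptions, so adjust them to your setup.

```shell
#!/bin/bash
# Hypothetical crontab line: run the script at 04:30 every day and
# append its output to a log file. Schedule and both paths are assumptions.
cron_line='30 4 * * * /root/backup.sh >> /var/log/backup.log 2>&1'

# Print it so it can be pasted into `crontab -e`.
echo "$cron_line"
```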

The Script

#!/bin/bash
export HOME=/root  # assumed path: use the home of the user who ran s3cmd --configure

# backup to the secondary drive
function backup_local {
    rsync -avh /etc/ /data/backup/etc
    rsync -avh /opt/ /data/backup/opt
    rsync -avh /home/ /data/backup/home
    rsync -avh /var/ /data/backup/var
    rsync -avh /usr/ /data/backup/usr
}

# backup to S3
function backup_s3 {
    s3cmd sync /data/backup/etc/ s3://digitalsanctuary-backup/etc/
    s3cmd sync /data/backup/opt/ s3://digitalsanctuary-backup/opt/
    s3cmd sync /data/backup/home/ s3://digitalsanctuary-backup/home/
    s3cmd sync /data/backup/var/ s3://digitalsanctuary-backup/var/
    s3cmd sync /data/backup/usr/ s3://digitalsanctuary-backup/usr/
}

# backup to the secondary drive with deletes
function backup_local_clean {
    rsync -avh --delete /etc/ /data/backup/etc
    rsync -avh --delete /opt/ /data/backup/opt
    rsync -avh --delete /home/ /data/backup/home
    rsync -avh --delete /var/ /data/backup/var
    rsync -avh --delete /usr/ /data/backup/usr
}

# backup to S3 with deletes
function backup_s3_clean {
    s3cmd sync --delete-removed /data/backup/etc/ s3://digitalsanctuary-backup/etc/
    s3cmd sync --delete-removed /data/backup/opt/ s3://digitalsanctuary-backup/opt/
    s3cmd sync --delete-removed /data/backup/home/ s3://digitalsanctuary-backup/home/
    s3cmd sync --delete-removed /data/backup/var/ s3://digitalsanctuary-backup/var/
    s3cmd sync --delete-removed /data/backup/usr/ s3://digitalsanctuary-backup/usr/
}

# backup mysql
function backup_mysql {
    # Backing up each database...
    mysqldump -u$USERNAME -p$PASSWORD --opt --all-databases | bzip2 > /var/backups/mysql/dump_`date "+%Y%m%d"`.bz2

    # Removing backups older than fourteen days...
    find /var/backups/mysql -mtime +14 -exec rm -f {} \;
}

# backup postgres
function backup_postgres {
    # Backing up each database...
    su - postgres -c 'pg_dumpall -o | bzip2 > /var/backups/postgres/pgdumpall_`date "+%Y%m%d"`.bz2'

    # Removing backups older than fourteen days...
    find /var/backups/postgres -mtime +14 -exec rm -f {} \;
}

# backup subversion
function backup_svn {
    svnadmin dump /var/svn > /var/backups/svn/svn_`date "+%Y%m%d"`.dump

    # Removing backups older than fourteen days...
    find /var/backups/svn -mtime +14 -exec rm -f {} \;
}

# do the application backups
backup_mysql
backup_postgres
backup_svn

# export the list of installed apt packages
dpkg -l > /var/backups/apt-installed.txt

# do the file backups

# if it’s Sunday, then either clean the local backup, or the S3 backup
DY=`date +%w`
if [ $DY = 0 ]; then
    WK=`date +%W`
    ON_WK=`expr $WK % 2`
    if [ $ON_WK = 0 ]; then
        # On even weeks, clean out the local backup
        backup_local_clean
        backup_s3
    else
        # On odd weeks, clean out the S3 backup
        backup_local
        backup_s3_clean
    fi
else
    # Normal Backup
    backup_local
    backup_s3
fi


Posted

July 10, 2009

Linux

Devon

Tags:

backup, rsync, s3

Comments

5 responses to “Server Backups to S3”

Devon

July 10, 2009

Oh! If any bash script guru types would like to offer improvements to my hacked-together script, please feel free. I’m no expert.

Val L33

April 11, 2012

Hey Devon,

I am looking to bake in my own postgres backup script to do a daily backup of my databases and send them to S3.
Your script is great, but don’t you need to timestamp the postgres dumps before you ship them off to S3?
How would you find which db to fall back on if something goes wrong?

Thx,

Val

1. Devon
  
  April 11, 2012
  
  Val,
  
  the datestamp (I’m only doing daily backups) is in the filename of the backup:
  
  su - postgres -c 'pg_dumpall -o | bzip2 > /var/backups/postgres/pgdumpall_`date "+%Y%m%d"`.bz2'
  
  1. Val L33
    
    April 11, 2012
    
    Oh, I got it, that part was not showing on the screen.
    Great script indeed.
    Thx
    
Esteban

May 19, 2012

Thanks for the great script. Very useful.
