Back up your files with rsync and a VPS

rsync is an open source tool for copying and synchronizing files, both locally and remotely, across Unix-based systems. It’s not only fast but also smart: its delta-transfer algorithm detects the “delta” between the old and new versions of a file. This reduces the amount of data sent over the network by transferring only the pieces of files that changed.

Let’s say you have a folder named a that contains files b.txt, c.txt, and d.txt. When you first use rsync to copy that folder to another location, every file is transferred in full. But let’s say you then add a few sentences to c.txt and run rsync again. This time, rsync skips b.txt and d.txt, because they haven’t changed, and sends only the changed portions of c.txt (plus a little checksum overhead it uses to find them).

That makes rsync great for backups, because it can dramatically reduce the amount of data sent over the network. Many people either set up a VPS dedicated to backing up important personal documents or use the leftover disk space on an existing VPS.

So, let’s walk through one option for an automated backup system. This isn’t meant to be a definitive guide to using rsync as a backup tool; it only covers using your VPS as a remote backup for files you keep on your local machine.

But once you have these fundamentals figured out, you’ll be able to apply them to a variety of use cases, such as setting up automated backups between two VPSes.

May 1: Updated with an improved version of the rsync + Bash script that actually works! Thanks to John for helping track this one down.

Prerequisites

  • A local machine running Linux or OS X
  • A virtual private server running Ubuntu 12.04/14.04/16.04, Fedora 22, Debian 7/8, CentOS 6/7
  • SSH access to your VPS

Rsync basics

If rsync isn’t installed on your system yet, you can get it with your package manager.

$ sudo apt-get install rsync   # for Debian/Ubuntu
$ sudo yum install rsync       # for Fedora/CentOS

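The most basic rsync operation is between two local folders. The -a option enables “archive” mode, which is equivalent to -rlptgoD: recurse into directories and preserve symlinks, permissions, modification times, group, owner, and device and special files (check the man page for more information). The -v option enables “verbose” mode, which makes troubleshooting easier.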

$ rsync -av source/ destination/

Or, when copying between a local machine and a remote one, such as a VPS, over the SSH protocol:

$ rsync -av -e ssh source/ user@host:/path/to/destination/

The ‘sync’ backup

For the sake of this tutorial, we are going to back up a folder called /home/joel/saveme/. We’re going to back up those files to a VPS located at the IP address 123.45.67.89, with a user of joelvps, and with a destination folder of /home/joelvps/backup/. Obviously, you’ll need to change those values according to your own setup.

We’re going to start with a simple sync, which mimics the kind of syncing that Dropbox does, for example. All we’re doing is creating an exact copy of /home/joel/saveme on the VPS and then copying over any subsequent changes.

These commands are performed on your local machine, not the VPS.

$ rsync -av -e ssh /home/joel/saveme/ joelvps@123.45.67.89:/home/joelvps/backup/
sending incremental file list
./
file1.txt
file2.png
file3.html

sent 247 bytes  received 76 bytes  646.00 bytes/sec
total size is 0  speedup is 0.00

Because I have SSH keys set up, I only have to enter my SSH key passphrase. If you use password-based logins, you’ll have to enter your VPS user’s password here.

Now, all the files are replicated on the VPS. What if I make some changes to file1.txt? This is where rsync works its magic—the next time I run the command, rsync determines what’s been changed, and sends only that data across to the backup folder on my VPS.

$ rsync -av -e ssh /home/joel/saveme/ joelvps@123.45.67.89:/home/joelvps/backup/
sending incremental file list
./
file1.txt

sent 240 bytes  received 38 bytes  185.33 bytes/sec
total size is 71  speedup is 0.26

See how only file1.txt was copied this time around? Now the folders are synchronized every time I run that command—almost.

Let’s say that I end up not wanting file2.png anymore, and I delete it. I probably don’t need it backed up on the VPS now, either, right? Well, by default rsync leaves that file in the destination folder indefinitely. If you want rsync to delete any file from the destination that is not in the source, use the --delete flag.

$ rsync -av --delete -e ssh /home/joel/saveme/ joelvps@123.45.67.89:/home/joelvps/backup/
sending incremental file list
deleting file2.png

sent 105 bytes  received 25 bytes  86.67 bytes/sec
total size is 71  speedup is 0.55

Now I’m really synced up. But, sometimes a sync isn’t enough—that’s where snapshots come in.

The ‘snapshot’ backup

The goal of my snapshots is to retain a complete copy of the saveme folder and preserve its state indefinitely, because you never know when something might go wrong. What if I accidentally delete a file and then rsync over the backup, too? What if I make a change to a file and then regret it? What if I rsync corrupted data to my backup folder? By creating snapshots at a chosen interval—whether hourly, daily, weekly, monthly, or something else entirely—you give yourself more options in worst-case scenarios.

rsync’s --link-dest option allows you to create space-efficient snapshots.

I’m going to take regular snapshots of my saveme folder. Instead of running one-off commands, let’s create a Bash script.

#!/bin/bash

# Create a timestamp, e.g. 2016-05-01T20_00_00
date=$(date "+%Y-%m-%dT%H_%M_%S")

# Specify the folder to snapshot on your local machine
SOURCE=/home/joel/saveme/

# Specify the destination folder on your VPS
DEST=/home/joelvps/snapshots

# Execute rsync, then repoint the "current" symlink on the VPS.
# --link-dest is relative to the new snapshot folder, so ../current
# refers to $DEST/current; unchanged files are hard-linked from it.
rsync -azvP \
  --delete \
  --link-dest=../current \
  "$SOURCE" "joelvps@123.45.67.89:$DEST/backup-$date" \
  && ssh joelvps@123.45.67.89 \
  "rm -rf $DEST/current \
  && ln -s $DEST/backup-$date $DEST/current"

The first time you run this script, with your customizations, the entire directory you specify under SOURCE will be copied to your VPS. The script will then create a symbolic link named current in your destination directory that points to this newest snapshot. On later runs, rsync’s --link-dest option uses current as its reference: any file that hasn’t changed since the previous snapshot is hard-linked to the copy already stored there, so each new snapshot only claims more space for new or changed files. (On the very first run current doesn’t exist yet, so rsync will likely print a warning about it and simply copy everything.) It’s a great way to conserve space while still keeping a complete copy of your files in every snapshot.

The only thing this script doesn’t currently do is automatically delete old snapshots as new ones are created. I prefer to do that manually, at least for now, but there are certainly ways to automate it. At the end of the above script, you could use find on the VPS (over ssh, like the symlink step) to discover snapshot directories more than 5 days old and delete them:

ssh joelvps@123.45.67.89 "find /home/joelvps/snapshots/ -mindepth 1 -maxdepth 1 -type d -name 'backup-*' -mtime +5 -exec rm -rf {} +"

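But, again, I’m going to play things cautiously for a while. One reason: because -a preserves modification times, each snapshot directory inherits the modification time of the saveme folder itself, so -mtime +5 measures how long ago saveme last had files added, removed, or renamed, not how old the snapshot is.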
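A sketch of an alternative that doesn’t rely on modification times at all, assuming the backup-<timestamp> naming from the script above: because the timestamps run from year down to second with zero padding, the folder names sort chronologically, so you can keep the newest N snapshots by name. The prune_snapshots function and the sample folder names are hypothetical, and the demo runs against a temporary folder rather than the VPS.

```shell
#!/bin/bash
set -eu

# Hypothetical helper: keep the newest $2 backup-* folders in $1, delete the rest.
# A reverse byte-wise sort of the timestamped names lists the newest first,
# and tail -n +K skips the first K-1 lines (the snapshots to keep).
prune_snapshots() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -name 'backup-*' \
    | LC_ALL=C sort -r \
    | tail -n +"$(( $2 + 1 ))" \
    | while IFS= read -r old; do
        echo "removing $old"
        rm -rf -- "$old"
      done
}

# Demo in a throwaway folder with five sample snapshot names
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

for day in 01 02 03 04 05; do
  mkdir "$tmp/backup-2016-05-${day}T20_00_00"
done
ln -s "$tmp/backup-2016-05-05T20_00_00" "$tmp/current"

# Keep the three newest snapshots; the current symlink is not matched by -type d
prune_snapshots "$tmp" 3
ls "$tmp"
```

To use something like this for real, you could save the function plus a call such as prune_snapshots /home/joelvps/snapshots 5 in a file (prune.sh is a hypothetical name) and run it on the VPS with ssh joelvps@123.45.67.89 'bash -s' < prune.sh.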

Automating everything with cron

The last step of this process is to automate each of the backups so that I don’t have to run them manually. Luckily, I have cron.

$ crontab -e

I can add both the daily synchronize command and the weekly snapshot script.

30 5 * * * rsync -av --delete -e ssh /home/joel/saveme/ joelvps@123.45.67.89:/home/joelvps/backup/
0 20 * * 5 /usr/local/bin/daily-snapshot.sh

The first line executes the synchronization at 5:30 a.m. every morning. The second line executes my backup script at 8:00 p.m. once per week, on Friday.

Note: Using cron with rsync only works if you set up a passphrase-less SSH key pair between your local machine and the VPS. Otherwise, ssh will try to prompt for a passphrase or password, and because a cron job has no terminal to type into, the backup will fail instead of running.
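Here’s a minimal sketch of creating a dedicated key without a passphrase for the cron jobs. The file name backup_ed25519 is an assumption, and Ed25519 keys need OpenSSH 6.5 or newer (use -t rsa -b 4096 on older systems). To keep the sketch harmless to run, it writes the key into a temporary folder; the commented lines show how you might install and use a real key.

```shell
#!/bin/bash
set -eu

# Demo in a throwaway folder; in practice the key would live in ~/.ssh
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# -N "" sets an empty passphrase so cron can use the key unattended,
# -C adds a comment to the public key, and -q keeps ssh-keygen quiet
ssh-keygen -t ed25519 -N "" -C "rsync backups" -f "$tmp/backup_ed25519" -q

ls -l "$tmp"

# Not run here: copy the public key to the VPS, then tell ssh which key to use.
#   ssh-copy-id -i ~/.ssh/backup_ed25519.pub joelvps@123.45.67.89
#   rsync -av --delete -e "ssh -i ~/.ssh/backup_ed25519" \
#     /home/joel/saveme/ joelvps@123.45.67.89:/home/joelvps/backup/
```

Since anyone who obtains the private key file can log in to the VPS as that user, keep it private; ssh-keygen already creates it readable only by you.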

Other tools

Of course, you can use one of the many backup-centric tools that other Linux users have created to automate these processes and make them a little more friendly for those less experienced with the nuances of rsync:

  • rsnapshot
  • Attic
  • rdiff-backup
  • TimeShift
