Backup Your Files with rsync and SSH

By Jimmy Bonney | February 9, 2008


If you have many important files on your computer, you probably back them up somewhere from time to time: USB key, CD, server… the choice is yours. The problem is keeping that backup up to date. So what about making a backup à la “Time Machine”, which saves your files on a regular basis and lets you easily access either previous versions or the latest one? We will back up the files of the computer to our web hosting account / server.

For that, we’ll use the Linux tools rsync and SSH. These tools are usually included in Linux distributions; otherwise you can easily install them with your favorite package manager. Under Windows, it’s another story, but all is not lost. I will first describe how to install the tools under Windows (sometimes you just don’t get to choose your OS) and then describe the backup procedure, which applies to both Linux and Windows.

I assume here that you have SSH access to your remote server. SSH gives you an encrypted connection between your computer and the server and therefore prevents anyone from intercepting your backups while you send them.

Cygwin (SSH and Rsync)

Cygwin is a software package that lets you use Linux tools under Windows. The first thing to do is to download and install it. There is nothing special to say about it; the installation is pretty straightforward. In my experience, some of the FTP mirrors do not respond, so I recommend the following: ftp.easynet.be, ftp.gwdg.de, ftp.heanet.ie or mirror.calvin.edu. Install the default packages and, in addition, select the rsync and openssh packages. This will pull in some dependency packages as well; just go ahead. Once everything is installed, run Cygwin.bat from your installation directory and you will see something like:

[Screenshot: Cygwin console window]

SSH automatic authentication

To simplify the backup process, we will generate a private/public key pair so that your computer can authenticate automatically on the server. In your newly installed Cygwin console, type:

ssh-keygen -b 1024 -f identity -P '' -t dsa

ssh-keygen generates a private/public key pair. The key length in bits is given by the option -b 1024, the name of the file by -f identity, the passphrase by -P and finally the key type by -t dsa. We specify an empty passphrase here because we want to connect to the server automatically, without having to type any password. This can be dangerous, as we will see later, so please keep your private key PRIVATE. You should now have two new files in your C:\Documents and Settings\user directory: identity (private key) and identity.pub (public key).
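A note for current systems: OpenSSH 7.0 and later disable DSA (ssh-dss) keys by default, so a server running a recent OpenSSH may refuse the key generated above. A minimal sketch of the same step with an Ed25519 key, assuming OpenSSH 6.5 or later on both the client and the server; make_backup_key is a hypothetical helper, and -N '' is the option ssh-keygen documents for the new (here empty) passphrase:

```shell
# Hypothetical helper: create a passphrase-less Ed25519 key pair for unattended backups.
# Assumes OpenSSH 6.5+ (Ed25519 support) on both sides.
make_backup_key() {
    local keyfile="${1:-identity}"   # same base name as in the article by default
    # -q: quiet, -t: key type, -f: private key file (the public key goes to "$keyfile.pub"),
    # -N '': empty passphrase, -C: comment stored at the end of the public key
    ssh-keygen -q -t ed25519 -f "$keyfile" -N '' -C "backup-key"
}
```

Running make_backup_key identity produces the same identity / identity.pub pair as above, so the rest of the procedure is unchanged; only the first field of identity.pub reads ssh-ed25519 instead of ssh-dss.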

The public key file ends with a comment, by default in the form user@hostname. OpenSSH does not use this comment for authentication, but you can edit it if you want it to identify the key, for example with the username you use to connect to the server through SSH.

The next step is to transfer the public key to the server. You can do this in many different ways. Just keep in mind that FTP is not secure, so why not use SFTP instead, since you have SSH access to your server? An easy way to use SFTP is to take an FTP client that supports it (like FileZilla) and enter your SSH account credentials. The default port is 22, but check with your web host since it may be different.

[Screenshot: FileZilla]
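If you prefer the command line, scp (installed with Cygwin's openssh package) also copies files over an encrypted SSH connection. A minimal sketch, assuming identity.pub is in the current directory and using the article's placeholders username, yoursite.com and port 22; upload_pubkey is a hypothetical helper:

```shell
# Placeholders from the article: replace them with your own account details.
SSH_USER="username"
SSH_HOST="yoursite.com"
SSH_PORT=22

# Hypothetical helper: copy a public key into your home directory on the server.
upload_pubkey() {
    # scp takes the port with an upper-case -P (ssh uses a lower-case -p);
    # nothing after the ":" means "the remote home directory".
    scp -P "$SSH_PORT" "${1:-identity.pub}" "${SSH_USER}@${SSH_HOST}:"
}
```

You will still be asked for your password this time, since the key is not installed yet. Afterwards, move the file into .ssh on the server as described below.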

On the server, at the root of your home directory, create a .ssh directory if it doesn’t already exist. Move the public key you created (identity.pub) into this directory. Then, if you don’t have a file called authorized_keys yet, simply rename identity.pub to authorized_keys; if you already have one, append the content of identity.pub to it instead (the file holds one key per line). Change the permissions of the file so that only its owner can read it. You can modify the permissions with FileZilla (right-click on the file) or through SSH:

chmod 400 .ssh/authorized_keys
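The same server-side steps can be run in one go from an SSH session. A sketch, assuming identity.pub was uploaded to your home directory on the server; install_pubkey is a hypothetical helper. It appends the key rather than renaming the file, so keys already in authorized_keys keep working, and it uses mode 600 (owner read/write) instead of the 400 above so that later appends do not fail. sshd's default StrictModes check only requires that the file and the .ssh directory are not writable by other users:

```shell
# Hypothetical helper, to be run on the server: authorize a public key for SSH logins.
install_pubkey() {
    local pubkey="${1:-$HOME/identity.pub}"
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"                          # only the owner may enter or modify .ssh
    cat "$pubkey" >> "$HOME/.ssh/authorized_keys"   # append, keeping existing keys
    chmod 600 "$HOME/.ssh/authorized_keys"          # owner read/write only
}
```

Usage on the server: install_pubkey ~/identity.pub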

Once that is done, it is time to set up the client side, i.e. your computer. Cygwin should have created a .ssh directory in your user folder. I suggest that you move your private key (identity) into this directory. Now it is time to try the automatic connection. Open a Cygwin console and type:

ssh -i .ssh/identity -p 22 -l username yoursite.com

This command tries to open an SSH session using the private key identity (you have to give the path of the file: -i .ssh/identity), on port 22 (-p 22; as before, check whether your SSH access uses the default port), as the user username (-l username), on the remote host yoursite.com. If everything works, you should now be connected to your server without having to enter any password. As I said earlier, anyone who gets hold of your private key can now connect to the server and execute arbitrary commands without entering a password. That is why you have to make sure your private key remains private. You can also limit the commands that can be executed with the key pair you are using to connect to the server (more information here).
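Typing -i, -p and -l every time gets repetitive. The OpenSSH client configuration file (~/.ssh/config) can store them under a short alias: the keywords HostName, Port, User and IdentityFile correspond to the host name, -p, -l and -i. A sketch, where the alias backupserver and the helper add_host_alias are hypothetical names and the values are the article's placeholders:

```shell
# Hypothetical helper: append a host alias to an OpenSSH client config file.
add_host_alias() {
    local config="${1:-$HOME/.ssh/config}"
    mkdir -p "$(dirname "$config")"
    # The quoted 'EOF' keeps ~ literal; ssh expands it itself when reading IdentityFile.
    cat >> "$config" <<'EOF'
Host backupserver
    HostName yoursite.com
    Port 22
    User username
    IdentityFile ~/.ssh/identity
EOF
    chmod 600 "$config"   # ssh rejects a config file that other users can write to
}
```

After that, ssh backupserver is equivalent to the command above, and rsync can use -e ssh with a destination such as backupserver:/repository/where/you/want/your/backup.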

Backup command

Everything is now in place, so let’s see how to back up your important files. As I said earlier, we will use rsync to do that. You need rsync on both sides, i.e. on the machine you want to save files from and on the remote server. We previously installed Cygwin and rsync on the computer you want to back up. I assume that your server runs Linux and that your web host allows you to execute rsync (if you have SSH access to your server, you should have rsync as well).
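You can check both sides at once before going further. A minimal sketch with the article's placeholder SSH options; check_rsync is a hypothetical helper, and a failure on the remote side can also mean that the SSH connection itself failed:

```shell
# Hypothetical helper: verify that rsync is available locally and on the server.
check_rsync() {
    if ! command -v rsync >/dev/null; then
        echo "rsync is not installed locally" >&2
        return 1
    fi
    # 'command -v rsync' runs in your shell on the server and fails if rsync is not in its PATH
    if ! ssh -i .ssh/identity -p 22 -l username yoursite.com 'command -v rsync' >/dev/null; then
        echo "rsync not found on the server (or the SSH connection failed)" >&2
        return 1
    fi
    echo "rsync found on both sides"
}
```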

Then you just have to run the following magic line to make your backup:

rsync [options] -e "ssh -i .ssh/identity -p 22 -l username" /local/machine/repository/to/backup \
yoursite.com:/repository/where/you/want/your/backup

OK, maybe that is not clear enough, so let me say a few words about it. The first thing to know is that rsync comes with many, many options; you can read about them all with man rsync. So we have to find the right options for our backups. The first option to explain is -e, which specifies the remote shell to use. In our case, we simply use the SSH access that we set up earlier.

Then the other options will depend on what you want to do.

If you want to mirror some of your directories, the following options should do the trick:

rsync -avz --delete -e "ssh -i .ssh/identity -p 22 -l username" /local/machine/repository/to/backup \
yoursite.com:/repository/where/you/want/your/backup

  • -a enables archive mode (a recursive copy that preserves permissions, timestamps, symbolic links, etc.),
  • -v is the verbose option,
  • -z compresses the data during the transfer,
  • --delete deletes files on the server when they have been deleted on the local machine. If you omit this option, you get an incremental backup instead: everything that has been sent to the server remains on the server,
  • /local/machine/repository/to/backup is the source,
  • yoursite.com:/repository/where/you/want/your/backup is the destination folder.
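Because --delete removes files on the destination, it is worth previewing a mirror run first with rsync's -n (--dry-run) option, which lists what would be transferred or deleted without changing anything. A sketch that wraps the mirror command above; mirror is a hypothetical function name and SSH_CMD holds the article's placeholder SSH options (rsync ignores -e when both paths are local):

```shell
# SSH options from the article (placeholders).
SSH_CMD="ssh -i .ssh/identity -p 22 -l username"

# Hypothetical wrapper around the mirror command: mirror SRC DEST [extra rsync options]
mirror() {
    local src="$1" dest="$2"
    shift 2
    # -a archive, -v verbose, -z compress in transit, --delete remove files gone from SRC;
    # any extra arguments (e.g. --dry-run) are passed through to rsync
    rsync -avz --delete -e "$SSH_CMD" "$@" "$src" "$dest"
}
```

For example, run mirror SRC DEST --dry-run first, check the "deleting …" lines in the verbose output, then run mirror SRC DEST for real.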

If you want daily backups, so that if your files get corrupted one day you can restore them as they were the day before, you would prefer:

rsync -avz -e "ssh -i .ssh/identity -p 22 -l username" --link-dest=/repository/where/is/your/previous/backup \
/local/machine/repository/to/backup yoursite.com:/repository/where/you/want/your/backup

--link-dest is used to create a full backup based on the previous one by using hard links. I will say more about it in the next section.

Scripts

It is time to put everything together now. To do that, we will create two little scripts: one on our machine, which contacts the server and sends the files to it, and one on the server, which performs some post-backup operations. The script I use here is based on the second option, i.e. it takes a snapshot of your directory every time you call it, letting you go back to a previous version whenever you need to. You should be able to copy and paste the scripts below; the only things you have to modify are the file paths and the SSH parameters.

Create a file backitup.sh in your user directory (local machine) and copy the following content into it:

#!/bin/bash
# Script to be executed on the client side
# Take the current date and time
date=$(date "+%Y-%m-%dT%H:%M:%S")
# Execute the synchronisation; stop here if it fails, so that the shortcut
# on the server is not moved to an incomplete backup
rsync -avz --exclude-from=/path/to/files_to_exclude --link-dest=/path/to/current/backup -e "ssh -i .ssh/identity -p 22 -l username"  /cygdrive/c/Important/Family/Photos yoursite.com:/path/to/important/folder/backup-$date || exit 1
# Execute post backup command (in order to create correct shortcut to the current version)
ssh -i .ssh/identity -p 22 -l username yoursite.com /path/to/post_backup.sh "$date"

This script creates a snapshot of an important folder of your drive on your remote server. The path to the folder has to be given using Cygwin's conventions: your C:\ drive is accessible under /cygdrive/c/. If you want to save the folder Photos located in C:\Important\Family\, you have two options:

  • /cygdrive/c/Important/Family/Photos
  • /cygdrive/c/Important/Family/Photos/

The final / determines how files are saved on the server. If you omit it, the Photos folder itself is copied, together with all its subfolders. If you include it, all the subfolders of Photos are copied, but not the Photos folder itself.
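The trailing-slash rule is easy to check locally before touching the server. A self-contained sketch with throwaway directories; show_trailing_slash is a hypothetical helper and the folder and file names are made up for the demonstration:

```shell
# Copy the same folder twice, without and with a trailing slash, and list the results.
show_trailing_slash() {
    local work
    work=$(mktemp -d)
    mkdir -p "$work/Photos/2008"
    touch "$work/Photos/2008/img001.jpg"
    rsync -a "$work/Photos"  "$work/without-slash"   # copies the folder itself
    rsync -a "$work/Photos/" "$work/with-slash"      # copies only its contents
    (cd "$work" && find without-slash with-slash -type f | LC_ALL=C sort)
    rm -rf "$work"
}
```

It prints with-slash/2008/img001.jpg and without-slash/Photos/2008/img001.jpg: only the copy made without the trailing slash contains a Photos folder.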

Your files will be copied on the server into a folder named after the date and time of the backup (given by the date variable at the top of the script). In my case, the path looks something like /home/jimmy/backups/backup-2008-02-12T15:24:23. Depending on the choice you made above, the folder backup-2008-02-12T15:24:23 will contain either the Photos folder (with all its subdirectories under it) or directly all the subdirectories of Photos. The choice is yours.

If you want to exclude some files from the backups, for example annoying system files or hidden files, just add the option --exclude-from=/path/to/files_to_exclude. Here is an example of such a file; for more options and information, check the EXCLUDE PATTERNS section available here:

#RSync exclusion list
# Usual system files (Windows, Mac, Linux)
Thumbs.db
.DS_Store
.directory
# Recycle bin to ignore
.Trashes/
Recycled/
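To check that an exclusion list does what you expect, you can run rsync locally against a small test tree. A sketch that writes the list above to a file; write_excludes is a hypothetical helper. rsync ignores lines starting with #, a pattern without a slash such as Thumbs.db matches in every subdirectory, and a trailing slash such as Recycled/ matches directories only:

```shell
# Hypothetical helper: write the article's exclusion list to the given file.
write_excludes() {
    cat > "$1" <<'EOF'
#RSync exclusion list
# Usual system files (Windows, Mac, Linux)
Thumbs.db
.DS_Store
.directory
# Recycle bin to ignore
.Trashes/
Recycled/
EOF
}
```

On real data, adding -n (dry run) to the backup command lists what would be sent, so you can confirm the excluded files do not appear before anything is transferred.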

Finally, the option --link-dest=/path/to/current/backup is used so that we do not copy all the files each time we make a backup. Only new or modified files are sent to the server. If a file remained unchanged between two backups, rsync just creates a hard link to the existing copy in the new backup (for more information, you can visit one of the sources of this article). Note that in our case, the path to the current backup is a path on the server side. I will give more information about this with the next script.
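The hard-link behaviour can be observed locally. A sketch with two snapshots in a temporary directory; snapshot is a hypothetical helper that, like the backup script, passes the previous snapshot to --link-dest, but only when it exists (on the very first run there is nothing to link against). Unchanged files end up sharing the same inode, which is why each snapshot looks complete while unchanged files are stored only once:

```shell
# Hypothetical helper: create snapshot DEST from SRC, hard-linking unchanged files to PREV.
snapshot() {
    local src="$1" prev="$2" dest="$3"
    local opts=(-a)
    # Pass an absolute PREV: rsync resolves a relative --link-dest path against DEST.
    if [ -d "$prev" ]; then
        opts+=(--link-dest="$prev")
    fi
    rsync "${opts[@]}" "$src" "$dest"
}
```

After snapshot SRC/ base/s0 base/s1 and, later, snapshot SRC/ base/s1 base/s2, GNU stat -c %i shows the same inode number for an unchanged file in s1 and s2, and a different one for a file that was modified in between.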

The last line of the script connects to your server and runs the post_backup script explained below, passing it one argument: the date of the backup. That’s all for the client side for now.

Create a file post_backup.sh with the following content:

#!/bin/bash
# Script to be executed on the server side
echo "Execute the post back-up script on the server"
# Delete previous current link (=shortcut to most recent version) if it exists
# (-L tests for a symbolic link, so a link whose target was deleted is removed too)
if [ -L /path/to/current/backup ] ; then
  rm /path/to/current/backup
  echo "Previous shortcut deleted"
fi
# Create the new link with the date passed by the client
# Check that we received the date of the last backup
if [ $# -eq 1 ] ; then
  NEW_LINK="backup-$1"
  ln -s "/path/to/important/folder/$NEW_LINK" /path/to/current/backup
  echo "Create new link to backup files"
fi

This script does only two things. First, it deletes (if it exists) the shortcut that pointed to the previous latest backup. Second, it recreates this shortcut so that it points to the backup we have just made. The script takes one argument, the date of the latest backup; that is the argument the client passes on its last line. Once you have adapted the script to your own needs, send it to the server the same way you sent the public key, put it in the right place, i.e. at the path you specified in the client script, and make it executable (chmod +x post_backup.sh), since the client runs it directly by its path.
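On a server with GNU coreutils, the delete-and-recreate pair can also be done with a single ln call: -f replaces an existing link, and -n stops ln from following an existing link to a directory (without -n, ln would create the new link inside the previous backup folder). A sketch; update_current_link is a hypothetical helper:

```shell
# Hypothetical helper: point LINK at TARGET, replacing any previous link.
update_current_link() {
    local target="$1" link="$2"
    # -s: symbolic link, -f: remove an existing LINK first,
    # -n: treat an existing LINK to a directory as a plain file
    ln -sfn "$target" "$link"
}
```

In post_backup.sh this would read: update_current_link "/path/to/important/folder/backup-$1" /path/to/current/backup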

Run the script on the client side and check the result on the server. You should find the folders you wanted to back up, plus a shortcut pointing to the latest backup. If you run it again right away, it should be much faster, since almost no data is sent to the server. You will then have two snapshots, but unchanged files are stored only once on the server. Deleting one of the backups does not affect the others, so you can clean up your server regularly if you want (for example, to keep only the 10 latest backups; I will probably write another post about that later).
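A rough sketch of such a cleanup, under stated assumptions: the snapshots live directly in one folder and follow the backup-YYYY-MM-DDTHH:MM:SS naming produced by backitup.sh, so sorting the names also sorts them by date. prune_backups is a hypothetical helper; it deletes directories, so try it on a copy first. Because of the hard links, removing an old snapshot only frees the files that no remaining snapshot still references:

```shell
# Hypothetical helper: keep only the KEEP newest backup-* snapshots found in DIR.
prune_backups() {
    local dir="$1" keep="${2:-10}"
    local snapshots=() s i
    # Refuse to keep fewer than one snapshot, so the "current" shortcut stays valid
    (( keep >= 1 )) || return 1
    # Glob results are sorted by name; with backup-YYYY-MM-DDTHH:MM:SS names that is oldest first
    for s in "$dir"/backup-*; do
        # Only real directories: skip symbolic links and the unexpanded pattern if nothing matches
        if [ -d "$s" ] && [ ! -L "$s" ]; then
            snapshots+=("$s")
        fi
    done
    for (( i = 0; i < ${#snapshots[@]} - keep; i++ )); do
        echo "Removing ${snapshots[i]}"
        rm -rf -- "${snapshots[i]}"
    done
}
```

For example, prune_backups /path/to/important/folder 10 could be called at the end of post_backup.sh.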

You now have an easy manual way to save your important files. You can stop reading here if you are not interested in running it automatically. The last section creates a Windows scheduled task to execute it at regular intervals.

[Edit] I have posted a new message about SSH limitations that I ran into while executing these commands. [/Edit]

Automatic task

Alright, we are almost done. The last part is to create a scheduled task on Windows or a cron job on Linux. The cron job is pretty easy since you can execute the script directly; just look for a good tutorial (for example this one) and follow it. On Windows, you just have to create a scheduled task (accessible from the Control Panel) that executes the following automatic_backup.bat:

C:
chdir C:\cygwin\bin
bash --login -i ./backitup.sh

You can also give the absolute path to the script file, /cygdrive/c/…/backitup.sh.
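On Linux, the crontab entry can also be added from the command line. A sketch, assuming backitup.sh sits in your home directory and is executable, and that a nightly run at 02:30 suits you; install_cron_job is a hypothetical helper that keeps your other entries, replaces any previous backitup.sh line, and appends the script's output to a log file:

```shell
# Hypothetical helper: schedule backitup.sh every night at 02:30 (minute 30, hour 2).
install_cron_job() {
    local line="30 2 * * * $HOME/backitup.sh >> $HOME/backitup.log 2>&1"
    # crontab -l fails when no crontab exists yet, hence 2>/dev/null;
    # old backitup.sh lines are filtered out before the new one is added
    { crontab -l 2>/dev/null | grep -vF "backitup.sh"; echo "$line"; } | crontab -
}
```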


