I’m dealing with terabytes of data on multiple servers that need to be backed up on a weekly basis. The first solution was to compress all the data and transfer it via FTP to a remote backup server. This is a huge waste of resources, as more than 90% of the data doesn’t change between two backups. I thought about rsync, which is great for minimizing bandwidth usage because it only transfers the parts of a file that have changed. The main problem with rsync is that it doesn’t compress the data it stores on the backup server: the storage required for the backup is 1:1, which can be a waste of resources. We could compress the data with gzip, rar, or a similar tool before the transfer, but that would be a big mistake, because rsync would then have to transfer almost everything again. DAR is the answer.
Desired functionality of the application:
- scheduled backups (e.g. once per week)
- files and MySQL database
- minimal bandwidth: transfer only changed data
- minimal backup size: compression
Why would we want to use DAR over gzip or something similar? Let’s assume we have 1,000 files and change only a couple of them. A compressor that treats all the files as a single stream will normally produce output that differs a lot from the compressed file created before the change. This is bad news for rsync, as it has to transfer all of that changed data to the backup server. DAR has a solution for this problem: it compresses and catalogs every file individually inside the DAR archive. There is some overhead, but it’s minimal. As a result, only the parts of the archive that contain changed files differ from the previous archive, which means less work for rsync and a lot less bandwidth used. We can also use Parchive with DAR for error correction. This is optional, but it provides additional redundancy for repairing damaged archives.
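The effect is easy to reproduce with a toy experiment. The sketch below is not DAR's archive format; it only uses Python's zlib to compare one solid compressed stream against per-file compression, and it measures the changed region with a crude prefix/suffix comparison as a stand-in for what rsync would have to resend. The helper names and the generated test data are made up for illustration.

```python
import hashlib
import zlib


def make_file(index: int, version: int = 0) -> bytes:
    """Deterministic, moderately compressible stand-in for a real file."""
    lines = [
        hashlib.sha256(f"{index}-{version}-{n}".encode()).hexdigest()
        for n in range(40)
    ]
    return ("\n".join(lines) + "\n").encode()


def solid_archive(files: list[bytes]) -> bytes:
    """Compress everything as one stream, similar to tar piped through gzip."""
    return zlib.compress(b"".join(files))


def per_file_archive(files: list[bytes]) -> bytes:
    """Compress each file on its own and concatenate the results."""
    return b"".join(zlib.compress(f) for f in files)


def changed_span(old: bytes, new: bytes) -> int:
    """Bytes of `new` left after stripping the prefix and suffix shared with `old`."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    suffix = 0
    while suffix < limit - prefix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return len(new) - prefix - suffix


before = [make_file(i) for i in range(200)]
after = list(before)
after[10] = make_file(10, version=1)  # rewrite a single file out of 200

solid_delta = changed_span(solid_archive(before), solid_archive(after))
per_file_delta = changed_span(per_file_archive(before), per_file_archive(after))

print(f"solid stream: {solid_delta} bytes differ")
print(f"per-file:     {per_file_delta} bytes differ")
```

With per-file compression the differing region is never larger than the one recompressed file, while in the solid stream nearly everything after the edited file changes.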
Application
I wrote a small script in Bash. I’m not a Bash programmer, so if you can improve the script, please do. You can download it here.
Before you can run it, you have to edit the conf file. Inside you will find options such as the list of directories to back up with rsync, the list of directories to back up with DAR + rsync, the MySQL databases to back up, and the credentials for the rsync server. You must set up an rsync server on your backup server; rsync is supported on practically every platform. After configuration, you can start the backup job by running the file named backup.
First it backs up the MySQL database, then it creates local DAR archives of the directories configured for DAR, and it finishes by rsyncing the directories that don’t use DAR. It creates temporary archives in a local folder and removes them after the transfer. You need at least as much free local space as the size of the largest directory listed for DAR archiving in the conf file.
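The order of operations can be sketched as a dry-run planner that only builds the command lines. This is an illustration, not the Bash script itself: the `BackupConfig` fields, the work directory, the remote layout and the choice to upload the SQL dump via rsync are all assumptions, MySQL credentials are assumed to come from the usual client option file, and the optional par2 step is left out.

```python
from dataclasses import dataclass, field


@dataclass
class BackupConfig:
    """Hypothetical settings; the real conf file may use different names."""
    rsync_target: str                    # e.g. "user@host::module"
    work_dir: str = "/var/tmp/backup"    # temporary dumps and archives
    mysql_databases: list[str] = field(default_factory=list)
    dar_dirs: list[str] = field(default_factory=list)    # DAR + rsync
    plain_dirs: list[str] = field(default_factory=list)  # rsync only
    nice_level: int = 19
    ionice_class: int = 3  # 3 = idle


def plan(cfg: BackupConfig) -> list[list[str]]:
    """Return the commands for one run, in order, without executing them."""
    polite = ["nice", "-n", str(cfg.nice_level),
              "ionice", "-c", str(cfg.ionice_class)]
    steps: list[list[str]] = []

    # 1. Dump each database, upload the dump, remove the local copy.
    for db in cfg.mysql_databases:
        dump = f"{cfg.work_dir}/{db}.sql"
        steps.append(polite + ["mysqldump", f"--result-file={dump}", db])
        steps.append(polite + ["rsync", "-a", dump, f"{cfg.rsync_target}/mysql/"])
        steps.append(["rm", "-f", dump])

    # 2. Build a compressed DAR archive per directory, upload it, remove it.
    for directory in cfg.dar_dirs:
        name = directory.strip("/").replace("/", "_")
        base = f"{cfg.work_dir}/{name}"
        steps.append(polite + ["dar", "-c", base, "-R", directory, "-z"])
        # dar writes its first slice as <base>.1.dar
        steps.append(polite + ["rsync", "-a", f"{base}.1.dar",
                               f"{cfg.rsync_target}/dar/"])
        steps.append(["rm", "-f", f"{base}.1.dar"])

    # 3. Mirror the remaining directories directly with rsync.
    for directory in cfg.plain_dirs:
        name = directory.strip("/").replace("/", "_")
        steps.append(polite + ["rsync", "-a", f"{directory}/",
                               f"{cfg.rsync_target}/{name}/"])
    return steps


cfg = BackupConfig(
    rsync_target="backup@backup.example.com::weekly",
    mysql_databases=["shop"],
    dar_dirs=["/var/www"],
    plain_dirs=["/etc"],
)
for step in plan(cfg):
    print(" ".join(step))
```

To actually run such a plan, each step could be passed to `subprocess.run(step, check=True)` so that a failed command stops the job before the temporary files are removed.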
Required dependencies:
- rsync client and server
- DAR
- par2 for archive error correction (optional)
- nice and ionice (included in practically every Linux distribution)
Application specifications:
- backup for files using rsync or DAR + rsync
- MySQL backup
- supports setting nice and ionice
- data is transferred to a remote rsync server
- ability to run from scheduled jobs (crontab)
Future work:
Every new backup overwrites the previously created backup. I would like to add an option to keep N backups on the backup server.
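One possible shape for such an option is sketched below. It is not part of the script and rests on an assumption about layout: each backup would live in its own directory named with its ISO date (for example 2024-03-17), which is not how the current script stores data. The function name is made up for illustration.

```python
from datetime import date


def backups_to_delete(names: list[str], keep: int) -> list[str]:
    """Return the dated backup directories that fall outside the newest `keep`.

    Names that are not valid ISO dates are ignored, so unrelated
    directories are never selected for deletion.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dated = []
    for name in names:
        try:
            dated.append((date.fromisoformat(name), name))
        except ValueError:
            continue  # not a backup directory, leave it alone
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]


existing = ["2024-03-03", "2024-03-17", "lost+found", "2024-03-10", "2024-02-25"]
print(backups_to_delete(existing, keep=3))  # ['2024-02-25']
```

The function only decides what to remove; the deletion itself would still have to happen on the backup server.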