After talking with a new colleague, I was asked why I did not just write a simple bash script using sha1deep.

To make things short… thank you, Norbert! :) The script itself is based on an idea from him, so all credits go to him!

To make things easier and to leave room for additional functionality, I decided to split everything into a bunch of scripts:

1. Downloading everything onto our machine
2. Using sha1deep and later comparing the results from yesterday with today
3. Archiving everything
4. Clean up

Step 1 – Downloading everything onto our machine

First we need something to download everything to the analysis machine. You can use scp, which offers a compression option to reduce the amount of data to transfer, or, if that is not possible (as in my case), just use wget in FTP mirror mode.

#!/bin/bash
# Mirror the whole webspace via FTP onto the local USB stick
cd /media/sda1/ || exit 1
wget -q --mirror --ftp-user="ftpuser" --ftp-password="ftppassword" ftp://domain.tld/
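
If SSH access is available, the scp variant with compression could look like this; the user name and remote path here are only placeholders:

#!/bin/bash
# -r copies recursively, -C compresses the data in transit
scp -r -C sshuser@domain.tld:/var/www/ /media/sda1/domain.tld/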

Step 2 – Using sha1deep and later comparing the results from yesterday with today

To adjust the script to your needs, simply change the content of the path variable to something that fits your setup.

#!/bin/bash
path='/media/sda1/domain.tld/'
dt=$(date "+%Y%m%d")
dy=$(date "+%Y%m%d" -d "yesterday")
# Hash every file below $path recursively and store today's list
sha1deep -r "$path" >/media/sda1/backups/logs/fdt-$dt
# Compare yesterday's list (written by yesterday's run) with today's
diff /media/sda1/backups/logs/fdt-$dy /media/sda1/backups/logs/fdt-$dt >/media/sda1/backups/logs/fdiff-$dt
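
One thing to keep in mind: on the very first run there is no hash list from yesterday, so the diff has nothing to compare against. A small sketch of a guard, assuming the same file layout as above:

#!/bin/bash
dt=$(date "+%Y%m%d")
dy=$(date "+%Y%m%d" -d "yesterday")
logdir=/media/sda1/backups/logs
# Only diff if yesterday's hash list actually exists
if [ -f "$logdir/fdt-$dy" ]; then
    diff "$logdir/fdt-$dy" "$logdir/fdt-$dt" >"$logdir/fdiff-$dt"
else
    echo "No baseline from yesterday yet, skipping the diff" >&2
fi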

Step 3 – Archiving everything

#!/bin/bash
dt=$(date "+%Y%m%d")
# Create the archive; -C keeps the paths inside the archive relative
tar -czf /media/sda1/backups/webspace/domain.tld.$dt.tgz -C /media/sda1/ domain.tld/
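
To make sure the archive is actually readable, a quick content listing can serve as a sanity check; this is just a sketch on top of the paths used above:

#!/bin/bash
dt=$(date "+%Y%m%d")
# Listing the archive forces tar to read it; a corrupt file makes this fail
tar -tzf /media/sda1/backups/webspace/domain.tld.$dt.tgz | head -n 5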

Step 4 – Clean up

To ensure that we keep enough free space on the USB stick, I decided to delete every file older than 7 days.

#!/bin/bash
# Delete every file under the backup directory that is older than 7 days
find /media/sda1/backups/ -type f -mtime +7 -exec rm -f {} \;
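
If you are unsure what the cleanup would hit, running the same find without the delete action first only prints the candidates:

#!/bin/bash
# Dry run: print what would be deleted, without removing anything
find /media/sda1/backups/ -type f -mtime +7 -print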

I know that these scripts could be improved. But I will give them a try and let us see what goes wrong in the next few days ;-)

Future plans

Documentation, speedups, more checks and something I don’t know about yet ;-)

This bunch of scripts, or one single script if put together, fulfills my needs at the moment.

Now I can think about reporting and analyzing the output, which will be the next step.
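
Just as a rough idea of what that could look like: a sketch that mails the diff whenever it is not empty, assuming a working local mail setup (the recipient address is a placeholder):

#!/bin/bash
dt=$(date "+%Y%m%d")
difffile=/media/sda1/backups/logs/fdiff-$dt
# -s tests that the diff file exists and is not empty
if [ -s "$difffile" ]; then
    mail -s "Webspace changes detected on $dt" admin@domain.tld <"$difffile"
fi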

Sidenote

All scripts are running on a spare Raspberry Pi, and because everything runs during the night, it does not matter if it takes longer :-)
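
The schedule itself is not part of the post; a hypothetical crontab for such nightly runs might look like this (the script paths are made up):

# m  h  dom mon dow  command
30   2  *   *   *    /home/pi/bin/01-download.sh
0    3  *   *   *    /home/pi/bin/02-hash-and-diff.sh
30   3  *   *   *    /home/pi/bin/03-archive.sh
0    4  *   *   *    /home/pi/bin/04-cleanup.sh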
