Monday, January 9, 2012

Quasi-realtime filesystem replication using ZFS snapshots

Doing rsync backups of our ESXi NFS share was taking a considerable amount of time (18+ hours). This wouldn't really be an issue except that it impacts performance of the NFS server. Also, your backups end up considerably "stale" (up to a week old), which makes recovery of critical data questionable.

If you need absolutely immediate replication, you can consider doing a mirror
of iSCSI targets on ZFS over a high-speed connection. I've read of people doing that
but never had a need to myself. The advantage here is that ZFS would handle resilvering the
mirror automagically if the connection to the external iSCSI target dropped for a period of time.
This seems a little overkill for our situation, however.
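For the curious, such a mirror would be built with ordinary zpool commands once the remote LUN shows up as a local device via the iSCSI initiator. This is just a sketch, and the device names are made up:

```shell
# Hypothetical: c0t0d0 is a local disk, c0t1d0 is the iSCSI-backed LUN
# exported by the remote server. ZFS mirrors writes to both and resilvers
# the iSCSI side automatically after a dropped connection.
zpool create vmpool mirror c0t0d0 c0t1d0

# After a reconnect, this shows resilver progress:
zpool status vmpool
```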

So, I came up with a quick and dirty script that:

makes a snapshot
replicates it
deletes the older (previous) snapshot

So (as a testing phase) I have it scheduled to run every 5 minutes:
0,5,10,15,20,25,30,35,40,45,50,55 * * * * /root/replicate_vms.sh zpool1/nfs/vm/esxi XXX.XXX.XXX.XXX

(IP changed to XXX for security reasons.)
You may want to change the location of the log file, /root/replicate_vms.log,
to suit your purposes, but you should end up with something like

root@XXXXXXXXX:/zpool1/nfs/vm# cat ~/replicate_vms.log
zpool1/nfs/vm/esxi@2012-01-09 08:39:41 replicated to XXX.XXX.XXX.XXX successfully.
zpool1/nfs/vm/esxi@2012-01-09 08:40:44 replicated to XXX.XXX.XXX.XXX successfully.
zpool1/nfs/vm/esxi@2012-01-09 08:43:16 replicated to XXX.XXX.XXX.XXX successfully.
zpool1/nfs/vm/esxi@2012-01-09 08:45:01 replicated to XXX.XXX.XXX.XXX successfully.
zpool1/nfs/vm/esxi@2012-01-09 08:50:00 replicated to XXX.XXX.XXX.XXX successfully.
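One caveat with a five-minute schedule: a send that takes longer than five minutes would overlap the next cron run. A simple mkdir-based lock (a hypothetical wrapper, not part of the original script) avoids that, since mkdir is atomic:

```shell
#!/bin/bash
# Hypothetical wrapper around replicate_vms.sh: mkdir either creates the lock
# directory (we proceed) or fails because it exists (another run is active).
LOCKDIR="${TMPDIR:-/tmp}/replicate_vms.lock"
if mkdir "$LOCKDIR" 2>/dev/null; then
    # A concurrent invocation would fail this same mkdir and skip:
    if mkdir "$LOCKDIR" 2>/dev/null; then
        status="double-lock"          # would indicate a bug
    else
        status="locked"
    fi
    # ... /root/replicate_vms.sh "$@" would run here ...
    rmdir "$LOCKDIR"
else
    status="skipped"
fi
echo "$status"                        # prints "locked"
```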


prerequisites:
replication is one-way only
the target filesystem must be set read-only (zfs set readonly=on)
the source server's root SSH public key must be added to /root/.ssh/authorized_keys on the target server so ssh does not require a password
the target server must have "PermitRootLogin yes" in /etc/ssh/sshd_config
source and target servers must have the "lzop" program in /usr/local/bin (you can download and build it from the lzop site)
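For reference, the SSH and read-only setup might look like this (a sketch using this post's placeholder IP; ssh-copy-id is one way to install the key, or you can append the public key to authorized_keys by hand):

```shell
# On the source server: generate a key and push it to the target.
ssh-keygen -t rsa
ssh-copy-id root@XXX.XXX.XXX.XXX

# On the target server: keep the replica from drifting out of sync
# with the incremental stream by making it read-only.
zfs set readonly=on zpool1/nfs/vm/esxi
```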

source and target filesystems must be "primed" by taking the first snapshot and doing the zfs send | zfs receive manually:
the source snapshot name must use YYYY-MM-DD H:M:S format so that sort orders the snapshots chronologically:
zfs snapshot "<filesystem>@$(date "+%Y-%m-%d %H:%M:%S")"
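The priming itself would look something like this (filesystem and IP are this post's examples; substitute your own):

```shell
# Take the first snapshot, holding its name so the send matches exactly.
snap="zpool1/nfs/vm/esxi@$(date "+%Y-%m-%d %H:%M:%S")"
zfs snapshot "$snap"

# Full (non-incremental) send of that snapshot to the target.
zfs send "$snap" | ssh root@XXX.XXX.XXX.XXX "zfs receive zpool1/nfs/vm/esxi"
```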

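To see why that timestamp format matters: zero-padded YYYY-MM-DD H:M:S names sort lexically in chronological order, which is exactly what lets the script find the latest snapshot with a plain sort | tail. A quick demonstration using snapshot names like the ones in the log above:

```shell
# Three snapshot names, deliberately listed out of order.
s1="zpool1/nfs/vm/esxi@2012-01-09 08:39:41"
s2="zpool1/nfs/vm/esxi@2012-01-09 08:40:44"
s3="zpool1/nfs/vm/esxi@2012-01-09 08:43:16"

# Plain lexical sort puts the newest snapshot last.
latest=$(printf '%s\n' "$s2" "$s3" "$s1" | sort | tail -1)
echo "$latest"   # prints "zpool1/nfs/vm/esxi@2012-01-09 08:43:16"
```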

possible issues with this strategy:
If the volume of updates made to the filesystem exceeds the connection bandwidth between source and target,
there will be no way for it to "keep up" with live updates. I don't expect that to happen unless you
are trying to replicate over the internet or a 100 Mbit connection, or perhaps if you have large
databases or file servers running in your VMs on the source NFS share.
If you are replicating over the internet and you don't trust your VPN security 100%, you could
add an additional layer of encryption on top of ssh using crypt or some other command-line utility
that supports standard input and standard output. ssh + lzop + crypt = pretty darn secure.
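For example, openssl enc is one such stdin/stdout utility (my example here, not something this setup actually uses); a symmetric cipher round-trips cleanly through a pipe:

```shell
# Stand-in for the zfs send stream.
secret="zfs stream bytes"

# Encrypt then decrypt through pipes, as it would sit between lzop and ssh.
# (-pbkdf2 needs a reasonably recent OpenSSL; the passphrase is illustrative.)
roundtrip=$(printf '%s' "$secret" \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:example \
  | openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:example)

echo "$roundtrip"   # prints "zfs stream bytes"
```

In the script's pipeline it would splice in as: zfs send ... | lzop -1c | openssl enc ... | ssh root@target "openssl enc -d ... | lzop -dc | zfs receive ...".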

here is the script:

replicate_vms.sh
------------------------------------------------
#!/bin/bash

# $1 = local filesystem (e.g. zpool1/nfs/vm/esxi), $2 = target server IP

export PATH=/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin
last_local_snapshot="$(zfs list -t snapshot -o name | grep "^$1@" | sort | tail --lines=1)"
new_local_snapshot="$1@$(date "+%Y-%m-%d %H:%M:%S")"
last_remote_snapshot="$(ssh root@$2 "zfs list -t snapshot -o name | grep '^$1@'" | sort | tail --lines=1)"

echo "last previous snapshot: $last_local_snapshot"
echo "new snapshot: $new_local_snapshot"
echo "last remote snapshot: $last_remote_snapshot"

zfs snapshot "$new_local_snapshot"

echo "zfs send -i \"$last_remote_snapshot\" \"$new_local_snapshot\" | ssh root@$2 \"zfs receive $1\""
zfs send -i "$last_remote_snapshot" "$new_local_snapshot" | /usr/local/bin/lzop -1c | ssh root@$2 "/usr/local/bin/lzop -dc | zfs receive $1"

new_last_remote_snapshot="$(ssh root@$2 "zfs list -t snapshot -o name | grep '^$1@'" | sort | tail --lines=1)"

if [ "$new_local_snapshot" == "$new_last_remote_snapshot" ]; then
    echo "$new_local_snapshot replicated to $2 successfully." >> /root/replicate_vms.log
    # replication confirmed, so the previous snapshots are no longer needed
    zfs destroy "$last_local_snapshot"
    ssh root@$2 "zfs destroy \"$last_remote_snapshot\""
else
    echo "$new_local_snapshot failed to replicate to $2! ERROR!" >> /root/replicate_vms.log
fi
-----------------------------------------------------