Active/passive failover for NFS using Pacemaker and DRBD
Bring high availability to your NFS server!
I. Hosts and IPs
Declare both cluster nodes in /etc/hosts on each server:
127.0.0.1 localhost
# Pacemaker
10.0.0.1 ha-node-01
10.0.0.2 ha-node-02
For high-availability purposes, I recommend using a bonded interface: it is always better to have a dedicated link between the nodes. Don't forget the ifenslave package for setting up the bonding. Some of the parameters below may be specific to my setup. Simply note that this part is optional if you only want to try it with virtual machines.
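If ifenslave is not installed yet, that is a one-liner with the same aptitude tool used throughout this guide:
$ sudo aptitude install -y ifenslave
The bonding setup in /etc/network/interfaces then looks like this: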
auto eth1
iface eth1 inet manual
    bond-master ha
    bond-primary eth1 eth2
    pre-up /sbin/ethtool -s $IFACE speed 1000 duplex full

auto eth2
iface eth2 inet manual
    bond-master ha
    bond-primary eth1 eth2
    pre-up /sbin/ethtool -s $IFACE speed 1000 duplex full

auto ha
iface ha inet static
    address 10.0.0.1
    netmask 255.255.255.0
    mtu 9000
    bond-slaves none
    bond-mode balance-rr
    bond-miimon 100
Do the same setup on the second node (using 10.0.0.2 as the address).
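Once the interfaces are up, the bonding state can be checked through the kernel's proc interface:
$ cat /proc/net/bonding/ha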
II. DRBD Setup
Create a logical volume. This volume will act as the DRBD backing device. Here I assume that you have an LVM-based setup. I create a logical volume named drbd on my volume group data.
$ sudo lvcreate -L 10GB -n drbd data
And customize some LVM options in /etc/lvm/lvm.conf for DRBD:
write_cache_state = 0
Then delete the stale LVM cache:
$ sudo rm -rf /etc/lvm/cache/.cache
Check that the change took effect:
$ sudo lvm dumpconfig | grep write_cache
The following actions have to be done on each backend server.
Install DRBD and remove it from the boot sequence, since Pacemaker will manage it:
$ sudo aptitude install -y drbd8-utils
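To take it out of the boot sequence on a sysvinit-based Debian/Ubuntu, something like this should do (adapt if your distribution uses another init system):
$ sudo update-rc.d -f drbd remove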
The DRBD global configuration file (typically /etc/drbd.d/global_common.conf):
global {
    usage-count yes;
    # minor-count dialog-refresh disable-ip-verification
}
common {
    protocol C;

    handlers {
        pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
        local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
        out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
        # avoid split-brain in a pacemaker cluster
        fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
        split-brain "/usr/lib/drbd/notify-split-brain.sh root";
        after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }
    startup {
        # reduce the timeouts when booting
        degr-wfc-timeout 1;
        wfc-timeout 1;
    }
    disk {
        on-io-error detach;
        # avoid split-brain in a pacemaker cluster
        fencing resource-only;
    }
    net {
        # DRBD recovery policy
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
    }
    syncer {
        rate 300M;
        al-extents 257;
    }
}
The DRBD resource configuration file (/etc/drbd.d/r0.res):
resource r0 {
    on ha-node-01 {
        device /dev/drbd0;
        disk /dev/data/drbd;
        address 10.0.0.1:7788;
        meta-disk internal;
    }
    on ha-node-02 {
        device /dev/drbd0;
        disk /dev/data/drbd;
        address 10.0.0.2:7788;
        meta-disk internal;
    }
}
Before continuing, check your configuration:
$ sudo drbdadm dump all
Now wipe the beginning of the device:
$ sudo dd if=/dev/zero of=/dev/data/drbd bs=1M count=128
Still on both servers, initialize the metadata and start the resource:
$ sudo drbdadm -- --ignore-sanity-checks create-md r0
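The create-md call only writes the metadata; bringing the resource up on both nodes is the usual drbdadm call:
$ sudo drbdadm up r0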
On ha-node-01, run:
$ sudo drbdadm -- --overwrite-data-of-peer primary r0
Watch the synchronisation state:
$ sudo watch -n1 cat /proc/drbd
Also on ha-node-01, create the filesystem:
$ sudo mkfs.ext3 /dev/drbd0
Mount your resource and check its state:
$ sudo mount /dev/drbd0 /mnt/data/
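If the mount point does not exist yet, create it first on both nodes, then re-check the resource state through /proc/drbd:
$ sudo mkdir -p /mnt/data
$ cat /proc/drbd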
III. NFS server setup
Install NFS tools:
$ sudo aptitude install nfs-kernel-server
Fill the /etc/exports file with:
/mnt/data/ 10.0.0.0/8(rw,async,no_root_squash,no_subtree_check)
Still on ha-node-01, change this value in /etc/default/nfs-kernel-server:
NEED_SVCGSSD=no
For RPC communication, comment out this line in /etc/default/portmap so portmap listens on all interfaces:
#OPTIONS="-i 127.0.0.1"
Now export the share:
$ sudo exportfs -ra
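To double-check what is actually exported, exportfs can list it verbosely:
$ sudo exportfs -v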
IV. Pacemaker setup
Install pacemaker:
$ sudo aptitude install pacemaker
Enable Corosync at boot:
$ sudo sed -i s/START=no/START=yes/ /etc/default/corosync
corosync-keygen needs an entropy source, so run this loop from another shell while the key is being generated:
$ while /bin/true; do dd if=/dev/urandom of=/tmp/100 bs=1024 count=100000; for i in {1..10}; do cp /tmp/100 /tmp/tmp_${i}_$RANDOM; done; rm -f /tmp/tmp_* /tmp/100; done
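Then, back in the first shell, generate the authentication key; the standard invocation writes /etc/corosync/authkey:
$ sudo corosync-keygen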
Copy the generated key and the Corosync configuration to the other backend node:
$ sudo scp /etc/corosync/authkey /etc/corosync/corosync.conf root@ha-node-02:/etc/corosync/
Modify this section in /etc/corosync/corosync.conf:
interface {
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.0.0.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
}
Run the corosync daemon on both backend nodes:
$ sudo service corosync start
At first, you should see the two nodes come online:
$ sudo crm_mon -1
IV.1. Set up the failover
The cluster will manage three resources:
- Virtual IP address
- NFS resource agent
- DRBD resource agent
Use a configuration along the lines of the sketch below for Pacemaker; the currently loaded configuration can be displayed at any time with:
$ sudo crm configure show
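Here is a minimal sketch of such a configuration. The resource names (p_drbd_r0, ms_drbd_r0, p_fs_data, p_ip_nfs, p_nfs, g_nfs), the virtual IP 10.0.0.100 and the cluster properties are illustrative assumptions; adapt them to your environment:
primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s" role="Master" \
    op monitor interval="30s" role="Slave"
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive p_fs_data ocf:heartbeat:Filesystem \
    params device="/dev/drbd0" directory="/mnt/data" fstype="ext3"
primitive p_ip_nfs ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.100" cidr_netmask="24" \
    op monitor interval="30s"
primitive p_nfs lsb:nfs-kernel-server \
    op monitor interval="30s"
group g_nfs p_fs_data p_ip_nfs p_nfs
# the NFS group must run where DRBD is primary, and only after promotion
colocation col_nfs_on_drbd inf: g_nfs ms_drbd_r0:Master
order o_drbd_before_nfs inf: ms_drbd_r0:promote g_nfs:start
# two-node cluster without STONITH: acceptable for testing, not for production
property stonith-enabled="false" no-quorum-policy="ignore"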
Warning: make sure that your NFS init script is really named nfs-kernel-server, or replace this line:
lsb:nfs-kernel-server
with the correct script name.
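If in doubt, the crm shell can list the LSB scripts it sees on the node:
$ sudo crm ra list lsb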
At the end, you should see all the resources started on one node:
$ sudo crm_mon -1
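To exercise the failover, one simple test is to put the active node into standby, watch the resources move with crm_mon, and then bring the node back online:
$ sudo crm node standby ha-node-01
$ sudo crm_mon -1
$ sudo crm node online ha-node-01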
Enjoy ;)