Corosync: Redundant Ring Protocol
Putting a Lord of the Rings picture would have been too cliché, so I opted for this… RRP stands for Redundant Ring Protocol, a way to achieve HA on top of bonded interfaces.
I. Corosync communication
I.1. Reminder
Corosync is the messaging layer inside your cluster. It is responsible for several things, such as:
- Cluster membership and messaging thanks to the Totem Single Ring Ordering and Membership protocol
- Quorum calculation
- Availability manager
I.2. What do I need?
A lot of:
- Network interfaces, at least 4
- Cables, 4
- Switch ports
For this setup I used 2 networks:
- eth0: 10.0.0.0/8
- eth1: 172.16.0.0/16
Note: you don’t necessarily need to set up 2 different networks; 2 subnets are also OK.
II. Setup
It’s pretty easy to set up. RRP supports several modes of operation:
- Active: both rings are active and in use at the same time
- Passive: only one of the N rings is in use; the second one is used only if the first one fails
Make your own choice!
Can I do this on a running cluster? TOTALLY
To do this, put your cluster in maintenance mode. In this mode pacemaker stops orchestrating your cluster and marks your resources as unmanaged. It allows you to perform critical operations such as upgrading corosync: the resources keep running, but pacemaker no longer manages them.
$ sudo crm configure property maintenance-mode=true
Your resources should now be displayed with an (unmanaged) flag in parentheses:
$ sudo crm_mon -1
Ring status before the changes:
$ sudo corosync-cfgtool -s
Edit your corosync.conf with the following:
totem {
  version: 2
  secauth: on
  threads: 0
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 10.0.0.0
    mcastaddr: 226.94.1.1
    mcastport: 5405
    ttl: 1
  }
  interface {
    ringnumber: 1
    bindnetaddr: 172.16.0.0
    mcastaddr: 226.94.1.2
    mcastport: 5407
    ttl: 1
  }
}
You already have the first interface sub-section, the one with the ringnumber option set to 0. You just need to:
- enable the RRP mode with the rrp_mode: passive option
- add a new interface sub-section with:
  - a new ring number
  - the address of your new network
  - a new multicast address
  - a new multicast port
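The bindnetaddr value is the address of the network, not the IP of the host itself. As a minimal sketch, assuming IPv4 and a CIDR prefix length, the helper below derives it from a host address. The function name network_address is hypothetical (it is not part of Corosync), and the host addresses 10.0.0.1 and 172.16.0.1 are only example values.

```shell
#!/bin/sh
# Hypothetical helper (not a Corosync tool): print the IPv4 network address
# to use as bindnetaddr, given a host IP and a CIDR prefix length.
# The body runs in a subshell so the IFS change does not leak out.
network_address() (
    ip=$1
    prefix=$2
    # Split the dotted quad into four positional parameters.
    IFS=.
    set -- $ip
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask, keeping only the low 32 bits.
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    net=$(( $addr & $mask ))
    printf '%d.%d.%d.%d\n' \
        "$(( ($net >> 24) & 255 ))" "$(( ($net >> 16) & 255 ))" \
        "$(( ($net >> 8) & 255 ))" "$(( $net & 255 ))"
)

network_address 10.0.0.1 8       # prints 10.0.0.0   (ring 0 network)
network_address 172.16.0.1 16    # prints 172.16.0.0 (ring 1 network)
```

The two results match the bindnetaddr values used in the corosync.conf above.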
Warning: the ringnumber must start at 0.
Warning: Corosync uses two UDP ports: mcastport (for mcast receives) and mcastport - 1 (for mcast sends). By default Corosync uses mcastport 5405, so it will bind to:
- mcast receives: 5405
- mcast sends: 5404
In a redundant ring setup each ring needs its own pair of ports, so you must leave a gap between the mcastport values. Setting 5407 for the second ring results in:
- mcast receives: 5407
- mcast sends: 5406
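The port rule above can be checked with a few lines of shell. This is only a sketch: ring_ports and ports_conflict are hypothetical helper names, not Corosync commands, and they simply encode the "mcastport and mcastport - 1" rule described here.

```shell
#!/bin/sh
# Hypothetical helpers (not Corosync tools) illustrating the port rule above.

# Print the two UDP ports a ring binds to for a given mcastport.
ring_ports() {
    printf 'receive=%d send=%d\n' "$1" "$(( $1 - 1 ))"
}

# Succeed (exit 0) when two mcastport values would share a UDP port,
# i.e. when they are less than 2 apart.
ports_conflict() {
    gap=$(( $1 - $2 ))
    if [ "$gap" -lt 0 ]; then
        gap=$(( 0 - $gap ))
    fi
    [ "$gap" -lt 2 ]
}

ring_ports 5405    # prints receive=5405 send=5404
ring_ports 5407    # prints receive=5407 send=5406

if ports_conflict 5405 5406; then
    echo "5405 and 5406 overlap: ring 1 would send on 5405"
fi
if ! ports_conflict 5405 5407; then
    echo "5405 and 5407 do not overlap"
fi
```

With 5405 and 5406 the second ring would send on 5405, the port the first ring already receives on, which is why 5407 is used here.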
Restart the corosync daemon on each server:
[user@node1 ~]$ sudo service corosync restart
Check the multicast addresses and ports:
$ sudo netstat -plantu | grep 54
Check the result:
[user@node1 ~]$ sudo corosync-cfgtool -s
Check the totem members:
$ sudo corosync-objctl | grep member
One more validation using the member’s ID:
$ sudo corosync-cfgtool -a 16777226 -a 33554442
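The member IDs used above are not arbitrary. When no nodeid is set in corosync.conf, Corosync derives it from the ring 0 IPv4 address. The sketch below decodes such an ID by hand, assuming it was auto-generated on a little-endian host such as x86; nodeid_to_ip is a hypothetical helper name, not a Corosync command.

```shell
#!/bin/sh
# Hypothetical helper: decode an auto-generated Corosync node ID into the
# IPv4 address it was derived from. Assumes a little-endian host and no
# explicit nodeid in corosync.conf.
nodeid_to_ip() {
    # The lowest byte of the ID is the first octet of the address.
    printf '%d.%d.%d.%d\n' \
        "$(( $1 & 255 ))" "$(( ($1 >> 8) & 255 ))" \
        "$(( ($1 >> 16) & 255 ))" "$(( ($1 >> 24) & 255 ))"
}

nodeid_to_ip 16777226    # prints 10.0.0.1
nodeid_to_ip 33554442    # prints 10.0.0.2
```

Both addresses fall inside the 10.0.0.0/8 network bound to ring 0.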
Finally disable the maintenance mode:
$ sudo crm configure property maintenance-mode=false
The (unmanaged) flag in the crm_mon -1 output should disappear.
III. Break it!
The easiest way to test the RRP mode is to shut down one of the interfaces:
$ sudo ifdown eth0
If you look at crm_mon, you will see that your cluster is still running normally, without any outage.
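While the interface is down, the ring status should show the first ring as failed. Here is a minimal sketch for spotting that in a script. It assumes that a failed ring is reported with the word FAULTY in the corosync-cfgtool -s output; the sample status lines are assumptions and may differ between Corosync versions. has_faulty_ring is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the ring status text on stdin
# reports at least one ring as FAULTY.
has_faulty_ring() {
    grep -q 'FAULTY'
}

# Sample status lines; their exact wording is an assumption, so compare
# with your own `corosync-cfgtool -s` output.
healthy='status = ring 0 active with no faults'
broken='status = Marking ringid 0 interface 10.0.0.1 FAULTY'

if printf '%s\n' "$broken" | has_faulty_ring; then
    echo "at least one ring is faulty"
fi
if ! printf '%s\n' "$healthy" | has_faulty_ring; then
    echo "all rings healthy"
fi

# On a live node (not run here):
#   sudo corosync-cfgtool -s | has_faulty_ring && echo "a ring is down"
```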
Et voilà! NIC bonding is mandatory for every production environment. Combining NIC bonding with RRP makes your setup ‘highly highly’ available.