Corosync: Redundant Ring Protocol
Putting a Lord of the Rings picture would have been too cliché, so I opted for this… RRP stands for Redundant Ring Protocol, a way to achieve HA on top of bonded interfaces.
I. Corosync communication
I.1. Reminder
Corosync is the messaging layer inside your cluster. It is responsible for several things, such as:
- Cluster membership and messaging thanks to the Totem Single Ring Ordering and Membership protocol
- Quorum calculation
- Availability manager
I.2. What do I need?
A lot of:
- Network interfaces, at least 4
- Cables, 4
- Switch ports
For this setup I used 2 networks:
eth0: 10.0.0.0/8
eth1: 172.16.0.0/16
Note: you don’t necessarily need to set up 2 different networks; 2 subnets are also OK.
II. Setup
It’s pretty easy to set up. RRP supports several modes of operation:
- Active: both rings are active and in use
- Passive: only one of the N rings is in use; the second one is used only if the first one fails
Make your own choice!
Can I do this on a running cluster? TOTALLY
To do so, put your cluster in maintenance mode. In this mode Pacemaker no longer orchestrates the cluster and flags your resources as unmanaged, which lets you perform critical operations such as upgrading Corosync. The resources keep running; Pacemaker simply stops managing them.
$ sudo crm configure property maintenance-mode=true
The cluster status should now show an unmanaged flag in parentheses:
$ sudo crm_mon -1
Before the changes:
$ sudo corosync-cfgtool -s
Edit your corosync.conf with the following:
totem {
    version: 2
    secauth: on
    threads: 0
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
        ttl: 1
    }
    interface {
        ringnumber: 1
        bindnetaddr: 172.16.0.0
        mcastaddr: 226.94.1.2
        mcastport: 5407
        ttl: 1
    }
}
You already have the first interface sub-section, the one with the ringnumber option set to 0. You just need to:
- enable the RRP mode with the rrp_mode: passive option
- add a new interface sub-section with:
  - a new ring number
  - the address of your new network
  - a new multicast address
  - a new multicast port
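A small aside on the "address of your new network": bindnetaddr is usually set to the network address (10.0.0.0, 172.16.0.0), not to the node's own IP. If you are unsure what that is for your new interface, here is a minimal shell sketch that derives it from an IPv4 address and a prefix length. The function name network_address and the sample host addresses are mine, purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of Corosync): print the IPv4 network address
# for a given host address and prefix length, e.g. to fill in bindnetaddr.
# The body runs in a subshell so IFS and the positional parameters do not leak.
network_address() (
    ip=$1
    prefix=$2
    # Split the dotted quad into $1..$4
    IFS=.
    set -- $ip
    # Pack the four octets into one 32-bit integer
    addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # Build the netmask from the prefix length and apply it
    mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    net=$(( addr & mask ))
    echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))"
)

# Example with made-up host addresses on the two networks used in this post
network_address 10.0.0.1 8       # prints 10.0.0.0
network_address 172.16.0.12 16   # prints 172.16.0.0
```

The printed value is what goes into the bindnetaddr line of the matching interface sub-section.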
Warning: the ringnumber must start at 0.
Warning: Corosync uses two UDP ports: mcastport (for mcast receives) and mcastport - 1 (for mcast sends). By default Corosync uses mcastport 5405, so it will bind to:
- mcast receives: 5405
- mcast sends: 5404
In a redundant ring setup you need to leave a gap between the ports of the two rings. Setting mcastport to 5407 for the second ring gives the following:
- mcast receives: 5407
- mcast sends: 5406
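To make the port arithmetic above concrete, here is a minimal shell sketch that prints the two ports used for a given mcastport and checks that several rings do not step on each other. The helper names ring_ports and ports_are_distinct are hypothetical; they only apply the "mcastport and mcastport - 1" rule described above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Corosync).

# Print the two UDP ports used for a given mcastport, as "sends receives".
ring_ports() {
    echo "$(( $1 - 1 )) $1"
}

# Succeed (exit 0) when the rings given as mcastport arguments share no port.
# The body runs in a subshell so its variables stay local.
ports_are_distinct() (
    all=""
    for p in "$@"; do
        all="$all $(ring_ports "$p")"
    done
    # One port per line, then look for any value that appears twice
    dupes=$(printf '%s\n' $all | sort -n | uniq -d)
    [ -z "$dupes" ]
)

ring_ports 5405   # prints 5404 5405
ring_ports 5407   # prints 5406 5407
ports_are_distinct 5405 5407 && echo "no overlap"
ports_are_distinct 5405 5406 || echo "overlap: both rings would use 5405"
```

With 5405 and 5406 the second ring would send on 5405, which is exactly the port the first ring receives on; hence the gap.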
Restart the corosync daemon on each server:
[user@node1 ~]$ sudo service corosync restart
Check the multicast addresses and ports:
$ sudo netstat -plantu | grep 54
Check the result:
[user@node1 ~]$ sudo corosync-cfgtool -s
Check the totem members:
$ sudo corosync-objctl | grep member
One more validation, using the members’ IDs:
$ sudo corosync-cfgtool -a 16777226 -a 33554442
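If those member IDs look like random numbers, they are not. Assuming no nodeid is set explicitly in corosync.conf, so that Corosync derived the ID from the ring 0 IPv4 address, and assuming little-endian (x86) hosts, the ID can be decoded back by hand. A minimal sketch; nodeid_to_ip is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper (not part of Corosync): decode an auto-generated node ID
# back into a dotted-quad IPv4 address.
# Assumption: the ID is the ring 0 address read as a little-endian 32-bit
# integer, so the lowest byte is the first octet.
nodeid_to_ip() {
    id=$1
    echo "$(( id & 255 )).$(( (id >> 8) & 255 )).$(( (id >> 16) & 255 )).$(( (id >> 24) & 255 ))"
}

nodeid_to_ip 16777226   # prints 10.0.0.1
nodeid_to_ip 33554442   # prints 10.0.0.2
```

Both IDs land in the 10.0.0.0/8 network of ring 0, which is a quick sanity check that you are querying the right members.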
Finally disable the maintenance mode:
$ sudo crm configure property maintenance-mode=false
The (unmanaged) flag from crm_mon -1 should disappear.
III. Break it!
The easiest way to test the RRP mode is to shut down one of the interfaces:
$ sudo ifdown eth0
If you check crm_mon, you will see that your cluster is still running fine, without any outage.
Et voilà! NIC bonding is mandatory for every production environment. Enabling NIC bonding + RRP makes your setup ‘highly highly’ available.