OpenStack High Availability: RabbitMQ
RabbitMQ has its own built-in cluster management system, so we don't need Pacemaker here: everything is managed by RabbitMQ itself.
RabbitMQ, or more generally the message queue layer, is a critical component of OpenStack because every request and query uses this layer to communicate.
I. Clustering setup
Install the RabbitMQ server on both nodes:
$ sudo apt-get install rabbitmq-server
RabbitMQ generates a cookie for each server instance. This cookie must be the same on each member of the cluster:
rabbitmq-01:~$ sudo cat /var/lib/rabbitmq/.erlang.cookie
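A minimal sketch of syncing the cookie and joining the cluster from the second node, assuming the nodes are named rabbitmq-01 and rabbitmq-02 (on RabbitMQ 3.0+ the command is join_cluster; older releases used rabbitmqctl cluster):
# stop the broker before touching the cookie
rabbitmq-02:~$ sudo service rabbitmq-server stop
# copy the cookie from node 01 (any transfer method works; the file is root-readable only)
rabbitmq-02:~$ sudo scp root@rabbitmq-01:/var/lib/rabbitmq/.erlang.cookie /var/lib/rabbitmq/.erlang.cookie
rabbitmq-02:~$ sudo chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
rabbitmq-02:~$ sudo service rabbitmq-server start
# join the cluster
rabbitmq-02:~$ sudo rabbitmqctl stop_app
rabbitmq-02:~$ sudo rabbitmqctl join_cluster rabbit@rabbitmq-01
rabbitmq-02:~$ sudo rabbitmqctl start_app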
Check your cluster status from either node, 01 or 02:
rabbitmq-02:~$ sudo rabbitmqctl cluster_status
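The output lists every node together with its type and the currently running nodes; it looks roughly like this (illustrative, the exact formatting varies across RabbitMQ versions):
Cluster status of node 'rabbit@rabbitmq-02' ...
[{nodes,[{disc,['rabbit@rabbitmq-01']},{ram,['rabbit@rabbitmq-02']}]},
 {running_nodes,['rabbit@rabbitmq-02','rabbit@rabbitmq-01']}]
...done.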
Cluster nodes can be of two types: disk or ram. Disk nodes replicate data in ram and on disk, thus providing redundancy in the event of node failure and recovery from global events such as power failure across all nodes. Ram nodes replicate data in ram only and are mainly used for scalability. A cluster must always have at least one disk node.
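For instance, to (re-)join the cluster as a RAM node, a sketch assuming RabbitMQ 3.0+ (which has the --ram flag; on older releases a node became a RAM node when it was omitted from the disk-node list given to rabbitmqctl cluster):
rabbitmq-02:~$ sudo rabbitmqctl stop_app
rabbitmq-02:~$ sudo rabbitmqctl join_cluster --ram rabbit@rabbitmq-01
rabbitmq-02:~$ sudo rabbitmqctl start_app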
You can also verify that the connections between the nodes are well established (look for the beam processes listening on 4369, the Erlang port mapper, and on 5672, the AMQP port):
$ sudo netstat -plantu | grep 10.0.
I.1. Tips
I.1.1. Change the IP or the hostname of a node
If you changed your IP address or your hostname, the following is pretty nasty and harsh, but it works:
$ sudo rabbitmqctl stop_app
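The remaining steps are a reset and a restart; a sketch, assuming the node's state can be thrown away (force_reset wipes it unconditionally) and that the node should then re-join node 01:
$ sudo rabbitmqctl force_reset
$ sudo rabbitmqctl join_cluster rabbit@rabbitmq-01
$ sudo rabbitmqctl start_app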
The IP address and/or hostname will then be refreshed in the RabbitMQ database.
I.1.2. Convert RAM node to Disk node
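A sketch of the conversion, assuming RabbitMQ 3.0+ (which provides change_cluster_node_type; older releases re-declared the node type by listing the node among the disk nodes passed to rabbitmqctl cluster):
rabbitmq-02:~$ sudo rabbitmqctl stop_app
rabbitmq-02:~$ sudo rabbitmqctl change_cluster_node_type disc
rabbitmq-02:~$ sudo rabbitmqctl start_app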
Then check that the node is now listed as a disk node:
rabbitmq-02:~$ sudo rabbitmqctl cluster_status
II. HAProxy configuration
Clustering doesn't mean high availability, which is why I put a load balancer on top. Here, HAProxy sends requests to a single node; if that node fails, requests are routed to the other node. It's as simple as that. Both the Erlang port mapper port (4369) and the AMQP port used by the OpenStack queues (5672) are configured below.
global
    log 127.0.0.1 local0
    #log loghost local0 info
    maxconn 1024
    #chroot /usr/share/haproxy
    user haproxy
    group haproxy
    daemon
    #debug
    #quiet

defaults
    log global
    #log 127.0.0.1:514 local0 debug
    log 127.0.0.1 local1 debug
    mode tcp
    option tcplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 1024
    # Default!
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

listen rabbitmq_cluster 0.0.0.0:4369
    mode tcp
    balance roundrobin
    server server-07_active 172.17.1.8:4369 check inter 5000 rise 2 fall 3
    server server-08_backup 172.17.1.9:4369 backup check inter 5000 rise 2 fall 3

listen rabbitmq_cluster_openstack 0.0.0.0:5672
    mode tcp
    balance roundrobin
    server server-07_active 172.17.1.8:5672 check inter 5000 rise 2 fall 3
    server server-08_backup 172.17.1.9:5672 backup check inter 5000 rise 2 fall 3
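Once HAProxy is reloaded, point the OpenStack services at the load balancer instead of a single broker. A sketch, assuming the HAProxy machine answers on 172.17.1.10 (a hypothetical address) and the Folsom-era nova.conf flag names:
$ sudo service haproxy reload
# nova.conf on every OpenStack node; 172.17.1.10 is the hypothetical HAProxy address
rabbit_host=172.17.1.10
rabbit_port=5672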
Warning: the use of HAProxy only makes sense if you have implemented mirrored queues.

Warning: here we only have a RabbitMQ cluster, which means that all the data/state required for the operation of the broker is replicated across all nodes, for reliability and scaling, with full ACID properties. The exception is message queues, which by default reside on the node that created them, though they are visible and reachable from all nodes. To achieve HA you therefore need to implement mirrored queues on top of the cluster. A patch adding mirrored-queue support to OpenStack has been submitted on Gerrit and is waiting for approval; it will only be available with Folsom.
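For reference, on RabbitMQ 3.0+ mirroring is enabled broker-side with a policy (on the 2.x releases of this era it had to be requested per queue via the x-ha-policy argument at declaration time); a minimal sketch that mirrors every queue across all nodes:
$ sudo rabbitmqctl set_policy ha-all "" '{"ha-mode":"all"}'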
At the moment, a valid solution is to build an active/passive cluster with Pacemaker, DRBD and RabbitMQ. This setup is documented on the RabbitMQ website.