Ceph maintenance with Ansible

Following up on this article.

This playbook automates maintenance on Ceph servers. The typical use case is a hardware change. Running it sets the noout flag on your cluster, which means that OSDs can't be marked out. Normally an OSD that stays down for a while is automatically marked out, and CRUSH then remaps its data to other OSDs, which triggers rebalancing. With noout set, the OSDs of the server under maintenance are only marked down: they don't receive any data while the server is off, but the cluster doesn't move their data elsewhere either. Basically, we tell the cluster not to move any data, since the operation will not last long.

What does it do?

  • Sets the noout flag on your Ceph cluster
  • Turns off the machine that you want to work on
  • Waits for the server to go down, then for it to come up again (after you power it back on)
  • Unsets the noout flag on your Ceph cluster
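
The playbook itself isn't reproduced here. To give a rough idea of what maintenance.yml could look like, here is a minimal sketch rebuilt from the task names in the run below; it is not the original file. It assumes that ceph3 is in your inventory and resolves from the control machine, that you connect as root (both poweroff and ceph need it), and that ceph3 holds a Ceph admin keyring so it can run ceph commands itself. The one-hour timeout is an arbitrary value.

```yaml
# maintenance.yml: a sketch rebuilt from the task names in the run output,
# not the original file. Assumes root access on ceph3 and a Ceph admin keyring there.
- hosts: ceph3
  gather_facts: false   # the run output shows no fact-gathering step
  tasks:
    - name: Set the noout flag
      command: ceph osd set noout

    - name: Turn off the server
      command: poweroff

    # The two wait_for tasks run on the control machine and poll port 22 on ceph3.
    - name: wait for the server to go down (reboot)
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        state: stopped
      delegate_to: localhost

    - name: Wait for the server to come up
      wait_for:
        host: "{{ inventory_hostname }}"
        port: 22
        state: started
        timeout: 3600   # arbitrary: leave enough time for the hardware change
      delegate_to: localhost

    - name: Unset the noout flag
      command: ceph osd unset noout
```

Since the machine is powered off rather than rebooted, the "come up" step only succeeds once someone powers it back on. If that takes longer than the timeout, the play fails and noout stays set, so remember to clear it by hand with ceph osd unset noout.
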

How to use it:

$ ansible-playbook -v maintenance.yml

PLAY [ceph3] ***************************************************

TASK: [Set the noout flag] ****************************************************
changed: [ceph3] => {"changed": true, "cmd": ["ceph", "osd", "set", "noout"], "delta": "0:00:00.280238", "end": "2014-04-09 17:40:40.101276", "rc": 0, "start": "2014-04-09 17:40:39.821038", "stderr": "set noout", "stdout": ""}

TASK: [Turn off the server] ***************************************************
changed: [ceph3] => {"changed": true, "cmd": ["poweroff"], "delta": "0:00:00.008236", "end": "2014-04-09 17:40:41.385631", "rc": 0, "start": "2014-04-09 17:40:41.377395", "stderr": "", "stdout": ""}

TASK: [wait for the server to go down (reboot)] *******************************
ok: [ceph3] => {"changed": false, "elapsed": 2, "path": null, "port": 22, "search_regex": null, "state": "stopped"}

TASK: [Wait for the server to come up] ****************************************
ok: [ceph3] => {"changed": false, "elapsed": 47, "path": null, "port": 22, "search_regex": null, "state": "started"}

TASK: [Unset the noout flag] **************************************************
changed: [ceph3] => {"changed": true, "cmd": ["ceph", "osd", "unset", "noout"], "delta": "0:00:00.277196", "end": "2014-04-09 17:41:30.993053", "rc": 0, "start": "2014-04-09 17:41:30.715857", "stderr": "unset noout", "stdout": ""}

PLAY RECAP ********************************************************************
ceph3 : ok=5 changed=3 unreachable=0 failed=0
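
The run above is pinned to ceph3. If you want to reuse the playbook for other OSD servers, one option (an assumption on my part, not part of the original) is to take the host from an extra variable, hypothetically named target here, and change only the play header:

```yaml
# Hypothetical variant: the host comes from an extra variable named "target".
- hosts: "{{ target }}"
  gather_facts: false
  tasks:
    - name: Set the noout flag
      command: ceph osd set noout
    # ... the remaining tasks of maintenance.yml, unchanged ...
```

Run it with: ansible-playbook -v maintenance.yml -e target=ceph3. Pass a single host each time: with a group, Ansible would power off all of its servers at once, and taking several OSD hosts down together can leave placement groups without enough replicas to serve I/O. Likewise, avoid running two maintenances in parallel: noout is cluster-wide, so the first run to finish would unset it while the other server is still down.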


Hope it helps!
