Ceph: manually repair object
Debugging scrubbing errors can be tricky and you don’t necessarily know how to proceed.
Assuming you have a cluster state similar to this one:
health HEALTH_ERR 1 pgs inconsistent; 2 scrub errors
Let’s troubleshoot this!
Find the PG
A simple command can give us the PG:
$ sudo ceph health detail
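The line to look for in the detailed output should be something along these lines (the exact wording can vary between Ceph releases):
pg 17.1c1 is active+clean+inconsistent, acting [21,25,30]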
Ok, so the problematic PG is 17.1c1 and it is acting on OSDs 21, 25 and 30.
You can always try to run ceph pg repair 17.1c1 and check if this fixes your issue.
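For instance, fire the repair and then watch the cluster log until the PG (hopefully) goes back to active+clean; ceph -w is just one convenient way to follow the progress:
$ sudo ceph pg repair 17.1c1
$ sudo ceph -w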
Sometimes it does, sometimes it does not, and then you need to dig further.
Find the problem
In order to get the root cause, we need to dive into the OSD log files.
A simple grep -Hn 'ERR' /var/log/ceph/ceph-osd.21.log will do the trick; note that if the logs have been rotated you might have to use zgrep instead.
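As a quick illustration (the rotated file name below is only an example; use whatever name logrotate produced on your system):
$ sudo grep -Hn 'ERR' /var/log/ceph/ceph-osd.21.log
$ sudo zgrep -Hn 'ERR' /var/log/ceph/ceph-osd.21.log.1.gz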
This gives us the following root cause:
log [ERR] : 17.1c1 shard 21: soid 58bcc1c1/rb.0.90213.238e1f29.00000001232d/head//17 digest 0 != known digest 3062795895
log [ERR] : 17.1c1 shard 25: soid 58bcc1c1/rb.0.90213.238e1f29.00000001232d/head//17 digest 0 != known digest 3062795895
What is this log telling us?
It says that the object digest should be 3062795895 but is actually 0.
Find the object
Now we have to dive into the OSD 21 data directory; thanks to the information we have, it is pretty straightforward.
What do we know?
- Problematic PG: 17.1c1
- OSDs involved: 21, 25 and 30
- Object name: rb.0.90213.238e1f29.00000001232d
At this stage we search the object:
$ sudo find /var/lib/ceph/osd/ceph-21/current/17.1c1_head/ -name 'rb.0.90213.238e1f29.00000001232d*' -ls
Now there are a couple of other things you can check:
- Look at the size of each object on every system
- Look at the MD5 of each object on every system
Then compare all of them to find the bad object.
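A minimal sketch of that comparison, assuming the other replicas live under the same default data path as on OSD 21 (run it on each of the three OSD hosts, adjusting the OSD id in the path):
$ sudo find /var/lib/ceph/osd/ceph-21/current/17.1c1_head/ -name 'rb.0.90213.238e1f29.00000001232d*' -exec ls -l {} \; -exec md5sum {} \;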
Fix the problem
Just move the object away :) with the following steps (a command sketch follows right after the list):
- stop the OSD that has the wrong copy of the object for that PG
- flush the journal (ceph-osd -i <id> --flush-journal)
- move the bad object to another location
- start the OSD again
- call ceph pg repair 17.1c1
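A rough sketch of those steps, assuming the bad copy is the one on osd.21 found earlier, that the OSD is managed by systemd (adjust the stop/start commands to your init system), and with <object file> standing for the path returned by the find command above:
$ sudo systemctl stop ceph-osd@21
$ sudo ceph-osd -i 21 --flush-journal
$ sudo mv <object file> /root/   # or any other location outside the OSD data directory
$ sudo systemctl start ceph-osd@21
$ sudo ceph pg repair 17.1c1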
It might look a bit rough to delete an object, but in the end it’s Ceph’s job to do that. Of course the above works well when you have 3 replicas, since it is easier for Ceph to compare two identical versions against the odd one out. A situation with 2 replicas can be a bit different: Ceph might not be able to resolve the conflict and the problem could persist. A simple trick is then to choose the latest version of the object, set the noout flag on the cluster, and stop the OSD that has the wrong version. Wait a bit, start the OSD again and unset the noout flag. The cluster should then sync the good version of the object to the OSD that had the wrong one.
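A minimal sketch of that two-replica trick, again assuming the wrong version sits on osd.21 and a systemd-managed OSD:
$ sudo ceph osd set noout
$ sudo systemctl stop ceph-osd@21
# wait a little while, then bring the OSD back
$ sudo systemctl start ceph-osd@21
$ sudo ceph osd unset noout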