Ceph and KRBD discard
Discard is the space reclamation mechanism for the kernel RBD module. This kind of support is crucial for operators and eases capacity planning. RBD images are sparse, so right after creation an image uses 0 MB. The main issue with sparse images is that they only grow, eventually reaching their full provisioned size. Ceph knows nothing about what happens on top of the block device, especially when a filesystem sits there. You can fill the entire filesystem and then delete everything, and Ceph will still believe that the blocks are fully used and keep reporting that usage. However, thanks to discard support on the block device, the filesystem can send discard commands to the block device. In the end, the storage frees up those blocks.
This feature was added in kernel 3.18.
Let’s create an RBD image:
$ rbd create -s 10240 leseb
Map it to a host and put a filesystem on top of it:
$ sudo rbd -p rbd map leseb
OK, we are all set now, so let’s write some data:
$ dd if=/dev/zero of=/mnt/leseb bs=1M count=128
Then we check the size of the image again:
$ rbd diff rbd/leseb | awk '{ SUM += $2 } END { print SUM/1024/1024 " MB" }'
We now have 128 MB of data and ~14.406 MB of filesystem data/metadata. Check that discard is properly enabled on the device:
root@ceph-mon0:~# cat /sys/block/rbd0/queue/discard_*
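The same check can be scripted. The following is only a minimal sketch: the function name discard_supported is hypothetical, and it relies on the kernel convention that a discard_max_bytes value of 0 means the device does not support discard.

```shell
# Return success if the block queue directory given as $1 advertises
# discard support, i.e. its discard_max_bytes value is greater than 0.
# A missing or unreadable file is treated as "not supported".
discard_supported() {
    queue_dir=$1
    max_bytes=$(cat "$queue_dir/discard_max_bytes" 2>/dev/null) || return 1
    [ "${max_bytes:-0}" -gt 0 ]
}
```

For the device mapped above, this would be called as: discard_supported /sys/block/rbd0/queue && echo "discard supported"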
Now let’s look at the default behavior, where no discard is issued. We delete our 128 MB file, which frees up some space on the filesystem. Unfortunately, Ceph does not notice anything and still believes that these 128 MB of data are there.
$ rm /mnt/leseb
Now let’s run the fstrim command on the mounted filesystem to instruct the block device to free up unused space:
$ fstrim /mnt/
Et voilà ! Ceph freed up our 128 MB.
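The size check used throughout this walkthrough can be wrapped in a small helper so it is easy to repeat before and after a trim. This is a sketch under assumptions: the names sum_extents_mb and rbd_used_mb are hypothetical, and it assumes the plain rbd diff output, where the second column is the extent length in bytes.

```shell
# Sum the extent lengths (second column, in bytes) read from stdin and
# print the total in MB. Lines whose second field is not a plain number
# (such as a header line) are ignored.
sum_extents_mb() {
    LC_ALL=C awk '$2 ~ /^[0-9]+$/ { sum += $2 }
                  END { printf "%.3f MB\n", sum / 1024 / 1024 }'
}

# Hypothetical wrapper: report the allocated size of an image spec
# such as rbd/leseb.
rbd_used_mb() {
    rbd diff "$1" | sum_extents_mb
}
```

Running rbd_used_mb rbd/leseb before and after fstrim should show the difference that the trim made.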
If you want to discard on the fly and let the filesystem issue discards continuously, you can mount the filesystem with the discard option:
$ mount -o discard /dev/rbd0 /mnt/
Note that using the discard mount option can be a real performance killer, so generally you want to trigger the fstrim command through a daily cron job instead.
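A daily job could look roughly like the sketch below. Everything named here is an assumption made for illustration: the MOUNTPOINTS and FSTRIM variables, the trim_all function, and the idea of saving the script under a name such as /etc/cron.daily/fstrim-rbd.

```shell
#!/bin/sh
# Hypothetical daily trim script. MOUNTPOINTS is a space-separated list
# of mounted filesystems backed by RBD devices; FSTRIM lets the binary
# be overridden. Both names are assumptions of this sketch.
MOUNTPOINTS=${MOUNTPOINTS:-/mnt}
FSTRIM=${FSTRIM:-fstrim}

# Trim every listed mount point and return non-zero if any trim failed.
trim_all() {
    rc=0
    for mp in $MOUNTPOINTS; do
        # -v makes fstrim report how many bytes were discarded
        "$FSTRIM" -v "$mp" || rc=1
    done
    return $rc
}
```

To use it as a cron job, add a final trim_all call at the end of the script and make the file executable.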