Handling Ceph near full OSDs

Running Ceph near full is a bad idea; the real fix is to add more OSDs. However, it will inevitably happen during testing, and it can also happen when you have plenty of disk space but the weights are wrong. UPDATE: better still, calculate ahead of time how much space you really need to run Ceph safely. If you have to resort to handling near full OSDs, your assumptions about safe utilization were probably wrong.
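Before touching any weights, it's worth checking how full each OSD actually is. On reasonably recent Ceph releases, ceph osd df shows per-OSD size, use and %USE, and ceph health detail lists exactly which OSDs have tripped the nearfull or full ratios:

ceph osd df
ceph health detail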

Usually when OSDs are near full, you’ll notice that some are more full than others. Here are the ways to fix it:

Decrease the CRUSH weight of the OSD that’s too full. That will cause data to be moved from it to OSDs that are less full.

ceph osd crush reweight osd.[x] [y]  

x is the OSD id and y is the new weight. Be careful making big changes; usually even a small incremental change is sufficient.
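For example (osd.14 and the weight value below are made-up numbers for illustration), check the current CRUSH weight in the WEIGHT column of ceph osd tree, then nudge it down slightly:

ceph osd tree
ceph osd crush reweight osd.14 1.75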

Temporarily decrease the weight of the OSD. This is the same as above, except that the change is not permanent.

ceph osd reweight [id] [weight]

id is the OSD number and weight is a value from 0 to 1.0 (1.0 means no change, 0.5 is a 50% reduction in weight).

for example:
ceph osd reweight 14 0.9
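Once the cluster has rebalanced or you have added capacity, the override can be undone by setting the value back to 1.0, e.g.:

ceph osd reweight 14 1.0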


Let Ceph reweight automatically.

ceph osd reweight-by-utilization [percentage]

Reweights the OSDs by reducing the (temporary) weight of OSDs that are heavily overused. By default it adjusts the weight downward on OSDs that are at 120% or more of the average utilization; if you include a threshold, it uses that percentage instead.
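For example, to bring down the weight of any OSD that is above 110% of the average utilization (110 here is just an illustrative threshold):

ceph osd reweight-by-utilization 110

Depending on your Ceph release, ceph osd test-reweight-by-utilization runs the same calculation as a dry run without changing anything, which is a safer first step.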