Replacing a failed disk in an MD array
Sooner or later you will end up with the infamous [_U] in your MD array. If you are lucky you may be able to recover the drive by rebooting, removing and reinserting it, and then re-adding it to the MD device. The question is always whether you should trust a once-failed drive; sometimes it is easier if the drive just dies completely.

Whether or not the faulty drive is still accessible, I find the most robust way of removing it from an array is to simply power the system down and remove the drive, or, in a hot-swap environment, just unplug it. (Make sure to remove the correct drive!) MD will detect the missing drive, remove it from the array and set the MD device to a degraded state. Some people claim that you should fail the drive first, but that removes it as a member of the MD device, and MD will also write to the disk, which is not always a good idea if the drive is bad (you may need to recover data from it at some point).

Let's say I have replaced /dev/sda and created a partition of the same size as on the old disk. The new disk has the same name but is unknown to mdadm and has to be added to the array to complete the replacement.

# mdadm --manage /dev/md0 --add /dev/sda1

If you are unable to re-create the disk under the same name, for example /dev/sda1, you can add the new drive under another name, for example /dev/sdc1, and then remove the old reference from mdadm.

# mdadm --manage /dev/md0 --add /dev/sdc1
# mdadm --manage /dev/md0 --remove /dev/sda1

If the old disk name is no longer available you can remove all references to disks that are no longer present.

# mdadm --manage /dev/md0 --remove detached

As usual the status can be read from /proc/mdstat.

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      1953512400 blocks super 1.2 [2/1] [_U]
      [=========>...........]  recovery = 45.0% (879803904/1953512400) finish=191.4min speed=93475K/sec

unused devices: <none>

If the rebuilding speed is slow you can force a higher speed by increasing the lower speed limit:

# echo 100000 > /proc/sys/dev/raid/speed_limit_min

This tells MD to rebuild at no less than 100000K/sec (if the system can keep up with that). Watch out though! Rebuilding the array puts extra stress on the remaining drives, and may cause another drive to fail. It may be safer to rebuild at a low speed.
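The mdstat fields can also be checked from a script. Here is a minimal sketch that flags a degraded array (an underscore inside the status brackets) and cross-checks the finish estimate from the block counts and speed; the sample output above is embedded in a heredoc so the snippet is self-contained, but on a real system you would read /proc/mdstat directly:

```shell
#!/bin/sh
# Sketch only: parse the sample /proc/mdstat output shown above.
# On a real system, replace the heredoc with: mdstat=$(cat /proc/mdstat)
mdstat=$(cat <<'EOF'
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      1953512400 blocks super 1.2 [2/1] [_U]
      [=========>...........]  recovery = 45.0% (879803904/1953512400) finish=191.4min speed=93475K/sec

unused devices: <none>
EOF
)

# An underscore inside the [..] status brackets marks a missing member.
case "$mdstat" in
  *'[_'*|*'_]'*) echo "md0 is degraded" ;;
  *)             echo "md0 is clean"    ;;
esac

# Cross-check the finish estimate: remaining blocks (1 KiB each)
# divided by the reported speed in K/sec, converted to minutes.
eta_min=$(( (1953512400 - 879803904) / 93475 / 60 ))
echo "about ${eta_min} minutes of rebuild left"
```

Running this against the sample output reports a degraded array and roughly 191 minutes remaining, which agrees with the finish=191.4min field that MD itself prints.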
