mdadm: replacing a disk
This is the first time an HDD in a RAID 1/5 array has failed on me. I guess there has to be a first time for everything.
Anyway, I have a RAID5 array of six 2TB disks, for a total capacity of 10TB. I've had that array for about 5 years now and it has been working pretty smoothly for me. 450MB/s of read throughput is how I like it; it's even faster than my gigabit network, so the array has never been the bottleneck in any of my operations.
For the record, I used cbothamy's idle3-tools to reset the drives' idle timer to something sensible. I have WDC Green drives (see the picture, which is actually the drive that failed), and their default head-parking settings are really nuts. Really nuts.
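If you have the same drives, idle3ctl from that package does the trick; /dev/sdX is a placeholder here, and the drive needs a power cycle before the new value takes effect:

    # read the current idle3 (head-parking) timer
    idle3ctl -g /dev/sdX
    # disable it entirely, or use -s <value> to set a longer timeout instead
    idle3ctl -d /dev/sdX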
Also for the record, my boot drive is a single old Maxtor 320GB drive and has nothing to do with mdadm. So if you have issues booting from your mdadm array, look elsewhere; this page is not for you.
Now, I've had a failure this morning and had to change one of the drives. I have a "regular" box (see here for pics and French text), so there was no hot-swapping involved, but a mere 5 minutes of downtime was good enough for me.
How did I find out about the faulty drive? Well, I have an "Error" section in my conky configuration that outputs the diff between the current /proc/mdstat and a copy of the initial one. Just logging into my desktop alerted me right away: a big red section that is usually empty showed up right on my desktop. This is important, because if you don't monitor your RAID array, it will eventually fail on more than one drive and you'll lose everything. Note that from time to time a red section appears in my conky anyway, when mdadm runs a routine check of the array.
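A minimal sketch of that conky bit, assuming a known-good copy of /proc/mdstat was saved to /etc/mdstat.good beforehand (the path and the interval are placeholders); diff prints nothing as long as the two match, so the section stays empty:

    ${color red}${execi 300 diff /etc/mdstat.good /proc/mdstat}${color}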
Now, how do you get down to it? First, I needed to figure out which drive in the RAID array was faulty, and that is exactly what /proc/mdstat tells you.
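A degraded RAID5 array shows up in there more or less like this (device names as in my box; the block counts are only representative):

    md0 : active raid5 sda1[0] sdb1[1] sdc1[2] sdd1[3] sde1[4] sdf1[5](F)
          9767559040 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]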
First, a little explanation. In [UUUUU_], each U is a working disk and each _ is a faulty one. Similarly, 6/5 is the total number of disks in the array (6) versus the number of working disks (5). Finally, the (F) next to sdf1 flags the partition that isn't available anymore.
Well... clearly one of the drives has made it to heaven, and it was /dev/sdf. Or has it? Could it just be a software glitch? In doubt, I rebooted the server to see if things would get better. After a fight against the BIOS to make it boot with a faulty drive, I noticed the same thing. I kinda hoped it was just a glitch in the disk driver. Alas...
Well, the HDD seems to be dead. Let's remove it.
First, let's remove the drive from the array in mdadm.
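On a typical setup, something along these lines does it (/dev/md0 stands in for the actual array device, so adapt it to yours):

    # the kernel has already flagged the partition as failed, so marking it
    # failed again is a no-op; then pull it out of the array
    mdadm --manage /dev/md0 --fail /dev/sdf1
    mdadm --manage /dev/md0 --remove /dev/sdf1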
This is done, and mdadm is now aware that there is no disk /dev/sdf1 anymore. Next, I need to open the box and locate the faulty drive. Now... which one is it? I have 6 WDC Green drives in my box... Let's list all the drives with their serial numbers.
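hdparm can read the serial number straight off each drive; a loop along these lines does the job (run as root, and the dead drive will typically report nothing or an I/O error):

    for d in /dev/sd?; do
        echo -n "$d: "
        hdparm -I "$d" | grep 'Serial Number'
    done

Alternatively, ls -l /dev/disk/by-id/ maps each sdX device to symlinks whose names end with the serial number.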
Well... the faulty drive doesn't report its serial number anymore. The good news is that I have the serial numbers of all the other drives, so I'll be able to locate the one whose serial isn't on the list.
I just opened the box, looked for the drive whose serial number was not on my list, swapped it with a new one and put everything back together. Note that I took care not to swap SATA plugs: I made sure the new drive was on the same SATA port the faulty one was on. I don't know if it matters.
After booting up, my RAID array was in the same state as before the shutdown. First, I had to partition the new drive the same way the others are partitioned, with a single command run as root.
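Assuming sfdisk, cloning the partition layout from /dev/sde onto /dev/sdf is a one-liner:

    sfdisk -d /dev/sde | sfdisk /dev/sdf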
Now, don't get your drives mixed up. This effectively overwrites the partition table of the target drive, copying the layout from /dev/sde to /dev/sdf; depending on your setup, you can lose everything on the target drive. Here the target was /dev/sdf, my newly installed drive.
The last thing I needed was to add the new drive back into the array.
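Again with /dev/md0 standing in for the actual array device:

    mdadm --manage /dev/md0 --add /dev/sdf1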
And that's all... almost. Now mdadm will rebuild the new drive with the data it should contain. It took about 500 minutes for me, a little more than 8 hours, so be patient. This will take time.
A few hours later, my RAID array is slowly recovering from the disaster.
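During the rebuild, /proc/mdstat shows a progress bar and an ETA, roughly like this (the counters are only indicative, apart from the time estimate):

    md0 : active raid5 sdf1[6] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
          9767559040 blocks level 5, 64k chunk, algorithm 2 [6/5] [UUUUU_]
          [===============>....]  recovery = 79.4% (1551700000/1953511808) finish=102.6min speed=65288K/sec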
As you can see, I still have 102.6 minutes left.
When everything was done, the array was whole again.
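The status goes back to [6/6] and [UUUUUU], along these lines (same representative counters as above):

    md0 : active raid5 sdf1[6] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
          9767559040 blocks level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU]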
I have to say I'm surprised at how smoothly everything went. I haven't lost a byte, and I still have a spare drive, although I'll need a few more if/when this happens again.
My first experience with mdadm was kind of a disaster, but that was because I used it for my / partition. Debugging it from busybox wasn't a lot of fun. This time around, everything went smoothly.