Replacing A Failed USB Disk In A Raspberry Pi-Based RAID Mirror

My previous post went into how to create a simple but functional NAS with a Raspberry Pi 4B and two USB-attached SATA disks. In the two weeks or so that it’s been running, the NAS I built has performed very well and has been reliable (hopefully I won’t regret typing that).

But what to do WHEN a disk fails? Disks fail – even that fancy new enterprise-grade SSD that cost an arm and a leg will fail at some point. The good news is that if you’re using mdadm to provide some kind of redundancy with your disks, things should still be working if a disk fails. The bad news is that unless you’ve got a RAIDset that can specifically tolerate more than one failure (like RAID 6), you need to replace that failed disk ASAP.

I’m confident that I’ll be able to recover from losing a disk in my shiny new NAS, but I’m not one to tempt fate, so I built another RAIDset with a spare Pi and two 64GB SanDisk USB sticks to play around with instead. They’re slower than the disks, so things like how quickly the RAIDset resyncs will be different from what you saw in my previous post.

So here’s the setup – it’s a Raspberry Pi 4B (2GB) with two 64GB USB flash drives in a RAID 1 (mirror) configuration.

Here it is, working properly, with the output of cat /proc/mdstat:

and checking to see if it’s mounted using df:

To simulate a disk failure, I removed one of the USB sticks while everything was running. Here’s the output of dmesg showing the disconnection and that mdadm is soldiering on with only one disk:

Looking at the list of USB-connected devices only shows one SanDisk device:

And now the output of cat /proc/mdstat is showing a failed disk (note the “U_”):

The good news is that yes, /dev/md0 is still mounted and usable, even though it’s in a degraded state.
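If you'd rather not eyeball the output, here's a minimal sketch of a check for that underscore. It assumes the usual /proc/mdstat layout, where each array starts with a line like "md0 : ..." and is followed by a status string such as [UU] or [U_]. The function name degraded_arrays is my own, not something mdadm provides.

```shell
#!/bin/sh
# Print the name of every md array whose status string (e.g. [U_]) contains
# an underscore, meaning at least one member is missing or failed.
# Usage: degraded_arrays [mdstat-file]   (defaults to /proc/mdstat)
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }      # remember the array being described
        /\[[U_]+\]/ && name != "" {      # the line carrying the status string
            if (match($0, /\[[U_]+\]/) && index(substr($0, RSTART, RLENGTH), "_"))
                print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no arguments on the Pi should print md0 while the mirror is degraded and nothing at all once both disks are back.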

I reformatted the USB stick on my Windows PC so the data that was on there was lost, then reconnected it to the Pi:

There are two SanDisk devices again.

And here’s the output of dmesg again – you can see the time difference between the failure and when the “new” disk was connected:

Note that both the failure messages and the messages for the newly connected USB stick refer to sdb. It could just as easily have been sda, so make sure you check to see which one failed – and, more importantly, which one didn’t!

So now there are two disks connected again, but only one of them has the RAIDset data on it. In this case, sda is the one with the data that needs to be mirrored over. Again, it could’ve been sdb. For one last check, get the output of cat /proc/mdstat again:

Notice it says sda – that means that sda has the data we want to mirror over to the other disk, which, as the previous output of dmesg showed, is sdb.
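To pull the surviving member names out of /proc/mdstat without reading it by eye, something like the following sketch could work. It assumes the array line has the usual "md0 : active raid1 sda[0]" shape (state in the third field, RAID level in the fourth), and array_members is a name I made up for illustration.

```shell
#!/bin/sh
# List the member devices of one md array, skipping any marked failed (F).
# Usage: array_members md0 [mdstat-file]   (defaults to /proc/mdstat)
array_members() {
    awk -v md="$1" '
        $1 == md && $2 == ":" {
            for (i = 5; i <= NF; i++) {      # fields 3 and 4 are state and RAID level
                if ($i ~ /\(F\)/) continue   # skip members flagged as failed
                dev = $i
                sub(/\[.*/, "", dev)         # drop the [n] slot number
                print dev
            }
        }
    ' "${2:-/proc/mdstat}"
}
```

On my degraded test array, array_members md0 should print just sda. Whatever it prints is the disk you must NOT overwrite in the next steps.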

If you are replacing a failed RAID member, the replacement must be the same size as or larger than the failed member. That goes for any RAID level and any type (i.e. disk mirroring or partition mirroring). Keep in mind that not all disks of the same stated capacity actually have the same capacity, so make sure you do a bit of research before going out and spending your money on a new disk that won’t fit your current array!
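One way to check the size before going any further is to compare the byte counts of the two devices. This is only a rough sketch: big_enough is a hypothetical helper name, and it compares against the surviving member rather than the failed one, on the assumption that the two original members were the same size.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the replacement is at least as large as the
# surviving member. Both arguments are sizes in bytes.
# Usage: big_enough SURVIVOR_BYTES REPLACEMENT_BYTES
big_enough() {
    [ "$2" -ge "$1" ]
}

# On the Pi, the sizes could come from blockdev, which reports bytes:
#   big_enough "$(sudo blockdev --getsize64 /dev/sda)" \
#              "$(sudo blockdev --getsize64 /dev/sdb)" \
#       && echo "replacement fits" || echo "replacement is too small"
```

In that commented example /dev/sda is the surviving disk and /dev/sdb is the replacement, as in my setup. Swap them around if yours are the other way.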

Now that the disk is reconnected and showing up, copy the partition layout from the existing RAIDset disk to the new disk with the following command:

sudo sfdisk -d /dev/sdX | sudo sfdisk /dev/sdY

In this case, the existing disk is /dev/sda and the new disk is /dev/sdb:

This step isn’t needed if you’re mirroring disks (as opposed to mirroring partitions), but it’s a good idea to do it anyway – if there’s an error here, you certainly don’t want to go any further until you’ve fixed the problem.

If sfdisk worked and didn’t give you any errors, then you’re ready to add the new disk to the RAIDset with the following command:

sudo mdadm --manage /dev/md0 --add /dev/sdY

Where sdY is the new disk – in my case, sdb:

If you didn’t get any errors, run cat /proc/mdstat again and you’ll see your RAIDset is rebuilding:

Notice how it now shows that there are two active elements in md0: sdb[2] and sda[0]? That’s a good sign. Keep checking every once in a while to make sure the recovery is progressing.
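If you want just the progress figure rather than the whole file, here's a small sketch that picks the percentage out of the "recovery = ...%" (or "resync = ...%") line. It assumes that line keeps its usual format, and rebuild_progress is my own name for the helper.

```shell
#!/bin/sh
# Print "<array> <percent>" for every md array that is rebuilding or resyncing.
# Usage: rebuild_progress [mdstat-file]   (defaults to /proc/mdstat)
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }          # remember the array being described
        /(recovery|resync) *=/ {             # the progress line for that array
            if (match($0, /[0-9]+(\.[0-9]+)?%/))
                print name, substr($0, RSTART, RLENGTH)
        }
    ' "${1:-/proc/mdstat}"
}
```

It prints nothing once the rebuild has finished. For a hands-off view of the whole file, watch -n 5 cat /proc/mdstat does the job too.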

Once it’s done, the RAIDset should be showing as all “U” again:

If you see that, everything’s rebuilt and your RAIDset is ready to handle another disk failure.

Hopefully you never need to use this information, but if you do, I hope it helps!

10 thoughts on “Replacing A Failed USB Disk In A Raspberry Pi-Based RAID Mirror”

  1. Hey Mark! Great series on using the rpi to run the mirrored NAS. I’m going to be building this as soon as my next delivery comes along. One question though, do you have a way to get notifications of a degraded mirror set?

    1. Hi Gallo!

      Thanks for the comment! The only thing I do right now is have the Pi write the contents of /proc/mdstat to a txt file at the root of the share once a minute so I can check it to see if both disks are showing as “U”. The disk enclosures are in my field of view enough that I expect I’ll see the status light change on them.

      That being said… it’s probably a good idea to set up an email notification or something like that. I will add that to my list of projects. 🙂

      Good luck with your build!

        1. Hello – thank you for the link, that looks like it’ll do the job quite nicely. Automatically sending an email when mdadm sees a problem is a lot better than counting on me looking at the disk enclosures at the right time. Thanks for the comment!

  2. Thanks, I just ran through this a few times, so if/when any of my nice new disks go bye-bye I stand a chance of recovering the RAID.
    The smartmontools package, which tells me the serial number of each disk, was a great help in making sure I turned off the correct “faulty” drive:
    sudo smartctl -i /dev/sdY
    Also,
    sudo smartctl -H /dev/sda
    may be useful to list SMART problems, if any.
    Now I need to set up the ability to send an email if a fault is detected.
    Thanks for all the articles.

  3. Hello,
    if I get the error “mdadm: /dev/sdb not large enough to join array” then probably the new USB-Stick is too small, correct?
    Damn -.-
    I am actually very happy with this guide, as I really have a crushed USB stick in my raid1.

    1. Hi Tom!

      Sorry it took a while to get back to you. You’ve probably already got it going by now, but you’re correct – the new USB stick is too small. I’ve run into that problem a couple of times myself – a 32GB stick from one company might be just slightly different than one from another.

      Thanks for the comment!

  4. This helped me a lot!
    In my case one of the sticks stopped working properly and somehow got stuck in an old version.
    Is there a clever way to check, for example each day, whether the raid1 is still working, and to receive an e-mail or something like that if it isn’t? Any ideas, links or suggestions? I couldn’t find a good solution on the web.

    1. Hello Carl!

      Sorry I didn’t get back to you sooner.

      I’m glad you found the post helpful, thanks for the comment!

      A little while ago, someone else asked the same question and posted a link to an article showing how to set up email so mdadm can report its status. It’s here:

      http://marksbench.com/electronics/replacing-a-failed-usb-disk-in-a-raspberry-pi-based-raid-mirror/#comment-1347

      I haven’t had a chance to try it myself, but hopefully it will be useful for you.

      Good luck!
