An mdadm cheat sheet

Addressing some of my recurring issues with mdadm

  1. Intro
  2. Check
    1. Check Overall Status
    2. Detailed Status of a Specific RAID Array
    3. List RAID Devices and Their Status
    4. Check Disk Health Using SMART
  3. Add Drive (back)
    1. Identify the Missing/Removed Drive
    2. Re-add the Drive to the RAID
    3. Check the Rebuild Status

Intro

I have an old Intel Atom D2700 board that I use as a NAS. I love it because it draws very little power (around 15 W), which I guess is partly due to the small (300 W) 80+ Gold PSU and to using only 2.5-inch laptop drives.

The server uses five 1 TB 2.5-inch hard drives (it’s what I had lying around) as a RAID-5 array and a small 64 GB SSD for the OS. (The board itself only has two SATA ports, but it also has a mini PCI-E slot in which I put a SATA controller card with four more SATA ports.) I also put a second SATA controller in the PCI-E slot, but I haven’t yet tested whether it works together with the mini PCI-E SATA card.

It has been running very reliably, but every once in a while when I check, I notice that a drive has been removed from the array. smartctl scans never show anything alarming, so I wonder whether the problem is actually the SATA controller.

Anyway, here is how I reattached the removed disk. It’s been working fine since. But if it happens again, I will definitely replace the whole array with new and bigger drives (maybe three 4 TB SSDs, since the whole point of this machine is to be ultra-low power).

Check

To check the health status of an mdadm RAID array:

Check Overall Status

cat /proc/mdstat
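In /proc/mdstat, each array has a status pattern such as [UUUUU], where an underscore (for example [UUUU_]) marks a missing member. As a rough sketch, the helper below prints the names of arrays that look degraded. The function name degraded_arrays is my own, and it assumes the usual mdstat layout where the array name starts a line and the status pattern follows on a later line.

```shell
# Print the name of every md array whose status pattern contains "_",
# i.e. at least one member is missing. Hypothetical helper, not part of mdadm.
degraded_arrays() {
    # $1: file to read; defaults to the live kernel view
    awk '
        /^md/ { name = $1 }                 # an array block starts, remember its name
        /\[[U_]*_[U_]*\]/ {                 # status pattern with a missing member
            if (name != "") print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Running degraded_arrays with no argument reads /proc/mdstat directly. No output means no array showed a missing member.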

Detailed Status of a Specific RAID Array

sudo mdadm --detail /dev/md0
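The --detail output is long, and the parts I actually care about are the array state and any slot listed as removed or faulty. Here is a small filter for that, as a sketch. The name md_problem_lines is made up, and it assumes the output contains a "State :" line and uses the words "removed" and "faulty" in the device table, which is what I see on my system.

```shell
# Read `mdadm --detail` output on stdin and keep only the array state line
# and any device rows reported as removed or faulty. Hypothetical helper.
md_problem_lines() {
    grep -E '^[[:space:]]*State :|removed|faulty'
}

# Usage:
#   sudo mdadm --detail /dev/md0 | md_problem_lines
```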

List RAID Devices and Their Status

sudo mdadm --examine --scan

Check Disk Health Using SMART

If you suspect a disk issue, you can check its SMART status:

sudo smartctl -a /dev/sdf
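Since I suspect the controller rather than the disks, a few attributes in the SMART table are worth pulling out. Reallocated and pending sectors point at the disk surface, while a growing UDMA CRC error count usually points at the cable or link. The sketch below filters the attribute table from smartctl -A. The function name smart_key_attrs is my own, and the attribute names are the common ones, so a given drive may label them differently.

```shell
# Read `smartctl -A` output on stdin and print "NAME RAW_VALUE" for a few
# attributes. Assumes the standard ten-column attribute table, where the
# name is column 2 and the raw value is column 10. Hypothetical helper.
smart_key_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|UDMA_CRC_Error_Count)$/ {
        print $2, $10
    }'
}

# Usage:
#   sudo smartctl -A /dev/sdf | smart_key_attrs
```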

Add Drive (back)

For me, sudo mdadm --detail /dev/md0 showed /dev/sdf1 as removed. So I wanted to add it back:

Identify the Missing/Removed Drive

Ensure the drive is physically connected and detected by the system:

lsblk
sudo fdisk -l

Re-add the Drive to the RAID

Once you confirm the disk is available, add it back to the array:

sudo mdadm --add /dev/md0 /dev/sdf1

Adding fails with error:

The above command yielded:

"mdadm: Failed to write metadata to /dev/sdf1"

Which was a bit alarming. I first tried wiping the metadata:

sudo wipefs -a /dev/sdf1
sudo dd if=/dev/zero of=/dev/sdf1 bs=1M count=100

Adding still failed, so I checked dmesg:

dmesg | grep sdf

Which showed me even more alarming news:

Buffer I/O error on dev sdf1, logical block 25343, lost async page write

I then ran a long smartctl test:

sudo smartctl -t long /dev/sdf

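It passed with no errors. (The command only starts the test, which then runs in the background on the drive; the result shows up in the self-test log, for example via sudo smartctl -l selftest /dev/sdf, once it finishes.) So I zeroed the whole drive: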

sudo dd if=/dev/zero of=/dev/sdf bs=1M status=progress

Then repartitioned:

sudo parted /dev/sdf --script mklabel gpt
sudo parted /dev/sdf --script mkpart primary 0% 100%

And set partition type:

sudo fdisk /dev/sdf
  • Press t to change type
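  • Choose the Linux RAID type (fd, Linux RAID autodetect, is the code on an MBR label; on the GPT label created above, press L to list the types and pick Linux RAID)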
  • Press w to write and exit
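The interactive fdisk step can probably be skipped, because parted can set the RAID flag itself with "set 1 raid on". As a cautious sketch, the helper below only prints the three parted commands for a given disk so they can be reviewed before running them by hand with sudo. The function name raid_partition_cmds is hypothetical, and it assumes a single partition spanning the whole disk, as in my case.

```shell
# Print (do NOT run) the parted commands that create a GPT label, one
# full-size partition, and mark that partition as RAID. Hypothetical helper.
raid_partition_cmds() {
    if [ -z "$1" ]; then
        echo "usage: raid_partition_cmds /dev/sdX" >&2
        return 1
    fi
    dev=$1
    printf 'parted %s --script mklabel gpt\n' "$dev"
    printf 'parted %s --script mkpart primary 0%% 100%%\n' "$dev"
    printf 'parted %s --script set 1 raid on\n' "$dev"
}

# Usage: review the output, then run each line with sudo.
#   raid_partition_cmds /dev/sdf
```

Printing instead of executing is deliberate: mklabel destroys the existing partition table, so I want to see the device name one more time before anything runs.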

Verified that everything looked as it should:

lsblk /dev/sdf

And finally I was able to re-add it:

sudo mdadm --add /dev/md0 /dev/sdf1

Check the Rebuild Status

cat /proc/mdstat
sudo watch mdadm --detail /dev/md0
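While the array rebuilds, /proc/mdstat shows a progress line with a percentage such as "recovery = 8.5%". For a one-line readout, a sketch like the following pulls out just that part. The name rebuild_progress is my own, it assumes the "recovery = N%" or "resync = N%" wording, and it relies on grep -o, which GNU and BSD grep both provide.

```shell
# Print only the "recovery = N%" / "resync = N%" part of mdstat.
# Prints nothing (and returns non-zero) when no rebuild is running.
# Hypothetical helper, not part of mdadm.
rebuild_progress() {
    # $1: file to read; defaults to the live kernel view
    grep -oE '(recovery|resync) *= *[0-9.]+%' "${1:-/proc/mdstat}"
}

# Usage:
#   watch -n 60 'grep -oE "(recovery|resync) *= *[0-9.]+%" /proc/mdstat'
#   (watch starts a fresh shell, so the function itself is not visible there)
```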