Confused about software RAID

I don’t think this problem has anything to do with Fedora specifically, but y’all are a smart and friendly bunch so I thought I’d try finding some help here. I’ve been learning about RAID recently because I plan on using it in a NAS I’m going to build in the future. To get some hands-on experience with it, I purchased 5 identical flash drives. My plan was to set them up and format them a few times so I could try out different RAID levels, and most importantly practice what to do in the event of a disk failure. I found that after deleting the first RAID array and creating a new array with a different level, the computer immediately begins “recovery” on the new array. I did not expect this because these are “new” disks in a new array, and the previous array level was 0, where no data recovery is possible.

On top of some general reading about RAID and the mdadm man page, I’ve been following these two articles as a guide:

My first experiment was to set up the flash drives in RAID level 0, which I did easily with no issue. I copied some data onto the array and everything worked as expected. The next step in my plan was to wipe all the drives clean, set them up with RAID level 5, then remove a disk to simulate a failure and plug in the 5th blank flash drive to practice recovering from the failure.

I followed these steps to delete the RAID 0 array and set up the new level 5 array (rough commands shown after the list):

  • unmount the filesystem
  • stop the array (mdadm)
  • remove the array (mdadm)
  • erase the md superblock (--zero-superblock) on all the disks
  • confirm there are no references to the array in /etc/fstab and /etc/mdadm/mdadm.conf
  • update initramfs
  • wipe existing signatures from the disks with fdisk --wipe
  • create level 5 array: sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sd{a,b,c,d}
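
Roughly, those steps as commands looked like this (the mount point /mnt/md0 is just a placeholder for wherever the old array was mounted, and on Fedora the initramfs rebuild is dracut):

    # unmount the filesystem on the old RAID 0 array
    sudo umount /mnt/md0
    # stop the old array
    sudo mdadm --stop /dev/md0
    # erase the md superblock on every member disk
    sudo mdadm --zero-superblock /dev/sd{a,b,c,d}
    # check /etc/fstab and /etc/mdadm/mdadm.conf by hand for stale entries,
    # then rebuild the initramfs
    sudo dracut --force
    # (I also wiped old signatures on each disk with fdisk --wipe at this point)
    # create the new RAID 5 array from four of the drives
    sudo mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 /dev/sd{a,b,c,d}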

At this point I run cat /proc/mdstat to check the array and to my surprise I find that it is in the middle of recovery. Recovering what? There’s nothing on these disks! I’m thinking I must’ve not cleared these disks properly and there must be some RAID data still on the partitions. I stop the array, remove the disks and bring them over to my laptop where I format them with gnome-disks and write 0’s to the whole disk. That’ll take care of it, I thought.

Back on my main desktop, I created the level 5 array and AGAIN, I find that it immediately begins attempting “recovery” on the disks! This makes no sense to me. My expectation is that mdadm should just create the new array and there should be no attempt at recovery. I must be doing something wrong and I was hoping someone who knows more about RAID might be able to point out where I went off track.

I did manage to get a level 5 array working as expected by telling mdadm to create the virtual disk at /dev/md1 instead of /dev/md0. I think this is a clue that I must have missed something but I don’t know what. Can anyone point out where I went wrong? Or maybe it’s normal for a recovery to sometimes be run on new arrays?

Scratch that, I just checked sudo mdadm -D /dev/md1 and found that it is actually somehow level 0, despite me passing --level=5… Or maybe I just imagined I did. Man, I am confused.

Any RAID 5 or 6 array must do an initial “recovery” or “rebuild” pass when it is created, as it builds the parity structure for the new array. If you do not understand that process, read up on how the data is stored.

Every device in the array holds both data stripes and parity stripes to enable recovery of the data in the event of a device failure. The “recovery” you are seeing is the array building the parity to match the data currently on the devices.
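
You can watch that initial parity build as it happens, for example (assuming the array is /dev/md0):

    # overall md status, including the rebuild/resync progress bar
    cat /proc/mdstat
    # detailed state of one array, including rebuild progress
    sudo mdadm --detail /dev/md0
    # what the kernel is currently doing with the array (resync, recover, idle, ...)
    cat /sys/block/md0/md/sync_action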

Parity allows recovery of a missing data stripe when a device fails. How it works is not really complex, but it is worth understanding if you want the nitty-gritty details of a RAID array.
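
As a toy illustration (single bytes standing in for whole stripes), the parity is just an XOR of the data, so any one missing piece can be recomputed from everything that is left:

    # three "data" bytes, as if spread across three disks
    d1=0xA5; d2=0x3C; d3=0x0F
    # the parity stored on a fourth disk is the XOR of the data
    p=$(( d1 ^ d2 ^ d3 ))
    printf 'parity     = 0x%02X\n' "$p"
    # if the disk holding d2 dies, XOR of the survivors rebuilds it (prints 0x3C)
    printf 'rebuilt d2 = 0x%02X\n' $(( d1 ^ d3 ^ p ))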

RAID 0 – striped, no recovery possible.
RAID 1 – mirrored, the data is identical on each of the (2 or more) copies.
RAID 5 & 6 – parity striped to avoid data loss. RAID 5 can lose only one device without data loss, while RAID 6 can lose 2 devices and not lose data.

I guess I have been fortunate with RAID, because I have had an LSI card running 8 HDDs for almost a decade. When I was experimenting I always did it through the installer, using blivet-gui or the Advanced partitioning option. It has actually been the smoothest transition between installs.

Prior to the LSI card I used onboard SATA ports. That was also a seamless experience. When I created a new array I made sure to dd the drives beforehand.

Great project, and I’m willing to take a stab at helping with anything you need.

I was a contractor at the NCR Peripheral Products Division back in the day when they developed the very first hardware RAID controller. I still have their booklet describing RAID.

Did the new array become functional (rebuild complete)?

With mdraid, doesn’t all the metadata get stored on the storage devices themselves? It is really better if that is the case, since the devices can then be moved to another machine and the array remains intact, for instance if a motherboard and/or the boot drive fails.
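
For example, I believe you can read that on-disk metadata back and reassemble the array on another machine with something like this (device names are just placeholders):

    # print the md superblock stored on one member device
    sudo mdadm --examine /dev/sda
    # on the new machine, scan all devices and assemble whatever arrays are found
    sudo mdadm --assemble --scan
    # or assemble a specific array from its member devices
    sudo mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc /dev/sdd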

The difference in performance between RAID 5 and RAID 6 can be large. I was testing a setup a few years ago, and changing from RAID 6 to RAID 5 increased throughput from 6 MB/s (painfully slow) to 160 MB/s (about the full throughput of a single drive) and resolved the researchers’ complaints about slow storage. To roughly match the MTBF I just made more arrays with fewer disks each.
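
A rough way to measure that kind of sequential throughput (not necessarily how I tested it back then; /mnt/array is a placeholder mount point) is something like:

    # sequential write test, bypassing the page cache
    dd if=/dev/zero of=/mnt/array/testfile bs=1M count=4096 oflag=direct status=progress
    # drop caches, then a sequential read test
    sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    dd if=/mnt/array/testfile of=/dev/null bs=1M status=progress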

I also did AIX support in the late '90s and had customers, even internal IBM customers, expect RAID to eliminate the need for backups. They were calling me when it turned out to be a bad decision.

More recently I have moved away from RAID. Having my data on more than one host has been easier to manage. Sticking with a filesystem like xfs or ext4 has also made the data easier to manage. What do you see as your reason for choosing RAID?

Thanks for the clarification. I understand (big picture) how data is stored in the more common RAID levels. I wasn’t aware that an initial build of the parity was required when creating a new array. I guess I thought that would happen as I put data on the disks.

I’ve wiped the disks once again and I’m waiting for the rebuild to finish right now. I’m sure it will go fine.

I went with RAID because it’s the only multi-disk solution I am somewhat familiar with. I know enough to be dangerous. Case in point: I’m currently running a media server in my home using 4x 4 TB disks in RAID 5, and I installed the OS with the root partition on this array because the motherboard couldn’t support another disk just for the OS. It recently dawned on me that this is probably a bad idea. So I want to build a new system with 2 more 4 TB disks in the array and the OS on a separate NVMe.
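
If it turns out to be easier to grow the existing array instead of building a new one, my understanding is it would look roughly like this (assuming the array is /dev/md0 and the new disks show up as /dev/sde and /dev/sdf, which are just guesses):

    # add the two new disks as spares
    sudo mdadm --add /dev/md0 /dev/sde /dev/sdf
    # grow the RAID 5 from 4 to 6 active devices (kicks off a long reshape)
    sudo mdadm --grow /dev/md0 --raid-devices=6
    # watch the reshape progress
    cat /proc/mdstat
    # afterwards, grow the filesystem to use the new space
    # (ext4 example, assuming the filesystem sits directly on the array)
    sudo resize2fs /dev/md0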

Since starting this thread I’ve been looking into other solutions like ZFS and LVM. ZFS seems pretty straightforward and easy to manage for my use case, but I can’t play around with it on my Pi, so I’ll need to figure something else out.
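
From what I’ve read so far, the ZFS equivalent of my 4-disk RAID 5 would be a single-parity raidz1 pool, something like this (pool name and disk names are just placeholders):

    # create a raidz1 pool named "tank" from four disks
    sudo zpool create tank raidz1 /dev/sda /dev/sdb /dev/sdc /dev/sdd
    # check the pool's health and layout
    zpool status tank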

Although it’s old, RAID seems like it should work just fine for my use case. The machine will basically be a media server (movies, music, phone camera backup, etc.) and probably host a couple other services as well. The disks won’t be taking a frequent pounding read/write wise, so I think I should be safe with RAID 5. The current system has been chugging along for 4 years now with no problems.

I use RAID 5 with LVM for my home and media storage. The combination is the easiest and most flexible option as far as I can tell.
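
A minimal sketch of one way to combine them, with LVM layered on top of the md device (names like /dev/md0 and vg_media are placeholders):

    # use the md array as an LVM physical volume
    sudo pvcreate /dev/md0
    # create a volume group on it
    sudo vgcreate vg_media /dev/md0
    # carve out a logical volume and put a filesystem on it
    sudo lvcreate -L 500G -n media vg_media
    sudo mkfs.xfs /dev/vg_media/media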