RAID 5 disk failed; can't get it restarted

Yes, smartctl would have been my choice, but it isn’t available on the rescue kernel. Attempting to --fail /dev/md127 /dev/sde1 gets a “no such device” response, so I’m really stuck here. I do know which physical disk has failed, and I doubt that it is doing anything, so I’m going to pull the SATA data cable, reboot, and see what happens.

Well, I disconnected /dev/sde and it has now rebooted to the rescue kernel with mdadm no longer showing /dev/sde. It shows device 0 (major 0, minor 0) as “removed” and /dev/sdb1 (major 8, minor 17) as “spare”. I can’t add it (error: “device busy”), and at this point it isn’t rebuilding. The array will not start even though /dev/sdc1 and /dev/sdd1 are shown as “active sync”.

Sorry, I don’t know. I’ve always used RAID1 (mirroring) with mdadm, and that has always worked when a member device failed. I do use RAIDZ10 on several servers, but that uses a completely different software stack.

Hopefully someone with a little more RAID5 mdadm experience can weigh in.

Well, you physically removed sde1 while it was still showing as part of the array.

The steps to fail it then remove it should have been completed in that order before disconnecting it.

Somehow the array seems to show that it actually has 4 devices assigned (3 + 1 spare).

From the man page, marking a device for replacement and then adding another device should have allowed you to mark the old device as failed, after which mdadm would automatically switch over to the new device.
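A rough sketch of that sequence using the device names from this thread (illustrative only; I have not tested this exact order on your array):

mdadm /dev/md127 --add-spare /dev/sdb1                  # add the new disk as a spare
mdadm /dev/md127 --replace /dev/sde1 --with /dev/sdb1   # rebuild sde1's data onto sdb1 while sde1 stays in service
mdadm /dev/md127 --remove /dev/sde1                     # once the replacement finishes, sde1 is marked faulty and can be removed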

You may have hit the key issue though. The array is not defined within the live media but within the already-installed OS, so these commands should be run in that OS if at all possible. Even a chroot environment should work for that if necessary.
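A minimal sketch of the chroot route from live media, assuming the installed root can still be mounted (the /mnt path is just an example):

mount /dev/md127 /mnt                                       # mount the installed root filesystem
for d in dev proc sys; do mount --bind /$d /mnt/$d; done    # make the running kernel's device and proc trees visible
chroot /mnt /bin/bash                                       # now the installed OS's mdadm and mdadm.conf are the ones in use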

Once you saw this config the next step would have been to immediately fail sde1 so sdb1 could replace it.

Note also that “mdadm /dev/md127 --remove detached” should remove sde1 from the array, since it is already physically detached and you want it removed from the config (it is in the man page). mdadm will know which device is detached, and in theory that should now allow the array to rebuild onto the already-defined spare sdb1.
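In command form, the order I mean would look something like this (illustrative only):

mdadm /dev/md127 --fail /dev/sde1       # mark the member faulty first
mdadm /dev/md127 --remove /dev/sde1     # then remove it from the array
# or, once the disk has already been physically disconnected:
mdadm /dev/md127 --remove detached      # removes whichever member is no longer present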

Actually I did do the following in this sequence:

replace (a couple of days ago)
fail (Friday PM, mdadm --detail didn’t show anything. At the time I was running the rescue kernel)
remove (nothing seemed to happen; mdadm still showed the array rebuilding)
physically removed the defective disk; mdadm then showed 3 devices with /dev/sdb1 as a spare. /dev/sde1 was no longer in the list and its slot was marked as removed

I just rebooted on the rescue kernel; it doesn’t provide either a network connection or USB key support, so I have to type these results:

From mdadm:
/dev/sdb1 still shows as device “-”, spare, BUT is NOT rebuilding
/dev/sdc1 and /dev/sdd1 show as devices 2 and 3, sync

The old /dev/sde1 slot is now device 0, removed
The array state is active, FAILED, not started

So it looks to me like /dev/sde has finally been removed and the array status has “improved” from degraded to FAILED. For completeness I ran cat /proc/mdstat, with this result:
Personalities : [raid6] [raid5] [raid4]
md127 : inactive sdc1[1] sdb1[4] sdd1[3]
2197453199 blocks super 1.2
unused devices: <none>

Reading the man page, it looked like --run should start the array, but when I tried it I got these errors:
md/raid: md127: cannot start dirty degraded array
mdadm: failed to start array /dev/md/root: Input/output error
At this point it looks like what I need to do is run:
mdadm --run --assume-clean /dev/md127
but since this will probably lead to data loss if it fails, I would like a second opinion. Also, since I/O errors are reported, should I run fsck, and would that do any good (or harm)?

fsck would not work at all unless the array was started – no file system would be seen.

Is it possible to go back to the config with all 4 devices attached then show the --detail results?
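Before forcing anything, one relatively low-risk check (just a sketch using the device names from this thread, not specific advice for this array) is to compare the event counters recorded in each member’s superblock; if they are close, a forced assemble is commonly tried before anything more drastic:

mdadm --examine /dev/sdb1 /dev/sdc1 /dev/sdd1 | grep -E 'Events|Device Role'
mdadm --stop /dev/md127                                               # release the half-assembled array first
mdadm --assemble --force /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1     # only if the event counts are close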

I replugged it and got a BIOS screen that I have never seen before: “System has POSTed in safe mode. This may be due to the previous POST attempt failing because of system instability, or if the power button was held in to force the system off.” [Comment: it was because “systemctl poweroff” stopped the system but didn’t power it off.] SATA_6 has the old defective sdb disk showing. I just let it start without making any changes.

Booted to rescue kernel. sde is back too:
mdadm --detail /dev/md127
Raid level : 5
Raid devices : 3
Total devices : 4
State : active, degraded, Not Started
Active devices : 2
Working devices : 4
Failed devices : 0
Spare devices : 2
Number Major Minor RaidDevice State
0 8 65 0 spare rebuilding /dev/sde1
1 8 33 1 active /dev/sdc1
3 8 49 2 active /dev/sdd1
4 8 17 - spare /dev/sdb1

cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md127 : inactive sde1[0] sdb1[4](S) sdc1[1] sdd1[3]
2441645455 blocks super 1.2
unused devices: <none>

mdadm --fail /dev/md127 /dev/sde1
mdadm: set device faulty failed for /dev/sde1: No such device

Is the data on that array vital enough that you could not stand to lose it?

If so then keep working, but I do suggest that you may wish to wait at least 24 hours before issuing any more commands to mdadm. Possibly up to 48 hours. Commands to an inactive array are like beating on a brick wall with a bare knuckled hand. No result and painful.

If the data is not 100% vital then it would be much easier to simply build a new array and start over.

In my case the OS is on a single device. My /home is on a raid array and I am using raid6, which tolerates one drive failure with no consequences for the data; even a second drive failure only leaves it degraded. I could lose the array completely and it would have no effect on the machine other than failing to boot because it could not mount /home (but that is an easily solved problem).

Sort of. The problem here is that this was my workstation; it failed and I couldn’t get the RAID restarted last year. So I grabbed an old server, the only machine available at the time, and installed Fedora. The disks were formatted as LVM, and it started showing read errors on one disk just before I went on vacation in June. Until then I didn’t know that one LVM failure takes out ALL the data on ALL the disks, and that the various pvxxxx commands can’t do anything because all the disks report as full even though they really are not. So I tried to recover the original workstation, which should have been easier, but now I’m over a week into it and getting no work done.

As a technical book editor, my immediate problem is that I have lost my edit comments and notes for the author for about a month’s work (on the LVM machine). I have also lost my contacts list, a number of URLs, and my templates. All can probably be regenerated from partial backups, but last time that took days. The tendency when using RAID is to expect it to work, so you don’t back everything up as often as you should. In this case a double failure has compounded the problem; I was always going to have to recover this machine eventually.

One comment about the RAID: originally it was Fedora 27, and the OS has since been updated to 31. I note that the online man pages have some differences from what I recall, and some commands just don’t work. Is this likely one of the problems?

Anyhow Jeff, thanks for all your work and input, I do appreciate it. I’ll just leave it running until the next power failure (thunderstorm) and see what happens.

Re your comments: for the same number of disks, would it make more sense to put the system on one RAID 1 and the data on a different RAID 1, and do automatic rsync backups of everything every day when I shut down? If that doesn’t make sense, is RAID 5 plus one spare disk effectively the same as RAID 6?

Regards, John

If I’m reading the man page correctly, --fail should be after /dev/md127 but before the component device (/dev/sde1).

From the man page:

mdadm [mode] <raiddevice> [options] <component-devices>

Note also that the “mode” is optional in some (but not all) cases.

If a device is given before any options, or if the first option is one of --add, --re-add, --add-spare, --fail, --remove, or --replace, then the MANAGE mode is assumed. Anything other than these will cause the Misc mode to be assumed.

The above might be why that earlier “frozen” command I mentioned did not work. I.e., it probably should have been the following.

mdadm /dev/md127 --action=frozen /dev/sde1

Edit: nevermind, on re-reading that, it looks like misc should have been assumed for the --action=... option.
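To make the placement concrete (illustrative commands, not verified on this array): because --fail is in that list, either of these forms invokes MANAGE mode on the array, while --action=... is not in the list and so falls through to Misc mode:

mdadm /dev/md127 --fail /dev/sde1    # array given first: MANAGE mode assumed
mdadm --fail /dev/md127 /dev/sde1    # --fail given first: also MANAGE mode, first device named is taken as the array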

A double failure on raid 5 normally means the data on the array is toast – totally gone.
You did not seem to indicate it was a double failure until just now.

Yes, there have been many changes to mdadm as time has gone on so I cannot verify what does or does not work on an older system. If one were to boot with live media for F38 then it could be assumed that all the commands for mdadm in the current man pages would work as described.

A raid5 array with 3 active devices and one spare (4 disks total) is similar to a raid6 with 4 devices active. There is one major difference though.
With raid5, even with a spare, having a second active device fail before the array finishes rebuilding on the spare means data loss. With raid6 2 devices can fail at the same time and the array can remain active and in a degraded state, but no data loss at that point.

The real difference is the increased risk of a second failure during the time required for the rebuild on the raid5 array since all disks will be very active during that time period.
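As a concrete illustration with made-up sizes: four 500 GB drives as raid5 (3 active + 1 spare) give 2 x 500 GB = 1 TB usable, and a second failure during the hours the rebuild onto the spare takes means data loss; the same four drives as raid6 also give (4 - 2) x 500 GB = 1 TB usable, but two drives can be dead at once with the data still intact. Same capacity, different exposure during a rebuild.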

With either type raid array, replacing the failed device promptly is always recommended.

When using 4 drives, your suggestion about using 2 separate raid1 arrays is not a bad idea either. With mirroring it is usual that the data on at least one of the 2 mirrored drives remains valid and the other can be replaced.
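A sketch of that layout with mdadm (the device names here are placeholders, not this machine’s):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1   # mirror for the system
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd1   # mirror for the data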

Regardless of how the data is stored, it is said that data is only as valuable as the effort one puts into backing it up.

Doesn’t seem to matter; I get the error “mdadm: /dev/sde1 is no (sic) an md array”.
--fail is one of the commands that assumes --manage. If I move the --fail after the array identifier (“mdadm /dev/md127 --fail /dev/sde1”) I get the error “No such device”. Same if I use the disk (sde) instead of the partition. This error seems to be fairly common, and I don’t know how I can be rebuilding an array where the devices are not known!

I agree that the sequence of the parts of the commands is ambiguous in the man pages and some examples would be helpful.

Jeff: sorry I misled you; I’m getting a bit wangy here with all this stuff. The “double” failure to which I refer is the failure of ONE disk on each machine: this one and the LVM one.

Ah, I understand now.

Which physical device is sde?
That can probably be seen by running dmesg, looking for sde, and checking the lines around it. It should show the address of the SATA port it is connected to.
As an example, I ran dmesg | grep -10 sdb on my machine and got this:

[    2.383748] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.386489] ata4.00: ATA-11: ST8000VN004-2M2101, SC60, max UDMA/133
[    2.400428] ata4.00: 15628053168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    2.401641] ata4.00: Features: NCQ-sndrcv
[    2.417489] ata4.00: configured for UDMA/133
[    2.418162] scsi 3:0:0:0: Direct-Access     ATA      ST8000VN004-2M21 SC60 PQ: 0 ANSI: 5
[    2.418959] sd 3:0:0:0: Attached scsi generic sg2 type 0
[    2.418969] sd 3:0:0:0: [sdb] 15628053168 512-byte logical blocks: (8.00 TB/7.28 TiB)
[    2.419818] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[    2.420221] sd 3:0:0:0: [sdb] Write Protect is off
[    2.420621] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    2.420631] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.421037] sd 3:0:0:0: [sdb] Preferred minimum I/O size 4096 bytes

As you can see it is on ata4 (SATA port 4) and it gives more of the device info so one can identify it.

Is it possible that you pulled what you thought was sdb and it was not actually the correct device?

With old Linux systems the devices were configured in port order. Since the move to SATA interfaces that is not necessarily true. You can see that sdb is actually on SATA port 4 for me. This is the reason UUIDs are used for mounting most file systems these days.
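For example, something along these lines (lsblk and blkid, at least on the live media) maps a kernel name like sde back to a physical drive by model, serial, and UUID:

lsblk -o NAME,SIZE,MODEL,SERIAL    # tie sdX names to drive model/serial numbers
blkid /dev/sde1                    # show the UUID recorded on that partition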

One should be able to run iotop and watch the data reads & writes on the array drives if the rebuild is actually in progress.
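For instance (sketch only):

watch -n 5 cat /proc/mdstat    # an active rebuild shows up here as a recovery percentage
iotop -o                       # -o lists only processes actually doing I/O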

I get a significantly different report. The disk type is correct, ST3500320AS (Seagate Barracuda), which is reported on sd 9:0:0:0 as [sde]. There is a SATA link up message for ata9 and ata10 with this disk type identifier (ST3500320AS). I can account for all the disks and the CD-ROM as having link-up status. Looking at the board, I checked and the disk now assigned to sde IS plugged into SATA 6.

So, being a bit confused here, I found the original paper where I wrote down the info from smartctl when I noticed the disk was failing. It shows type ST3500320AS on /dev/sdb at the time, and the disk serial number as SQM3PTE5, which is the serial number of the disk I pulled and the one currently plugged into the SATA6 port as marked on the motherboard.

For completeness, here are the lines of interest:
ata9: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
ata9.00: configured for UDMA/133
sd 9:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn’t support DPO or FUA
sde: sde1

Unfortunately iotop is not available on the rescue kernel. I was running the Live CD when I got the previous information.

I would be willing to wager that is your issue.
The Barracuda drives have been using SMR technology for some time and are not suitable for raid use for that reason.

Seagate makes it very difficult to locate a device data sheet on their site and just now I was unable to find the sheet for that drive to confirm if it is SMR or CMR. CMR drives are mostly usable for NAS/Enterprise/raid use but SMR drives are not. Even Western Digital Blue drives seem to mostly be SMR.

One could run the live media and manage the array from there. Tying one’s hands with a rescue kernel when the live media is available (and where iotop could be installed and used) seems a bit restrictive.

Thanks for that info, Jeff; I was unaware that it mattered. I looked up CMR/SMR because this is my first encounter with it, and what I found said “since 2015…”, but unlike the other disks the one in question has no date of manufacture. I was hoping for 2014 so it would be CMR, but it only has a date code, 09221. The other disks are all WD Blue, including one that was a replacement a couple of years ago, so they might be problematic too.

The new server that I just built and am trying to finish configuring at the moment has WD Red drives, which they say are designed for servers. I also put in an order today for 4 SSDs (WD Blue) so I can replace the disks in the other failed workstation. From the description in the Internet article, it seems these should be OK for a pair of RAID1 devices, one for the system and one for all the data, in the rebuild of that machine.

When I grabbed the old disk to look up the date I noticed that it is quite warm and is vibrating, which suggests it may actually be doing the rebuild now. I’ll wait for a couple of days, at least until the replacement SSDs come and I need the bench again! Thanks again, this has been an education!

I had some time so I attacked this problem again. Linux Journal issued a recovery CD a few months back, and one of the options on it is to boot without the md device driver active. So I did that, and with the new disk installed I did an mdadm create and then an mdadm assemble; the missing RAID member started to regenerate, completed properly, and the array now runs properly. The recovery function on the Fedora menu doesn’t allow starting with md devices inactive, which was needed for this recovery. It might be a good idea to add that option for just this purpose. Please close this problem as solved. Thanks to everyone who tried to assist.
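For anyone who lands here later, the shape of the assemble step (device names purely illustrative; the create step is deliberately not sketched, since recreating over existing members only works if every parameter matches the original layout) would be roughly:

mdadm --assemble /dev/md127 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
cat /proc/mdstat               # the replaced member should show as rebuilding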
