System fails to boot after dnf system upgrade due to missing MD (RAID) devices

I could do with some help troubleshooting an issue I encountered after performing a dnf system-upgrade from Fedora 37 to 39 today. The system no longer boots due to missing MD RAID devices.

I already tried booting into an old kernel (f37) still present on the system. The result is the same. I assume that tells me the issue is not with f39, but with the MD RAID itself.

I have three disks in the system. The disks are partitioned identically, and the RAID arrays are set up on top of partitions.

|-- sda1  swap
|-- sda2  linux_raid_member
|-- sda3  linux_raid_member
|-- sda4  linux_raid_member

The remaining SATA drives are partitioned the same way. The MD devices are /dev/md1, /dev/md5 and /dev/md54, where:

/dev/md1   raid1 sda3[S] sdb3 sdc3
/dev/md5   raid5 sda2 sdb2 sdc2
/dev/md54  raid5 sda4 sdb4 sdc4

While /dev/md54 is assembled just fine, the other two are not. Examining the members with mdadm --examine, I see all the information regarding the RAID devices. One difference I noticed (not sure if it matters): /dev/md54 uses metadata version 1.2, while the two missing arrays use version 1.1.

During boot I’m thrown into recovery mode. After doing some investigation I tried to assemble /dev/md5 myself. First I tried with mdadm --assemble --scan --no-degraded. A line is printed telling me the array has been assembled. However, at the same time the console freezes. Input is no longer possible. CTRL+C does not return me to the prompt. No further information is printed and I don’t see any disk activity looking at the disk LEDs. CTRL+ALT+DEL lets me reboot the system, though.

In another attempt I tried assembling with mdadm --assemble /dev/md5 /dev/sda2 /dev/sdb2 /dev/sdc2. The result was the same: after the message regarding assembly, the system freezes.

Notably, the root partition is not on any of the MD RAID devices. It resides on a small SSD. So, I suppose I could edit /etc/fstab and comment out a bunch of lines, create a temporary /home and get to boot at least into the desktop.

Some more background for completeness’ sake. One of the disks was replaced recently after failure. The new disk is larger. But the partition layout (copied using dd from another drive) is the same. The RAID arrays were rebuilt successfully after that and raid-check ran two nights ago not finding any issues.

My questions are:

  1. Has anyone else experienced a system freeze attempting manual assembly using mdadm? If so, what was the cause?
  2. What should I do next in an attempt to recover from the situation?

Update #1

After booting from a USB memory stick I still had lying around, with Fedora 38 Everything on it, Anaconda greeted me with a message regarding duplicate partition UUIDs on two separate devices. Using lsblk and blkid I could confirm that to be true. So maybe the issue arose from using dd to copy the partition table after replacing a faulty disk.

However, not all partitions carry duplicate information. In particular, the partitions for the /dev/md54 device look different. While UUID and PARTUUID are duplicated for sda4 and sdc4, UUID_SUB is unique for sd[a-c]4. These partitions also carry a LABEL containing the Name field as reported by mdadm --examine.

For the partitions used for /dev/md1 and /dev/md5, blkid only shows PARTUUID, where the values for corresponding partitions on sda and sdc are duplicated.

As for sd[a-c]4 of /dev/md54, UUID corresponds to Array UUID and UUID_SUB to Device UUID as shown by mdadm --examine. However, the two use different notations (colons vs. dashes, and the grouping differs as well).

This raises some more questions:

  3. How did that information end up in the partition information for the partitions used by /dev/md54? Does mdadm add it for MD devices using metadata version 1.2 (md1 and md5 use version 1.1)?
  4. Could the issue be solved by adding UUID, UUID_SUB and LABEL to the partitions of the troublesome MD devices? If so, is there an easy way of transforming the notation from the RAID superblocks (uses colons) to the format blkid reports (uses dashes)? Is that even needed? It could just be the way the output is formatted using different functions.
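Regarding the notation question: it looks purely cosmetic. Assuming the only difference really is the grouping (and using md54’s Array UUID as the guinea pig), a small bash snippet can do the transformation:

```shell
# Reformat mdadm’s colon-separated UUID into blkid’s dashed 8-4-4-4-12
# form. The value is md54’s Array UUID; only the grouping changes, the
# 32 hex digits stay identical. Requires bash for the substring expansion.
raw='fb919273:c6bfb891:ea1ca83c:0a8b3ad7'
hex=$(printf '%s' "$raw" | tr -d ':')
printf '%s-%s-%s-%s-%s\n' \
    "${hex:0:8}" "${hex:8:4}" "${hex:12:4}" "${hex:16:4}" "${hex:20:12}"
# prints fb919273-c6bf-b891-ea1c-a83c0a8b3ad7
```

The result matches the dashed form blkid reports for that member, which supports the idea that it is just a formatting difference, not different data.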

I’ll start with the last question and see how far I get. Stay tuned and please do tune in if you can shed some light on the questions.

Update #2

I managed to fix the duplicate UUIDs[1] using gdisk’s “randomize disk and partition GUIDs” feature. Unfortunately, it didn’t solve the issue at hand. I’m still unable to boot into the upgraded system, nor am I able to assemble and run /dev/md5 manually.

I’m now thinking this might be a bug of some kind. Assembling /dev/md1 manually works, but in a very particular fashion: the rescue console freezes briefly, and after a couple of seconds I’m thrown back to the password prompt (provide root password or press CTRL+D). Sure enough, after logging in again, /dev/md1 is up and running. Doing the same for /dev/md5, however, the system just freezes. I even left it overnight, hoping for some miracle, but that didn’t happen.

It’s the activation part, marking the array as running in read-write mode, that causes the freeze. Assembling both arrays with --readonly works without issues. Both arrays look healthy: no missing disks, status is clean. Putting the arrays in read-write mode (mdadm -w) then exhibits the same behavior as described above; that is, for md1 the system freezes briefly, and for md5 it freezes completely.

I also ran smartctl -t long on all three devices. No errors were reported.

Back to booting from USB: Anaconda no longer complains about duplicate UUIDs. From the shell I’m able to assemble all arrays, activate the volume group spread over /dev/md5 and /dev/md54, mount the logical volume and access the contents. No freezes here. This leads me to believe the issue is not with the arrays themselves, but rather with the mechanism of discovery and assembly.

I looked through the logs, but couldn’t find any hints as to why /dev/md1 and /dev/md5 fail to be assembled, while there appears to be no issue assembling /dev/md54.

  1. I didn’t manage to add the labels (UUID_SUB etc.) found on /dev/sd[a-c]4 (the component devices of /dev/md54). But I no longer think they are really important. ↩︎

There is another layer of metadata when you use MD RAID. It is stored in the “superblock”. As with the partition tables, you are not supposed to clone the MD superblock because doing so can duplicate information that is supposed to be unique between the devices on a given system.

You might be able to fix the situation by removing the partition with the duplicated metadata from the array, zeroing the superblock on that partition, and then re-adding the partition back to the array to let mdadm reconstruct things correctly. I have some notes about this procedure documented here under the “Other important notes” section.
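In outline, the cycle would be something like this. A hedged sketch only: /dev/md5 and /dev/sda2 are placeholder names, and --zero-superblock is destructive, so triple-check which member carries the duplicated metadata before running anything:

```shell
# Remove / zero / re-add cycle for a single suspect member.
# Device names are placeholders; adjust to your setup.
mdadm /dev/md5 --fail /dev/sda2 --remove /dev/sda2   # drop the member from the array
mdadm --zero-superblock /dev/sda2                    # wipe its MD superblock
mdadm /dev/md5 --add /dev/sda2                       # re-add; mdadm resyncs it as a fresh member
```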

Update #3

I asked on the mailing list for help. To keep everyone in the loop, here’s a summary of where I stand at the moment.

First, I added some options to /etc/mdadm.conf hoping this might help in getting the arrays up and running. The options I added are level, for the RAID level, and num-devices, specifying the number of expected devices in the array. Unfortunately, that didn’t help. I’ve read through the man page of mdadm.conf and will try adding devices as well, specifying the component devices of each array.

Meanwhile, I was able to assemble the arrays manually using mdadm --assemble --verbose --scan --no-degraded after booting from a USB drive. They all showed as clean and I was able to access the data stored in the logical volumes living on the arrays. At least it seems the data itself is not corrupt, which is a relief.

During the manual assembly I asked mdadm to be verbose and it told me

mdadm: No super block found on /dev/sda2 (Expected magic a92b4efc, got 6d746962)

for all component devices of the failing arrays. For the functioning array the message was

mdadm: /dev/sda4 is identified as a member of /dev/md/, slot 0.

This smells very much like something is off with the version 1.1 superblock, that being the only notable difference between arrays md1 and md5 on the one hand and md54 on the other.

However, I can access the information in the superblock using mdadm --examine for every component device. At least, I assume the information printed by the command comes from the superblock on the device.

Thank you for your reply, Gregory. I really appreciate any help while I try to nail this down.

Maybe changing the UUIDs was a bad idea, or only part of the solution, and rebuilding the array is required. However, I’m quite sure I have cloned the MBR before when replacing disks in the array and didn’t run into trouble then. Moreover, I did not run into problems when rebooting after the arrays were rebuilt following the disk replacement.

If the partition/disk UUIDs are the issue here, why does it not affect md54, which shares the same disks? That array always comes up clean. Not having any log messages during boot telling me why md1 and md5 cannot be assembled makes troubleshooting and taking corrective measures rather challenging.

Yeah, I came across your article and read through it. If all else fails I will have to bite the bullet and do some rebuilding. Right now I’m still hoping I can avoid that and fix whatever it is that’s broken.

The MBR is something different from the MD array superblock. That error about the “magic” number being wrong is probably the main problem. I suspect each device/partition in the MD array is supposed to have a different and unique “magic number”. But when you used dd to clone the partition, you duplicated a magic number that was not supposed to be duplicated.

Let me clarify: I have not cloned any of the component devices, nor an entire array, now or in the past. I merely copied the MBR from one of the disks to a new disk before adding it to the arrays as a replacement for a faulty drive. For that, being lazy, I used something like dd if=/dev/sdc of=/dev/sda bs=512 count=1. To my understanding that copies the 512 bytes holding the MBR, and the partition table within it, and nothing more. It certainly does not copy any information held inside any of the partitions making up the arrays. The first partition on each disk is a 6.0 GB swap partition.
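Thinking about it, that single-sector copy does explain the duplicated disk and partition IDs: in the classic DOS layout, the 4-byte disk signature (what blkid shows as PTUUID) sits at offset 440 of that very sector, so copying the whole sector clones it too. A scratch-file demonstration, with temp files standing in for the two disks:

```shell
# The DOS MBR packs boot code (bytes 0-439), the 4-byte disk signature
# (440-443), the partition table (446-509) and the 0x55AA marker into one
# 512-byte sector, so a whole-sector dd also copies the signature.
src=$(mktemp); dst=$(mktemp)
dd if=/dev/urandom of="$src" bs=512 count=1 status=none   # the “old” disk
dd if="$src" of="$dst" bs=512 count=1 status=none         # the lazy clone
sig_src=$(dd if="$src" bs=1 skip=440 count=4 status=none | xxd -p)
sig_dst=$(dd if="$dst" bs=1 skip=440 count=4 status=none | xxd -p)
[ "$sig_src" = "$sig_dst" ] && echo "signature duplicated: $sig_src"
rm -f "$src" "$dst"
```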

Looking at the information in the superblock using mdadm --examine, I see that the magic matches. Here’s an example of one of them:

          Magic : a92b4efc
        Version : 1.1
    Feature Map : 0x1
     Array UUID : 39295d93:e5a75797:b72287f3:51563755
           Name :
  Creation Time : Sun Jun 15 17:11:41 2014
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 1048573952 sectors (500.00 GiB 536.87 GB)
     Array Size : 1048573952 KiB (1000.00 GiB 1073.74 GB)
    Data Offset : 2048 sectors
   Super Offset : 0 sectors
   Unused Space : before=1976 sectors, after=0 sectors
          State : clean
    Device UUID : 4bed6fca:21ecefb3:c1eec35f:0254ff2b

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Dec 27 11:13:21 2023
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 2d2992a0 - correct
         Events : 630858

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)

Yes, that array is indeed almost ten years old and so is one of the disks with 80,952 Power_On_Hours.

The partition table for the GUID partition table format extends well beyond the first 512-byte sector: GUID Partition Table - Wikipedia

Ah, yes, the magic number is shared between the different members of the array. But what about the “Device UUID” that is supposed to be unique between the devices?

Another possibility, if you only cloned the first sector of the drive with dd, is that MD is seeing and being confused by some old information that was on the disk previously (it is somewhat unlikely, but if the old partitions happened to line up just right, it is not impossible).

Yes, I am aware. Yet, my disks don’t use GPT. They use old skool MBR only.

It is now. Obviously, when copying the MBR from one disk to a brand new one, the new drive inherited the device UUID as well as the partition UUIDs from the old drive. But that didn’t pose a problem when rebooting the machine before I upgraded to F39.

There was nothing on the disk previously. It was factory sealed.

OK, I’m confused. :slightly_smiling_face: I thought you said earlier that you were using gdisk to manage your partitions?

The g in gdisk stands for GPT (GUID Partition Table). That tool will, however, convert (with a warning) older DOS partition tables to the newer GPT format.

The older DOS partition tables didn’t have UUIDs at all. There were 8-nibble disk IDs and partition numbers, but I think that was all it had for ID information.

Edit: Also, BTW, the Device UUID that you see in the MD superblock is a different UUID from the partition UUID that is set with (s)gdisk.

# ls -al /dev/disk/by-partuuid | grep sda1
lrwxrwxrwx. 1 root root  10 Dec 21 11:22 c2ac99f6-38bd-4ea5-a646-09ee775aa305 -> ../../sda1
# mdadm --examine /dev/sda1 | grep -i uuid
     Array UUID : 5abb9000:af244e79:fe532f92:f09431a5
    Device UUID : bde2882d:709bc3b2:c6a7f79d:58f378f2

Not quite. I said I used gdisk for fixing the issue with the duplicate UUIDs Anaconda warned about. And yes, you are right: that means one of my disks, the newest one, is now converted to GPT. I didn’t want to dwell on that since it deviates from the topic more and more. I’m quite convinced the partitioning is not the issue here.

Well, blkid calls it UUID, so I went with that. The partition IDs are essentially the disk ID with the partition number suffixed:

/dev/sdc: PTUUID="9c3d16cb" PTTYPE="dos"
/dev/sdc2: PARTUUID="9c3d16cb-02"
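So for a DOS table the per-partition value is fully derivable. A trivial sketch, assuming blkid’s convention of appending a two-digit partition suffix:

```shell
# DOS-table PARTUUID = 8-nibble disk signature + “-” + two-digit
# partition number, e.g. disk 9c3d16cb, partition 2:
ptuuid='9c3d16cb'; part=2
printf '%s-%02d\n' "$ptuuid" "$part"
# prints 9c3d16cb-02
```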

Right. Though, at least for the md54 devices there appears to be a relation. As I mentioned earlier, for the partitions making up md54, the information blkid reports is somewhat richer. I don’t have any idea why that is. I know the array is from a later date, hence the newer superblock version. It may also have been created differently. I don’t remember.

/dev/sdc4: UUID="fb919273-c6bf-b891-ea1c-a83c0a8b3ad7" UUID_SUB="60defeb5-e1fd-3807-5211-24e73927ee3a" LABEL="" TYPE="linux_raid_member" PARTUUID="9c3d16cb-04"

Here UUID corresponds to Array UUID and UUID_SUB matches Device UUID as stored in the superblock:

          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : fb919273:c6bfb891:ea1ca83c:0a8b3ad7
           Name :
  Creation Time : Fri Oct 13 09:56:17 2017
     Raid Level : raid5
   Raid Devices : 3

 Avail Dev Size : 1797031936 sectors (856.89 GiB 920.08 GB)
     Array Size : 1797031936 KiB (1713.78 GiB 1840.16 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262064 sectors, after=0 sectors
          State : clean
    Device UUID : 60defeb5:e1fd3807:521124e7:3927ee3a

Internal Bitmap : 8 sectors from superblock
    Update Time : Wed Dec 27 16:21:58 2023
  Bad Block Log : 512 entries available at offset 16 sectors
       Checksum : 7866a16 - correct
         Events : 229700

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : AAA ('A' == active, '.' == missing, 'R' == replacing)

Did you try inspecting a few of the sectors with a hex editor? Since version 1.1 stores the superblock at the start of the device/partition, something like the following should work to get a dump of the first few sectors:

sudo dd if=/dev/sda2 bs=4096 count=1 | xxd

That probably won’t be much help if you don’t know what the data should look like. But if you see something like “GRUB” in the output, it could hint that a bootloader wrote to some sectors it wasn’t supposed to or something like that. I’d offer a hex dump from one of mine for comparison but I use version 1.0 for my arrays and that superblock isn’t as easy to find on the disk.
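For what it’s worth, the superblock fields are stored little-endian on disk, so an intact v1.1 member should begin with the bytes fc 4e 2b a9 (the magic a92b4efc in byte-reversed order). A scratch-file illustration of what the check would show; on the real system you would read /dev/sda2 with iflag=direct instead of the temp file:

```shell
# Fake the first four bytes of a healthy v1.1 member and dump them.
# a92b4efc stored little-endian is the byte sequence fc 4e 2b a9.
img=$(mktemp)
printf 'fc4e2ba9' | xxd -r -p > "$img"
dd if="$img" bs=4 count=1 status=none | xxd -p
# prints fc4e2ba9
rm -f "$img"
```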

Sorry, I’m out of ideas.

FWIW: Here is a link to some online info about the on-disk layout of the MD version 1 superblock: RAID superblock formats - Linux Raid Wiki

I did not. I don’t think it makes sense: even if something was overwritten at the start of the disk, it would not affect the RAID partitions, since the first partition on each disk is 6 GB of swap. I would also assume mdadm --examine would tell me if it had trouble reading the superblock.

No worries. Thanks for sparring.
I read quite a bit of the wiki already, but not that article. I’ll sure take a look.

I agree that it is highly improbable. But it is not impossible. One of the “features” of older bootloaders and the DOS partition table was that you could install/write your first-stage bootloader code to the first 440 bytes of a partition. The old BIOS systems would then use that bootloader (instead of the one that was in the MBR on the first sector of the disk) if the partition was marked “active”. For this reason, most filesystems do not use the first few sectors of the partition to store any data. That sector is reserved for the PBR/VBR.

Anyway, it might not be impossible that a system update could trigger grub-install and that might try to update the first-stage bootloader on a partition. (At least, I wouldn’t trust grub not to do something like that.) That’s also probably why mdadm came out with version 1.2 of the superblock which stores the superblock at 4K into the partition/device instead of at the very beginning. (Those first sectors are a little dangerous because bootloaders might try to write to them.)

True. But if the array is assembled and running, is it possible that that command is showing you what the superblock should be rather than what it actually is?

This still looks like the key problem to me. I’d want to look at that magic number “manually” and try to figure out why mdadm isn’t seeing what it thinks should be there. (According to that wiki, the magic number should be the very first few bytes on the partition.)

Another possibility, especially if this is a really old device, is that you just have some bad sectors on your disk and when the data is being read back, it isn’t what was actually written. Just in case the problem is due to a bad sector, I’d suggest using dd’s iflag=direct option to make sure it is reading directly from the disk rather than from the page cache. I.e.:

sudo dd if=/dev/sda2 bs=512 count=4 status=none iflag=direct | xxd

Just my two cents. :slightly_smiling_face:

There is an old, but interesting, bug report here that matches the same erroneous “magic number” that you are seeing (which translates to “bitm”). One of the later posts in that report suggests that the following command might help:

# echo repair > /sys/devices/virtual/block/md1/md/sync_action
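The “bitm” translation is easy to check by hand: mdadm prints the value as a 32-bit number, but the bytes are stored little-endian on disk, so reading them in reverse order yields the ASCII tag of the internal-bitmap magic rather than the superblock magic:

```shell
# 6d746962 decoded to bytes (6d 74 69 62 = “mtib”), then reversed to
# recover the on-disk little-endian reading: the bitmap magic “bitm”.
printf '6d746962' | xxd -r -p | rev
# prints bitm
```

In other words, mdadm appears to have been reading the bitmap area where it expected a superblock.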

Why? Firstly, --examine is run on a component device (partition). Secondly, I believe mdadm reads that information itself when assembling the array, verifying that all components belong to the same array and are in a sane state.

I was planning on writing another update. Yesterday, when I compiled all the information asked for on the Linux RAID Wiki, I assembled the arrays one by one in a Fedora 39 live environment. When doing so, I did not get the warnings regarding mismatched magic numbers. The output looked like:

# mdadm --assemble --verbose --readonly /dev/md5 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: looking for devices for /dev/md5
mdadm: /dev/sda2 is identified as a member of /dev/md5, slot 0.
mdadm: /dev/sdb2 is identified as a member of /dev/md5, slot 1.
mdadm: /dev/sdc2 is identified as a member of /dev/md5, slot 2.
mdadm: added /dev/sdb2 to /dev/md5 as 1
mdadm: added /dev/sdc2 to /dev/md5 as 2
mdadm: added /dev/sda2 to /dev/md5 as 0
mdadm: /dev/md5 has been started with 3 drives.

# mdadm --assemble --verbose --readonly /dev/md1 /dev/sda3 /dev/sdb3 /dev/sdc3
mdadm: looking for devices for /dev/md1
mdadm: /dev/sda3 is identified as a member of /dev/md1, slot -1.
mdadm: /dev/sdb3 is identified as a member of /dev/md1, slot 0.
mdadm: /dev/sdc3 is identified as a member of /dev/md1, slot 1.
mdadm: added /dev/sdc3 to /dev/md1 as 1
mdadm: added /dev/sda3 to /dev/md1 as -1
mdadm: added /dev/sdb3 to /dev/md1 as 0
mdadm: /dev/md1 has been started with 2 drives and 1 spare.

# mdadm --assemble --verbose --readonly /dev/md54 /dev/sda4 /dev/sdb4 /dev/sdc4
mdadm: looking for devices for /dev/md54
mdadm: /dev/sda4 is identified as a member of /dev/md54, slot 0.
mdadm: /dev/sdb4 is identified as a member of /dev/md54, slot 1.
mdadm: /dev/sdc4 is identified as a member of /dev/md54, slot 2.
mdadm: added /dev/sdb4 to /dev/md54 as 1
mdadm: added /dev/sdc4 to /dev/md54 as 2
mdadm: added /dev/sda4 to /dev/md54 as 0
mdadm: /dev/md54 has been started with 3 drives.

I believe the previous warnings to be spurious, perhaps some side effect of the --scan flag.
Anyway, since the arrays can be assembled manually and --examine shows correct information for all partitions involved, I do not believe there’s anything wrong with the superblocks.

Update #4

Having tried a few things, including checking the drives and filesystems, I have yet to find a solution to the boot problem. I’m also still wondering why the system freezes completely when bringing up md5.

I wrote earlier that mdadm --assemble --verbose --scan reported not being able to find a superblock on the component devices of md1 and md5. Of course, if that were true it would be reason to worry. However, since mdadm is able to assemble the RAID devices from those same components, I found it suspicious from the start.

While I still don’t know why mdadm reported the superblocks to be missing, or rather the magic number to be a mismatch, nothing of the sort is reported when assembling the arrays one by one. The output looks like:

# mdadm --assemble --verbose --readonly /dev/md5 /dev/sda2 /dev/sdb2 /dev/sdc2
mdadm: looking for devices for /dev/md5
mdadm: /dev/sda2 is identified as a member of /dev/md5, slot 0.
mdadm: /dev/sdb2 is identified as a member of /dev/md5, slot 1.
mdadm: /dev/sdc2 is identified as a member of /dev/md5, slot 2.
mdadm: added /dev/sdb2 to /dev/md5 as 1
mdadm: added /dev/sdc2 to /dev/md5 as 2
mdadm: added /dev/sda2 to /dev/md5 as 0
mdadm: /dev/md5 has been started with 3 drives.

I’d say it’s safe to assume the superblocks are okay and the source of the issue is to be sought elsewhere.

In addition to running smartctl -t long on all devices, I also ran fsck.ext4 on all data volumes after I had assembled the arrays manually in a Fedora 39 live environment. No errors came up, only optimization suggestions. A full read of all RAID devices using dd ran from start to finish without any hangs or reports of read errors. I think it’s safe to assume nothing is wrong with the physical state of the drives.

What’s been puzzling me is the lack of any log entries[1] regarding md1 and md5. For the one array that is brought up, I do see messages in the boot log, and systemd creates units for the device, like sys-devices-virtual-block-md54.device. I read that MD RAID assembly for devices using version 1 superblocks is a system task; only for 0.90 superblocks was the kernel able to assemble arrays itself.

I suppose by “system task” I would need to look at systemd. I poked around, but wasn’t able to find anything particularly helpful. Does anyone know of any helpful documentation on how Fedora brings up MD devices? Is it possible to increase the verbosity of the boot process? If so, where can I configure that?
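In the meantime, I’ll try a few knobs that should increase verbosity on a dracut/systemd-based system (option names taken from dracut.cmdline(7); the device path below is just an example member):

```shell
# One-off: in the GRUB menu, edit the boot entry and append debug options
# to the kernel command line, e.g.:
#   rd.debug rd.udev.info systemd.log_level=debug
# Then, from the emergency shell, dig through what did get logged and
# check how udev sees a member partition:
journalctl -b --no-pager | grep -iE 'md1|md5|md54|mdadm|raid'
udevadm info /dev/sda2
udevadm test /sys/class/block/sda2 2>&1 | tail -n 50
```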

I would really like to see at least some log messages regarding discovery and assembly of MD devices. I suppose I could start by debugging mdadm, which still hangs for some time when assembling md1 and freezes completely on assembly of md5 in the emergency environment. However, the same version of mdadm works just fine in the live environment.

  1. This also happens in the live environment, where md54 is brought up as auto-read-only, albeit under a different device name. ↩︎

It is a series of scripts that are copied into the initramfs if mdraid is detected on your system. You can find the scripts under /usr/lib/dracut/modules.d/90mdraid.

It looks like there was a potentially interesting change to those scripts about 10 months ago:

If I’m reading the commit correctly, that change switched from using the standalone blkid command (the same one you run on the command line) to a “built-in” version that is part of more recent versions of udev.