RAID assembly on boot not working after upgrade from Fedora 38 to 39


Hey guys,

I am using a RAID1 for /home, and after upgrading to Fedora 39 the array is no longer assembled at boot. (Booting the F39 ISO shows the same issue, while the raid is correctly assembled after booting the F38 ISO.)
The array includes the devices /dev/sda1 and /dev/sdb1 and was configured as md127 in F38. But after boot only sdb1 is recognized, and it is in an inactive state:

Personalities : 
md126 : inactive sdb1[2](S)
      3906881368 blocks super 1.2
       
unused devices: <none>

Stopping md126 and running mdadm -A -s also only starts one device.
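The commands for that are roughly the following (md126 being the inactive device created at boot):

mdadm --stop /dev/md126
mdadm -A -s

/proc/mdstat then shows: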

Personalities : [raid1] 
md127 : active raid1 sda1[1]
      3906881344 blocks super 1.2 [2/1] [_U]
      bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>

Only manual assembly with “mdadm -A /dev/md127 /dev/sda1 /dev/sdb1” works:

Personalities : [raid1] 
md127 : active raid1 sdb1[2] sda1[1]
      3906881344 blocks super 1.2 [2/2] [UU]
      bitmap: 0/30 pages [0KB], 65536KB chunk

unused devices: <none>

So I tried creating an /etc/mdadm.conf, which was not needed in F38, but this does not help:

ARRAY /dev/md127 level=raid1 metadata=1.2 name=fedora:1 UUID=12345678:12345678:12345678:12345678 devices=/dev/sda1,dev/sdb1

Here is the output of mdadm -E for both devices:

# mdadm -E /dev/sda1
/dev/sda1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 12345678:12345678:12345678:12345678
           Name : fedora:1
  Creation Time : Thu Aug  3 14:21:47 2023
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 7813762737 sectors (3.64 TiB 4.00 TB)
     Array Size : 3906881344 KiB (3.64 TiB 4.00 TB)
  Used Dev Size : 7813762688 sectors (3.64 TiB 4.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=49 sectors
          State : clean
    Device UUID : 12345678:12345678:12345678:12345678

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Nov 26 19:09:56 2023
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : af810f23 - correct
         Events : 124763


   Device Role : Active device 1
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)
# mdadm -E /dev/sdb1
/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 12345678:12345678:12345678:12345678
           Name : fedora:1
  Creation Time : Thu Aug  3 14:21:47 2023
     Raid Level : raid1
   Raid Devices : 2

 Avail Dev Size : 7813762737 sectors (3.64 TiB 4.00 TB)
     Array Size : 3906881344 KiB (3.64 TiB 4.00 TB)
  Used Dev Size : 7813762688 sectors (3.64 TiB 4.00 TB)
    Data Offset : 264192 sectors
   Super Offset : 8 sectors
   Unused Space : before=264112 sectors, after=49 sectors
          State : active
    Device UUID : 12345678:12345678:12345678:12345678

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Nov 26 19:10:05 2023
  Bad Block Log : 512 entries available at offset 24 sectors
       Checksum : bdc5c978 - correct
         Events : 124764


   Device Role : Active device 0
   Array State : AA ('A' == active, '.' == missing, 'R' == replacing)

Do you have any idea how I can fix this so that the array is assembled correctly when booting the system?


I am not sure how to fix it, but raid 1 is mirrored, so you should still have a functioning system.

Is that a typo, or did you actually create the array at the beginning using /dev/sda1 and /dev/sdb?
Earlier you stated “The array includes the devices /dev/sda1 and /dev/sdb1”.

You are also inconsistent in what array name you are using. You are using md126 in some spots and md127 in others.

I would suspect the array may have been created initially using /dev/sda1 and /dev/sdb, which would confuse mdadm when trying to autostart the array. Ideally each device in the array should be identical in size, and even if sda and sdb are identical, sda1 will not be identical to sdb.

You can find the answer to my question with sudo fdisk -l, which will show info about each drive and any defined partitions on that drive.

Thanks for your reply, Jeff.

The system is working fine, but I have to manually assemble the raid, decrypt the array, and mount it to /home on each boot.
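Roughly, what I run each time looks like this (the mapper name home is just what lsblk shows for my setup):

mdadm -A /dev/md127 /dev/sda1 /dev/sdb1
cryptsetup open /dev/md127 home
mount /dev/mapper/home /home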

Typo, sorry for that. I corrected it.

Actually, I am not. md127 was the name of the array in F38, and it is also the name when I assemble it manually.
md126 is what F39 creates after booting the system, for some reason that I do not understand yet.

Both disks/partitions are the same and have been created the same way. From my documentation I can see that I actually created it as “md0”. I am not sure when it actually changed to “md127” (maybe during the update from F37 to F38?), but from my understanding that does not make any difference, right?

sudo parted /dev/sda mklabel gpt
sudo parted /dev/sdb mklabel gpt
sudo parted -a optimal -- /dev/sda mkpart primary 2048s -8192s
sudo parted -a optimal -- /dev/sdb mkpart primary 2048s -8192s
sudo parted /dev/sda set 1 raid on 
sudo parted /dev/sdb set 1 raid on 
sudo mdadm --create /dev/md0 --auto md --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1 

Here is the output of sudo fdisk -l:

Disk /dev/sda: 3,64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: CT4000MX500SSD1 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX

Device     Start        End    Sectors  Size Type
/dev/sda1   2048 7814028976 7814026929  3,6T Linux RAID


Disk /dev/sdb: 3,64 TiB, 4000787030016 bytes, 7814037168 sectors
Disk model: CT4000MX500SSD1 
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: YYYYYYYY-YYYY-YYYY-YYYY-YYYYYYYYYYYY

Device     Start        End    Sectors  Size Type
/dev/sdb1   2048 7814028976 7814026929  3,6T Linux RAID

In an effort to figure out exactly what is happening, please post ls -l /dev/md* and cat /proc/mdstat.

If possible, do that just after booting and before you do anything else with the array, so we can see the default config that is auto-created. Also repeat that after activating the array so we can see any differences.

Finally, post cat /etc/fstab and lsblk -f.

If I have no mdadm.conf file it looks like this after boot:

cat /proc/mdstat
Personalities : 
md127 : inactive sdb1[2](S)
      3906881368 blocks super 1.2
       
unused devices: <none>

If I create /etc/mdadm.conf or /etc/mdadm/mdadm.conf, it looks like this after boot.
So the config file seems to be recognized, but it does not assemble the array as configured:

cat /proc/mdstat
Personalities : 
md126 : inactive sdb1[2](S)
      3906881368 blocks super 1.2
       
unused devices: <none>

It can be fixed like this:

mdadm --stop md126
mdadm -A /dev/md127 /dev/sda1 /dev/sdb1
cat /proc/mdstat
Personalities : [raid1] 
md127 : active raid1 sdb1[2] sda1[1]
      3906881344 blocks super 1.2 [2/2] [UU]
      bitmap: 7/30 pages [28KB], 65536KB chunk

unused devices: <none>

lsblk -f (after manual assembly, decryption and mount):

NAME                                          FSTYPE            FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda
└─sda1                                        crypto_LUKS       2              b656c121-b246-4083-98df-00febe96c6fd
  └─md127                                     crypto_LUKS       2              a6dd37d8-b237-4294-a179-61b5c56f0e30
    └─home                                    ext4              1.0            f453ed45-fce9-44ac-b5db-901c604a6794    1,7T    46% /home
sdb
└─sdb1                                        linux_raid_member 1.2   fedora:1 80d41500-4552-1694-9991-d7492221dfb6
  └─md127                                     crypto_LUKS       2              a6dd37d8-b237-4294-a179-61b5c56f0e30
    └─home                                    ext4              1.0            f453ed45-fce9-44ac-b5db-901c604a6794    1,7T    46% /home
zram0                                                                                                                              [SWAP]
nvme0n1
├─nvme0n1p1                                   vfat              FAT32          66AB-7F0F                              12,9M    87% /boot/efi
├─nvme0n1p2
├─nvme0n1p3                                   ntfs                             BE0AB65F0AB6147D
├─nvme0n1p4                                   ntfs                             587AD43D7AD41A18
├─nvme0n1p5                                   ext4              1.0   boot     6f0d6af4-63a0-4b61-9f04-9274457044e4  489,9M    43% /boot
└─nvme0n1p6                                   crypto_LUKS       2              8154f7fe-07e2-423e-8def-d1be7731bb6e
  └─luks-8154f7fe-07e2-423e-8def-d1be7731bb6e ext4              1.0   root     758f4a8f-d2e0-4948-ad52-fd2d2daa1962  197,4G     6% /

My mdadm.conf looks like this. I think something has to be missing or wrong, but I cannot see it.

ARRAY /dev/md127 level=raid1 metadata=1.2 name=fedora:1 UUID=80d41500:45521694:9991d749:2221dfb6 devices=/dev/sda1,dev/sdb1

I also tried to insert this line above the ARRAY line but this did not change the behavior:

DEVICE /dev/sda1 /dev/sdb1 

This is different between sda1 and sdb1: not only the fedora:1 label but also the UUIDs. sdb1 shows linux_raid_member 1.2 while sda1 shows crypto_LUKS 2. In fact, sda1 shows two levels of encryption, but sdb1 only shows the second level of encryption.

This is mine with raid 5 and LVM (no encryption):

# lsblk -f
NAME                  FSTYPE            FSVER    LABEL      UUID                                   FSAVAIL FSUSE% MOUNTPOINTS
sda                   linux_raid_member 1.2      raptor:md1 80cb19cf-a800-07cf-b612-0c882b755211                  
└─md1                 LVM2_member       LVM2 001            JFXU1l-O8R0-q75g-e0iI-oaKY-oWZZ-IWBHje                
  └─fedora_raid1-home ext4              1.0                 06dc6e01-ed93-4114-8042-dfb2376ae174    723.3G    83% /home
sdc                   linux_raid_member 1.2      raptor:md1 80cb19cf-a800-07cf-b612-0c882b755211                  
└─md1                 LVM2_member       LVM2 001            JFXU1l-O8R0-q75g-e0iI-oaKY-oWZZ-IWBHje                
  └─fedora_raid1-home ext4              1.0                 06dc6e01-ed93-4114-8042-dfb2376ae174    723.3G    83% /home
sdd                   linux_raid_member 1.2      raptor:md1 80cb19cf-a800-07cf-b612-0c882b755211                  
└─md1                 LVM2_member       LVM2 001            JFXU1l-O8R0-q75g-e0iI-oaKY-oWZZ-IWBHje                
  └─fedora_raid1-home ext4              1.0                 06dc6e01-ed93-4114-8042-dfb2376ae174    723.3G    83% /home

On that array, ‘md1’ is the PV for the VG ‘fedora_raid1’, which contains the LV ‘home’.
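For reference, that layout corresponds roughly to this sequence (reconstructed from the output above, not necessarily my exact commands):

mdadm --create /dev/md1 --level=5 --raid-devices=3 /dev/sda /dev/sdc /dev/sdd
pvcreate /dev/md1
vgcreate fedora_raid1 /dev/md1
lvcreate -l 100%FREE -n home fedora_raid1
mkfs.ext4 /dev/fedora_raid1/home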

The biggest difference I see with my main array (beyond the raid5 vs raid1) is that I created the raid array and used that array device as the PV for LVM management. You described partitioning and then creating the raid, which should not have made a difference, but you did not explicitly state how you are mounting the array.

You also marked the devices as raid using parted, whereas I used only mdadm to define the array.

What I could have done, if I had not created the array from the raw devices, was use parted to create the partitions and then immediately create the array from those partitions.
You can skip these steps, which are not required since mdadm does that automatically for the devices that are members:

sudo parted /dev/sda set 1 raid on 
sudo parted /dev/sdb set 1 raid on 

Then do this:

sudo mdadm --create md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1

That would have marked the partitions as raid and created the array, and automatic use of the array should then just work. It did so for me.

Once the array is created, the array device (/dev/md0), which is treated as a raw device by the file system, can be formatted.

I just did some testing:

First I created 2 small partitions.
I then created a raid1 array with
mdadm --create md3 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdb3
I formatted the entire array with mkfs.ext4 /dev/md3
I then edited my /etc/fstab and created a mount entry with the line
/dev/md3 /mnt/test ext4 defaults 1 2
following which I ran systemctl daemon-reload and mkdir /mnt/test
Finally, to test the mount, I ran mount -a and verified the mount with mount, which showed the array properly mounted.
I then rebooted and once again verified the mount with mount, which showed the device /dev/md3 mounted at /mnt/test.
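Condensed, that test sequence was roughly (my small test partitions; adjust names to your setup):

mdadm --create md3 --level=1 --raid-devices=2 /dev/sdb2 /dev/sdb3
mkfs.ext4 /dev/md3
echo '/dev/md3 /mnt/test ext4 defaults 1 2' >> /etc/fstab
systemctl daemon-reload
mkdir /mnt/test
mount -a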

Success.

It would seem that the extra steps you used may have interfered with the final config.
My partitions were created using gparted, and I simply defined the size of the partition (on a drive that already had a GPT partition table). Gparted automatically creates optimally aligned primary partitions and can format them as well if specified (though since they were to be made into an array, I did not format the individual partitions but formatted the entire array once it was activated).

In summary, the required steps are:

  1. Create the array.
  2. (Partition the array if needed.)
  3. Encrypt the array if desired.
  4. Format the file system on the array.
  5. Mount the array.
  6. Done.

A rough command sketch of this sequence follows after the note below.

Note that actions are taken on the array and not on each member of the array.
The only thing done to each individual member is the creation of partitions that will be used as array members, if that is desired; it is not necessary.
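For a LUKS-on-RAID1 /home like yours, that sequence would look roughly like this. This is only a sketch of a from-scratch setup (luksFormat wipes the device), and the device and mapper names would need to be adjusted to the actual system:

# from-scratch setup sketch; luksFormat destroys existing data
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
cryptsetup luksFormat /dev/md0
cryptsetup open /dev/md0 home
mkfs.ext4 /dev/mapper/home
mount /dev/mapper/home /home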


Thank you again for your reply and for the hint about the mismatch of “FSTYPE”, “FSVER” and “LABEL”.
This is really weird and likely causes the issue I am facing. Obviously sda1 has the wrong values here for some reason.

However, there are three things that are even weirder:

  1. Why does F38 assemble the array correctly and F39 does not?
  2. Why are these attributes different? I created both raid members the same way. Also in gparted and “fdisk -l” they look the same. The difference is only visible in “lsblk -f”. Also in mdadm -E both devices look fine.
  3. Why can I assemble the array manually but not via mdadm.conf?!

Is there any option to reset the “FSTYPE”, “FSVER” and “LABEL” attributes for sda1?

Otherwise I tend towards removing sda1 from the array, repartitioning it, and adding it to the array again, which would obviously result in a rebuild, which I would prefer to avoid.

First, #3: Fedora does not create mdadm.conf by default. It supposedly should work, but it is not required for most mdadm arrays and I have never used it. The array metadata on the drives makes mdadm.conf superfluous.
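If one is used at all, a minimal form that identifies the array only by its UUID is usually enough; I have not tested this myself, so treat it as a sketch:

DEVICE partitions
ARRAY /dev/md127 UUID=80d41500:45521694:9991d749:2221dfb6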

#2: The way the array was created seems to have caused the difference in config. I only used a raw, non-partitioned disk to create my array. It works the same with a partitioned disk, using the partitions as member devices. I think the extra steps you performed with parted may be a factor in why it is not working, especially the extra layer of encryption on sda.

#1: Changes in mdadm, and possibly LUKS, may make F39 different in how it manages arrays.

You might do as you stated.

  1. Before you bring the array active, while sda1 is ‘missing’, you could first fail the device and then remove it.
  2. Following the removal, you could wipe the array metadata and partition table with dd if=/dev/zero of=/dev/sda bs=1MB count=2
  3. Then use gparted to create the partition (or use parted as you did before).
  4. Add the partition back into the array with mdadm as a spare, and it should end up with the same partitioning and encryption as already on sdb1. Make sure you use the proper mdXXX device name for the array.

I would suggest trying this before you do anything else since it appears sdb1 is functioning properly and only sda1 is having the problem.

Hi Jeff,

I was on vacation last week and did not find the time to try rebuilding the raid until now.
But I have done it now and it looks good so far. After running

mdadm /dev/md127 --fail /dev/sda1
mdadm /dev/md127 --remove /dev/sda1 
dd if=/dev/zero of=/dev/sda bs=1MB count=2
sudo parted /dev/sda mklabel gpt
sudo parted -a optimal -- /dev/sda mkpart primary 2048s -8192s
mdadm /dev/md127 --add /dev/sda1

the raid is rebuilding and the output of lsblk -f looks correct now:

NAME                  FSTYPE            FSVER LABEL    UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
sda                                                                                                        
└─sda1                linux_raid_member 1.2   fedora:1 80d41500-4552-1694-9991-d7492221dfb6                
  └─md127             crypto_LUKS       2              a6dd37d8-b237-4294-a179-61b5c56f0e30                
    └─home            ext4              1.0            f453ed45-fce9-44ac-b5db-901c604a6794    1,7T    46% /home
sdb                                                                                                        
└─sdb1                linux_raid_member 1.2   fedora:1 80d41500-4552-1694-9991-d7492221dfb6                
  └─md127             crypto_LUKS       2              a6dd37d8-b237-4294-a179-61b5c56f0e30                
    └─home            ext4              1.0            f453ed45-fce9-44ac-b5db-901c604a6794    1,7T    46% /home

The rebuild will take until tomorrow and I will get back with the results then, but I am very positive that it will work after the rebuild.

Thanks for the update. :sunglasses:

The rebuild is done and the raid is assembled fine again without any config file.

I still have no clue what happened to /dev/sda, but at least the issue is fixed now. :smiley:

Thanks for your support!
