Fedora CoreOS degraded RAID1 encrypted with LUKS not booting

I’m using Ignition to configure a CoreOS install. Everything works fine as far as booting into the OS: LUKS is enabled with Tang, and the RAID1 array is configured and clean.

I’m attempting to test a drive failure (running CoreOS in Proxmox) by detaching a drive. If I detach the drive and boot the OS, or detach the drive and attach a new empty drive, CoreOS will not boot. If I re-attach the original drive, everything works fine.

I’m not sure if this is common or just flawed testing methodology, but I wanted to test this on a VM before running it on a home server, to make sure I could recover from a failed drive in the root filesystem mirror.

boot_device:
  luks:
    tang:
      - url: http://
        thumbprint: 
  mirror:
    devices:
      - /dev/sda
      - /dev/sdb

I’d like to note that I tested this configuration without LUKS enabled and it works fine when attempting the same steps.

If your Ignition config creates multiple LUKS volumes housing your RAID1 filesystem, then it’s likely an issue with /etc/crypttab. If two devices are listed to be unlocked without a nofail option, the boot will halt when one LUKS device cannot be decrypted.
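For reference, a hedged sketch of what a two-volume /etc/crypttab with nofail might look like (the volume names and UUID placeholders here are hypothetical, not from the original poster's system):

```
# /etc/crypttab -- hypothetical two-volume layout;
# 'nofail' lets boot continue even if one volume cannot be unlocked
luks-a UUID=<uuid-of-first-volume>  none luks,_netdev,nofail
luks-b UUID=<uuid-of-second-volume> none luks,_netdev,nofail
```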

I’d advise using LVM for the disks with a LUKS layer on top.

LVM is not supported on Fedora CoreOS.

Thanks, I was thinking this as well, but this is the result of the Ignition config that I posted above:

I could be wrong here, but my assumption based on this is that the RAID array md127 is what is being opened by LUKS on boot, and it holds the root partition. Even with only one drive, md127 should still be operational, just degraded. From my observation, when I try the same config without LUKS, md127 becomes degraded when I remove a drive, but the system still boots. So I’m not sure why, in this config, a missing sda causes a boot failure.

The UUID in this output is different from the original screenshot due to reprovisioning on a new VM.

core@localhost:~$ sudo cat /etc/crypttab 
root UUID=466e38f5-51d9-48f8-8482-4bd53e276811 none luks,_netdev
core@localhost:~$ ls -l /dev/disk/by-uuid/
total 0
lrwxrwxrwx. 1 root root 10 Mar  7 19:00 04f6bf31-6c75-4234-b4d5-8c280350ec9a -> ../../dm-0
lrwxrwxrwx. 1 root root 10 Mar  7 19:00 1B5C-7260 -> ../../sda2
lrwxrwxrwx. 1 root root 10 Mar  7 19:00 1B5C-7610 -> ../../sdb2
lrwxrwxrwx. 1 root root  9 Mar  7 19:00 2025-02-17-12-57-22-00 -> ../../sr0
lrwxrwxrwx. 1 root root 11 Mar  7 19:00 466e38f5-51d9-48f8-8482-4bd53e276811 -> ../../md127
lrwxrwxrwx. 1 root root 11 Mar  7 19:00 ace259a8-051f-4416-bc11-a42f47dae9af -> ../../md126
core@localhost:~$ lsblk
NAME       MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda          8:0    0    32G  0 disk  
├─sda1       8:1    0     1M  0 part  
├─sda2       8:2    0   127M  0 part  
├─sda3       8:3    0   384M  0 part  
│ └─md126    9:126  0 383.9M  0 raid1 /boot
└─sda4       8:4    0  31.5G  0 part  
  └─md127    9:127  0  31.5G  0 raid1 
    └─root 253:0    0  31.5G  0 crypt /usr/bin/swtpm
                                      /var
                                      /sysroot/ostree/deploy/fedora-coreos/var
                                      /etc
                                      /sysroot
sdb          8:16   0    32G  0 disk  
├─sdb1       8:17   0     1M  0 part  
├─sdb2       8:18   0   127M  0 part  
├─sdb3       8:19   0   384M  0 part  
│ └─md126    9:126  0 383.9M  0 raid1 /boot
└─sdb4       8:20   0  31.5G  0 part  
  └─md127    9:127  0  31.5G  0 raid1 
    └─root 253:0    0  31.5G  0 crypt /usr/bin/swtpm
                                      /var
                                      /sysroot/ostree/deploy/fedora-coreos/var
                                      /etc
                                      /sysroot
sr0         11:0    1   849M  0 rom   
zram0      252:0    0   1.9G  0 disk  [SWAP]

I’m following this example from the documentation, except I’m not adding the mirrored /var partition.

This is expected, but worth calling out: even with the mirrored /var partition from the example above, I still have the same issue.

I’m not sure why having partition 4 encrypted causes this issue. I’m assuming /etc/crypttab pointing at the mirror should allow it to unlock even when degraded. This basically leaves me in a spot where the mirror isn’t helping much, since I wouldn’t be able to repair it in the event of a failure.

Does anyone have any thoughts or ideas on how to handle this type of scenario? I don’t want to get to a point where I have a drive crash and no way to remediate the issue.

Test scenario: create a VM with Ignition that has LUKS/Tang configured and a two-disk mirror. Stop the VM, remove disk 0 (sda), and replace it with a new empty disk 0 (sda). Set the boot device to disk 1 (sdb) and start the VM; you’ll notice that the system hangs indefinitely.

Well, I figured it out, but the best bet is to document your drives before an issue comes up.

I’ve tried a bunch of ways to get into emergency mode, and this was the only one I found that worked. I would love to find an easier way. I think I might need to test this: Emergency console access :: Fedora Docs

lsblk

ls -al /dev/disk/by-uuid/
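The "document your drives" step can be scripted so the UUIDs and device names are on record before a failure. A minimal sketch (the output file name and the lsblk field list are my assumptions, not from the thread):

```shell
# Snapshot the disk layout to a file while the system is healthy,
# so UUIDs and RAID member names are on hand during recovery.
out="${1:-disk-map.txt}"
{
  date
  lsblk -o NAME,UUID,SIZE,TYPE 2>/dev/null
  echo
  ls -al /dev/disk/by-uuid/ 2>/dev/null
  echo
  cat /etc/crypttab 2>/dev/null
} > "$out"
echo "wrote $out"
```

Run it once after provisioning and keep a copy of the file somewhere off the machine.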

When booting, at the GRUB menu, press e, then replace the root=UUID= value (the UUID of dm-0 in the screenshot above) with the md127 UUID (af6732… from the screenshot above), and press Ctrl+X to boot.
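Using the UUIDs from the /dev/disk/by-uuid listing earlier in this thread (not the screenshot), the substitution on the kernel line would look like this:

```
# before: root points at the opened LUKS device (dm-0)
root=UUID=04f6bf31-6c75-4234-b4d5-8c280350ec9a
# after: root points at the raw RAID device (md127)
root=UUID=466e38f5-51d9-48f8-8482-4bd53e276811
```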

This will put you in emergency mode. I couldn’t find another way to get into emergency mode via the kernel command line, etc.; this was the only option that worked for me.

Copy the partition table to the new disk (sdb is the good disk, sda is the new replacement for the failed disk; note the dump must come from the good disk):

sfdisk -d /dev/sdb | sfdisk /dev/sda

Add disk to the mirror

mdadm --manage /dev/md127 --add /dev/sda4

Validate the progress

/sbin/mdadm --detail /dev/md127
cat /proc/mdstat

Reboot

Log in and check the status of md126; it’s probably degraded

sudo mdadm --detail /dev/md126

Add the disk to the mirror

mdadm --manage /dev/md126 --add /dev/sda3

Validate progress
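The steps above can be combined into one sketch. This is not a tested recovery tool, just the thread's commands in order; it assumes sdb is the surviving disk and sda is the new replacement (verify against your own lsblk output first, since device names and md numbers may differ), and it defaults to a dry run that only prints the commands:

```shell
#!/bin/bash
# Dry-run sketch of the degraded-mirror recovery sequence.
# Set DRY_RUN=0 to actually execute (destructive -- double-check devices).
DRY_RUN="${DRY_RUN:-1}"
GOOD="${GOOD:-/dev/sdb}"   # surviving good disk (assumption)
NEW="${NEW:-/dev/sda}"     # new replacement disk (assumption)

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# 1. Copy the partition table from the good disk onto the new disk
run sh -c "sfdisk -d $GOOD | sfdisk $NEW"
# 2. Re-add the root partition to the root mirror
run mdadm --manage /dev/md127 --add "${NEW}4"
# 3. Re-add the boot partition to the boot mirror
run mdadm --manage /dev/md126 --add "${NEW}3"
# 4. Watch resync progress
run cat /proc/mdstat
```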

I would expect the system to come up fully if only one disk failed. Then you’d be able to execute your recovery steps in the fully booted machine and not just in the emergency shell.

I’d agree… based on the configuration it should. But it doesn’t appear to do that when I enable LUKS. It works just fine when I use the exact same mirror configuration in Butane/Ignition and remove LUKS.

The issue with the steps above is that I missed something (maybe when I copied the partition structure to the second disk), and now my replacement disk doesn’t have a UUID and can’t boot if I set the BIOS to boot from it. Could be something with GRUB too.

Regardless, it’d be great if the LUKS config would just boot when one of the mirrored disks fails.

Here is my full ignition file in case it helps someone point me in the right direction. Really not sure why this fails to boot.

{
  "ignition": {
    "version": "3.4.0"
  },
  "kernelArguments": {
    "shouldExist": [
      "console=tty0 console=ttyS0,115200n8"
    ],
    "shouldNotExist": [
      "console=hvc0",
      "console=tty0",
      "console=ttyAMA0,115200n8",
      "console=ttyS0,115200n8",
      "console=ttyS1,115200n8"
    ]
  },
  "passwd": {
    "users": [
      {
        "name": "",
        "passwordHash": "",
        "sshAuthorizedKeys": [
          ""
        ]
      }
    ]
  },
  "storage": {
    "directories": [
      {
        "path": "/etc/ucore-autorebase",
        "mode": 492
      }
    ],
    "disks": [
      {
        "device": "/dev/sda",
        "partitions": [
          {
            "label": "bios-1",
            "number": 1,
            "sizeMiB": 1,
            "typeGuid": "21686148-6449-6E6F-744E-656564454649"
          },
          {
            "label": "esp-1",
            "number": 2,
            "sizeMiB": 127,
            "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
          },
          {
            "label": "boot-1",
            "number": 3,
            "sizeMiB": 384
          },
          {
            "label": "root-1",
            "number": 4,
            "resize": true
          }
        ],
        "wipeTable": true
      },
      {
        "device": "/dev/sdb",
        "partitions": [
          {
            "label": "bios-2",
            "number": 1,
            "sizeMiB": 1,
            "typeGuid": "21686148-6449-6E6F-744E-656564454649"
          },
          {
            "label": "esp-2",
            "number": 2,
            "sizeMiB": 127,
            "typeGuid": "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"
          },
          {
            "label": "boot-2",
            "number": 3,
            "sizeMiB": 384
          },
          {
            "label": "root-2",
            "number": 4,
            "resize": true
          }
        ],
        "wipeTable": true
      }
    ],
    "filesystems": [
      {
        "device": "/dev/disk/by-partlabel/esp-1",
        "format": "vfat",
        "label": "esp-1",
        "wipeFilesystem": true
      },
      {
        "device": "/dev/disk/by-partlabel/esp-2",
        "format": "vfat",
        "label": "esp-2",
        "wipeFilesystem": true
      },
      {
        "device": "/dev/md/md-boot",
        "format": "ext4",
        "label": "boot",
        "wipeFilesystem": true
      },
      {
        "device": "/dev/mapper/root",
        "format": "xfs",
        "label": "root",
        "wipeFilesystem": true
      }
    ],
    "luks": [
      {
        "clevis": {
          "tang": [
            {
              "thumbprint": "",
              "url": ""
            }
          ]
        },
        "device": "/dev/md/md-root",
        "discard": true,
        "label": "luks-root",
        "name": "root",
        "wipeVolume": true
      }
    ],
    "raid": [
      {
        "devices": [
          "/dev/disk/by-partlabel/boot-1",
          "/dev/disk/by-partlabel/boot-2"
        ],
        "level": "raid1",
        "name": "md-boot",
        "options": [
          "--metadata=1.0"
        ]
      },
      {
        "devices": [
          "/dev/disk/by-partlabel/root-1",
          "/dev/disk/by-partlabel/root-2"
        ],
        "level": "raid1",
        "name": "md-root"
      }
    ]
  }
}