System with Full Disk Encryption Stopped Offering LUKS Prompt On Boot

Well, I pushed my luck too far and now I cannot boot. Recently, I was having trouble with 6.8 kernels from the vanilla copr, which would frequently, but not always, fail to boot: they would hang early after loading the kernel, unable to present the prompt for the LUKS password. I had been falling back to the F39 kernel, but now that is broken too, even though I didn’t update it or make any changes to the boot configuration. I just tried to update the kernel from the vanilla copr.

Can’t post a photo of the boot errors right now, but it hangs for a while and then the dracut script reports some timeout error, udev empty, and shows some Bash code about detecting a UUID that I recognize as my LUKS UUID.

Dead in the water, please help.

I’ve tried to keep up with changes over time, but I had to manually upgrade to systemd-boot, and I wonder if something is missing in the kernel parameters or the initrd? If it’s the initrd, I don’t know how to fix it. If it’s just some kernel boot parameter, I can type that in if I know what it wants. But I’m not sure what is wrong. I see all the boot parameters I’m familiar with, ones that used to work fine. It’s not missing anything I’m used to seeing. For example, there is rd.luks.uuid=… which looks right, and so on. I see something called “cryptdevice=” in the Arch docs, but I don’t know if that’s something I need; it seems to be specific to a particular boot setup.

I just don’t know what to check because I couldn’t find anything about F39 + md + cryptsetup + systemd-boot. My UEFI seems to be working. But, something is broken with unlocking full disk encryption.

Good news, I added some kernel debug parameters and that actually helped it boot, somehow. So, I think I’ve got a workaround for this emergency, but not much more…

Hmm, well, I think I know what parameter fixed the F39 6.7 kernel boot configuration, but the same thing had no effect on the 6.9 kernel from vanilla copr. I suppose that could just be a problem with upstream, but it seems unlikely.

By the way, the parameter is ‘rd.auto’. I also used rd.debug, which produced a log of the dracut script during boot. I made a copy of that. I’ll post it here later. I’ll also try that with the vanilla kernel, which should be helpful, but I already know what it shows: it says it cannot find the LUKS device.

Since you are using a kernel that is out of tree how do you expect us to assist?

If you were to install the latest kernel from Fedora (6.7.9) and it still presented the same problems, then it would be a Fedora issue. A vanilla kernel does not present a Fedora issue.

Fedora kernels are mostly tweaked to work with other Fedora software. Vanilla kernels are not.
While Fedora kernels seem to start as vanilla, they are tweaked and tested rigorously before release for general use.

I don’t understand. I’m just reaching out to the community like normal. I’m only asking for the same help I always get here: your expertise. If the problem is the kernel, then that’s fine, nothing I can do. That’s a fine result.

But, as I said, the problem affects the official F39 6.7.9 kernel too.

In any case, I think the symptoms suggest some kind of config thing. I’ve checked the things I know to check, but I think I’m missing something.

At this point, the symptoms suggest a problem with kernel parameters, and I need to figure out where those come from. But, after that, I still need to figure out what the correct parameters are, and why the system doesn’t have them. I don’t remember the last time I made any manual changes. I don’t think anything I added is on the cmdline anymore; it was all rewritten during system upgrades.

If you use btrfs on top of LUKS, then the command line would look like

root=UUID=ec37241a-5e7c-4e00-b319-a23fbce59bf6 ro rootflags=subvol=root rd.luks.uuid=luks-391941a3-9e12-4f19-b622-9df3236d01c9 rhgb quiet

The file /etc/kernel/cmdline should show that. It was originally customized by anaconda during initial installation.

You might try the Fedora kernel 6.8 from our koji, since it already is there, and several people have already tested it. This kernel is still a TESTING kernel, but I tend to consider it more stable than any kernel from outside our build and testing processes.

If you want to get the current TESTING 6.8 kernel from Fedora, and you have x86_64 hardware like most users, use:

dnf update \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/bpftool-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-core-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-modules-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-modules-core-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-modules-extra-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-tools-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/kernel-tools-libs-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/libperf-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/perf-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/rtla-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/rv-6.8.1-201.fc39.x86_64.rpm \
  https://kojipkgs.fedoraproject.org//packages/kernel/6.8.1/201.fc39/x86_64/python3-perf-6.8.1-201.fc39.x86_64.rpm

If you are not on x86_64, take the packages for your architecture from the koji build page for kernel-6.8.1-201.fc39.
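The URLs in that dnf command all follow one pattern, so for another architecture the list can be generated rather than edited by hand. The following is only a sketch: the base URL, version, release and package names are copied from the command above, while the function name `koji_urls` and the assumption that every one of these packages is built for the chosen architecture are mine and are not checked by the code.

```python
# Base URL and build numbers are copied from the dnf command in this thread.
KOJI_BASE = "https://kojipkgs.fedoraproject.org//packages/kernel"
VERSION = "6.8.1"
RELEASE = "201.fc39"

# Package names exactly as they appear in that command.
PACKAGES = [
    "bpftool", "kernel", "kernel-core", "kernel-modules",
    "kernel-modules-core", "kernel-modules-extra", "kernel-tools",
    "kernel-tools-libs", "libperf", "perf", "rtla", "rv", "python3-perf",
]


def koji_urls(arch="x86_64"):
    """Return one RPM URL per package for the given architecture.

    Hypothetical helper: it only formats strings and does not verify
    that the files exist on the server.
    """
    directory = f"{KOJI_BASE}/{VERSION}/{RELEASE}/{arch}"
    return [
        f"{directory}/{name}-{VERSION}-{RELEASE}.{arch}.rpm"
        for name in PACKAGES
    ]


if __name__ == "__main__":
    # Print a single command line that can be reviewed before running it.
    print("dnf update " + " ".join(koji_urls()))
```

Calling `koji_urls("aarch64")` would produce the same list with the architecture swapped in both the directory and the file name; it is worth comparing the result against the koji build page before passing it to dnf.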

I suggest removing your copr kernel before running the dnf command. I have never tested them in conjunction, but dnf might refuse to install the kernel (or some of the packages) if some 6.8.1 build is already installed (I have no copr experience).

Feel free to let us know if the problem persists, and if so, please include more details, logs, and outputs. If you want to stick with the copr kernel instead, I suggest getting in touch with the maintainer of this copr kernel and letting them know. But in any case, I think the first thing to test is whether this issue is specific to kernel 6.8 or to the copr build.

Also: Do not keep working with a kernel that has debugging parameters enabled. They are not intended for production use. Some even disable sensitive security/stability measures. There is a reason why they are disabled by default.

Here is what I see behind the graphical boot screen:

I wonder if it could be more fallout from the broken blkid command that was going around recently: System fails to boot after dnf system upgrade due to missing MD (RAID) devices - #30 by gui1ty

@vekruse That looks very much like my cmdline.

initrd=\<redacted>\6.7.9-200.fc39.x86_64\initrd root=/dev/mapper/vgNew-LVroot ro rhgb rd.auto rd.lvm.lv=vgNew/LVswap rd.dm=0 rd.luks.uuid=luks-6958ec9b-02ae-4886-8b78-66a9a8b615d3 rd.lvm.lv=vgNew/LVroot rd.md.uuid=6ee6e4ea:f55c899b:288b4396:68cb4488 root=/dev/mapper/vgNew-LVroot rootfstype=ext4 rootflags=rw,relatime,seclabel,data=ordered SYSFONT=latarcyrheb-sun16 KEYMAP=us quiet LANG=en_US.UTF-8 systemd.machine_id=<redacted>

where you can see that I added the rd.auto flag. Without that option, it doesn’t prompt and doesn’t find my LUKS UUID. Also, notice this is the current F39 kernel.
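For anyone comparing command lines like the one above, here is a minimal sketch that splits a kernel command line into parameters and lists the ones that occur more than once. The function names `parse_cmdline` and `repeated_keys` are made up for illustration, the sample string is a shortened copy of the line above with the redacted parts left out, and quoted values containing spaces are not handled.

```python
from collections import defaultdict


def parse_cmdline(cmdline):
    """Map each parameter name to the list of values it was given.

    Bare flags such as 'ro' or 'rd.auto' get an empty-string value.
    Quoted values that contain spaces are not handled in this sketch.
    """
    params = defaultdict(list)
    for token in cmdline.split():
        # Split on the first '=' only, so values may contain '=' themselves.
        key, _, value = token.partition("=")
        params[key].append(value)
    return dict(params)


def repeated_keys(params):
    """Return the parameter names that occur more than once, sorted."""
    return sorted(key for key, values in params.items() if len(values) > 1)


# Shortened copy of the command line posted above (redacted parts omitted).
CMDLINE = (
    "root=/dev/mapper/vgNew-LVroot ro rhgb rd.auto rd.lvm.lv=vgNew/LVswap "
    "rd.dm=0 rd.luks.uuid=luks-6958ec9b-02ae-4886-8b78-66a9a8b615d3 "
    "rd.lvm.lv=vgNew/LVroot rd.md.uuid=6ee6e4ea:f55c899b:288b4396:68cb4488 "
    "root=/dev/mapper/vgNew-LVroot rootfstype=ext4 "
    "rootflags=rw,relatime,seclabel,data=ordered quiet"
)

if __name__ == "__main__":
    parsed = parse_cmdline(CMDLINE)
    print("rd.auto present:", "rd.auto" in parsed)
    print("repeated:", repeated_keys(parsed))
```

On a running system the same function could be fed the contents of /proc/cmdline. In this sample, root= appears twice with the same value and rd.lvm.lv= appears twice with different logical volumes.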

You see I’m using ext4 for root and specify it with device name instead of UUID. I could change that, but I don’t think that’s the problem. Geez, how long has SYSFONT been in there?

I lost my rd.debug output. I can get it again pretty easily, but this is my main system and I need to get some work done. More soon…

@glb Oh, that’s very interesting. I didn’t catch that at the time, but that’s a good thought.

Personalities : [raid10] 
md126 : active raid10 nvme1n1[8] nvme0n1[5] sdd[4](S) sda[6] sdc[7]
      1953262592 blocks super 1.2 512K chunks 2 near-copies [4/4] [UUUU]
      bitmap: 11/15 pages [44KB], 65536KB chunk

Looks like the fix was some package changes, so it shouldn’t have been an issue today, though, right? I mean, I updated a few hours ago. So, if my blkid was working at that time, dracut wouldn’t have got wrong information when installing the new kernel, right? Just checking my understanding of the issue.

My blkid looks good, now, anyway:

/dev/nvme0n1: UUID="6ee6e4ea-f55c-899b-288b-439668cb4488" UUID_SUB="05289bdf-ada3-ab5a-1ce1-5557a9f79cba" LABEL="raid10" TYPE="linux_raid_member"
/dev/sdd: UUID="6ee6e4ea-f55c-899b-288b-439668cb4488" UUID_SUB="ba142b62-cd21-c78e-e22f-7ebd6402c3e1" LABEL="raid10" TYPE="linux_raid_member"
/dev/mapper/vgNew-LVroot: UUID="22634dce-975d-4ab4-a8a0-28b70c047499" BLOCK_SIZE="4096" TYPE="ext4"
/dev/mapper/luks-raid10: UUID="LX3H01-0bvg-n3Xl-vbmE-tvOh-eUj0-t5GypC" TYPE="LVM2_member"
/dev/sdc: UUID="6ee6e4ea-f55c-899b-288b-439668cb4488" UUID_SUB="50512748-d5cd-9fb2-5fdd-0fa01438dd70" LABEL="raid10" TYPE="linux_raid_member"
/dev/md126: UUID="6958ec9b-02ae-4886-8b78-66a9a8b615d3" TYPE="crypto_LUKS"
/dev/nvme1n1: UUID="6ee6e4ea-f55c-899b-288b-439668cb4488" UUID_SUB="9f05a54a-6a7a-ad42-b018-46a6269a6d34" LABEL="raid10" TYPE="linux_raid_member"
/dev/sda: UUID="6ee6e4ea-f55c-899b-288b-439668cb4488" UUID_SUB="0a36f98e-cadb-0a03-baca-bdc2fb56e2ad" LABEL="raid10" TYPE="linux_raid_member"
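One thing that is easy to misread here: the rd.md.uuid value on the command line uses colons (6ee6e4ea:f55c899b:288b4396:68cb4488) while blkid prints the array UUID with dashes. Judging by the values in this thread, both are the same 32 hex digits grouped differently. A small sketch to compare them, with `normalize_md_uuid` being a made-up helper name:

```python
import string


def normalize_md_uuid(text):
    """Strip ':' and '-' separators and lowercase the remaining digits.

    Raises ValueError unless exactly 32 hexadecimal digits are left.
    """
    digits = text.replace(":", "").replace("-", "").lower()
    if len(digits) != 32 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"not a 128-bit UUID: {text!r}")
    return digits


def same_uuid(first, second):
    """True when both spellings contain the same hex digits."""
    return normalize_md_uuid(first) == normalize_md_uuid(second)


if __name__ == "__main__":
    cmdline_value = "6ee6e4ea:f55c899b:288b4396:68cb4488"   # from rd.md.uuid=
    blkid_value = "6ee6e4ea-f55c-899b-288b-439668cb4488"    # from blkid
    print(same_uuid(cmdline_value, blkid_value))
```

The same comparison works for the LUKS UUID once the "luks-" prefix is removed from the rd.luks.uuid value.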

I don’t know for sure. The problem never affected me. All I know is that the blkid command has been undergoing some revisions of late and, depending on what version your system had installed at the time your initramfs was generated, it might cause issues with the partition being identified and the file system being mounted.

Looking at that blkid output, I see that you have both HDDs (sda, sdc, sdd) and SSDs (nvme0n1 & nvme1n1) in that RAID array. Since HDDs are much slower on both read and write, mixing the device types in a RAID array has at times been problematic. I wonder if this is the cause of your problem.

Also, since sdd shows as a spare I am a bit puzzled because raid 10 normally requires an even number of devices. I have never tried using a spare with an odd number of devices in raid 10.
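To make the device count concrete, here is a rough sketch that splits the md126 line from the /proc/mdstat output posted earlier into active members and spares. It assumes the usual `name[slot]` notation with an optional one-letter flag in parentheses, such as (S) for a spare; the function name `parse_mdstat_members` is hypothetical.

```python
import re

# name[slot] optionally followed by a one-letter flag in parentheses, e.g. sdd[4](S)
MEMBER_RE = re.compile(r"(?P<name>\w+)\[(?P<slot>\d+)\](?:\((?P<flag>[A-Z])\))?")


def parse_mdstat_members(line):
    """Return (device, slot, flag) tuples from one array line of /proc/mdstat.

    flag is None for a device without a marker and 'S' for a spare.
    """
    # Everything before the first ':' is the array name (e.g. md126).
    _, _, rest = line.partition(":")
    return [
        (match["name"], int(match["slot"]), match["flag"])
        for match in MEMBER_RE.finditer(rest)
    ]


if __name__ == "__main__":
    line = "md126 : active raid10 nvme1n1[8] nvme0n1[5] sdd[4](S) sda[6] sdc[7]"
    members = parse_mdstat_members(line)
    print("active:", [name for name, _, flag in members if flag is None])
    print("spare: ", [name for name, _, flag in members if flag == "S"])
```

For the line in this thread that gives four unflagged members plus sdd as a spare, which agrees with the [4/4] [UUUU] status shown in the same output.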

How are your file systems distributed? Please show the output of lsblk -f

Thank you everyone for the help and great ideas!

As I said above, I’ll get back to this with more details soon. I just found the rd.debug flag, so now I know how to get a lot more information about what dracut is trying to do. I’ll get that info and bring it here.

In the meantime, allow me to explain why I use the copr. I’ve been troubleshooting a major AMDGPU OpenCL issue specific to RDNA3 with the ROCm pkg maintainer for about 9 months now. That’s the only reason I installed the vanilla-womerge copr. So, I’ve been using upstream kernels since 6.4, maybe? This issue of not prompting for the LUKS passphrase started with kernel 6.8, I think; the first 6.8 kernel I got. That was about three months ago, I think. So, I figured it was just the kernel. But then it started affecting the latest F39 kernel after I upgraded that, though usually it was just the vanilla kernels. And there was always another kernel installed that worked, so I just hoped it would get worked out.

But, that luck ran out today. I updated the vanilla kernel, it didn’t work, I booted the F39 kernel, which worked, I removed the old vanilla kernel with dnf and checked for a new F39 kernel to install, but there wasn’t one. I installed flatpak updates and rebooted. Then, neither of the remaining kernels would work.

I know, it doesn’t make sense. Why would a kernel config that previously worked stop working? I removed a kernel with dnf, but I don’t see how that would affect the boot config. Maybe the UEFI vars? But I’m almost certain the working configuration wasn’t modified when I uninstalled that other kernel. I think I’ve checked mod times in /etc/loader/entries before and they’re not getting mangled or anything. That suggests the problem is NOT a kernel parameter, but that something is preventing dracut from finding the LUKS device, so I don’t quite get it. The other idea I had was that the initrd was missing a module, but again, the initrd files aren’t getting modified by dnf remove of other kernels; I’ve checked.
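The modification-time check mentioned above can be repeated in a few lines. This is just a sketch: `entries_by_mtime` and `describe` are made-up names, and the directory is passed in as an argument because where the boot loader entries and initrd files live depends on how the system was set up.

```python
import os
import time


def entries_by_mtime(directory):
    """Return (filename, mtime) pairs for regular files, newest first."""
    pairs = []
    with os.scandir(directory) as listing:
        for entry in listing:
            if entry.is_file():
                pairs.append((entry.name, entry.stat().st_mtime))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def describe(directory):
    """Print one line per file with a readable local timestamp."""
    for name, mtime in entries_by_mtime(directory):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(stamp, name)
```

Running `describe()` on the entries directory before and after a dnf remove would show whether any file that belongs to the remaining kernels was rewritten.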

It’s not an older LUKS type 1 volume is it? I expect that would have broken with the upgrade to F38 if so, but it might be worth checking: Upgrade from Luks => Luks2 on Main Disk?
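For reference, the LUKS version is stored right at the start of the header: six magic bytes ("LUKS" followed by 0xBA 0xBE) and then a 16-bit big-endian version number. The sketch below reads that field from a byte string. The function name `luks_version` is made up, and `cryptsetup luksDump` remains the usual way to inspect a real device.

```python
import struct

# Magic bytes at offset 0 of a LUKS header.
LUKS_MAGIC = b"LUKS\xba\xbe"


def luks_version(header):
    """Return the LUKS version from the first 8 header bytes, or None.

    None means the data does not start with the LUKS magic.
    """
    if len(header) < 8 or header[:6] != LUKS_MAGIC:
        return None
    # The version is a 16-bit big-endian integer right after the magic.
    (version,) = struct.unpack(">H", header[6:8])
    return version


if __name__ == "__main__":
    # Synthetic header bytes for illustration only. Reading a real device
    # (for example the first 8 bytes of /dev/md126) needs root privileges.
    sample = LUKS_MAGIC + b"\x00\x01" + bytes(8)
    print(luks_version(sample))
```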

/dev/sdX are SSDs, too. Not sure why you assume HDD. It’s SATAIII or whatever, but the drives themselves start nearly as quickly as NVMe, I’m assuming.

Although, it seems on the edge of possibility that linux could enumerate SATA, think that’s enough, and ignore NVMe, which would prevent the raid from auto build. Hmm, I remember thinking of that once before, but I don’t remember what I figured out.

Hmm, I’m not following you about RAID10. This is near copies, though, so not traditional 1+0. It’s been a while since I set it up and even longer since I was sysadmin, so I’d have to do some research, but I can tell you that this configuration works as expected; not only have I tested it by software failing a drive, but I’ve had it fail over correctly under a genuine hardware failure.

My lsblk is really long, how about this part:

NAME            FSTYPE FSVER LABEL  UUID                                   FSAVAIL FSUSE% MOUNTPOINTS
sda             linux_ 1.2   raid10 6ee6e4ea-f55c-899b-288b-439668cb4488                  
└─md126         crypto 1            6958ec9b-02ae-4886-8b78-66a9a8b615d3                  
  └─luks-raid10 LVM2_m LVM2         LX3H01-0bvg-n3Xl-vbmE-tvOh-eUj0-t5GypC                
    ├─vgNew-LVswap
    │           swap   1            074dad4f-082d-456e-9d74-991d90f68ddd                  [SWAP]
    ├─vgNew-LVroot
    │           ext4   1.0          22634dce-975d-4ab4-a8a0-28b70c047499     60.5G    34% /
    ├─vgNew-LVhome
    │           ext4   1.0          456bd697-b6c2-4ad0-a26f-0e32bb3c7615      217G    87% /var/spool/mail
    │                                                                                     /home
    └─vgNew-LVvar
                ext4   1.0          64dcca21-d2da-4ca2-8389-881b352151ce     32.9G    40% /var
sdd             linux_ 1.2   raid10 6ee6e4ea-f55c-899b-288b-439668cb4488                  
└─md126         crypto 1            6958ec9b-02ae-4886-8b78-66a9a8b615d3                  
  └─luks-raid10 LVM2_m LVM2         LX3H01-0bvg-n3Xl-vbmE-tvOh-eUj0-t5GypC                
    ├─vgNew-LVswap
    │           swap   1            074dad4f-082d-456e-9d74-991d90f68ddd                  [SWAP]
    ├─vgNew-LVroot
    │           ext4   1.0          22634dce-975d-4ab4-a8a0-28b70c047499     60.5G    34% /
    ├─vgNew-LVhome
    │           ext4   1.0          456bd697-b6c2-4ad0-a26f-0e32bb3c7615      217G    87% /var/spool/mail
    │                                                                                     /home
    └─vgNew-LVvar
                ext4   1.0          64dcca21-d2da-4ca2-8389-881b352151ce     32.9G    40% /var
nvme0n1         linux_ 1.2   raid10 6ee6e4ea-f55c-899b-288b-439668cb4488                  
└─md126         crypto 1            6958ec9b-02ae-4886-8b78-66a9a8b615d3                  
  └─luks-raid10 LVM2_m LVM2         LX3H01-0bvg-n3Xl-vbmE-tvOh-eUj0-t5GypC                
    ├─vgNew-LVswap
    │           swap   1            074dad4f-082d-456e-9d74-991d90f68ddd                  [SWAP]
    ├─vgNew-LVroot
    │           ext4   1.0          22634dce-975d-4ab4-a8a0-28b70c047499     60.5G    34% /
    ├─vgNew-LVhome
    │           ext4   1.0          456bd697-b6c2-4ad0-a26f-0e32bb3c7615      217G    87% /var/spool/mail
    │                                                                                     /home
    └─vgNew-LVvar
                ext4   1.0          64dcca21-d2da-4ca2-8389-881b352151ce     32.9G    40% /var
nvme1n1         linux_ 1.2   raid10 6ee6e4ea-f55c-899b-288b-439668cb4488                  
└─md126         crypto 1            6958ec9b-02ae-4886-8b78-66a9a8b615d3                  
  └─luks-raid10 LVM2_m LVM2         LX3H01-0bvg-n3Xl-vbmE-tvOh-eUj0-t5GypC                
    ├─vgNew-LVswap
    │           swap   1            074dad4f-082d-456e-9d74-991d90f68ddd                  [SWAP]
    ├─vgNew-LVroot
    │           ext4   1.0          22634dce-975d-4ab4-a8a0-28b70c047499     60.5G    34% /
    ├─vgNew-LVhome
    │           ext4   1.0          456bd697-b6c2-4ad0-a26f-0e32bb3c7615      217G    87% /var/spool/mail
    │                                                                                     /home
    └─vgNew-LVvar
                ext4   1.0          64dcca21-d2da-4ca2-8389-881b352151ce     32.9G    40% /var

@glb Yes, it’s version 1!

Can I upgrade it? I remember reading something about the new LUKS, but I thought I upgraded it. Apparently not. Can you?

Ah, I see, there is a way.

Sorry, I’m not the one to ask (I use ZFS for all this 🙂). Glad you narrowed it down though. 🙂