Can’t boot kernel 6.14.x on Fedora 41 machine with ZFS disks


I’m a fairly advanced Linux user, but this one has me stumped…

First off, it’s worth noting that my system isn’t quite a “standard” Fedora install.

  • First, it uses ZFS on root, where the zpool containing the root ZFS dataset sits on a LUKS-encrypted partition.
  • Second, it boots unified kernel images (UKIs) generated with dracut --uefi, with an embedded kernel cmdline specifying the root dataset and the LUKS volume UUID. Each UKI is enrolled with efibootmgr and booted directly by the UEFI firmware.
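For readers unfamiliar with this kind of setup, building and enrolling such a UKI looks roughly like the following. This is a sketch, not the poster’s actual script: the function name, kernel version, dataset, UUID, ESP layout, and disk/partition are hypothetical, and root=zfs:<pool/dataset> is assumed to be the form understood by the OpenZFS dracut module. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: build a UKI with an embedded kernel cmdline, then enroll it with efibootmgr.
# Every value passed in (kernel version, LUKS UUID, dataset, ESP path, disk, partition)
# is a hypothetical placeholder, not taken from the original post.

# run: execute a command, or just print it (shell-quoted) when DRY_RUN is non-empty.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%q ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# build_and_enroll_uki KVER LUKS_UUID ROOT_DATASET ESP_DIR DISK PART
build_and_enroll_uki() {
    local kver=$1 luks_uuid=$2 root_ds=$3 esp_dir=$4 disk=$5 part=$6
    local uki_name="fedora-$kver.efi"
    # Assumption: root=zfs:<pool/dataset> is the syntax the OpenZFS dracut module expects.
    local cmdline="root=zfs:$root_ds rd.luks.uuid=$luks_uuid"

    # --uefi makes dracut emit a UKI; --kernel-cmdline is embedded into it.
    run dracut --force --uefi --kver "$kver" --kernel-cmdline "$cmdline" \
        "$esp_dir/EFI/Linux/$uki_name"

    # efibootmgr takes the loader path relative to the ESP root, with backslashes.
    run efibootmgr --create --disk "$disk" --part "$part" \
        --label "Fedora $kver" --loader "\\EFI\\Linux\\$uki_name"
}
```

For example, DRY_RUN=1 build_and_enroll_uki 6.14.4 "$LUKS_UUID" rpool/ROOT/fedora /boot/efi /dev/nvme0n1 1 prints both commands for review without touching the ESP or the firmware boot entries.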

That said, when I try to boot kernel 6.14.4, systemd-modules-load.service fails and I’m dropped into a dracut emergency shell. Manually running systemd-modules-load tells me that the following modules fail to load because “decompression failed with status 6”: i2c_dev, fuse, scsi_dh_alua, scsi_dh_emc, and scsi_dh_rdac. Each also gives the error message “Failed to insert module : Invalid argument”. Attempting to insert them manually with modprobe also fails.

Checking dmesg/journalctl tells me that the iscsi service was requested on a kernel/initrd that is not capable of doing iSCSI. I tried regenerating the UKI with the dracut iscsi module omitted, but this resulted in many more “decompression failed with status 6” errors and then put me in an infinite “dracut-initqueue” loop without ever dropping me into a dracut emergency shell.

I can access rdsosreport.txt (in /run/initramfs) but can’t export it out of the initrd… block devices don’t seem to be working. If I plug in a USB drive, I can find it under /sys and figure out which device under /dev/bus/usb it is, but if I try to mount it, it says it can’t find the underlying block device. /dev/disk does not exist, nor do the block devices for the NVMe drive that holds my root (LUKS-encrypted) filesystem.

Does anyone have a fix, know what the issue is, or have a suggestion on how to figure it out?

Thanks in advance.

Not sure if this is relevant, but perhaps take a look at Topicbox?

I added zfs to your topic title.

As I understand it, ZFS takes time to be ported to a new kernel.
You need to make sure you have a new enough ZFS for a new kernel before switching over.


Good thought, but I’m already using ZFS 2.3.2, which supports kernel 6.14.x.

It’s worth noting that OpenZFS provides DNF repos with RPM packages for ZFS, and that these use DNF’s dependency system to prevent the kernel from updating to a version that isn’t compatible with ZFS. This is done in large part to prevent the situation you are envisioning (i.e., booting a kernel that needs the ZFS kmods but is unable to build them with DKMS).


I really don’t think this is a ZFS problem. The boot process fails well before ZFS plays any part. The ZFS pool containing the datasets with my root filesystem lives on the unlocked LUKS block device at /dev/mapper/luks-<luksUUID>. This block device is created by a command that basically does

cryptsetup open /dev/disk/by-uuid/$diskUUID luks-$(cryptsetup luksUUID /dev/disk/by-uuid/$diskUUID)

and $diskUUID comes from the rd.luks.uuid=<...> kernel cmdline argument, which I specify explicitly when I generate the UKI with dracut --uefi and which dracut builds into the UKI.
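As a rough illustration of that logic (not dracut’s actual implementation), the UUID can be pulled out of a kernel cmdline and used to open the device. Per dracut.cmdline(7), a leading luks- on the rd.luks.uuid value is ignored, so the sketch strips it. It also assumes, as is normally the case, that the by-uuid symlink of a LUKS partition is named after its LUKS header UUID, which lets it skip the second cryptsetup luksUUID call. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find rd.luks.uuid= values on a kernel cmdline and open the matching LUKS
# device as luks-<UUID>. Function names are hypothetical; this is not dracut's code.

# run: execute a command, or just print it (shell-quoted) when DRY_RUN is non-empty.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%q ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# luks_uuids_from_cmdline CMDLINE: print each rd.luks.uuid value, minus a leading "luks-".
luks_uuids_from_cmdline() {
    local word
    local -a words
    read -r -a words <<< "$1"   # split on whitespace without glob expansion
    for word in "${words[@]}"; do
        case $word in
            rd.luks.uuid=*)
                word=${word#rd.luks.uuid=}
                printf '%s\n' "${word#luks-}"
                ;;
        esac
    done
}

# open_luks_by_uuid UUID: open /dev/disk/by-uuid/UUID as /dev/mapper/luks-UUID.
# Assumes the by-uuid name of a LUKS partition equals its LUKS header UUID.
open_luks_by_uuid() {
    local uuid=$1
    local dev="/dev/disk/by-uuid/$uuid"
    if [[ ! -e $dev ]]; then
        echo "no $dev: udev never created it, or the disk driver never loaded" >&2
        return 1
    fi
    run cryptsetup open "$dev" "luks-$uuid"
}
```

In an emergency shell this fails exactly where the post describes: with no /dev/disk tree, open_luks_by_uuid returns an error before cryptsetup is ever invoked.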

This cryptsetup command fails because, at the point where the boot fails and I get dropped into a dracut emergency shell, /dev/disk doesn’t exist (nor do the /dev/nvmeXnYpZ “real” block devices that the entries under /dev/disk/by-*/* would normally symlink to).
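One way to tell whether the NVMe disk is missing because the kernel never registered it (driver not loaded) or only because the /dev nodes were never created is to compare /sys/class/block with /dev. A minimal sketch with a hypothetical function name; the sysfs and /dev roots are parameters only so it can be pointed at a fake tree. If no nvme entries show up at all, that points at module loading rather than at LUKS or udev.

```shell
#!/usr/bin/env bash
# Sketch: list block devices the kernel has registered in sysfs, with their
# major:minor numbers, and report whether a matching /dev node exists.

# list_block_devices [SYSFS_ROOT] [DEV_DIR]
list_block_devices() {
    local sysfs=${1:-/sys} dev_dir=${2:-/dev} entry name majmin state
    for entry in "$sysfs"/class/block/*; do
        [[ -e $entry/dev ]] || continue    # also skips the literal glob if nothing matched
        name=${entry##*/}
        read -r majmin < "$entry/dev"      # e.g. "259:0"
        if [[ -b $dev_dir/$name ]]; then state=present; else state=missing; fi
        printf '%s %s %s\n' "$name" "$majmin" "$state"
    done
}
```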


At any rate, while I can’t 100% rule out ZFS, I really don’t think ZFS is the problem here.

UPDATE: the core issue seems to be that on kernel 6.14.x, the initrd’s modprobe/insmod has forgotten how to handle xz. Uncompressed kmods (*.ko) load just fine, but compressed ones (*.ko.xz) fail to load.

I was able to get the system to boot by adding an override to systemd-modules-load.service that checks whether it is running in the initrd and, if so, remounts /usr as rw, decompresses all the *.ko.xz files manually, runs depmod -a, and then remounts /usr back to ro. Run systemctl edit systemd-modules-load.service and then put:

[Service]
ExecStartPre=/bin/bash -c 'grep -qF '"'"' / zfs '"'"' </proc/mounts || { grep -qF '"'"' /usr '"'"' /proc/mounts && mount -o remount,rw /usr; shopt -s globstar; for nn in /usr/lib/modules/[0-9]*/**/*.ko.xz; do xz -d "$nn"; done; shopt -u globstar; depmod -a; grep -qF '"'"' /usr '"'"' /proc/mounts && mount -o remount,ro /usr; }'

Note to anyone who might try this fix: in grep -qF ' / zfs ', change zfs to whatever your root filesystem type actually is.
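Here is the same band-aid written out as a readable bash script, which is easier to review than the quoted one-liner. It is a sketch under a few assumptions: it detects the initrd via /etc/initrd-release (the marker file systemd itself checks) instead of grepping /proc/mounts for the root filesystem type, so nothing needs editing per filesystem; it only touches modules for the running kernel rather than every version under /usr/lib/modules; and the file path and function names are hypothetical. To be usable from ExecStartPre, the script would also have to be included in the initrd, for example via dracut’s install_items option.

```shell
#!/usr/bin/env bash
# Readable sketch of the ExecStartPre band-aid. Assumption: /etc/initrd-release marks
# the initrd, replacing the "/ zfs" check on /proc/mounts.

# in_initrd [MARKER]: succeed if the initrd marker file exists.
in_initrd() {
    [[ -e ${1:-/etc/initrd-release} ]]
}

# decompress_modules MODDIR: decompress every *.ko.xz below MODDIR in place.
# XZ may name a replacement command (e.g. for testing); defaults to the real xz.
decompress_modules() {
    local moddir=$1 f
    local xz=${XZ:-xz}
    shopt -s globstar nullglob   # nullglob: no matches means no loop iterations
    for f in "$moddir"/**/*.ko.xz; do
        "$xz" -d "$f"
    done
    shopt -u globstar nullglob
}

# fix_initrd_modules: do nothing outside the initrd; inside it, decompress the
# running kernel's modules and rebuild the module dependency files.
fix_initrd_modules() {
    in_initrd || return 0
    local kver usr_is_mount=0
    kver=$(uname -r)
    if grep -qF ' /usr ' /proc/mounts; then
        usr_is_mount=1
        mount -o remount,rw /usr
    fi
    decompress_modules "/usr/lib/modules/$kver"
    depmod -a "$kver"
    if (( usr_is_mount )); then
        mount -o remount,ro /usr
    fi
}
```

Sourced from the unit, that might look like ExecStartPre=/usr/bin/bash -c '. /usr/local/libexec/fix-initrd-kmods.sh && fix_initrd_modules' (path hypothetical).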


This is a band-aid, not a fix. I still don’t know why the initrd suddenly can’t load xz-compressed kmods… anyone have any guesses?

Is it all .xz mods or only the zfs one?

What happens when you modprobe a .xz module after boot?

It’s all the kmods.

Funnily enough, the zfs and nvidia kmods (the two “usual problems”) were the only kmods I *could* load, since dkms/akmods didn’t compress them after building them.
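To confirm the “every compressed kmod fails, uncompressed ones load” pattern, it helps to see which modules in a tree are compressed at all. A small sketch with a hypothetical function name (it relies on GNU find’s -printf) that counts *.ko.xz versus *.ko files and lists the uncompressed ones, which on this system would be the DKMS/akmods builds such as zfs and nvidia:

```shell
#!/usr/bin/env bash
# Sketch: summarize compressed vs. uncompressed kernel modules under a module tree.

# module_compression_summary [MODDIR]
module_compression_summary() {
    local moddir=${1:-/usr/lib/modules/$(uname -r)}
    local n_xz n_ko
    n_xz=$(find "$moddir" -type f -name '*.ko.xz' | wc -l)
    n_ko=$(find "$moddir" -type f -name '*.ko' | wc -l)
    printf 'compressed=%d uncompressed=%d\n' "$n_xz" "$n_ko"
    # Uncompressed modules by file name, e.g. DKMS-built zfs.ko.
    find "$moddir" -type f -name '*.ko' -printf '%f\n' | sort
}
```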

From a dracut emergency shell that is still in the initrd (i.e., before switch_root happens): manually trying to modprobe/insmod a compressed kmod fails with an error that says something like “decompression failed with status code 6”.

If I remount the initrd’s /usr as rw, I can manually decompress the kmod via xz -d /usr/lib/modules/$(uname -r)/.../___.ko.xz, then run depmod -A, and then modprobe ___ works just fine. So it’s not a missing/broken xz binary.
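The manual step above needs the on-disk filename, which does not always match the module name: modprobe treats - and _ as equivalent, so i2c_dev lives in a file named i2c-dev.ko.xz. A sketch of a lookup helper (hypothetical name) that tries both spellings:

```shell
#!/usr/bin/env bash
# Sketch: locate a module's .ko or .ko.xz file by module name, treating '-' and '_'
# as interchangeable the way modprobe does.

# find_module_file NAME [MODDIR]
find_module_file() {
    local name=$1 moddir=${2:-/usr/lib/modules/$(uname -r)}
    local dashed=${name//_/-} underscored=${name//-/_}
    find "$moddir" -type f \( -name "$dashed.ko" -o -name "$dashed.ko.xz" \
        -o -name "$underscored.ko" -o -name "$underscored.ko.xz" \)
}
```

In the emergency shell, with /usr remounted rw, that would be used as xz -d "$(find_module_file i2c_dev)" && depmod -A && modprobe i2c_dev.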

From a fully booted system (after logging in): modprobe works just fine with compressed kmods.

This makes me think that dracut may be at fault here.

Look at this thread: the kernel module compression changed in F42.

The dkms script will need adjusting.


This fixed it.

I self-sign all my kmods (I use Secure Boot with only my personal self-signed keys). I have a script that does uncompress → sign → recompress for all the (compressed) kmods, and the recompress step didn’t pass the --check=crc32 --lzma2=dict=1MiB flags to xz. Recompressing with those flags and rebuilding my UKIs fixed booting for me.
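The corrected uncompress → sign → recompress step could look like the sketch below. It is not the poster’s script: the function name and key paths are hypothetical, the sign-file location assumes the helper shipped in Fedora’s kernel-devel package under /usr/src/kernels/<version>/scripts/, and sha256 is an assumption that should match your kernel’s CONFIG_MODULE_SIG_HASH. As far as I know, --check=crc32 --lzma2=dict=1MiB are also the options the kernel’s own build uses when CONFIG_MODULE_COMPRESS_XZ is set. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: decompress, re-sign, and recompress one kernel module with the xz options
# from the linked thread. Names, key paths, and the sign-file location are assumptions.

# run: execute a command, or just print it (shell-quoted) when DRY_RUN is non-empty.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%q ' "$@"
        printf '\n'
    else
        "$@"
    fi
}

# resign_module FILE.ko.xz KEY CERT [KVER]
resign_module() {
    local xzfile=$1 key=$2 cert=$3 kver=${4:-$(uname -r)}
    local ko=${xzfile%.xz}
    # Assumed location of the kernel's sign-file helper; override with SIGN_FILE.
    local sign_file=${SIGN_FILE:-/usr/src/kernels/$kver/scripts/sign-file}

    run xz -d "$xzfile"                              # FILE.ko.xz -> FILE.ko
    run "$sign_file" sha256 "$key" "$cert" "$ko"     # append the signature in place
    run xz --check=crc32 --lzma2=dict=1MiB "$ko"     # FILE.ko -> FILE.ko.xz
}
```

Looping it over a whole tree would be something like shopt -s globstar nullglob; for f in /usr/lib/modules/"$KVER"/**/*.ko.xz; do resign_module "$f" "$KEY" "$CERT" "$KVER"; done, followed by rebuilding the UKI so the initrd picks up the recompressed files.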

This explains why it was seemingly “just me” with this issue. I’m still not sure why decompression only failed for 6.14.x kernels, and only in the initrd rather than the main system, but I can live with that remaining a mystery.
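For anyone who wants to check their own modules: the .xz stream header records which integrity check was used (the low 4 bits of byte 7, zero-based, per the .xz file format specification: 0 = none, 1 = CRC32, 4 = CRC64, 10 = SHA-256), and plain xz defaults to CRC64. The kernel’s XZ Embedded documentation says files meant to be decoded by the kernel should use --check=none or --check=crc32, which fits the flags that fixed this, although it does not by itself explain the initrd-only behavior. A sketch with a hypothetical function name that reads the check type with od, so it works even where xz itself is unavailable:

```shell
#!/usr/bin/env bash
# Sketch: report the integrity check type stored in an .xz stream header.
# Header layout: 6 magic bytes (fd 37 7a 58 5a 00), then 2 stream-flag bytes;
# the low 4 bits of the second flag byte (offset 7) hold the check ID.

# xz_check_type FILE
xz_check_type() {
    local magic flag
    magic=$(od -An -tx1 -N6 "$1" | tr -d ' \n')
    if [[ $magic != fd377a585a00 ]]; then
        echo "not-xz"
        return 1
    fi
    flag=$(od -An -tu1 -j7 -N1 "$1" | tr -d ' \n')
    case $(( flag & 0x0F )) in
        0)  echo none ;;
        1)  echo crc32 ;;
        4)  echo crc64 ;;
        10) echo sha256 ;;
        *)  echo "other($(( flag & 0x0F )))" ;;
    esac
}
```

Running it over a module tree (for f in /usr/lib/modules/"$(uname -r)"/**/*.ko.xz; do xz_check_type "$f"; done | sort | uniq -c, with globstar enabled) would show at a glance whether any modules were recompressed with the default CRC64 check.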

Many thanks for the help.