Cannot boot kernels newer than 6.7.11-200.fc39 | Fedora 40

Hi,
I’m coming here as a last resort.
I’ve been looking around everywhere while hoping it would eventually be fixed with an update.
Long story short, all kernels above 6.7.11-200.fc39 fail to boot.
Here’s some info:

When did it start?

  • The first 6.8 kernel failed to boot. I hoped it would eventually get fixed, but there have been about 10 kernel versions since then.

What happens whenever I try, for example, the latest 6.9.6-200.fc40?

  • I select the latest kernel in grub2
  • I get a prompt to unlock the encrypted device and then it freezes indefinitely (I didn’t wait more than 20 min, though)
  • I cannot give you the journalctl output; for some reason the log of the failed boot isn’t saved. I did see some services fail, notably audit-rules and remount-fs (I saw this by removing the quiet argument; see the note after this list)
  • The system seems to be responsive though, because I can press power or ctrl+alt+del to restart
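
If journald is only keeping logs in volatile storage, the journal of a failed boot is lost after a reboot. A quick check, and a way to switch to persistent logging, assuming the stock systemd defaults:

$ journalctl --list-boots                  # shows which previous boots were kept
$ journalctl -b -1 -xe                     # previous boot's log, if it was saved
# to make the journal persistent so failed boots are kept:
$ sudo mkdir -p /var/log/journal
$ sudo systemctl restart systemd-journald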

Some clues
Some things that may or may not be related but might help:

  • I’m running F40 on a laptop with an AMD dGPU
  • My filesystem setup is a JBOD btrfs spanning 2 NVMe disks, only one of which is encrypted (I think, since only one LUKS device is listed in /etc/crypttab). One disk holds the root subvol created during the original F36 installation; the second one mounts the /home subvol created afterwards to increase storage (see the check after this list)
  • Both entries in /etc/fstab use the same UUID, though each one specifies the corresponding btrfs subvol it mounts
  • Because journalctl is not saving the failed boots, I suspect there is an issue with the decryption and my filesystem is never available?
  • The latest f40 workstation live usb with 6.8.3? booted fine, so I guess the problem must be with the configuration in my system
  • The setup comes from f36 upgrades and is based on luks v2 (I saw other threads asking this)
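
To double-check which NVMe device is actually LUKS-encrypted and which devices back the btrfs pool, something like this should work (the cryptsetup line needs the actual partition path, shown here only as a placeholder):

$ lsblk -f                                   # FSTYPE column: crypto_LUKS vs btrfs
$ sudo btrfs filesystem show /               # lists every device backing the pool
$ sudo cryptsetup isLuks /dev/nvme1n1pX && echo "is LUKS"   # placeholder partition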

Should have come here first! :fedora:

Do you have a Swap partition or a Swap file?

Other possible issues

Do you have a Swap partition or a Swap file?

I do (see below), though I don’t have a RESUME=... kernel arg.

$ lsblk
# bunch of loop stuff from snapd...
zram0                                         252:0    0  47.1G  0 disk  [SWAP]
nvme0n1                                       259:0    0   1.9T  0 disk  
nvme1n1                                       259:1    0 953.9G  0 disk  
├─nvme1n1p1                                   259:2    0   260M  0 part  /boot/efi
├─nvme1n1p2                                   259:3    0    16M  0 part  
├─nvme1n1p3                                   259:4    0 147.1G  0 part  
├─nvme1n1p4                                   259:5    0   700M  0 part  
├─nvme1n1p5                                   259:6    0    22G  0 part  
├─nvme1n1p6                                   259:7    0   200M  0 part  
├─nvme1n1p7                                   259:8    0     1G  0 part  /boot
└─nvme1n1p8                                   259:9    0 782.7G  0 part  
  └─luks-ba7654bc-d6e4-4d12-bf25-e053c4a36a87 253:0    0 782.7G  0 crypt /home
                                                                         /
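
For reference, a quick way to double-check the active swap and whether any resume= argument is being passed (plain util-linux and grep, nothing Fedora-specific):

$ swapon --show                                       # should list only zram0 here
$ grep -o 'resume=[^ ]*' /proc/cmdline || echo "no resume= argument"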

I was actually able to get a terminal and log in by using the single kernel arg with 6.9.6. For some reason the filesystem seems to be in read-only mode, so I can’t do much.
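
In case it helps, a read-only root can sometimes be remounted read-write from that terminal; this is just the generic remount command and it may fail if something earlier in boot is still broken:

$ sudo mount -o remount,rw /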



Can you drop down to a terminal and try running journalctl -xe | fpaste --raw-url, then paste the URL here so we can see the logs?

I didn’t think that was possible!

Here’s 6.7.11 where it works fine: https://paste.centos.org/view/raw/1d404e74

Here’s 6.9.6 where it didn’t boot (I dropped a terminal using single kernel arg): https://paste.centos.org/view/raw/00da1665

Can you post the result of cat /proc/cmdline? We need to figure out what’s happening, as those logs did not produce much substance.

Also adding /etc/fstab and /etc/crypttab in case that helps.

$ cat /proc/cmdline 
BOOT_IMAGE=(hd1,gpt7)/vmlinuz-6.7.11-200.fc39.x86_64 root=UUID=0601ace4-94df-4df4-a724-15508bcd64c8 ro rootflags=subvol=root rd.luks.uuid=luks-ba7654bc-d6e4-4d12-bf25-e053c4a36a87 rhgb quiet

$ cat /etc/fstab

#
# /etc/fstab
# Created by anaconda on Thu Dec 16 08:20:11 2021
#
# Accessible filesystems, by reference, are maintained under '/dev/disk/'.
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info.
#
# After editing this file, run 'systemctl daemon-reload' to update systemd
# units generated from this file.
#
UUID=0601ace4-94df-4df4-a724-15508bcd64c8 /                       btrfs   subvol=root,device=UUID_SUB=68f44241-757e-48d9-bb6a-81cba5b9e7c6,compress=zstd:1,x-systemd.device-timeout=0 0 1
UUID=c02ed8e0-e49b-47c5-9e17-8c9cd42ce474 /boot                   ext4    defaults        1 2
UUID=CA65-1BC4          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=0601ace4-94df-4df4-a724-15508bcd64c8 /home                   btrfs   subvol=home,compress=zstd:1,x-systemd.device-timeout=0 0 2


$ sudo cat /etc/crypttab
luks-ba7654bc-d6e4-4d12-bf25-e053c4a36a87 UUID=ba7654bc-d6e4-4d12-bf25-e053c4a36a87 none discard
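
To cross-check those UUIDs against the actual block devices (for example, confirming that 0601ace4-… is the btrfs UUID exposed by the opened LUKS mapper device), something like this should do:

$ sudo blkid | grep -E 'crypto_LUKS|btrfs'
$ lsblk -o NAME,FSTYPE,UUID,MOUNTPOINTS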

Adding dmesg logs just in case.

Here’s the working 6.7.11: https://paste.centos.org/view/raw/0d6658ef

Here’s the non-working 6.9.6: https://paste.centos.org/view/raw/c41073ca

I’m pretty sure it could be solved by reinstalling from scratch, but I’d really love to understand and learn what’s going on. So if you have any insights, they’re all welcome.


I noticed the last bootable kernel is fc39 but you’re on Fedora 40. Is it possible that the upgrade triggered this bug?

That seemed like a great idea, but I ran the test sudo ls -d "/boot/efi/$(cat /etc/machine-id)" and it shows I’m not affected by it.

Also, I can’t remember exactly right now, but I think I first noticed it while on F38, during the first update to a 6.8.x kernel. It might have been on F39, but I’m pretty sure I’ve never been able to run any 6.8 kernel. I could probably try to force-install an old rpm from F39 to test. Do you think it’s worth a try?
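
If I go down that route, one way would be to pull a specific kernel build from Koji and install the rpms locally; the version string below is only a placeholder, not a build I’ve verified:

$ sudo dnf install koji
$ koji download-build --arch=x86_64 kernel-6.8.11-200.fc39   # placeholder NVR, adjust to the build to test
$ sudo dnf install ./kernel-*6.8.11*.rpm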

In any case, thanks everyone for the help so far.

You could simply run ls /boot/efi/ and see what is shown. There may be a directory there that triggers that bug even though its name doesn’t match the result of cat /etc/machine-id.

Also, the result of sudo du -hs /boot/efi would be interesting, along with df -h and sudo ls -R /boot.

$ sudo ls /boot/efi/
 EFI   mach_kernel   System  'System Volume Information'


$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/dm-0       2.7T  2.2T  533G  81% /
devtmpfs        4.0M     0  4.0M   0% /dev
tmpfs            16G  590M   15G   4% /dev/shm
efivarfs        128K   44K   80K  36% /sys/firmware/efi/efivars
tmpfs           6.2G  6.8M  6.2G   1% /run
/dev/dm-0       2.7T  2.2T  533G  81% /home
/dev/nvme0n1p7  974M  325M  582M  36% /boot
tmpfs            16G   60M   16G   1% /tmp
/dev/nvme0n1p1  256M   45M  212M  18% /boot/efi
tmpfs           3.1G  152K  3.1G   1% /run/user/1000

What is the result of ls /boot and sudo cat /boot/efi/EFI/fedora/grub.cfg?

We have often seen issues where a user altered the file noted above and as a result could not boot any newer kernels, no matter how many updates were installed.

Please let us know the content of that file and if necessary we can assist.
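
For reference, that file on the EFI partition should only be a short stub that chains to the real config on /boot; the full config is the one grub2-mkconfig regenerates (shown here only for completeness, no need to run it unless the stub turns out to be wrong):

$ sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerates the main config under /boot, not the EFI stub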

$ sudo ls /boot/
config-6.7.10-200.fc39.x86_64				 memtest86+x64.efi
config-6.7.11-200.fc39.x86_64				 symvers-6.7.10-200.fc39.x86_64.xz
config-6.9.6-200.fc40.x86_64				 symvers-6.7.11-200.fc39.x86_64.xz
efi							 symvers-6.9.6-200.fc40.x86_64.xz
extlinux						 System.map-6.7.10-200.fc39.x86_64
grub2							 System.map-6.7.11-200.fc39.x86_64
initramfs-0-rescue-6bbc7441f1264f80ac6cddda90802d18.img  System.map-6.9.6-200.fc40.x86_64
initramfs-6.7.10-200.fc39.x86_64.img			 vmlinuz-0-rescue-6bbc7441f1264f80ac6cddda90802d18
initramfs-6.7.11-200.fc39.x86_64.img			 vmlinuz-6.7.10-200.fc39.x86_64
initramfs-6.9.6-200.fc40.x86_64.img			 vmlinuz-6.7.11-200.fc39.x86_64
loader							 vmlinuz-6.9.6-200.fc40.x86_64
lost+found


$ sudo cat /boot/efi/EFI/fedora/grub.cfg
search --no-floppy --fs-uuid --set=dev c02ed8e0-e49b-47c5-9e17-8c9cd42ce474
set prefix=($dev)/grub2

export $prefix
configfile $prefix/grub.cfg

For what it’s worth, I installed 6.9.6-100.fc39 and no luck.

Maybe unrelated, but I noticed a complaint from SELinux in the logs and tried to fix it:

[   51.899910] SELinux: https://github.com/SELinuxProject/selinux-kernel/wiki/DEPRECATE-runtime-disable
[   51.900766] SELinux: Runtime disable is not supported, use selinux=0 on the kernel cmdline.

Apparently my /etc/selinux/config has SELINUX=disabled, which was deprecated in favor of the kernel arg selinux=0. I don’t remember manually changing it, but it may have been a requirement for installing Docker at some point; I’m not sure. In any case, with SELINUX=enforcing the system won’t boot even on my currently working 6.7.11.fc39 kernel (though adding selinux=0 fixes it).
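
For completeness, the current SELinux state can be checked like this (getenforce and sestatus come with the standard SELinux userspace packages):

$ getenforce
$ sestatus
$ grep '^SELINUX=' /etc/selinux/config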


This seems to mean the 6.9.6 kernel is booting.

Is the problem occurring before you unlock the LUKS device or after? I see luks in the kernel command line, so that could become an issue.

Also, SELinux has seen several changes since the 6.7.11 kernel, and your comments indicate that SELinux may be a factor (you need to use selinux=0 to disable SELinux when booting).
It may be appropriate to boot to the 6.9.6 kernel and run touch /.autorelabel so the system is fully relabeled according to the current selinux policies with the next boot. Then see if it boots normally to the newer kernel or not.
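
Concretely, that would look something like the following; note the relabel only runs if SELinux is at least permissive, so with SELINUX=disabled in /etc/selinux/config it would need to be switched back first (this is a sketch, adjust as needed):

$ sudo sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config
$ sudo touch /.autorelabel
$ sudo reboot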

I assume that you have internet connection while in single user mode, and if so then a dnf upgrade --refresh should not hurt and may help.


This seems to mean the 6.9.6 kernel is booting.

Is the problem occurring before you unlock the LUKS device or after? I see luks in the kernel command line, so that could become an issue.

It doesn’t seem to get stuck unlocking the LUKS device. I get the prompt, and then it hangs; all I see is the boot log, which stops with a few services failed (the ones I copy-pasted above).

When I add the single kernel arg, I’m able to press ctrl+alt+F2 and log in to an interactive session with my user. I do have internet access there; that’s how I got the logs online. I can also see my files, so I think the LUKS unlocking went fine.

However, I had to do it that way because the filesystem is in read-only mode. I think it may be related to the remount-fs service failing?
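
One way to see exactly why that unit failed, from the read-only session (the full unit name on systemd systems is systemd-remount-fs.service):

$ systemctl status systemd-remount-fs.service
$ journalctl -b -u systemd-remount-fs.service --no-pager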

Also, SELinux has seen several changes since the 6.7.11 kernel, and your comments indicate that SELinux may be a factor (you need to use selinux=0 to disable SELinux when booting).

I tried 6.9.6 with and without single and selinux=0, without any effect. I’m convinced that selinux=0 is necessary though (because 6.7.11 won’t boot without it), but I don’t know why.

It may be appropriate to boot to the 6.9.6 kernel and run touch /.autorelabel so the system is fully relabeled according to the current selinux policies with the next boot. Then see if it boots normally to the newer kernel or not.

Can I do it with 6.7.11? Because of the read-only mode I don’t think that’s going to work.

I assume that you have internet connection while in single user mode, and if so then a dnf upgrade --refresh should not hurt and may help.

I do, though I don’t see the need for it as I keep it updated when I boot 6.7.11 (I mean, other than the kernel).

Please verify that everything is updated (belts and suspenders mode). Don’t forget system firmware: sudo fwupdtool get-updates or check the vendor’s site.
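
In case it’s useful, the usual fwupd sequence is roughly this (fwupdmgr is the interactive CLI; fwupdtool works too):

$ sudo fwupdmgr refresh
$ sudo fwupdmgr get-updates
$ sudo fwupdmgr update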

There are similar reports (booting to black screen) for AMD dGPU systems. Please run inxi -Fzxx so we can see hardware and driver details.
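
If the output is long, it can be shared the same way as the logs earlier in the thread:

$ inxi -Fzxx | fpaste --raw-url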