Unable to boot custom kernel compiled from SRPM (hangs at "job dev-disk-by.../start running")

I am trying to build a custom kernel based on 6.15.4 in order to try a specific 2-line patch that fixes a problem with the amdgpu driver on my laptop. I downloaded the SRPM, modified the spec file, and built the RPMs with rpmbuild -ba --with=baseonly SPECS/kernel.spec

The process seems to work, and the generated RPMs can be installed. However, when I try to boot, it hangs indefinitely at a "job dev-disk-by.../start running" message.

I’ve seen a couple of posts about similar problems (e.g. this one), but that’s not my case (I don’t have resume=XXX in my GRUB config).

I have an Nvidia dGPU, and the Nvidia driver akmod builds with no apparent errors. Kernel 6.15.3 installed from the official RPMs boots just fine. Any idea what could be wrong? Did I skip an important part of the process?

BTW I noticed when I ran dnf builddep for the 6.15.4 SRPM that it pulled in the systemd-boot-unsigned package, which was not necessary for 6.15.3. Has anything changed from 6.15.3 to 6.15.4 in this regard? Secure boot is disabled in the BIOS.

BTW building a custom 6.15.4 kernel from vanilla source (no RPMS) following these instructions boots just fine (without the Nvidia driver)

What disk on your system has that UUID?
How is it mounted in /etc/fstab?

I see X.509 certs failed to load. I wonder what that could have broken?

Was nvidia built against your new kernel?

Did you sign the new kernel? If not, you will need to disable secure boot.

Thanks guys for jumping in.

@barryascott this is my main disk, mounted as root:

UUID=40d2311c-5ca9-4a6e-a0ad-86f341521078 /                       btrfs   subvol=root,compress=zstd:1 0 0
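For anyone following along, the two questions above (which device carries the UUID, and how it is mounted) can be answered from a shell. A minimal sketch: the helper name fstab_source_for is made up for illustration, and it reads fstab-formatted text on stdin.

```shell
# Hypothetical helper: print the device spec (first fstab field) for the
# mount point given as $1. Reads fstab-formatted text on stdin and skips
# comment lines.
fstab_source_for() {
    awk -v mp="$1" '$1 !~ /^#/ && $2 == mp { print $1 }'
}

# Usage on a real system (not run here):
#   fstab_source_for / < /etc/fstab
#   sudo blkid -U 40d2311c-5ca9-4a6e-a0ad-86f341521078   # prints the matching block device
```

If blkid prints nothing for the UUID, the device is not visible under that identifier on the running system.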

AFAICS every boot shows exactly two X.509 errors, so I am assuming they could be ignored. This is a snippet from a successful boot:

jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Loaded X.509 cert 'Acer Database: 84f00f5841571abd2cc11a8c26d5c9c8d2b6b0b5'
jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Loaded X.509 cert 'Microsoft Windows Production PCA 2011: a92902398e16c49778cd90f99e4f9ae17c55af53'
jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Loaded X.509 cert 'Microsoft Corporation UEFI CA 2011: 13adbf4309bd82709c8cd54f316ed522988a1bd4'
jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Problem loading X.509 certificate -65
jun 30 18:45:00 kernel: integrity: Error adding keys to platform keyring UEFI:db
jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Problem loading X.509 certificate -65
jun 30 18:45:00 kernel: integrity: Error adding keys to platform keyring UEFI:db
jun 30 18:45:00 kernel: integrity: Loading X.509 certificate: UEFI:db
jun 30 18:45:00 kernel: integrity: Loaded X.509 cert 'Linpus: linpus.com: 2e092cab5e97a89f94a6e272ec7267c267cf4483'
jun 30 18:45:00 kernel: Loading compiled-in module X.509 certificates
jun 30 18:45:00 kernel: Loaded X.509 cert 'Fedora kernel signing key: 2ea9d7411b4e3e72537d3959fd0fab69b3723451'

Also:

❯ journalctl --no-hostname -k | grep "certificate -65"
jun 30 18:45:00 kernel: integrity: Problem loading X.509 certificate -65
jun 30 18:45:00 kernel: integrity: Problem loading X.509 certificate -65
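The "same two errors on every boot" observation can be checked per boot rather than over the whole journal. A small sketch, assuming the journal is persistent and that the boot of interest was actually written to disk (a boot that hangs before the root filesystem is mounted may leave nothing behind); count_x509_errors is a hypothetical helper name.

```shell
# Hypothetical helper: count X.509 load failures in kernel log text read
# from stdin.
count_x509_errors() {
    # grep -c prints the number of matching lines (0 if none); "|| true"
    # keeps the zero-match case from being reported as a failure.
    grep -c 'Problem loading X.509 certificate' || true
}

# Usage (not run here): compare the current boot with the previous one.
#   journalctl --no-hostname -k -b 0  | count_x509_errors
#   journalctl --no-hostname -k -b -1 | count_x509_errors
```

Equal counts on a working and a non-working boot would support treating these messages as unrelated to the hang.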

And, yes, I built nvidia modules like this: sudo akmods --akmod nvidia --rebuild --kernels 6.15.4.amd-200.fc42.x86_64

@leigh123linux I can’t tell for sure, I was hoping this would be handled automatically by the build process. How can I verify it is signed? Anyway, secure boot is disabled on the BIOS, so this shouldn’t matter (or should it?)

I built RPMs passing --with=baseonly to rpmbuild (I didn’t want to generate debug or rt RPMs), could this have had any side effect?

You have kernel lockdown enabled, and it’s in integrity mode, which also blocks unsigned kernel modules.

The lockdown=integrity parameter can be removed or changed to lockdown=none in the kernel command line.

sudo grubby --remove-args="lockdown=integrity" --update-kernel=ALL
sudo grubby --args="lockdown=none" --update-kernel=ALL
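On the earlier question of how to verify signing: for modules, one rough check is whether the file ends with the marker that the kernel's sign-file tool appends. This is only a sketch. The helper name has_module_signature is hypothetical, the module path in the usage comment is an assumption (akmods output may be compressed differently or live elsewhere), and the check only tells you a signature is present, not that the running kernel trusts the key.

```shell
# Hypothetical helper: succeed if the module data on stdin ends with the
# marker appended to signed kernel modules.
has_module_signature() {
    # The marker is the 28-byte string "~Module signature appended~\n".
    tail -c 28 | grep -q 'Module signature appended'
}

# Usage (not run here; the path and .xz compression are assumptions):
#   xz -dc /lib/modules/6.15.4.amd-200.fc42.x86_64/extra/nvidia/nvidia.ko.xz \
#       | has_module_signature && echo signed || echo unsigned
#   modinfo -F signer /path/to/module.ko.xz   # prints the signer name, empty if unsigned
```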

Thanks @leigh123linux, you gave me a glimpse of hope. But, for some reason, it didn’t seem to have any effect. I applied lockdown=none to all kernels, and I even verified the kernel command line in GRUB before booting, but I ended up in the same situation as the screenshot in post #1. Booting the standard 6.15.3-200 kernel shows that “none” has indeed been applied:

❯ cat /sys/kernel/security/lockdown 
[none] integrity confidentiality

~ 
❯ cat /proc/cmdline
BOOT_IMAGE=(hd1,gpt2)/vmlinuz-6.15.3-200.fc42.x86_64 root=UUID=40d2311c-5ca9-4a6e-a0ad-86f341521078 ro rootflags=subvol=root rhgb quiet rd.driver.blacklist=nouveau rd.driver.blacklist=nova-core lockdown=none

Can you think of anything else I could be doing wrong?

This part of your kernel command line is wrong (rd.driver.blacklist=nouveau rd.driver.blacklist=nova-core); it should be

rd.driver.blacklist=nouveau,nova_core modprobe.blacklist=nouveau,nova_core
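After changing the arguments with grubby, it can be worth confirming that the booted kernel really received them in the suggested form. A minimal sketch; cmdline_has_arg is a made-up helper name, and it only does an exact token match against the form suggested above.

```shell
# Hypothetical helper: succeed if the exact argument $2 appears as a
# whitespace-separated token in the kernel command line string $1.
cmdline_has_arg() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage (not run here):
#   cmdline_has_arg "$(cat /proc/cmdline)" 'rd.driver.blacklist=nouveau,nova_core' && echo present
#   cmdline_has_arg "$(cat /proc/cmdline)" 'modprobe.blacklist=nouveau,nova_core' && echo present
```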

Yes, building custom kernels with fedpkg using similar instructions works great for me too. That’s how you also build the kernel the “official” way.

I think to troubleshoot you’d need to compile the kernel that way without any mods, see if it works without any mods, and then modify/patch step-by-step, compare configs if something doesn’t work…

@steppybug I really don’t know… this has been configured automatically during Fedora installation AFAIK

@leigh123linux thanks! Already fixed with grubby :wink:

@soconfused makes sense. Should it boot if I just remove the *.ko files from /lib/modules/6.15.4.amd-200.fc42.x86_64/extra/nvidia/ ?

I moved /lib/modules/6.15.4.amd-200.fc42.x86_64/extra/nvidia/ somewhere else, and rebooted. No luck, same problem, stuck again on the “job dev-disk-xxx” stage :disappointed_face: Boy, that’s frustrating…

When I build it from kernel.org tarball I can boot normally, so it is something related to the way it is done on Fedora. However, I need nvidia working in order to use an external monitor, so I would need to compile nvidia modules for this custom kernel, not sure how hard this is.

Just a side note: it seems there is no “none” option for the lockdown parameter. The boot log shows

jul 04 20:18:04 kernel: Malformed early option 'lockdown'

Indeed, kernel.org documentation only shows integrity and confidentiality as possible options. Anyway, that doesn’t seem to be the problem…
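To spot this kind of leftover before rebooting, the command line can be checked against the two documented values. A rough sketch: check_lockdown_arg is a hypothetical helper, and it only looks at the last lockdown= token, which is a simplification.

```shell
# Hypothetical helper: report how a lockdown= argument on the kernel command
# line passed as $1 compares with the two documented values.
check_lockdown_arg() {
    # Pick the last lockdown=... token, if any; "|| true" covers the case
    # where grep finds nothing.
    tok=$(printf '%s\n' "$1" | tr -s ' ' '\n' | grep '^lockdown=' | tail -n 1) || true
    case "$tok" in
        '') echo "absent" ;;
        lockdown=integrity|lockdown=confidentiality) echo "accepted: ${tok#lockdown=}" ;;
        *) echo "malformed: ${tok#lockdown=}" ;;
    esac
}

# Usage (not run here):
#   check_lockdown_arg "$(cat /proc/cmdline)"
#   sudo grubby --remove-args="lockdown=none" --update-kernel=ALL   # drop the rejected value
```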