I’ve been using Linux for years, but this is my first time taking a bad kernel update. I guess it was about time…
I have 5.18.9 and 5.18.10 installed on my system, both of which still work fine, but 5.18.11 will not boot. First attempt gave me several “soft lockup” errors:
How do I report this to the kernel maintainers? ABRT says there isn’t enough detail in the logs for either failure. And if I’m going to reach out via email, I have no idea who to approach—how do I know what module is causing these oops-es?
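For the bug report, one option is to dump the kernel messages from the failed boot and filter for the lines that name the lockup and the loaded modules. This is only a sketch. It assumes the failed boot got far enough to write to a persistent journal, and both function names are made up for illustration. Check `journalctl --list-boots` first to confirm that boot `-1` really is the failed one.

```shell
# Hypothetical helper: save kernel messages from the previous boot.
# Assumes a persistent journal; `-b -1` selects the boot before the current one.
save_previous_boot_kernel_log() {
    out="${1:-previous-boot-kernel.log}"
    journalctl -k -b -1 --no-pager > "$out"
}

# Hypothetical helper: keep only the lines that usually point at a culprit
# (lockup/oops banners, the loaded-module list, the start of call traces).
lockup_lines() {
    grep -E 'soft lockup|Oops|BUG:|Modules linked in|Call Trace'
}

# Usage, run by hand after booting a working kernel:
#   save_previous_boot_kernel_log failed-boot.log
#   lockup_lines < failed-boot.log
```

The full log file is probably what a maintainer would want attached; the filtered view is just for spotting which module shows up in the trace.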
How do I get 5.18.11 off my system? Mashing shift and arrowing thru Grub is getting old fast.
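As a stopgap for the Grub menu, grubby can pin a known-good kernel as the default boot entry without uninstalling anything. A minimal sketch follows; the function name is hypothetical and the release string in the usage comment is an assumption (the thread only gives 5.18.10, not the full release), so check `rpm -q kernel-core` for the real one.

```shell
# Hypothetical helper: make an installed kernel the default boot entry.
# $1 is a kernel release string as printed by `uname -r`.
set_default_kernel() {
    image="/boot/vmlinuz-$1"
    if [ ! -e "$image" ]; then
        echo "no kernel image at $image" >&2
        return 1
    fi
    sudo grubby --set-default "$image"
}

# Usage (release string is an assumption):
#   sudo grubby --default-kernel                # show the current default
#   set_default_kernel 5.18.10-200.fc36.x86_64
```

As far as I know, the next kernel update will normally make itself the default again, so this only holds until then.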
sudo dnf downgrade kernel unhelpfully offers to remove 5.18.9 (wrong one) and install 5.17.5 (way old). This leaves the broken version installed.
sudo dnf rm kernel*5.18.11* also removes akmod-nvidia and kmod-nvidia, which depend on 5.18.11 specifically for some reason. I do not want to uninstall my graphics drivers. Besides, I thought the whole point of akmods was that they worked with multiple kernel versions, no?
Yes, I can boot from an older kernel no problem. That’s how I managed to write the original post.
I’ll take a look at Red Hat’s bugzilla after work today. Thanks for the recommendation, that wasn’t the first place to come to mind. The crash is 100% reproducible so far, so it should be easy to re-test if they need more data, I hope.
Still looking for suggestions about the package management thing. One idea I had: I could force install nvidia without the latest kernel (breaking dependencies). Really, the akmod should work with another kernel version installed. But that sounds a little dicey, and I don’t want to make the situation worse, so I haven’t tried it yet.
You can boot to the 5.18.10 kernel, then run sudo dnf remove kernel*5.18.11* and accept the removals. You may also add the --noautoremove option to see if it still wants to remove akmod-nvidia. The kmod-nvidia package related to that kernel must be removed, but akmod-nvidia should not be a forced removal.
Even if you remove the akmod-nvidia package, it is simple to reinstall it. Already loaded and operating kernel modules should not be affected by a remove and reinstall. I have done so several times.
edit:
The desire to remove the akmod-nvidia package may be related to removing the kernel-devel or the kernel-devel-matched package for the newest kernel. If so, allow it and reinstall akmod-nvidia while booted to the older kernel. My system only has the kernel-devel-matched package for the latest installed kernel, so you might need to install that one for the older kernel as well.
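The reinstall-while-booted-to-the-older-kernel step could look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, and the kernel-devel build for an older kernel may already be installed or may no longer be available in the repos, in which case the dnf step will fail and nothing is rebuilt.

```shell
# Hypothetical helper: package spec for the headers matching one kernel release.
devel_pkg_for() {
    printf 'kernel-devel-%s\n' "$1"
}

# Hypothetical helper: make sure headers for the running kernel are present,
# then ask akmods to rebuild its modules for that kernel only.
rebuild_kmods_for_running_kernel() {
    kver="$(uname -r)"
    sudo dnf install "$(devel_pkg_for "$kver")" &&
        sudo akmods --force --kernels "$kver"
}

# Usage, run by hand while booted into the working kernel:
#   rpm -q kernel-core kernel-devel kernel-devel-matched   # see what is installed
#   rebuild_kmods_for_running_kernel
```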
I have an nvidia GPU but I have no problem since it isn’t much better than the iGPU so I have disabled it completely.
Nvidia drivers on Linux can be annoying sometimes.
Edit: try to disable your nvidia drivers temporarily and check if something changes.
Sorry, I was just stating that on systems with nvidia hardware but no nvidia driver it isn’t a problem. I was not suggesting that he completely stop using his nvidia GPU, but maybe uninstall the driver for a short while and try again to check if something changes. I should have phrased it better, thanks for the remark.
Lots of activity here! Thanks for the attention everyone.
That’s what I expected too, but it doesn’t match dnf’s behavior:
Vanilla rm
$ sudo dnf remove kernel*5.18.11*
Dependencies resolved.
=======================================================================================================================
 Package                              Arch    Version           Repository                         Size
=======================================================================================================================
Removing:
 kernel                               x86_64  5.18.11-200.fc36  @updates                              0
 kernel-core                          x86_64  5.18.11-200.fc36  @updates                           92 M
 kernel-debug                         x86_64  5.18.11-200.fc36  @updates                              0
 kernel-debug-core                    x86_64  5.18.11-200.fc36  @updates                           96 M
 kernel-debug-devel                   x86_64  5.18.11-200.fc36  @updates                           64 M
 kernel-debug-devel-matched           x86_64  5.18.11-200.fc36  @updates                              0
 kernel-debug-modules                 x86_64  5.18.11-200.fc36  @updates                           57 M
 kernel-devel                         x86_64  5.18.11-200.fc36  @updates                           63 M
 kernel-devel-matched                 x86_64  5.18.11-200.fc36  @updates                              0
 kernel-modules                       x86_64  5.18.11-200.fc36  @updates                           56 M
 kernel-modules-extra                 x86_64  5.18.11-200.fc36  @updates                          3.3 M
Removing dependent packages:
 akmod-nvidia                         x86_64  3:515.57-1.fc36   @rpmfusion-nonfree-nvidia-driver   23 k
 kmod-nvidia                          x86_64  3:515.57-1.fc36   @rpmfusion-nonfree-nvidia-driver      0
 kmod-nvidia-5.18.11-200.fc36.x86_64  x86_64  3:515.57-1.fc36   @@commandline                      29 M
Removing unused dependencies:
 akmods                               noarch  0.5.7-8.fc36      @updates                           47 k
 kmodtool                             noarch  1.1-3.fc36        @fedora                            28 k
 xorg-x11-drv-nvidia-kmodsrc          x86_64  3:515.57-1.fc36   @rpmfusion-nonfree-nvidia-driver   32 M
Transaction Summary
=======================================================================================================================
Remove 17 Packages
Freed space: 493 M
Is this ok [y/N]:
With --noautoremove
$ sudo dnf remove --noautoremove kernel*5.18.11*
Dependencies resolved.
=======================================================================================================================
 Package                              Arch    Version           Repository                         Size
=======================================================================================================================
Removing:
 kernel                               x86_64  5.18.11-200.fc36  @updates                              0
 kernel-core                          x86_64  5.18.11-200.fc36  @updates                           92 M
 kernel-debug                         x86_64  5.18.11-200.fc36  @updates                              0
 kernel-debug-core                    x86_64  5.18.11-200.fc36  @updates                           96 M
 kernel-debug-devel                   x86_64  5.18.11-200.fc36  @updates                           64 M
 kernel-debug-devel-matched           x86_64  5.18.11-200.fc36  @updates                              0
 kernel-debug-modules                 x86_64  5.18.11-200.fc36  @updates                           57 M
 kernel-devel                         x86_64  5.18.11-200.fc36  @updates                           63 M
 kernel-devel-matched                 x86_64  5.18.11-200.fc36  @updates                              0
 kernel-modules                       x86_64  5.18.11-200.fc36  @updates                           56 M
 kernel-modules-extra                 x86_64  5.18.11-200.fc36  @updates                          3.3 M
Removing dependent packages:
 akmod-nvidia                         x86_64  3:515.57-1.fc36   @rpmfusion-nonfree-nvidia-driver   23 k
 akmods                               noarch  0.5.7-8.fc36      @updates                           47 k
 kmod-nvidia                          x86_64  3:515.57-1.fc36   @rpmfusion-nonfree-nvidia-driver      0
 kmod-nvidia-5.18.11-200.fc36.x86_64  x86_64  3:515.57-1.fc36   @@commandline                      29 M
Transaction Summary
=======================================================================================================================
Remove 15 Packages
Freed space: 461 M
Is this ok [y/N]:
In either case, akmod-nvidia gets hit. It’s not an unused dependency, it’s a dependent package.
I’m noticing now that the whole akmods package gets removed too. I don’t think that should be happening, and it’s probably the direct reason that akmod-nvidia gets removed. I think the issue might be that the kernel(-debug)-devel-matched dependency is provided by the 5.18.11 kernel specifically. Really, there should be plenty of kernels that can provide this, right?
$ dnf deplist akmods
Last metadata expiration check: 0:03:38 ago on Wed 20 Jul 2022 05:58:31 PM EDT.
package: akmods-0.5.7-7.fc36.noarch
dependency: (kernel-debug-devel-matched if kernel-debug-core)
provider: kernel-debug-devel-matched-5.18.11-200.fc36.x86_64
dependency: (kernel-devel-matched if kernel-core)
provider: kernel-devel-matched-5.18.11-200.fc36.x86_64
dependency: (kernel-lpae-devel-matched if kernel-lpae-core)
dependency: /bin/sh
provider: bash-5.1.16-2.fc36.x86_64
dependency: /usr/bin/bash
provider: bash-5.1.16-2.fc36.x86_64
# etc...
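The parenthesised entries in that deplist output are RPM conditional ("rich") dependencies: "(kernel-devel-matched if kernel-core)" reads as "if any kernel-core is installed, some kernel-devel-matched must be installed too". As far as I can tell, that would explain the removal: if the 5.18.11 build is the only kernel-devel-matched on the system, taking it out while older kernel-core packages remain leaves the condition unsatisfied, so dnf drops akmods as well. Here is a small sketch for checking this on a given machine; the function name is made up.

```shell
# Hypothetical helper: pull the conditional "(X if Y)" requirements out of
# `dnf deplist` output read from stdin.
rich_deps() {
    sed -n 's/^ *dependency: \((.* if .*)\)$/\1/p'
}

# Usage, run by hand:
#   dnf deplist akmods | rich_deps
#   rpm -q --whatprovides kernel-devel-matched   # installed providers only
```

If the second command lists a single package, that package is the one holding akmods in place.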
I actually tried this last night. Uninstalling everything 5.18.11 removes nvidia along with it. Installing akmod-nvidia after that just pulls 5.18.11 back in as dependencies. Maybe if I rebooted after uninstalling? Kind of forgot that nouveau exists for a minute there…
I agree!
Good idea, I’ll give this a shot too. I don’t know if the nvidia kernel module actually caused the errors, but nvidia is too problematic not to check it.
I replaced the nouveau blacklist lines in the grub cmdline with nvidia to temporarily switch drivers. I tested it first on 5.18.10 (works as expected), then 5.18.11 (boots without error!). Then I booted 5.18.11 with nvidia like normal and…it boots fine? The “C” in NVIDIA is for “consistency”, I guess.
If it sticks, I guess I’ll mark your post as the solution! Maybe the crash isn’t as reproducible as it seemed at first.
I’m still curious if anyone knows a straightforward way to blacklist one version of the kernel without upsetting akmods and friends. Might come in useful down the line, once in a while.
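One possible approach to skipping a single kernel version is dnf's exclude option, which takes a package spec with globs. This is an untested sketch with a hypothetical function name. It only stops dnf from installing the matching packages and does nothing about a copy that is already installed, and it may still collide with akmods if the excluded version is the only source of kernel-devel-matched.

```shell
# Hypothetical helper: run an upgrade while skipping every kernel package
# whose version matches $1 (for example 5.18.11).
upgrade_skipping_kernel() {
    if [ -z "$1" ]; then
        echo "usage: upgrade_skipping_kernel VERSION" >&2
        return 2
    fi
    # The glob is quoted so the shell passes it to dnf unexpanded.
    sudo dnf upgrade --exclude="kernel*-$1*"
}

# Usage, run by hand:
#   upgrade_skipping_kernel 5.18.11
# To make it stick, the same glob can go on an `exclude=` line in /etc/dnf/dnf.conf.
```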
Install akmod-nvidia (and nothing else). It will pull in akmods and all the other packages needed to build the drivers, including kernel-devel-matched.
I noted that your output lists both kmod-nvidia and kmod-nvidia-5.18.11-200.fc36.x86_64, which may be part of your problem. The one from @@commandline is built by the akmod-nvidia package to match your installed kernel and the other is downloaded from the repo. They may conflict.
BTW, you probably do not need any of the kernel-debug* packages, so unless you are doing something that is directly related to kernel development where debugging is necessary you could remove all those for all the installed kernels.
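To see which debug kernel packages are actually installed before removing them, something like the following could work. The function name is hypothetical. Note that the filter deliberately skips kernel-debuginfo, while the broader glob in the removal command would also match it if it is installed, so review dnf's transaction list before confirming.

```shell
# Hypothetical helper: from a list of package names on stdin, keep the
# debug-kernel ones (kernel-debug, kernel-debug-core, ...), deduplicated.
debug_kernel_pkgs() {
    grep -E '^kernel-debug(-|$)' | sort -u
}

# Usage, run by hand:
#   rpm -qa --qf '%{NAME}\n' | debug_kernel_pkgs
#   sudo dnf remove 'kernel-debug*'
```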