Kernel update botched my system

Hi, how are you all?

Yesterday I updated my Fedora 42 KDE Plasma PC with sudo dnf upgrade, which included kernel updates. Once I rebooted my PC to complete the updates, my system got BOTCHED!

While starting my PC, the GRUB menu shows 4 boot entries -

sudo ls -1 /boot/loader/entries/

> **************-0-rescue.conf
> **************-6.14.5-300.fc42.x86_64.conf
> **************-6.14.6-300.fc42.x86_64.conf
> **************-6.14.8-300.fc42.x86_64.conf

************** - Redacted for privacy!

(I am currently running my PC by booting it from (6.14.6-300.fc42.x86_64) kernel.)

These are the kernels installed on my PC -

dnf list kernel

Installed packages
kernel.x86_64 6.14.5-300.fc42 <unknown>
kernel.x86_64 6.14.6-300.fc42 updates
kernel.x86_64 6.14.8-300.fc42 updates

My PC is running fine under the (6.14.6-300.fc42.x86_64) kernel version. Now I want to know what to do next. Should I uninstall the latest kernel?

2 Likes

You do not need to uninstall the bad kernel, you can set the working kernel as your default in grub using the grubby command.

dnf will never uninstall the running kernel.
So you could allow new kernels to be installed so you can test them.

Are you able to collect any information on the new kernel failure?
Usually you will want to edit in grub to remove rhgb and quiet options so all messages are easier to see.
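
A minimal sketch of the grubby approach mentioned above, assuming the working kernel is the 6.14.6 release from this thread. The function names are hypothetical helpers for illustration; `--set-default` and `--default-kernel` are grubby options, and the `/boot/vmlinuz-<release>` path follows the usual Fedora layout.

```shell
# Hypothetical helper: make an installed kernel the default boot entry.
# $1 is a kernel release string such as 6.14.6-300.fc42.x86_64.
set_default_kernel() {
    sudo grubby --set-default "/boot/vmlinuz-$1"
}

# Hypothetical helper: print the kernel that will boot by default,
# to confirm the change took effect.
show_default_kernel() {
    sudo grubby --default-kernel
}

# Usage (not run automatically):
#   set_default_kernel 6.14.6-300.fc42.x86_64
#   show_default_kernel
```

Once the newer kernel boots reliably, the same helper can point the default back at it.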

1 Like

“Botched” is vague. Did the system boot to the login screen, a black screen, or something else? If the kernel booted as far as the login screen, journalctl should have relevant details from prior boots.

  • In a terminal, run journalctl --no-hostname -b -p 3 to see errors of priority 3 and more severe for the current boot.
  • Generate a list of boots with dates and times: run journalctl --list-boots.
  • Find the date and time of the issue and use the corresponding negative number in journalctl --no-hostname -b <N> -p 3 to see errors for the “botched” boot. You can usually ignore errors that are also present in the current (working) boot.

Post the errors associated with the “botched” boot. If the lines are too long for your screen, run the command again with | cat added at the end so the pager does not cut off the long lines and you can post complete messages.
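
The journalctl steps above could be wrapped up roughly as follows. This is only a sketch: both function names are hypothetical, the date 2025-05-27 is a made-up example, and the date match assumes that `journalctl --list-boots` prints the boot index in the first column with the date somewhere on the same line.

```shell
# Hypothetical helper: read `journalctl --list-boots` output on stdin and
# print the boot index (first column) of every boot whose line mentions
# the given date, e.g. 2025-05-27.
boots_on_date() {
    awk -v day="$1" 'index($0, day) { print $1 }'
}

# Hypothetical helper: show messages of priority err (3) and more severe
# for one boot. $1 is a boot index such as -1. With --no-pager, long
# lines wrap instead of being cut off by the pager.
boot_errors() {
    journalctl --no-hostname --no-pager -b "$1" -p 3
}

# Usage (not run automatically):
#   journalctl --list-boots --no-pager | boots_on_date 2025-05-27
#   boot_errors -1
```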

This may help other linux users (even those running other distros) who encounter a similar “botch”.

1 Like

Thank you for your response!

When booting the PC from the latest kernel (kernel.x86_64 6.14.8-300.fc42), the PC gets stuck in the boot splash screen itself. I can’t reach the login screen.

In this situation, how can I solve the issue with the latest kernel?
and also sorry for using the word “Botched” :blush:

This is not my image, just to show the situation.
boot-splash-screen

What type of GPU are you using?
If it is nvidia and you are using the nvidia drivers from rpmfusion then it usually is mandatory that the user wait several minutes for the new drivers to be compiled and installed before the system can properly complete a boot.

If that is the case then the command sudo akmods --rebuild --force --kernels 6.14.8-300.fc42.x86_64 should properly rebuild that module and allow the system to boot properly with that kernel

1 Like

This is a more useful description of your issue.

You can use a working kernel to see if there are entries from the boot that is stuck in the splash screen.
If not, you may be able to see error messages by pressing the Esc key while booting the problem kernel. If that doesn’t show the issue, disable the graphical display so you can see text status messages: press e at the grub2 menu, delete rhgb quiet from the kernel command line, and add a space followed by nomodeset at the end of that line. Then boot using one of the methods displayed on the editor screen.

1 Like

Thank you!
I am using nvidia drivers from rpmfusion.
let me try your method.

Yeah, I just wrote the word “Botched” in the flow of writing instead of giving a useful description!
I will take care of it in future.

I will update you soon

You can press Esc to see messages.
What are the last few messages?

Every service shows as OK, OK, OK, OK in green.

Saying they are ok does not help us to help you.

Please share what the last few messages are; that will tell us how far the boot process has got. From that we may be able to guess what may be wrong.

I ran the command below and the nvidia driver was built successfully.

Then I did this: the PC got stuck on some service called “Plymouth” for more than 2-3 minutes. After that, the system booted successfully and was running fine on the latest kernel (kernel.x86_64 6.14.8-300.fc42).

The systemd-analyze command shows this -

systemd-analyze

Startup finished in 12.023s (firmware) + 2min 35.545s (loader) + 1.854s (kernel) + 7.584s (initrd) + 4min 35.675s (userspace) = 7min 32.684s 
graphical.target reached after 4min 35.650s in userspace.

Is there a way to solve this?

UPDATE - I rebooted my PC again and the time reduced drastically!
systemd-analyze

Startup finished in 7.932s (firmware) + 11.046s (loader) + 1.850s (kernel) + 5.573s (initrd) + 40.398s (userspace) = 1min 6.801s 
graphical.target reached after 40.376s in userspace.
2 Likes

It sounds like it actually was caused by the nvidia driver issue which often occurs when a user reboots too quickly after an update.

2 Likes

Yeah, do I need to force rebuild the NVIDIA driver every time a new kernel update comes in? Is there any way to automate this process?

Now how can I remove the extra boot entries on the GRUB menu and put the latest kernel by default?

It is automated, but unless (as I always do) the rhgb quiet has been removed from the kernel command line, you don’t see any indication that the Nvidia driver is being built. I think the rhgb quiet is for cubicle farms where IT manages updates and users would be confused by the text messages.
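
One way to remove rhgb quiet for good, sketched with grubby under the assumption that your boot entries are managed by it. The function names are hypothetical; `--update-kernel`, `--remove-args` and `--args` are grubby options. Whether kernels installed later inherit the change can depend on the setup, so it is worth checking again after the next kernel update.

```shell
# Hypothetical helper: drop "rhgb quiet" from the kernel command line so
# boot messages stay visible.
# $1 is a kernel path, or ALL (the default) for every installed kernel.
show_boot_messages() {
    sudo grubby --update-kernel="${1:-ALL}" --remove-args="rhgb quiet"
}

# Hypothetical helper: put the arguments back to restore the splash screen.
hide_boot_messages() {
    sudo grubby --update-kernel="${1:-ALL}" --args="rhgb quiet"
}

# Usage (not run automatically):
#   show_boot_messages
#   show_boot_messages /boot/vmlinuz-6.14.8-300.fc42.x86_64
```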

Fedora is configured to keep 2 older kernels and the “rescue” kernel (by default dnf keeps up to 3 installed kernels, and the GRUB menu lists what is installed). The older kernels are useful when the newest kernel fails to boot. Rescue kernels are bigger because they include drivers for the full range of supported hardware. That can be important if you suffer a hardware failure and need to boot the old system on newer hardware. Mass storage space is cheap, but you can’t buy time, so it usually makes sense to spend space on things that may save time in the event of problems.

1 Like

You can first check the file /var/cache/akmods/*/.last.log. It would help us to determine why or if building the akmod module failed.

But force rebuilding the akmod modules should fix it.

2 Likes

Thank you to everyone who tried to help me! I have finally solved my issue and learned a lot!

Special thanks to @gnwiii and @computersavvy because their posts helped a lot to solve the issue. Sorry if I didn’t mention your name; you are also great! :smiling_face_with_three_hearts:

2 Likes

No, the rebuild is automatic, but does require a delay (as much as 5 to 10 minutes for some systems) between the completion of the update and the next reboot so the driver rebuild has time to successfully complete.

1 Like

The RPMFusion docs show how you can check that the rebuild is complete:

Once the module is built, modinfo -F version nvidia should output the version of the driver such as 440.64 and not modinfo: ERROR: Module nvidia not found.
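
As a rough sketch, that check can be turned into a wait loop to run after an update and before rebooting. The function names are hypothetical. It also assumes `modinfo -k <release>` is used to look at the newly installed kernel, because a plain modinfo only checks the kernel that is currently running.

```shell
# Hypothetical helper: succeed once the nvidia module exists for a kernel.
# $1 is a kernel release; defaults to the running kernel.
nvidia_module_ready() {
    modinfo -k "${1:-$(uname -r)}" -F version nvidia >/dev/null 2>&1
}

# Hypothetical helper: poll until the module is built or time runs out.
# $1 kernel release, $2 attempts (default 60), $3 seconds between
# attempts (default 10), i.e. up to about 10 minutes by default.
wait_for_nvidia_module() {
    local release="$1" tries="${2:-60}" pause="${3:-10}"
    while [ "$tries" -gt 0 ]; do
        if nvidia_module_ready "$release"; then
            echo "nvidia module ready for $release"
            return 0
        fi
        tries=$((tries - 1))
        sleep "$pause"
    done
    echo "nvidia module still missing for $release" >&2
    return 1
}

# Usage (not run automatically):
#   wait_for_nvidia_module 6.14.8-300.fc42.x86_64 && systemctl reboot
```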

1 Like

Also check the file /var/cache/akmods/nvidia/.last.log. It would show if the modules are built and also if the module was installed.
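
A small sketch for scanning that log, assuming failures show up as lines containing "error" or "fail". That wording is a guess about the log format, and the function name is hypothetical, so read the whole file if the result looks suspicious.

```shell
# Hypothetical helper: list lines that look like failures in an akmods log.
# $1 is the log path; defaults to the nvidia log named in this thread.
akmods_log_problems() {
    local log="${1:-/var/cache/akmods/nvidia/.last.log}"
    if [ ! -r "$log" ]; then
        echo "cannot read $log (try running as root)" >&2
        return 2
    fi
    # -i ignore case, -n show line numbers, -E extended regex
    grep -inE 'error|fail' "$log" || echo "no error or failure lines in $log"
}

# Usage (not run automatically):
#   akmods_log_problems
```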

1 Like