Can't boot after upgrading kernel to 6.2.15-300

With Fedora 38, after upgrading kernel from 6.2.14 to 6.2.15, on power-on I get the disk encryption unlock prompt as usual, but it doesn’t recognize the password. After 3 attempts I get the usual splash screen spinning but nothing happens. I force restart with ctrl+alt+del which brings me to the grub menu, where I can choose the older kernel and it works fine.

I tried to view a log using journalctl -b -1 | grep 6.2.15 but it shows nothing.

In the meantime I managed to configure grub to load 6.2.14 by default. My question is - what should I do next? Should I just wait for the next kernel update and hope it solves it? How do I prevent the working kernel from being deleted in a future update until things are fixed?


On the one hand, the kernel does not necessarily log its version number. Off the top of my head, I know this only from kernel bugs in which the kernel identifies itself. Otherwise, kernel errors are logged under “kernel” but without the version number. That is not necessary because there is always only one kernel running. So if boot -1 was 6.2.15, you can grep for kernel to get all errors of the 6.2.15 kernel from that boot.
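To double-check which kernel a given boot actually ran, one option is to look for the “Linux version ...” banner that the kernel usually prints as its first message. The following is only a sketch: the helper name extract_kernel_release is made up for illustration, and it assumes the banner made it into the journal for that boot.

```shell
# Hypothetical helper: read journal lines on stdin and print the kernel
# release taken from the first "Linux version <release> ..." banner line.
extract_kernel_release() {
    sed -n 's/.*Linux version \([^ ]*\).*/\1/p' | head -n 1
}

# On the affected machine (kernel messages of the previous boot only):
#   journalctl -k -b -1 --no-pager | extract_kernel_release
# To see which boots the journal knows about at all:
#   journalctl --list-boots
```

If this prints 6.2.14 for boot -1, the failed 6.2.15 attempt never reached the journal, and grepping that boot for errors will only show the working kernel.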

However, on the other hand, if I understood you correctly, your disk encryption has to be unlocked immediately after the grub menu, which means before the operating system actually boots. If the root partition itself is encrypted, the errors from that moment cannot be logged by journalctl because the logs are stored in /var/ (even if /var/ is a separate partition, the information about which partition is /var/ is stored on the root partition).


I do not have much time at the moment, so I cannot say whether I can keep helping you here until the issue is solved (if anyone has ideas and time, feel free to take over). But the first thing that comes to mind is that, for some reason, the keyboard has been reset to US. I am assuming here that you might use a different keyboard layout. If so, try to enter your password as if you were typing on a US keyboard. E.g., what is § on some European keyboards (such as the German one) is # on a US keyboard. You can find the key positions on the Internet (maybe this is helpful: QWERTY - Wikipedia )

Besides that, once you have chosen the 6.2.15 kernel (try 6.2.15 again for this test), do nothing more. Just wait for the splash password prompt to appear. Then, press ESC to see all information from the terminal. You can now try your password, but if it does not work, you will get more information on that screen and then a new prompt to try again.

This will look like …

[ ok ] some information
[ ok ] some information
Please enter password for 219e9c5c-d189-854f-8c76-d73a53987043:
[error] some error
Please enter password for 219e9c5c-d189-854f-8c76-d73a53987043:

I have written the above off the cuff, so do not take it literally. It is just meant to give you an impression of the type of screen we are looking for.

If you got the related screen and if the actual password does not work, provide a screenshot (maybe with a camera or so) of that screen. Relevant is what is output before the first password prompt appears but also what is output after the failed password attempt and before it is asked for a second try. Maybe that information offers some indication for the next supporter.

Also, if that is ok, feel free to add the logs of journalctl --boot=-1. If I assumed your disk encryption correctly (in terms of the root partition being encrypted), it is likely that this will only output boots of 6.2.14. However, they could still be indicative. Feel free to anonymize data you consider private (e.g., MAC addresses). IMPORTANT: Ideally, you only provide a link to the log file. If you need to paste the logs here for some reason, always format them as preformatted text (select all the log text and then click the </> button)

Keeping old kernels too long should be avoided. At best, it buys some time for fixing the problem and/or getting past a buggy kernel, but no more than that. Let’s focus on fixing the issue :slight_smile: However, what you describe sounds worth a bug report if the mentioned logs/information do not indicate something else and if the issue remains in the 6.3.X kernels.

However, if you know FOR SURE that you have no XFS file system, you can check whether 6.3.3 works for you: https://bodhi.fedoraproject.org/updates/FEDORA-2023-514965dd8a - but please try that only if you are 100% sure that you have no XFS file system. If no XFS file system is involved, the 6.3.3 kernel can be considered stable (it has not been pushed to stable only because of an XFS issue that has not yet been investigated).

Thanks a lot for the reply!

I’m using the US layout also on working boot, so I’m not sure that is the problem…
Is there a way for me to make the terminal show what I type instead of asterisks?

This is what the failed boot looks like:

According to the timestamps this only gives the log of the working kernel and didn’t log the failed boot at all…

Well I don’t know what’s an XFS file system, so can’t tell either way :sweat_smile:

Given your screenshot, we can already exclude my keyboard-layout assumption.

In that case, we want to be safe and assume there could be an XFS file system on your system :wink:

The idea was to see whether the working kernel also logs some errors that might indicate why the later kernel fails completely. That was relevant in case your screenshot did not contain useful information. However, your screenshot does contain relevant information that indicates the next steps.

If 6.2.14 always works and 6.2.15 always creates this issue while you make no further changes except your menu choice in grub (this is how the situation is, correct?), then I expect there is an issue in the kernel that conflicts with cryptsetup, which is responsible for decrypting your root partition and for providing a mapped device that can be mounted as the root partition. Obviously, this stage does not work, and so the kernel cannot proceed.

First, let’s see if cryptsetup works properly on 6.2.14: please boot your system with the working kernel and provide the output of both
sudo systemctl status cryptsetu*
and
sudo systemctl status systemd-cryptsetup@luks\\x2d0ce2888*

If all is fine on 6.2.14 with cryptsetup, I suggest waiting for 6.3.4 to see if it solves the issue. Otherwise, it will be necessary to dig deeper into the issue and file a bug.

6.3.4 is already on its way. The first tests already indicated that the xfs issue was solved, and there is hope that the new kernel will end up in your daily updates in the coming week :wink: But I suggest to wait until it is stable and not install it in advance to be sure the issue is really gone (in case you thought about that).

You can see the status of the new kernel’s testing here: https://bodhi.fedoraproject.org/updates/?packages=kernel → once the update “kernel-6.3.4-201.fc38, kernel-headers-6.3.3-200.fc38, & 1 more” is marked as “stable”, you will find it in your next dnf update (if you then don’t want to wait for the next auto-refresh, run dnf update --refresh once :wink: )

The kernel is currently at “pending → testing”. On its way to stable, it will pass through “testing” and “testing → stable” before finally becoming “stable” (unless we experience issues).

Let’s hope the issue is only occurring in 6.2.15.

Here are the outputs:

$ sudo systemctl status cryptsetu*
● cryptsetup.target - Local Encrypted Volumes
     Loaded: loaded (/usr/lib/systemd/system/cryptsetup.target; static)
     Active: active since Sun 2023-05-28 11:20:08 IDT; 11h ago
       Docs: man:systemd.special(7)

$ sudo systemctl status systemd-cryptsetup@luks\\x2d0ce2888*
● systemd-cryptsetup@luks\x2d0ce2888c\x2d293c\x2d4e31\x2da567\x2d5af7a0889e9a.service>
     Loaded: loaded (/etc/crypttab; generated)
    Drop-In: /usr/lib/systemd/system/service.d
             └─10-timeout-abort.conf
     Active: active (exited) since Sun 2023-05-28 11:20:07 IDT; 11h ago
       Docs: man:crypttab(5)
             man:systemd-cryptsetup-generator(8)
             man:systemd-cryptsetup@.service(8)
   Main PID: 540 (code=exited, status=0/SUCCESS)
        CPU: 7.318s

May 28 11:19:18 ur-latitude systemd[1]: Starting systemd-cryptsetup@luks\x2d0ce2888c\>
May 28 11:20:05 ur-latitude systemd-cryptsetup[540]: Set cipher aes, mode xts-plain64>
May 28 11:20:07 ur-latitude systemd[1]: Finished systemd-cryptsetup@luks\x2d0ce2888c\>

That looks good. cryptsetup seems to work as intended with 6.2.14 on
your system.

The issue seems to be only in the kernel. So I suggest to wait for the
next kernel update and remain with 6.2.14 until then.

If it works for you, 6.2.14 is ok for some more days as it does not
contain time-critical issues that need immediate replacement or so (of
course this is only an interim solution).

However, once your updates bring a new kernel, try booting it so that
you can get rid of the obsolete kernel.

If the problem persists with the next kernel, let us know here.

If the problem is then solved, it would be nice if you let us know that
as well so that we know that (and how) this issue was solved.

There has been talk recently about some keyboards being treated as US/ENGLISH during the entry of the luks password, which seems to interfere with the expected password.

This is one example

I don’t know if it applies here, but it seems possible.

I think it is a different one. The related errors (at least those I saw)
always lead the system to go back to the default US layout (I saw that
also in sddm recently but with another cause). But this user works by
default with the US layout. Also, the error has not been introduced by a
setup but by a kernel and can be bypassed the same way. This means the
origin can only be the kernel itself or the generation of its related
files in /boot. In both cases, evaluation can take a lot of time, which
is worth it only if the problem persists in future kernels that are
already in testing (any bug report would initially be answered with
the question of whether it persists in 6.3.X anyway).

However, Jeff gave me an idea: in case there was something wrong
in the generation of the files, you might try to re-install 6.2.15 with
sudo dnf reinstall kerne*. I don’t think it will solve the issue but
as it is quick and easy, it could be worth a try. However, be aware that
this can reset the changes you made to your grub boot menu manually in
order to boot 6.2.14 by default. So you would need to make 6.2.14 the
default again manually if 6.2.15 still does not work. In any case, it
would also be ok to just wait for 6.3.X and see if that works.
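If the reinstall does reset the default entry, one way to point grub back at 6.2.14 is grubby. This is only a sketch under assumptions: grubby is installed, the kernel image follows the usual /boot/vmlinuz-<release> naming, and the release string below (including the x86_64 suffix) is a guess at what is installed on this machine. The helper kernel_image_path is a made-up name for illustration.

```shell
# Hypothetical helper: build the kernel image path that grubby expects
# from a kernel release string such as the output of `uname -r`.
kernel_image_path() {
    printf '/boot/vmlinuz-%s\n' "$1"
}

# On the affected machine (requires root; not run as part of this sketch):
#   sudo grubby --default-kernel        # show the current default
#   sudo grubby --set-default "$(kernel_image_path 6.2.14-300.fc38.x86_64)"
```

Running sudo grubby --default-kernel again afterwards shows whether the change took effect.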

Thanks a lot! I don’t mind waiting :slightly_smiling_face:

I just want to know - how can I make sure that the upgrade to the next kernel won’t automatically delete my working kernel until I verify the problem is solved? As I understand it, old kernels are deleted automatically beyond a certain number.

The system will keep your recent 3 kernels. This means after the next update, you will still have 6.2.14, while it will remove 6.2.13 (at the moment, you still have 6.2.13, correct?).

If 6.3.X solves the issue and if you still have 6.2.13, there is no need to adjust.

If the issue persists, you can remove a kernel that doesn’t work each time BEFORE you update. Since you will then have only 2 kernels installed during the update, it will not remove another one.

You can see the installed kernels and their versions with sudo dnf list installed kernel, and all kernel packages with sudo dnf list installed kernel* . Pick one you do not need (e.g., 6.3.3) and remove it with, e.g., sudo dnf remove kernel-core-6.3.3-200.fc38 kernel-tools-6.3.3-200.fc38. Removing kernel-core ensures that all related packages are removed as well, except kernel-tools, which you have to remove separately IF you have installed it. That is not always the case, so don’t be surprised if the kernel-tools package is not found. Be aware that you cannot remove the kernel that you are currently running; you can check which one that is with uname -r.
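To avoid picking the running kernel by accident, the list of installed kernel-core packages can be filtered against uname -r first. A minimal sketch, assuming rpm -q kernel-core prints one kernel-core-<release> line per installed kernel; the helper name removable_kernels is made up for illustration.

```shell
# Hypothetical helper: read installed kernel-core package names on stdin
# and print every one except the package of the running kernel.
# $1 is the running kernel release, e.g. the output of `uname -r`.
removable_kernels() {
    running="$1"
    # -F: fixed string, -x: match the whole line, -v: invert the match
    grep -v -x -F "kernel-core-${running}"
}

# On the affected machine:
#   rpm -q kernel-core | removable_kernels "$(uname -r)"
# Then remove a chosen package explicitly, for example:
#   sudo dnf remove kernel-core-<version you picked from the list>
```

The sketch only lists candidates; which one to remove is still a manual decision.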

Alternatively, you can just change the option installonly_limit=3 in /etc/dnf/dnf.conf by increasing the number to as many kernels as you want to keep (be aware that one kernel and the files that come with it can take up to 200MB). Once the issue is solved, you can change the number back to 3.
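The change to installonly_limit can also be scripted. The following is a rough sketch, assuming the file already contains an installonly_limit= line (as a default /etc/dnf/dnf.conf does); the function name set_installonly_limit is made up for illustration, and editing the file by hand works just as well.

```shell
# Hypothetical helper: set installonly_limit in a dnf.conf-style file.
# $1 = path to the config file, $2 = number of kernels to keep.
# Only rewrites an existing installonly_limit= line; it does not add one.
set_installonly_limit() {
    conf="$1"
    limit="$2"
    tmp="$(mktemp)" || return 1
    if sed "s/^installonly_limit=.*/installonly_limit=${limit}/" "$conf" > "$tmp"; then
        # Write back through the original path to keep ownership and permissions.
        cat "$tmp" > "$conf"
    fi
    rm -f "$tmp"
}

# On the affected machine this has to run as root, for example from a root shell:
#   set_installonly_limit /etc/dnf/dnf.conf 4
# Verify with:
#   grep installonly_limit /etc/dnf/dnf.conf
```

Setting the value back to 3 later works the same way.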

Thanks for the detailed response! I currently have installed

6.2.14-200.fc37                         
6.2.14-300.fc38     
6.2.15-300.fc38   

so I guess I can either remove the non-working 6.2.15 or the older 6.2.14-200.fc37.
But I think I’ll just do what you suggested last and save 4 kernels until the next update, and then choose what to remove.

That is indeed an interesting situation. I assume that dnf will remove only the fc37 kernel since it is older than fc38. But since both are 6.2.14, I cannot guarantee this unless someone tested it explicitly (I never thought about it after upgrading tbh).

So I do suggest increasing the number of kernels that dnf keeps, or removing one kernel before the next update. Just to be sure.

Done :slightly_smiling_face: thanks a lot and I’ll update when the next kernel arrives.

So, this is a bit embarrassing, but turns out it IS a layout issue!
I recently configured the fedora login screen to use my alternate layout (workman), but the encryption password under kernel 6.2.14 wasn’t affected and stayed with QWERTY, so I assumed it reads some other configuration file and left it.
Now after the kernel upgrade, the encryption password also switched to my workman layout, even though it is not indicated in any way (it is still a US layout…).
I could swear I checked this before writing here, but perhaps I mistyped a key or something :person_facepalming:
So sorry for the hassle and thanks, all’s well now!
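For anyone landing here with the same symptom: one possible explanation, not confirmed anywhere in this thread, is that the layout used at the encryption prompt comes from the KEYMAP setting in /etc/vconsole.conf, which gets copied into the initramfs when a new kernel is installed. Under that assumption, the older initramfs would still carry the old layout and the newer one the changed layout. A small sketch to check the configured value (the helper name vconsole_keymap is made up for illustration):

```shell
# Hypothetical helper: print the KEYMAP value from a vconsole.conf-style file.
# $1 = path to the file; surrounding double quotes are stripped.
vconsole_keymap() {
    sed -n 's/^KEYMAP=//p' "$1" | tr -d '"' | head -n 1
}

# On the affected machine:
#   vconsole_keymap /etc/vconsole.conf
# To compare with what a given initramfs contains (assumes dracut's lsinitrd
# is available and that the image includes etc/vconsole.conf):
#   sudo lsinitrd /boot/initramfs-$(uname -r).img -f etc/vconsole.conf
```

If the two values differ between the working and the failing kernel, typing the password in the other layout is a quick test.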