Fedora 38 kernel 6.4.4 hangs for over an hour on boot

I recently installed Fedora 38 on a Sager laptop with Intel processor and Nvidia RTX GPU. After upgrading, kernel 6.4.4 showed up in GRUB. When I select it, all I get on the screen is a single underscore. The screen is frozen that way until my computer goes to sleep. If I wake it up after an hour or so, then it’s on the login page.

How do I debug this? I’ve tried a number of suggestions from articles to get log text to show up during boot, but nothing has worked.

Did you see and follow the advice in Laptop fails to boot after update to kernel-6.4.4-200.fc38.x86_64 - #2 by kparal?

You can also read logs in detail after booting.

Once the user is able to log in, the first thing I would do is run dmesg > dmesg.boot then peruse that file. If there is a long hang as indicated, the times in the first column would show where that hang occurred.
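Rather than scanning the file by eye, the jump in the first column can be found mechanically. The following is only a rough sketch: dmesg_gaps is a made-up helper name, and it assumes the default dmesg timestamp format ([seconds.microseconds] at the start of each line), so it will not work on --reltime or -T output.

```shell
# Hypothetical helper: report places where the dmesg timestamp jumps.
# Reads dmesg-style lines on stdin.
# $1 = minimum gap in seconds worth reporting (defaults to 10).
dmesg_gaps() {
    awk -v min="${1:-10}" '
        # Only consider lines that start with a [seconds.fraction] stamp.
        match($0, /^\[ *[0-9]+\.[0-9]+\]/) {
            # Drop the brackets and force a numeric value.
            ts = substr($0, 2, RLENGTH - 2) + 0
            if (seen && ts - prev >= min) {
                printf "gap of %.1f s before: %s\n", ts - prev, $0
            }
            prev = ts
            seen = 1
        }
    '
}
```

For example, dmesg_gaps 60 < dmesg.boot would print the first line logged after each pause of a minute or more, which is usually where to start looking.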

journalctl -b > journal.boot would provide more information, but the times there are wall-clock times and not time-since-power-on as shown in dmesg, so one would need to correlate the times to match up the entries.
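One way to do that correlation is to subtract the boot time from a journal timestamp. This is a minimal sketch, assuming GNU date (the -d option); since_boot is a hypothetical helper name, and the boot time would come from something like uptime -s. The result is only approximate if the clock was adjusted during boot or the machine was suspended in between.

```shell
# Hypothetical helper: seconds between boot and a journal entry.
# $1 = boot time, $2 = journal timestamp, both in any form GNU date -d accepts.
since_boot() {
    echo $(( $(date -d "$2" +%s) - $(date -d "$1" +%s) ))
}
```

For example, since_boot "$(uptime -s)" "Jul 27 10:31:40" gives a number to compare against the first column of dmesg. Alternatively, journalctl -b -o short-monotonic prints the journal with seconds-since-boot timestamps directly, which may avoid the arithmetic altogether.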

One could also use something like systemd-analyze plot > boot.svg to obtain a graphical plot of the boot progress. That file could then be opened with a browser or similar to view timings during the boot.

There is a very nice option to dmesg that shows you the times in wall-clock time, with the delta between messages:
dmesg --reltime

example output:

[Jul23 22:25] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[Jul24 19:06] perf: interrupt took too long (3134 > 3133), lowering kernel.perf_event_max_sample_rate to 63000
[Jul27 12:13] systemd-journald[3864]: Data hash table of /var/log/journal/23861aed63d748da85011d84ee28e601/system.journal has a fill level at 75.0 (170043 of 226723 items, 67108864 file size, 394 bytes per hash table item), suggesting rotation.
[  +0.000881] systemd-journald[3864]: /var/log/journal/23861aed63d748da85011d84ee28e601/system.journal: Journal header limits reached or header out-of-date, rotating.

I have this alias in my .bashrc as I always want wall-clock timestamps

alias dmesg='dmesg --reltime'

That would be a very good option to show the times in a way to correlate the dmesg output with journalctl output. :cowboy_hat_face:

dmesg > dmesg.boot yields some interesting areas in terms of time.

  1. It looks like something here is taking 20 mins:
[    5.256699] input: ImPS/2 Generic Wheel Mouse as /devices/platform/i8042/serio1/input/input6
[    5.348735] usb 3-10: New USB device found, idVendor=8087, idProduct=0033, bcdDevice= 0.00
[    5.348779] usb 3-10: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 1269.904992] ima: No architecture policies found
[ 1269.905103] evm: Initialising EVM extended attributes:
[ 1269.905106] evm: security.selinux

  2. Another 15 mins later the laptop goes to sleep. Once everything is asleep, this error is thrown repeatedly:
[ 2687.574437] tpm tpm0: tpm_try_transmit: send(): error -62

And what about the journalctl output?

Here’s the journal corresponding to 1269 in dmesg. Nothing shows up before that.

Jul 27 10:31:40 anubis systemd-journald[361]: Received SIGTERM from PID 1 (n/a).
Jul 27 10:31:40 anubis kernel: audit: type=1404 audit(1690453900.045:2): enforcing=1 old_enforcing=0 auid=4294967295 ses=4294967295 enabled=1 old-enabled=1 lsm=selinux res=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability network_peer_controls=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability open_perms=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability extended_socket_class=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability always_check_network=0
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability cgroup_seclabel=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability nnp_nosuid_transition=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability genfs_seclabel_symlinks=1
Jul 27 10:31:40 anubis kernel: SELinux:  policy capability ioctl_skip_cloexec=0
Jul 27 10:31:40 anubis kernel: audit: type=1403 audit(1690453900.091:3): auid=4294967295 ses=4294967295 lsm=selinux res=1
Jul 27 10:31:40 anubis systemd[1]: Successfully loaded SELinux policy in 46.726ms.
Jul 27 10:31:40 anubis systemd[1]: RTC configured in localtime, applying delta of -360 minutes to system time.
Jul 27 10:31:40 anubis systemd[1]: Relabelled /dev, /dev/shm, /run, /sys/fs/cgroup in 20.552ms.
Jul 27 10:31:40 anubis systemd[1]: systemd 253.7-1.fc38 running in system mode (+PAM +AUDIT +SELINUX -APPARMOR +IMA +SMACK +SECCOMP -GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD +BPF_FRAMEWORK +XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
Jul 27 10:31:40 anubis systemd[1]: Detected architecture x86-64.
Jul 27 10:31:40 anubis systemd[1]: Hostname set to <anubis>.
Jul 27 10:31:40 anubis systemd[1]: bpf-lsm: LSM BPF program attached
Jul 27 10:31:40 anubis (sd-execu[657]: /usr/lib/systemd/system-generators/ostree-system-generator failed with exit status 1.
Jul 27 10:31:40 anubis kernel: zram: Added device: zram0
Jul 27 10:31:40 anubis systemd[1]: initrd-switch-root.service: Deactivated successfully.

Jeff asked you to do three things to help debug this.
Now do the third please!

Your journal extract is too small; I would need to see a lot more from before the system got going again. That third command will help.

The Linux Hardware Database entry for USB Device 8087:0033 says the device is supported “in-kernel” and has 1440 scans. A quick check for Fedora 38 entries shows many with “detected” status, but none with “works” status.

Are you using ultrablue?

I’m not using ultrablue, no. Just generic fedora 38 workstation

The third thing didn’t work (svg won’t open). Maybe I’ve had the PC on too long. I’ll try again next time I reboot (which I’ve been avoiding bc it takes so long).

Linux doesn’t fall apart when running for long periods. Do you get an error popup? What do you get if you run inkscape <svg_filename>? Use journalctl to look for errors related to the svg problem.

I didn’t say linux falls apart. I was suggesting that it didn’t load because the system has been on for a long time so the image is probably enormous. I opened it in Inkscape and it’s like 100,000 pixels across, so I’m going to try again after reboot.

I use chrome, and yes, the svg is wide, but chrome allows panning across the image as well as up and down.
If you were to post that svg image then we might also be able to view it and make suggestions.

You should be able to kill the process without rebooting. There are many linux tools that will show you when a process is using excessive resources (CPU time, RAM). You should try to provide enough detail to allow others to reproduce a problem.