The system randomly freezes and/or restarts without any warning

Hi!

Context:

The title says it all. The system freezes completely: audio cuts out, the mouse cursor does not move, and even the Caps Lock LED does not light up when the key is pressed. I’m unable to switch to a TTY, and the only way to get it working again is to reboot the entire machine. It doesn’t matter whether I’m running the system from live media or just in a TTY; it freezes within 30 minutes of boot either way.

I’ve been daily driving this PC with Linux (mainly Kubuntu) for roughly 2 years now, and it has been a huge PITA. I’ve never managed to fully fix all the issues the system had. About 4 weeks ago, I started to observe this freezing and rebooting behavior. It was quite rare at first, but about 2 weeks ago it started occurring regularly.

This is when I decided to switch from Kubuntu to Mint, but the issue was not resolved at all. In fact, after a few days, it got so bad that I decided to try a completely different flavor of Linux, and that’s how I ended up with Nobara. I’ve also installed Windows 10 on a second drive (just to see if it works and rule out potential hardware damage), and I’m currently dualbooting it alongside Nobara.

However, even the installation of Nobara was crashing, so I had to install it through the troubleshooting menu, using some kind of “simplified mode”. Windows, on the other hand, was installed without any problems. So I kept testing the stability of both systems for 2 weeks, and Windows is running well enough, while Nobara is not running at all (reboots or crashes after a few minutes).

Unfortunately, the logs are useless. After every crash (starting from the first one), I looked through the logs, but the entries all seemed unrelated, and the most recent ones were always from 2–5 minutes before the crash, so they weren’t much help.

Technical details of the device:

  • Laptop: ASUS ROG Strix G15 (2022)
  • CPU: AMD Ryzen 9 5980HX
  • GPU: AMD Radeon RX 6800M + integrated GPU
  • RAM: 32 GB
  • OS: fresh install of Nobara 39
  • Kernel: 6.7
  • DE: vanilla KDE

Additional observations:

  • Crashes do not seem to be connected to the current load on the system.
  • The temperature seems to be within the normal range (40–80 °C), so I don’t think it’s related.
  • I’ve got multiple monitors plugged in; removing them does not solve the issue.
  • At the time of writing this post, I’m no longer able to log into the system using the GUI or a TTY. After typing in the password, the screen goes black for a second or two and then goes back to the login prompt. The TTY is no help either, as it freezes a few seconds after logging in.
  • Recovery mode does not work.

What I’ve tried:

  • updating all of the drivers
  • updating BIOS
  • running memtests and system checks (no issues were found)
  • disabling swap
  • reinstalling the OS dozens of times
  • I’ve done some googling and found someone with a similar problem:
    “Ubuntu 20.04.3 LTS very often freezes randomly and does not recover without force shutdown” (Ask Ubuntu). Unfortunately, the solutions posted there weren’t very helpful. One of them suggested altering the CPU configuration in the BIOS. I tried that too, but my BIOS does not allow for any form of such configuration.

Logs:

  • I wanted to provide at least the journalctl output, but the TTY freezes before I’m able to type in the command… For now, no logs, but I’ll try to get them.

If you have ever encountered such a problem, please let me know how you resolved it.

Edit: I managed to log in using a Wayland session and gathered some logs:

journalctl -b --priority=3

may 19 21:13:53 smc kernel: ucsi_acpi USBC000:00: unknown error 5128
may 19 21:13:53 smc kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5)
may 19 21:13:53 smc kernel: ucsi_acpi USBC000:00: unknown error 4104
may 19 21:13:53 smc kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5)
may 19 21:13:54 smc kernel: ucsi_acpi USBC000:00: unknown error 4104
may 19 21:13:54 smc kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5)
may 19 21:14:09 smc sddm-helper[2536]: gkr-pam: unable to locate daemon control file
may 19 21:14:10 smc sddm-helper-start-x11user[2589]: Failed to read display number from pipe
may 19 21:14:23 smc sddm-helper[3039]: gkr-pam: unable to locate daemon control file
may 19 21:14:25 smc systemd-coredump[3442]: Process 3436 (smartctl) of user 0 dumped core.
                                            
                                            Module libpcre2-8.so.0 from rpm pcre2-10.42-1.fc39.2.x86_64
                                            Module libselinux.so.1 from rpm libselinux-3.5-5.fc39.x86_64
                                            Stack trace of thread 3436:
                                            #0  0x00007f33b03a9834 __pthread_kill_implementation (libc.so.6 + 0x90834)
                                            #1  0x00007f33b03578ee raise (libc.so.6 + 0x3e8ee)
                                            #2  0x00007f33b033f8ff abort (libc.so.6 + 0x268ff)
                                            #3  0x00007f33b03407d0 __libc_message.cold (libc.so.6 + 0x277d0)
                                            #4  0x00007f33b03b37a5 malloc_printerr (libc.so.6 + 0x9a7a5)
                                            #5  0x00007f33b03b5a9c _int_free (libc.so.6 + 0x9ca9c)
                                            #6  0x00007f33b03b83de free (libc.so.6 + 0x9f3de)
                                            #7  0x00005594584120a1 _ZN14drive_databaseD1Ev (smartctl + 0x640a1)
                                            #8  0x00007f33b0359fd6 __run_exit_handlers (libc.so.6 + 0x40fd6)
                                            #9  0x00007f33b035a11e exit (libc.so.6 + 0x4111e)
                                            #10 0x00007f33b0341151 __libc_start_call_main (libc.so.6 + 0x28151)
                                            #11 0x00007f33b034120b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2820b)
                                            #12 0x00005594583debc5 _start (smartctl + 0x30bc5)
                                            ELF object binary architecture: AMD x86-64
may 19 21:14:25 smc systemd-coredump[3441]: Process 3437 (smartctl) of user 0 dumped core.
                                            
                                            Module libpcre2-8.so.0 from rpm pcre2-10.42-1.fc39.2.x86_64
                                            Module libselinux.so.1 from rpm libselinux-3.5-5.fc39.x86_64
                                            Stack trace of thread 3437:
                                            #0  0x00007fdd2568a834 __pthread_kill_implementation (libc.so.6 + 0x90834)
                                            #1  0x00007fdd256388ee raise (libc.so.6 + 0x3e8ee)
                                            #2  0x00007fdd256208ff abort (libc.so.6 + 0x268ff)
                                            #3  0x00007fdd256217d0 __libc_message.cold (libc.so.6 + 0x277d0)
                                            #4  0x00007fdd256947a5 malloc_printerr (libc.so.6 + 0x9a7a5)
                                            #5  0x00007fdd25696a9c _int_free (libc.so.6 + 0x9ca9c)
                                            #6  0x00007fdd256993de free (libc.so.6 + 0x9f3de)
                                            #7  0x000056048f9e10a1 _ZN14drive_databaseD1Ev (smartctl + 0x640a1)
                                            #8  0x00007fdd2563afd6 __run_exit_handlers (libc.so.6 + 0x40fd6)
                                            #9  0x00007fdd2563b11e exit (libc.so.6 + 0x4111e)
                                            #10 0x00007fdd25622151 __libc_start_call_main (libc.so.6 + 0x28151)
                                            #11 0x00007fdd2562220b __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x2820b)
                                            #12 0x000056048f9adbc5 _start (smartctl + 0x30bc5)
                                            ELF object binary architecture: AMD x86-64
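
To see at a glance which sources produce the errors in output like the above, the entries can be tallied per source. This is only a minimal sketch, assuming the default short journalctl format (month, day, time, host, then the source with an optional PID); the function name tally_sources is hypothetical and not part of any tool.

```shell
#!/bin/sh
# tally_sources: hypothetical helper, counts journal entries per source.
# Assumes the default "short" journalctl line format:
#   <month> <day> <time> <host> <source>[<pid>]: <message>
# Indented continuation lines (such as stack traces) are skipped.
tally_sources() {
    awk '
        /^[^ \t]/ && NF >= 5 {
            src = $5
            sub(/:$/, "", src)            # drop the trailing colon
            sub(/\[[0-9]+\]$/, "", src)   # drop the [pid] suffix, if any
            count[src]++
        }
        END { for (s in count) printf "%d %s\n", count[s], s }
    ' | sort -k1,1nr -k2,2
}
```

It would be used as: journalctl -b --priority=3 --no-pager | tally_sources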

I think I fixed it: AMD Ryzen "Freezing" Bug on GNU/Linux Systems · GitHub
