This machine ran fine for months. Now I’m getting complete hard hangs requiring power-button reset. Display freezes, no keyboard, no SSH, no magic SysRq. Happens during routine desktop activity (browsing, terminal, container builds — no obvious common trigger). No video playback or gaming.
Journal ends abruptly mid-stream. No kernel panic, oops, MCE, OOM, thermal event, soft-lockup warning, GPU reset attempt, or pstore artifact.
kdump was set up with reserved crashkernel; /var/crash empty after each hang — the lockup appears to block the kexec path.
No amdgpu warnings or ring timeouts in the seconds leading up to the freeze. Last messages are unrelated userspace (e.g., a Signal notification, a podman push).
Timeline
2026-04-28 — dnf transaction 98: kernel 6.19.11 → 6.19.14, plus ~110 packages including mesa and linux-firmware bumps.
2026-05-08 — first boot into the new stack.
Following days — first hang after ~3 days, then ~1 day.
2026-05-11 — tried kernel 7.0.4 hoping newer driver would help RDNA 4. Hung twice within a minute. Reverted.
2026-05-12 — switched to 6.19.13. Hung after 2 hours. Currently booted again with netconsole running.
Anyone else run into this? Any ideas on how to troubleshoot?
I ran memtest overnight for 13 hours and it passed. It’s a custom-built desktop from a few years ago, so I don’t think I have any vendor software to run.
I will run some stress tests tonight to test GPU and CPU. There was a BIOS update I did about a month ago (it was released last September).
amdgpu firmware was bumped to 20260410-1.fc43 on 2026-04-23 and first hang was two weeks later.
Thanks for your help. I’m just kind of at a loss without a smoking gun.
Please start a new thread. Your hardware appears to be different, which complicates efforts to understand issues. Please provide hardware details (posting the output from running inxi -Fzxx in a terminal as pre-formatted web-discoverable text is often effective at reaching others with similar hardware who can provide a solution).
Yes. Before retiring, I worked with colleagues at large institutions. IT groups often had collections of misbehaving systems set aside for troubleshooting as time allowed. That allowed them to swap power supplies, cables, mass storage devices, and system boards to see if the problem moves to the 2nd machine.
If you can log in from another system with ssh, you can run journalctl --follow there in the hope that some error messages appear before the problem system fully crashes. I often use Termius on an iPad for ssh access to Linux boxes.
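To keep whatever scrolls past after the box freezes, it can help to write each line to a file on the healthy machine as it arrives, with a local timestamp. This is only a rough sketch, assuming a POSIX-style shell on the machine you log in from and an account on the problem system that is allowed to read the system journal. The function name `stamp_lines`, the host `user@problem-host`, and the file name `hang-watch.log` are placeholders I made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read lines from stdin, prefix each one with the
# local wall-clock time, and append it to the given log file right away.
stamp_lines() {
    local logfile="$1"
    local line
    # The "|| [ -n ... ]" part keeps a final line that has no trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%F %T')" "$line" >> "$logfile"
    done
}

# Usage, run on the healthy machine (host and file name are placeholders):
#   ssh user@problem-host journalctl --follow --output=short-monotonic \
#       | stamp_lines hang-watch.log
```

When the ssh session dies, the last line in the file and its local timestamp show roughly when the stream stopped and what was logged just before. It may well show nothing new if the hang is instant, but it costs little to leave running.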
You can also try just removing all but the very minimal optional hardware, even down to a “headless” configuration accessed by ssh.
% sudo dnf5 info rasdaemon
[sudo: authenticate] Password:
Updating and loading repositories:
Repositories loaded.
Available packages
Name : rasdaemon
Epoch : 0
Version : 0.8.0
Release : 9.fc44
Architecture : x86_64
Download size : 89.6 KiB
Installed size : 267.8 KiB
Source : rasdaemon-0.8.0-9.fc44.src.rpm
Repository : fedora
Summary : Utility to receive RAS error tracings
URL : http://git.infradead.org/users/mchehab/rasdaemon.git
License : GPL-2.0-only
Description : rasdaemon is a RAS (Reliability, Availability and Serviceability) logging tool.
: It currently records memory errors, using the EDAC tracing events.
: EDAC is drivers in the Linux kernel that handle detection of ECC errors
: from memory controllers for most chipsets on i386 and x86_64 architectures.
: EDAC drivers for other architectures like arm also exists.
: This userspace component consists of an init script which makes sure
: EDAC drivers and DIMM labels are loaded at system startup, as well as
: an utility for reporting current error counts from the EDAC sysfs files.
Vendor : Fedora Project
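If you want to try it, the usual sequence looks roughly like the following. Treat it as a sketch: it assumes the Fedora package provides a `rasdaemon` systemd unit and the `ras-mc-ctl` tool with its event database enabled, and the wrapper names `ras_setup` and `ras_report` are made up here just to group the commands. On a desktop without ECC memory there may be little or nothing for it to record.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers; the commands inside them are the actual steps.

# Install the package and start the daemon now and on every boot.
ras_setup() {
    sudo dnf5 install rasdaemon &&
        sudo systemctl enable --now rasdaemon
}

# After the next hang and reboot, print what the daemon has recorded.
ras_report() {
    sudo ras-mc-ctl --summary   # error counts per event type
    sudo ras-mc-ctl --errors    # individual recorded events
}

# Usage:
#   ras_setup      # once
#   ras_report     # after the next hang and reboot
```

An empty report does not rule out hardware, since a hard lockup can stop the machine before anything is written, but a non-empty one would be a useful lead.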