Frequent GPU crashes/resets on Fedora 43 with kernel 6.17.12

I’ve been experiencing amdgpu crashes ever since I updated to Fedora 43 last week. I didn’t have this issue with Fedora 42.

It happens at least once a day, sometimes 2-4 times, even under low CPU and RAM load (<40%).

Any tips on how to fix this would be appreciated. If someone has seen another ticket/issue elsewhere, I could follow it and share my logs there to help out :)

I saw some comments on the Freedesktop GitLab about disabling hardware acceleration in the browser, which I did in Firefox after the last crash today. I will also reply there with my info just in case, since that issue dates back to 2023 with kernel 6.6.

System Information

Hardware

  • GPU: AMD Radeon RX 6650 XT (Navi 23, MSI MECH 2X)
    • Device ID: 1002:73ef
    • Subsystem: 1462:5027
    • Revision: c1
    • VRAM: 8176MB
  • CPU: AMD Ryzen 9 5900X 12-Core Processor (24 threads)
  • RAM: 62GB
  • PCI Address: 0000:0c:00.0

Software Versions

  • OS: Fedora release 43
  • Kernel: 6.17.12-300.fc43.x86_64 (SMP PREEMPT_DYNAMIC)
  • Mesa: 25.2.7-3.fc43
  • OpenGL: 4.6 (Mesa 25.2.7)
  • Driver: amdgpu (in-tree kernel module)
  • DRM: 3.64.0
  • LLVM: 21.1.5
  • Wayland: 1.24.0
  • Desktop: KDE Plasma 6.5.4 on Wayland
  • KWin: 6.5.4-2.fc43
  • Qt: 6.10.1

Driver Details

Driver: amdgpu (kernel module)
Module path: /lib/modules/6.17.12-300.fc43.x86_64/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz
License: GPL and additional rights
GPU Codename: DIMGREY_CAVEFISH (Navi 23)
DRM version: 3.64
Display Core: v3.2.340 (DCN 3.0.2)
SMU firmware: version 0x003b3100 (59.49.0)
VCN firmware: ENC 1.33, DEC 4, VEP 0, Rev 14
DMUB firmware: 0x02020021
VBIOS: 113-V502MECH-1OC

Kernel Parameters

BOOT_IMAGE=(hd5,gpt2)/vmlinuz-6.17.12-300.fc43.x86_64
root=UUID=ee3f5ffb-4312-4315-8f37-0c269fa9ce37
ro rootflags=subvol=root
rd.luks.uuid=luks-b60051eb-eaac-41c9-bd08-7e5a7bdbd47b
rhgb quiet

Crash Logs

Boot -1 (2026-01-03 15:41) - Multiple GPU Resets

Initial Page Fault & First Reset:

Jan 03 15:41:21 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32813)
Jan 03 15:41:21 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Process codium pid 9998 thread codium:cs0 pid 10040
Jan 03 15:41:21 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: in page starting at address 0x000087283ab21000 from client 0x1b (UTCL2)
Jan 03 15:41:31 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Dumping IP State
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Dumping IP State Completed
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=4743204, emitted seq=4743206
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Process kwin_wayland pid 3577 thread kwin_wayla:cs0 pid 3726
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Starting gfx_0.1.0 ring reset
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Ring gfx_0.1.0 reset failed
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 03 15:41:32 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: VRAM is lost due to GPU reset!
Jan 03 15:41:33 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(1) succeeded!
Jan 03 15:41:33 user-pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 03 15:41:33 user-pc kernel: amdgpu 0000:0c:00.0: [drm] device wedged, but recovered through reset

Display Timeout After First Reset:

Jan 03 15:41:43 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] *ERROR* [CRTC:90:crtc-1] hw_done or flip_done timed out
Jan 03 15:41:43 user-pc kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* [CRTC:90:crtc-1] flip_done timed out

Second Reset (10 seconds later):

Jan 03 15:41:53 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Dumping IP State Completed
Jan 03 15:41:53 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Jan 03 15:41:53 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=2111356, emitted seq=2111359
Jan 03 15:41:53 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Process plasmashell pid 4425 thread plasmashel:cs0 pid 4507
Jan 03 15:41:53 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Jan 03 15:41:54 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: Ring gfx_0.0.0 reset failed
Jan 03 15:41:54 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 03 15:41:54 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
Jan 03 15:41:55 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
Jan 03 15:41:55 user-pc kernel: amdgpu 0000:0c:00.0: [drm] device wedged, but recovered through reset

Continued Display Issues:

Jan 03 15:42:05 user-pc kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* [CRTC:86:crtc-0] flip_done timed out
Jan 03 15:42:05 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] *ERROR* [CRTC:86:crtc-0] hw_done or flip_done timed out
Jan 03 15:42:05 user-pc kernel: amdgpu 0000:0c:00.0: [drm] *ERROR* [CRTC:90:crtc-1] flip_done timed out
Jan 03 15:42:15 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [drm] *ERROR* [CRTC:90:crtc-1] hw_done or flip_done timed out
Jan 03 15:42:16 user-pc kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

Boot -2 (2026-01-02 20:36) - Cascading Ring Timeouts

Initial Page Faults:

Jan 02 20:36:02 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:6 pasid:32811)
[Multiple repeated page faults]

First Reset:

Jan 02 20:36:13 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=1558585, emitted seq=1558587
Jan 02 20:36:13 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
Jan 02 20:36:14 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 02 20:36:14 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: VRAM is lost due to GPU reset!
Jan 02 20:36:14 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(1) succeeded!

Cascading Timeouts (every 10 seconds):

Jan 02 20:36:35 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=1558589, emitted seq=1558592
Jan 02 20:36:45 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=778525, emitted seq=778527
Jan 02 20:36:45 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=1558590, emitted seq=1558594
Jan 02 20:36:55 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.1.0 timeout, signaled seq=1558592, emitted seq=1558596
Jan 02 20:36:55 user-pc kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=778526, emitted seq=778529
[Pattern continues with timeouts every 10 seconds]
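For anyone wanting to compare their own boots against the excerpts above, here is a minimal sketch of two shell helpers that summarise kernel log text piped into them. The function names are made up for illustration, and the patterns are based only on the message formats shown in this post, so they may need adjusting on other kernel versions.

```shell
# Count completed full GPU resets in kernel log text read from stdin.
# Matches lines such as "amdgpu: GPU reset(1) succeeded!".
# The intermediate "GPU reset succeeded, trying to resume" line is not counted.
count_gpu_resets() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -cE 'amdgpu: GPU reset\([0-9]+\) succeeded!' || true
}

# Tally ring timeouts per ring name (e.g. gfx_0.0.0, gfx_0.1.0) from stdin.
ring_timeouts() {
    grep -oE 'ring [a-z0-9_.]+ timeout' | sort | uniq -c
}

# Example usage against the previous boot's kernel messages:
#   journalctl -k -b -1 --no-pager | count_gpu_resets
#   journalctl -k -b -1 --no-pager | ring_timeouts
```

Running both against each affected boot gives a quick count to attach to a bug report instead of pasting the full log.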

According to

it is a firmware recession. Follow the instructions written there.

regression, of course.

Thanks for chiming in, but the instructions provided seem to be for NixOS exclusively:

I’m sadly unsure what to do with such information :/

I had the same issue. A temporary fix is to downgrade the amdgpu firmware to 20251021-1. Here are the commands for F43:

sudo dnf downgrade amd-gpu-firmware-20251021-1.fc43 amd-ucode-firmware-20251021-1.fc43
sudo dracut --force
sudo reboot
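After rebooting, it may be worth confirming that the downgrade actually stuck. The helper below is only a sketch: the function name is hypothetical, and it simply checks a version-release string against the 20251021-1 build mentioned above.

```shell
# Succeed if the given version-release string is the firmware build
# reported as working in this thread (20251021-1).
is_downgraded_firmware() {
    case "$1" in
        20251021-1.*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Example usage on a Fedora system:
#   ver=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' amd-gpu-firmware)
#   if is_downgraded_firmware "$ver"; then
#       echo "amd-gpu-firmware is at the downgraded build: $ver"
#   else
#       echo "amd-gpu-firmware is at $ver, not the downgraded build"
#   fi
```

Keep in mind that a later system upgrade would normally pull the newer firmware back in. Assuming you want to hold it for now, excluding the two packages on upgrade (for example `sudo dnf upgrade --exclude=amd-gpu-firmware --exclude=amd-ucode-firmware`) should avoid that until the fix lands.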

Source
Apparently patches will be released this Friday. Here is the progress tracker link.

I am experiencing a very similar issue with an AMD HD 5750 on two separate systems. It is not intermittent crashes, though, just a complete freeze at boot: it freezes on the three dots of Plymouth, and pressing Escape shows the system freezing. I documented it initially here: system wont boot on any kernel past 6.17.7. The hardware is slightly different, but the same kernel revisions are involved between working and freezing, which is too much of a coincidence not to be related, IMHO.

I've not yet tried the firmware downgrade, but I will as soon as I can.

Regards, Peter Winterflood

I got the same problem on a ThinkPad X13 Gen 6:

$ sudo dmidecode -t system -t baseboard -t processor | grep -E "Manufacturer|Product Name|Version|Family"

  • Family: Zen
  • Manufacturer: Advanced Micro Devices, Inc.
  • Signature: Family 26, Model 96, Stepping 0
  • Version: AMD Ryzen AI 7 PRO 350 w/ Radeon 860M
  • Manufacturer: LENOVO
  • Product Name: 21RMCTO1WW
  • Version: ThinkPad X13 Gen 6
  • Family: ThinkPad X13 Gen 6
  • Manufacturer: LENOVO
  • Product Name: 21RMCTO1WW
  • Version: Not Defined

Downgrading the firmware to 20251021-1.fc43 was not enough. I also had to go back to kernel 6.17.1-300.fc43.x86_64 using

sudo dnf downgrade kernel kernel-core kernel-modules kernel-modules-extra amd-gpu-firmware amd-ucode-firmware

to get rid of the crashes (which happened quite regularly when using VS Code, so Electron triggers the issue quite fast).
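To see which applications the kernel names alongside these faults (codium, kwin_wayland and plasmashell in the logs from the first post), something like the following could help. It is a rough sketch: the function name is made up, and the pattern assumes the "Process NAME pid N thread ..." line format shown earlier in this thread.

```shell
# Tally the process names that amdgpu reports next to page faults and
# ring timeouts, reading kernel log text from stdin. Most frequent first.
faulting_processes() {
    sed -nE 's/.*amdgpu: Process ([^ ]+) pid [0-9]+ thread .*/\1/p' \
        | sort | uniq -c | sort -rn
}

# Example usage against the previous boot's kernel messages:
#   journalctl -k -b -1 --no-pager | faulting_processes
```

Note that these lines cover both the process tied to the page fault and the process whose job later timed out, so the compositor showing up in the list does not by itself mean it caused the original fault.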