Kernel crash since 6.15.3 / 6.15.4 on Radeon 7900 XTX

The last working kernel on my system is 6.14.11-300.fc42.x86_64. Since the update to 6.15.x, my system crashes during GPU initialization. ABRT can’t report the error because of the taint flags, although it’s a stock kernel with nothing extra loaded.
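
For context on the taint flags: in the trace further down, the first WARNING is logged as "Not tainted" and the second as "Tainted: G W" with "[W]=WARN", so the taint seems to be set by the kernel's own warning rather than by an extra module. Which flag ABRT actually objects to is not stated in this thread. The sketch below decodes a taint bitmask using a subset of the bit positions from the kernel's tainted-kernels documentation; the helper names `decode_taint` and `read_taint` are made up for illustration.

```python
# Subset of kernel taint bits: bit position -> (letter, meaning).
# Only a few common flags are listed; unknown bits are ignored.
TAINT_BITS = {
    0: ("P", "proprietary module loaded"),
    1: ("F", "module force-loaded"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module loaded"),
    13: ("E", "unsigned module loaded"),
}


def decode_taint(value: int) -> list[str]:
    """Return descriptions for the taint bits set in `value` that we know."""
    flags = []
    for bit, (letter, meaning) in sorted(TAINT_BITS.items()):
        if value & (1 << bit):
            flags.append(f"{letter}: {meaning}")
    return flags


def read_taint(path: str = "/proc/sys/kernel/tainted") -> int:
    """Read the current taint bitmask from procfs (Linux only)."""
    with open(path) as fh:
        return int(fh.read().strip())
```

On the affected machine, `decode_taint(read_taint())` would be expected to include the W entry once the first warning has fired, assuming nothing else has tainted the kernel.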

Machine:
  Type: Desktop Mobo: ASUSTeK model: PRIME X570-P v: Rev X.0x
    serial: 200974601303301 UEFI: American Megatrends v: 5013 date: 03/22/2024
CPU:
  Info: 16-core model: AMD Ryzen 9 5950X bits: 64 type: MT MCP cache:
    L2: 8 MiB
  Speed (MHz): avg: 1746 min/max: 550/5086 cores: 1: 1746 2: 1746 3: 1746
    4: 1746 5: 1746 6: 1746 7: 1746 8: 1746 9: 1746 10: 1746 11: 1746 12: 1746
    13: 1746 14: 1746 15: 1746 16: 1746 17: 1746 18: 1746 19: 1746 20: 1746
    21: 1746 22: 1746 23: 1746 24: 1746 25: 1746 26: 1746 27: 1746 28: 1746
    29: 1746 30: 1746 31: 1746 32: 1746
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900
    XTX/7900 GRE/7900M] driver: amdgpu v: kernel
  Display: unspecified server: X.Org v: 24.1.8 with: Xwayland v: 24.1.8
    driver: dri: radeonsi gpu: amdgpu resolution: 7680x2160~120Hz

The system boots without a display, so I’m able to SSH into the machine; however, due to the OOPSes it barely works.
I’ve enabled a crashkernel and forced a dump with SysRq-c, so I have a vmcore and the dmesg of the failing boot. I’ll paste the crash part of the dmesg here and can provide the full file and the vmcore if needed.
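
The kernel command line in the dmesg uses the range form of the crashkernel option, `crashkernel=2G-64G:256M,64G-:512M`, where each `start-[end]:size` entry applies when the amount of system RAM falls in that range. As a rough sketch of how that value is read (the function names are hypothetical, and the kernel's own accounting of "system RAM" may differ slightly from the installed amount):

```python
import re

# Binary size suffixes accepted in this sketch.
_UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}


def parse_size(text: str) -> int:
    """Convert a size such as '256M' or '2G' to bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if not match:
        raise ValueError(f"unrecognised size: {text!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def crashkernel_reservation(spec: str, ram_bytes: int) -> int:
    """Return the bytes reserved by a range-style crashkernel value, 0 if none."""
    spec = spec.split("@", 1)[0]  # drop an optional @offset suffix
    for entry in spec.split(","):
        range_part, size = entry.split(":")
        start, _, end = range_part.partition("-")
        low = parse_size(start)
        high = parse_size(end) if end else float("inf")  # open-ended range
        if low <= ram_bytes < high:
            return parse_size(size)
    return 0  # RAM is below every range, so nothing is reserved


GIB = 1 << 30
print(crashkernel_reservation("2G-64G:256M,64G-:512M", 32 * GIB) // (1 << 20), "MiB")
```

With this value, a machine between 2G and 64G of RAM reserves 256M for the crash kernel, and one with 64G or more reserves 512M.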

[    0.000000] Linux version 6.15.4-200.fc42.x86_64 (mockbuild@d2195f98ab4b4400b4eac9c028c691e1) (gcc (GCC) 15.1.1 20250521 (Red Hat 15.1.1-2), GNU ld version 2.44-3.fc42) #1 SMP PREEMPT_DYNAMIC Fri Jun 27 15:32:46 UTC 2025
[    0.000000] Command line: BOOT_IMAGE=(hd2,gpt2)/vmlinuz-6.15.4-200.fc42.x86_64 root=UUID=11460f9d-c451-41b1-ace8-055a9512e070 ro rootflags=subvol=root crashkernel=2G-64G:256M,64G-:512M
...
... [cut]
...
[    5.384287] [drm] amdgpu kernel modesetting enabled.
[    5.393733] amdgpu: Virtual CRAT table created for CPU
[    5.393918] amdgpu: Topology: Add CPU node
[    5.394178] amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
[    5.394367] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1849:0x5305 0xC8).
[    5.394805] [drm] register mmio base: 0xFCC00000
[    5.394963] [drm] register mmio size: 1048576
[    5.399844] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 0 <soc21_common>
[    5.400023] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 1 <gmc_v11_0>
[    5.400187] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 2 <ih_v6_0>
[    5.400348] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 3 <psp>
[    5.400510] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 4 <smu>
[    5.400680] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 5 <dm>
[    5.400881] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 6 <gfx_v11_0>
[    5.401128] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 7 <sdma_v6_0>
[    5.401371] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 8 <vcn_v4_0>
[    5.401613] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 9 <jpeg_v4_0>
[    5.401850] amdgpu 0000:0c:00.0: amdgpu: detected ip block number 10 <mes_v11_0>
[    5.402097] amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from VFCT
[    5.402294] amdgpu: ATOM BIOS: 113-D70201-810011
[    5.403505] amdgpu 0000:0c:00.0: amdgpu: CP RS64 enable
[    5.426503] Console: switching to colour dummy device 80x25
[    5.436771] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    5.436781] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature not supported
[    5.436815] amdgpu 0000:0c:00.0: amdgpu: MEM ECC is not presented.
[    5.436819] amdgpu 0000:0c:00.0: amdgpu: SRAM ECC is not presented.
[    5.436832] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit
[    5.436844] amdgpu 0000:0c:00.0: amdgpu: VRAM: 24560M 0x0000008000000000 - 0x00000085FEFFFFFF (24560M used)
[    5.436851] amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x00007FFF00000000 - 0x00007FFF1FFFFFFF
[    5.436863] [drm] Detected VRAM RAM=24560M, BAR=32768M
[    5.436867] [drm] RAM width 384bits GDDR6
[    5.437056] [drm] amdgpu: 24560M of VRAM memory ready
[    5.437061] [drm] amdgpu: 31840M of GTT memory ready.
[    5.437081] [drm] GART: num cpu pages 131072, num gpu pages 131072
[    5.437149] [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
[    5.437813] [drm] Loading DMUB firmware via PSP: version=0x07001B00
[    5.437996] amdgpu 0000:0c:00.0: amdgpu: Found VCN firmware Version ENC: 1.12 DEC: 5 VEP: 0 Revision: 0
[    5.438060] amdgpu 0000:0c:00.0: amdgpu: Found VCN firmware Version ENC: 1.12 DEC: 5 VEP: 0 Revision: 0
[    5.508823] amdgpu 0000:0c:00.0: amdgpu: reserve 0x1300000 from 0x85fc000000 for PSP TMR
[    5.651713] amdgpu 0000:0c:00.0: amdgpu: RAP: optional rap ta ucode is not available
[    5.651719] amdgpu 0000:0c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[    5.651783] amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x0000003f, smu fw program = 0, smu fw version = 0x004e6601 (78.102.1)
[    5.651793] amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
[    5.797808] amdgpu 0000:0c:00.0: amdgpu: SMU is initialized successfully!
[    5.798199] [drm] Display Core v3.2.325 initialized on DCN 3.2
[    5.798203] [drm] DP-HDMI FRL PCON supported
[    5.800071] [drm] DMUB hardware initialized: version=0x07001B00
[    6.081477] amdgpu 0000:0c:00.0: amdgpu: MES failed to respond to msg=SET_HW_RSRC_1
[    6.081484] [drm:mes_v11_0_hw_init [amdgpu]] *ERROR* failed mes_v11_0_set_hw_resources_1, r=-110
[    6.081768] [drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block <gfx_v11_0> failed -110
[    6.082057] amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
[    6.082062] amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
[    6.082067] amdgpu 0000:0c:00.0: amdgpu: amdgpu: finishing device.
[    6.082508] [drm] pre_validate_dsc:1627 MST_DSC dsc precompute is not needed
[    6.082531] ------------[ cut here ]------------
[    6.082535] WARNING: CPU: 7 PID: 682 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x46/0x70 [amdgpu]
[    6.082783] Modules linked in: amdgpu(+) amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper video drm_panel_backlight_quirks drm_buddy nvme drm_display_helper polyval_clmulni polyval_generic ghash_clmulni_intel nvme_core sha512_ssse3 sha256_ssse3 cec sha1_ssse3 hid_apple sp5100_tco nvme_keyring wmi nvme_auth scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse i2c_dev
[    6.082832] CPU: 7 UID: 0 PID: 682 Comm: (udev-worker) Not tainted 6.15.4-200.fc42.x86_64 #1 PREEMPT(lazy)
[    6.082839] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 5013 03/22/2024
[    6.082845] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[    6.083076] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 3a 54 20 e1 e9 1a fd ff ff <0f> 0b b8 ea ff ff ff e9 29 54 20 e1 b8 ea ff ff ff e9 1f 54 20 e1
[    6.083085] RSP: 0018:ffffccd000a43a18 EFLAGS: 00010246
[    6.083091] RAX: ffff8b84598dfd00 RBX: ffff8b8462498b50 RCX: 0000000000000000
[    6.083095] RDX: 0000000000000000 RSI: ffff8b84624a59a0 RDI: ffff8b8462480000
[    6.083100] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffffa28ffabf
[    6.083104] R10: ffff8b8460a92c00 R11: fffff6b7c482a480 R12: ffff8b8462480000
[    6.083109] R13: ffff8b8462480000 R14: ffff8b84624a59a0 R15: 0000000000000000
[    6.083113] FS:  00007f4d71363040(0000) GS:ffff8b93893f1000(0000) knlGS:0000000000000000
[    6.083119] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.083123] CR2: 00007fb8c2b14010 CR3: 0000000106db8000 CR4: 0000000000f50ef0
[    6.083128] PKRU: 55555554
[    6.083131] Call Trace:
[    6.083135]  <TASK>
[    6.083138]  amdgpu_fence_driver_hw_fini+0x119/0x160 [amdgpu]
[    6.083345]  amdgpu_device_fini_hw+0xb5/0x1a6 [amdgpu]
[    6.083632]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.083639]  amdgpu_driver_load_kms.cold+0x19/0x2f [amdgpu]
[    6.083908]  amdgpu_pci_probe+0x1df/0x540 [amdgpu]
[    6.084105]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084111]  local_pci_probe+0x42/0x90
[    6.084116]  pci_call_probe+0x5b/0x190
[    6.084120]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084125]  ? kernfs_create_link+0x61/0xb0
[    6.084132]  pci_device_probe+0x95/0x140
[    6.084136]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084141]  really_probe+0xde/0x340
[    6.084146]  ? pm_runtime_barrier+0x55/0x90
[    6.084151]  __driver_probe_device+0x78/0x140
[    6.084156]  driver_probe_device+0x1f/0xa0
[    6.084161]  ? __pfx___driver_attach+0x10/0x10
[    6.084166]  __driver_attach+0xcb/0x1e0
[    6.084171]  bus_for_each_dev+0x85/0xd0
[    6.084176]  bus_add_driver+0x12f/0x210
[    6.084181]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[    6.084368]  driver_register+0x75/0xe0
[    6.084373]  ? amdgpu_init+0x42/0xff0 [amdgpu]
[    6.084564]  do_one_initcall+0x5b/0x300
[    6.084572]  do_init_module+0x84/0x260
[    6.084578]  init_module_from_file+0x8a/0xe0
[    6.084585]  idempotent_init_module+0x114/0x310
[    6.084592]  __x64_sys_finit_module+0x67/0xc0
[    6.084597]  do_syscall_64+0x7b/0x160
[    6.084603]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084608]  ? switch_fpu_return+0x4e/0xd0
[    6.084612]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084617]  ? syscall_exit_to_user_mode+0x1d5/0x210
[    6.084622]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.084626]  ? do_syscall_64+0x87/0x160
[    6.084630]  ? exc_page_fault+0x7e/0x1a0
[    6.084635]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    6.084639] RIP: 0033:0x7f4d71c22a8d
[    6.084644] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b 63 0f 00 f7 d8 64 89 01 48
[    6.084652] RSP: 002b:00007ffd16ed0188 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    6.084658] RAX: ffffffffffffffda RBX: 00005624328bbe10 RCX: 00007f4d71c22a8d
[    6.084662] RDX: 0000000000000004 RSI: 00007f4d71359965 RDI: 000000000000003f
[    6.084667] RBP: 00007ffd16ed0240 R08: 0000000000000000 R09: 00007ffd16ed01f0
[    6.084671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000020000
[    6.084675] R13: 00005624328b9110 R14: 00007f4d71359965 R15: 0000000000000000
[    6.084682]  </TASK>
[    6.084684] ---[ end trace 0000000000000000 ]---
[    6.084694] ------------[ cut here ]------------
[    6.084697] WARNING: CPU: 7 PID: 682 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:631 amdgpu_irq_put+0x46/0x70 [amdgpu]
[    6.084914] Modules linked in: amdgpu(+) amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper video drm_panel_backlight_quirks drm_buddy nvme drm_display_helper polyval_clmulni polyval_generic ghash_clmulni_intel nvme_core sha512_ssse3 sha256_ssse3 cec sha1_ssse3 hid_apple sp5100_tco nvme_keyring wmi nvme_auth scsi_dh_rdac scsi_dh_emc scsi_dh_alua fuse i2c_dev
[    6.084954] CPU: 7 UID: 0 PID: 682 Comm: (udev-worker) Tainted: G        W           6.15.4-200.fc42.x86_64 #1 PREEMPT(lazy)
[    6.084962] Tainted: [W]=WARN
[    6.084964] Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 5013 03/22/2024
[    6.084969] RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
[    6.085172] Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 3a 54 20 e1 e9 1a fd ff ff <0f> 0b b8 ea ff ff ff e9 29 54 20 e1 b8 ea ff ff ff e9 1f 54 20 e1
[    6.085180] RSP: 0018:ffffccd000a43a18 EFLAGS: 00010246
[    6.085184] RAX: ffff8b84598dfd08 RBX: ffff8b8462499198 RCX: 0000000000000000
[    6.085188] RDX: 0000000000000002 RSI: ffff8b84624a59a0 RDI: ffff8b8462480000
[    6.085192] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffffffa28ffabf
[    6.085196] R10: ffff8b8460a92c00 R11: fffff6b7c482a480 R12: ffff8b8462480000
[    6.085200] R13: ffff8b8462480000 R14: ffff8b84624a59a0 R15: 0000000000000000
[    6.085204] FS:  00007f4d71363040(0000) GS:ffff8b93893f1000(0000) knlGS:0000000000000000
[    6.085209] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    6.085212] CR2: 00007fb8c2b14010 CR3: 0000000106db8000 CR4: 0000000000f50ef0
[    6.085216] PKRU: 55555554
[    6.085219] Call Trace:
[    6.085222]  <TASK>
[    6.085224]  amdgpu_fence_driver_hw_fini+0x119/0x160 [amdgpu]
[    6.085409]  amdgpu_device_fini_hw+0xb5/0x1a6 [amdgpu]
[    6.085656]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.085661]  amdgpu_driver_load_kms.cold+0x19/0x2f [amdgpu]
[    6.085904]  amdgpu_pci_probe+0x1df/0x540 [amdgpu]
[    6.086080]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086084]  local_pci_probe+0x42/0x90
[    6.086089]  pci_call_probe+0x5b/0x190
[    6.086092]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086096]  ? kernfs_create_link+0x61/0xb0
[    6.086101]  pci_device_probe+0x95/0x140
[    6.086105]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086109]  really_probe+0xde/0x340
[    6.086113]  ? pm_runtime_barrier+0x55/0x90
[    6.086117]  __driver_probe_device+0x78/0x140
[    6.086122]  driver_probe_device+0x1f/0xa0
[    6.086126]  ? __pfx___driver_attach+0x10/0x10
[    6.086130]  __driver_attach+0xcb/0x1e0
[    6.086135]  bus_for_each_dev+0x85/0xd0
[    6.086139]  bus_add_driver+0x12f/0x210
[    6.086143]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[    6.086311]  driver_register+0x75/0xe0
[    6.086315]  ? amdgpu_init+0x42/0xff0 [amdgpu]
[    6.086487]  do_one_initcall+0x5b/0x300
[    6.086494]  do_init_module+0x84/0x260
[    6.086498]  init_module_from_file+0x8a/0xe0
[    6.086505]  idempotent_init_module+0x114/0x310
[    6.086511]  __x64_sys_finit_module+0x67/0xc0
[    6.086516]  do_syscall_64+0x7b/0x160
[    6.086520]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086524]  ? switch_fpu_return+0x4e/0xd0
[    6.086528]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086532]  ? syscall_exit_to_user_mode+0x1d5/0x210
[    6.086536]  ? srso_alias_return_thunk+0x5/0xfbef5
[    6.086540]  ? do_syscall_64+0x87/0x160
[    6.086544]  ? exc_page_fault+0x7e/0x1a0
[    6.086548]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    6.086552] RIP: 0033:0x7f4d71c22a8d
[    6.086555] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 4b 63 0f 00 f7 d8 64 89 01 48
[    6.086563] RSP: 002b:00007ffd16ed0188 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[    6.086568] RAX: ffffffffffffffda RBX: 00005624328bbe10 RCX: 00007f4d71c22a8d
[    6.086572] RDX: 0000000000000004 RSI: 00007f4d71359965 RDI: 000000000000003f
[    6.086575] RBP: 00007ffd16ed0240 R08: 0000000000000000 R09: 00007ffd16ed01f0
[    6.086579] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000020000
[    6.086583] R13: 00005624328b9110 R14: 00007f4d71359965 R15: 0000000000000000
[    6.086589]  </TASK>
[    6.086591] ---[ end trace 0000000000000000 ]---
[    6.086599] ------------[ cut here ]------------

Looks like it might be an amdgpu firmware issue.

Can you report this on the AMD GPU bug tracker?

Reported.

Issue is fixed in kernel-6.15.5-200

I tried to upgrade a few days ago because I was looking forward to KDE 6.4, but ended up with a system with blank screens. Ctrl+F1 or F2 or something gave me access to a terminal, and journalctl showed amdgpu crashing at boot (AMD 780M iGPU).

Luckily I took a ZFS snapshot before the upgrade, so I rolled back and have been looking for a solution. I just upgraded a test VM and it pulls kernel 6.15.4.

So should I wait for 6.15.5 before trying again? Thanks.

Please start a new topic. The problem described here is fixed.
It seems you have a new issue.