Fedora 40, AMD A10, Radeon R7 GPU, kernel 6.8.8-6.8.11 issue, anyone else?

Short:

[   29.265891] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:167:30
[   29.272518] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:169:30
[   29.279187] [drm] Internal thermal controller without fan control
[   29.279334] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:2736:39
[   29.285735] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:2769:32

Long:

[   28.823554] [drm] PCIE GART of 1024M enabled (table at 0x000000F400A00000).
[   28.830800] RPC: Registered named UNIX socket transport module.
[   28.830806] RPC: Registered udp transport module.
[   28.830807] RPC: Registered tcp transport module.
[   28.830808] RPC: Registered tcp-with-tls transport module.
[   28.830809] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   28.864945] tda829x 6-0042: could not clearly identify tuner address, defaulting to 60
[   28.887337] tda18271 6-0060: creating new instance
[   28.920619] tda18271: TDA18271HD/C1 detected @ 6-0060
[   29.032212] NET: Registered PF_QIPCRTR protocol family
[   29.254535] tda829x 6-0042: type set to tda8295+18271
[   29.265885] ------------[ cut here ]------------
[   29.265891] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:167:30
[   29.265896] index 4 is out of range for type 'sumo_vid_mapping_entry [4]'
[   29.265900] CPU: 3 PID: 1503 Comm: (udev-worker) Not tainted 6.8.8-300.fc40.x86_64 #1
[   29.265905] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X+ Killer, BIOS P2.90 05/17/2016
[   29.265908] Call Trace:
[   29.265912]  <TASK>
[   29.265917]  dump_stack_lvl+0x6a/0x90
[   29.265930]  __ubsan_handle_out_of_bounds+0x95/0xd0
[   29.265938]  kv_dpm_sw_init+0xd04/0xd10 [amdgpu]
[   29.267282]  amdgpu_device_init+0x11e3/0x2a40 [amdgpu]
[   29.268619]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
[   29.269903]  amdgpu_pci_probe+0x1aa/0x5c0 [amdgpu]
[   29.271168]  local_pci_probe+0x45/0xa0
[   29.271177]  pci_device_probe+0xc1/0x2a0
[   29.271182]  really_probe+0x19e/0x3e0
[   29.271188]  ? __pfx___driver_attach+0x10/0x10
[   29.271190]  __driver_probe_device+0x78/0x160
[   29.271195]  driver_probe_device+0x1f/0xa0
[   29.271197]  __driver_attach+0xba/0x1c0
[   29.271200]  bus_for_each_dev+0x8f/0xe0
[   29.271204]  bus_add_driver+0x116/0x220
[   29.271209]  driver_register+0x5c/0x100
[   29.271212]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.272381]  do_one_initcall+0x5b/0x320
[   29.272392]  do_init_module+0x60/0x240
[   29.272396]  __do_sys_init_module+0x17a/0x1b0
[   29.272402]  do_syscall_64+0x83/0x170
[   29.272407]  ? __handle_mm_fault+0xca6/0xe90
[   29.272413]  ? update_process_times+0x9c/0xb0
[   29.272418]  ? __count_memcg_events+0x69/0x100
[   29.272422]  ? count_memcg_events.constprop.0+0x1a/0x30
[   29.272425]  ? handle_mm_fault+0x1f2/0x350
[   29.272429]  ? do_user_addr_fault+0x304/0x690
[   29.272433]  ? exc_page_fault+0x7f/0x180
[   29.272437]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   29.272442] RIP: 0033:0x7efeb8f2857e
[   29.272461] Code: 48 8b 0d 9d 98 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 98 0c 00 f7 d8 64 89 01 48
[   29.272464] RSP: 002b:00007fff4575eb38 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   29.272469] RAX: ffffffffffffffda RBX: 0000555c37671be0 RCX: 00007efeb8f2857e
[   29.272471] RDX: 0000555c374fa4f0 RSI: 000000000237121e RDI: 00007efeb4600010
[   29.272473] RBP: 00007fff4575ebf0 R08: 0000555c374f6010 R09: 0000000000000007
[   29.272474] R10: 0000000000000004 R11: 0000000000000246 R12: 0000555c374fa4f0
[   29.272476] R13: 0000000000020000 R14: 0000555c374fcf50 R15: 0000555c376f6d90
[   29.272480]  </TASK>
[   29.272514] ---[ end trace ]---
[   29.272517] ------------[ cut here ]------------
[   29.272518] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:169:30
[   29.272521] index 4 is out of range for type 'sumo_vid_mapping_entry [4]'
[   29.272524] CPU: 3 PID: 1503 Comm: (udev-worker) Not tainted 6.8.8-300.fc40.x86_64 #1
[   29.272528] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X+ Killer, BIOS P2.90 05/17/2016
[   29.272529] Call Trace:
[   29.272532]  <TASK>
[   29.272533]  dump_stack_lvl+0x6a/0x90
[   29.272539]  __ubsan_handle_out_of_bounds+0x95/0xd0
[   29.272545]  kv_dpm_sw_init+0xcf0/0xd10 [amdgpu]
[   29.273933]  amdgpu_device_init+0x11e3/0x2a40 [amdgpu]
[   29.275160]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
[   29.276502]  amdgpu_pci_probe+0x1aa/0x5c0 [amdgpu]
[   29.277652]  local_pci_probe+0x45/0xa0
[   29.277660]  pci_device_probe+0xc1/0x2a0
[   29.277665]  really_probe+0x19e/0x3e0
[   29.277671]  ? __pfx___driver_attach+0x10/0x10
[   29.277674]  __driver_probe_device+0x78/0x160
[   29.277678]  driver_probe_device+0x1f/0xa0
[   29.277680]  __driver_attach+0xba/0x1c0
[   29.277683]  bus_for_each_dev+0x8f/0xe0
[   29.277687]  bus_add_driver+0x116/0x220
[   29.277691]  driver_register+0x5c/0x100
[   29.277694]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.279062]  do_one_initcall+0x5b/0x320
[   29.279073]  do_init_module+0x60/0x240
[   29.279078]  __do_sys_init_module+0x17a/0x1b0
[   29.279082]  do_syscall_64+0x83/0x170
[   29.279088]  ? __handle_mm_fault+0xca6/0xe90
[   29.279093]  ? update_process_times+0x9c/0xb0
[   29.279098]  ? __count_memcg_events+0x69/0x100
[   29.279101]  ? count_memcg_events.constprop.0+0x1a/0x30
[   29.279104]  ? handle_mm_fault+0x1f2/0x350
[   29.279108]  ? do_user_addr_fault+0x304/0x690
[   29.279112]  ? exc_page_fault+0x7f/0x180
[   29.279115]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   29.279120] RIP: 0033:0x7efeb8f2857e
[   29.279140] Code: 48 8b 0d 9d 98 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 98 0c 00 f7 d8 64 89 01 48
[   29.279142] RSP: 002b:00007fff4575eb38 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   29.279146] RAX: ffffffffffffffda RBX: 0000555c37671be0 RCX: 00007efeb8f2857e
[   29.279148] RDX: 0000555c374fa4f0 RSI: 000000000237121e RDI: 00007efeb4600010
[   29.279150] RBP: 00007fff4575ebf0 R08: 0000555c374f6010 R09: 0000000000000007
[   29.279151] R10: 0000000000000004 R11: 0000000000000246 R12: 0000555c374fa4f0
[   29.279153] R13: 0000000000020000 R14: 0000555c374fcf50 R15: 0000555c376f6d90
[   29.279156]  </TASK>
[   29.279182] ---[ end trace ]---
[   29.279187] [drm] Internal thermal controller without fan control
[   29.279332] ------------[ cut here ]------------
[   29.279334] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:2736:39
[   29.279337] index 2 is out of range for type 'ATOM_PPLIB_NONCLOCK_INFO [1]'
[   29.279339] CPU: 3 PID: 1503 Comm: (udev-worker) Not tainted 6.8.8-300.fc40.x86_64 #1
[   29.279343] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X+ Killer, BIOS P2.90 05/17/2016
[   29.279345] Call Trace:
[   29.279347]  <TASK>
[   29.279349]  dump_stack_lvl+0x6a/0x90
[   29.279355]  __ubsan_handle_out_of_bounds+0x95/0xd0
[   29.279362]  kv_dpm_sw_init+0xc87/0xd10 [amdgpu]
[   29.280617]  amdgpu_device_init+0x11e3/0x2a40 [amdgpu]
[   29.281732]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
[   29.282784]  amdgpu_pci_probe+0x1aa/0x5c0 [amdgpu]
[   29.284238]  local_pci_probe+0x45/0xa0
[   29.284246]  pci_device_probe+0xc1/0x2a0
[   29.284250]  really_probe+0x19e/0x3e0
[   29.284256]  ? __pfx___driver_attach+0x10/0x10
[   29.284259]  __driver_probe_device+0x78/0x160
[   29.284263]  driver_probe_device+0x1f/0xa0
[   29.284266]  __driver_attach+0xba/0x1c0
[   29.284268]  bus_for_each_dev+0x8f/0xe0
[   29.284272]  bus_add_driver+0x116/0x220
[   29.284276]  driver_register+0x5c/0x100
[   29.284280]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.285572]  do_one_initcall+0x5b/0x320
[   29.285583]  do_init_module+0x60/0x240
[   29.285587]  __do_sys_init_module+0x17a/0x1b0
[   29.285592]  do_syscall_64+0x83/0x170
[   29.285597]  ? __handle_mm_fault+0xca6/0xe90
[   29.285602]  ? update_process_times+0x9c/0xb0
[   29.285606]  ? __count_memcg_events+0x69/0x100
[   29.285610]  ? count_memcg_events.constprop.0+0x1a/0x30
[   29.285613]  ? handle_mm_fault+0x1f2/0x350
[   29.285616]  ? do_user_addr_fault+0x304/0x690
[   29.285620]  ? exc_page_fault+0x7f/0x180
[   29.285624]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   29.285629] RIP: 0033:0x7efeb8f2857e
[   29.285651] Code: 48 8b 0d 9d 98 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 98 0c 00 f7 d8 64 89 01 48
[   29.285654] RSP: 002b:00007fff4575eb38 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   29.285657] RAX: ffffffffffffffda RBX: 0000555c37671be0 RCX: 00007efeb8f2857e
[   29.285659] RDX: 0000555c374fa4f0 RSI: 000000000237121e RDI: 00007efeb4600010
[   29.285661] RBP: 00007fff4575ebf0 R08: 0000555c374f6010 R09: 0000000000000007
[   29.285663] R10: 0000000000000004 R11: 0000000000000246 R12: 0000555c374fa4f0
[   29.285664] R13: 0000000000020000 R14: 0000555c374fcf50 R15: 0000555c376f6d90
[   29.285668]  </TASK>
[   29.285711] ---[ end trace ]---
[   29.285733] ------------[ cut here ]------------
[   29.285735] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/kv_dpm.c:2769:32
[   29.285739] index 48 is out of range for type 'UCHAR [1]'
[   29.285742] CPU: 3 PID: 1503 Comm: (udev-worker) Not tainted 6.8.8-300.fc40.x86_64 #1
[   29.285745] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./FM2A88X+ Killer, BIOS P2.90 05/17/2016
[   29.285747] Call Trace:
[   29.285750]  <TASK>
[   29.285752]  dump_stack_lvl+0x6a/0x90
[   29.285757]  __ubsan_handle_out_of_bounds+0x95/0xd0
[   29.285763]  kv_dpm_sw_init+0xba4/0xd10 [amdgpu]
[   29.286876]  amdgpu_device_init+0x11e3/0x2a40 [amdgpu]
[   29.287878]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
[   29.289154]  amdgpu_pci_probe+0x1aa/0x5c0 [amdgpu]
[   29.290341]  local_pci_probe+0x45/0xa0
[   29.290347]  pci_device_probe+0xc1/0x2a0
[   29.290352]  really_probe+0x19e/0x3e0
[   29.290356]  ? __pfx___driver_attach+0x10/0x10
[   29.290358]  __driver_probe_device+0x78/0x160
[   29.290362]  driver_probe_device+0x1f/0xa0
[   29.290364]  __driver_attach+0xba/0x1c0
[   29.290366]  bus_for_each_dev+0x8f/0xe0
[   29.290370]  bus_add_driver+0x116/0x220
[   29.290373]  driver_register+0x5c/0x100
[   29.290376]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   29.291630]  do_one_initcall+0x5b/0x320
[   29.291642]  do_init_module+0x60/0x240
[   29.291647]  __do_sys_init_module+0x17a/0x1b0
[   29.291652]  do_syscall_64+0x83/0x170
[   29.291657]  ? __handle_mm_fault+0xca6/0xe90
[   29.291662]  ? update_process_times+0x9c/0xb0
[   29.291667]  ? __count_memcg_events+0x69/0x100
[   29.291671]  ? count_memcg_events.constprop.0+0x1a/0x30
[   29.291674]  ? handle_mm_fault+0x1f2/0x350
[   29.291677]  ? do_user_addr_fault+0x304/0x690
[   29.291682]  ? exc_page_fault+0x7f/0x180
[   29.291687]  entry_SYSCALL_64_after_hwframe+0x78/0x80
[   29.291692] RIP: 0033:0x7efeb8f2857e
[   29.291713] Code: 48 8b 0d 9d 98 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 af 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 6a 98 0c 00 f7 d8 64 89 01 48
[   29.291716] RSP: 002b:00007fff4575eb38 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
[   29.291719] RAX: ffffffffffffffda RBX: 0000555c37671be0 RCX: 00007efeb8f2857e
[   29.291721] RDX: 0000555c374fa4f0 RSI: 000000000237121e RDI: 00007efeb4600010
[   29.291723] RBP: 00007fff4575ebf0 R08: 0000555c374f6010 R09: 0000000000000007
[   29.291724] R10: 0000000000000004 R11: 0000000000000246 R12: 0000555c374fa4f0
[   29.291726] R13: 0000000000020000 R14: 0000555c374fcf50 R15: 0000555c376f6d90
[   29.291729]  </TASK>
[   29.291778] ---[ end trace ]---
[   29.291781] [drm] amdgpu: dpm initialized
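
For anyone who wants to compare logs across kernels, the reports above boil down to a handful of (file, line, index, type) tuples. Below is a minimal Python sketch that pulls them out of dmesg text. The function name `summarize_ubsan` and both regular expressions are my own, written against the message format shown in this log; this is not an official tool.

```python
import re
from collections import Counter

# Header line, e.g.
# "[   29.265891] UBSAN: array-index-out-of-bounds in <path>:<line>:<col>"
HEADER = re.compile(r"UBSAN: (?P<kind>\S+) in (?P<path>\S+):(?P<line>\d+):(?P<col>\d+)")

# Detail line that follows the header, e.g.
# "index 4 is out of range for type 'sumo_vid_mapping_entry [4]'"
DETAIL = re.compile(r"index (?P<index>-?\d+) is out of range for type '(?P<type>[^']+)'")


def summarize_ubsan(dmesg_text):
    """Count UBSAN out-of-bounds reports keyed by (file, line, index, type)."""
    reports = Counter()
    pending = None  # (file, line) of the most recent header without a detail yet
    for raw in dmesg_text.splitlines():
        header = HEADER.search(raw)
        if header:
            # Keep only the file name; the full path is long and repetitive.
            pending = (header["path"].rsplit("/", 1)[-1], int(header["line"]))
            continue
        detail = DETAIL.search(raw)
        if detail and pending:
            reports[pending + (int(detail["index"]), detail["type"])] += 1
            pending = None
    return reports
```

Fed the long log above, it should return four entries: kv_dpm.c lines 167 and 169 (index 4 of 'sumo_vid_mapping_entry [4]'), line 2736 (index 2 of 'ATOM_PPLIB_NONCLOCK_INFO [1]') and line 2769 (index 48 of 'UCHAR [1]').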

Are you running a debug kernel?

Sounds like you should raise an issue about the UBSAN errors against the kernel for the amdgpu.

It apparently happens with NVIDIA GPUs too, but it’s reported to be harmless there: UBSAN: array-index-out-of-bounds complaints in newer kernels - #6 by aplattner - Linux - NVIDIA Developer Forums

Is it actually causing 3D failure or system freezing?

Standard Kernel.

Kernel 6.8.9 affected as well.

Haven’t tested any 3D yet.

Kernel 6.8.9, which is also affected, froze on boot before I could reach the desktop. I had to revert to 6.8.7 to be able to complete the boot.

Thanks for the link, will read.

I should have said to raise a bug for the panic as well.

https://bugzilla.redhat.com/show_bug.cgi?id=2280155

Well, well, well. It’s still present in the 6.8.10 kernel in testing: kernel | Package Info | koji

And as usual, my bug reports are happily being ignored. I have reported boot-time radeon and amdgpu errors several times, and it’s now obvious nobody in the kernel team even cares about them.

Not an email, not even an acknowledgement. So why bother reporting…

That’s the benefit of open-source graphics card drivers! Or whatever AMD GPU users like saying when they barely even have an OpenCL stack still :stuck_out_tongue:


sumo_vid_mapping_entry implies it’s something with PowerPlay tables, or something with power management: linux/drivers/gpu/drm/radeon/sumo_dpm.h at master · torvalds/linux · GitHub

I wonder if forcing the highest-performance clock and power states might avoid the issue? I don’t know how it’s done today, but this is what I did with a RX 6600 XT: scripts:nightwane [RoE | Wiki] under the GPU tuning part. I’d skip the core/mem clock parts but everything else looks ok
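
For reference, a minimal sketch of how the performance level can be forced through sysfs from Python. It assumes the GPU is `card0` (it may be `card1` or another index on some systems) and that the script runs as root; the function name `force_performance_level` is made up for illustration. `power_dpm_force_performance_level` is the amdgpu sysfs file, and `auto`, `low` and `high` are among the values it accepts. Note that the traces above fire inside `kv_dpm_sw_init` while the module is loading, so a setting applied afterwards may well not affect the warnings themselves.

```python
from pathlib import Path

# Assumption: the Radeon is card0; adjust the index if the system has several GPUs.
DEFAULT_KNOB = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")

# Only the three basic levels are handled in this sketch.
ALLOWED = {"auto", "low", "high"}


def force_performance_level(level, knob=DEFAULT_KNOB):
    """Write `level` to the DPM knob and return the level that was set before."""
    if level not in ALLOWED:
        raise ValueError(f"unsupported level: {level!r}")
    knob = Path(knob)
    previous = knob.read_text().strip()
    knob.write_text(level + "\n")
    return previous
```

Calling `force_performance_level("high")` as root pins the highest state until the next reboot or until `"auto"` is written back; the setting is not persistent.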

Reported here

I’m getting a freeze/hang when suspending with a Radeon RX 6400 installed. Reverting to 6.8.7 doesn’t help, so could be a different issue as the UBSAN warnings seem harmless and unrelated.

Reported here: Hang/freeze at suspend with Radeon RX 6400 on Fedora 40 (kernel 6.8.x) (#3392) · Issues · drm / amd · GitLab .

No, I’m not familiar with GitLab and I reported it as confidential. And now I don’t know how to remove that flag. I prefer when my email address is not all over the internet.

GitLab doesn’t reveal people’s emails by default.

Apparently a patch was issued. I guess we’ll see in kernel 6.8.11.

There is a patched version here. I don’t know how to install it though.

https://koji.fedoraproject.org/koji/taskinfo?taskID=118011610

Either the patch wasn’t applied or it doesn’t work, because kernel 6.8.11, released today, is affected as well.

Back to kernel 6.8.7!

Added f40, kernel, radeon

Although I haven’t tested it, the entire 6.9 testing kernel series looks affected, starting with Linux v6.9.0-0.rc4.8cd26fd90c1a:

https://koji.fedoraproject.org/koji/buildinfo?buildID=2458949

* Thu Apr 18 2024 Fedora Kernel Team <kernel-team@fedoraproject.org> [6.9.0-0.rc4.8cd26fd90c1a.40]
- Turn on UBSAN for Fedora (Justin M. Forbes)
- Linux v6.9.0-0.rc4.8cd26fd90c1a

So would the 6.8.12 testing kernel:
https://koji.fedoraproject.org/koji/buildinfo?buildID=2458998

There is no trace in the changelogs that the patch was ever applied.

Can’t believe they finally fixed this in 6.10! Thanks nevertheless. Three months later…


* Tue May 14 2024 Fedora Kernel Team <kernel-team@fedoraproject.org> [6.10.0-0.rc0.a5131c3fdf26.2]
- Reset RHEL_RELEASE to 0 for 6.10 (Justin M. Forbes)
(...)
- Turn off some Fedora UBSAN options to avoid false positives (Justin M. Forbes)
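
Since the changelogs point at UBSAN being switched on and then partly off in the Fedora config, it can help to check which UBSAN options a given kernel was actually built with. Here is a minimal sketch, assuming the config is available as text (Fedora installs it as `/boot/config-<release>`); the function name `ubsan_options` is my own.

```python
def ubsan_options(config_text):
    """Return {option: value} for every CONFIG_UBSAN* entry in a kernel config."""
    suffix = " is not set"
    options = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("# CONFIG_UBSAN") and line.endswith(suffix):
            # Disabled options appear as "# CONFIG_FOO is not set".
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_UBSAN") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options
```

A call such as `ubsan_options(open("/boot/config-" + platform.release()).read())` (after `import platform`) lists the options for the running kernel. The array-index reports in this thread come from the bounds checker, so `CONFIG_UBSAN_BOUNDS` is the entry to look at first.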

And of course, 6.10 and 6.11 came with this new bug still plaguing the Radeon R7. Back to 6.8.7.