Suspend leaves the computer unresponsive (amdgpu)

Hello, I decided to ask for help with this.

Since I built this PC in January 2024, I have been unable to suspend it. I finally got irritated enough to look into why.

I’m on Fedora Workstation 40, GNOME, Wayland, kernel 6.10.11.

Motherboard: MSI PRO B650-S WIFI
CPU: AMD Ryzen 5 7600X
GPU: ASRock Radeon RX 7600

When I hit suspend, the screen goes dark and the mouse and keyboard become unresponsive, but the PC stays on with all fans running. It stays like that indefinitely, until I hard-restart. After the restart, I find these logs:

sudo journalctl -p err -b -1 -r -k:

říj 05 19:24:27 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 19:24:27 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 19:24:24 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 19:24:24 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 19:24:20 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 19:24:20 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 19:24:17 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 19:24:17 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 19:24:13 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4390, emitted seq=4393
říj 05 19:24:09 fedora kernel: Bluetooth: hci0: Failed to set up firmware (-110)
říj 05 19:24:09 fedora kernel: Bluetooth: hci0: Failed to send wmt patch dwnld (-110)
říj 05 19:24:09 fedora kernel: Bluetooth: hci0: Execution of wmt command timed out
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: PM: failed to resume async: error -62
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume returns -62
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
říj 05 19:24:03 fedora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
říj 05 19:24:03 fedora kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
říj 05 19:23:56 fedora kernel: Non-boot CPUs are not disabled
říj 05 19:22:48 fedora kernel: hub 8-0:1.0: config failed, hub doesn't have any ports! (err -19)
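A note for anyone reading the log: the negative numbers in these kernel messages (for example `-62` and `-110`) appear to be ordinary Linux errno values. Below is a minimal Python sketch that decodes them. The table is hard-coded from the generic Linux errno numbering and only covers the codes that show up in this thread. The names `LINUX_ERRNO`, `CODE_RE` and `decode_errnos` are made up for this illustration and are not part of any tool.

```python
import re

# Linux errno numbers (generic numbering used on x86_64).
# Only the codes that appear in this thread's logs are listed.
LINUX_ERRNO = {
    5: ("EIO", "Input/output error"),
    19: ("ENODEV", "No such device"),
    22: ("EINVAL", "Invalid argument"),
    62: ("ETIME", "Timer expired"),
    110: ("ETIMEDOUT", "Connection timed out"),
    113: ("EHOSTUNREACH", "No route to host"),
}

# Hypothetical helper pattern: a minus sign followed by 1-3 digits that is
# not glued to a preceding word character (so "8-0:1.0" is ignored).
CODE_RE = re.compile(r"(?<![\w-])-(\d{1,3})\b")


def decode_errnos(line):
    """Return (code, name, description) for each known errno in a log line."""
    found = []
    for match in CODE_RE.finditer(line):
        code = int(match.group(1))
        if code in LINUX_ERRNO:
            name, description = LINUX_ERRNO[code]
            found.append((-code, name, description))
    return found


if __name__ == "__main__":
    sample = "amdgpu 0000:03:00.0: PM: failed to resume async: error -62"
    for code, name, description in decode_errnos(sample):
        print(f"{code}: {name} ({description})")
```

Read this way, the amdgpu resume failure (`-62`, ETIME) and the Bluetooth firmware errors (`-110`, ETIMEDOUT) are both timeouts: the driver waited for the hardware and got no answer in time. That narrows down what kind of failure this is, but not its cause.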

Searching for this problem, I found a somewhat similar thread here, but that one was apparently solved by kernel 6.7.7 :frowning_face:.

fastfetch:

OS: Fedora Linux 40 (Workstation Edition) x86_64
Host: MS-7E26 (1.0)
Kernel: Linux 6.10.11-200.fc40.x86_64
Uptime: 15 mins
Packages: 2572 (rpm), 15 (flatpak)
Shell: bash 5.2.26
Display (Q32G1WG4): 2560x1440 @ 60 Hz in 32″ [External]
DE: GNOME 46.5
WM: Mutter (Wayland)
WM Theme: Adwaita
Theme: Adwaita [GTK2/3/4]
Icons: Adwaita [GTK2/3/4]
Font: Cantarell (11pt) [GTK2/3/4]
Cursor: Adwaita (24px)
Terminal: terminator 3.12.6
Terminal Font: Mono (10pt)
CPU: AMD Ryzen 5 7600X (12) @ 5.45 GHz
GPU: AMD Radeon RX 7600 [Discrete]
Memory: 5.10 GiB / 62.48 GiB (8%)
Swap: 0 B / 8.00 GiB (0%)
Disk (/): 144.36 GiB / 464.17 GiB (31%) - btrfs
Disk (/mnt/backup-btrfs): 219.45 GiB / 465.76 GiB (47%) - btrfs
Disk (/mnt/backup-ext4): 48.00 KiB / 457.37 GiB (0%) - ext4
Disk (/mnt/data): 647.75 GiB / 792.79 GiB (82%) - ext4
Disk (/mnt/work): 28.37 GiB / 122.46 GiB (23%) - ext4
Local IP (wlp15s0): 192.168.68.117/24
Locale: en_US.UTF-8

What options do I have for debugging this? I would appreciate any ideas :slightly_smiling_face:.

There is a good chance that this is related to amdgpu regressions in the last few kernels. Do you still have kernel 6.10.9 installed on your system? The issue may not occur on that kernel version.

I do have 6.10.9 and switched to it, but it made no difference to the symptoms. It did change the logs slightly; I attach them below.

It makes sense, though, given that I’ve had these issues since January, so I did not really expect a slightly older kernel to fix it :frowning_with_open_mouth:.

sudo journalctl -p err -b -1 -r -k:

říj 05 20:15:32 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
říj 05 20:15:32 fedora kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
říj 05 20:15:32 fedora kernel: amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
říj 05 20:15:32 fedora kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
říj 05 20:15:32 fedora kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
říj 05 20:15:29 fedora kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <psp> failed -22
říj 05 20:15:29 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Failed to terminate tmr
říj 05 20:15:27 fedora kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -22
říj 05 20:15:27 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Fail to disable thermal alert!
říj 05 20:15:27 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:27 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:23 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:23 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:22 fedora kernel:         #5: 100% system,          0% softirq,          0% hardirq,          0% idle
říj 05 20:15:22 fedora kernel:         #4: 100% system,          0% softirq,          1% hardirq,          0% idle
říj 05 20:15:22 fedora kernel:         #3: 101% system,          0% softirq,          0% hardirq,          0% idle
říj 05 20:15:22 fedora kernel:         #2: 100% system,          0% softirq,          0% hardirq,          0% idle
říj 05 20:15:22 fedora kernel:         #1: 100% system,          0% softirq,          0% hardirq,          0% idle
říj 05 20:15:22 fedora kernel: CPU#8 Utilization every 4s during lockup:
říj 05 20:15:22 fedora kernel: watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [kworker/u48:99:4818]
říj 05 20:15:20 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:20 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:16 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:16 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:13 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:13 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:09 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:09 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:06 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:06 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:15:03 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:15:03 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:14:59 fedora kernel: [drm:amdgpu_mes_unmap_legacy_queue [amdgpu]] *ERROR* failed to unmap legacy queue
říj 05 20:14:59 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE
říj 05 20:14:56 fedora kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4494, emitted seq=4497
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: PM: failed to resume async: error -62
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: PM: dpm_run_callback(): pci_pm_resume returns -62
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
říj 05 20:14:45 fedora kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Failed to setup smc hw!
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: amdgpu: Failed to enable requested dpm features!
říj 05 20:14:45 fedora kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x00000006 SMN_C2PMSG_82:0x00000000
říj 05 20:14:45 fedora kernel: Bluetooth: hci0: Failed to send wmt func ctrl (-113)
říj 05 20:14:45 fedora kernel: Bluetooth: hci0: urb 00000000bb3292c6 submission failed (113)
říj 05 20:14:45 fedora kernel: Bluetooth: hci0: Failed to write uhw reg(-113)
říj 05 20:14:39 fedora kernel: Non-boot CPUs are not disabled
říj 05 20:14:11 fedora kernel: Bluetooth: hci0: Failed to set up firmware (-5)
říj 05 20:14:11 fedora kernel: Bluetooth: hci0: Failed to send wmt patch dwnld (-5)
říj 05 20:14:11 fedora kernel: Bluetooth: hci0: Wrong op received 6 expected 1
říj 05 20:14:07 fedora kernel: hub 8-0:1.0: config failed, hub doesn't have any ports! (err -19)
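Since the two boots produce long and mostly repetitive logs, a rough way to see what differs between them is to strip the timestamps and count identical messages. This is a minimal Python sketch, assuming each boot's `journalctl` output has been saved as plain text lines. The function names are invented for this example.

```python
from collections import Counter


def tally_kernel_messages(lines):
    """Count identical kernel messages, ignoring the timestamp and host prefix."""
    counts = Counter()
    for line in lines:
        # journalctl's default short format: "<timestamp> <host> kernel: <message>"
        _, separator, message = line.partition(" kernel: ")
        if separator:  # skip lines that are not kernel messages
            counts[message.strip()] += 1
    return counts


def only_in_second(first, second):
    """Messages that occur in the second tally but not in the first."""
    return sorted(set(second) - set(first))


if __name__ == "__main__":
    boot_a = [
        "Oct 05 19:24:27 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE",
        "Oct 05 19:24:24 fedora kernel: amdgpu 0000:03:00.0: amdgpu: MES failed to respond to msg=REMOVE_QUEUE",
    ]
    boot_b = [
        "Oct 05 20:15:32 fedora kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed",
    ]
    first, second = tally_kernel_messages(boot_a), tally_kernel_messages(boot_b)
    for message in only_in_second(first, second):
        print(message)
```

Messages that embed changing values, such as the `seq=` numbers in the sdma0 timeout line, will be reported as different even when they describe the same event, so the output still needs reading by eye.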

That’s too bad, David. You may be able to find an existing bug report below, or you could file a new one:
