The problem only seems to occur after I remove my external HDMI display and put my laptop to sleep (close the lid). I’d say it happens about 30% of the time. When it does:
The screen is completely black
I can’t switch to tty
There are no logs in journalctl after the initial call to suspend. This is my main problem as it’s making it impossible for me to debug.
This has been happening for about a month now. As of now I’ve tried:
Manually disconnecting the external monitor before closing the lid (i.e. xrandr --output $HDMI_MONITOR --off).
Switching from sddm to gdm (per a suggestion).
Disabling sleep/suspend.
So far nothing has worked.
Below is the output of inxi -Fzxx. Of note is that I’m running F37 with an Nvidia GPU and an AMD Ryzen CPU. My DE is KDE Plasma. All software is up to date.
Lately, when you use nvidia drivers, this problem pops up with alarming frequency. Check this forum.
Boot a live distribution and try to reproduce the problem with nouveau instead of the nvidia driver, then with only the amd gpu. Then report back.
If I recall correctly, the nvidia-suspend (& nvidia-hibernate) services are there to properly store the GPU RAM content at suspend time, and the nvidia-resume.service restores that data directly to the GPU so things resume exactly where they left off.
Also, the nvidia-powerd.service is there to power down the (mobile) GPU at suspend or hibernate time and power it back up at resume time.
It appears those services are working properly, but something else is causing the system to freeze after the sleep.
Thanks for the suggestion. Let me try with nouveau.
There is no need to check using the amd gpu since the problem only occurs after disconnecting an external monitor and going to sleep. The HDMI connects directly to my nvidia card so I can’t use an external monitor with my amd gpu.
ISTR that when suspending or resuming the kernel command line needs a parameter that begins with resume=.......
I don’t have a machine that I either suspend or resume, but surely someone will be able to assist with the exact parameter needed. It seems that missing parameter could be the cause of your errors.
Per this article on debugging suspend/hibernate issues I added the following boot options to the kernel cmdline: ignore_loglevel, no_console_suspend, and initcall_debug.
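To double-check that the options actually made it onto the running kernel's command line after a reboot, something like the rough sketch below can help. It only compares a cmdline string against the three options named above. The names `WANTED`, `missing_params` and `running_cmdline` are made up for illustration, and it assumes parameters are separated by plain whitespace (no quoted values containing spaces).

```python
from pathlib import Path

# The three debug options mentioned above. The names in this sketch are
# hypothetical helpers, not part of any tool.
WANTED = ("ignore_loglevel", "no_console_suspend", "initcall_debug")


def missing_params(cmdline, wanted=WANTED):
    """Return the options from `wanted` that are absent from a cmdline string."""
    # Parameters are whitespace-separated; drop any "=value" part before comparing.
    present = {token.split("=", 1)[0] for token in cmdline.split()}
    return [opt for opt in wanted if opt not in present]


def running_cmdline():
    """Read the command line of the currently running kernel (Linux only)."""
    return Path("/proc/cmdline").read_text()
```

Calling `missing_params(running_cmdline())` on the affected machine should return an empty list if all three options took effect.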
Next time it failed to wake from suspend I found the following logs in my journal (Note: I omitted a giant core dump):
1. Mar 17 14:13:03 fedora konsole[3307]: kf.xmlgui: Shortcut for action "" "Show Quick Commands" set with QAction::setShortcut()! Use KActionCollection::setDefaultShortcut(s) instead.
2. Mar 17 14:13:03 fedora konsole[3307]: kf.xmlgui: Shortcut for action "" "Show SSH Manager" set with QAction::setShortcut()! Use KActionCollection::setDefaultShortcut(s) instead.
3. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x20 [nvidia] returns -5
4. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
5. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
6. Mar 17 14:24:21 fedora kernel: PM: Some devices failed to suspend, or early wake event detected
7. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: amdgpu: Secure display: Generic Failure.
8. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
9. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
10. Mar 17 14:24:21 fedora kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).
The line numbers are mine. Of interest is line 5, nvidia 0000:01:00.0: PM: failed to suspend async: error -5. This seems to indicate that the problem may lie with Nvidia.
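For anyone wanting to pull the same kind of line out of a longer journal dump, here is a minimal sketch that scans log lines for the "failed to suspend" message and reports the driver, PCI address and error name. The regex is based only on the message format shown above, so it may miss other variants. The function and constant names are my own, and the errno table covers just the two codes seen in this log (on Linux, 5 is EIO and 110 is ETIMEDOUT).

```python
import re

# Linux errno names for the two codes that appear in the log above.
ERRNO_NAMES = {5: "EIO", 110: "ETIMEDOUT"}

# Matches lines such as:
#   kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
FAILED_SUSPEND = re.compile(
    r"kernel: (?P<driver>\S+) (?P<device>[0-9a-f:.]+): PM: failed to suspend"
    r"(?: async)?: error -(?P<code>\d+)"
)


def suspend_failures(lines):
    """Yield (driver, pci_address, errno_name) for each failed-suspend line."""
    for line in lines:
        match = FAILED_SUSPEND.search(line)
        if match:
            code = int(match.group("code"))
            # Fall back to the raw number for codes not in the small table.
            yield (
                match.group("driver"),
                match.group("device"),
                ERRNO_NAMES.get(code, str(code)),
            )
```

Fed the ten lines above, this should pick out only line 5, which points at the nvidia device at 0000:01:00.0 failing its suspend callback with an I/O error.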
I’m going to keep digging into this. I’m open to any thoughts/suggestions or help!