Randomly Freezing after Waking up from Sleep

The problem only seems to occur after I unplug my external HDMI display and put my laptop to sleep (by closing the lid). I’d say it happens about 30% of the time. When it does:

  1. The screen is completely black.
  2. I can’t switch to a TTY.
  3. There are no logs in journalctl after the initial call to suspend. This is my main problem, as it makes the issue impossible to debug.

This has been happening for about a month. I’ve tried:

  1. Manually disconnecting the external monitor before closing the lid (i.e. xrandr --output $HDMI_MONITOR --off).
  2. Switching from sddm to gdm (per a suggestion).
  3. Disabling sleep/suspend.

So far nothing has worked.

Below is the output of inxi -Fzxx. Of note: I’m running Fedora 37 with an NVIDIA GPU and an AMD Ryzen CPU, and my DE is KDE Plasma. All software is up to date.

I’d greatly appreciate any help!

System:
  Kernel: 6.1.14-200.fc37.x86_64 arch: x86_64 bits: 64 compiler: gcc
    v: 2.38-25.fc37 Console: pty pts/1 wm: kwin_x11 DM: SDDM Distro: Fedora
    release 37 (Thirty Seven)
Machine:
  Type: Laptop System: Acer product: Nitro AN515-44 v: V1.01 serial: <filter>
  Mobo: RO model: Stonic_RNS v: V1.01 serial: <filter> UEFI: Insyde v: 1.01
    date: 04/16/2020
Battery:
  ID-1: BAT1 charge: 41.1 Wh (100.0%) condition: 41.1/57.5 Wh (71.4%)
    volts: 16.6 min: 15.4 model: LGC AP18E8M serial: <filter>
    status: not charging
CPU:
  Info: 6-core model: AMD Ryzen 5 4600H with Radeon Graphics bits: 64
    type: MT MCP arch: Zen 2 rev: 1 cache: L1: 384 KiB L2: 3 MiB L3: 8 MiB
  Speed (MHz): avg: 1533 high: 3000 min/max: 1400/3000 boost: enabled cores:
    1: 1400 2: 1400 3: 1400 4: 3000 5: 1400 6: 1400 7: 1400 8: 1400 9: 1400
    10: 1400 11: 1400 12: 1400 bogomips: 71863
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: NVIDIA TU117M vendor: Acer Incorporated ALI driver: nvidia
    v: 525.89.02 arch: Turing pcie: speed: 5 GT/s lanes: 8 ports: active: none
    off: HDMI-A-1 empty: none bus-ID: 01:00.0 chip-ID: 10de:1f99
  Device-2: AMD Renoir vendor: Acer Incorporated ALI driver: amdgpu
    v: kernel arch: GCN-5 pcie: speed: 16 GT/s lanes: 16 ports: active: none
    off: eDP-1 empty: none bus-ID: 06:00.0 chip-ID: 1002:1636 temp: 36.0 C
  Device-3: Chicony HD User Facing type: USB driver: uvcvideo bus-ID: 3-3:3
    chip-ID: 04f2:b64f
  Display: x11 server: X.Org v: 1.20.14 with: Xwayland v: 22.1.8
    compositor: kwin_x11 driver: X: loaded: amdgpu,nvidia
    unloaded: fbdev,modesetting,nouveau,vesa alternate: nv dri: radeonsi
    gpu: amdgpu,nvidia,nvidia-nvswitch display-ID: :0 screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 92
  Monitor-1: HDMI-A-1 mapped: HDMI-0 note: disabled model: WH22FX9019
    res: 1920x1080 dpi: 102 diag: 546mm (21.5")
  Monitor-2: eDP-1 mapped: eDP-1-0 note: disabled pos: primary
    model: BOE Display 0x0818 res: 1920x1080 dpi: 142 diag: 395mm (15.5")
  API: OpenGL v: 4.6.0 NVIDIA 525.89.02 renderer: NVIDIA GeForce GTX
    1650/PCIe/SSE2 direct-render: Yes
Audio:
  Device-1: NVIDIA vendor: Acer Incorporated ALI driver: snd_hda_intel
    v: kernel pcie: speed: 8 GT/s lanes: 8 bus-ID: 01:00.1 chip-ID: 10de:10fa
  Device-2: AMD ACP/ACP3X/ACP6x Audio Coprocessor
    vendor: Acer Incorporated ALI driver: N/A pcie: speed: 16 GT/s lanes: 16
    bus-ID: 06:00.5 chip-ID: 1022:15e2
  Device-3: AMD Family 17h/19h HD Audio vendor: Acer Incorporated ALI
    driver: snd_hda_intel v: kernel pcie: speed: 16 GT/s lanes: 16
    bus-ID: 06:00.6 chip-ID: 1022:15e3
  Sound API: ALSA v: k6.1.14-200.fc37.x86_64 running: yes
  Sound Server-1: PulseAudio v: 16.1 running: no
  Sound Server-2: PipeWire v: 0.3.66 running: yes
Network:
  Device-1: Realtek vendor: Acer Incorporated ALI driver: r8169 v: kernel
    pcie: speed: 2.5 GT/s lanes: 1 port: 2000 bus-ID: 04:00.0 chip-ID: 10ec:2600
  IF: enp4s0 state: down mac: <filter>
  Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel pcie:
    speed: 5 GT/s lanes: 1 bus-ID: 05:00.0 chip-ID: 8086:2723
  IF: wlp5s0 state: up mac: <filter>
Bluetooth:
  Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus-ID: 1-4:3
    chip-ID: 8087:0029
  Report: rfkill ID: hci0 rfk-id: 5 state: down bt-service: disabled
    rfk-block: hardware: no software: yes address: see --recommends
Drives:
  Local Storage: total: 942.71 GiB used: 44.58 GiB (4.7%)
  ID-1: /dev/nvme0n1 vendor: Western Digital model: PC SN530
    SDBPNPZ-256G-1014 size: 238.47 GiB speed: 31.6 Gb/s lanes: 4
    serial: <filter> temp: 35.9 C
  ID-2: /dev/nvme1n1 vendor: Sabrent model: N/A size: 238.47 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 24.9 C
  ID-3: /dev/sda vendor: Crucial model: CT500MX500SSD1 size: 465.76 GiB
    speed: 6.0 Gb/s serial: <filter> temp: 24 C
Partition:
  ID-1: / size: 466.4 GiB used: 44.24 GiB (9.5%) fs: ext4 dev: /dev/dm-1
    mapped: luks-b909cc99-309f-49c2-b607-6b588c31f47c
  ID-2: /boot size: 973.4 MiB used: 327.9 MiB (33.7%) fs: ext4
    dev: /dev/nvme0n1p3
  ID-3: /boot/efi size: 1022 MiB used: 17.4 MiB (1.7%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 0 KiB (0.0%) priority: 100
    dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 45.0 C mobo: N/A
  Fan Speeds (RPM): N/A
  GPU: device: nvidia screen: :0.0 temp: 38 C device: amdgpu temp: 36.0 C
Info:
  Processes: 390 Uptime: 10h 48m Memory: 15 GiB used: 5.97 GiB (39.8%)
  Init: systemd v: 251 target: graphical (5) default: graphical Compilers:
  gcc: 12.2.1 clang: 15.0.7 Packages: pm: rpm pkgs: N/A note: see --rpm
  pm: flatpak pkgs: 11 Shell: Bash v: 5.2.15 running-in: konsole inxi: 3.3.25

What is the output of $ systemctl status nvidia-{hibernate,resume,suspend}.service?
Which GPU is normally used?

Lately this problem pops up with alarming frequency when the NVIDIA drivers are in use. Check this forum.
Boot a live distribution and try to reproduce the problem without the NVIDIA driver, using only nouveau, and then using only the AMD GPU. Then report back.

Here’s the output:

○ nvidia-hibernate.service - NVIDIA system hibernate actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-hibernate.service; enabled; preset: enabled)
     Active: inactive (dead)

○ nvidia-resume.service - NVIDIA system resume actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-resume.service; enabled; preset: enabled)
     Active: inactive (dead)

Mar 10 16:45:30 fedora systemd[1]: Starting nvidia-resume.service - NVIDIA system resume actions...
Mar 10 16:45:30 fedora suspend[16325]: nvidia-resume.service
Mar 10 16:45:30 fedora logger[16325]: <13>Mar 10 16:45:30 suspend: nvidia-resume.service
Mar 10 16:45:30 fedora systemd[1]: nvidia-resume.service: Deactivated successfully.
Mar 10 16:45:30 fedora systemd[1]: Finished nvidia-resume.service - NVIDIA system resume actions.
Mar 10 16:45:44 fedora systemd[1]: Starting nvidia-resume.service - NVIDIA system resume actions...
Mar 10 16:45:44 fedora suspend[16653]: nvidia-resume.service
Mar 10 16:45:44 fedora logger[16653]: <13>Mar 10 16:45:44 suspend: nvidia-resume.service
Mar 10 16:45:44 fedora systemd[1]: nvidia-resume.service: Deactivated successfully.
Mar 10 16:45:44 fedora systemd[1]: Finished nvidia-resume.service - NVIDIA system resume actions.

○ nvidia-suspend.service - NVIDIA system suspend actions
     Loaded: loaded (/usr/lib/systemd/system/nvidia-suspend.service; enabled; preset: enabled)
     Active: inactive (dead)

Mar 10 15:42:05 fedora logger[15887]: <13>Mar 10 15:42:05 suspend: nvidia-suspend.service
Mar 10 15:42:07 fedora systemd[1]: nvidia-suspend.service: Deactivated successfully.
Mar 10 15:42:07 fedora systemd[1]: Finished nvidia-suspend.service - NVIDIA system suspend actions.
Mar 10 15:42:07 fedora systemd[1]: nvidia-suspend.service: Consumed 1.902s CPU time.

Also, only the NVIDIA GPU is used.

If I recall correctly, the nvidia-suspend (and nvidia-hibernate) services save the contents of the GPU’s video memory at suspend time, and nvidia-resume.service restores that data to the GPU so things resume exactly where they left off.

Also, nvidia-powerd.service is meant to power the (mobile) GPU down and back up at suspend or hibernate and resume time.

It appears those services are working properly, but something else is causing the system to freeze after the sleep.
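One related detail that can be checked directly is whether the driver is actually configured to preserve video memory across suspend, which is what those services work with. A minimal sketch, assuming the proprietary driver exposes its module parameters in /proc/driver/nvidia/params and that the relevant option is NVreg_PreserveVideoMemoryAllocations (as described in NVIDIA’s power-management README). The function name is hypothetical:

```shell
# Hypothetical helper: report whether the NVIDIA driver is set to preserve
# video memory allocations across suspend. The params file defaults to the
# driver's /proc interface but can be overridden (e.g. for testing).
nvidia_preserve_vram_status() {
    params_file=${1:-/proc/driver/nvidia/params}
    if [ ! -r "$params_file" ]; then
        echo "unknown: $params_file not readable (is the nvidia module loaded?)"
        return 2
    fi
    # Lines look like "PreserveVideoMemoryAllocations: 0"; split on ": "
    value=$(awk -F': *' '$1 == "PreserveVideoMemoryAllocations" { print $2 }' "$params_file")
    case "$value" in
        1) echo "enabled" ;;
        0) echo "disabled" ;;
        *) echo "unknown: parameter not found"; return 2 ;;
    esac
}
```

Running `nvidia_preserve_vram_status` with no argument reads the live driver parameters. If it reports `disabled`, NVIDIA’s README describes enabling the option with a modprobe.d line such as `options nvidia NVreg_PreserveVideoMemoryAllocations=1` followed by a reboot; whether that changes anything for this particular freeze is untested here.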

Thanks for the suggestion. Let me try with nouveau.

There is no need to test with only the AMD GPU, since the problem only occurs after disconnecting an external monitor and going to sleep. The HDMI port is wired directly to my NVIDIA card, so I can’t use an external monitor with the AMD GPU.

ISTR that for suspend and resume the kernel command line needs a parameter that begins with resume=...
I don’t have a machine that I suspend and resume, but surely someone will be able to help with the exact parameter needed. That missing parameter could be the cause of your errors.

This link may help as well:
https://wiki.archlinux.org/title/Power_management/Suspend_and_hibernate
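For context, resume= tells the kernel which swap device holds a hibernation image; plain suspend-to-RAM (closing the lid) does not read it. The inxi output above also lists only a zram swap device, which lives in RAM and so cannot hold a hibernation image anyway. A quick way to see whether a given parameter is set on the running kernel is sketched below; the helper name is hypothetical and it reads /proc/cmdline by default:

```shell
# Hypothetical helper: print the value of a kernel command-line parameter
# such as resume=, or nothing if it is absent.
cmdline_param() {
    name=$1
    cmdline_file=${2:-/proc/cmdline}
    # Put one argument per line, then print whatever follows "name="
    tr ' ' '\n' < "$cmdline_file" | sed -n "s/^${name}=//p"
}
```

For example, `cmdline_param resume` prints the configured resume device, or an empty line if none is set.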

Forgot to check the Arch Wiki :scream:. Thanks for the reminder!

They linked to a good blog post that discusses best practices for debugging suspend/hibernate issues.

I’ve implemented some of the debugging methods. This should hopefully give me a better idea of what’s going on next time this occurs.

Will report back when I have something.

**Update**

It took a few days but it finally happened again.

Per this article on debugging suspend/hibernate issues, I added the following options to the kernel command line: ignore_loglevel, no_console_suspend, and initcall_debug.
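On Fedora, one way to add these options to every installed kernel is grubby, e.g. `sudo grubby --update-kernel=ALL --args="ignore_loglevel no_console_suspend initcall_debug"`, followed by a reboot. The sketch below only verifies afterwards which of the three options are actually active; the function name is hypothetical and it reads /proc/cmdline by default:

```shell
# Hypothetical helper: list the suspend-debugging options that are missing
# from the running kernel's command line. Prints nothing if all are set.
missing_suspend_debug_opts() {
    cmdline_file=${1:-/proc/cmdline}
    # Pad with spaces so each option is matched as a whole word
    cmdline=" $(cat "$cmdline_file") "
    for opt in ignore_loglevel no_console_suspend initcall_debug; do
        case "$cmdline" in
            *" $opt "*) ;;          # option present
            *) echo "$opt" ;;       # option missing
        esac
    done
}
```

An empty result from `missing_suspend_debug_opts` means all three options made it onto the command line of the booted kernel.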

The next time it failed to wake from suspend, I found the following logs in my journal (note: I omitted a giant core dump):

1. Mar 17 14:13:03 fedora konsole[3307]: kf.xmlgui: Shortcut for action  "" "Show Quick Commands" set with QAction::setShortcut()! Use KActionCollection::setDefaultShortcut(s) instead.
2. Mar 17 14:13:03 fedora konsole[3307]: kf.xmlgui: Shortcut for action  "" "Show SSH Manager" set with QAction::setShortcut()! Use KActionCollection::setDefaultShortcut(s) instead.
3. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend+0x0/0x20 [nvidia] returns -5
4. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x170 returns -5
5. Mar 17 14:24:21 fedora kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
6. Mar 17 14:24:21 fedora kernel: PM: Some devices failed to suspend, or early wake event detected
7. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: amdgpu: Secure display: Generic Failure.
8. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
9. Mar 17 14:24:21 fedora kernel: amdgpu 0000:06:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
10. Mar 17 14:24:21 fedora kernel: [drm:process_one_work] *ERROR* ib ring test failed (-110).

The line numbers are mine. Of interest is line 5, nvidia 0000:01:00.0: PM: failed to suspend async: error -5, which suggests the problem lies with the NVIDIA driver.
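The negative numbers in these lines are Linux errno values: -5 is EIO (I/O error) and -110, from the amdgpu lines, is ETIMEDOUT. To pull the failing devices out of a previous boot without scrolling through the whole journal, something like the following could be used. It is a sketch that assumes the default journalctl short output format shown above; the function name is hypothetical and only the two codes seen in this thread are named:

```shell
# Hypothetical helper: read kernel log lines on stdin and print every device
# that reported "PM: failed to suspend", with its error code decoded.
pm_suspend_failures() {
    # Capture driver name, PCI address and the numeric error code
    sed -n 's/.* kernel: \([^ ]*\) \([^ ]*\): PM: failed to suspend.*error -\([0-9]*\).*/\1 \2 \3/p' |
    while read -r driver dev code; do
        case "$code" in
            5)   name="EIO (I/O error)" ;;
            110) name="ETIMEDOUT (timed out)" ;;
            *)   name="errno $code" ;;
        esac
        echo "$driver $dev: -$code $name"
    done
}
```

Usage: `journalctl -k -b -1 | pm_suspend_failures` scans the kernel messages of the previous boot; for the log above it would report the nvidia device at 0000:01:00.0 with EIO.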

I’m going to keep digging into this. I’m open to any thoughts/suggestions or help!

**Update**

For those interested, the only way I was able to solve the problem was by reinstalling my OS.