Random errors during suspend with NVIDIA

My system suspends properly most of the time, but sometimes (randomly) it fails and shows these logs on a black screen:

(newest entries first; read from the bottom up)

8:33:02 PM kernel: PM: suspend exit
 8:33:02 PM kernel: random: crng reseeded on system resumption
 8:33:02 PM kernel: Restarting tasks ... done.
 8:33:02 PM kernel: OOM killer enabled.
 8:33:02 PM kernel: PM: resume devices took 0.918 seconds
 8:33:02 PM kernel: i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
 8:33:02 PM kernel: PM: Some devices failed to suspend, or early wake event detected
 8:33:02 PM kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
 8:33:02 PM kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
 8:33:02 PM kernel: nvidia 0000:01:00.0: PM: dpm_run_callback(): pci_pm_suspend returns -5
 8:33:02 PM kernel: nvidia 0000:01:00.0: PM: pci_pm_suspend(): nv_pmops_suspend [nvidia] returns -5
 8:33:02 PM kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
 8:33:02 PM kernel: queueing ieee80211 work while going to suspend
 8:33:02 PM kernel: printk: Suspending console(s) (use no_console_suspend to debug)
 8:33:02 PM kernel: Freezing remaining freezable tasks completed (elapsed 0.003 seconds)
 8:33:02 PM kernel: OOM killer disabled.
 8:33:02 PM kernel: Freezing user space processes completed (elapsed 0.002 seconds)
 8:33:00 PM kernel: Filesystems sync: 0.004 seconds
 8:33:00 PM kernel: PM: suspend entry (s2idle)
 8:33:00 PM kernel: random: crng reseeded on system resumption
 8:33:00 PM kernel: Restarting tasks ... done.
 8:33:00 PM kernel: OOM killer enabled.
 8:33:00 PM kernel: PM: resume devices took 0.903 seconds
 8:33:00 PM kernel: i915 0000:00:02.0: [drm] GT0: GUC: RC enabled
 8:33:00 PM kernel: PM: Some devices failed to suspend, or early wake event detected
 8:33:00 PM kernel: nvidia 0000:01:00.0: PM: failed to suspend async: error -5
 8:33:00 PM kernel: NVRM: GPU 0000:01:00.0: PreserveVideoMemoryAllocations module parameter is set. System Power Management attempted without driver procfs suspend interface. Please refer to the 'Configuring Power Management Support' section in the driver README.
 8:33:00 PM kernel: queueing ieee80211 work while going to suspend
 8:33:00 PM kernel: printk: Suspending console(s) (use no_console_suspend to debug)
 8:33:00 PM kernel: Freezing remaining freezable tasks completed (elapsed 0.001 seconds)
 8:33:00 PM kernel: OOM killer disabled.
 8:33:00 PM kernel: Freezing user space processes completed (elapsed 0.003 seconds)
 8:32:59 PM kernel: Filesystems sync: 0.015 seconds
 8:32:59 PM kernel: PM: suspend entry (s2idle)
 8:32:59 PM kernel: note: nvidia-sleep.sh[100691] exited with irqs disabled
 8:32:59 PM kernel: PKRU: 55555554
 8:32:59 PM kernel: CR2: ffffb056c0571010 CR3: 0000000168530000 CR4: 0000000000f52ef0
 8:32:59 PM kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 8:32:59 PM kernel: FS:  00007f6d24508740(0000) GS:ffff95283b280000(0000) knlGS:0000000000000000
 8:32:59 PM kernel: R13: ffff9526b2b930d8 R14: ffffb056f1bf3800 R15: ffff9526b2b93018
 8:32:59 PM kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 0000000000000004
 8:32:59 PM kernel: RBP: ffffb056f1bf39f0 R08: 0000000000000000 R09: ffffb056f1bf37fc
 8:32:59 PM kernel: RDX: 0000000000000000 RSI: ffff952539098d48 RDI: ffffb056c0571008
 8:32:59 PM kernel: RAX: 0000000000000000 RBX: ffff9526b2b93238 RCX: 0000000000000006
 8:32:59 PM kernel: RSP: 0018:ffffb056f1bf37f0 EFLAGS: 00010246
 8:32:59 PM kernel: Code: 00 48 8d b5 0c fe ff ff 48 89 df e8 8c c2 05 00 48 89 c2 48 85 c0 75 cc 41 83 c4 01 48 83 c3 10 41 83 fc 04 75 ad 49 8b 7d 00 <48> 8b 47 08 8b 90 70 02 00 00 85 d2 74 45 8d 4a ff 4c 89 f0 48 8d
 8:32:59 PM kernel: RIP: 0010:_nv000117kms+0xbe/0x140 [nvidia_modeset]
 8:32:59 PM kernel: ---[ end trace 0000000000000000 ]---
 8:32:59 PM kernel: CR2: ffffb056c0571010
 8:32:59 PM kernel:  acpi_thermal_rel joydev int340x_thermal_zone loop nfnetlink zram lz4hc_compress lz4_compress dm_crypt xe drm_ttm_helper gpu_sched drm_suballoc_helper drm_gpuvm drm_exec rtsx_usb_sdmmc mmc_core rtsx_usb i915 nvme nvme_core nvme_auth i2c_algo_bit drm_buddy ttm drm_display_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic hid_multitouch ucsi_acpi ghash_clmulni_intel sha512_ssse3 typec_ucsi sha256_ssse3 sha1_ssse3 cec typec vmd i2c_hid_acpi i2c_hid video wmi pinctrl_tigerlake serio_raw fuse
 8:32:59 PM kernel: Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nvidia_drm(POE) nvidia_modeset(POE) qrtr nvidia_uvm(POE) bnep sunrpc binfmt_misc nvidia(POE) rtsx_usb_ms btusb memstick uvcvideo btrtl btintel btbcm uvc videobuf2_vmalloc btmtk videobuf2_memops videobuf2_v4l2 bluetooth videobuf2_common videodev mc vfat fat snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_soc_hdac_hda snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci mt7921e snd_sof_xtensa_dsp mt7921_common snd_sof mt792x_lib mt76_connac_lib snd_sof_utils mt76 snd_soc_acpi_intel_match intel_uncore_frequency intel_uncore_frequency_common soundwire_generic_allocation intel_tcc_cooling x86_pkg_temp_thermal snd_soc_acpi
 8:32:59 PM kernel:  </TASK>

It says:
System Power Management attempted without driver procfs suspend interface.

But I can confirm that the nvidia-suspend, nvidia-resume, etc. services are all enabled.
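
For anyone who wants to check the same things on their own machine, here is a rough sketch. It assumes the unit names shipped with the NVIDIA driver packages (nvidia-suspend.service, nvidia-resume.service, nvidia-hibernate.service) and that /proc/driver/nvidia/params lists parameters in "Name: value" form. The nv_param helper is just a made-up name for this example.

```shell
# Extract one value from text in "Name: value" form, the layout assumed
# for /proc/driver/nvidia/params. nv_param is a hypothetical helper name.
nv_param() {
    # $1: parameter name, $2: text to search
    printf '%s\n' "$2" | awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

# Unit state as systemd sees it (prints one line per unit).
if command -v systemctl >/dev/null 2>&1; then
    systemctl is-enabled nvidia-suspend.service nvidia-resume.service \
        nvidia-hibernate.service || true
fi

# Value of the module parameter named in the NVRM message.
if [ -r /proc/driver/nvidia/params ]; then
    nv_param PreserveVideoMemoryAllocations "$(cat /proc/driver/nvidia/params)"
fi
```

Enabled units plus a value of 1 would match what is described above; it does not explain why the driver still reports a suspend attempted without the procfs interface.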

This error only occurs randomly; most of the time the system suspends properly. When it happens, the only thing I can do is force a power-off.

Also, sometimes (I guess it's related to the same issue) the system hangs for a long time with a blank screen when I click the Switch User option to move to a different user. Sometimes it comes back; other times I force a power-off and restart.

I am using an NVIDIA RTX 3050 laptop GPU.

❯ neofetch
             .',;::::;,'.                saikamal@fedora 
         .';:cccccccccccc:;,.            --------------- 
      .;cccccccccccccccccccccc;.         OS: Fedora Linux 41 (Workstation Edition) x86_64 
    .:cccccccccccccccccccccccccc:.       Host: Vivobook_ASUSLaptop K6500ZC_K6500ZC 1.0 
  .;ccccccccccccc;.:dddl:.;ccccccc;.     Kernel: 6.12.9-200.fc41.x86_64 
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.    Uptime: 1 hour, 12 mins 
.:ccccccccccccc;KMMc;cc;xMMc:ccccccc:.   Packages: 2994 (rpm), 48 (flatpak) 
,cccccccccccccc;MMM.;cc;;WW::cccccccc,   Shell: bash 5.2.32 
:cccccccccccccc;MMM.;cccccccccccccccc:   Resolution: 1920x1080 
:ccccccc;oxOOOo;MMM0OOk.;cccccccccccc:   DE: GNOME 47.3 
cccccc:0MMKxdd:;MMMkddc.;cccccccccccc;   WM: Mutter 
ccccc:XM0';cccc;MMM.;cccccccccccccccc'   WM Theme: Adwaita 
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;    Theme: Adwaita [GTK2/3] 
ccccc;0MNc.ccc.xMMd:ccccccccccccccc;     Icons: Adwaita [GTK2/3] 
cccccc;dNMWXXXWM0::cccccccccccccc:,      Terminal: gnome-terminal 
cccccccc;.:odl:.;cccccccccccccc:,.       CPU: 12th Gen Intel i5-12450H (12) @ 4.400GHz 
:cccccccccccccccccccccccccccc:'.         GPU: NVIDIA GeForce RTX 3050 Mobile 
.:cccccccccccccccccccccc:;,..            GPU: Intel Alder Lake-P GT1 [UHD Graphics] 
  '::cccccccccccccc::;,.                 Memory: 5998MiB / 15671MiB 


// lspci, GPU part
0000:01:00.0 3D controller: NVIDIA Corporation GA107M [GeForce RTX 3050 Mobile] (rev a1)

Same problem here.

❯ fastfetch
             .',;::::;,'.                 mhagnumdw@dwnote2
         .';:cccccccccccc:;,.             -----------------
      .;cccccccccccccccccccccc;.          OS: Fedora Linux 41 (Workstation Edition) x86_64
    .:cccccccccccccccccccccccccc:.        Host: 960XFH (P07ALQ)
  .;ccccccccccccc;.:dddl:.;ccccccc;.      Kernel: Linux 6.12.11-200.fc41.x86_64
 .:ccccccccccccc;OWMKOOXMWd;ccccccc:.     Uptime: 12 hours, 1 min
.:ccccccccccccc;KMMc;cc;xMMc;ccccccc:.    Packages: 2651 (rpm), 26 (flatpak)
,cccccccccccccc;MMM.;cc;;WW:;cccccccc,    Shell: zsh 5.9
:cccccccccccccc;MMM.;cccccccccccccccc:    Display (SDC4185): 2880x1800 @ 120 Hz (as 1648x1030) in 16" [Built-in]
:ccccccc;oxOOOo;MMM000k.;cccccccccccc:    DE: GNOME 47.3
cccccc;0MMKxdd:;MMMkddc.;cccccccccccc;    WM: Mutter (Wayland)
ccccc;XMO';cccc;MMM.;cccccccccccccccc'    WM Theme: Adwaita
ccccc;MMo;ccccc;MMW.;ccccccccccccccc;     Theme: Adwaita [GTK2/3/4]
ccccc;0MNc.ccc.xMMd;ccccccccccccccc;      Icons: Adwaita [GTK2/3/4]
cccccc;dNMWXXXWM0:;cccccccccccccc:,       Font: Cantarell (11pt) [GTK2/3/4]
cccccccc;.:odl:.;cccccccccccccc:,.        Cursor: Adwaita (24px)
ccccccccccccccccccccccccccccc:'.          Terminal: tilix 1.9.6
:ccccccccccccccccccccccc:;,..             Terminal Font: MesloLGS NF (10pt)
 ':cccccccccccccccc::;,.                  CPU: 13th Gen Intel(R) Core(TM) i7-13700H (20) @ 5.00 GHz
                                          GPU 1: NVIDIA GeForce RTX 4050 Max-Q / Mobile [Discrete]
                                          GPU 2: Intel Iris Xe Graphics @ 1.50 GHz [Integrated]
                                          Memory: 10.36 GiB / 30.90 GiB (34%)
                                          Swap: 1.68 GiB / 32.00 GiB (5%)
                                          Disk (/): 278.50 GiB / 499.00 GiB (56%) - btrfs
                                          Local IP (wlo1): 192.168.0.142/24
                                          Battery (SR Real Battery): 100% [AC Connected]
                                          Locale: en_US.UTF-8

Hello everyone,
I had this problem for a long time. On my Dell G3 3590 with a GTX 1660 Ti Max-Q, the system would go to suspend as usual but wake up unreliably, with the same symptoms you describe. Even when it did wake up, the discrete NVIDIA GPU failed to come back (nvidia-smi threw an error). This happened even with the latest driver and kernel.

A good starting point might be checking what sleep mode your system is using:

cat /sys/power/mem_sleep

If it’s s2idle (the active mode is the one shown in square brackets), the solution might be to switch from s2idle to deep sleep as per this, for the time being (see the note at the end).
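
As a small illustration of reading that file, the sketch below pulls out just the bracketed entry. The function name active_mem_sleep is made up for this example, and it assumes the usual layout of the file, such as "[s2idle] deep".

```shell
# Print the active suspend mode, i.e. the entry in square brackets.
# active_mem_sleep is a hypothetical helper name.
active_mem_sleep() {
    # $1: contents of /sys/power/mem_sleep, e.g. "[s2idle] deep"
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Only read the file if this kernel exposes it.
if [ -r /sys/power/mem_sleep ]; then
    active_mem_sleep "$(cat /sys/power/mem_sleep)"
fi
```

If deep does not appear in the file at all, the firmware probably does not offer that mode and the suggestion here may not apply.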

In my case, booting with kernel parameter mem_sleep_default=deep made the system wake up much more reliably and even faster than it used to in the s2idle mode. Occasionally, it may wake up to a blank screen, but it goes away after switching to vconsole and back.
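
For reference, one way this could be set up on Fedora, as a sketch rather than the only way: the snippet checks whether the running kernel was already booted with the parameter and, if not, prints a grubby command that should add it to all installed kernels. The helper name has_deep_default is made up for this example.

```shell
# Succeeds if the given kernel command line contains mem_sleep_default=deep.
# has_deep_default is a hypothetical helper name.
has_deep_default() {
    case " $1 " in
        *" mem_sleep_default=deep "*) return 0 ;;
        *) return 1 ;;
    esac
}

if has_deep_default "$(cat /proc/cmdline 2>/dev/null)"; then
    echo "mem_sleep_default=deep is already set"
else
    echo "not set; on Fedora it could be added with:"
    echo '  sudo grubby --update-kernel=ALL --args="mem_sleep_default=deep"'
fi
```

To try deep sleep once before making it permanent, writing the mode to the sysfs file (echo deep | sudo tee /sys/power/mem_sleep) should switch it until the next reboot, provided deep is listed there.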

Hope that works for you, too :blush: