Random resume (after suspend) issue on ThinkPad T14s AMD Gen3 (Radeon 680M, Ryzen 7)

Hi, for the last 2 or 3 weeks, maybe more (I don’t remember exactly), my laptop, a ThinkPad T14s AMD Gen 3 running Fedora 39 with GNOME on Wayland, has randomly failed to resume from suspend. When that happens, I have to do a hard reboot/reset.

These are the last lines in the journal before the failed resume:

janv. 27 23:43:42 t14s awatcher[5517]: [2024-01-27 22:43:42.757280 ERROR watchers::watchers] Error on active window iteration: org.freedesktop.DBus.Error.UnknownMethod: L’objet n’existe pas à l’emplacement « /org>
janv. 27 23:43:43 t14s systemd[1]: Reached target sleep.target - Sleep.
janv. 27 23:43:43 t14s systemd[1]: Starting systemd-suspend.service - System Suspend...
janv. 27 23:43:43 t14s rtkit-daemon[1524]: Successfully made thread 3838 of process 3792 (/usr/bin/gnome-shell) owned by '1000' high priority at nice level 0.
janv. 27 23:43:43 t14s systemd-sleep[224210]: Entering sleep state 'suspend'...
janv. 27 23:43:43 t14s kernel: PM: suspend entry (s2idle)

$ neofetch --off
stephane@t14s
-------------
OS: Fedora Linux 39 (Workstation Edition) x86_64
Host: 21CQCTO1WW ThinkPad T14s Gen 3
Kernel: 6.6.13-200.fc39.x86_64
Uptime: 10 mins
Packages: 3347 (rpm), 48 (flatpak)
Shell: zsh 5.9
Resolution: 1920x1200
DE: GNOME 45.3
WM: Mutter
WM Theme: Adwaita
Theme: Adwaita:dark [GTK2/3]
Icons: Adwaita [GTK2/3]
Terminal: tmux
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz
GPU: AMD ATI Radeon 680M
Memory: 7262MiB / 30847MiB

I searched Ask Fedora without success.

I also searched Red Hat Bugzilla. The only report I found that is slightly related to my problem is “Radeon on Wayland - Black screen/crash on suspend/resume”, but contrary to the description in that bug report, my problem only happens randomly. I’m not able to reproduce it with any certainty.

Am I the only one with this problem?

Do you have any idea of a direction I could explore to fix this problem?

Best regards,
Stéphane


Edit: links to two upstream issues

No issue for 5 days.

I don’t know whether the bug is fixed or not.

The problem happened to me again 8 times in 6 days!

-11 c7a8afbcd3e44e36bfa91446ef49df69 Thu 2024-02-01 15:21:24 CET Fri 2024-02-02 22:44:31 CET
 -9 7b790d9cfe004ded892b93dbae77bdfc Fri 2024-02-02 22:47:12 CET Fri 2024-02-02 22:48:22 CET
 -8 a60027b68f5246dc905f63af93566a7f Fri 2024-02-02 22:48:57 CET Sun 2024-02-04 14:51:22 CET
 -7 8e9ee5ba98b84a17bbb515a13154a36f Sun 2024-02-04 19:51:59 CET Sun 2024-02-04 19:52:47 CET
 -5 050bb8518def425dbfb52fda90a25a3e Tue 2024-02-06 10:16:33 CET Tue 2024-02-06 13:47:34 CET
 -3 cc05cfe24b584478a4b8e139612da385 Tue 2024-02-06 13:50:17 CET Tue 2024-02-06 18:27:58 CET
 -2 c7f63bb6cb5e44279900818eef988a9f Tue 2024-02-06 22:24:39 CET Wed 2024-02-07 17:36:26 CET
 -1 caf0e01f186440a99e1672276bf73541 Wed 2024-02-07 19:02:42 CET Thu 2024-02-08 00:03:15 CET
  0 81eafd16f0674394aa97a280ff4a9599 Thu 2024-02-08 07:53:40 CET Thu 2024-02-08 07:57:14 CET
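For anyone who wants to count boots over a period like this, here is a minimal sketch that parses the `journalctl --list-boots` listing above. The function names (`parse_boots`, `boots_started_since`) are my own, and it assumes each data line has exactly the ten whitespace-separated fields shown here. Keep in mind that a new boot is not proof of a failed resume; it only gives an upper bound on the number of forced resets.

```python
from datetime import datetime

# Listing copied from the post above.
SAMPLE_LISTING = """\
-11 c7a8afbcd3e44e36bfa91446ef49df69 Thu 2024-02-01 15:21:24 CET Fri 2024-02-02 22:44:31 CET
 -9 7b790d9cfe004ded892b93dbae77bdfc Fri 2024-02-02 22:47:12 CET Fri 2024-02-02 22:48:22 CET
 -8 a60027b68f5246dc905f63af93566a7f Fri 2024-02-02 22:48:57 CET Sun 2024-02-04 14:51:22 CET
 -7 8e9ee5ba98b84a17bbb515a13154a36f Sun 2024-02-04 19:51:59 CET Sun 2024-02-04 19:52:47 CET
 -5 050bb8518def425dbfb52fda90a25a3e Tue 2024-02-06 10:16:33 CET Tue 2024-02-06 13:47:34 CET
 -3 cc05cfe24b584478a4b8e139612da385 Tue 2024-02-06 13:50:17 CET Tue 2024-02-06 18:27:58 CET
 -2 c7f63bb6cb5e44279900818eef988a9f Tue 2024-02-06 22:24:39 CET Wed 2024-02-07 17:36:26 CET
 -1 caf0e01f186440a99e1672276bf73541 Wed 2024-02-07 19:02:42 CET Thu 2024-02-08 00:03:15 CET
  0 81eafd16f0674394aa97a280ff4a9599 Thu 2024-02-08 07:53:40 CET Thu 2024-02-08 07:57:14 CET
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_boots(listing):
    """Parse list-boots lines into (index, boot_id, first_entry, last_entry)."""
    boots = []
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 10:
            continue  # skip blank lines and any header line
        # Fields: index, id, weekday, date, time, tz, weekday, date, time, tz
        first = datetime.strptime(f"{parts[3]} {parts[4]}", TIME_FORMAT)
        last = datetime.strptime(f"{parts[7]} {parts[8]}", TIME_FORMAT)
        boots.append((int(parts[0]), parts[1], first, last))
    return boots


def boots_started_since(boots, since):
    """Count boots whose first journal entry is at or after `since`."""
    return sum(1 for _, _, first, _ in boots if first >= since)


boots = parse_boots(SAMPLE_LISTING)
print(boots_started_since(boots, datetime(2024, 2, 2)))
```

The time zone field is ignored here, which is fine as long as all entries share the same zone, as they do in this listing.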

My kernel version is 6.7.3-200.fc39.x86_64, and these are my mesa packages:

$ dnf list --installed | grep "mesa"
mesa-dri-drivers.i686                                23.3.5-1.fc39                                   @updates
mesa-dri-drivers.x86_64                              23.3.5-1.fc39                                   @updates
mesa-filesystem.i686                                 23.3.5-1.fc39                                   @updates
mesa-filesystem.x86_64                               23.3.5-1.fc39                                   @updates
mesa-libEGL.i686                                     23.3.5-1.fc39                                   @updates
mesa-libEGL.x86_64                                   23.3.5-1.fc39                                   @updates
mesa-libEGL-devel.x86_64                             23.3.5-1.fc39                                   @updates
mesa-libGL.i686                                      23.3.5-1.fc39                                   @updates
mesa-libGL.x86_64                                    23.3.5-1.fc39                                   @updates
mesa-libGL-devel.x86_64                              23.3.5-1.fc39                                   @updates
mesa-libGLU.x86_64                                   9.0.3-1.fc39                                    @fedora
mesa-libGLU-devel.x86_64                             9.0.3-1.fc39                                    @fedora
mesa-libOSMesa.i686                                  23.3.5-1.fc39                                   @updates
mesa-libOSMesa.x86_64                                23.3.5-1.fc39                                   @updates
mesa-libgbm.i686                                     23.3.5-1.fc39                                   @updates
mesa-libgbm.x86_64                                   23.3.5-1.fc39                                   @updates
mesa-libglapi.i686                                   23.3.5-1.fc39                                   @updates
mesa-libglapi.x86_64                                 23.3.5-1.fc39                                   @updates
mesa-libxatracker.x86_64                             23.3.5-1.fc39                                   @updates
mesa-va-drivers.i686                                 23.3.5-1.fc39                                   @updates
mesa-va-drivers.x86_64                               23.3.5-1.fc39                                   @updates
mesa-vulkan-drivers.i686                             23.3.5-1.fc39                                   @updates
mesa-vulkan-drivers.x86_64                           23.3.5-1.fc39                                   @updates

On Reddit, one person told me:

Is your swap size larger than or equal to your RAM? 9 times out of 10, all resume issues I’ve ever had were due to swap not being big enough to house everything in memory at the time of suspend, and it presented as intermittent failure on resume due to me only periodically using more memory than I had swap available. openSUSE has an option in AutoYaST to deal with this issue on installation, I’ve always wondered why Fedora never did. In any case, something to check out that may not be usually thought about.

I investigated and here are a few facts:

The RAM on my laptop: 30845 MiB

$ sudo parted -l
Model: SAMSUNG MZVL4512HBLU-00BL7 (nvme)
Disk /dev/nvme0n1: 512GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number  Start   End     Size    File system  Name                  Flags
 1      1049kB  630MB   629MB   fat32        EFI System Partition  boot, esp
 2      630MB   1704MB  1074MB  ext4
 3      1704MB  512GB   510GB


Model: Unknown (unknown)
Disk /dev/zram0: 8590MB
Sector size (logical/physical): 4096B/4096B
Partition Table: loop
Disk Flags:

Number  Start  End     Size    File system     Flags
 1      0.00B  8590MB  8590MB  linux-swap(v1)

Swap partition size: 8590MB.

Current RAM used on my desktop session:

top - 08:19:31 up 25 min,  2 users,  load average: 0.48, 0.58, 0.63
Tasks: 559 total,   1 running, 553 sleeping,   0 stopped,   5 zombie
%Cpu(s):  0.9 us,  0.4 sy,  0.0 ni, 98.5 id,  0.0 wa,  0.1 hi,  0.1 si,  0.0 st
MiB Mem :  30845.7 total,  14253.9 free,   7905.8 used,   9396.6 buff/cache
MiB Swap:   8192.0 total,   8192.0 free,      0.0 used.  22939.9 avail Mem
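The swap-versus-RAM comparison suggested in the Reddit comment can also be checked with a few lines of Python. This is only a sketch: the helper names are mine, and the sample text is not my real `/proc/meminfo` but the `top` figures above (30845.7 MiB RAM, 8192.0 MiB swap) converted to kB and rounded.

```python
# Approximate values derived from the `top` output above; an assumption,
# not a copy of the real /proc/meminfo on this laptop.
SAMPLE_MEMINFO = """\
MemTotal:       31585996 kB
SwapTotal:       8388608 kB
SwapFree:        8388608 kB
"""


def parse_meminfo(text):
    """Return {field name: value} for 'Name:   12345 kB' style lines."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_covers_ram(meminfo):
    """True if total swap is at least as large as total RAM."""
    return meminfo["SwapTotal"] >= meminfo["MemTotal"]


info = parse_meminfo(SAMPLE_MEMINFO)
print(swap_covers_ram(info))
```

On a live system the same functions should work with the output of `open("/proc/meminfo").read()` in place of the sample text.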

I’ll try to dig in that direction.

This is not a swap ‘partition’ but is instead a portion of RAM that is reserved and used as virtual swap. Fedora has used zram for swap for many years and has not created a physical swap partition by default for the same period.

Suspend does not power off the system. It only allows the hardware to be powered down, except for RAM, which remains powered on so that its contents are kept.
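To see which suspend variant a machine actually uses, the kernel exposes the available modes in `/sys/power/mem_sleep`, with the active one in square brackets. The journal in the first post shows `s2idle`. Below is a small sketch that extracts the active mode; the function name is mine, and the sample string is an assumed example of the file's content, not something read from the poster's laptop.

```python
def active_sleep_mode(mem_sleep):
    """Return the bracketed (currently selected) mode from mem_sleep text."""
    for token in mem_sleep.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError("no active mode marked with [brackets]")


# Assumed example content of /sys/power/mem_sleep.
print(active_sleep_mode("[s2idle] deep\n"))
```

On a real system you could pass `open("/sys/power/mem_sleep").read()` instead of the sample string.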

I think most cases of problems with resume from suspend have actually been the result of losing the GPU config and its RAM content. It seems many systems power off the GPU as part of suspend, so loss of its config and memory content, combined with failing to restore the graphics config when resuming, is a major factor in failure to resume properly from suspend.

I think I have this bug: 2262577 – kernel-6.7.3 broke suspend (QCNFA765 ath11k while bluetooth is enabled)

See also Asus Zephyrus G14 GA402 - Suspend not working reliably since Kernel 6.6.8 (#3132) · Issues · drm / amd · GitLab

I have a P14s Gen4 (so pretty much the same setup) and I am not seeing this problem. My machine maybe failed once to resume from suspend.
I am much more affected by the WiFi issue you mentioned.

After investigation, I think I’m the victim of two bugs:

  • the first is the one described here:

Since kernel 6.6.8, I’ve been having suspend issues. Sometimes, a suspend request would result in the screen blanking, but the power LED remains lit. Other times, suspend would occur, but randomly, the system wakes itself (power LED is solid white), and eventually the fans turn on to full speed and the system gets very warm. A long press of the power button shuts it down, and it reboots normally.
After a bit of experimenting, the bad suspend only occurs on lid close. Suspend works normally if a suspend is requested by pressing the power button. This behaviour has been confirmed by other Asus G14GA402 users, as well one Asus TUF Gaming A16 Advantage Edition FA617NS user.
There was no issue with suspend on kernel 6.6.7 and lower. The issue has persisted through 6.6.8/9/11/13 and 6.7.2.

Kernel 6.6.8 was released to “stable” on 2023-12-25. I think that’s when I started having the problems that I considered random.
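To check whether a given kernel falls in the range mentioned in that report (fine on 6.6.7 and lower, broken from 6.6.8 on), a version comparison like the following could be used. It is a sketch with names of my own choosing, and it only encodes what the quoted report says; it knows nothing about which later kernel, if any, fixes the issue. The `6.6.7-200.fc39.x86_64` string in the example is a hypothetical release name used for illustration.

```python
import re

# First kernel version the quoted report describes as affected.
FIRST_REPORTED_BAD = (6, 6, 8)


def kernel_version(release):
    """Extract (major, minor, patch) from a string like '6.7.3-200.fc39.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def in_reported_range(release):
    """True if the release is at or above the first version reported as bad."""
    return kernel_version(release) >= FIRST_REPORTED_BAD


for release in ("6.6.7-200.fc39.x86_64", "6.6.13-200.fc39.x86_64", "6.7.3-200.fc39.x86_64"):
    print(release, in_reported_range(release))
```

The running kernel's release string can be obtained with `platform.release()` from the standard library.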

Summary of QCNFA765 ath11k problems kernel-6.7.4

The current 6.7.x suspend crashes are intertwined with the long standing packet loss and latency problems we’ve been seeing with QCNFA765 Linux ath11k.

Kernel 6.4.12-6.6.14 all had the same problem where you need the iw dev wlp1s0 set power_save off workaround to prevent crippling packet losses and slow speeds. With the power_save workaround applied this wifi adapter was mostly tolerable.
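The power_save workaround quoted above can be scripted if you want to apply it from a helper. This is a minimal sketch: the function names are mine, the interface name `wlp1s0` is the one from the quoted summary and may differ on your machine, and running the command will most likely need root privileges.

```python
import subprocess


def power_save_command(interface, enabled):
    """Build the `iw` command line that switches Wi-Fi power saving on or off."""
    return ["iw", "dev", interface, "set", "power_save", "on" if enabled else "off"]


def set_power_save(interface, enabled):
    """Run the command; raises CalledProcessError if `iw` reports a failure."""
    subprocess.run(power_save_command(interface, enabled), check=True)


# Only print the command here; call set_power_save("wlp1s0", False) to apply it.
print(" ".join(power_save_command("wlp1s0", False)))
```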

Kernel-6.7.3 broke suspend.

Kernel-6.7.4 included a partial fix.

I think I have this issue since 2024-02-06 when this kernel-6.7.3-200.fc39 package was published in stable.

I can confirm that the issue described also happens with my laptop (Thinkpad T14s with 32GB, Ryzen 7) after upgrading to Fedora 39.

I am using kernel 6.7.4 with wireless and Bluetooth enabled.
It does not seem to be related to RAM, since I had this issue with many GB of RAM free.

Before the upgrade, on Fedora 38, suspend was working properly.

OS: Fedora Linux 39 (Workstation Edition) x86_64 
Host: 21CQCTO1WW ThinkPad T14s Gen 3 
Kernel: 6.7.4-200.fc39.x86_64 
Uptime: 57 mins 
Packages: 2276 (rpm), 24 (flatpak) 
Shell: bash 5.2.26 
Resolution: 1920x1080 
DE: GNOME 45.4 
WM: Mutter 
WM Theme: Adwaita 
Theme: Adwaita [GTK2/3] 
Icons: Adwaita [GTK2/3] 
Terminal: gnome-terminal 
CPU: AMD Ryzen 7 PRO 6850U with Radeon Graphics (16) @ 4.768GHz 
GPU: AMD ATI Radeon 680M 
Memory: 18007MiB / 30845MiB 

Even with wireless and Bluetooth disabled, I still get suspend issues.

There are two issues mentioned in this thread, and some diagnostic workarounds (disabling wifi, using suspend from keyboard rather than closing lid). Too often there are delays in updates for vendor firmware after a new kernel version appears, so I find it important to have USB dongles for wifi and bluetooth for times when the internal versions are failing.

Yesterday I updated to kernel 6.7.5 and I have not seen this issue occur again. Maybe it’s fixed.

Unfortunately, it is not fixed yet. While the new kernel seems to fix the issue for some hardware configurations, there are others where it still doesn’t work.

Yes, I spoke too soon, but it seems to happen less frequently.

We need more information. You may be able to find journalctl entries associated with the issue(s). Look for strings associated with suspend and resume so you can skip to the time period when “interesting” stuff happens, then look for differences between successful and failed instances.
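One way to do that search, sketched in Python: scan the kernel messages for `PM: suspend entry` lines that are never followed by a `PM: suspend exit` line, which is the message I would expect the kernel to log on a successful resume. The function name is mine. In the sample, the last two lines come from the journal in the first post, while the first two (a successful cycle) are made up for illustration.

```python
def unmatched_suspend_entries(lines):
    """Return 'PM: suspend entry' lines with no later 'PM: suspend exit'."""
    pending = None
    unmatched = []
    for line in lines:
        if "PM: suspend entry" in line:
            if pending is not None:
                unmatched.append(pending)  # previous suspend never resumed
            pending = line
        elif "PM: suspend exit" in line:
            pending = None  # the pending suspend resumed normally
    if pending is not None:
        unmatched.append(pending)
    return unmatched


SAMPLE_JOURNAL = [
    # Hypothetical successful suspend/resume cycle, invented for this example.
    "janv. 27 21:10:02 t14s kernel: PM: suspend entry (s2idle)",
    "janv. 27 21:55:40 t14s kernel: PM: suspend exit",
    # Lines from the journal in the first post (failed resume).
    "janv. 27 23:43:43 t14s systemd-sleep[224210]: Entering sleep state 'suspend'...",
    "janv. 27 23:43:43 t14s kernel: PM: suspend entry (s2idle)",
]

for line in unmatched_suspend_entries(SAMPLE_JOURNAL):
    print(line)
```

The input could come from something like `journalctl -k -b -1` (kernel messages of the previous boot), split into lines. A suspend entry at the very end of a boot's log can also mean the machine was simply still asleep when it lost power, so treat the result as a hint about where to look, not as a diagnosis.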

Here, I read:

Thanks Justin, that works. TIL
I rebooted into 6.7.5-201 and it seems consistent:

  • I ran amd_s2idle.py --count 4 and it didn’t break;
  • I closed the lid three times and it didn’t break;
  • I suspended from the Gnome menu two times and it didn’t break;

It looks good so far.

So I think I need to wait for 6.7.5-201 to reach stable before upgrading.