Please, improve the S0ix experience under Linux

First, I should start with the obligatory LTT Modern Standby video:

Essentially, S0ix (also called Modern Standby) has been available for a while now.

The quick rundown of the history: the most common sleep mode used to be suspend-to-RAM (S3), but Intel decided it wanted laptops to behave more like phones, and now we have Modern Standby (S0ix).

The difference is that S3 relies mostly on the firmware and saves a lot of power, while S0ix relies mostly on the OS and uses slightly more power than S3 (assuming the OS manages to put every device into a low-power state).

If you watched the LTT video above, you know that even Microsoft has had trouble supporting S0ix properly (in their case, devices enter S0ix successfully but may wake up and never go back to sleep, draining the battery). And if Microsoft is having issues on Windows, you can bet Linux support is even worse (possible crashes, or devices not being put into a low-power mode, which drains the battery).

If your system uses S0ix, you should see this:

$ cat /sys/power/mem_sleep 
[s2idle] deep

as well as a message like this in the logs:

kernel: Low-power S0 idle used by default for system suspend

To force S3 sleep where the machine supports it (i.e. if deep appears in the output above), add mem_sleep_default=deep to the kernel parameters.
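As a minimal sketch of that check, the script below parses the contents of /sys/power/mem_sleep, where the entry in brackets is the mode currently selected. The helper names mem_sleep_active and mem_sleep_has_deep are made up for illustration; the grubby invocation is one common way to add a kernel parameter on Fedora and is only printed, not executed.

```shell
#!/bin/sh
# Print the currently selected suspend mode: the entry in [brackets].
mem_sleep_active() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Succeed if "deep" (S3) is offered at all, whether selected or not.
mem_sleep_has_deep() {
    case " $(printf '%s' "$1" | sed 's/[][]//g') " in
        *" deep "*) return 0 ;;
        *) return 1 ;;
    esac
}

if [ -r /sys/power/mem_sleep ]; then
    modes=$(cat /sys/power/mem_sleep)
    echo "active suspend mode: $(mem_sleep_active "$modes")"
    if mem_sleep_has_deep "$modes"; then
        echo "S3 is offered. To try it until the next reboot (as root):"
        echo "  echo deep > /sys/power/mem_sleep"
        echo "To make it the default on Fedora (as root):"
        echo "  grubby --update-kernel=ALL --args=mem_sleep_default=deep"
    else
        echo "S3 is not offered here, so mem_sleep_default=deep should have no effect."
    fi
fi
```

Writing to /sys/power/mem_sleep only lasts until reboot, which makes it a low-risk way to test S3 before touching the boot configuration.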

Intel has a blog post on how to achieve proper S0ix:

And an accompanying tool to help debug:


Since the kernel automatically enables S0ix when it finds support in the ACPI tables, I believe Fedora should do one of three things: support it better, disable it by default unless it’s the only option, or instruct users on how to disable it manually.

I am willing to write docs and even a Magazine post about the third option.


And if Microsoft is having issues on Windows, you can bet Linux support is even worse (possible crashes, or devices not being put into a low-power mode, which drains the battery).

Take a look at the top pinned comment for that video. You’ll see that this video caused quite a stir, and it even identified a bug within Windows:

The following behavior would change depending on whether or not the laptop was connected to power while going to the Sleep S0 state:

I think it’s sensationalist to assume that Windows OS bugs affect Linux as well. Modern Standby / s2idle does absolutely require that all the devices connected to the SoC are in the proper low-power state. If any one of them doesn’t reach that state, the SoC may not be able to reach its own low-power state. Contrast this with S3, where the BIOS forces all devices into a low-power state. The price of this “hammer” is longer resume latency.

The debug script you outlined for Intel systems is a great way to debug problems with power consumption.
Here’s a similar one that does this type of sleuth work for AMD systems:

At least for an AMD system, a properly configured kernel and BIOS will consume less power over s2idle/s0i3 cycles than it would for S3.

In my opinion, what we need in order to smooth out the rough edges is more data collected automatically by the OS, so that people know when there are problems and they can actually get fixed. Some ideas I have:

  1. We need a standard interface to report hardware sleep data. I sent some RFC patches to the mailing list a while back, but after the last round of feedback I never dusted them off. Someone (maybe me?) might want to pick these back up.
  2. We need software (systemd?) to keep a database of sleep information. The things that come to mind are: battery levels before and after suspend, time spent in suspend, time spent in hardware sleep, and a calculated rate of discharge.
  3. We need user-friendly software (somewhere in GNOME?) to analyze this data regularly and, if there is a problem, to make some noise in a way that is actionable. For example:
  • Rate of discharge was > X mW.
  • Time spent in HW sleep state was < Y%.
  • Kernel woke Z times over AAA seconds, which is too frequent and might indicate a BIOS or EC problem.

Then you can show a message like “You might have a problem with a device driver. Go to this wiki page for how to report a bug and capture debug data” or “You might have a bug in your BIOS or EC. Please go to URL to file a bug with your system vendor.”

  4. We need a way for users to opt into sharing this data anonymously with a central database. In the Windows world, Microsoft collects oodles of telemetry, and when there is a problem they go to the device vendor or system vendor, tell them there is a problem, and show the data to back it up. We need the same thing in the Linux world.
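The two derived metrics in the list above (rate of discharge and share of time in hardware sleep) are simple arithmetic once the raw numbers exist. The sketch below assumes battery energy is read in µWh (as exposed by /sys/class/power_supply/BAT0/energy_now on many laptops) and hardware sleep time in µs. The function names and the thresholds standing in for X and Y are hypothetical.

```shell
#!/bin/sh
# Hypothetical thresholds standing in for X and Y from the list above.
MAX_DISCHARGE_MW=500
MIN_HW_SLEEP_PCT=90

# Average discharge in mW from energy before/after suspend (µWh) and the
# suspend duration in seconds: mW = (before - after) * 3600 / seconds / 1000.
# The duration must be non-zero.
discharge_mw() {
    awk -v b="$1" -v a="$2" -v s="$3" 'BEGIN { printf "%.0f\n", (b - a) * 3.6 / s }'
}

# Percentage of the suspend spent in hardware sleep (both values in µs).
hw_sleep_pct() {
    awk -v hw="$1" -v total="$2" 'BEGIN { printf "%.0f\n", 100 * hw / total }'
}

# Turn the two metrics into a short verdict a UI could act on.
sleep_verdict() {
    if [ "$2" -lt "$MIN_HW_SLEEP_PCT" ]; then
        echo "low hardware sleep residency"
    elif [ "$1" -gt "$MAX_DISCHARGE_MW" ]; then
        echo "high discharge rate"
    else
        echo "ok"
    fi
}

# Made-up readings: 0.5 Wh lost over one hour, 27 min of a 30 min suspend
# spent in hardware sleep.
rate=$(discharge_mw 50000000 49500000 3600)   # 500
pct=$(hw_sleep_pct 1620000000 1800000000)     # 90
echo "rate=${rate} mW, hw sleep=${pct}%: $(sleep_verdict "$rate" "$pct")"
```

A real collector would record the readings right before suspend and right after resume. Where those hooks live (systemd, GNOME, or elsewhere) is exactly the open question in the list.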

Sorry, one more comment. These are Intel-isms. On AMD systems that support s2idle, you can’t use mem_sleep_default=deep. Here’s why:

$ cat /sys/power/mem_sleep
[s2idle]

I should add that NVIDIA apparently also has S0ix support in its driver, which looks like it needs to be enabled manually:

https://us.download.nvidia.com/XFree86/Linux-x86_64/525.89.02/README/powermanagement.html

It can be enabled with NVreg_EnableS0ixPowerManagement=1 and, as far as I can understand from that page, it conflicts with NVreg_PreserveVideoMemoryAllocations=1, which is enabled by default in the current driver from RPMFusion:

$ cat /usr/lib/modprobe.d/nvidia-power-management.conf 
#
# Save and restore all video memory allocations.
options nvidia NVreg_PreserveVideoMemoryAllocations=1
#
# The destination should not be using tmpfs, so we prefer
# /var/tmp instead of /tmp
options nvidia NVreg_TemporaryFilePath=/var/tmp

So, there’s a chance S0ix is broken by default for those users, and there’s no documentation on the Nvidia page on RPMFusion about S0ix.
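To see which values the loaded driver is actually using, a small sketch like the one below can help. It assumes the driver exposes its registry keys in /proc/driver/nvidia/params as `Name: value` lines; that format, and the helper name nvreg_value, are assumptions and should be checked against the installed driver.

```shell
#!/bin/sh
# Print the value of one "Name: value" entry read from standard input.
nvreg_value() {
    awk -F': *' -v key="$1" '$1 == key { print $2 }'
}

params=/proc/driver/nvidia/params   # assumed location, requires the nvidia module
if [ -r "$params" ]; then
    for key in EnableS0ixPowerManagement PreserveVideoMemoryAllocations; do
        echo "$key = $(nvreg_value "$key" < "$params")"
    done
fi
```

If the first key reports 0, the option named above would go into a file under /etc/modprobe.d as `options nvidia NVreg_EnableS0ixPowerManagement=1`. Whether the second option then has to be changed as well is the open question raised in this post.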

Just to second Mario’s comment (which is awesome) and add my perspective from supporting platforms at Lenovo. Most of our platforms (with the exception of a couple of workstations) are now S0ix only.

We did dual sleep support with both S3 and S0ix as an option in the BIOS (with S3 as ‘best effort’ for users who didn’t want to switch) on our Linux certified Intel platforms for a few years - and it was a nightmare.

Sleep issues are hard - regardless of S0ix or S3 - so it more than doubled the work. We were certifying with S0ix (the only mode Intel was supporting, and the default) and then doing more limited internal testing with S3.

We found many S3 issues would creep in with FW updates - devices would stop working on resume, the system wouldn’t sleep properly, and battery drain in a few cases was horrible. Getting fixes done took forever and we couldn’t delay FW updates for a sleep mode that was supposed to be ‘best effort’. Users were frustrated (understandably) and it was not a good experience for anybody. We were honestly trying to do the right thing - but it wasn’t working.

We made the decision to stop doing S3 support last year and to remove the option. Having it available just didn’t work well and I agree with Mario - we have to focus on getting S0ix working right (and largely it is - and when it isn’t we work on fixing it).

I know losing S3 is going to upset a few people - but it was a considered decision on our side and based on user experience and how to be able to deliver better Linux support more effectively. I think there are ways to hack around it but I suspect the cases when that gives you a better experience are few (and I’d rather fix the S0ix experience for those few cases on our platforms!)

Mark

I believe I should tag @mattdm here.

Improving hardware support, including proper S0ix sleep, should likely be part of the Fedora Strategy 2028.


You can, but better to reply to a strategy post where it is topical :)

Can you elaborate on this? I have two AMD systems that both report

s2idle [deep]

(AMD Ryzen Threadripper PRO 3945WX and Ryzen 7 5800).


I should clarify. My statement was on modern mobile APUs that support HW s2idle. If you check the FADT you’ll see Low Power Idle support on these and they don’t advertise S3.

Your system is a workstation CPU, it doesn’t support HW s2idle, and that’s why it shows [deep].

I proposed a patch series in the past to remove s2idle from the listing for this system (and others like it) but it was rejected.

Add one more: my AMD system reports the same as Matthew’s systems.

If you run the s2idle script I linked above it will explain why this system doesn’t support hardware s2idle. The Linux kernel will only set s2idle by default if the FADT indicates it should.
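For a quick look at what the FADT says without the full script, the sketch below reads the 32-bit Flags field of the table and tests the Low Power S0 Idle Capable bit. It assumes the ACPI specification's layout (Flags at byte offset 112, bit 21) and a little-endian host. The function name fadt_low_power_s0 is illustrative, and the table is normally readable only by root.

```shell
#!/bin/sh
# Succeed if bit 21 (Low Power S0 Idle Capable) is set in the FADT Flags value.
fadt_low_power_s0() {
    [ $(( ($1 >> 21) & 1 )) -eq 1 ]
}

facp=/sys/firmware/acpi/tables/FACP
if [ -r "$facp" ]; then
    # Read 4 bytes at offset 112 as one unsigned 32-bit integer.
    flags=$(od -An -tu4 -j112 -N4 "$facp" | tr -d ' ')
    if [ -n "$flags" ] && fadt_low_power_s0 "$flags"; then
        echo "FADT advertises Low Power S0 Idle: s2idle is the expected default"
    else
        echo "FADT does not advertise Low Power S0 Idle"
    fi
fi
```

This covers only the firmware flag. The linked script checks considerably more than this one bit.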

I’ve submitted an updated version of this here:
[PATCH v5 0/4] Add vendor agnostic mechanism to report hardware sleep (kernel.org)


I’ve filed this RFE with systemd for the ideas I have and how to do it: Introduce concept of suspend/resume with dark screen on wakeup · Issue #27077 · systemd/systemd (github.com)


My Ryzen 4800H laptop reports the same:

$ cat /sys/power/mem_sleep
s2idle [deep]

Your system doesn’t support modern standby. Run the script and it’ll explain why.

Thanks, that seems like a good start on standardizing how S0ix data is reported. I guess the Intel- and AMD-specific tools could use that if it makes it into the kernel.

However, what if s2idle is the selected sleep mode and S0ix is supported, but there is still very high battery drain while suspended? The suspend apparently worked, but the Intel tool says that some of the motherboard components (I don’t remember whether it was the north or south bridge) don’t support entering the lowest power state.

Because that’s the scenario that happens for me, and that’s why I’m forcing S3.

Your script says amd_pmc isn’t loaded but lsmod disagrees.

❌ PMC driver `amd_pmc` not loaded
$ lsmod | grep amd_pmc
amd_pmc                36864  0

It’s not bound to an ACPI device because one is missing. Your BIOS isn’t configured for modern standby.

For Intel - I would start by using the kernel module parameter for the intel-pmc-core driver that alerts you if you’re not reaching S0ix or PC10. The kernel log and the results of the suspend test script then need to be attached to a kernel bug, where a developer who can interpret them will need to help.
All of the things I mentioned above help the ecosystem raise awareness when there are problems.
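A rough sketch of that first Intel step follows. It assumes the parameter meant here is warn_on_s0ix_failures of the intel_pmc_core module and that the S0ix residency counter is exposed through debugfs at the path shown; both need root and may differ between kernel versions. PREV and s0ix_entered are names invented for this example.

```shell
#!/bin/sh
# Succeed if the S0ix residency counter (µs) advanced across a suspend cycle.
s0ix_entered() {
    [ "$2" -gt "$1" ]
}

param=/sys/module/intel_pmc_core/parameters/warn_on_s0ix_failures
resid=/sys/kernel/debug/pmc_core/slp_s0_residency_usec

if [ -r "$param" ]; then
    echo "warn_on_s0ix_failures is currently: $(cat "$param")"
    echo "to enable it (as root): echo Y > $param"
fi

if [ -r "$resid" ]; then
    now=$(cat "$resid")
    if [ -n "${PREV:-}" ]; then
        # PREV holds the counter value noted before suspending.
        if s0ix_entered "$PREV" "$now"; then
            echo "S0ix was reached during the last suspend"
        else
            echo "no S0ix residency gained: check the kernel log"
        fi
    else
        echo "residency now: $now (rerun after resume with PREV=$now)"
    fi
fi
```

A counter that does not move across a suspend is the kind of evidence worth attaching to the kernel bug along with the log.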

For AMD - as I said, if your system offers deep, it doesn’t support hardware s2idle. Manually picking s2idle in this case will not work.
The amdgpu driver will complain when you attempt to suspend, as will that test script.

FYI, these patches are on 6.4-rc1.

@jared maybe some design people can help with my GNOME ideas and this information.
