System freezes while swapping

Summary

Whenever the system attempts to access the swap partition, the system completely freezes for a long time.

Hardware

CPU: AMD Ryzen 7 3700X 8-Core Processor
Motherboard: Asus PRIME B450M-A
RAM: 32 GB
Storage: Samsung SSD 860 1TB (Fedora)
Storage: Samsung SSD 860 512GB (Windows)
GPU: Radeon RX 5700 XT

Operating system configuration

Plain Fedora 31 installation. The main partition and swap are both encrypted using dm-crypt.

# lvs
  /dev/sdc: open failed: No medium found
  /dev/sdc: open failed: No medium found
  LV   VG                    Attr       LSize    Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root fedora_localhost-live -wi-ao---- <914.77g                                                    
  swap fedora_localhost-live -wi-ao----   15.74g    

Problem description

I noticed that my computer completely hung when I had made a programming error in my code, causing the program to allocate infinite amounts of memory. When this happened, everything froze, and I was unable to move the mouse, or even ping the machine.

Since the problem was related to my program allocating a lot of memory, I wrote a small test program that mapped and accessed 32 GB of memory. I have included the program at the end of this post.

My test program can reliably reproduce the problem. Once the counter reaches around 27 GB or so (i.e. pretty close to the point where I’d expect it to start swapping), the machine hangs with the exact same behaviour. The first time this happened I waited to see if anything would change, and sure enough, after about 10 minutes the machine continued running as normal.

I then completely disabled swap and set /proc/sys/vm/overcommit_memory to 1. I then ran my test program again and this time the OOM killer killed the program at the time it would otherwise have started swapping. This confirms that the issue occurs when the system attempts to swap.
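
For anyone repeating this check, the swap and overcommit state can be confirmed from inside a program before each run. The following is only a sketch: the helper names meminfo_kb and overcommit_mode are made up for illustration, and it assumes the usual Linux /proc layout.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the value (in kB) of one /proc/meminfo field
 * such as "SwapTotal", or -1 if the file or the field cannot be read. */
long meminfo_kb(const char *key)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long value = -1;
    size_t keylen = strlen(key);
    while (fgets(line, sizeof line, f) != NULL) {
        /* Lines look like "SwapTotal:       16502780 kB". */
        if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
            if (sscanf(line + keylen + 1, "%ld", &value) != 1)
                value = -1;
            break;
        }
    }
    fclose(f);
    return value;
}

/* Hypothetical helper: return the current vm.overcommit_memory mode
 * (0, 1 or 2), or -1 if it cannot be read. */
int overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```

With swap disabled as described above, meminfo_kb("SwapTotal") should report 0 and overcommit_mode() should report 1.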

I have searched for similar issues, and I have seen people mention having problems with hangs when having swap on an encrypted LVM volume. I haven’t seen any explanations as to how to fix it though.

Is there anything I can test here? I’m out of ideas.

Test program source

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Request 32 GiB; the cast keeps the arithmetic in size_t. */
    size_t size = (size_t)32 * 1024 * 1024 * 1024;
    char *p = malloc(size);
    if(p == NULL) {
        fprintf(stderr, "more memory please\n");
        exit(1);
    }

    unsigned char v = 0;
    size_t pz = (size_t)getpagesize();
    /* Write one byte per page so that every page is actually backed by memory. */
    for(size_t i = 0 ; i < size ; i += pz) {
        p[i] = v++;
        /* Report progress every 1024 pages. */
        if(i % (pz * 1024) == 0) {
            printf("0x%016zx (of 0x%016zx)\n", i, size);
        }
    }

    return 0;
}
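
One further thing that could be tested is how long each individual page touch takes, to see whether the stall is one very long page fault or many moderately slow ones. The following is a rough sketch of the same loop with timing added. The names now_ms and touch_pages_timed are made up for illustration, and it uses sysconf(_SC_PAGESIZE) instead of getpagesize().

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in milliseconds. */
static double now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* Hypothetical helper: allocate `size` bytes and write one byte per page.
 * Stores the slowest single page touch (in ms) in *worst_ms and the byte
 * offset where it happened in *worst_off.
 * Returns 0 on success, -1 if the allocation fails. */
int touch_pages_timed(size_t size, double *worst_ms, size_t *worst_off)
{
    volatile unsigned char *p = malloc(size);
    if (p == NULL)
        return -1;

    size_t pz = (size_t)sysconf(_SC_PAGESIZE);
    *worst_ms = 0.0;
    *worst_off = 0;
    for (size_t i = 0; i < size; i += pz) {
        double start = now_ms();
        p[i] = (unsigned char)(i / pz);   /* forces the page to be backed */
        double elapsed = now_ms() - start;
        if (elapsed > *worst_ms) {
            *worst_ms = elapsed;
            *worst_off = i;
        }
    }
    free((void *)p);
    return 0;
}
```

Calling touch_pages_timed with the same 32 GB size as the test program and printing the two results after the machine recovers would show roughly where in the buffer the longest stall occurred.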

You could try installing and enabling the earlyoom package. It’s proposed as a default in Fedora 32, but it’s available and will run on Fedora 31.

Next, try a smaller swap partition. An alternative is to disable swap on disk and enable swap on ZRAM. You can install the zram-generator package. To activate it, run

sudo cp /usr/share/doc/zram-generator/zram-generator.conf.example /etc/systemd/zram-generator.conf

and edit the copy so that memory-limit = none and zram-fraction = 0.1, which will create a ~3G swap on ZRAM on your 32G RAM system. There’s actually nothing wrong with a fraction of 1, meaning a 1:1 ratio with RAM. The RAM is only used as swap as needed, it’s not a permanent reservation. But swap on ZRAM exchanges disk contention for CPU cycles (due to compression/decompression).

I suggest testing these things separately, just so you can see their effect discretely, but they are compatible with each other. This zram-generator is also being considered for Fedora 33, it just needs some tweaks so the config supports capping, rather than only a ratio. I have 1 out of 4 systems where it fails to set up, which might be a race.

Still another option is to experiment with systemd-run and its resource isolation options, e.g. you can limit the number of CPUs and the amount of RAM. Ideally the program should not overcommit, but yeah, it happens and then responsiveness implodes. The Workstation working group has been tracking this problem for a while and is developing a plan for the next several Fedora releases. It’s ongoing work for what turns out to be a complicated problem. Unquestionably, a workstation should retain a responsive GUI when unprivileged programs take excessive system resources.


Thank you for your suggestions.

If I understand you correctly, these suggestions focus on trying to avoid touching swap in the first place. However, that’s not the problem I’m experiencing.

I don’t have a problem with a process slowing down because it doesn’t have enough memory to work with. However, in this case the entire machine comes to a complete halt for minutes, just because one or two GB of data needs to be paged out. In particular, processes which were not paged out should continue running, and the machine should definitely be able to respond to ping (the ping response is sent by the kernel, which is never paged out, so it should not be affected by paging activity).

I think I understand generally what you’re experiencing. I can replicate that by building webkitgtk from source. Within a couple of minutes the entire system is hung, including the mouse pointer. That build process is getting all the resources it needs and does make progress. The problem is that GUI responsiveness is totally lost.

Now, while we might describe the same problem: lack of system responsiveness, my workload and your workload might be very different and might have entirely different behavior in important ways. And therefore it gets super complicated to talk about the specific workload effect on the system. This comes up all the time on the linux-mm@ list. You can have two totally different workloads that end up manifesting in the same way.

In my example, sometimes oom-killer is invoked. Sometimes it “never” is. Of course I don’t know how long it would really last, it’s just that after 30 minutes I give up and force power off. That is not a good user experience. But it’s also a known problem. Is it remarkably bad? I think so. I think it’s reasonable for a user to kill their system after 10 seconds, once the mouse pointer freezes. That’s just terrible.

What’s my workaround? I execute ninja without defaults. I set the number of jobs to 4 and eventually it finishes. And the system only occasionally stalls briefly. Should this somehow be negotiated between application and system intelligently and automatically? I think so. And some of that is work in progress.

That means the solution to the general problem, without optimizing for my case or your case, is to do better isolation to give minimum necessary resources to high priority processes that relate to system responsiveness (the GUI and everything that makes up that stack). While also getting some communication for apps to “request” resources in a smarter way. While also making out-of-memory managers smarter. The kernel oom-killer really only cares about kernel survival, it doesn’t know much about or care about user space, and especially it doesn’t care about user space responsiveness. It just doesn’t have enough information to assess that.

It will take a few more releases for this to get better. Right now the generic workaround is to force the process to use more reasonable resources somehow. Because yes, you almost certainly want to avoid heavy swap use because the ensuing congestion affects everything, even things that aren’t paged out. Only in the minor case of what I call “incidental” swap use is swap a good thing, where seldom used code or data is paged out, thereby giving active pages more RAM. The “non-incidental” or heavy swap use is apparently a problem. But as long as forward progress is being made, it’s not considered a bug, even if there are no more resources left for the GUI. Just because the GUI is not paged out doesn’t mean CPU or memory contention can’t bring the GUI to a halt.
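
One crude way to force a process to stay within reasonable limits is for the program to cap its own address space with setrlimit(RLIMIT_AS). This is a different mechanism from the systemd-run resource controls mentioned earlier, and it limits virtual address space rather than resident memory, so treat the following only as a sketch. The names cap_address_space and can_allocate are made up for illustration.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: lower the soft address-space limit of the calling
 * process to `bytes`. The hard limit is left alone.
 * Returns 0 on success, -1 on error. */
int cap_address_space(size_t bytes)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0)
        return -1;
    lim.rlim_cur = (rlim_t)bytes;
    return setrlimit(RLIMIT_AS, &lim);
}

/* Hypothetical helper: return 1 if `bytes` can be allocated, 0 otherwise.
 * One byte is written so that the allocation is not optimized away. */
int can_allocate(size_t bytes)
{
    volatile char *p = malloc(bytes);
    if (p == NULL)
        return 0;
    p[0] = 1;
    free((void *)p);
    return 1;
}
```

If the test program called cap_address_space() at the top of main with a limit below 32 GB, its malloc would be expected to return NULL, so it would print its "more memory please" message instead of pushing the machine into swap.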

It may be because the partition is encrypted.