ZRAM appears to freeze my server

Note that I don’t need a solution for this. I’m posting it for comments and to see whether other people have a similar problem.

I run a Fedora system as my file server and email server with a couple of btrfs arrays.

I recently upgraded it to Fedora 33, and since then it has been freezing at random. Nothing gets written to the systemd journal, nothing appears on the console, and the machine is completely unresponsive to the USB keyboard. So I don’t have any debugging output to share.

But since I disabled zram, it hasn’t frozen again.

I’ve customized its memory settings quite a lot. Some of the more interesting changes are movablecore=8G on the kernel command line and strict memory overcommit accounting (vm.overcommit_memory = 2). See:

```
/etc/sysctl.d/10-dirty.conf:#                               512 MB
/etc/sysctl.d/10-dirty.conf:vm.dirty_background_bytes =  536870912
/etc/sysctl.d/10-dirty.conf:#                                 4 GB
/etc/sysctl.d/10-dirty.conf:vm.dirty_bytes            = 4294967296
/etc/sysctl.d/10-dirty.conf:vm.dirty_writeback_centisecs = 500
/etc/sysctl.d/10-dirty.conf:vm.dirty_expire_centisecs = 6000
/etc/sysctl.d/20-max-maps.conf:vm.max_map_count = 262144
/etc/sysctl.d/20-overcommit.conf:vm.overcommit_ratio = 52
/etc/sysctl.d/20-overcommit.conf:vm.overcommit_memory = 2
[...snip...]
/etc/sysctl.d/91-swappiness.conf:vm.swappiness=80
/etc/sysctl.d/91-swappiness.conf:vm.vfs_cache_pressure=1
```
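To confirm that drop-ins like these are actually in effect after boot, the live values can be read back from /proc/sys, where each dotted key maps to a path (vm.overcommit_memory lives at /proc/sys/vm/overcommit_memory, the same file `sysctl vm.overcommit_memory` reads). This is a minimal Python sketch. The helper names `sysctl_path` and `read_sysctls` are hypothetical, and the key list only mirrors the files quoted above, so the snipped entries are not included.

```python
from pathlib import Path

# Keys mirrored from the sysctl.d listing above (snipped entries omitted).
KEYS = [
    "vm.dirty_background_bytes",
    "vm.dirty_bytes",
    "vm.dirty_writeback_centisecs",
    "vm.dirty_expire_centisecs",
    "vm.max_map_count",
    "vm.overcommit_ratio",
    "vm.overcommit_memory",
    "vm.swappiness",
    "vm.vfs_cache_pressure",
]


def sysctl_path(key: str, root: str = "/proc/sys") -> Path:
    """Map a dotted sysctl key such as vm.swappiness to its /proc/sys file."""
    return Path(root, *key.split("."))


def read_sysctls(keys, root: str = "/proc/sys") -> dict:
    """Return {key: current value as a string}, or None if the file is absent."""
    values = {}
    for key in keys:
        path = sysctl_path(key, root)
        values[key] = path.read_text().strip() if path.exists() else None
    return values


if __name__ == "__main__":
    for key, value in read_sysctls(KEYS).items():
        print(f"{key} = {value}")
```

Comparing the printed values against the drop-in files shows whether a later file (or a package default) overrode one of them.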

Can you check the status of memory/swap?

```
free; swapon -s
```

What, like now? It’s been running fine for three and a half days at this point. But sure:

```
# free; swapon -s
              total        used        free      shared  buff/cache   available
Mem:       32610376     3625136      722680        1408    28262560    28268280
Swap:      16535548     1238016    15297532
Filename                                Type            Size    Used    Priority
/dev/nvme0n1p3                          partition       16535548        1238016 -2
```
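`swapon -s` prints the contents of /proc/swaps (sizes in KiB), so the same check can be scripted, for example to notice when a zram device shows up again after an update. Below is a minimal Python sketch that parses that table and flags zram devices, assuming they appear under the kernel’s usual /dev/zramN names. `SwapArea` and `parse_proc_swaps` are hypothetical helpers, not part of any existing tool.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass
class SwapArea:
    filename: str
    kind: str        # "partition" or "file"
    size_kib: int
    used_kib: int
    priority: int

    @property
    def is_zram(self) -> bool:
        # Assumption: zram swap devices are named /dev/zram0, /dev/zram1, ...
        return self.filename.startswith("/dev/zram")


def parse_proc_swaps(text: str) -> List[SwapArea]:
    """Parse /proc/swaps (the table `swapon -s` prints); sizes are in KiB."""
    areas = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore blank or malformed lines
        name, kind, size, used, prio = fields
        areas.append(SwapArea(name, kind, int(size), int(used), int(prio)))
    return areas


if __name__ == "__main__":
    swaps = Path("/proc/swaps")
    if swaps.exists():
        for area in parse_proc_swaps(swaps.read_text()):
            tag = "zram" if area.is_zram else "disk"
            print(f"{area.filename} [{tag}] {area.used_kib}/{area.size_kib} KiB, prio {area.priority}")
```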

I should check whether my atop records go back far enough to just before one of the hangs. Yes, there is a sample from about five minutes before. It shows:

```
MEM | tot    31.1G  | free  482.8M |  cache  19.5G | buff   38.4M  | slab    9.6G |  shmem   7.2M | vmbal   0.0M  | hptot   0.0M |  hpuse   0.0M |
SWP | tot    19.8G  | free   18.7G |               |               |              |               |               | vmcom   6.8G |  vmlim  35.9G |
PAG | scan   433/s  | steal  319/s |  stall    0/s |               |              |               |               | swin     1/s |  swout    2/s |
PSI | cs     0/0/0  | ms     0/0/1 |  mf     0/0/1 | is     2/1/2  | if     2/1/2 |               |               |              |               |
```
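Under vm.overcommit_memory = 2, the kernel’s overcommit-accounting documentation gives the ceiling as CommitLimit = (total RAM - huge pages) * overcommit_ratio / 100 + total swap. Plugging in the `free` total and overcommit_ratio = 52 only lands on atop’s vmlim of 35.9G if the swap total includes roughly 4 GiB on top of the nvme swap partition, which also matches atop’s 19.8G SWP total. That would be consistent with a zram device at Fedora 33’s default size (half of RAM, capped at 4 GiB), but the 4 GiB figure below is an assumption, not something the logs record. Here is a small Python check of the arithmetic:

```python
KIB_PER_GIB = 1024 * 1024


def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio, hugetlb_kib=0):
    """CommitLimit in KiB under vm.overcommit_memory=2 (overcommit_kbytes unset)."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


MEM_TOTAL = 32610376            # KiB, "total" column of `free`
SWAP_PARTITION = 16535548       # KiB, /dev/nvme0n1p3 from `swapon -s`
ZRAM_ASSUMED = 4 * KIB_PER_GIB  # ASSUMPTION: a 4 GiB zram swap device was active
RATIO = 52                      # vm.overcommit_ratio; atop shows hptot 0, so no huge pages

swap_with_zram = SWAP_PARTITION + ZRAM_ASSUMED
without_zram = commit_limit_kib(MEM_TOTAL, SWAP_PARTITION, RATIO)
with_zram = commit_limit_kib(MEM_TOTAL, swap_with_zram, RATIO)

print(f"swap total with zram:     {swap_with_zram / KIB_PER_GIB:.1f} GiB")  # compare atop SWP tot
print(f"CommitLimit without zram: {without_zram / KIB_PER_GIB:.1f} GiB")
print(f"CommitLimit with zram:    {with_zram / KIB_PER_GIB:.1f} GiB")      # compare atop vmlim
```

One side effect of strict accounting shows up here: any swap device raises CommitLimit, so a zram device adds to the limit even though the pages it stores are held in RAM.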