Due to an annoying memory leak in Firefox that seems to occur randomly every 24 h or so, my system has been hard-freezing a lot, requiring a hard reset every time.
I did some research and found out about systemd-oomd and got confused as to why it wasn’t killing Firefox, so I did some experiments.
When I run tail /dev/zero on my main desktop PC with systemd-oomd active, the system hard-freezes just like when the Firefox memory leak occurs. However, if I disable systemd-oomd and use earlyoom instead, the process is killed as you would expect.
If I try the same command on my laptop, which uses the default systemd-oomd, it works as expected and kills the process. Both my desktop PC and my laptop have NVIDIA GPUs running the proprietary drivers, and both have very similar packages installed.
I am very confused as to why systemd-oomd works on my laptop but not on my desktop. The only journal entries from systemd-oomd are from it starting and stopping whenever I boot or shut down my PC; there are no other logs.
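For anyone wanting to compare two machines like this, here is a rough sketch. It rests on an assumption of mine that the logs here do not confirm: that the difference lies in the inputs systemd-oomd acts on, namely swap usage and memory pressure (PSI). The function names are made up for illustration; the files read are the standard /proc/meminfo and /proc/pressure/memory.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any tool).
# Each takes an optional file argument so it can be pointed at a saved copy.

# Print SwapTotal in kB from a meminfo-style file (default: /proc/meminfo).
swap_total_kb() {
    awk '$1 == "SwapTotal:" { print $2 }' "${1:-/proc/meminfo}"
}

# Print the "full" avg10 value from a PSI file (default: /proc/pressure/memory).
psi_full_avg10() {
    awk '$1 == "full" { sub(/^avg10=/, "", $2); print $2 }' "${1:-/proc/pressure/memory}"
}

# Print both values together; arguments are the meminfo and PSI files (both optional).
oomd_inputs() {
    echo "SwapTotal: $(swap_total_kb "$1") kB"
    echo "PSI memory full avg10: $(psi_full_avg10 "$2")"
}
```

Calling oomd_inputs with no arguments on both machines gives a quick side-by-side. Running oomctl as well should show which cgroups systemd-oomd is actually monitoring on each machine.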
Not that I think it will be super useful, but here are some logs from triggering a memory leak with tail /dev/zero and then manually invoking the kernel OOM killer with Alt + SysRq + f:
aug 10 15:53:35 pc kernel: kworker/8:2 invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=-1, oom_score_adj=0
aug 10 15:53:35 pc kernel: Workqueue: events moom_callback
aug 10 15:53:35 pc kernel: oom_kill_process.cold+0xa/0xaa
aug 10 15:53:35 pc kernel: moom_callback+0x7a/0xb0
aug 10 15:53:35 pc kernel: [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
aug 10 15:53:35 pc kernel: [ 137814] 998 137814 4125 235 192 43 0 69632 0 -900 systemd-oomd
aug 10 15:53:35 pc kernel: oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-1000.slice/user@1000.service/app.slice/run-r668b349b9a9d4a959011d53056ab4fe7.service,task=tail,pid=140920,uid=1000
aug 10 15:53:35 pc kernel: Out of memory: Killed process 140920 (tail) total-vm:16345100kB, anon-rss:16118016kB, file-rss:120kB, shmem-rss:0kB, UID:1000 pgtables:31616kB oom_score_adj:200
aug 10 15:53:35 pc systemd[4172]: run-r668b349b9a9d4a959011d53056ab4fe7.service: Failed with result 'oom-kill'.
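To pull the interesting numbers out of a log like this, a small filter along these lines might be handy. It is only a sketch: oom_summary is a made-up name, and it assumes the "Out of memory: Killed process" line has the same layout as the one above and that the process name contains no spaces.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print one summary
# line per "Killed process" entry, with anon-rss converted from kB to GiB.
oom_summary() {
    # Extract pid, process name, and anon-rss (kB) from matching lines only.
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*anon-rss:\([0-9][0-9]*\)kB.*/\1 \2 \3/p' |
    # 1 GiB = 1048576 kB.
    awk '{ printf "pid=%s name=%s anon_rss_gib=%.1f\n", $1, $2, $3 / 1048576 }'
}
```

For example, journalctl -k | oom_summary would list every process the kernel OOM killer has killed according to the journal. For the log above, it reports that tail was holding roughly 15.4 GiB of anonymous memory when it was killed.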
I also found these issues which I think are related,
After two more hard freezes, even with earlyoom, I did some more research and experimenting. It turns out that my RAM is most likely failing or just bad, since turning off the XMP profile completely solved everything.
TL;DR: If you’re experiencing this, try turning off the XMP profile in the UEFI settings.