Unexpected memory starvation - earlyoom kills VM

Hello group,

I am experiencing an out-of-memory situation on my Fedora 32 laptop. Earlyoom is regularly killing a VM. I would appreciate some guidance on how to troubleshoot this issue.

Background:
For many years I have been running an employer-provided Windows installation in a VM on my Fedora laptop without any problems.
However, since I upgraded from Fedora 31 (a fresh install after it came out) to Fedora 32, my Windows VM regularly gets killed by earlyoom. earlyoom reports low memory:
earlyoom[1086]: mem avail: 363 of 15881 MiB ( 2.29%), swap free: 0 of 2047 MiB ( 0.00%)
earlyoom[1086]: low memory! at or below SIGTERM limits: mem 2.52%, swap 10.00%

and (understandably) kills my VM:
earlyoom[1086]: sending SIGTERM to process 1547 uid 107 “qemu-system-x86”: badness 352, VmRSS 5961 MiB
(full syslog extract below)

Since my system memory is 16GB and the swap is 2GB and the VM uses 6GB, I would expect something else to use 10-12GB of memory. But I cannot find anything specific.
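
A rough check of that estimate, as a sketch only: it takes the numbers from the earlyoom lines quoted above and subtracts the VM's resident size from everything in use (RAM that is not available, plus used swap). The function names are made up for illustration, and the result is approximate because VmRSS counts only the resident part of the VM, not anything of it that has been swapped out.

```python
import re

# Values copied from the earlyoom log lines quoted in this post.
MEM_LINE = "mem avail: 363 of 15881 MiB ( 2.29%), swap free: 0 of 2047 MiB ( 0.00%)"
VM_RSS_MIB = 5961  # VmRSS of qemu-system-x86 as reported by earlyoom


def parse_earlyoom_mem(line):
    """Return (mem_avail, mem_total, swap_free, swap_total) in MiB."""
    m = re.search(
        r"mem avail:\s*(\d+) of\s*(\d+) MiB.*swap free:\s*(\d+) of\s*(\d+) MiB",
        line,
    )
    if m is None:
        raise ValueError("not an earlyoom memory report line")
    return tuple(int(g) for g in m.groups())


def unaccounted_mib(line, vm_rss_mib):
    """Memory in use (RAM + swap) that the VM's resident size does not explain."""
    mem_avail, mem_total, swap_free, swap_total = parse_earlyoom_mem(line)
    ram_in_use = mem_total - mem_avail
    swap_in_use = swap_total - swap_free
    return ram_in_use + swap_in_use - vm_rss_mib


if __name__ == "__main__":
    print(unaccounted_mib(MEM_LINE, VM_RSS_MIB), "MiB not explained by the VM")
```

With the logged values this comes to 11604 MiB, which is where the 10-12GB figure comes from.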

After a fresh reboot the VM keeps running for a few hours or even days (this varies), but eventually it always gets killed. If I simply restart the VM, the period until the next kill is shorter. Eventually it gets killed while Windows is still booting. The only workaround seems to be rebooting the entire system, which hurts me deep inside.

A memory leak comes to mind. I do not know how to address such a problem or identify the process that is causing it. Any help would be appreciated.

I have not made any significant configuration changes to KVM, the VM, or Windows itself after the Fedora 32 upgrade. The VM is configured to use 6GB of memory.
Apart from the VM, the most important applications I am running are vscode, Chrome and Firefox (both with many tabs), Microsoft Teams, and often Text Editor as well.
All this should fit comfortably in my 16GB system memory. I have never before experienced these out-of-memory problems.

Syslog extract:
Jun 03 08:14:48 lt2785.lucasnet.local earlyoom[1086]: mem avail: 363 of 15881 MiB ( 2.29%), swap free: 0 of 2047 MiB ( 0.00%)
Jun 03 08:14:48 lt2785.lucasnet.local earlyoom[1086]: low memory! at or below SIGTERM limits: mem 2.52%, swap 10.00%
Jun 03 08:14:48 lt2785.lucasnet.local audit[1547]: AVC avc: denied { search } for pid=1547 comm=“qemu-system-x86” name=“1086” dev=“proc” ino=1460814 scontext=system_u:system_r:svirt_t:s0:c598,c788 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=dir permissive=0
Jun 03 08:14:49 lt2785.lucasnet.local earlyoom[1086]: sending SIGTERM to process 1547 uid 107 “qemu-system-x86”: badness 352, VmRSS 5961 MiB
Jun 03 08:14:49 lt2785.lucasnet.local kernel: virbr0: port 2(vnet0) entered disabled state
Jun 03 08:14:49 lt2785.lucasnet.local kernel: device vnet0 left promiscuous mode
Jun 03 08:14:49 lt2785.lucasnet.local kernel: virbr0: port 2(vnet0) entered disabled state
Jun 03 08:14:49 lt2785.lucasnet.local audit: ANOM_PROMISCUOUS dev=vnet0 prom=0 old_prom=256 auid=4294967295 uid=107 gid=107 ses=4294967295
Jun 03 08:14:49 lt2785.lucasnet.local earlyoom[1086]: process exited after 0.3 seconds
Jun 03 08:14:49 lt2785.lucasnet.local systemd[1]: machine-qemu\x2d1\x2dlt2785\x2dvm02.scope: Succeeded.
Jun 03 08:14:49 lt2785.lucasnet.local systemd[1]: machine-qemu\x2d1\x2dlt2785\x2dvm02.scope: Consumed 11h 46min 52.632s CPU time.
Jun 03 08:14:49 lt2785.lucasnet.local NetworkManager[1183]: [1591164889.3691] device (vnet0): state change: activated → unmanaged (reason ‘unmanaged’, sys-iface-state: ‘removed’)
Jun 03 08:14:49 lt2785.lucasnet.local NetworkManager[1183]: [1591164889.3699] device (vnet0): released from master device virbr0
Jun 03 08:14:49 lt2785.lucasnet.local systemd[1]: Starting Network Manager Script Dispatcher Service…
Jun 03 08:14:49 lt2785.lucasnet.local libvirtd[1344]: libvirt version: 6.1.0, package: 2.fc32 (Fedora Project, 2020-03-24-15:45:44, )
Jun 03 08:14:49 lt2785.lucasnet.local libvirtd[1344]: hostname: lt2785.lucasnet.local
Jun 03 08:14:49 lt2785.lucasnet.local libvirtd[1344]: internal error: End of file from qemu monitor
Jun 03 08:14:49 lt2785.lucasnet.local gnome-shell[2173]: Removing a network device that was not added
Jun 03 08:14:49 lt2785.lucasnet.local audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg=‘unit=NetworkManager-dispatcher comm=“systemd” exe=“/usr/lib/systemd/systemd” hostname=? addr=? terminal=? res=success’
Jun 03 08:14:49 lt2785.lucasnet.local gnome-shell[2173]: JS WARNING: [resource:///org/gnome/shell/ui/status/network.js 1925]: reference to undefined property “undefined”
Jun 03 08:14:49 lt2785.lucasnet.local gnome-shell[2173]: JS ERROR: TypeError: this._devices[section] is undefined
_connectionRemoved@resource:///org/gnome/shell/ui/status/network.js:1925:27
Jun 03 08:14:49 lt2785.lucasnet.local systemd[1]: Started Network Manager Script Dispatcher Service.
Jun 03 08:14:49 lt2785.lucasnet.local avahi-daemon[1081]: Interface vnet0.IPv6 no longer relevant for mDNS.
Jun 03 08:14:49 lt2785.lucasnet.local avahi-daemon[1081]: Leaving mDNS multicast group on interface vnet0.IPv6 with address fe80::fc54:ff:fe33:5232.
Jun 03 08:14:49 lt2785.lucasnet.local systemd-machined[1118]: Machine qemu-1-lt2785-vm02 terminated.
Jun 03 08:14:49 lt2785.lucasnet.local avahi-daemon[1081]: Withdrawing address record for fe80::fc54:ff:fe33:5232 on vnet0.
Jun 03 08:14:49 lt2785.lucasnet.local audit: BPF prog-id=43 op=UNLOAD
Jun 03 08:14:49 lt2785.lucasnet.local audit[1344]: VIRT_CONTROL pid=1344 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:virtd_t:s0-s0:c0.c1023 msg=‘virt=kvm op=stop reason=shutdown vm=“lt2785-vm02” uuid=c7d5950e-f823-4f47-9927-fe47aa3e430e vm-pid=-1 exe=“/usr/sbin/libvirtd” hostname=? addr=? terminal=? res=success’
Jun 03 08:14:49 lt2785.lucasnet.local systemd[2042]: gnome-launched-remote-viewer.desktop-3213.scope: Succeeded.
Jun 03 08:14:49 lt2785.lucasnet.local systemd[2042]: gnome-launched-remote-viewer.desktop-3213.scope: Consumed 34.739s CPU time.

Regards,
Lucas

Hello Lucas and welcome to the forum.

You can tweak earlyoom’s behavior, or disable it altogether and go back to using the kernel oom-killer.

My guess is that Chrome is hogging your memory and since it was decided that a user’s browser session is too precious to kill, your VM is paying the price.

Hi Alex,

Thank you for your reply. I will look into the possibilities to tweak or disable earlyoom.

However, I am not convinced that earlyoom is the problem. It does report low memory, and that sounds worrying. Of course I could stop it from killing my VM but that would not solve the root problem. I would like to know which process uses up all the memory, especially because I never saw this before.

Well, you can use tools like top, htop, ps, or System Monitor to see how much memory each process consumes. You can also stop the earlyoom service, run “earlyoom --dryrun” in a terminal, and keep using your computer until it triggers. Or you can leave things as they are and check memory use the moment the VM is killed.
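
If you would rather take a snapshot from a script (for example, to run it at intervals and compare), something along these lines might do. It is only a sketch: it reads the VmRSS field from /proc/<pid>/status on Linux, and the function names are invented for this example. Note that per-process resident sizes will not show memory held by the kernel itself or by tmpfs/shared memory, so a large gap between this list and what earlyoom reports would point away from ordinary processes.

```python
import os


def parse_status(status_text):
    """Extract (name, VmRSS in KiB) from the text of /proc/<pid>/status."""
    name, rss_kib = "?", 0
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # The line looks like "VmRSS:   123456 kB".
            rss_kib = int(line.split()[1])
    return name, rss_kib


def top_by_rss(limit=10, proc_dir="/proc"):
    """Return up to `limit` tuples (rss_kib, pid, name), largest first."""
    rows = []
    for entry in os.listdir(proc_dir):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_dir, entry, "status")) as f:
                name, rss_kib = parse_status(f.read())
        except OSError:
            continue  # process exited meanwhile, or is not readable
        rows.append((rss_kib, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    for rss_kib, pid, name in top_by_rss():
        print(f"{rss_kib / 1024:8.0f} MiB  pid {pid:<7} {name}")
```

Kernel threads have no VmRSS line and show up as 0, so they sort to the bottom.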
