You seem to be looking into only what happens as a consequence of being out of memory. You should be looking into what was tying up the memory.
The top program is a tool often used for that, but its UI is not at all obvious if you’re not used to it:
Press f to get into the field list
Use arrow keys to move to the RES field
Press s to select that for sorting
Press Esc to return to the main display; you will then see which processes use the most memory.
Many other tools can get you the same information.
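If you prefer a one-shot command to the interactive steps above, a sketch using the standard procps tools prints the same ranking (the `head` counts are arbitrary):

```shell
# Header plus the 10 processes with the largest resident set size (RSS):
ps -eo pid,rss,comm --sort=-rss | head -n 11

# Equivalent one-shot with top in batch mode (procps-ng):
#   top -b -o RES -n 1 | head -n 17
```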
Once you know which process is the problem, you can ask a better question and/or diagnose further.
In my own use, many web sites I leave open for long periods cause Mozilla to make the Xorg process grow without limit. When that happens, I always notice the system getting sluggish before it is too late, so I close and reopen Mozilla, which causes Xorg to shrink. Your situation is likely something else. It might be much more complicated, but it is probably equally simple, so first identify one process that is using far more memory than it should.
Thanks, I have toggled the virtual memory column in GNOME System Monitor, and Xorg is using 26 GB of virtual memory (and 51 MB of resident memory). If I did my research well, virtual memory should not concern me. Everything else looks like it is using a normal amount of memory (PhpStorm at the top with 2.7 GB, gnome-shell second with 480 MB, …).
In tree mode in htop, PID 1 (systemd) shows 19 GB out of 32 GB in the RES column (this should be my userspace, right?) and 169 M in the VIRT column. I started Docker Desktop, which is set to use at most 4 GB of RAM, and it instantly crashed 4 times in a row when it finished initializing; the 5th time was the lucky one when it did not crash. But I did not see any abnormality in htop or GNOME System Monitor.
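A large figure on PID 1 in htop's tree view is usually an aggregate of everything running below it rather than systemd's own footprint. To see which individual processes account for that total, one rough cross-check (an assumption-free pipeline over `ps` output, not an htop feature) is to sum RSS per command name:

```shell
# Sum resident memory (RSS, reported by ps in KiB) per command name, largest first.
ps -eo rss=,comm= | awk '{ sum[$2] += $1 }
  END { for (c in sum) printf "%10.1f MiB  %s\n", sum[c] / 1024, c }' |
  sort -rn | head
```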
On one hand, all current releases including F37 are tested with, and support, only 6.0.X at the moment. On the other hand, I cannot find any testing build for 6.1.X with F37 in bodhi at the moment. Currently, 6.1.X is only tested against Rawhide/F38 (and those are builds for testing, not production). So the kernel/release combination you deployed does not seem to be from our build system, is it?
Therefore, I suggest first trying the currently supported kernel of Fedora 37: kernel-6.0.8-300.fc37. Testing this would at least indicate whether the issue is caused by the kernel.
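A sketch of checking and switching the kernel (the commented commands assume Fedora's dnf and GRUB defaults and must be run as root):

```shell
# Confirm which kernel is actually running:
uname -r

# Install the supported F37 kernel alongside the current one, then pick it
# in the GRUB menu on the next boot (sketch, run as root):
#   dnf install kernel-6.0.8-300.fc37
#   rpm -q kernel      # verify which kernel packages are installed
```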
It is a very common misunderstanding of memory use to think the cache is part of the problem and/or that dropping the cache can help.
I don’t have either htop or GNOME System Monitor installed, so I can’t run them to make sure I really understand what the numbers you mentioned signify. But those don’t sound normal, and the problem can likely be found there.
Regarding the kernel
I will try to downgrade it to see how it behaves. Thanks for the tip.
History of changes I did from clean install
I had some issues with the older kernel when running Xorg on NVIDIA (choppy scrolling on some websites, window tearing when moving windows around while YouTube was playing something).
Of course, some apps just do not work correctly on Wayland, so I switched back to Xorg and found that I had to enable Force Composition Pipeline in nvidia-settings, which fixed my Xorg issues.
Since then I was able to run, all at once, a few instances of PhpStorm + FF + Android Studio + an Android VM + a macOS KVM guest inside which I was running the iOS simulator + Xcode + Android Studio, without any memory issues.
I was amazed at how much memory management had improved, considering that a quarter of that workload froze my entire PC a year or two ago, forcing me to restart it at least 5 times a day.
Then yesterday came, and apps started crashing/being killed even with only 3 apps open.
The cache thing
I don’t like this solution either, but each time the cache reaches about 20–22 GB, apps start getting killed even though memory usage sits comfortably at 60–70%, which is plenty (or a full memory bar in htop with all the colors combined [green, light blue, blue, purple-ish, brown]).
I have next to no knowledge about memory management in Linux or any other OS, but I can’t understand why I can max out memory on Windows at 100% and still have a snappy system, without any apps being forced to close, and can even run more on top of them. Is it some kind of magic?
Maybe you are seeing a correlation. But there is absolutely no such causation.
Cache cannot cause apps to get killed. It just doesn’t work that way.
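You can verify this on your own machine: the kernel counts reclaimable cache toward available memory, so in `free`'s output it is the "available" column, not "free", that tells you how close you actually are to memory pressure:

```shell
# "available" estimates memory obtainable without swapping; cache that can
# be dropped on demand is already counted in it.
free -h
grep -E '^(MemTotal|MemAvailable|Cached):' /proc/meminfo
```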
If I remember correctly, Windows can automatically grow your paging file on disk and can alternatively use free space in the filesystem for paging. So if you have flawed applications that accumulate lots of stale memory, it can be paged out to disk, and the stale data does virtually no harm. If your applications are really using that much memory, rather than accumulating stale memory they aren’t actually using, then paging still stops apps from getting killed, but it slows the system to a crawl.
Fedora has a relatively new (and I think quite stupid) feature of compressing stale memory within RAM (zram) rather than writing it out to disk. For moderate amounts of stale memory with a hard drive, that saves time. For moderate amounts with an SSD, maybe it improves the lifetime of the SSD (though I doubt by any noticeable margin). For large amounts of stale memory, compression doesn’t free as much as you might reasonably have put on a swap partition.
I got rid of that memory compression thing and instead enabled a real swap partition. Since I have a giant hard drive, I don’t mind wasting many GB for that, and it is a better place to dump stale memory.
You might want to set up a real swap partition to have a better cushion against this problem. (The Fedora feature of compression instead of traditional swapping is the default, but it is optional.)
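To see whether you are currently on the compressed-RAM device or disk-backed swap, and as a sketch for adding a traditional swap file when repartitioning is inconvenient (the size and path below are arbitrary; the commented commands must be run as root):

```shell
# Current swap devices; Fedora's compressed-RAM swap appears as /dev/zram0.
cat /proc/swaps

# Sketch: create and enable an 8 GiB disk-backed swap file (as root):
#   fallocate -l 8G /swapfile
#   chmod 600 /swapfile
#   mkswap /swapfile
#   swapon /swapfile
#   echo '/swapfile none swap defaults 0 0' >> /etc/fstab
```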
But ultimately, you have some serious memory leak issue that you ought to diagnose.
My sentence was meant more as a friendly underlying hint than as a serious question, since the author is obviously aware of the kernel’s relevance for such problems. I expected that the fact that our build system does not contain this kernel for F37 already made it obvious that this kernel cannot be from us. Sorry for the confusion.
My issue seems to be fixed. It might be too early to say, but I have not had any more crashes, and memory/cache has been stable for the last day.
What I did was a full system upgrade, plus making the Docker daemon start automatically via systemctl instead of being started by Docker Desktop. That forced me to modify one Dockerfile, where I had to remove the --link from “COPY --link …”, which from my understanding is a newer experimental feature that worked when dockerd was started by Docker Desktop but not with the boot-time systemctl service. I think that might have been the culprit, because I had set up that project the same day the issue started happening, though it made no sense why a simple PHP/Postgres/Caddy image would crash everything.
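For reference, the Dockerfile change described above amounts to dropping the BuildKit-only flag (the paths here are placeholders, not from the actual project):

```dockerfile
# Before: --link is a newer BuildKit feature that an older or differently
# configured daemon/builder may not accept.
# COPY --link ./src /app/src

# After: a plain COPY works with any builder.
COPY ./src /app/src
```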
I will monitor it for the next few days. Thanks for all the help so far.