Filesystem cache out of control, swaps out whole system

Greetings. Ever since I upgraded to a 5.8.x kernel, I’ve been having problems with my disk-intensive application. On a 4-core Intel i7 system with 32GB of RAM and a 72GB swap partition, my application for reading and restructuring the OpenStreetMap planetary dataset (currently a 53GB input PBF file) is getting squeezed out of memory by Linux page cache growth until the entire system hangs. I was on Fedora 31, and tried upgrading to Fedora 32 to see if that would resolve the problem. No such luck. Basically, my application uses temporary buffer files (about 57GB worth) to re-index the data sequentially read from the OSM planet file, so that OSM Ways can have their vertex latitudes/longitudes directly associated with them (instead of being associated with the IDs of the OSM Nodes containing that data). It also re-sorts the data into geographical order (the planet file stores data in the chronological order it was added to the OpenStreetMap database, regardless of where on the planet that might be), eventually generating about 40GB of data reorganized and optimized for efficient on-demand rendering.

This used to work on earlier kernels for at least the past 8 years, up to and including the 5.7.x kernels, taking about 20 hours of heavy disk I/O to complete at the current size of the planet file and the current tuning of the application. Since upgrading to the 5.8.x kernels, the Linux page cache expands to take over all memory, forcing all applications out to swap (including, for example, the Xorg server, which makes it very hard to see the application’s graphical progress indicators). The application itself is deliberately restricted to about 4GB of process virtual address space so that it can still work on systems with less RAM than mine, which is why it is designed to use disk files for scratch storage instead of more heap memory.

So how do I keep the cache from taking over all RAM (as opposed to just free RAM) and actually flush some least-recently-used pages out of the cache, so that running applications can stay in RAM? For example, the 53GB planet file is read sequentially, so once I have read past a block I never need it in the cache again.
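I realize those pages can be evicted from outside the process; the GNU coreutils dd manual documents a nocache idiom that issues posix_fadvise(POSIX_FADV_DONTNEED) over a whole file (the planet file name below is just illustrative):

# Advise the kernel to drop all cached pages of the already-read
# planet file (idiom from the coreutils dd manual; name illustrative).
dd if=planet-latest.osm.pbf iflag=nocache count=0 status=none

But babysitting the cache by hand, file by file, is exactly what I am hoping to avoid.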

I have made the following kernel parameter changes in an attempt to keep the cache under control (without success):
vm.vfs_cache_pressure = 160 (default 100)
vm.swappiness = 1 (default 60)
vm.overcommit_ratio = 1 (default 50)
vm.dirtytime_expire_seconds = 300 (default 43200)
vm.zone_reclaim_mode = 3 (default 0)
I’ve even tried disabling my swap partition entirely (so there is no place to swap anything out to); that didn’t work either.
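Still on my list to try: confining the import to its own cgroup, since under cgroup v2 page cache pages are charged to the cgroup that first touches them, and a memory.high ceiling forces reclaim within just that group. With systemd this would be a one-liner (untested by me; the 8G limit and the java command line are illustrative):

# Run the importer in a transient scope whose total memory charge
# (anonymous + page cache) gets reclaimed above 8GiB; values illustrative.
sudo systemd-run --scope -p MemoryHigh=8G java -jar YAAC.jar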

Because the OpenStreetMap planet file is not organized the same way as my output data (that’s why I’m reorganizing it), there is a huge amount of random access to the temporary files, plus sequential appends to thousands of geographically local segment files (an open/write/close cycle for each chronologically close group of data in a given geographical area), in no particularly efficient order.
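In principle I could also flush and evict each batch of segment files right after closing them, using the other dd nocache idiom from the coreutils manual (the directory and glob below are hypothetical):

# Write back dirty pages and drop the cache of each just-closed
# segment file (hypothetical segments/ path and .seg suffix).
for f in segments/*.seg ; do
   dd of="$f" oflag=nocache conv=notrunc,fdatasync count=0 status=none
done

With thousands of segment files, though, that adds a lot of open/flush overhead of its own.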

So how do I make the Linux cache hold only recently accessed/written disk data instead of all accessed disk data? Requiring more RAM than I have disk space would be somewhat excessive, especially since this application used to work (without any kernel parameter tweaking whatsoever) before the recent kernel updates.

For those who want to replicate this, the program causing the cache to beat the system to death is YAAC (“Yet Another APRS Client”), a Java 8 application available from SourceForge; run YAAC and select the menu choice File->OpenStreetMap->Import Raw OSM Map File.

Any advice on how to keep the cache bounded so it doesn’t force running processes out would be much appreciated. I presume the kind of disk access my application does is similar to what database server hosts with large databases have to deal with, just faster (since I’ve been tuning this application’s code for 7 years to get the reorganization processing down to a reasonable elapsed time).

Thanks in advance.

Andrew Pavlin, KA2DDO
author of YAAC


I have managed to come up with a workaround, but it’s pretty ugly. Basically, it’s a script that forces a cache flush whenever the page cache gets too far out of control:

#!/bin/bash
# Poll the page cache size every 30 seconds and force a flush when it
# exceeds ~30GiB. Must run as root to write to /proc/sys/vm/drop_caches.
while true ; do
   # Column 6 of free's "Mem:" line is buff/cache, in KiB.
   cache=$(free | awk 'NR==2 {print $6}')
   if [ "$cache" -gt 32000000 ]; then
      echo -n "cache $cache @ "; date
      # 1 = free the page cache only (not dentries/inodes).
      echo 1 >/proc/sys/vm/drop_caches
      # Log CPU temperatures while we're at it.
      sensors | grep -F '°C'
      cache=$(free | awk 'NR==2 {print $6}')
      echo -n "cacheafter $cache @ "; date
   fi
   sleep 30
done

Sometimes I include “echo 1 >/proc/sys/vm/compact_memory” after the echo to drop_caches.

It is annoying that I have to run this script as root to have a snowball’s chance in h**l of getting my unprivileged application to run to completion without the entire operating system seizing up.


Did you try zram?
It is enabled by default in Fedora 33 in place of disk-based swap.
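For a quick test on Fedora 32 you can set one up by hand with util-linux (the size and compression algorithm below are just examples; Fedora 33 automates this with zram-generator):

# Create a compressed RAM-backed block device and use it as
# high-priority swap (illustrative 8GiB size and zstd algorithm).
sudo modprobe zram
dev=$(sudo zramctl --find --size 8G --algorithm zstd)
sudo mkswap "$dev"
sudo swapon --priority 100 "$dev"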

I think you should raise this issue with the kernel team around Linus Torvalds. I am sure there will be somebody there who can help you.

Good idea. I submitted a ticket to Bugzilla at kernel.org.
