High memory usage in F40 on RPi 4, unable to find which process is using it

Problem

After upgrading to Fedora 40 on my Raspberry Pi 4, memory is almost fully consumed by a ghost process, possibly the kernel.

uptime:

 15:10:25 up 1 day, 17:23,  6 users,  load average: 0.08, 0.19, 0.43

uname -a output:

Linux potato 6.8.7-300.fc40.aarch64 #1 SMP PREEMPT_DYNAMIC Wed Apr 17 19:53:21 UTC 2024 aarch64 GNU/Linux

free -mh output:

               total        used        free      shared  buff/cache   available
Mem:           7.5Gi       6.3Gi       599Mi       7.2Mi       947Mi       1.3Gi
Swap:          4.0Gi       145Mi       3.9Gi

htop screenshot:


top -o RES output:

top - 15:07:30 up 1 day, 17:20,  6 users,  load average: 0.28, 0.27, 0.50
Tasks: 245 total,   1 running, 244 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.5 us,  1.0 sy,  0.0 ni, 97.2 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st 
MiB Mem :   7726.9 total,    590.8 free,   6450.6 used,    948.9 buff/cache     
MiB Swap:   4096.0 total,   3950.2 free,    145.8 used.   1276.3 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                
   6204 wolf      20   0 4977832  47028  27120 S   0.3   0.6  14:43.83 podman                                                                                                                                 
    944 root      20   0  597712  42088  14308 S   0.0   0.5   0:03.93 firewalld                                                                                                                              
   1003 root      20   0 1315724  34440  16320 S   1.0   0.4  38:52.97 tailscaled                                                                                                                             
    626 root      20   0  103032  31324  30608 S   0.0   0.4   1:40.01 systemd-journal                                                                                                                        
   1723 root      20   0 2039764  29352  11856 S   2.0   0.4   1:16.75 dockerd                                                                                                                                
 252182 wolf      20   0  867524  27132   5380 S   0.0   0.3   0:15.59 cockpit-bridge                                                                                                                         
   1758 root      20   0 1259936  23904  10588 S   1.3   0.3  17:06.29 cloudflared
... truncated

This is very weird because I didn’t face the same problem before I upgraded to F40. F39 worked just fine and was able to run continuously for a month without rebooting.

Cause

Unknown; a workaround is available.

Related Issues

Bugzilla report: #2275290

Workarounds

See the solution below: disabling HDMI hotplug in config.txt.


You have observations, but I am missing the problem statement.
Being different is not necessarily a problem.
Is there something that will not run or has broken?

After roughly two days the whole system just fills up with memory garbage and then freezes.

Thanks for explaining.
I would start by running top in batch mode, sampling every 60 s into a file.
Let the system run for a while so that the leak should be easy to see in the top output. What looks like the cause?
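
Something like this would do; the log path and the 24-hour sample count are just examples:

# one snapshot per minute, sorted by resident memory, for 24 hours
top -b -o RES -d 60 -n 1440 >> ~/top-history.log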


I did run top when diagnosing this issue, but the problem is that I cannot find which process is using that much memory! I attached a screenshot and the top output above; you can check them out.

You need a history of process and system memory size information, which is why I suggested collecting top output every 60 seconds.

With that information it should be possible to figure out what is changing over time.

Unfortunately, I don’t have a memory usage history, but memory wasn’t fully used before. I use that RPi as a server with some Podman containers running on it; the screenshot above was taken after stopping all of them.
Since I containerize everything, stopping all containers means stopping everything running on that server; all I am left with is system processes.

Even without collecting top output every 60 seconds, the output after system memory is fully used still doesn’t give any clue. By the way, the top output is sorted from high to low by memory usage.
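
Stopping everything boils down to something like this:

podman stop --all    # stop every running container on the host
free -h              # memory still shows up as "used" with only system processes left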

After digging through some threads on the internet, I checked /proc/meminfo, and its output is strange: Slab and SUnreclaim are very high (around 4 GiB).
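
The relevant fields can be pulled straight out of /proc/meminfo, for example:

grep -E 'Slab|SReclaimable|SUnreclaim' /proc/meminfo    # Slab and SUnreclaim were around 4 GiB here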

Unless you plan to turn off the system, you can create such a history from now until the system fails.

To track this I would collect samples of /proc/meminfo every 60 seconds, as well as the top output.
It should be easy to watch slab usage climb and confirm that you are on to the problem.
Next will be the issue of finding out why slab is so heavily used.
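
Something along these lines, run as root so slabtop can read /proc/slabinfo (the interval and file names are just examples):

while true; do
  date >> meminfo.log
  grep -E 'MemFree|Slab|SUnreclaim' /proc/meminfo >> meminfo.log
  slabtop -o -s c | head -n 20 >> slab.log    # largest slab caches first
  sleep 60
done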

After digging through the whole internet for a solution, I came across a thread suggesting the display might be the issue: when display output is active but no display is plugged in, the memory leak just appears out of nowhere.

I fixed my leak by disabling HDMI hotplug in config.txt; everything seems to be working as intended right now.
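
If anyone wants to try the same thing, the relevant config.txt option should be the firmware’s hdmi_force_hotplug setting (the file usually lives at /boot/efi/config.txt on Fedora); treat this as a rough sketch rather than the exact change:

# /boot/efi/config.txt (path may differ); assumes hdmi_force_hotplug is the
# "HDMI hotplug" setting referred to above
hdmi_force_hotplug=0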


I’m not convinced the “solution” is actually The Solution™. I have a Raspberry Pi 3B+ that I use basically for testing stuff. It has an F40 minimal install. I usually run it headless, run updates on it daily, and do practically nothing else.

A week or so before I upgraded from F39 to F40, I started finding it unresponsive when it came time for morning updates. It being the least important of my fleet, I didn’t attempt to track down the issue until the F40 upgrade died after having installed all the F40 rpms but before removing the F39 rpms. I eventually re-installed, in batches, all the F40 rpms, which had the nice side effect of cleaning up the F39 ones. So things should be good, right?

Wrong. It still runs about 4 to 6 hours before it runs out of RAM and the OOM killer kicks in. I’ve been testing with various kernels and logging RAM with timestamps (see below), hoping to find a clue. In fact, as I type (on another box), I’m currently running kernel 6.8.6-200.fc39 in single-user mode. That’s the oldest kernel I still have, but it makes no difference. I’ve stopped every process I can and still have a functioning system. In user space, nothing is left but systemd, systemd-udevd, systemd-sulogin-shell rescue, sulogin, and bash, and under bash I’m running this shell script:

#!/bin/bash
# Append a timestamped "Mem:" line from free every 10 seconds.
# The log file defaults to free.log; pass a path as the first argument to override.
while true ; do
  (
    echo -n "$(date) "
    free -v -w | grep ^Mem
  ) | tee -a "${1:-free.log}"
  sleep 10
done

That’s it! There are no other user-space processes. And in about 3½ hours it’s going to die an ugly death due to all RAM being exhausted.

I’ve got to think it’s either a kernel bug, a systemd bug, or a bash bug, b/c there’s nothing else running!

I’ve got a monitor attached, but I get the same results regardless. Maybe I can play with disabling HDMI hotplug in config.txt to see if that makes a difference. If it does, that makes it a kernel bug, right?

Running an RPi 4 headless here, and no leaks are evident.
It is running the Server edition and is configured to be a router.

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           3.7Gi       362Mi       3.2Gi       1.2Mi       297Mi       3.4Gi
Swap:          3.7Gi          0B       3.7Gi

$ uptime
 22:01:40 up 4 days,  9:48,  1 user,  load average: 0.00, 0.00, 0.00

Which kernel, @barryascott?

$ uname -r
6.8.7-300.fc40.aarch64

Is it possible that only devices that upgraded from F39 to F40 have this issue?

It does indeed look like a kernel bug. Can you try unplugging your display and disabling HDMI hotplug? It’s not a real solution for this issue, just a workaround that works for me.

This is an upgraded system; I did the F39 to F40 upgrade using the dnf system-upgrade method.

What I meant before is that the leak happens when there is an output active but no display plugged in, which isn’t your case here. You can try booting the Pi with the display plugged in and unplugging it after it boots.

I’ll run that experiment. I’m grabbing /proc/meminfo every 10 mins to watch for a leak.

Also, I had to configure nomodeset on the kernel command line.
Not sure if that is still needed.
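
If it is, one way to set it on Fedora is via grubby, e.g.:

sudo grubby --update-kernel=ALL --args="nomodeset"
# and to take it back out later:
sudo grubby --update-kernel=ALL --remove-args="nomodeset"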

I started seeing this behavior a week or two before upgrading F39->F40.