I’m reasonably sure I don’t have HDMI hot plug enabled. There’s no mention of it in /boot/efi/config.txt, and neither “HDMI” nor “hdmi” shows up in the system journal. I thought enabling HDMI hot plug would require something specific in config.txt.
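A quick way to double-check, assuming the firmware config really is the /boot/efi/config.txt mentioned above:

# any hdmi-related option (e.g. hdmi_force_hotplug) in the firmware config?
grep -i hdmi /boot/efi/config.txt
# anything HDMI-related logged during this boot?
journalctl -b | grep -i hdmi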
What I see as a “leak” is this diff of meminfo taken a few hours apart:
% diff -y meminfo.2024-05-01T11-03-01 meminfo.2024-05-01T15-13-15 | grep '|'
MemFree:        3147256 kB   | MemFree:        2562128 kB
MemAvailable:   3230020 kB   | MemAvailable:   2649060 kB
Buffers:          22856 kB   | Buffers:          25864 kB
Cached:          196804 kB   | Cached:          197904 kB
Active:          264392 kB   | Active:          265524 kB
Inactive:         48128 kB   | Inactive:         51148 kB
Active(anon):    103204 kB   | Active(anon):    103248 kB
Active(file):    161188 kB   | Active(file):    162276 kB
Inactive(file):   48128 kB   | Inactive(file):   51148 kB
Dirty:                832 kB   | Dirty:                804 kB
AnonPages:       101940 kB   | AnonPages:       101952 kB
Mapped:           65680 kB   | Mapped:           66660 kB
KReclaimable:     18320 kB   | KReclaimable:     18436 kB
Slab:            337744 kB   | Slab:            918704 kB
SReclaimable:     18320 kB   | SReclaimable:     18436 kB
SUnreclaim:      319424 kB   | SUnreclaim:      900268 kB
KernelStack:       5288 kB   | KernelStack:       5272 kB
PageTables:        4328 kB   | PageTables:        4260 kB
Committed_AS:   1853796 kB   | Committed_AS:   1853800 kB
VmallocUsed:      39744 kB   | VmallocUsed:      39728 kB
I’ll repeat the experiment with the monitor connected all the time.
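(For anyone who wants to capture the same kind of data, a cron entry along these lines will produce timestamped snapshots like the ones diffed above; this is just a sketch, not my exact setup.)

# hypothetical /etc/cron.d/meminfo-snapshot: copy /proc/meminfo every 10 minutes
# with a timestamped name (the % signs must be escaped inside cron entries)
*/10 * * * * root cp /proc/meminfo "/var/log/meminfo.$(date +\%Y-\%m-\%dT\%H-\%M-\%S)"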
Yes, I’m referring to the HDMI hot plug setting in config.txt. Maybe something else is causing this issue; I may have just found a hacky way to work around it.
I see the same pattern of memory use. The question is: is this actually a problem?
It is a problem. After one or two days the system runs out of memory, the OOM killer kicks in, and it starts killing everything.
You get one or two days? I get about 4½ hours. (But then, I’m on an RPi 3B+ with only 1 GB.) I’ve started running a memory check from cron every few minutes that preemptively reboots when (not if) SUnreclaim gets up to a certain fraction of total memory. That’s arguably better than facing a certain crash and having to reboot manually.
Here’s the cron job:
root@dewdrop:~# cat /etc/cron.d/15-memwatch
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
*/3 * * * * root /usr/local/sbin/memwatch.sh
and here’s the memwatch.sh script:
root@dewdrop:~# cat /usr/local/sbin/memwatch.sh
#!/bin/bash

# /proc/meminfo lines look like "SUnreclaim:   319424 kB"; read -a splits each
# into an array, so index 1 is the value in kB.
read -a SUnreclaim < <( grep ^SUnreclaim: /proc/meminfo )
read -a MemTotal < <( grep ^MemTotal: /proc/meminfo )

# Reboot preemptively once SUnreclaim exceeds numerator/denominator of MemTotal.
numerator=7
denominator=10

if [ $(( ${SUnreclaim[1]} * $denominator )) -gt $(( ${MemTotal[1]} * $numerator )) ] ; then
    echo "$(date) SUnreclaim (${SUnreclaim[1]}) is over $numerator/$denominator of MemTotal (${MemTotal[1]})."
    wall -t 5 "Rebooting in 20 seconds."
    sleep 20
    reboot
else
    # Only report when stdout is a terminal; this keeps the cron runs quiet.
    [ -t 1 ] && echo "$(date) SUnreclaim (${SUnreclaim[1]}) is still less than $numerator/$denominator of MemTotal (${MemTotal[1]})."
fi
It rebooted 27 minutes ago, so it’s not horrible yet, but here are two runs 17 seconds apart, and I lost over 600 kB in that time:
root@dewdrop:~# uptime
20:25:52 up 27 min, 1 user, load average: 0.08, 0.02, 0.05
root@dewdrop:~# memwatch.sh
Thu May 2 08:26:02 PM EDT 2024 SUnreclaim (124364) is still less than 7/10 of MemTotal (896188).
root@dewdrop:~# memwatch.sh
Thu May 2 08:26:19 PM EDT 2024 SUnreclaim (124976) is still less than 7/10 of MemTotal (896188).
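A back-of-the-envelope projection from those two readings, assuming the growth stays roughly linear:

# ~612 kB of SUnreclaim growth in 17 seconds, projected to the 7/10 threshold
start=124976                         # kB, second reading above
limit=$(( 896188 * 7 / 10 ))         # 627331 kB
rate=$(( (124976 - 124364) / 17 ))   # ~36 kB/s
echo "$(( (limit - start) / rate / 3600 )) hours until memwatch.sh reboots"

That works out to roughly 3 to 4 hours, which fits with the roughly 4½ hours to OOM I was seeing before adding the watchdog (the threshold reboot fires a bit before a full OOM would).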
I downgraded the kernel
# dnf downgrade kernel-0:6.5.6-300.fc39.aarch64 --releasever=39
rebooted, and the problem has gone away.
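One caveat: the next routine dnf upgrade will pull the newer kernel straight back in. If you want to stay on the downgraded one, something like the versionlock plugin should hold it in place (just a suggestion, not something I’ve shown above):

# hold the known-good kernel so routine updates don't replace it
sudo dnf install python3-dnf-plugin-versionlock
sudo dnf versionlock add kernel-0:6.5.6-300.fc39.aarch64
# undo later with: sudo dnf versionlock delete kernel-0:6.5.6-300.fc39.aarch64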
Hmm, maybe we need to create a bug report on Bugzilla?
According to a bug report in the Tailscale repo (RAM completely fills over time on Raspberry Pi 3B with Fedora IoT on kernel 6.8.4 and later · Issue #11888 · tailscale/tailscale · GitHub), it seems that kernels 6.8 and above have this issue.
But I’m not sure whether version 6.8.4, as mentioned in the GitHub issue, is exactly where it started.
Searching on Google didn’t turn up anything more; it looks like something very specific is causing this.
I’ve filed a related Bugzilla bug report: https://bugzilla.redhat.com/show_bug.cgi?id=2279327
Hello there, I am the person from the GitHub issue mentioned above; I’m joining this discussion to understand what is causing this issue.
In my specific case, I use one Raspberry Pi 3B and another 3B+, both headless and with only an Ethernet cable connected (no peripherals or GPIO attachments). I can reproduce the memory slowly filling up over time starting from kernel 6.8.4 (Fedora IoT 39), and I am not able to deploy earlier 6.8.x kernels out of the box to test exactly where this started, since I am on Fedora IoT and would have to build a custom image containing such a kernel version for testing.
From what I could figure out by swapping the OS deployments back and forth, I can reproduce the issue on kernel 6.8.4 just by starting the Tailscale client, while in the exact same situation (same configuration files in the user’s home and /etc, same Tailscale client version) I cannot reproduce it on kernel 6.7.11.
I am willing to help test other permutations of the runtime and the system if that can help with patching this issue.
Hello,
Thanks for your reply. I’m also using Tailscale, but I don’t think this issue is caused by the Tailscale software. It looks more like a kernel problem: the display has nothing to do with networking, yet it still seems to be tied into this issue somehow.
I will try running the Rawhide kernel when I have time, as Bugzilla suggested I do.
I just saw your post about the Bugzilla ticket: I’ll go over it and try to follow their advice too.
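In case it helps with testing on your side, one thing that might narrow this down (just a suggestion, I haven’t collected this data myself yet) is checking which slab cache is actually growing while the leak is in progress:

# one-shot view of the largest slab caches, sorted by cache size
sudo slabtop -o -s c | head -n 20

# or snapshot /proc/slabinfo twice and diff to see which cache grew
sudo cat /proc/slabinfo > slabinfo.before
sleep 600
sudo cat /proc/slabinfo > slabinfo.after
diff slabinfo.before slabinfo.after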
Still able to reproduce on kernel 6.8.8
uname -a
Linux <redacted> 6.8.8-300.fc40.aarch64 #1 SMP PREEMPT_DYNAMIC Sat Apr 27 18:11:03 UTC 2024 aarch64 GNU/Linux
The readily available fc40 kernels unfortunately don’t go back far enough to back out of this bug. However, a few days ago I was able to go all the way back to an fc39 kernel like this:
# dnf downgrade kernel-0:6.5.6-300.fc39.aarch64 --releasever=39
That was the kernel released with fc39. It does not exhibit the problem. All other installed software remained the same. Switching back to a “current” kernel brought the problem back.
If I had to guess, I’d say the problem started around 2024-04-20, but I can’t be certain.
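For anyone else trying to pin down when this started, the kernel builds dnf can still see for each release can be listed with something like:

# show every kernel build still installable from the repos
dnf --showduplicates --releasever=39 list kernel
dnf --showduplicates --releasever=40 list kernel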
Hello there, I got some good news:
This bug was reported before: 2275290 – memory leak in aarch64 beginning with kernel 6.8.4
The workaround is to install kernel version 6.8.9 from the updates-testing repository:
sudo dnf upgrade --enablerepo=updates-testing --refresh --advisory=FEDORA-2024-c90afc5c01
See https://bodhi.fedoraproject.org/updates/FEDORA-2024-c90afc5c01
Kernel 6.8.9 has now been pushed to the stable repository.
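After installing it and rebooting, the running kernel can be confirmed with:

uname -r        # should now report a 6.8.9 build
rpm -q kernel   # lists the installed kernel packages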
I can report that, starting from the latest deployment of Fedora IoT (which includes Fedora 40 and kernel 6.8.9), I can no longer reproduce my issue. Thanks again to @wolfyuan, whose comment on my GitHub issue drove me here.