Kernel 5.16.8 btrfs high disk IO load

johndoe5599 · February 14, 2022, 5:35pm

I am using Fedora 35 and there is a high disk write idling from [btrfs-transaction] in iotop (close to 1GB in couple of hours). I read some articles about an issue with btrfs fragmentation in kernel 5.16 but it is fixed in v5.16.5.
My current kernel version is 5.16.8-200.fc35.x86_64.
I set mount options for SSD and HDD to noatime and noautodefrag but issue is not solved. Do you have any suggestions?

chrismurphy · February 14, 2022, 8:21pm

Can you provide a few instances of cat /proc/$btrfs-transaction-pid/stack while the problem is happening? Just use up arrow and repeat the command a bunch of times, see what this process is up to.

Also note that autodefrag is not the default mount option so it isn’t necessary (or harmful) to use noautodefrag.

johndoe5599 · February 14, 2022, 8:40pm

Image link: Imgur: The magic of the Internet
My drive setup is: / on SSD and /home on HDD - both btrfs. I have two btrfs-transaction processes so I assume one for / and one for /home. I take screenshot for only one process but both have same output.

Update: Maybe it is not kernel related because I switched to kernel-5.14.10-300.fc35.x86_64 and IO load was similar.

Also I noticed SSD turn on/off noise every ~15-20 minutes and I think it is from non-stop disk write.

chrismurphy · February 14, 2022, 9:02pm

You could try fatrace to find the writer, it could be a bit verbose. You’ll need to mount the top level of the file system, e.g. as root

mount /dev/sdXY /mnt
cd /mnt
fatrace

You could also use auditd.

You could also check out btrfs subvolume find-new where you can find the current generation of a subvolume using btrfs subvolume list and then point find-new to a subvolume with the same or -1 generation value for that subvolume. Eventually (seconds), you’ll see whatever files are being written to that subvolume. e.g. while in ~/

btrfs sub find /home 286207
transid marker was 286207
dd if=/dev/zero of=hi count=1
btrfs sub find /home 286207
inode 5517 file offset 0 len 512 disk start 0 offset 0 gen 286210 flags COMPRESS|INLINE chris/hi
transid marker was 286210

It may also work if you go back 1000 generations and see what it’s writing a ton of. It does only show writes, not reads or deletes or renames, but new file writes to new or existing files.

johndoe5599 · February 14, 2022, 9:39pm

Output from fatrace: Pastebin
Output from btrfs subvolume find-new (I checked last generation and -1 and they were same): Pastebin

So, it is somehow journal related I think but I don’t know exactly what.

chrismurphy · February 14, 2022, 9:51pm

journalctl -f

What events are being logged? tracker-extract and tracker-miner seems busy but they are mostly doing reads.

Is tracker-extract crashing? Because I see a lot of the same files being read by different PIDs of tracker-extract, with PID increasing each time.

It could be a tracker miner bug where it’s crashing on a particular file that’s confusing the extracter.

Try both
coredumpctl
sudo coredumpctl

If it’s crashing, I think you can stop it by going to Settings->Search and toggling the switch at the top.

chrismurphy · February 14, 2022, 9:56pm

Yeah in the fatrace output, I’m seeing only one writer

systemd-journal(596): W /mnt/root/var/log/journal/54975aab116243819141783c90dcfdb1/user-1000.journal

So this suggests a user process is dumping events to journald. I’m not sure off hand if journalctl -f will show these events or if you need journalctl -f --user. But if something is crashing, the core dump is written to systemd-journald socket which would explain a lot of writes.

You can also check for the build up of many and/or large user log files with ls -lsh /var/log/journal/$machineid/

johndoe5599 · February 14, 2022, 10:19pm

coredumpctl output with and without sudo is:

Output from journalctl -f: Pastebin
Output from journalctl -f --user: Pastebin

72M was total for /var/log/journal

chrismurphy · February 15, 2022, 12:37am

Maybe tracker-miners/208.

johndoe5599 · February 15, 2022, 8:47am

When I do env TRACKER_DEBUG=config /usr/libexec/tracker-miner-fs-3 I get this message:

Could not request DBus name 'org.freedesktop.Tracker3.Miner.Files': D-Bus service name:'org.freedesktop.Tracker3.Miner.Files' is already taken, perhaps the application is already running?

I followed instructions from Gnome Gitlab to run daemon manually.

Now when I do journalctl -f there are not infinite tracker starting loop logs but there are logs from rtkit-daemon and audit.

With journalctl -f --user --unit=tracker-miner-fs-3.service --unit=tracker-extract-3.service --priority=7 there are only 4 logs for starting tracker file system and metadata extractor.

btrfs subvolume find-new still shows many journal logs and issue with high btrfs-transaction IO load still persist.

oprizal · February 15, 2022, 9:19am

Hi, just want give my though. There lot process from your journalctl with 2645 pid process. May be you want to check what exactly this process (the id may be could be different for each boot, you need to check again with journalctl) with Gnome system monitor (on Utilities group if you’re using Gnome for easy read) or from terminal with top.

You could also use system monitor for general diagnosis by sort it by the (column) time, cpu, or memory. Maybe the obvious one is time if it happen start from the beginning of session till the end of session.

johndoe5599 · February 15, 2022, 7:35pm

As I said, metadata tracker starting loop disappeared from journalctl. There is only one entry which is for systemd.
I still have high IO load. I don’t know what else to check.

Update:
Output from btrfs subvolume find-new / <gen> (where SSD is mounted) still shows many /var/log/journal entries.
Output from btrfs subvolume find-new /home <gen> (where HDD is mounted) shows many firefox COMPRESS entries. I don’t know if this behaviour is normal.
Example:

inode 263723 file offset 0 len 20480 disk start 1167560704 offset 0 gen 3463 flags COMPRESS username/.cache/mozilla/firefox/4pplbep0.default-release/cache2/entries/83C21DE5AB7FA483A48C23E1A493BDE4EFFE52F8

oprizal · February 15, 2022, 8:23pm

johndoe5599:

inode 263723 file offset 0 len 20480 disk start 1167560704 offset 0 gen 3463 flags COMPRESS username/.cache/mozilla/firefox/4pplbep0.default-release/cache2/entries/83C21DE5AB7FA483A48C23E1A493BDE4EFFE52F8

If you believe it related to ~/.config, you can test it by creating new user to see if the problem also happen there.

There also some executable file inside /usr/libexec/ directory related to tracker (ls /usr/libexec | grep -E "tracker|miner"). You can run something like /usr/libexec/tracker-miner-fs-3 --help to see available options.

Gnome it self have the tracker to autostart located at /etc/xdg/autorstart/tracker-miner* in case you want to remove it from autostart for debuging.

johndoe5599 · February 15, 2022, 9:06pm

I created a new user. Same load.
I disabled tracker with tracker3 daemon --terminate and I added Hidden=true in /etc/xdg/autorstart/tracker-miner* files. Rebooted. Same load and same entries for / and /home.

Edit: Maybe I found something unusual. When I checked for last 5 gen with btrfs subvolume find-new / <gen> there are many logs that something is written in /var/log/journal. But at the same time I opened journalctl -f and journalctl -f --user and there was nothing logged. Is it normal?

chrismurphy · February 16, 2022, 3:58am

Your /var/log/journal/$machineid is 72M which is not very much. e.g. I’m using systemd.log_level=debug which is extremely verbose, I’ve got 2.79G in just 4 days. So I don’t think your problem is with journald logs unless you’ve customized journald. Also note that journald interlaces writes to its files, so you will see the same inode listed multiple times with different offset values indicating inserted writes. Also normal but makes it seem like there’s more being written than there probably is.

Web browsers do prolifically write to the disk with many small file writes.

chrismurphy · February 16, 2022, 4:33am

systemd-journald only writes to two journals at one time, a system and a user journal. It rotates them when they reach the prescribed size.

Maybe install bcc-tools and run sudo /usr/share/bcc/tools/filetop to find a writer. In my case I’m seeing cookies.sqlite-wal being hammered by Firefox all the time. If anyone can answer why writing cookies this much this often is that important…

There’s a bunch of other tracing tools in bcc-tools you can mess around with that might help, vfscount is a high-level overview one that shows create, read, write, and fsync.

TIME         READ/s  WRITE/s  FSYNC/s   OPEN/s CREATE/s
21:30:10:      1949      623        0       38        0
21:30:11:       333      321        0       49        0
21:30:12:       337      307        0       36        0
21:30:13:      1242      407        0       42        0

This is “idle” as in, all I’m doing is typing in this browser window with maybe 10 other tabs.

Any snapshots being created or destroyed?

tbclark3 · February 16, 2022, 5:26pm

I have the same issue. I have a 2TB BTRFS filesystem, and when running any 5.16 kernel (I am currently on 5.16.8) the system load gradually increases over a period of 3-4 days until I have to reboot. The problem does not occur with 5.15 kernels. I have spent hours trying to pinpoint the responsible process but with no clear candidate. The btrfs-cleaner process does appear more prominently in iotop as the load increases, but it is not there continuously.

chrismurphy · February 16, 2022, 8:41pm

Can you post your mount options? mount | grep btrfs should do it. There was a bunch of refactoring with defragment code in the 5.16 kernel cycle that caught up autodefrag in this sort of behavior, and my advice is to remove the mount option temporarily. Some fixes are applied to 5.16.5 but I guess that’s not enough in all cases so I’ll circle back with developers and see if there’s more on the way.

Or if the workload is something other than desktop (web browser, email, small databases) then you may want to keep it off, because autodefrag isn’t intended for the very busy or large files use case. autodefrag detects small insertions and appends to files, and periodically dirties the entire file so it’s subject to being completely rewritten out, more or less just like duplicating it, thus defragmenting it.

johndoe5599 · February 16, 2022, 9:29pm

With sudo /usr/share/bcc/tools/filetop I’m also seeing cookies.sqlite-wal many times. I assume it’s normal.

This is the output from vfscount after 1-2 minutes:

This is the output from vfsstat. Browser with 13 tabs.

These are my mount options. /dev/sdb3 is SSD, /dev/sda1 is HDD.

/dev/sdb3 on / type btrfs (rw,noatime,compress=zstd:1,ssd,space_cache,subvolid=256,subvol=/root)
/dev/sda1 on /home type btrfs (rw,noatime,compress=zstd:1,space_cache,subvolid=256,subvol=/home)

chrismurphy · February 16, 2022, 9:57pm

@johndoe5599 I see similar vfsstat, and an order magnitude larger vfs_read and vfs_write. I’ve got some flatpaks running (signal and element). Doesn’t look unusual to me.

@tbclark3 earlier question about current mount options is directed at you

Topic		Replies	Views
Problem with boot, kernel and Btrfs Ask Fedora boot , btrfs , nvidia , kernel , f39	40	1059	August 13, 2024
BTRFS error on boot Fedora 40 Ask Fedora btrfs , workstation	14	1239	June 13, 2024
BTRFS error; Emergency mode entered every time I boot Ask Fedora boot , btrfs	2	248	December 31, 2024
Sysroot mount failed while kernel update Ask Fedora	5	1575	June 5, 2023
Fedora 40 BTRFS filesystem corruption on 6.9-6.10 kernals Ask Fedora amd , btrfs , f40	40	866	October 1, 2024

Kernel 5.16.8 btrfs high disk IO load

Related topics