How to take care of your BTRFS filesystem health on Atomic Desktops

Fedora has used BTRFS (B-tree file system) by default for quite a while.

BTRFS has a lot of nice features (fewer than bcachefs, but bcachefs cannot be recommended for use yet).

But did you know that some of its maintenance has to be done manually?

1. Scrub

  • Detect and repair corruption

Scrubbing is most useful with RAID, where it can detect corrupted data and restore it from another disk holding a copy of the same data, but it can also be useful on a single disk.

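There it can detect “bit rot” using checksums. It can only repair blocks that have a second copy, such as metadata stored with the DUP profile; corrupted data in the single profile is reported but cannot be restored.

The risk is low, as scrub only rewrites blocks that fail checksum verification.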

run0 btrfs scrub start -Bd /
run0 btrfs scrub status /

(-B keeps the scrub in the foreground until it finishes, -d prints statistics per device.)

I am not sure whether this works on read-only subvolumes.

To set this up as an automatic systemd service:

run0 tee /etc/systemd/system/btrfs-scrub.service <<EOF
[Unit]
Description=Btrfs scrub on /
ConditionPathIsMountPoint=/

[Service]
ExecStart=/usr/bin/btrfs scrub start -Bd /
Nice=10
IOSchedulingClass=idle
EOF

run0 tee /etc/systemd/system/btrfs-scrub.timer <<EOF
[Unit]
Description=Run Btrfs scrub every 3 days

[Timer]
OnCalendar=*-*-*/3 00:00:00
Persistent=true

[Install]
WantedBy=timers.target
EOF

run0 systemctl daemon-reload
run0 systemctl enable --now btrfs-scrub.timer

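This checks the filesystem for corruption every 3 days (strictly, on days 1, 4, 7, ... of each month), at low CPU priority (Nice=10) and with the idle I/O scheduling class (IOSchedulingClass=idle), so the scrub should only get disk time when no other process needs it.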

Check the status of the service:

systemctl status btrfs-scrub.service

2. Balance

BTRFS has native RAID support, which allows distributing load onto multiple storage devices.

This can be used for performance, reliability, or to increase the amount of storage by using 2 drives for one filesystem (yes, that works!).

Balance is mainly useful when you have multiple disks with unevenly allocated (used) space, and especially when a disk is nearly full.

btrfs filesystem df /

This displays the stats you need. Example output showing a lot of allocated but unused Data and Metadata space:

Data, single: total=80GiB, used=30GiB
Metadata, DUP: total=10GiB, used=2GiB
System, DUP: total=32MiB, used=16KiB
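To read those numbers at a glance, here is a small sketch of a helper that computes the allocated but unused space (total minus used) per type from btrfs filesystem df output. The function name unused_space is hypothetical, and it assumes the "total=..., used=..." format shown above with binary units (KiB, MiB, GiB, TiB).

```shell
#!/bin/bash
# Hypothetical helper: print allocated-but-unused space (total - used) per
# block group type, reading "btrfs filesystem df" output on stdin.
# Assumes lines like "Data, single: total=80GiB, used=30GiB" with binary units.
unused_space() {
    awk '
    # Convert a size such as "80GiB" or "16KiB" to bytes.
    function to_bytes(s,    n, u) {
        n = s + 0                       # numeric prefix, e.g. 80
        u = s; sub(/^[0-9.]+/, "", u)   # unit suffix, e.g. GiB
        if (u == "KiB") return n * 1024
        if (u == "MiB") return n * 1024^2
        if (u == "GiB") return n * 1024^3
        if (u == "TiB") return n * 1024^4
        return n                        # plain bytes
    }
    /total=/ && /used=/ {
        type = $1; sub(/,$/, "", type)  # "Data," becomes "Data"
        match($0, /total=[0-9.]+[KMGT]?i?B?/)
        total = substr($0, RSTART + 6, RLENGTH - 6)
        match($0, /used=[0-9.]+[KMGT]?i?B?/)
        used = substr($0, RSTART + 5, RLENGTH - 5)
        printf "%s unused: %.2f GiB\n", type, (to_bytes(total) - to_bytes(used)) / 1024^3
    }'
}
```

Usage: btrfs filesystem df / | unused_space. With the example output above, it reports 50.00 GiB of unused Data space and 8.00 GiB of unused Metadata space.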

Balancing is very I/O-intensive and can take a long time. Since BTRFS is copy-on-write, a power loss during a balance should not corrupt the filesystem; the interrupted balance is resumed on the next mount. Still, make sure your PC is under little load and has continuous power.

run0 btrfs balance start --full-balance /

To be less invasive, you can balance only the data chunks that are less than 50% used:

run0 btrfs balance start -dusage=50 /
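If you want something in between, one common approach is to raise the usage filter in steps, so nearly empty chunks (cheap to move) are compacted first and later passes have less work to do. The sketch below assumes thresholds of 10, 25 and 50 percent; incremental_balance is a hypothetical name, and it only touches data chunks.

```shell
#!/bin/bash
# Sketch: balance only data chunks, raising the usage threshold step by step
# so nearly empty chunks are compacted first. The thresholds are assumptions.
incremental_balance() {
    local mnt="${1:-/}"
    local usage
    for usage in 10 25 50; do
        echo "Balancing data chunks under ${usage}% usage on $mnt"
        # Stop at the first failure (e.g. ENOSPC or a cancelled balance).
        btrfs balance start -dusage="$usage" "$mnt" || return 1
    done
}
```

Run it as root, for example from a shell opened with run0: incremental_balance /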

3. Defragment

  • Reduce fragmentation, speed up access

Fragmentation is an issue on HDDs, where a file's data ends up scattered across many different areas of the disk and reading it requires many slow seeks. Defragmenting rewrites such files contiguously to speed up access.

It is not recommended on SSDs: they have no seek penalty, so fragmentation barely affects their performance, and the extra writes reduce their lifespan.

run0 btrfs filesystem defragment -r /
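Since defragmentation is only recommended on HDDs, a cautious sketch could check whether the filesystem sits on a rotational disk first. It assumes findmnt and lsblk from util-linux, and that lsblk's ROTA column is accurate for your device (with LUKS or other device-mapper layers, verify it manually). defrag_if_hdd is a hypothetical name.

```shell
#!/bin/bash
# Sketch: defragment only if the filesystem holding $1 is on a rotational disk.
# Relies on util-linux findmnt/lsblk; device-mapper setups (e.g. LUKS) may
# report ROTA differently, so double-check on such systems.
defrag_if_hdd() {
    local mnt="${1:-/}"
    local src rota
    src=$(findmnt -n -o SOURCE --target "$mnt") || return 1
    src=${src%%\[*}                  # drop a "[/subvolume]" suffix, if any
    rota=$(lsblk -n -d -o ROTA "$src")
    rota=${rota//[[:space:]]/}       # lsblk pads the column with spaces
    if [ "$rota" = "1" ]; then
        btrfs filesystem defragment -r "$mnt"
    else
        echo "Skipping $mnt: $src is not a rotational disk"
    fi
}
```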

It is not recommended [1] to perform a full balance, or even a metadata balance, unless you are converting the profile (e.g., single to RAID 1).

As of today, with a recent kernel on Fedora, there is a feature that can be enabled via sysfs which performs automatic balancing only when necessary. I would recommend enabling that feature.

Example (replace UUID with the UUID of your Btrfs filesystem): echo 1 | sudo tee /sys/fs/btrfs/UUID/allocation/data/dynamic_reclaim
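Instead of looking up the UUID by hand, a sketch like the following asks findmnt for the UUID of the filesystem containing a path and writes to the matching sysfs file. It assumes the sysfs directory is named after the UUID that findmnt reports; enable_dynamic_reclaim and the SYSFS_BTRFS override (handy for testing) are hypothetical. It must run as root, and like any sysfs write it does not persist across reboots.

```shell
#!/bin/bash
# Sketch: enable dynamic reclaim for data on the filesystem containing a path.
# SYSFS_BTRFS is a hypothetical override; it defaults to /sys/fs/btrfs.
enable_dynamic_reclaim() {
    local target="${1:-/}"
    local root="${SYSFS_BTRFS:-/sys/fs/btrfs}"
    local uuid knob
    # Assumption: the sysfs directory name equals the filesystem UUID.
    uuid=$(findmnt -n -o UUID --target "$target") || return 1
    knob="$root/$uuid/allocation/data/dynamic_reclaim"
    if [ ! -f "$knob" ]; then
        echo "No dynamic_reclaim file for '$uuid' (kernel too old or not Btrfs?)" >&2
        return 1
    fi
    echo 1 > "$knob"
    echo "dynamic_reclaim for data on $uuid is now $(cat "$knob")"
}
```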

[1] ENOSPC - No available disk space | Forza's Ramblings

Preventing ENOSPC - Btrfs Balance

What btrfs balance does is to send things back through the allocator, which results in space usage in the chunks being compacted. For example, if you have two data chunks that are both 40% full, a balance will result in them becoming one chunk that’s 80% full. By compacting chunks, the balance operation is able to convert the empty chunks into unallocated space that can be used for new applications.

It is important to run a btrfs balance before you run out of unallocated space. A common way is to set up a scheduled maintenance task that regularly runs a limited balance.

NOTE! Only balance DATA chunks, never METADATA chunks

From the BTRFS documentation (Balance):

Warning

Running balance without filters will take a lot of time, as it basically moves data/metadata across the whole filesystem and needs to update all block pointers.


How can I make it survive reboots (apart from custom scripts and the like)?


I suppose /etc/fstab? Otherwise a systemd service would be needed, which sounds like a hack.

Maybe with a udev rule?

Edit: I solved it with a systemd service, which seems to me the best solution, especially with multiple Btrfs disks.

Service:

[Unit]
Description=Enable Btrfs dynamic reclaim on all Btrfs disks
After=local-fs.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/btrfs-reclaim.sh
RemainAfterExit=true

[Install]
WantedBy=multi-user.target

Script:

#!/bin/bash
# Enable dynamic reclaim on every mounted Btrfs filesystem.
# /sys/fs/btrfs/ also contains entries that are not filesystems (e.g. "features");
# they have no dynamic_reclaim file and are simply skipped.
for fs in /sys/fs/btrfs/*/; do
    [ -d "$fs" ] || continue
    uuid=$(basename "$fs")
    # Enable dynamic reclaim for data, then for metadata
    for type in data metadata; do
        knob="${fs}allocation/${type}/dynamic_reclaim"
        if [ -f "$knob" ]; then
            echo 1 > "$knob"
            echo "Enabled dynamic reclaim for $type on $uuid"
        else
            echo "Skipping $uuid: no dynamic_reclaim file for $type"
        fi
    done
done

In the Fedora Btrfs Matrix room, I read that they are testing it, which is why it is not enabled by default. I think it will become the default one day, once it has been thoroughly tested.

ok I just asked about the dynamic reclaim stuff, it’s not enabled by default because we’re still testing it internally at Meta, but so far it’s not caused any problems

it’s too early to enable it by default, but it should be fine for individual users to play with it and test it

Conan Kudo
so it’s at least on the roadmap, good

Davide Cavalca
18:58
yep, we have an engineer actively working on testing this on the production fleet

  • btrfsmaintenance is a package with periodic maintenance scripts hooked into systemd timers.
  • As stated above, there is a solution for dynamic reclaim.
  • But will these be enabled in Fedora by default?
  • By the way, btrfs-assistant is a GUI that can be used for manual balancing.

Besides balancing, btrfs filesystem defragment (btrfs f defrag) is also very important, especially on HDDs.

Also, as defragging makes full copies of shared extents (i.e. it un-shares CoW/reflinked extents), deduplication tools such as duperemove and bees are also important.

What do you think?


What is bees? I think duperemove should be preinstalled.

bees is a btrfs deduplicator just like duperemove. IDK if it is better or worse, I just know that it exists.


From what I understand, it will probably be the default in the future, but for now, they are testing it in production. It’s still a relatively new feature and needs more testing.

As for the rest, I personally enable the “autodefrag” [1] mount option only on mechanical hard drives and don’t run any systemd service for maintenance. I’ve been using Btrfs for many years.
I only run a scrub if I suspect an issue with the system, mainly because checksums are already computed on every write and verified on every read.

Checksumming

Data and metadata are checksummed by default. The checksum is calculated before writing and verified after reading the blocks from devices. The whole metadata block has an inline checksum stored in the b-tree node header. Each data block has a detached checksum stored in the checksum tree.

https://btrfs.readthedocs.io/en/latest/Checksumming.html#checksumming

[1]

autodefrag, noautodefrag

(since: 3.0, default: off)

Enable automatic file defragmentation. When enabled, small random writes into files (in a range of tens of kilobytes, currently it’s 64KiB) are detected and queued up for the defragmentation process. May not be well suited for large database workloads.

The read latency may increase due to reading the adjacent blocks that make up the range for defragmentation, successive write will merge the blocks in the new location.

Warning

Defragmenting with Linux kernel versions < 3.9 or ≥ 3.14-rc2 as well as with Linux stable kernel versions ≥ 3.10.31, ≥ 3.12.12 or ≥ 3.13.4 will break up the reflinks of COW data (for example files copied with cp --reflink, snapshots or de-duplicated data). This may cause considerable increase of space usage depending on the broken up reflinks.

https://btrfs.readthedocs.io/en/latest/Administration.html#btrfs-specific-mount-options
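To see which Btrfs mounts currently have autodefrag enabled, a short sketch can parse findmnt output. autodefrag_report is a hypothetical name; it relies only on the OPTIONS column that findmnt prints for each Btrfs mount.

```shell
#!/bin/bash
# Sketch: list Btrfs mounts and whether the autodefrag mount option is active.
autodefrag_report() {
    # -r: raw output (one mount per line), -n: no header
    findmnt -t btrfs -n -r -o TARGET,OPTIONS | while read -r target opts; do
        # Wrap in commas so "autodefrag" only matches as a whole option
        case ",$opts," in
            *,autodefrag,*) echo "$target: autodefrag on" ;;
            *)              echo "$target: autodefrag off" ;;
        esac
    done
}
```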

Still, it can be useful to be notified in case of problems. There is an excellent piece of software for this, which can send an email if it detects an issue.

Btrfsd is a lightweight daemon that takes care of all Btrfs filesystems on a Linux system.

It can:

  • Check for detected errors and broadcast a warning if any were found, or optionally send an email
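Without installing a daemon, a rough approximation of that first feature is to check the per-device error counters with btrfs device stats --check, which returns a non-zero exit code when any counter is not zero. check_btrfs_errors is a hypothetical name, and this sketch only reports the problem; it does not send email.

```shell
#!/bin/bash
# Sketch: warn if any Btrfs device error counter on the filesystem is non-zero.
# "btrfs device stats --check" exits non-zero if a counter is not zero
# (or if the command itself fails, e.g. the path is not on Btrfs).
check_btrfs_errors() {
    local mnt="${1:-/}"
    if btrfs device stats --check "$mnt" > /dev/null; then
        echo "$mnt: no device errors recorded"
    else
        echo "$mnt: device errors recorded, run 'btrfs device stats $mnt' for details" >&2
        return 1
    fi
}
```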

Thanks for the autodefrag tip. But it could slow down the filesystem.
Also, every defrag needs to be followed by deduplication, which can’t always be done efficiently.

I’ll try autodefrag with periodic duperemove, thanks.
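A periodic duperemove run could be wrapped in a small function like this sketch. The default directory (/var/home, where home directories live on Atomic Desktops) and the hash file location are assumptions; dedupe_dir is a hypothetical name. As far as I know, reusing a hash file lets later runs avoid rehashing files that have not changed.

```shell
#!/bin/bash
# Sketch: deduplicate a directory tree with duperemove.
# Default paths are assumptions; adjust them to your system.
dedupe_dir() {
    local dir="${1:-/var/home}"
    local hashfile="${2:-/var/cache/duperemove/hashes.db}"
    mkdir -p "$(dirname "$hashfile")"
    # -d: actually deduplicate (without it, duperemove only reports)
    # -r: recurse into subdirectories
    # --hashfile: store hashes on disk so they can be reused by later runs
    duperemove -dr --hashfile="$hashfile" "$dir"
}
```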