Btrfs balance on single-device filesystem saved me 20% space

Ever since I began running Fedora and btrfs 1.5 years ago, I’ve had issues with my system drive and data drive filling up way too fast and not recovering all the space when files were deleted.
They are separate filesystems, and both were consistently banging up against the 80% fill limit. A few times a filesystem went read-only, causing a few headaches.

I ran btrfs scrub somewhat consistently over the past 6 months, hoping it would help by compressing files. Not much help, unfortunately. However, last week I ran btrfs balance start -dusage=85 on both filesystems, and recovered 10% on the system fs and a whopping 20% on the data fs. I made a systemd timer to run btrfs balance monthly.
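
For anyone who wants to do the same, a minimal sketch of the timer could look like the following; the unit names, mount points and the -dusage threshold here are illustrative, not a copy of my exact setup.

# /etc/systemd/system/btrfs-balance.service
[Unit]
Description=Filtered btrfs balance of the system and data filesystems

[Service]
Type=oneshot
ExecStart=/usr/sbin/btrfs balance start -dusage=85 /
ExecStart=/usr/sbin/btrfs balance start -dusage=85 /mnt/data

# /etc/systemd/system/btrfs-balance.timer
[Unit]
Description=Run a filtered btrfs balance monthly

[Timer]
OnCalendar=monthly
Persistent=true

[Install]
WantedBy=timers.target

Enable it with systemctl enable --now btrfs-balance.timer.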

This topic discusses btrfs maintenance, and whether it’s still useful or just a workaround for legacy bugs.

Though I’d love to dress for the job I want, I don’t show up at the factory clad in mountaineering gear. What I’m trying to say is, the maintenance tasks may be legacy and not what we want, but since they are useful here and now, they should be enabled by default.

The challenge with enabling it by default is that whether it is needed depends on the use case.

Further, a balance has a performance impact while it runs, so it can be problematic on a laptop or another machine with inconsistent availability.

That being said, I run both btrfs balance and btrfs scrub monthly.

The btrfsmaintenance package is a convenient way to manage scrubs, balance, trim and defrag.
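
On Fedora its settings live in a sysconfig-style file, roughly like this; the path and variable names here are from memory, so double-check them against the file the package actually installs:

# /etc/sysconfig/btrfsmaintenance
BTRFS_BALANCE_PERIOD="monthly"            # how often the filtered balance runs
BTRFS_BALANCE_MOUNTPOINTS="/:/mnt/data"   # colon-separated list of mount points
BTRFS_BALANCE_DUSAGE="10"                 # usage filter for data block groups
BTRFS_BALANCE_MUSAGE="5"                  # usage filter for metadata block groups
BTRFS_SCRUB_PERIOD="monthly"
BTRFS_SCRUB_MOUNTPOINTS="/:/mnt/data"
BTRFS_TRIM_PERIOD="none"                  # trim is usually already handled by fstrim.timer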

Two cases come to mind that fit the description of a drive filling up and deletes not recovering space: snapshots, and bookend extents. The former is probably self-explanatory; the latter is not. Bookend extents can be created by certain workloads; the only one I know of offhand is Docker or Podman with the Btrfs graph driver enabled.
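
One way to spot that kind of waste is to compare how much data the files reference against how much is actually allocated on disk; a large gap suggests extents that are only partially referenced. Assuming the compsize tool is installed and using an illustrative path:

sudo compsize -x /var/lib/containers              # a "Disk usage" figure much larger than "Referenced" hints at bookend extents
sudo btrfs filesystem du -s /var/lib/containers   # total vs. exclusive vs. shared usage for the same tree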

Can you tell us more about the workloads this system runs? Any containers (Docker or Podman)? Any VMs, and if so, which hypervisor/manager?

Could you report the output from mount | grep btrfs?

Are snapshots being taken? How many? How often?

How are you measuring the recovered space percentage following a balance? (See the commands below for what I’d compare.)

What kernel version are you using now?
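
For the space question, the numbers worth comparing before and after a balance come from btrfs’s own accounting rather than a generic free-space figure; something like this, with the mount points being illustrative:

sudo btrfs filesystem usage -T /
sudo btrfs filesystem usage -T /mnt/data
df -h / /mnt/data    # for comparison with what generic tools report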

I see this issue might not be experienced by everyone.

I don’t have my laptop with me today, but I’ll answer what I can.

The system drive is 240 GB and has three pet containers for distrobox; no idea about the btrfs graph driver. Hourly, daily and weekly snapshots, with 20 kept around.

The data drive is 2 TB with data duplication/mirroring, i.e. 1 TB usable space minus metadata usage. It *has* contained a Win10 qcow2 image of 70 GB for virt-manager/gnome-boxes, but I deleted it a few months back. The data drive only holds documents, archives and media atm. No snapshots.

I use the system for web surfing, CAD, gaming and light video editing.

KDE Dolphin reported the data drive as filled with ~700 GB, ~75% full. After the btrfs balance, Dolphin reported the drive as ~55% full.

Since the system starts acting all kinds of wonky when the Dolphin-reported fill percentage reaches 77-79%, which coincides with how much I think is reserved for metadata, I’ve no reason to doubt it.

What are your thoughts?

A few things: btrfsmaintenance does a very limited balance by default, 10% usage for data block groups and 5% for metadata block groups. The only time a full balance is recommended is following a file system conversion, e.g. ext4 to btrfs.
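
Concretely, that’s the difference between a filtered balance and an unfiltered one; something like this, with the mount point being illustrative:

sudo btrfs balance start -dusage=10 -musage=5 /   # limited: only block groups at or below these usage levels are rewritten
sudo btrfs balance start --full-balance /         # full balance: rewrites everything, rarely needed outside a conversion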

I’m not familiar with the Dolphin code, so I don’t know what values it looks at to determine how full a file system is. Block groups are abstracted away from user space; user-space programs have no idea what the arrangement is, and all a balance does is migrate in-use extents so they’re more contiguous; in effect it’s a free-space defragmenter. But I’m not sure how that helps explain the reported behavior.

By default, Fedora uses the overlay graph driver for docker/moby/podman containers, so I don’t expect to see the effect of bookend extents, and the behavior of containers on Btrfs should be the same as on XFS. What dedup tool are you using?
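
If you want to confirm which graph driver your containers use, something like this should show it (I believe current podman accepts this query, though the template field name may differ between versions):

podman info --format '{{.Store.GraphDriverName}}'   # "overlay" is expected on a default Fedora install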

Anyway, we need more detailed information or it’s just speculation. There are a number of commands fpaste --btrfsinfo uses on the system drive that would be useful to see before and after the unexpected behavior. The commands are listed so you can (manually) run them on the data drive as well.
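
At a minimum, for the data drive, output along these lines would help; the mount point is illustrative and this is not necessarily the exact set fpaste collects:

sudo btrfs filesystem usage -T /mnt/data
sudo btrfs device stats /mnt/data
sudo btrfs subvolume list -s /mnt/data   # -s lists only snapshots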

Thank you for the explanation, this is certainly puzzling. Also, happy holidays 🙂

I occasionally run duperemove manually. If you’re talking about the “data duplication/mirroring” I mentioned for the data drive above, then that’s a btrfs profile that duplicates every data file. I’m kinda paranoid about bit rot haha.


I’ve filled the drive with precious data since the balancing, so I won’t revert the system state to before the balancing (is that even possible?).
Here’s the output of fpaste --btrfsinfo this morning, i.e. after balance and refill of filesystems.

Duperemove is a deduplicator. It finds duplicate files and in effect makes a reflink copy, i.e. the files all point to the same extents, also called shared extents. It’s not duplicating every data file; it’s the opposite of that.
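
For reference, a typical invocation looks something like this; the path is illustrative, and without -d it only reports what it would deduplicate:

duperemove -dhr /mnt/data/archives   # -d submit the dedupe requests, -h human-readable sizes, -r recurse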

If you want duplicated data and metadata, you need to convert the file system to use the DUP profile for data block groups. The DUP profile is the default for metadata block groups (on hard drives, since forever, and on flash drives more recently). Of course, this uses double the space.
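
The conversion itself is a balance with a convert filter, along these lines (mount point illustrative):

sudo btrfs balance start -dconvert=dup /mnt/data   # rewrite data block groups with the DUP profile
sudo btrfs filesystem usage -T /mnt/data           # verify: the Data line should now show DUP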

Sorry, I didn’t get to it in time and the paste has expired already (24 hours).

Maybe I worded that weirdly, but I do use the DUP profile for both data and metadata.
IIUC, DUP profile duplicates the datasets, and duperemove removes duplicates within a dataset?

I wasn’t aware fpaste would expire the paste so quickly. Here’s a new one.

Yeah, dedup will create shared extents for the various instances of the files, and the DUP profile will in effect duplicate the block groups (data and metadata in this case, i.e. file data and file system metadata) on one device. So if you add another device you’ll want to switch this profile to raid1 so that the block groups are kept on separate devices. It’s still a single point of failure (i.e. not a backup), but so long as any corruption is limited to one of the copies, the other DUP copy can be used to recover and fix up the bad copy.
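
If you ever do add a second device, the switch would look roughly like this; the device node and mount point are illustrative:

sudo btrfs device add /dev/sdb /mnt/data
sudo btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/data   # keep the two copies on separate devices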

Based on the fpaste, I’m not seeing anything suspicious that could relate to this. I think you’ll need to wait until the unexpected behavior happens and then collect data.

I see the / file system is using single profile for data and metadata, i.e. one copy only.

Could you boot off recent Live media (any Fedora 37 desktop Live) and post the output from the following command for both file systems?

btrfs check --readonly /dev/mapper/luks-12345

You’ll need to manually unlock the two LUKS devices, and pass the correct dev node path to the command. This will not make any repairs but will report if there are any issues with either file system. It’s unlikely there’s anything wrong per se, but maybe it finds something like qgroup confusion that’d explain some of the behavior you’re getting - in which case there’s a simple fix.
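
From the Live session the sequence would be roughly as follows; the device paths and mapper names are illustrative, so use lsblk to find yours:

sudo cryptsetup open /dev/nvme0n1p3 sysroot    # unlock the system drive's LUKS container
sudo cryptsetup open /dev/sda1 datafs          # unlock the data drive's LUKS container
sudo btrfs check --readonly /dev/mapper/sysroot
sudo btrfs check --readonly /dev/mapper/datafs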