Article proposal: converting your filesystem to Btrfs

I just jumped through all the hoops of doing this. There is very little guidance or documentation available, especially Fedora-specific. Most sources say ‘just reinstall Fedora when upgrading to F33’, but I think many Fedora users would be interested in something more detailed.

I’m happy to write the article but please note that I’m not a native English speaker.

Summary: convert your filesystem to Btrfs

Description: The purpose of the article is to guide users who are upgrading from F32 to F33 and have existing (e.g. EXT4 with or without LVM) filesystems. In case they wish to migrate to Btrfs, the article would help them assess the pros/cons of a manual conversion procedure and outline the necessary steps. Because this type of conversion is more for power users who ‘know what they are doing’, the article would contain the necessary disclaimers about the dangers of data loss or inability to boot.
Outline:

  • Intro
  • Btrfs basics - links to previous Btrfs articles on Fedora magazine
  • Pros and cons of a manual Btrfs conversion
  • The other way: just reinstall Fedora
  • Disclaimers, rollback procedure
  • Prerequisites
  • Conversion
    • Download & write live image
    • Free up space on your drive
    • Convert to Btrfs
    • Create subvolumes
    • Decide about compression
    • Modify fstab
    • Reinstall & reconfigure GRUB
    • Regenerate initrd
    • Start the new system
    • Before production use: defrag & balance

Hello @gombosg,
Welcome to the Fedora discussion forum.
+1 from me on this article idea; it is timely and needed. Please take the time to review how to get set up as a writer at this link https://docs.fedoraproject.org/en-US/fedora-magazine/contributing/ and once another editor agrees with your proposal it will move forward. Thank you for wanting to contribute.


Awesome, thanks! I read the setup docs and logged in to Taiga. I’m happy to start working on it once it gets a green light.

Hi @gombosg,

Your outline looks great. Thanks for providing a detailed proposal and I’m looking forward to seeing the article. +1, you’ve got the green light. :slightly_smiling_face:


I can help with technical questions and review. If anything significant is missing from man btrfs-convert, we should file an upstream bug. The top two preparation items I’d emphasize:

  • backup
  • e2fsck -fyv
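
As a rough sketch of how those two items fit together: once the backup exists, the check and the conversion both run against the unmounted ext4 device. The helper below only prints the commands rather than running them; `preflight_cmds` is a hypothetical name, not something shipped with btrfs-progs, and the device path is whatever your own system uses.

```sh
# preflight_cmds DEVICE
# Print (do not run) the pre-conversion sequence for an unmounted ext4 device.
# Hypothetical helper name; take a backup before running any of the output.
preflight_cmds() {
    dev=$1
    [ -n "$dev" ] || { echo "usage: preflight_cmds DEVICE" >&2; return 1; }
    # -f forces a full check, -y answers yes to repairs, -v is verbose
    printf 'e2fsck -fyv %s\n' "$dev"
    # convert only after the check comes back clean
    printf 'btrfs-convert %s\n' "$dev"
}
```

Reviewing the printed commands before pasting them into a root shell on the live image leaves room to double-check the device name.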

Convert should succeed or fail safe. A complete rewrite happened for btrfs-progs 4.6 (June 2016). The Btrfs change log lists some important fixes since then, which we do have in Fedora. Everyone should have btrfs-progs 5.9 by now, if they’re up to date.

Things btrfs-convert won’t do:

  • remove LVM, if present
  • create the subvolume layout used by default with a clean Fedora installation; namely, a subvolume named “root” mounted at / and a subvolume named “home” mounted at /home
  • the followup tasks listed in the man page

None of these are impediments to ongoing usage, including major version upgrades. It is possible to “retrofit” the installation so it does have the subvolume layout, by making read-write snapshots, updating fstab and the bootloader configuration, and then cleaning up the “top-level” of the file system. That could be a part 2 if this is already getting long.

btrfs-convert is possible because Btrfs is copy-on-write (no overwrites) and has no locality, i.e. no fixed locations for btrfs metadata. Therefore the convert treats ext4 metadata and data as read-only areas, writing the equivalent btrfs metadata into free space, while leaving the data in place. Because of this, it’s theoretically possible to convert any file system. The tricky part comes at the end where a few blocks need to be relocated to make room for Btrfs superblocks, the only things that do have fixed locations.


@cmurf, thanks for your helpful comments.
The article is already in review in the Fedora Magazine Wordpress, but looks like only editors can access that.
I set up a public preview link at my blog here: https://gombosg.com/?p=257&preview=1&_ppp=a649f8f523.

I think that the article contains most of what you wrote here, but its format is more like a step-by-step guide.

I’d appreciate it if you could read and review the post draft and possibly leave some comments here.

Looks good. I have a little bit of feedback:

need about 20-50% of free disk space

I think it’s reasonable to say they’ll want 20% or more free. In reality, only a small amount of metadata gets written; maybe 4% of the reported used space is what’s typically needed. But saying that would make it sound OK to try converting a 95% full file system. It might work.
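
The 20% rule of thumb is easy to check with integer arithmetic. This is only a sketch: `has_enough_free` is a hypothetical helper, and the default threshold of 20 is the suggestion above, not a limit enforced by btrfs-convert.

```sh
# has_enough_free TOTAL AVAIL [MIN_PCT]
# Succeeds when AVAIL is at least MIN_PCT percent of TOTAL (default 20).
# Hypothetical helper; both sizes must be in the same unit.
has_enough_free() {
    total=$1 avail=$2 min_pct=${3:-20}
    [ "$total" -gt 0 ] || return 2
    # avail/total >= min_pct/100, rearranged to avoid fractions
    [ $((avail * 100)) -ge $((total * min_pct)) ]
}
```

With GNU df, the two numbers could come from something like `set -- $(df -B1 --output=size,avail / | tail -n 1)` followed by `has_enough_free "$1" "$2"`.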

What about LVM?

I think this section is good as it is. There are a number of ways to merge the two, perhaps in a future article though. One idea I have is converting both the / and /home LVs to Btrfs, snapshotting /, and then using btrfs send/receive to send it to the home LV (or in the other direction if the free space assessment works out better). This would build on the previous btrfs send/receive article in Fedora Magazine.

Free up disk space

Consider VM images as top candidates for removal. They take up lots of space, which the conversion process can put to good use. And they will need to be recopied anyway, because convert will make them COW, and we need to make them nodatacow. This isn’t automatically done by btrfs-convert.

  • Inside root subvolume, delete home folder.

Delete the contents of the home directory, because the home subvolume will need this empty home directory in the root subvolume as a mount point.

An alternative for handling the home subvolume, instead of creating a snapshot and cleaning it up:

btrfs subvolume create home2
cp -a home/. home2/   # home/. also copies hidden entries, which home/* would skip
rm -rf home
mv home2 home

New in Fedora 33, cp now uses reflink copies by default. So this will do a fairly fast lightweight copy from the old home directory to the new home subvolume. The end result is the same either way.

Modify fstab

  1. Consider removing the 2nd paragraph entirely, and recommend that the user run blkid to get the UUID for the Btrfs file system. This is valid whether an MBR or GPT partition scheme is used: it’s a UUID for the file system itself, and it will appear twice, once for the / entry and again for the /home entry, differing only in the subvolume mount option.
  2. LVM installations typically use the VG/LV name rather than a UUID.
  3. I suggest using only a single mount option per entry: subvol=root and subvol=home. That’s what a default installation will produce. And then the article doesn’t need to suggest using a different defragment target size to account for compression.
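
To make the fstab suggestion concrete, here is a minimal sketch that prints the two entries for a given file system UUID. `fstab_entries` is a hypothetical helper, and the trailing `0 0` fields are an assumption based on what a default Btrfs installation writes.

```sh
# fstab_entries UUID
# Print the two fstab lines for the root/home layout: the same file system
# UUID twice, differing only in the mount point and the subvol= option.
# Hypothetical helper; review the output before appending it to /etc/fstab.
fstab_entries() {
    uuid=$1
    [ -n "$uuid" ] || { echo "usage: fstab_entries UUID" >&2; return 1; }
    printf 'UUID=%s /     btrfs subvol=root 0 0\n' "$uuid"
    printf 'UUID=%s /home btrfs subvol=home 0 0\n' "$uuid"
}
```

The UUID itself can be read with `blkid -s UUID -o value /dev/vda3`, substituting your own partition for the example device.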

Chroot into your system

Looks like we need to umount the Btrfs from /mnt first. Then mount in order:

mount -o subvol=root /dev/vda3 /mnt
mount /dev/vda2 /mnt/boot
mount /dev/vda1 /mnt/boot/efi

Otherwise subsequent commands won’t work.
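
The ordering matters because each mount point has to exist inside the previously mounted file system. A sketch that prints the whole sequence for arbitrary devices follows; `chroot_mount_cmds` is a hypothetical helper, and the bind mounts at the end are my own assumption about what a chroot usually needs, not something taken from the draft.

```sh
# chroot_mount_cmds ROOT_DEV BOOT_DEV EFI_DEV
# Print (do not run) the mount sequence: root subvolume first, then /boot,
# then the EFI system partition. Hypothetical helper name.
chroot_mount_cmds() {
    [ $# -eq 3 ] || { echo "usage: chroot_mount_cmds ROOT BOOT EFI" >&2; return 1; }
    printf 'umount /mnt\n'
    printf 'mount -o subvol=root %s /mnt\n' "$1"
    printf 'mount %s /mnt/boot\n' "$2"
    printf 'mount %s /mnt/boot/efi\n' "$3"
    # Assumption (not from the thread): bind the pseudo file systems too
    for fs in dev proc sys; do
        printf 'mount --bind /%s /mnt/%s\n' "$fs" "$fs"
    done
}
```

Calling it with `/dev/vda3 /dev/vda2 /dev/vda1` reproduces the three mount lines above, preceded by the umount.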

Reinstall GRUB, regenerate initramfs

Pretty sure the only thing required here is grub2-mkconfig with -o option pointed to the correct location. The btrfs driver is now built into the kernel.
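
What the “correct location” is depends on how the machine boots. The sketch below picks the path from the presence of /sys/firmware/efi; the two paths are my assumption of the Fedora 33 layout, and `grub_cfg_path` with its optional SYSROOT argument is a hypothetical helper that exists only to make the check easy to try out.

```sh
# grub_cfg_path [SYSROOT]
# Print the grub.cfg location to hand to grub2-mkconfig -o.
# Assumes the Fedora 33 layout; SYSROOT defaults to the running system.
grub_cfg_path() {
    sysroot=${1:-}
    if [ -d "$sysroot/sys/firmware/efi" ]; then
        echo /boot/efi/EFI/fedora/grub.cfg   # booted via UEFI
    else
        echo /boot/grub2/grub.cfg            # booted via legacy BIOS
    fi
}
```

Inside the chroot this could be used as `grub2-mkconfig -o "$(grub_cfg_path)"`, provided /sys is available there.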

After first boot

# btrfs balance start /  <- this can take several hours

No need for a full balance. The man page only suggests balancing metadata, using -m option. So either:
btrfs balance start -m /
or
btrfs balance start -mconvert=dup /

btrfs-convert by default creates a single copy of metadata, whereas mkfs.btrfs uses single for SSD and dup for HDD. Strictly speaking, mkfs.btrfs checks sysfs for whether the device is considered rotational and, if so, uses dup metadata.
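
The same sysfs lookup can be sketched in a few lines to decide which profile to ask for. `metadata_profile` is a hypothetical helper; DISK is the whole-disk name (for example sda or vda, not a partition), and the optional second argument only exists so the function can be pointed at a fake sysfs tree.

```sh
# metadata_profile DISK [SYSFS_ROOT]
# Print "dup" for rotational disks and "single" otherwise, mirroring the
# mkfs.btrfs behaviour described above. Hypothetical helper name.
metadata_profile() {
    disk=$1 sysfs=${2:-/sys}
    flag_file=$sysfs/block/$disk/queue/rotational
    [ -r "$flag_file" ] || { echo "no rotational flag for $disk" >&2; return 1; }
    if [ "$(cat "$flag_file")" = 1 ]; then
        echo dup
    else
        echo single
    fi
}
```

The result could then feed the balance, as in `btrfs balance start -mconvert="$(metadata_profile sda)" /`.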

Finally, the VM images:

chattr +C on the containing directory, e.g.
chattr +C /var/lib/libvirt/images
Copy them back, or duplicate them in place with cp and delete the originals. Confirm that the VM images are nodatacow with lsattr.

New clean installs will have this configured automatically when the storage pool is first activated.
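
For the lsattr confirmation, the attribute to look for is the uppercase C in the first field of each output line. A small sketch of that check follows; `has_c_flag` is a hypothetical helper that parses one line of lsattr output rather than calling lsattr itself.

```sh
# has_c_flag LSATTR_LINE
# Succeeds when the attribute field of one lsattr output line contains C
# (No_COW). Only the first field is inspected, so a C in the path is ignored.
# Hypothetical helper name.
has_c_flag() {
    flags=${1%% *}          # first space-separated field: the attribute string
    case $flags in
        *C*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For the directory itself, one way to use it is `has_c_flag "$(lsattr -d /var/lib/libvirt/images)" && echo nodatacow`.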


@cmurf, thanks for the detailed review and the helpful comments! I will incorporate the changes.

New in Fedora 33, cp now uses reflink copies by default.

Glad to hear that. I read contradictory things on the Internet about copying between subvolumes so I went with the safe way, but this is much more elegant.

I suggest only the single option subvol=root and subvol=home . That’s what a default installation will produce.

Good idea. I’ll modify the example fstab to the same defaults as a new installation, but keep the explanation of adding compression so that users can customize it.

And then the article doesn’t need to suggest using a different defragment target size to account for compression.

The defrag command example comes from the btrfs-convert manpage. See below…

The btrfs driver is now built into the kernel.

It’s apparently not added to the initramfs. I recalled having this issue, and now quickly installed an F33 VM with ext4 and another with btrfs to compare the differences. The btrfs dracut module is not in the initramfs for the ext4 installation.

So I think it needs to be regenerated either manually or by just reinstalling the kernel. Are you sure that this is unnecessary?

No need for a full balance.

Good to know that. I worked off of the Conversion wiki page which still suggests a full balance for some reason.

So let’s clarify: what do you suggest for balance and defrag command options? The wiki page and man btrfs-convert examples differ.

Finally, the VM images

New clean installs will have this configured automatically when the storage pool is first activated.

Nice point! I actually didn’t know that. This folder is empty for me, I’m using GNOME Boxes (as presumably many users do) which stores images under ~/.local/share/gnome-boxes/images.

I did a fresh F33 installation and did an lsattr to check.
~/.local/share/gnome-boxes/images has No_COW, /var/lib/libvirt/images doesn’t. Maybe because no images are being used inside the latter folder.

I’ll add this piece of instruction to the guide.


I have updated the article inside the Fedora Magazine WP and my own. See the updated draft here: Convert your filesystem to Btrfs – Gergely Gombos

copying between subvolumes

This is not easy to keep up with. Upstream coreutils has changed cp to default to --reflink=auto behavior, but they haven’t cut a release. [Fedora’s coreutils is carrying a patch for it in Fedora 33+.](https://src.fedoraproject.org/rpms/coreutils/c/5d08d14b0

Whether it will be a reflink (efficient) copy or a regular copy depends on whether the copy crosses a mount boundary. This is a VFS-imposed restriction (patches submitted and rejected). The gist is that if the subvolumes share a common mount point, it’ll work. So if I create subvolumes in my own ~/ I can cp or mv files around and they are, behind the scenes, all reflink copies (mv will try a simple rename() first, then fall back to a cp reflink, then fall back to a regular copy, followed by a delete of course). Whereas:

mv /home/chris/Downloads/blah.iso /var/lib/libvirt/images/ can’t rename because / and /home are separate subvolumes, and it can’t reflink copy because they are different mount points; the file is fully copied and then the original is deleted. So the command succeeds but might take a while. However,

mount -o subvol=/ /dev/sdXY /mnt
mv /mnt/home/chris/Downloads/blah.iso /mnt/root/var/lib/libvirt/images/ is likewise not renamed because these are separate subvolumes, just as you can’t rename between file systems; but it falls back to cp reflink, which this time works because they share a common mount point at /mnt. In this case /mnt/home and /mnt/root are the same subvolumes as /home and / respectively, but they share /mnt as their mount point, so the reflink copy works. :smiley: It is definitely not obvious, and also suboptimal, but chances are cross mount point copies are somewhat rare.
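
One way to see which case applies to a given pair of paths is to ask cp for a reflink-only copy and watch whether it succeeds. This is a sketch under the assumption that GNU cp is in use; `try_reflink` is a hypothetical helper, and its fallback is simply cp's default behaviour.

```sh
# try_reflink SRC DST
# Attempt a reflink-only copy; if that is refused (different mount points,
# non-Btrfs file system, older cp), fall back to cp's default copy.
# Prints which path was taken. Hypothetical helper name.
try_reflink() {
    if cp --reflink=always -- "$1" "$2" 2>/dev/null; then
        echo reflink
    else
        cp -- "$1" "$2" && echo full-copy
    fi
}
```

Run against the two mv examples above, the first pair of paths would be expected to report a full copy and the second a reflink.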

It’s apparently not added to the initramfs.

It won’t be. It’s built into the kernel starting with Fedora 33.
Fedora 32 and older:
CONFIG_BTRFS_FS=m

Fedora 33 and later:
CONFIG_BTRFS_FS=y
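
To check which of the two a given kernel has, the build config can be read directly. This sketch assumes the config is installed as /boot/config-<kernel version>; `btrfs_kconfig` is a hypothetical helper.

```sh
# btrfs_kconfig [CONFIG_FILE]
# Print "y" (built in), "m" (module) or "n" (not set) for CONFIG_BTRFS_FS.
# Defaults to the running kernel's config under /boot. Hypothetical helper.
btrfs_kconfig() {
    cfg=${1:-/boot/config-$(uname -r)}
    [ -r "$cfg" ] || { echo "cannot read $cfg" >&2; return 1; }
    val=$(sed -n 's/^CONFIG_BTRFS_FS=//p' "$cfg")
    echo "${val:-n}"
}
```

When it prints m, the module has to be available wherever the root file system is mounted from, which is the initramfs question being discussed here.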

btrfs dracut module is not in the initramfs

It isn’t anyway due to a bug. I don’t think its absence will be fatal in the single device case, but I’m a bit concerned now about /usr/lib/udev/rules.d/64-btrfs-dm.rules missing for the Btrfs on LVM case. It looks like the friendly names might be missing, but that should also be non-fatal.

The wiki page and man btrfs-convert examples differ.

It’s subtle. The wiki is older and the man page is newer, and while the older advice isn’t wrong, the newer advice is OK and won’t take as much time. A compromise between the two approaches is a filtered balance, which is even suggested in the wiki. That’d look like:

btrfs balance start -dusage=20 /mnt
find /mnt -type f -size +1G -print -exec btrfs fi defrag {} \;
btrfs balance start -m /mnt

This will start balancing data block groups that are 20% or less full. Then defragment large files. Then balance all metadata block groups. The logic is that there’s a decreasing benefit to just shuffling all data extents around. What we care about is reducing free space fragmentation and a filtered balance will do that while not taking hours. The worst offenders for this are all in the less than 50% full realm. Could you do -dusage=50? Sure, it’s not a problem. But the best bang for the buck is somewhere around 10-30 so I’m just splitting the difference. The logic for rewriting all of the metadata is that this includes prior ext4 areas as well as converted btrfs areas, in quite small 8MB block group sizes. Fully rewriting them out, by not filtering the -m command, results in fewer and larger metadata block groups. Typically 256M in size, but it depends on the file system size.
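
To put rough numbers on the block group sizes, here is a small worked calculation. The 8 and 256 figures are the ones quoted above, the 1024 MiB of metadata is a made-up example, and `block_group_count` is a hypothetical helper.

```sh
# block_group_count TOTAL_MIB GROUP_MIB
# Number of block groups needed to hold TOTAL_MIB of metadata, rounded up.
# Hypothetical helper; both arguments are in MiB.
block_group_count() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

block_group_count 1024 8     # 8 MiB groups left by the conversion: 128
block_group_count 1024 256   # 256 MiB groups after the metadata balance: 4
```

So for that example the unfiltered -m balance would shrink the metadata block group count from 128 to 4.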

~/.local/share/gnome-boxes/images

I just tested doing
chattr -C ~/.local/share/gnome-boxes/images
And then creating a new VM in Boxes, and +C is not restored. So it does seem to be set only on first use of the directory, when Boxes detects that it is on Btrfs. The initial detection of ext4 doesn’t cause it to be set, and it’s not checked or set again after conversion to Btrfs.

Thanks!
Your comments are very informative, but including them would make the article too long.
I decided to keep the initramfs/kernel reinstall part; it won’t hurt users.

From here on I hope to get some feedback from the editors, too.

This, upgrading from Fedora 32 to Fedora 33, is what has kept me from upgrading to Fedora 33 so far. Thanks for the article work.


@glb @jakfrost
Is there anything left that I still need to do, or should I just wait for the review to finish?
Latest preview link: Convert your filesystem to Btrfs – Gergely Gombos


I’ve come across a conversion gotcha. If the user has VM images, raw or qcow2, conversion makes them datacow just like everything else.

It’s possible the user will get stuck being unable to duplicate the file, in order to make it nodatacow, because they don’t have enough space. That is, the VM image is too big to duplicate in the available free space.

Reflink copies don’t help here because they share extents. The extents are datacow, so they can’t be made both datacow and nodatacow, thus attempting to force a reflink copy fails.

It’s not strictly required to convert VM images to nodatacow. But there are gotchas, which is why libvirt makes new pools nodatacow by default.

That seems like an unusual edge case, but a potential workaround might be to make the original VM image sparse to free up some space. That is:

  1. From within the VM image, fill all the unused space with zeros using dd if=/dev/zero of=/zero bs=1MB; rm -f /zero
  2. On the hypervisor free all the “zero’d” blocks using fallocate -d <path-to-vm-image>

The above should work as long as the VM image is less than 50% full. Deleting unnecessary data from the VM image before zeroing the free space might help too.
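
To see how much the hole punching actually freed, the image's apparent size can be compared with its allocated blocks. This sketch assumes GNU stat; `sparse_savings` is a hypothetical helper.

```sh
# sparse_savings FILE
# Print how many bytes of FILE's apparent size are not backed by allocated
# blocks (0 if the file is fully allocated). Hypothetical helper name.
sparse_savings() {
    # %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
    out=$(stat -c '%s %b %B' -- "$1") || return 1
    set -- $out
    apparent=$1
    allocated=$(( $2 * $3 ))
    if [ "$allocated" -lt "$apparent" ]; then
        echo $(( apparent - allocated ))
    else
        echo 0
    fi
}
```

Running it on the image before and after `fallocate -d` shows whether enough space was recovered to make the nodatacow copy.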

Maybe steer the reader away from conversion if they have VM images? It might call for an advanced article on conversion, or on VM images specifically, if it turns out this is more common than we think.

I’ve run into this problem before, and I saw it on users@fpo over the weekend. While I’m not sure how common the “ext4 + raw/qcow2 + want to convert to Btrfs” group is, I’m not sure it’s rare. The main reason for the nodatacow default and recommendation is that it’s the simplest. No discussion needed.

Datacow VM images are a valid workflow; it’s just more complex. You have to deal with free space and backing file fragmentation, and set up scheduled tasks to mitigate them. You also have to avoid O_DIRECT until an old bug gets fixed (yes, it’s newly filed, but it’s been known upstream for a while). The guest VM and file system write pattern matters a lot too.


FWIW, my personal preference when running qemu-kvm VMs on cow filesystems is to export the rootfs for the VM over NFS rather than using a disk image.

Years ago when I started running the VMs this way, clustered LVM wouldn’t allow both live snapshots and live migration. It was something to do with the LVM volumes having to be locked in “exclusive” mode, but I don’t remember the details. Using NFS on cow, you get the best of both worlds (though I’ve been using ZFS, not Btrfs; I don’t know how well or if such a setup would work with Btrfs).
