Btrfs push with Fedora - where does that leave Silverblue?

Hi all,

I’m surprised there isn’t a topic on this already. Recently Ben Cotton, the program manager of Fedora, proposed a switch to Btrfs as default for pretty much all desktop versions of Fedora. The reasons mostly have to do with providing features to users, using an entire disk to maximize space (vs partitioning), and increasing the life of storage devices. There are more and they are all agreeable reasons.

What I am curious about is how much overlap there is with the work already put into Silverblue. (For example, Linux Unplugged brought up openSUSE, which integrated Btrfs snapshots into transaction-style updates à la Silverblue.) How much double work has there been, and how much will there be? If Red Hat starts supporting Btrfs again, what will the value of Stratis be? And what has all the paid labor on XFS produced?

Apologies for all the questions; I promise they are good in nature and intention. It is just that I have been using Silverblue for almost two years thinking I have been contributing to the future. I thought Stratis and XFS were our way forward.

AFAIK, which isn’t really saying much since I am definitely not in the inner circle of these types of decisions, this is only about choosing the default for new installs; XFS and ZFS and likely Ext4 will be available as well. Also, I would note that there are good reasons to replace Ext4, as noted at https://opensource.com/article/18/4/ext4-filesystem. Aside from that, I was interested in BTRFS from the first time I read about it, having suffered through some data corruption at the time, but that was a decade ago or more. I am not familiar with XFS and ZFS, but will read up on them for my own edification. I don’t know if @walters or @otaylor could answer your questions on the topic, but they would likely have a much clearer idea of the answers you seek. Maybe they’ll notice me tagging them :wink:
[edit] there is also this paper on XFS failures http://pages.cs.wisc.edu/~vshree/xfs.pdf

https://github.com/rhinstaller/anaconda/pull/2720 added support for rootflags in Anaconda when using ostree, which was the remaining blocker for using btrfs with Silverblue. Once that makes it to compose it should resolve https://bugzilla.redhat.com/show_bug.cgi?id=1753485 as well. cc @ngompa @cmurf

Hello @dcavalca,
Thanks for the work on that and thanks for posting here about it. I understand from the bugzilla comments, especially the comment by Tomas Kovar, that this is only part of the fix needed for a BTRFS-only Silverblue setup. The intermediate solution seems to be a GPT layout with /boot/efi as a VFAT partition, /boot as an Ext4 partition, and / as a BTRFS partition with a /var/home subvolume. Is that a correct interpretation?

That’s correct, and it matches the setup in the change proposal: https://fedoraproject.org/wiki/Changes/BtrfsByDefault
Other scenarios like /boot on btrfs are still being worked out.

I blogged on this here: https://blog.verbum.org/2020/07/14/on-btrfs/

Hello @walters,
I read your blog piece. Would you be interested in writing the kind of article you suggest in your post for the mag? I really feel you’re eminently qualified on the topic, and the points you bring forward about BTRFS in use on a WS (whether SB or just plain WS) are valid in my opinion and certainly on topic. Another approach might be for you to suggest the issue on our magazine discussion area, and we editors either try to find someone to write it or write it ourselves. At least you have presented some technical reasons for using or not using BTRFS from the user POV; pretty much everything else I can find about it is from a large-scale server POV.

The (open)SUSE use case is significantly a desktop/laptop use case. They’ve been Btrfs by default on sysroot for six years, and for almost two years on /home as well, and they report fewer complaints following that change.

Value of Btrfs for system root: compression, discard=async, data integrity.

Compression saves space, reduces write amplification, and can improve performance.

Most file systems today still don’t enable discards by default at mount time, mainly due to concern over firmware bugs. There is a problem with giving the firmware too many hints about freed up blocks, which slows down normal operation, sometimes badly. And there’s a problem with no hints at all, which can result in cheaper SSDs running out of blocks ready (erased) to accept full speed writes, and thus slowing down normal operation. The async discard feature in Btrfs is an answer to this problem. Enough hints to keep the SSD performant, without overwhelming it.
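As a rough illustration of the two mount options discussed above (transparent compression and async discard), here is a minimal Python sketch that builds an /etc/fstab line. The UUID placeholder, the `root` subvolume name, and the choice of zstd are assumptions for illustration only, not Fedora’s defaults; `discard=async` needs kernel 5.6 or later.

```python
def btrfs_fstab_line(uuid, mountpoint="/", subvol=None,
                     compress="zstd", async_discard=True):
    """Build one /etc/fstab line for a Btrfs filesystem.

    All argument values are illustrative; nothing here is read from
    or written to a real system.
    """
    options = []
    if subvol:
        options.append(f"subvol={subvol}")
    if compress:
        # Transparent compression for newly written data.
        options.append(f"compress={compress}")
    if async_discard:
        # Batch discards in the background instead of issuing them inline.
        options.append("discard=async")
    if not options:
        options.append("defaults")
    # Fields: device, mount point, type, options, dump, fsck pass.
    return f"UUID={uuid} {mountpoint} btrfs {','.join(options)} 0 0"


if __name__ == "__main__":
    # "<your-filesystem-uuid>" and "root" are placeholders.
    print(btrfs_fstab_line("<your-filesystem-uuid>", subvol="root"))
```

On Silverblue the root mount options are passed differently (see the rootflags work mentioned earlier in the thread), so treat this as a sketch of the options themselves rather than of where to set them.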

Data integrity means getting an I/O error rather than a bit flip, or worse corruption, simply being left as an exercise for user space to deal with properly (i.e. a crash, transient strange behavior, maybe further corruption - who is to say, we can’t even predict how random corruptions manifest in executables, libraries and configuration files). It’s a rare case, but it is more likely to hit your data than the file system itself, because there’s more data on a drive than file system metadata.

Btrfs offers reflink copies, useful for container use cases with overlayfs. Sometimes snapshots are even cheaper than that, depending on the workload. And pretty much all of the original reasons CoreOS moved away from Btrfs in 2014 have been solved - the pernicious ENOSPC problems took a significant dive following the 2016 merging of a new ticketed enospc infrastructure. Since then it’s been about identifying edge cases. I never balance or defragment any of my Btrfs file systems, no incantations, no rituals, etc.

There does seem to be a significant performance hit for certain PostgreSQL databases that Btrfs developers are looking at - but this doesn’t happen in every case and doesn’t happen with all database engines either. I roll my eyes at most Phoronix benchmarks, but the SQLite benchmarks on kernel 5.8 suggest btrfs is neck and neck with ext4, or maybe even occasionally beats it, even without nodatacow set. Even without compression.

It is recommended that VM images have nodatacow set. But such per-directory or per-file optimization can be done on behalf of the user - we have a few ideas on how to do that so that, again, mortal users, who have better things to do anyway, don’t have to mess around with it.
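For anyone who does want to set it by hand in the meantime, here is a minimal sketch, assuming the images live in /var/lib/libvirt/images (an assumed path, adjust to your setup). It uses `chattr +C`, which sets the No_COW attribute. The attribute only takes effect for new or empty files, so it is normally set on the directory before any images are created in it; nodatacow files also lose checksumming and compression.

```python
import subprocess


def nodatacow_command(directory):
    """Return the chattr command that sets the No_COW attribute (+C)."""
    return ["chattr", "+C", directory]


def set_nodatacow(directory, dry_run=True):
    """Print the command (dry run) or actually run it.

    New files created in the directory afterwards inherit the
    attribute; existing non-empty files are not converted.
    """
    cmd = nodatacow_command(directory)
    if dry_run:
        print(" ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Assumed image directory; dry run only prints the command.
    set_nodatacow("/var/lib/libvirt/images")
```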

Okay, I already told @otaylor I was going to switch my fs over to btrfs, so I am going to do it this week before I chicken out. I really wanted to use it when it first came out, but shied away after seeing the issues. I looked at those Phoronix tests; they seemed skewed in favour of ZFS, and I would tend to trust the SQLite benchmarks more. From a Silverblue POV, it seems like the right thing to do, especially in light of how many times I have to answer someone about running out of space in their root or home.
[Edit 2020-07-16 15:24 EDT]
So I am looking at the btrfs recommended layout and I have questions on how this might work with what I got… System details are …
Gigabyte UHD board with an AMD Phenom CPU. The board has SMBIOS
/dev/sda - SSD 240GB
/dev/sdb - SSD 240GB
/dev/sdc - HDD 1TB
Any suggestions on partitioning for a BTRFS install of Silverblue 32? I already have the boot partition set on /dev/sda1 as ext4 @ 1.1GB, but I typically just use ext4+LVM for the rest of my available space. So, in btrfs I am going to have 3 volumes, I would surmise, then a layout of subvolumes. I could use a little guidance on suggested partitioning from someone currently familiar with btrfs. BTW, I feel this is a reasonably probable scenario that users are going to face.

The kernel 5.8 Phoronix tests I’m referring to don’t include ZFS.

Unfortunately, having side-by-side Fedoras share the same drive is difficult. It doesn’t matter what the file system is; by default the layout is really designed for one Fedora at a time. Of course, it’s possible to rig a system by doing some post-install heroics. This is going to improve in Fedora 33. I have a proof of concept for F32 + F33 side by side, but it needs testing.

Yes, the test I read was for an earlier kernel version. [Edit] yeah that was definitely a different test.
I guess my question was not as clear as I wanted to make it. I only have one OS, Silverblue and I want to keep it that way. What I was wanting to know was what might my suggested partitioning be with the following …

/dev/sda -> SSD 240GB
/dev/sdb -> SSD 240GB
/dev/sdc -> HDD 1TB
I like having my home on the spinning disk to reduce reads/writes to the SSDs. I don’t mind using the SSDs in a RAID (?1) arrangement, and I definitely would like them to be where the OS is, since the R/W frequency should be lower with SB. I would think that making /dev/sdc a volume of its own makes sense, and having it set up as subvolume /var would be okay since home would then fall into that. But I am uncertain whether btrfs will treat the two SSDs as one, or whether it is more likely to look at them as two different volumes.

Ahh, sorry about the misunderstanding. I think your idea sounds fine. There’s nothing wrong with it. It’ll be a bit tricky to do in Custom or Advanced-custom partitioning. There are caveats to the raid1 setup: one is that if you have UEFI, there isn’t an easy or reliable way to create and maintain identical EFI System partitions; another is that Btrfs doesn’t currently support automatic degraded raid1.
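On the question of whether Btrfs treats the two SSDs as one: a single Btrfs file system can span both devices when it is created with the raid1 profile for data and metadata. A minimal sketch of the command, built in Python; the partition names /dev/sda2 and /dev/sdb2 and the label are assumptions for illustration, and the script only prints the command rather than running it, since mkfs destroys existing data.

```python
def mkfs_raid1_command(devices, label=None):
    """Build a mkfs.btrfs command using raid1 for data (-d) and metadata (-m)."""
    if len(devices) < 2:
        raise ValueError("Btrfs raid1 needs at least two devices")
    cmd = ["mkfs.btrfs", "-d", "raid1", "-m", "raid1"]
    if label:
        cmd += ["-L", label]
    return cmd + list(devices)


if __name__ == "__main__":
    # Hypothetical partitions on the two SSDs; printed, not executed.
    print(" ".join(mkfs_raid1_command(["/dev/sda2", "/dev/sdb2"], label="fedora")))
```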

Possible alternatives depend mostly on your preferred speed, and on whether you want a transparent “set it and forget it” approach or want to get a bit more aggressive. The Fedora 32 defaults are reasonable, but not optimized for specific use cases. The Fedora 33 defaults will be different.

For example: If your workload is typical desktop/laptop use, and the SSDs are typical consumer desktop/laptop, chances are their useful life is years, with any file system. Btrfs transparent compression will reduce writes, and extend their life significantly. You could then investigate btrfs send/receive style backups to the HDD.
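A minimal sketch of what a send/receive backup to the HDD could look like, driven from Python. All paths are hypothetical: it assumes /var/home is a subvolume, that the snapshot location is on the same file system as the source, and that the HDD is itself Btrfs and mounted at /mnt/backup. It takes a read-only snapshot (send requires one) and pipes `btrfs send` into `btrfs receive`.

```python
import subprocess


def backup_commands(source_subvol, snapshot_path, backup_dir):
    """Return the three commands for a full (non-incremental) backup."""
    # -r makes the snapshot read-only, which btrfs send requires.
    snapshot = ["btrfs", "subvolume", "snapshot", "-r", source_subvol, snapshot_path]
    send = ["btrfs", "send", snapshot_path]
    receive = ["btrfs", "receive", backup_dir]
    return snapshot, send, receive


def run_backup(source_subvol, snapshot_path, backup_dir):
    """Create the snapshot, then stream it to the backup file system."""
    snapshot, send, receive = backup_commands(source_subvol, snapshot_path, backup_dir)
    subprocess.run(snapshot, check=True)
    sender = subprocess.Popen(send, stdout=subprocess.PIPE)
    # Equivalent to: btrfs send <snapshot> | btrfs receive <backup_dir>
    subprocess.run(receive, stdin=sender.stdout, check=True)
    sender.stdout.close()
    if sender.wait() != 0:
        raise RuntimeError("btrfs send failed")


if __name__ == "__main__":
    # Hypothetical paths; this only prints the commands.
    for cmd in backup_commands("/var/home",
                               "/var/home/.snapshots/home-2020-07-16",
                               "/mnt/backup"):
        print(" ".join(cmd))
```

Later backups can be made incremental by passing a previous snapshot as the parent with `btrfs send -p`, which is left out here to keep the sketch short.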

… if you have UEFI, there isn’t an easy or reliable way to create and maintain identical EFI System partitions …

FYI,

It is very beta, but I recently created a set of scripts for doing just that here if anyone is interested:

I’d be happy to work with someone to get that integrated with Fedora. It may need some reworking to work with grub though.

There are a few moving parts to try and make things more robust and consistent. This is a new project that just got started within Fedora CoreOS. I’m not sure it’s decided whether it will do the syncing, or what would. There’s also fwupd which is in this same space, but with a different mandate.

“chances are”, yes, but having a mirrored setup greatly reduces the chance of losing any data if one of the SSDs fails. I would like to see Fedora have better support for mirrored drive configurations. As an added bonus, mirroring the drives usually doubles the read performance of the system overall, so programs load faster and the system will seem “snappier”.

The particularly nasty thing about SSDs is that when they fail, they tend to fail all at once – one moment you have everything and the next moment you have nothing. This is different from the spinning rust disks that tended to develop more and more bad sectors over time that were sometimes recoverable with repeated tries. I’ve seen enough SSDs fail long before their expected lifetime is up (even server-grade SSDs), that I really think mirroring them is worthwhile.

It is too bad that btrfs doesn’t support automatic degraded raid1. Hopefully that feature will be implemented sometime soon as well.

Cool. I just took a look at it, but yeah, I don’t see anything specific about sync’ing the ESPs in there. Anyway, if anyone wants to “borrow” any ideas about how to sync the ESPs from the scripts I’ve written and integrate that into the new bootupd, feel free. :slightly_smiling_face:

Hello @cmurf,
Thanks for the response. I have BIOS (SMBIOS specifically) and although most would leave it as MBR WRT the partitioning table I would prefer to use GPT.
Currently with Silverblue I set the SSD /dev/sda1 as a 1GB /boot partition. Would I have to do the same on the mirrored /dev/sdb(1) SSD if I wanted to use a RAID 1 setup with the two SSDs? Forgive my ignorance; I have only ever dealt with RAID drive storage when I helped out the IT dept with backups during vacation coverage at a factory where I was the maintenance dept’s programmer, over 15 years ago. Seems everyone wanted the last two weeks in July off for some reason.
As for workload, I use my desktop in what I consider typical fashion for me, but I would think it reflects a similar usage pattern to many others using Fedora Silverblue.

Hello Greg,
Thanks for the link. I will definitely look at what you are doing, and feel free to offer suggestions on the layout. As I noted above, my knowledge of RAID is dismally insufficient to make a “correct” decision, but it never hurts to at least make an informed one.

I should better qualify what I mean by “no automatic degraded raid1”. Assume a 2-device Btrfs raid1, and one of the drives starts having some problems. While Btrfs doesn’t literally go “degraded”, it does continue to function. You’ll just see some scary errors in dmesg reported for one of the drives, and Btrfs will keep trying to read and write to it, and even try to do fix-ups (self-heal) as it detects errors. It doesn’t have the same concept of “kicking” a drive out as md raid does, mainly because there could still be good data on that drive.

What I meant by ‘automatic’ is booting degraded with a failed drive. For mdadm setups, there’s a udev rule and dracut script that waits for ~2 minutes during startup for all devices to show. If devices fail to show, this script attempts degraded assembly. The functional equivalent of this is being explored for Btrfs, but there’s some work to do.
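To make the mdadm-style behavior concrete, here is a small Python sketch of the decision logic only. It is not the actual udev rule or dracut script; the function name and return values are made up for illustration, and the 120-second default just mirrors the “~2 minutes” above. On Btrfs today the manual counterpart is mounting with the `degraded` option.

```python
def assembly_action(expected, present, waited_s, timeout_s=120):
    """Decide what a boot-time helper would do for a multi-device array.

    expected  -- device names the array should have
    present   -- device names that have shown up so far
    waited_s  -- seconds spent waiting since startup
    timeout_s -- how long to wait before giving up on missing devices
    """
    missing = set(expected) - set(present)
    if not missing:
        return "assemble"            # everything arrived: normal assembly
    if waited_s < timeout_s:
        return "wait"                # keep waiting for late devices
    return "assemble-degraded"       # timed out: try with what we have


if __name__ == "__main__":
    # One of two hypothetical devices never shows up.
    for waited in (0, 60, 120):
        print(waited, assembly_action(["sda2", "sdb2"], ["sda2"], waited))
```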

I would prefer to use GPT

You can use the inst.gpt boot parameter when booting the installer media, and it will favor GPT on BIOS. But you have to fully remove all volumes using the installer UI for it to be willing to destroy the MBR and create a GPT.

Currently with Silverblue I set the SSD /dev/sda1 as a 1GB /boot partition, would I have to do the same on the mirrored /dev/sdb(1) SSD if I wanted to use a RAID 1 setup with the two SSDs?

The installer makes it optional, not required, for /boot to be on an (mdadm) RAID device.
(GRUB has supported Btrfs raid1 for ~9 years, but the installer might not allow this right now.)