BTRFS restore with ext4 formatted /boot

My system disk is partitioned as follows:

├─nvme0n1p1 259:1    0   600M  0 part /boot/efi
├─nvme0n1p2 259:2    0  1023M  0 part /boot                  #ext4 formatted
├─nvme0n1p3 259:3    0  51.2G  0 part /home                  #ext4 formatted
├─nvme0n1p4 259:4    0    16M  0 part 
├─nvme0n1p5 259:5    0  48.8G  0 part /mnt/Windows
└─nvme0n1p6 259:6    0 131.3G  0 part /                     #BTRFS filesystem with snapshots

My concern is that if I wanted to restore a BTRFS snapshot, /boot won’t be affected because it is a separate partition. Will this cause me problems? If so, what are the suggested solutions?

Long answer:

Any rollback strategy must also answer the question of what not to roll back. If /boot were on a subvolume that we snapshot, rolling back would also roll back the BLS snippets in /boot/loader/entries, and we would need some way for GRUB's blscfg.mod to read and present multiple generations of /boot/loader/entries. There is almost no development interest at the moment in enhancing or expanding the bootloader menu. It already confuses users, and it’s not a great environment for UI/UX, including a11y and i18n support. Also, on BIOS, a significant number of GRUB modules are located in /boot/grub/, which then means many snapshots of the bootloader, with their versions becoming disconnected from the core.img embedded in either the MBR gap or the BIOS Boot partition.

If /boot is on Btrfs, then /boot/grub2/grubenv cannot be written to by GRUB in the preboot environment. There is a variable in grubenv, boot_success, which GRUB resets to 0, and later in user space a service sets it to 1 if the boot gets to a certain point. The idea is that if boot fails, boot_success stays at 0 and GRUB will then show the GRUB menu on the next boot, so the user can make a choice other than the default, which would likely just fail again. If boot_success=1, the GRUB menu is hidden. The reason grubenv is not writable on Btrfs is that GRUB writes to grubenv by directly overwriting the blocks that make up the file; it doesn’t write through a file system driver. On Btrfs, file contents changing without the checksums being updated is indistinguishable from corruption. Therefore, GRUB knows to disallow writes to grubenv on Btrfs (and on LUKS, mdadm RAID, and LVM). OK, so that’s a missing feature we’d have to figure out a workaround for, and we are working on it, but we haven’t decided on a solution with upstream yet.
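To make the grubenv part a bit more concrete, here is a rough Python sketch of the file format and the menu decision. The format details (a fixed 1024-byte block, a `# GRUB Environment Block` header, `key=value` lines, `#` padding) match what grub-editenv produces as far as I know. The function names are made up for illustration, and the menu logic is a simplification of what the real GRUB configuration does, which looks at more variables than just boot_success.

```python
GRUBENV_SIZE = 1024  # the environment block is a fixed-size file
GRUBENV_HEADER = "# GRUB Environment Block\n"


def make_grubenv(variables: dict[str, str]) -> bytes:
    """Build an environment block: header, key=value lines, '#' padding."""
    body = GRUBENV_HEADER + "".join(f"{k}={v}\n" for k, v in variables.items())
    encoded = body.encode("utf-8")
    if len(encoded) > GRUBENV_SIZE:
        raise ValueError("variables do not fit in the environment block")
    # The padding keeps the file size constant, which is what lets GRUB
    # overwrite the same disk blocks in place without a file system driver.
    return encoded + b"#" * (GRUBENV_SIZE - len(encoded))


def parse_grubenv(data: bytes) -> dict[str, str]:
    """Return the variables stored in an environment block."""
    text = data.decode("utf-8", errors="replace")
    if not text.startswith(GRUBENV_HEADER):
        raise ValueError("not a GRUB environment block")
    env: dict[str, str] = {}
    for line in text[len(GRUBENV_HEADER):].split("\n"):
        if not line or line.startswith("#"):
            continue  # skip padding and comment lines
        key, sep, value = line.partition("=")
        if sep:
            env[key] = value
    return env


def menu_should_be_shown(env: dict[str, str]) -> bool:
    """Simplified assumption: show the menu unless the last boot succeeded."""
    return env.get("boot_success", "0") != "1"
```

Because the block never changes size, a changed value only changes bytes inside blocks that already exist, and that is exactly the kind of in-place write that breaks Btrfs checksums.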

There might be a dozen more examples. So it’s really a tangled web just to lay it all out and explain the tradeoffs. And what time frames are. And what resources are available.

It’s a bit of a holding pattern because there’s so much interaction between so many other things that need work in the bootloading space.

These are unrelated projects that we kinda have to somehow figure out how some or all might work together, but we’re still in the design phase. Like, what would this look like? And then there’s a bunch of work in the installer, docs, you name it, to make it actually happen.





Boot Loader Spec

Boot Loader Interface

Snapper is also a consideration. But one of the things we’re really trying to focus on is simplicity, and just doing the right thing automatically when the wrong thing happens. We don’t want to create such a complicated storage stack that users can’t follow how it works. If an update fails, we should be able to automatically delete the snapshot containing the update attempt, rather than bother the user with having to fix things. We should be able to test updates (maybe some combination of “booting” it in a container, or a small qemu machine, and see if it gets to a certain milestone in the startup process) before we make them active. We should be able to do updates off to the side, so that users aren’t waiting or interrupted for them.

Short answer:

Yes, it can be a problem: /boot will only ever have the three most recently installed kernels, and the only snapshots with matching kernel modules will be the most recent ones. The farther back in time you go, the greater the chance that the system root (mounted at /) will be too old and will not contain kernel modules matching what’s on /boot.
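If you want to check a particular snapshot before rolling back to it, something like the following Python sketch could help. It assumes the Fedora conventions of kernels being named /boot/vmlinuz-VERSION and modules living under usr/lib/modules/VERSION inside the root. The function names and the idea of pointing it at a mounted snapshot directory are mine, so adjust the paths for your setup.

```python
from pathlib import Path


def kernels_in_boot(boot_dir) -> list[str]:
    """List kernel versions found as vmlinuz-<version> files in boot_dir."""
    prefix = "vmlinuz-"
    return sorted(
        p.name[len(prefix):]
        for p in Path(boot_dir).glob(prefix + "*")
        if "rescue" not in p.name  # skip the rescue kernel image
    )


def bootable_kernels_for_root(boot_dir, root_dir) -> list[str]:
    """Return kernels in boot_dir that have a modules directory in root_dir."""
    modules = Path(root_dir) / "usr" / "lib" / "modules"
    return [v for v in kernels_in_boot(boot_dir) if (modules / v).is_dir()]
```

You would call it as `bootable_kernels_for_root("/boot", "/path/to/mounted/snapshot")`. An empty result means none of the kernels currently on /boot has matching modules in that snapshot.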

There isn’t a great workaround for this in Fedora, at least not one that I can explain. You could manually make your Fedora installation's storage stack the same as e.g. (open)SUSE, and set up Snapper and GRUB like they do. They’re using a layout right now that consists of about a dozen subvolumes, as a way to carve out the areas that are and are not subject to snapshotting and rolling back. It’s easier to just use openSUSE, because the way things work is really that different.

So if I restore a snapshot whose kernel is one of the three most recent (a kernel that still shows in the boot menu), there should be no problems?
Most of the time when I need a rollback, it is to the most recent non-problematic snapshot.

Question: Why is it so complicated to provide a system ready for Btrfs snapshots on Fedora?
There are several Linux distros out there that already provide this functionality and convenience for users, they have no problems with the boot loader (excluding openSUSE):

Can’t we just do what they do?

It should not be a problem, as long as you’re picking a kernel in GRUB with matching kernel modules in the root file system you’re rolling back to.

The existing implementations make different assumptions.

Fedora made choices before the Btrfs by default decision that other distributions didn’t make: (a) Boot Loader Spec by default; we no longer have GRUB menu entries in the grub.cfg, they are in /boot/loader/entries as individual snippets per kernel (b) using a Btrfs “flat layout”, where other distros use the “nested” layout.
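For anyone who hasn't looked at them, the BLS snippets are small text files with one `key value` pair per line. Here is a minimal Python sketch of reading one. The sample entry and its values are invented for illustration, and the parser only handles the basic syntax (comments and whitespace-separated keys), so treat it as a sketch rather than a full implementation of the spec.

```python
def parse_bls_entry(text: str) -> dict[str, list[str]]:
    """Parse a Boot Loader Spec snippet into a key -> list of values mapping.

    Values are kept in lists because some keys (for example initrd or
    options) may appear more than once.
    """
    entry: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # split on the first run of whitespace
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        entry.setdefault(key, []).append(value)
    return entry


# Hypothetical entry, similar in shape to a file in /boot/loader/entries
SAMPLE_ENTRY = """\
title Fedora Linux (6.2.0-example) 38
version 6.2.0-example
linux /vmlinuz-6.2.0-example
initrd /initramfs-6.2.0-example.img
options root=UUID=example-uuid ro rootflags=subvol=root
"""
```

The `version` value is what has to line up with a modules directory in whatever root you boot, and `rootflags=subvol=...` in `options` is where the entry points at a specific Btrfs subvolume.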

So we’d either have to change our bootloader and Btrfs layout, or we’d have to modify the snapshotting tools to accommodate what we’re doing. Either of those is a non-trivial effort.

Right now there’s more momentum behind the transactional updates method. This means an update is not applied to the currently running system, but to a completely separate and inactive root. Only once the update completes successfully, is the root switched. This means we often have the ability to avoid rollbacks, as well as user intervention. If the update fails, we can just discard the broken half updated root. But if the update succeeds, we have the option for manual rollback later.
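As a toy model of that flow, here is a Python sketch that uses plain directories and a symlink instead of Btrfs snapshots. Everything in it (the `current` symlink, the `root-N` directory names, the `transactional_update` function) is hypothetical and only meant to show the order of operations, not how any real tool is implemented. It also assumes a POSIX system, where renaming over a symlink is atomic.

```python
import os
import shutil
from pathlib import Path


def transactional_update(base: Path, new_name: str, update) -> bool:
    """Apply `update` to a copy of the active root and switch only on success.

    `base / "current"` is a symlink naming the active root directory.
    Returns True if the switch happened, False if the update was discarded.
    """
    link = base / "current"
    active = base / os.readlink(link)
    staged = base / new_name

    # Stand-in for taking a writable snapshot of the active root.
    shutil.copytree(active, staged, symlinks=True)
    try:
        update(staged)  # the running root is never touched
    except Exception:
        shutil.rmtree(staged)  # discard the half-updated root
        return False

    # Switch roots atomically by renaming a new symlink over the old one.
    tmp_link = base / "current.tmp"
    os.symlink(new_name, tmp_link)
    os.replace(tmp_link, link)
    return True
```

Note that the old root is left in place after a successful switch, which is what keeps manual rollback possible later.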

Basically, we’re learning from the failures and successes of prior efforts, and innovating a new path forward, rather than just duplicating existing efforts.

This sounds even better than booting a broken system and then restoring from a snapshot to fix whatever went wrong. I guess this feature will rely on BTRFS snapshotting; will it also be available for ext4 filesystems?

Is there an ETA for transactional updates?

There’s already rpm-ostree, e.g. Silverblue and Kinoite that can work on any file system.

To do a conventional RPM distro like Workstation or KDE spin as they exist today, but add transactional updates - it could be done with LVM thin provisioning as the snapshot mechanism. It can be accommodated if there are folks interested in doing the development and testing work. But yeah the emphasis is on Btrfs just because the snapshots are so cheap and there’s a metric ton of real world usage already, both on the desktop and on servers.

Is there an ETA for transactional updates?

No, it’s still in the idea phase. There could maybe be a spin sooner than later to demonstrate it, but I don’t have an ETA for that either.

I prefer the Fedora KDE spin; I may try Kinoite later.
Thanks for the information.