Is there a "plan" to integrate BTRFS snapshots into Fedora (so that they appear in the bootloader/GRUB)?

I agree that it is complicated, but I think you might be overcomplicating things a bit.

Again, this is just my opinion on how things should be done.

Because none of those are FAT16/32.

This is where I think you are over-complicating things. The bootloader doesn’t have to do/know anything special. A couple of very simple systemd services in dracut are sufficient to both snapshot and roll back the rootfs. I have written and use such scripts for ZFS on my own Fedora Linux systems (Add bootfs.snapshot and bootfs.rollback kernel parameters by gregory-lee-bartholomew · Pull Request #10198 · openzfs/zfs · GitHub). There is no requirement to do anything special in either the bootloader or on the desktop. The snapshot or rollback operations are initiated by specifying an option on the kernel command line.
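To make the idea concrete, here is a minimal sketch of the decision such an early-boot hook would make after reading the kernel command line. It is written in Python purely for illustration (real dracut hooks are shell scripts, and this is not the code from the pull request above). The option names bootfs.snapshot and bootfs.rollback come from the post; treating an optional =value as the snapshot name and letting the last occurrence win are assumptions of this sketch.

```python
import shlex
from typing import Optional, Tuple

# Option names taken from the post; the "=value" handling is an assumption.
ACTIONS = {"bootfs.snapshot": "snapshot", "bootfs.rollback": "rollback"}


def boot_action(cmdline: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return ("snapshot" | "rollback", value-or-None), or None if neither option is set."""
    action = None
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        if key in ACTIONS:
            # In this sketch a later occurrence overrides an earlier one.
            action = (ACTIONS[key], value if sep else None)
    return action
```

On a running system the input would be the contents of /proc/cmdline; everything else (mounting, snapshotting, rolling back) would happen in the hook itself before the root file system is mounted.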

I use BLS (even on a few of my BIOS systems, with a custom patch). It is not a problem.

OK. That is not an approach that I have considered, but it seems a little like overkill to me. I don’t take snapshots until after the updates have been installed. If the updates were to fail or damage my system in some way, I would have to roll back further – to the previous successful update – and then retry all the updates from that older point. I don’t consider this requirement unreasonable, but I can see where it might be a problem. The “if that fails” condition in your suggestion could be tricky to identify. For that reason, I don’t think I would want to take that approach.

I agree with that. I just think the work should be done in dracut and the kernel+dracut should not be on the CoW file system (it should be on the ESP). Dracut is already a very powerful out-of-band rescue environment on its own. Use it. 🙂 I’ve been doing so for a few years on my ZFS-based Fedora Linux systems and it works great.

Just my two cents.

As a user, I am expecting something very simple (which I can use relatively soon):

  1. In Boot Menu, there is one entry marked as “Last Known Good”, which I can fallback to manually if needed.
  2. After booting “Last Known Good”, I can invoke a program to change the next boot default, like bootctl --set-default=current
  3. I might not want “bad” snapshots to be removed automatically, as I might want to troubleshoot them. That might be another program: a snapshotctl that manages snapshots, e.g. creating them automatically on event or timer triggers and removing them by timer, entry count, or manually. It would be helpful if each snapshot had a status such as good or bad, last booted, etc.
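The good/bad status in item 3 is what makes item 1 possible: a “Last Known Good” entry is simply the newest snapshot that has completed a boot. A minimal Python sketch, assuming a hypothetical snapshot record and a hypothetical hook that reports whether a boot succeeded (none of this is an existing snapshotctl or bootctl API):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Snapshot:
    name: str
    created: datetime
    status: str = "unknown"  # hypothetical states: "unknown", "good", "bad"


def mark_boot_result(snap: Snapshot, booted_ok: bool) -> None:
    """Record whether a boot from this snapshot completed."""
    snap.status = "good" if booted_ok else "bad"


def last_known_good(snapshots: List[Snapshot]) -> Optional[Snapshot]:
    """Newest snapshot that has completed a boot; bad ones are skipped, not deleted."""
    good = [s for s in snapshots if s.status == "good"]
    return max(good, key=lambda s: s.created, default=None)
```

Bad snapshots stay in the list untouched, which matches the wish to keep them around for troubleshooting.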

If the system could automatically boot into the last known good entry when an update fails to boot, that would be a bonus. Even without this bonus, it would still be much better than the current setup.

If I may have my say as a simple user: I have been using Btrfs for many years, previously on Ubuntu and now on Fedora.
I’ve always used Timeshift; it’s a simple tool that does its job when needed, and you can work from a GUI or the CLI.
To give Fedora users an immediate advantage with Btrfs, wouldn’t it be better to start offering and running Timeshift by default on Fedora?
After installation, or from its update tool, Linux Mint offers to take a Timeshift snapshot before performing an update.
Perhaps, for the moment, it is the easiest and most ready way to take advantage of Btrfs snapshots?

https://linuxmint-installation-guide.readthedocs.io/en/latest/timeshift.html

But how does someone get to Timeshift if the system is broken? Would it allow a rollback of the entire OS? Say, for example, that someone wanted to try out Fedora Workstation 34 beta and then rollback to Fedora Workstation 33. Would Timeshift allow them to do that?

Also, I think a distinction should be made between a backup system and the recovery options. What I’m interested in is a robust and convenient recovery system (“last known good”). It should only maintain a few recent snapshots of the operating system (something on the order of 3 to 6 is quite sufficient). I would discourage this from being something that keeps massive backlogs of old installations on the user’s system. It should not be like email where old ones never get deleted until the system collapses under its own weight. If people want backups, that should be a different tool that sends the snapshots to an external device of some sort.
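A retention rule of that kind is small enough to sketch. The Python version below (all names hypothetical) keeps a fixed number of the newest recovery snapshots, 4 by default as an arbitrary pick within the 3 to 6 range suggested above, and reports the rest as expired; sending older snapshots to an external device is deliberately left to a separate backup tool.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def split_retention(snapshots: Dict[str, datetime], keep: int = 4) -> Tuple[List[str], List[str]]:
    """Return (kept, expired) snapshot names, newest first.

    Only the `keep` most recent recovery points stay on the system; anything
    older is expired instead of accumulating like an old mailbox.
    """
    if keep < 1:
        raise ValueError("keep at least one recovery snapshot")
    newest_first = sorted(snapshots, key=snapshots.__getitem__, reverse=True)
    return newest_first[:keep], newest_first[keep:]
```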

My two cents.

If you can’t boot the system, you can restore the snapshot from a Fedora live image with Timeshift.
If the system starts partially, you can recover it with the Timeshift CLI; if the system starts but has problems, restoring takes just one click.
There is a project to put Timeshift snapshots in the GRUB menu; it is used on Garuda Linux.
Timeshift is software created for system snapshots only, not for data backups.

That’s not quite good enough for my needs. As a sysadmin, I have a half dozen physical servers that I remotely administer. The servers are configured with IPMI such that I can remotely reset them if they are hung and I can remotely access the boot menu over Serial over LAN. I need to be able to roll back the server’s operating system without having to physically interact with the system, and I need to be able to do it from a command line interface even when the system cannot mount its root file system.

The recovery system that I currently have implemented with a ZFS root file system and dracut installed on the ESP allows for the above.

I don’t want to be restricted to grub. The system I currently use works with any bootloader. I currently use it with systemd-boot and syslinux. I don’t use grub because it doesn’t work properly/reliably with ZFS.

I am referring to the use case of a desktop user running Fedora.
I’m not saying it has to be the definitive solution, but pre-install it and make it work with the Btrfs subvolume layout on Fedora (there’s a fork and patch to make it work).
For example, it recently proved very useful when testing Fedora 34 to see what worked: from the Timeshift GUI, with one click each, I created a Fedora 33 snapshot > upgraded to Fedora 34 > tested > restored the Fedora 33 snapshot.
You can create snapshots manually or on an hourly, daily, monthly, or per-boot schedule.
I agree with keeping only a few snapshots; I configure a maximum of 4.

Some of that functionality is built into rpm-ostree if you use Fedora Silverblue. I’d suggest trying it in a VM to see how rpm-ostree rollback works.

All answers above were very instructive, thank you for sharing!
I am still wondering: are snapshots reachable from GRUB (à la Silverblue or openSUSE) considered a short-term target by the core development team?

I only recently began working with Fedora (I am waiting for the official release of version 34 to migrate my workstation). My tests in a virtual machine and on a sandbox server seem OK using snapper and grub-btrfs, the project mentioned by Emanuele. But please note I haven’t used it beyond simple tests yet.

The difficulty is that, without a fairly aggressive and rigid snapshot manager, just providing boot entries is going to result in a lot of “forks” of the installed system, and it’ll quickly become impossible to tell the various forks apart.

Perhaps BLS could be modified to allow a single boot entry file to contain a list of snapshots/subvols (for filesystems like BTRFS and ZFS that support them) rather than requiring a separate entry for each snapshot? The extra semantics would allow for boot loader UI that displays only a small list of bootable devices, but these would potentially be expandable to select a non-default subvolume entry if things went south. The software writing an entry file would still need to take care to order the snapshots usefully (presumably most recent first) and give them useful descriptive text of course.
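To illustrate the proposal, here is a Python sketch that renders such an entry. The title, version, linux, initrd, and options keys are standard BLS keys; the x-snapshot key is purely hypothetical and not part of the specification. A boot loader implementing the idea would presumably turn the chosen line into a rootflags=subvol=... override, but that part is only assumed here.

```python
from datetime import datetime
from typing import List, Tuple


def render_entry(title: str, version: str, linux: str, initrd: str,
                 options: str, snapshots: List[Tuple[str, datetime, str]]) -> str:
    """Render a BLS-style entry plus one hypothetical x-snapshot line per subvolume."""
    lines = [
        f"title {title}",
        f"version {version}",
        f"linux {linux}",
        f"initrd {initrd}",
        f"options {options}",
    ]
    # Most recent snapshot first, each with a short human-readable description.
    for subvol, created, description in sorted(snapshots, key=lambda s: s[1], reverse=True):
        lines.append(f"x-snapshot {subvol} {created:%Y-%m-%d} {description}")
    return "\n".join(lines) + "\n"
```

Keeping the snapshots inside one entry means the menu still shows a single line per kernel, with the snapshot list only expanded on demand.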

There might be one potential issue with BLS type 2 entries. I don’t think Fedora Linux officially supports the type 2 UEFI booting yet, but it might be on the roadmap or Fedora Linux might want to consider it at some future point.

A type 2 BLS entry packages the kernel and initramfs together as a single EFI executable. This is particularly useful for secure boot because it allows all of the content (including everything in the initramfs) to be signed. But because it is an EFI executable that can be loaded directly from the system firmware, it must be stored on the ESP. The ESP is a distinct file system. Because Btrfs changes too much, it is highly unlikely that Btrfs will ever be supported for the ESP.

Because the kernel+initramfs are on the ESP, but the kernel modules are on the Btrfs (or ZFS) formatted root volume, an arbitrary snapshot cannot be chosen in the boot menu. The snapshot chosen must be one that contains the kernel modules needed for the kernel version selected in the boot menu.
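The constraint boils down to a simple check: a snapshot is only a valid choice for a given kernel if it contains that kernel’s module directory. A Python sketch, assuming the snapshots are already mounted (read-only) at known paths; how they get mounted is out of scope here.

```python
import os
from typing import Iterable, List


def has_modules_for(snapshot_root: str, kernel_version: str) -> bool:
    """True if the snapshot mounted at snapshot_root ships modules for kernel_version."""
    # Modules live in lib/modules/<version>; on usr-merged Fedora "lib" is a
    # relative symlink to usr/lib, which os.path.isdir follows.
    return os.path.isdir(os.path.join(snapshot_root, "lib", "modules", kernel_version))


def bootable_snapshots(snapshot_roots: Iterable[str], kernel_version: str) -> List[str]:
    """Filter snapshot mount points down to those usable with the selected kernel."""
    return [root for root in snapshot_roots if has_modules_for(root, kernel_version)]
```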

IMO, the best option (and the one that I’ve implemented for ZFS), is to make a one-to-one correlation between the kernel-version and the recovery snapshot of the root file system. The way this is done for ZFS is that one sets a default kernel option – bootfs.snapshot – and that will take care of automatically creating a new snapshot of the root file system for each newly-installed kernel. The snapshot is named with the kernel version number so one can easily determine which snapshot goes with which kernel. When recovery is necessary, you make a one-time alteration of the kernel parameter interactively and change bootfs.snapshot to bootfs.rollback. Both the snapshots and the rollbacks are done from the initramfs stage before the root file system is mounted. So they are guaranteed to be clean (there is no chance that in-memory processes like databases have not flushed all their changes to storage).
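The two pieces of bookkeeping described above, naming a snapshot after the kernel version and making the one-time switch from bootfs.snapshot to bootfs.rollback, can be sketched in a few lines of Python. The dataset name used below is hypothetical, and this is not the code from the ZFS pull request; in practice the parameter change is typed interactively at the boot menu.

```python
def snapshot_name(root_dataset: str, kernel_version: str) -> str:
    """ZFS snapshot names have the form <dataset>@<name>; use the kernel version as <name>."""
    return f"{root_dataset}@{kernel_version}"


def to_rollback(cmdline: str) -> str:
    """Rewrite bootfs.snapshot[=value] as bootfs.rollback[=value], leaving other options alone."""
    tokens = []
    # Plain whitespace splitting: quoted arguments are not handled in this sketch.
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "bootfs.snapshot":
            token = "bootfs.rollback" + sep + value
        tokens.append(token)
    return " ".join(tokens)
```

For example, with a hypothetical root dataset rpool/root/fedora, kernel 5.12.15-300.fc34.x86_64 would get the snapshot rpool/root/fedora@5.12.15-300.fc34.x86_64.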

In my experience, the above system works extremely well and reliably. I think the difficulty people run into is when they try to combine the boot-time recovery option with a backup system. The recovery system does not need to be (and IMO shouldn’t be) a backup system. Once you have used the recovery system to restore “something” in case of catastrophic failure, you can then (optionally) apply incremental backups to the recovery point to move the system state forward (or backward) to whatever point-in-time backup you want to restore. Moving the system state to a different backup/snapshot restore point can be done within a fully functional online (even graphical) environment.

Just my two cents.

If we are talking about snapshot recovery and not snapshots being used for other data retention purposes, the fact that Fedora keeps 3 kernel versions around by default should be enough for most common use cases. Although it wouldn’t be bullet-proof it would be far simpler.

That being said, I have never had a single situation where I have wanted/needed to boot into an older filesystem snapshot. Even though I religiously take snapshots on both zfs and btrfs.

I manage quite a few Fedora Linux systems and (in the last five’ish years) I have only had one case where I really needed to boot from a recovery snapshot of a Fedora Linux OS. Thankfully I had ZFS on root configured and I was able to roll back the root file system. The workstation had been hacked. There was some sort of elaborate self-spawning executable that would rename and respawn itself if you tried to stop/delete it. It was transmitting data to China. It got in due to a misconfigured PAM stack and an SSH port that was not properly firewalled. To separate the data we wanted to recover from the virus/malware, it was a simple matter to roll back the root file system.

I’ve also seen many posts here on ask.fp.o where dnf updates have gone wrong (e.g. here and here), and being able to roll back the installation would have been much easier than the alternative of trying to get all the packages resynchronized. It is rare, but it can really be a lifesaver when you need it.

It is also trivial to increase that number should you want to. It is the installonly_limit setting in /etc/dnf/dnf.conf. However, you will be limited by the size of the ESP as to how many kernels/recovery points you can keep. On my PC, I currently have this set to keep recovery snapshots for the last six kernels/OSs and that is using 334M.

[/home/gregory]$ df -h /boot
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1      1022M  334M  689M  33% /boot
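To check which limit is actually in effect, the [main] section of /etc/dnf/dnf.conf can be read with Python’s standard configparser. A small sketch; falling back to 3 when the option is absent mirrors the Fedora default mentioned earlier in the thread.

```python
import configparser


def installonly_limit(conf_text: str, default: int = 3) -> int:
    """Read installonly_limit from the [main] section of a dnf.conf-style file."""
    # interpolation=None so a literal '%' in some other value cannot raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return parser.getint("main", "installonly_limit", fallback=default)
```

Usage would be along the lines of installonly_limit(open("/etc/dnf/dnf.conf").read()).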

I have mine set to 4, but I have two kernels installed:

df -h /efi
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  949M  661M  288M  70% /efi

I don’t know why 2 kernels would use more than six, but FWIW, here is a more detailed look at my ESP:

[/root]# find /boot -printf '%p %s\n' | numfmt --field=2 --to=si
/boot 4.1K
/boot/8c76d196c474411a85814e376f2c30c4 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.13-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.13-300.fc34.x86_64/initrd      48M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.13-300.fc34.x86_64/linux      11M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.12-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.12-300.fc34.x86_64/initrd      48M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.12-300.fc34.x86_64/linux      11M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.14-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.14-300.fc34.x86_64/initrd      48M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.14-300.fc34.x86_64/linux      11M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.15-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.15-300.fc34.x86_64/initrd      48M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.15-300.fc34.x86_64/linux      11M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.17-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.17-300.fc34.x86_64/initrd      48M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.17-300.fc34.x86_64/linux      11M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.11-300.fc34.x86_64 4.1K
/boot/8c76d196c474411a85814e376f2c30c4/5.12.11-300.fc34.x86_64/initrd      47M
/boot/8c76d196c474411a85814e376f2c30c4/5.12.11-300.fc34.x86_64/linux      11M
/boot/efi 4.1K
/boot/efi/boot 4.1K
/boot/efi/boot/bootx64.efi   92K
/boot/efi/syslinux 4.1K
/boot/efi/syslinux/ldlinux.e64   148K
/boot/efi/syslinux/libcom32.c32   186K
/boot/efi/syslinux/libutil.c32   25K
/boot/efi/syslinux/syslinux.efi   197K
/boot/efi/syslinux/vesamenu.c32   40K
/boot/efi/systemd 4.1K
/boot/efi/systemd/systemd-bootx64.efi   92K
/boot/efi/efi 4.1K
/boot/efi/efi/boot 4.1K
/boot/efi/efi/fedora 4.1K
/boot/loader 4.1K
/boot/loader/entries 4.1K
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.15-300.fc34.x86_64.conf 351
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.13-300.fc34.x86_64.conf 351
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.14-300.fc34.x86_64.conf 351
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.17-300.fc34.x86_64.conf 351
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.11-300.fc34.x86_64.conf 351
/boot/loader/entries/8c76d196c474411a85814e376f2c30c4-5.12.12-300.fc34.x86_64.conf 351
/boot/loader/random-seed 512
/boot/loader/loader.conf 55
/boot/grub2 4.1K
/boot/grub2/themes 4.1K
/boot/grub2/themes/system 4.1K
/boot/syslinux.cfg 957
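To see how many kernels/recovery points the ESP can hold, the listing above can be summed per kernel version. A rough Python sketch, assuming output in exactly the find ... | numfmt --to=si format shown (decimal SI units) and the machine-ID directory layout above. Note that df -h reports binary units (powers of 1024), so the free space has to be converted before comparing.

```python
from collections import defaultdict
from typing import Dict

SI = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9}


def parse_si(text: str) -> float:
    """Convert a numfmt --to=si size such as '48M' or '351' into bytes."""
    suffix = text[-1] if text[-1] in "KMG" else ""
    number = text[:-1] if suffix else text
    return float(number) * SI[suffix]


def per_kernel_bytes(listing: str, machine_id: str) -> Dict[str, float]:
    """Sum file sizes under /boot/<machine_id>/<kernel-version>/ from a 'path size' listing."""
    totals: Dict[str, float] = defaultdict(float)
    prefix = f"/boot/{machine_id}/"
    for line in listing.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].startswith(prefix):
            continue
        rest = parts[0][len(prefix):].split("/")
        if len(rest) == 2:  # <version>/<file>; skip the version directory entry itself
            totals[rest[0]] += parse_si(parts[1])
    return dict(totals)


def kernels_that_fit(free_bytes: float, bytes_per_kernel: float) -> int:
    """How many more kernel+initrd pairs of this size fit into the free space."""
    return int(free_bytes // bytes_per_kernel)
```

With the figures above (about 59 MB per kernel and 689M, i.e. 689 MiB, available), that works out to roughly a dozen more kernels before the ESP fills up.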

Because I have 4 versions of each kernel.

On that machine, I am also dual-booting Arch out of a single zpool but that is a different story…

Out of frustration due to the lack of btrfs integration, I wrote a little tool called btrfs-upgrade-snapshot-service. It comes with an rpm package that you just have to install, then it’ll try to find your btrfs root filesystem and set up a service that automatically creates a snapshot whenever the system is upgraded to the next release. So if you’re experiencing difficulties after an upgrade (something is broken or disappeared), you’d run btrfs-os-snapshot restore <NAME> and after a reboot, you’d have your old system back.
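The only real decision such a service has to make is whether a release upgrade is about to happen. The Python sketch below is not taken from btrfs-upgrade-snapshot-service. It simply compares VERSION_ID from /etc/os-release with a target release (for example the value given to dnf system-upgrade download --releasever=) and returns a hypothetical snapshot name when the target is newer.

```python
import shlex
from typing import Dict, Optional


def parse_os_release(text: str) -> Dict[str, str]:
    """Parse KEY=value lines from os-release content, removing shell-style quotes."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        parts = shlex.split(value)
        data[key] = parts[0] if parts else ""
    return data


def pre_upgrade_snapshot_name(os_release_text: str, target_release: int) -> Optional[str]:
    """Hypothetical naming rule: snapshot the current release before moving to a newer one."""
    current = parse_os_release(os_release_text).get("VERSION_ID", "")
    if not current.isdigit() or target_release <= int(current):
        return None
    return f"fedora-{current}-before-{target_release}"
```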

To install it, download and install the rpm file and run btrfs-os-snapshot list once just to make sure it found your btrfs filesystem. It would complain if your system isn’t compatible. If it doesn’t complain, it’s installed and ready.

You can also use it to create snapshots manually as described here:

It only handles snapshots of the root filesystem and it includes and restores /boot which is usually a separate ext partition that’s sometimes forgotten. It doesn’t do much more but I hope it helps someone out there. I also hope this doesn’t count as spam.
