Dnf + BTRFS Snapshots: what stands in the way?

There is also another issue: GRUB in Fedora is not configured by default to use the bootloader slack space in the Btrfs volume for grubenv. I’m not actually sure how to get it to use that space for GRUB variables (like the autohide menu thing), but once we can do that, the ext4 /boot volume can be eliminated in favor of a Btrfs subvolume.


For your information, the openSUSE community has created a tool to support Btrfs snapshots. Maybe one day we could use it on Fedora.


We are in some ways a victim of our own success. If updates or upgrades were more risky, we’d be in a better position to mitigate that risk with snapshots and rollbacks.

A most imperfect metaphor for implementing any snapshot+rollback regime: we’re going to need to dance in a closet full of open tubes of toothpaste with a mandate to not make a mess.

As is often the case, the question is “what problem are you trying to solve?” Are we presuming a solution, and then trying to find problems it fixes?

There could be 80 valid snapshot and rollback designs, each with tradeoffs. Any design will risk scope creep. Such designs differ depending on whether the emphasis is on user data or on system uptime/recovery from a bad update.

We need to accept some iterations. But iterations in a released edition of Fedora bind us to supporting the ensuing layout for a good deal of time. We have no official policy that I’m aware of on how many years we would(n’t) support a layout. We test Fedora n-1 and n-2, so perhaps unofficially it’s 2 releases. But were we to actually tell people that layout A is no longer supported and they have to clean install, that’d be a first and it’d be unwelcome.

Most any design related to system rollbacks touches bootloader stuff, which is hilarious madness.

All of this vaguely suggests we start first with a spin. Or perhaps this becomes the default behavior of Fedora Rawhide when on Btrfs, and such a feature is disabled (initially) with released editions. Or possibly even focus on protecting/preserving user data rather than the system. We can reinstall the system.


Perfect timing! I took advantage of the snapshot feature due to a regression in kernel 6.11. Yes, I could have selected a different kernel from GRUB, but I quickly resolved the issue by rolling back to a snapshot from a few hours earlier and then updating with dnf while excluding the kernel update :slight_smile:

sudo dnf --refresh distro-sync --exclude=kernel\*


Yes, but this would have been solved by choosing a different kernel in GRUB to get a working system, plus, possibly, dnf history undo, which kinda proves @chrismurphy’s point.
It also shows that you had to redo unrelated updates.

Really, snapshots show their strength when the system cannot be booted otherwise, or there are more changes than just package updates (e.g. config changes left by an update or unrelated). In that case they may even “work better” than rebasing/pinning an atomic distro because a snapshot typically includes /etc.

I may not be a very experienced Linux user, but I’ve used sysguides’ articles and videos for my installation. As far as I can tell, there is nothing suspicious about them and they are incredibly useful. I managed to get full disk encryption (including boot) and snapper to work, basically what I would get from Tumbleweed. He is also very responsive in the comments on his site when it comes to troubleshooting.

Snapshots have been working well, as far as I’ve tested.

To be specific, I used the Fedora 39 video and accompanying article: https://www.youtube.com/watch?v=JvfCieWkXxI

Snapshots that work are just easy to understand. You can roll back to a previous “known good” if needed or wanted. That gives peace of mind :slight_smile:

It is not only for failed updates. If you want to experiment with a new package or system setup that includes configuration and/or uninstalling packages it replaces, it can get quite complex to “roll back” using other tools. Before starting an experiment like that, just take a manual snapshot, and you can safely move on. If the experiment is not successful, roll back to the manual snapshot and all is good :+1:
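As a rough sketch of the "manual snapshot before an experiment" step: the helper below only prints the btrfs command instead of running it, so it is safe to try. The function name snapshot_command, the /.snapshots destination and the label format are my own assumptions for illustration; adjust them to your subvolume layout, or use snapper/timeshift if that is what you have set up.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): prints the command that would
# create a read-only Btrfs snapshot. Nothing is executed against the filesystem.
# Arguments: <source subvolume> <destination directory> <snapshot label>
snapshot_command() {
    source_subvol=$1
    dest_dir=$2
    label=$3
    # "btrfs subvolume snapshot -r <source> <dest>/<name>" makes a read-only snapshot.
    printf 'btrfs subvolume snapshot -r %s %s/%s\n' "$source_subvol" "$dest_dir" "$label"
}

# Assumption: / is a subvolume and /.snapshots exists on the same Btrfs filesystem.
snapshot_command / /.snapshots "pre-experiment-$(date +%Y%m%d-%H%M%S)"
```

Run the printed command with sudo once the paths look right. How you roll back afterwards depends on your layout and tooling, so that part is left out here.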

It is a very good idea to treat home directories as a separate, optional snapshot target, so you can include them if you have used an application that heavily modifies ~/.config or similar, and exclude them for system-only rollbacks.

While I have indeed used snapper to roll back failed updates on Tumbleweed (and moved to Fedora after needing that a bit too often :laughing:), on Debian I almost exclusively used timeshift for safely experimenting with my servers.

I find timeshift simpler/easier than snapper, but timeshift is (to my knowledge) limited to a very specific BTRFS layout (@ and @home).

/Jaybe


Just to provide a counterexample, I just had a user experience a failure after a package update that they were not able to resolve with dnf history undo. Maybe this is an example where automated Btrfs snapshots could have saved the day (and made the troubleshooting much easier): Wifi disapered after installing updates


So to brainstorm about implementations.

Currently Fedora adds GRUB entries per kernel version.

This mechanism would need to be turned off, so that there is no duplicate mechanism.

But having an older kernel for sure would be good.

Fedora Atomic Desktops don’t save an image with an older kernel automatically; a single update with the same kernel version will remove it.

This is suboptimal I think.

Any ideas on how to tie kernel differences and keeping snapshots together?

I’d use the last installed kernel version as the version number for the snapshot and store it as part of the snapshot name (e.g. Fedora-Linux-6.10.11-200.fc40.x86_64). Everything you need to regenerate the boot entry on the ESP should already be in the snapshot (the kernel is stored at /usr/lib/modules/<kernel-version>/vmlinuz and the initramfs can be regenerated via something like chroot <path-to-snapshot> /usr/bin/dracut <path-to-esp> <kernel-version> if need be). I think it is just a matter of writing some scripts and a dracut module to store them in the initramfs by default so that they are available from dracut’s rescue environment. You could also set things up so the scripts could be triggered via a kernel parameter like btrfs.rollback for convenience.

Edit: Actually, the kernel-install script will take care of regenerating the initramfs and boot loader snippets, so you could probably just run chroot <path-to-snapshot> kernel-install ... if you need to. But you would only need to do that if the kernel for the snapshot that you are reverting to doesn’t already exist on the ESP. I expect that will typically not be the case and restoring things on the ESP will not be necessary.

Edit2: It looks like you’d have to add the chroot command to the initramfs for that to work. It is only 45K.
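To make the naming idea a bit more concrete, here is a minimal sketch. The two function names are made up for illustration, the "Fedora-Linux-" prefix just follows my example name above, and the running kernel (uname -r) is used as a stand-in for "last installed kernel", which is an assumption that will not always hold. /mnt/snapshot is a placeholder mount point.

```shell
#!/bin/sh
# Hypothetical helper: build a snapshot name from a kernel version string.
snapshot_name_for_kernel() {
    printf 'Fedora-Linux-%s\n' "$1"
}

# Hypothetical helper: where the kernel image lives inside a snapshot,
# following the /usr/lib/modules/<kernel-version>/vmlinuz layout.
# A trailing slash on the snapshot root is stripped to avoid "//".
kernel_image_in_snapshot() {
    snapshot_root=$1
    kernel_version=$2
    printf '%s/usr/lib/modules/%s/vmlinuz\n' "${snapshot_root%/}" "$kernel_version"
}

# Assumption: the running kernel stands in for the last installed one.
kver=$(uname -r)
snapshot_name_for_kernel "$kver"
kernel_image_in_snapshot /mnt/snapshot "$kver"
```

A rollback script could read the kernel version back out of the snapshot name and check whether that kernel already exists on the ESP before deciding to regenerate anything.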
