Hi all!
I had a problem upgrading, but found a workaround fix. I thought you’d like to know in case you’re seeing this too.
If you’re having a problem upgrading (even from a different build, possibly even for a different reason), you can read the whole write-up or skip straight down to the workaround fix.
Preamble
I don’t know if the version I was stuck on had a problem or if it was just coincidental. The issue was with grub, which is an Achilles’ Heel of Fedora, especially Atomic versions. (The main issues I’ve had over the years running Silverblue have almost exclusively been grub related or podman/toolbox issues… but podman is a runtime thing, which can usually be worked around, and grub can prevent Fedora from booting or upgrading.)
Details
Details, along with a fix that worked for me, are below:
I thought I had been upgrading for a while, as it looked like each upgrade succeeded, complete with a valid-looking rpm-ostree status afterward, but I was actually stuck on this version of Silverblue:
fedora:fedora/41/x86_64/silverblue
Version: 41.20250108.0 (2025-01-08T01:03:09Z)
BaseCommit: a1d3387406f94cbbaef15e34e5a0ced6b3b430a5dbd31a82f4aa8cf785800e06
GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
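Since the upgrade looked fine until I checked the version, one way to catch this kind of silent non-upgrade is to compare the version before and after a reboot. This is only a rough sketch: status_version is a helper name I made up, and it assumes the status text has a "Version:" line like the one shown above.

```shell
# Print the first version found in `rpm-ostree status` text read on stdin.
# Assumes a "Version: <version> (<timestamp>)" line like the one above.
status_version() {
    awk '$1 == "Version:" { print $2; exit }'
}

# Usage on a real system (not run here):
#   before=$(rpm-ostree status --booted | status_version)
#   ... upgrade and reboot ...
#   after=$(rpm-ostree status --booted | status_version)
#   [ "$before" != "$after" ] || echo "still on $before"
```

If the version is the same after what looked like a successful upgrade and reboot, the new deployment never actually took effect.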
I believe it might be related to one or more of these reported issues:
- no longer able to update Silverblue due to grub2-mkconfig failing · Issue #3715 · coreos/rpm-ostree · GitHub
- https://github.com/ostreedev/ostree/issues/3198
- 2308594 – dynamic grub2-mkconfig incompatible with composefs
- 1289752 – atomic installation on btrfs results in "failed to write boot loader configuration" error
- Rpm-ostree updates broken. Can't rollback. grub2-mkconfig erroring out - #9 by ramblurr
But it might not be. They seem similar: they’re all grub issues where grub can’t run for some unspecified reason. It could be the same reason I was hitting (whatever that was), or another grub-related issue altogether.
While I didn’t see anything wrong in the logs on a few attempts, one attempt showed this:
Feb 04 10:21:47 drought rpm-ostree[8975]: Process [pid: 8953 uid: 1000 unit: user@1000.service] connected to transaction progress
Feb 04 10:21:47 drought rpm-ostree[8975]: bootfs is sufficient for calculated new size: 0 bytes
Feb 04 10:21:48 drought rpm-ostree[9019]: /usr/sbin/grub2-probe: error: failed to get canonical path of `composefs'.
Feb 04 10:21:48 drought rpm-ostree[8975]: Txn Rollback on /org/projectatomic/rpmostree1/fedora failed: Bootloader write config: grub2-mkconfig: Child process exited with code 1
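Because the failure only showed up in the logs on some attempts, it can help to filter the journal for just the bootloader lines. A minimal sketch, assuming the messages look like the ones above; grub_failures is a made-up helper name, and the patterns are taken only from my own log excerpt, so other failures may be worded differently.

```shell
# Keep only log lines that point at a bootloader-config failure.
# Patterns come from the log excerpt above; other errors may differ.
grub_failures() {
    grep -E 'grub2-(probe|mkconfig)|Bootloader write config'
}

# Usage on a real system (not run here). The "rpm-ostree" identifier
# matches the one shown in the log lines above:
#   journalctl -t rpm-ostree --since today | grub_failures
```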
What I did first (troubleshooting)
- I checked rpm-ostree status and made sure I had old versions pinned first with sudo ostree admin pin 0 and sudo ostree admin pin 1 and so on.
- I checked that there was enough space in boot with df -h /boot (there was more than enough free space; each pinned deployment takes up space, but I only had a few pinned deployments, and /boot was half full / half empty).
- A standard upgrade. It looked like it worked before rebooting, but didn’t afterward.
- Reset to remove my overlays with rpm-ostree reset -ol … but it was the same issue: it looked like it worked until after a reboot.
- Cleanup. After making sure I had some backup deployments pinned (I did), I ran sudo rpm-ostree cleanup -mbpr
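The free-space check above can be turned into a small script if you don't want to eyeball df output. This is a sketch under assumptions: usage_percent and check_space are names I made up, and the 90% default threshold is an arbitrary example value, not a number from rpm-ostree.

```shell
# Print the used percentage (digits only) of the filesystem holding a path.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Report whether a path is at or above a usage threshold.
# The default of 90 is an arbitrary example value.
check_space() {
    used=$(usage_percent "$1")
    if [ "$used" -ge "${2:-90}" ]; then
        echo "$1 is ${used}% full"
        return 1
    fi
    echo "$1 has room (${used}% used)"
}

# Usage on a real system (not run here):
#   check_space /boot
```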
Workaround fix
- I made sure I had space in /boot, had some older deployments, and that the old deployments were pinned. (See the section immediately above for details.)
- While booting, I held down shift. The grub menu came up.
- I used the arrow keys to select an older deployment that I had previously upgraded from.
- After my system was booted, I did an rpm-ostree upgrade and then rebooted.
These simple steps (basically: boot into another version and upgrade) fixed the problem.
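For reference, the part of the workaround that happens after booting the older deployment can be sketched as a tiny script. The grub menu selection itself has to be done by hand. The run helper, the function name, and the DRY_RUN switch are my own illustration, and using systemctl reboot for the reboot step is an assumption; any way of rebooting should do.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run this only after booting an older deployment from the grub menu.
upgrade_from_older_deployment() {
    run rpm-ostree upgrade
    run systemctl reboot    # assumption: any reboot method works
}

# Usage (not run here):
#   upgrade_from_older_deployment             # prints the commands only
#   DRY_RUN=0 upgrade_from_older_deployment   # actually runs them
```

Defaulting to a dry run means that sourcing or testing the script never reboots the machine by accident.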
Retrospective
Did I hit a grub issue in 41.20250108.0? Perhaps. It could’ve been something on my system that build didn’t like. Or something else. I’m not really sure. Grub failed for whatever reason when I tried it on that build, but worked on the earlier build.
The issues I linked above go back a few years. This could be a one-off issue that hits some systems every once in a while, or it could be an issue with that specific deployment. Either way, if you run into a problem like this, debugging it this way and then rolling back and updating is a simple workaround that also works for other problems you might hit. Being able to do that is a real strength of Fedora Atomic.