Problem updating from 41.20250108.0, and a fix (after some troubleshooting)

Hi all!

I had a problem upgrading, but found a workaround fix. I thought you’d like to know in case you’re seeing this too.

If you’re having a problem upgrading (even from a different build, and possibly for a different reason), you can read what I wrote here, or skip straight down to the workaround fix.

Preamble

I don’t know if the version I was stuck on had a problem or if it was just coincidental. The issue was with GRUB, which is an Achilles’ heel of Fedora, especially the Atomic variants. (The main issues I’ve had over the years running Silverblue have almost exclusively been GRUB-related or podman/toolbox issues. But podman problems happen at runtime and can usually be worked around, whereas GRUB problems can prevent Fedora from booting or upgrading.)

Details

Details, including a fix that worked for me, are below:

I thought I had been upgrading for a while, as each upgrade appeared to succeed, complete with a valid-looking rpm-ostree status afterward, but I was actually stuck on this version of Silverblue:

  fedora:fedora/41/x86_64/silverblue
                  Version: 41.20250108.0 (2025-01-08T01:03:09Z)
               BaseCommit: a1d3387406f94cbbaef15e34e5a0ced6b3b430a5dbd31a82f4aa8cf785800e06
             GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1

I believe it might be related to one or more of these reported issues:

But it might not be. They seem similar: they’re GRUB issues where it can’t run for some unspecified reason. It could be the same reason (whatever it was) that I was hitting, or another GRUB-related issue altogether.

On a few attempts I didn’t see any problem in the logs, but one attempt showed this:

Feb 04 10:21:47 drought rpm-ostree[8975]: Process [pid: 8953 uid: 1000 unit: user@1000.service] connected to transaction progress
Feb 04 10:21:47 drought rpm-ostree[8975]: bootfs is sufficient for calculated new size: 0 bytes
Feb 04 10:21:48 drought rpm-ostree[9019]: /usr/sbin/grub2-probe: error: failed to get canonical path of `composefs'.
Feb 04 10:21:48 drought rpm-ostree[8975]: Txn Rollback on /org/projectatomic/rpmostree1/fedora failed: Bootloader write config: grub2-mkconfig: Child process exited with code 1
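If you want to check whether you hit the same failure, a rough sketch is to search the rpm-ostree daemon's journal for the grub2-probe error above. The function name has_composefs_grub_error is made up for this example, and it assumes the child process output is logged under the rpm-ostreed.service unit, as the rpm-ostree[...] identifiers in the excerpt suggest:

```shell
# Hypothetical helper: reads journal text on stdin and succeeds (exit 0)
# if it contains the grub2-probe/composefs error shown in the log above.
has_composefs_grub_error() {
  # "." stands in for the backtick before composefs to avoid quoting trouble.
  grep -q "grub2-probe: error: failed to get canonical path of .composefs'"
}
```

Usage, for example: journalctl -u rpm-ostreed.service --no-pager | has_composefs_grub_error && echo "composefs/grub2-probe failure found"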

What I did first (troubleshooting)

  1. I checked rpm-ostree status and made sure I had old versions pinned first with sudo ostree admin pin 0 and sudo ostree admin pin 1 and so on.
  2. I checked that there was enough space in /boot with df -h /boot (there was more than enough free space; each pinned deployment takes up space, but I only had a few pinned deployments, and /boot was about half full)
  3. I tried a standard upgrade. It looked like it worked before rebooting, but after rebooting I was still on the same version.
  4. I reset to remove my overlays (and overrides) with rpm-ostree reset -ol, but it was the same issue: it looked like it worked until after a reboot.
  5. Cleanup. After making sure I had some backup deployments pinned (I did), I ran sudo rpm-ostree cleanup -mbpr.
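Step 2 can be turned into a quick pass/fail check before pinning and cleaning up. This is a minimal sketch using POSIX df -P; the function name boot_usage_ok and the 80% default threshold are arbitrary choices for this example, not anything rpm-ostree itself enforces:

```shell
# Hypothetical helper: succeeds if the filesystem holding $1 (default /boot)
# is less than $2 percent full (default 80). With POSIX `df -P`, the 5th
# field of the 2nd line is the capacity, e.g. "47%".
boot_usage_ok() {
  local mp=${1:-/boot} max=${2:-80} used
  used=$(df -P "$mp" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
  [ -n "$used" ] && [ "$used" -lt "$max" ]
}
```

For example: boot_usage_ok /boot 90 || echo "/boot is getting full"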

Workaround fix

  1. I made sure I had space in /boot, had some older deployments, and had those deployments pinned (see the section immediately above for details).
  2. While booting, I held down Shift. The GRUB menu came up.
  3. I used the arrow keys to select an older deployment that I had previously upgraded from.
  4. After my system was booted, I did an rpm-ostree upgrade and then rebooted.

These simple steps (basically: boot into another version and upgrade) fixed the problem.
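After picking an older entry in the GRUB menu, it's worth confirming which deployment actually booted before running rpm-ostree upgrade. rpm-ostree status marks the booted deployment with ●; the sketch below (booted_version is a made-up name) pulls that entry's Version: value out of the plain-text output, assuming the layout shown in this thread:

```shell
# Hypothetical helper: reads `rpm-ostree status` text on stdin and prints the
# Version of the deployment marked with "●" (the booted one).
booted_version() {
  awk '
    /^  [^ ]/ { booted = 0 }                 # header of a non-booted deployment
    /^●/ { booted = 1; next }                # header of the booted deployment
    booted && $1 == "Version:" { print $2; exit }
  '
}
```

Usage: rpm-ostree status | booted_version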

Retrospective

Did I hit a grub issue in 41.20250108.0? Perhaps. It could’ve been something on my system that build didn’t like. Or something else. I’m not really sure. Grub failed for whatever reason when I tried it on that build, but worked on the earlier build.

The issues I linked above go back a few years. This could be a one-off issue that hits some systems every once in a while, or it could be an issue with that specific deployment. But if you run into a problem, debugging it like this and applying a simple workaround (rolling back and updating, which works for other problems you might hit too) is super useful, and being able to do that is a strength of Fedora Atomic.


This looks really similar to ublue-os/main issue #712 on GitHub (“Remove all images with `ostree-2024.10-1.fc41` (across all variants)”), which is really weird. I don’t know what’s happening here. If you still have a system in this state, can you give me the output of findmnt /? Thanks


I haven’t checked all the Silverblue systems here at home yet, but at least one (which I update manually from time to time) was fine. My work system was the affected one, and it’s now fixed.

Since I have the possibly problematic version pinned, I can switch back to that and see if I hit the issue again and try to provide the output. (But it’ll have to be later.)

Also: if either of the other two systems here is affected (I’ll check tonight, as they’re personal machines: my desktop and my partner’s desktop), I’ll also try from those before fixing them.

But, yeah, since I did see composefs in the logs as well (included above), that really could be the exact issue I hit. If that’s the case, switching back to that deployment will probably cause the issue again.

If the problematic versions are indeed booted with composefs enabled, then the output of rpm -qi ostree rpm-ostree would be great as well. You can try booting those with the karg mentioned in ublue-os/main issue #712 (“Remove all images with `ostree-2024.10-1.fc41` (across all variants)”) to disable composefs and then update normally.
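To answer the "is it booted with composefs?" question quickly, the default findmnt / output is enough: on an affected boot, the SOURCE column for / reads composefs. A small sketch, with root_is_composefs as an illustrative name rather than an existing tool:

```shell
# Hypothetical helper: reads the default `findmnt /` output on stdin and
# succeeds if the root filesystem's SOURCE column is "composefs".
root_is_composefs() {
  awk 'NR > 1 && $1 == "/" && $2 == "composefs" { found = 1 }
       END { exit !found }'
}
```

For example: findmnt / | root_is_composefs && rpm -q ostree (to see whether the ostree 2024.10 build is the one installed).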


My personal desktop has the same issue, but it’s on an even earlier compose of Silverblue, as it has been on testing:

● fedora:fedora/41/x86_64/testing/silverblue
                  Version: 41.20241222.0 (2024-12-22T02:38:53Z)
               BaseCommit: 8bf6d1415e909b032775ce6b5ee1f7495ba2df895b5d8adeb25472b18a396d15
             GPGSignature: Valid signature by 466CF2D8B60BC3057AA9453ED0622462E99D6AD1
TARGET SOURCE    FSTYPE  OPTIONS
/      composefs overlay ro,relatime,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
~ ❯ rpm -qi ostree rpm-ostree
Name        : ostree
Version     : 2024.10
Release     : 1.fc41
Architecture: x86_64
Install Date: Sun 22 Dec 2024 03:28:48
Group       : Unspecified
Size        : 635833
License     : LGPL-2.0-or-later
Signature   : RSA/SHA256, Fri 20 Dec 2024 14:19:16, Key ID d0622462e99d6ad1
Source RPM  : ostree-2024.10-1.fc41.src.rpm
Build Date  : Fri 20 Dec 2024 13:15:53
Build Host  : buildvm-x86-13.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://ostreedev.github.io/ostree/
Bug URL     : https://bugz.fedoraproject.org/ostree
Summary     : Tool for managing bootable, immutable filesystem trees
Description :
libostree is a shared library designed primarily for
use by higher level tools to manage host systems (e.g. rpm-ostree),
as well as container tools like flatpak and the atomic CLI.
Name        : rpm-ostree
Version     : 2024.9
Release     : 1.fc41
Architecture: x86_64
Install Date: Sun 22 Dec 2024 03:28:48
Group       : Unspecified
Size        : 15458681
License     : LGPL-2.0-or-later
Signature   : RSA/SHA256, Tue 19 Nov 2024 23:18:01, Key ID d0622462e99d6ad1
Source RPM  : rpm-ostree-2024.9-1.fc41.src.rpm
Build Date  : Tue 19 Nov 2024 21:36:49
Build Host  : buildhw-x86-06.iad2.fedoraproject.org
Packager    : Fedora Project
Vendor      : Fedora Project
URL         : https://github.com/coreos/rpm-ostree
Bug URL     : https://bugz.fedoraproject.org/rpm-ostree
Summary     : Hybrid image/package system
Description :
rpm-ostree is a hybrid image/package system.  It supports
"composing" packages on a build server into an OSTree repository,
which can then be replicated by client systems with atomic upgrades.
Additionally, unlike many "pure" image systems, with rpm-ostree
each client system can layer on additional packages, providing
a "best of both worlds" approach.

Since this is testing, and not stable, I can also rebase my work computer to the problematic commit if you’d like, in case there’s anything different there. But it’s probably similar enough, just a little earlier on testing.

Workarounds didn’t work

So, I followed the instructions at https://github.com/ublue-os/main/issues/712#issuecomment-2618452811 to append the boot parameter to the GRUB command line, but it looks like I’m still in composefs mode:

TARGET SOURCE    FSTYPE  OPTIONS
/      composefs overlay ro,relatime,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on

I then tried rpm-ostree update to see what might happen. It didn’t work.


Update

I rolled back and tried to upgrade. But this time… it went much worse than expected.

First, I tried an older Fedora 41 compose from December 1st. It wouldn’t boot. It started to: it asked for a password, and it even showed a log when I hit Escape. But when it tried to load GDM, the screen just flashed a lot and I couldn’t do anything.

So, I thought: “No sweat. I have another pinned deployment.” It was a Bluefin deployment I was playing with a while ago. I rebooted into that, and it worked. I then rebased back to Fedora 41 from it. But after the rebase finished, as I was clicking Reboot, I noticed it didn’t have my custom theme; it was blue. That deployment must’ve been based on Fedora 40.

Anyway, after rebooting, I get a grub> prompt: no menu or anything. I can’t do anything now. I guess the deployment silently failed (it looked like it succeeded), and now GRUB doesn’t have a configuration.

I guess this means I either need to try to get it booting again somehow or reinstall, which would be a pretty big hassle.


Update 2

Meanwhile, I’ve booted Fedora Workstation live, and I see ostree entries in my boot partition, but grub.cfg is just a broken symlink, and there’s no actual grub.cfg anywhere. From what I understand, it’s supposed to be at /boot/loader.0/grub.cfg, with /boot/grub2/grub.cfg being a symlink pointing to it. That’s not the case here; I just have a symlink and no backup or copy of an actual grub.cfg file anywhere.

I copied a grub.cfg from my work laptop to my boot partition as loader.0/grub-example.cfg, which is where grub.cfg should be, but isn’t. The whole directory tree looks like this:

.
├── boot -> .
├── bootupd-state.json
├── efi
├── grub2
│   ├── fonts
│   │   └── unicode.pf2
│   ├── grub.cfg -> ../loader/grub.cfg
│   └── grubenv
├── loader -> loader.0
├── loader.0
│   ├── entries
│   │   ├── ostree-1.conf
│   │   ├── ostree-2.conf
│   │   └── ostree-3.conf
│   └── grub-example.cfg
├── lost+found
└── ostree
    ├── fedora-269b3c31f73090051010ca487eafa0a0c01ccd117c8b4a5a662ffbaba167bf5c
    │   ├── initramfs-6.12.11-200.fc41.x86_64.img
    │   └── vmlinuz-6.12.11-200.fc41.x86_64
    ├── fedora-3cedb3a4e42df00ec8359ae569fe8e234101668924ff0631133c1d7eb969acdb
    │   ├── initramfs-6.9.12-204.fsync.fc40.x86_64.img
    │   └── vmlinuz-6.9.12-204.fsync.fc40.x86_64
    └── fedora-590b55689a362ee5ddc84bbb4628584e6663e785a0855da10478bfe46e42dd8a
        ├── initramfs-6.11.10-300.fc41.x86_64.img
        └── vmlinuz-6.11.10-300.fc41.x86_64
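The state in the tree above (grub2/grub.cfg pointing at ../loader/grub.cfg, with nothing behind it) can be detected from a live session with plain shell tests. This is a sketch; check_grub_cfg is a hypothetical name, and you pass wherever the boot partition is mounted in the live environment:

```shell
# Hypothetical helper: $1 is the mount point of the boot partition
# (default /boot). Reports whether grub2/grub.cfg resolves to a real file.
check_grub_cfg() {
  local cfg="${1:-/boot}/grub2/grub.cfg"
  if [ -L "$cfg" ] && [ ! -e "$cfg" ]; then
    # The symlink exists but its target (e.g. ../loader/grub.cfg) does not.
    echo "dangling: $cfg -> $(readlink "$cfg")"
    return 1
  elif [ -f "$cfg" ]; then
    # -f follows symlinks, so this covers a plain file and a valid link.
    echo "ok: $cfg"
    return 0
  else
    echo "missing: $cfg"
    return 1
  fi
}
```

For example, check_grub_cfg /mnt/boot, where /mnt/boot is just an assumed mount point for the boot partition.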

The ostree-#.conf entries look like this:

title Fedora Linux 40.20240810.0 (Bluefin-dx) (ostree:2)
version 1
options rd.luks.uuid=luks-24d86138-6b4c-4d85-b91b-c7f2439d715f rhgb quiet root=UUID=a626c941-4184-4de7-a79c-d93a38d8c699 rootflags=subvol=root rw ostree=/ostree/boot.0/fedora/3cedb3a4e42df00ec8359ae569fe8e234101668924ff0631133c1d7eb969acdb/0 preempt=full amd_pstate=guided bluetooth.disable_ertm=1
linux /ostree/fedora-3cedb3a4e42df00ec8359ae569fe8e234101668924ff0631133c1d7eb969acdb/vmlinuz-6.9.12-204.fsync.fc40.x86_64
initrd /ostree/fedora-3cedb3a4e42df00ec8359ae569fe8e234101668924ff0631133c1d7eb969acdb/initramfs-6.9.12-204.fsync.fc40.x86_64.img
aboot /ostree/deploy/fedora/deploy/069671ef412ac530f4e0ca1ada904d815bf1e3c77be42570b596b512967fcb16.0/usr/lib/ostree-boot/aboot.img
abootcfg /ostree/deploy/fedora/deploy/069671ef412ac530f4e0ca1ada904d815bf1e3c77be42570b596b512967fcb16.0/usr/lib/ostree-boot/aboot.cfg

title Fedora Linux 41.20241201.0 (Silverblue) (ostree:1)
version 2
options rd.luks.uuid=luks-24d86138-6b4c-4d85-b91b-c7f2439d715f rhgb quiet root=UUID=a626c941-4184-4de7-a79c-d93a38d8c699 rootflags=subvol=root rw ostree=/ostree/boot.0/fedora/590b55689a362ee5ddc84bbb4628584e6663e785a0855da10478bfe46e42dd8a/0 preempt=full amd_pstate=guided bluetooth.disable_ertm=1
linux /ostree/fedora-590b55689a362ee5ddc84bbb4628584e6663e785a0855da10478bfe46e42dd8a/vmlinuz-6.11.10-300.fc41.x86_64
initrd /ostree/fedora-590b55689a362ee5ddc84bbb4628584e6663e785a0855da10478bfe46e42dd8a/initramfs-6.11.10-300.fc41.x86_64.img
aboot /ostree/deploy/fedora/deploy/bc5888836c8d86bba95ff4b6cf4084998cc2ff79e5ded4de3dd3e0d418f20667.0/usr/lib/ostree-boot/aboot.img
abootcfg /ostree/deploy/fedora/deploy/bc5888836c8d86bba95ff4b6cf4084998cc2ff79e5ded4de3dd3e0d418f20667.0/usr/lib/ostree-boot/aboot.cfg

title Fedora Linux 41.20250203.0 (Silverblue) (ostree:0)
version 3
options rd.luks.uuid=luks-24d86138-6b4c-4d85-b91b-c7f2439d715f rhgb quiet root=UUID=a626c941-4184-4de7-a79c-d93a38d8c699 rootflags=subvol=root rw ostree=/ostree/boot.0/fedora/269b3c31f73090051010ca487eafa0a0c01ccd117c8b4a5a662ffbaba167bf5c/0 preempt=full amd_pstate=guided bluetooth.disable_ertm=1
linux /ostree/fedora-269b3c31f73090051010ca487eafa0a0c01ccd117c8b4a5a662ffbaba167bf5c/vmlinuz-6.12.11-200.fc41.x86_64
initrd /ostree/fedora-269b3c31f73090051010ca487eafa0a0c01ccd117c8b4a5a662ffbaba167bf5c/initramfs-6.12.11-200.fc41.x86_64.img
aboot /ostree/deploy/fedora/deploy/0db8702a35c6e7c05acfdc8e6c4ef718fd67bd673a33e9c24ab7ee18829f8a53.0/usr/lib/ostree-boot/aboot.img
abootcfg /ostree/deploy/fedora/deploy/0db8702a35c6e7c05acfdc8e6c4ef718fd67bd673a33e9c24ab7ee18829f8a53.0/usr/lib/ostree-boot/aboot.cfg

Is there any way to reconstruct a working grub.cfg (it doesn’t have to be complete or anything; just enough to work) from this information, based on one of the above, so I could have a working system again?

  • 41.20250203.0 is the one that can’t update
  • 41.20241201.0 was the one that booted, but blinked like mad when trying to load gdm (perhaps I could switch over to a vt or at least ssh in and run rpm-ostree still and then update to a working deployment?)
  • Bluefin-dx Fedora 40.20240810.0 (quite old) was the one that weirdly nuked my grub.cfg; perhaps rolling back to Fedora 40 and then rebasing to F41 caused some funkiness (I do remember some other grub-related issues in F40 also), or perhaps there was something in Bluefin at the time too.

I think getting 41.20241201.0 booting would be the best bet (if that’s possible), as the other two have issues with grub, apparently. :frowning_face:
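One possible stopgap, assuming the BLS entries above are intact, is to turn a single entry into a hand-written GRUB menuentry. The sketch below (bls_to_menuentry is a hypothetical name) reads one BLS entry file and prints a menuentry using only its title, linux, initrd, and options lines. The /boot partition's filesystem UUID is an input you'd look up yourself (for example with lsblk -f), since it isn't in the entries, and the insmod ext2 line assumes /boot is ext2/3/4 (the lost+found directory suggests ext4). The approach suggested later in this thread, reusing a generic config that reads the BLS entries, is likely simpler:

```shell
# Hypothetical helper: $1 = path to one BLS entry (an ostree-N.conf file),
# $2 = filesystem UUID of the /boot partition (not shown in the entries).
# Prints a GRUB menuentry built from the entry's title/linux/initrd/options.
# Assumes the title contains no single quote; other keys (version, aboot, ...)
# are ignored.
bls_to_menuentry() {
  local entry=$1 boot_uuid=$2
  local key value title='' linux='' initrd='' options=''
  # Each BLS line is "key value..."; read splits off the first word as the key.
  while read -r key value || [ -n "$key" ]; do
    case $key in
      title) title=$value ;;
      linux) linux=$value ;;
      initrd) initrd=$value ;;
      options) options=$value ;;
    esac
  done < "$entry"
  printf "menuentry '%s' {\n" "$title"
  printf '\tinsmod part_gpt\n\tinsmod ext2\n'
  # Point GRUB's root at the /boot partition, where the /ostree/... paths live.
  printf '\tsearch --no-floppy --fs-uuid --set=root %s\n' "$boot_uuid"
  printf '\tlinux %s %s\n' "$linux" "$options"
  printf '\tinitrd %s\n' "$initrd"
  printf '}\n'
}
```

For example, from the live session: bls_to_menuentry /mnt/boot/loader/entries/ostree-N.conf BOOT_UUID > /mnt/boot/loader.0/grub.cfg, where N is whichever entry has the 41.20241201.0 title and /mnt/boot is an assumed mount point. Because grub2/grub.cfg already points at ../loader/grub.cfg, a file written there should satisfy the existing symlink.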

If anyone can figure this out: HUGE thanks in advance! It’d save me so much time backing up everything I can (I don’t have everything backed up, just the important stuff; everything is too big, and transferring all the big files like Steam games would take a long time).


Closing thoughts (for now)

Perhaps something in this story might help change things for the future. For example: always keep a backup of the last working grub.cfg (or the last few) for rescue situations like this, or improve testing so that changes like enabling composefs don’t accidentally slip through early, especially into stable releases of Fedora?

I really wish we’d use something simpler and more modern than GRUB. I’ve hit a few GRUB bugs over the years with Silverblue (and some outside of Silverblue). Something like sd-boot would probably be better overall.

(Relieved that my partner’s computer doesn’t have this issue, especially since I had hers set up to auto-update. Somehow, thankfully, her computer skipped this issue.)

Since this is a debugging thread, I’ll move it to Ask Fedora. If you can later clearly describe a specific problem (and ideally a solution/workaround) that seems to affect a large number of Fedora users, please write it in a Common Issues format into Proposed Common Issues, and I’ll be happy to review it. Thanks.


Good call. It was originally a fix thread, as it was first related to my work laptop… but changed dramatically when my desktop got hit by the same composefs bug and went catastrophically wrong.

This is exactly what I’m working on fixing with:

and that should land in F42 soon (I’m working on the bootupd release right now).

You were running testing, and an ostree version with a bug (2024.10) got pushed there and enabled composefs by default.

This blocks updates and may result in the GRUB config being removed in some cases that I have not been able to reproduce yet.

The easiest way to recover is to copy a GRUB config from another system. Here is one: “How to restore grub.cfg in Silverblue?” (post #2 by siosm).

I was not running testing on my work laptop, and it was still hit by this exact same composefs issue. I opened this thread about my work laptop, which was (and still is) on stable, and it was also stuck.

I also thought I wasn’t running testing on my personal desktop, as I had rebased away from it in December. I only switched to testing temporarily, for a day (the wrong day, apparently), to be able to run the new version of darktable. Only now did I realize that rebasing and updating since then had been stuck due to the composefs issue.

Thanks for the replies, BTW!


I do have a grub.cfg from another system, but it points to different deployments. How do I work around that? :confused:

You don’t need the deployments from that config as GRUB will use the BLS configs.


I’m still unpacking/reading your long reply to try to understand what happened :slight_smile:


Oh, weird. OK! Good to know. I’ll try with your grub config during lunch.

Not sure why my work laptop has the various deployments hardcoded.

To be clear about where grub.cfg goes, I’m looking at my work laptop running Silverblue:

  • it’s in /boot/loader.0/grub.cfg, right?
  • I have a /boot/grub2/grub.cfg symlink that points to ../loader/grub.cfg, and loader is a symlink to loader.0
  • There’s also a /boot/efi/EFI/fedora/grub.cfg (from the EFI partition mounted under /boot), but that seems to be a stub that is used to load the other one?


It’s a lot. Sorry. I think I hit several different bugs along the way, even in older composes. Hopefully most people can either use the workaround successfully or roll back to a working deployment and update.

Thanks again for being a huge help here (and elsewhere across Fedora), and I hope my pains with this can prevent other people from experiencing the same. :wink:


You likely have an older (pre-F41) installation, and those currently use a dynamic GRUB config that is regenerated on each update. The config is thus located in /boot/loader/grub.cfg (with the symlink in /boot/grub2/grub.cfg that points to it).

Everything in /boot/efi/EFI can be ignored.


This is weird. Adding ostree.prepare-root.composefs=0 to the kernel command line should have disabled composefs.

Yeah, I thought it should’ve too. But I tried it first and it didn’t. Not sure why. Easiest explanation: I could’ve made a typo, even though I double-checked what I typed.

If you ever have the chance to try that again, you can check the output of cat /proc/cmdline.
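To make that check unambiguous (a plain substring search would also match something like ostree.prepare-root.composefs=01), you can test for the argument as a whole space-separated word. A minimal sketch with a made-up function name, cmdline_has_arg:

```shell
# Hypothetical helper: succeeds if $2 appears as a complete,
# space-separated argument in the kernel command line string $1.
cmdline_has_arg() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: cmdline_has_arg "$(cat /proc/cmdline)" ostree.prepare-root.composefs=0 && echo "karg is active"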

Important correction here:

To update out of it, you’ll need a keyboard attached, and press e during the GRUB menu. Then you can append ostree.prepare-root.composefs=0 after load_video (make sure to have a space between)

You should add ostree.prepare-root.composefs=0 at the end of the line that starts with linux ($root)/ostree/... and ends with rw. You can add it after the rw, making sure to add a space.
