Rpm-ostree updates broken. Can't rollback. grub2-mkconfig erroring out

On October 19 I ran an rpm-ostree upgrade that seemed to go fine (the system boots and works).

However, since then, neither rpm-ostree upgrades nor installs are “sticking”.

The installs/upgrades appear to succeed, but after rebooting, the Oct 19 deployment is still the most recent option in the boot menu.

∴ sudo rpm-ostree status
State: idle
Warning: failed to finalize previous deployment
         error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
● fedora:fedora/35/x86_64/kinoite
                   Version: 35.20211019.n.0 (2021-10-19T08:13:20Z)
                BaseCommit: e0c50d2f6d7ac9b7e3eb7fd458a4cdbcf46cebe396df1c483883393b9e8aaf4f
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
       RemovedBasePackages: opensc 0.22.0-1.fc35
           LayeredPackages: iptraf htop syncthing zsh exa compat-ffmpeg28 tilix iotop fd-find ncdu zsh-syntax-highlighting wireguard wireguard-tools ripgrep
             LocalPackages: rpmfusion-free-release-35-0.2.noarch rpmfusion-nonfree-release-35-0.2.noarch

I looked into ostree-finalize-staged.service logs and saw:

Oct 25 15:03:23 myhost ostree[10780]: Finalizing staged deployment
Oct 25 15:03:24 myhost ostree[10780]: Copying /etc changes: 22 modified, 2 removed, 140 added
Oct 25 15:03:24 myhost ostree[10780]: Copying /etc changes: 22 modified, 2 removed, 140 added
Oct 25 15:03:27 myhost ostree[10780]: error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
Oct 25 15:03:27 myhost systemd[1]: ostree-finalize-staged.service: Control process exited, code=exited, status=1/FAILURE
Oct 25 15:03:27 myhost systemd[1]: ostree-finalize-staged.service: Failed with result 'exit-code'.

Running sudo grub2-mkconfig gives:

∴ sudo grub2-mkconfig
... snip ...
### BEGIN /etc/grub.d/10_linux ###
insmod part_gpt
insmod ext2
search --no-floppy --fs-uuid --set=root 3c56f789-65ff-4e3b-9073-30c126d68218
insmod part_gpt
/usr/sbin/grub2-probe: error: ../grub-core/kern/fs.c:120:unknown filesystem.

This UUID seems to be my boot partition:

 ∴ grep 3c56 /etc/fstab 
UUID=3c56f789-65ff-4e3b-9073-30c126d68218 /boot                   ext4    defaults        1 2
 ∴ sudo blkid | grep 3c56
/dev/nvme0n1p2: UUID="3c56f789-65ff-4e3b-9073-30c126d68218" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="fbf6ea7f-0465-488d-b998-f5e3fb44a732"
∴ mount | grep /dev/nvme0n1p2
/dev/nvme0n1p2 on /boot type ext4 (rw,relatime,seclabel)
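
To double-check which filesystem UUIDs the generated config refers to, the search lines can be pulled out of the grub2-mkconfig output and compared against blkid, as done by hand above. This is only a minimal sketch; the helper name search_uuids is made up for illustration.

```shell
# Print each filesystem UUID named on a "search ... --fs-uuid" line of
# grub2-mkconfig output read from stdin (hypothetical helper name).
search_uuids() {
    awk '$1 == "search" && /--fs-uuid/ { print $NF }' | LC_ALL=C sort -u
}

# Example use on a real system (needs root; stderr is dropped so that
# only the generated config is parsed):
#   sudo grub2-mkconfig 2>/dev/null | search_uuids
```

Running sudo grub2-probe --target=fs /boot directly may also show whether the probe fails for /boot itself or for some other device. That this is the probe that fails is an assumption; the output above does not confirm it.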

I’m not sure how to proceed here… would appreciate any tips I could get.

At the boot menu, select your previous deployment and boot it, then run the update from there. That will make the Oct 19 deployment the rollback entry.

OK, so here’s my rpm-ostree status showing the last two entries:

∴ sudo rpm-ostree status                                                             
State: idle
Warning: failed to finalize previous deployment
         error: Bootloader write config: grub2-mkconfig: Child process exited with code 1
         check `journalctl -b -1 -u ostree-finalize-staged.service`
Deployments:
● fedora:fedora/35/x86_64/kinoite
                   Version: 35.20211019.n.0 (2021-10-19T08:13:20Z)
                BaseCommit: e0c50d2f6d7ac9b7e3eb7fd458a4cdbcf46cebe396df1c483883393b9e8aaf4f
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
       RemovedBasePackages: opensc 0.22.0-1.fc35
           LayeredPackages: iptraf htop syncthing zsh exa compat-ffmpeg28 tilix iotop fd-find ncdu zsh-syntax-highlighting wireguard wireguard-tools ripgrep
             LocalPackages: rpmfusion-free-release-35-0.2.noarch rpmfusion-nonfree-release-35-0.2.noarch

   fedora:fedora/35/x86_64/kinoite
                   Version: 35.20211017.n.0 (2021-10-17T08:14:49Z)
                BaseCommit: 47018a680656047dafeba21268a1e5f4b410448e00415db02cbbb0bbd2c96bc9
              GPGSignature: Valid signature by 787EA6AE1147EEE56C40B30CDB4639719867C58F
       RemovedBasePackages: opensc 0.22.0-1.fc35
           LayeredPackages: iptraf htop syncthing zsh exa compat-ffmpeg28 tilix iotop fd-find ncdu zsh-syntax-highlighting ripgrep
             LocalPackages: rpmfusion-free-release-35-0.2.noarch rpmfusion-nonfree-release-35-0.2.noarch
                    Pinned: yes

With this status, I rebooted and chose the 35.20211017.n.0 deployment at the GRUB menu. Once it had booted, I ran rpm-ostree upgrade, which resulted in:

∴ sudo rpm-ostree upgrade   
2 metadata, 0 content objects fetched; 788 B transferred in 2 seconds; 0 bytes content written
Checking out tree fa25e0b... done
Enabled rpm-md repositories: updates-modular rpmfusion-free-updates rpmfusion-free tailscale-stable fedora-modular fedora updates rpmfusion-nonfree-updates rpmfusion-nonfree fedora-cisco-openh264 updates-archive
Importing rpm-md... done
rpm-md repo 'updates-modular' (cached); generated: 2018-02-20T19:18:14Z solvables: 0
rpm-md repo 'rpmfusion-free-updates' (cached); generated: 2021-08-15T14:43:40Z solvables: 0
rpm-md repo 'rpmfusion-free' (cached); generated: 2021-04-25T18:10:08Z solvables: 516
rpm-md repo 'tailscale-stable' (cached); generated: 2021-10-21T01:59:12Z solvables: 39
rpm-md repo 'fedora-modular' (cached); generated: 2021-10-24T10:07:34Z solvables: 1283
rpm-md repo 'fedora' (cached); generated: 2021-10-19T10:45:50Z solvables: 65729
rpm-md repo 'updates' (cached); generated: 2018-02-20T19:18:14Z solvables: 0
rpm-md repo 'rpmfusion-nonfree-updates' (cached); generated: 2021-08-15T14:43:20Z solvables: 0
rpm-md repo 'rpmfusion-nonfree' (cached); generated: 2021-04-25T18:36:02Z solvables: 213
rpm-md repo 'fedora-cisco-openh264' (cached); generated: 2021-02-23T00:47:28Z solvables: 4
rpm-md repo 'updates-archive' (cached); generated: 2021-08-18T13:30:23Z solvables: 0
Resolving dependencies... done
Relabeling... done
Applying 1 override and 32 overlays
Processing packages... done
Running pre scripts... done
Running post scripts... done
Running posttrans scripts... done
Writing rpmdb... done
Writing OSTree commit... done
Staging deployment... done
Freed: 57.2 MB (pkgcache branches: 0)
Upgraded:
  dracut 055-4.fc35 -> 055-5.fc35
  geoclue2 2.5.7-5.fc35 -> 2.5.7-6.fc35
  plasma-discover 5.23.0-1.fc35 -> 5.23.0-3.fc35
  plasma-discover-flatpak 5.23.0-1.fc35 -> 5.23.0-3.fc35
  plasma-discover-libs 5.23.0-1.fc35 -> 5.23.0-3.fc35
  plasma-discover-notifier 5.23.0-1.fc35 -> 5.23.0-3.fc35
  qt5-qtdeclarative 5.15.2-7.fc35 -> 5.15.2-8.fc35
  selinux-policy 35.1-1.fc35 -> 35.3-1.20211019git94970fc.fc35
  selinux-policy-targeted 35.1-1.fc35 -> 35.3-1.20211019git94970fc.fc35
Added:
  rpmfusion-free-obsolete-packages-34-1.fc34.noarch
  wireguard-tools-1.0.20210914-1.fc35.x86_64
Run "systemctl reboot" to start a reboot

Then I rebooted, but the GRUB menu had no new entry; it still listed only the 35.20211019.n.0 and 35.20211017.n.0 deployments. I booted the Oct 19 deployment just to check, and it shows the same symptoms as in my first post.

From the Oct 19 deployment I tried an rpm-ostree rollback, but this fails with:

∴ sudo rpm-ostree rollback
Moving 'ca9455fcb262ae5140ac6a6a94fdec13c3f5ec8f2c056e142900729b039d8f60.0' to be first deployment
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

I don’t know if this is the issue here, but the last time a new deployment “did not happen” for me, it was because I had too many pinned deployments and /boot was full, so rpm-ostree could not set up a new boot entry with the kernel and initramfs in the remaining free space. Removing some pinned deployments helped.

I have 467M free in /boot and 558M free in /boot/efi. Each fedora-UUID in /boot/ostree is only ~72MB, so I should have room for more deployments.
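
As a rough sanity check on those numbers, here is a small sketch of the arithmetic. It assumes every deployment needs about the same ~72 MB in /boot, which is only an estimate; the function name is made up for illustration.

```shell
# Estimate how many more deployments fit in /boot.
# Both arguments are in MB; shell integer division rounds down.
deployments_that_fit() {
    free_mb=$1
    per_deployment_mb=$2
    echo $(( free_mb / per_deployment_mb ))
}

deployments_that_fit 467 72   # prints 6
```

So by this estimate, space in /boot alone should not be what blocks a new deployment.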

Even so, I’ve unpinned a couple of old deployments I no longer need. Strangely, they are persisting: the “Pinned: yes” line is gone from the status listing, but the deployments do not go away, even across reboots.

Can you try to manually clean up previously pending and old deployments with rpm-ostree cleanup <options> (see the man page for details) and redo the update?

$ sudo rpm-ostree cleanup --pending
Deployments unchanged.
$ sudo rpm-ostree cleanup --base
Freed: 200.6 MB (pkgcache branches: 0)
$ sudo rpm-ostree cleanup --rollback
error: Bootloader write config: grub2-mkconfig: Child process exited with code 1

And no change in behavior, still stuck. After a reboot, the deployments I unpinned are still there (but not marked as pinned). I can boot into any of the deployments, but running any sort of install/upgrade/rollback command fails with this grub2-mkconfig error.

I’m wondering if I should reinstall fresh? (Kind of sad if I have to; I’ve been running Silverblue since 2019 without issue.)

OK, this is weird. Have you made changes in /etc/grub.d/*? You can check with ostree admin config-diff and by diffing /etc against /usr/etc.

sudo ostree admin config-diff doesn’t indicate that any /etc/grub.d/* files have been changed. However, /etc/default/grub exists, whereas /usr/etc/default/grub does not.

These are its contents:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.luks.uuid=luks-5b6e67ce-215c-4282-b415-0bd937bf1194 rd.lvm.lv=fedora/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
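
For reference, the “exists in /etc but not in /usr/etc” comparison can be sketched as a small shell function. This is only an illustration: the helper name only_in_first is hypothetical, and it looks at regular files only (no symlinks).

```shell
# List regular files that exist under the first directory but not under
# the second. Paths are printed relative to the directory roots.
# (hypothetical helper name)
only_in_first() {
    ( cd "$1" && find . -type f | LC_ALL=C sort ) | while IFS= read -r f; do
        [ -e "$2/$f" ] || printf '%s\n' "${f#./}"
    done
}

# Example use on an OSTree system (some files may need root to read):
#   only_in_first /etc /usr/etc | grep '^default/'
```

A file that shows up in this list was added locally (by the installer or by hand) rather than shipped in the OSTree commit.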

Not sure if it is related, but I have added a few nouveau items to my kargs (though I did this years ago and they’ve just been carried along through upgrades):

sudo rpm-ostree kargs                          
resume=/dev/mapper/fedora-swap rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rd.luks.uuid=luks-5b6e67ce-215c-4282-b415-0bd937bf1194 rhgb quiet root=/dev/mapper/fedora-root modprobe.blacklist=nouveau rd.driver.blacklist=nouveau nvidia-drm.modeset=1 pci=nommconf ostree=/ostree/boot.1/fedora/a9bebaa1458b6218288a64b67b53546131ab5334b6d6abaccc99dbd11dab9174/0 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1

Looks like there are some duplicates, not sure why. Probably not related…
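
To list the duplicated arguments instead of spotting them by eye, a short pipeline is enough. This is a minimal sketch; the function name dup_kargs is made up for illustration.

```shell
# Read a kernel command line from stdin and print every argument
# that appears more than once, one per line (hypothetical helper name).
dup_kargs() {
    tr -s ' ' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -d
}

# Example use:
#   sudo rpm-ostree kargs | dup_kargs
```

On the kargs line above, this would report modprobe.blacklist=nouveau, nvidia-drm.modeset=1 and rd.driver.blacklist=nouveau.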

I’ve reinstalled Silverblue 35 over the top of my install. Sorry I couldn’t help track down the bug more, but I needed a working system on this machine.

edit: thanks @siosm for your time and patience!
