Can't get past Grub and Dracut shells after ostree upgrade

A couple weeks ago, I updated my Fedora IoT-based server (x86_64, UEFI, KVM/QEMU) to version 41. The upgrade went well, so I also manually enabled composefs (which went well), and enabled/installed bootupd (which also went well). I also tried switching to bootc, but that did not go well since my login shell wasn’t included in the base image so I was completely locked out. It was easy enough to revert though—I just had to reboot and select the previous deployment from the Grub menu.

In the past few weeks, I’ve added and removed a few overlaid packages and rebooted a few times, and everything has been fine. But today I realized that my base image hadn’t updated for 2+ weeks, which seemed odd, and sudo rpm-ostree upgrade said that I was already on the most recent version (41.20241027.0). This didn’t seem correct, so I ran sudo rpm-ostree rebase -b fedora/stable/x86_64/iot and the system updated to the latest version (41.20241110.0). Then I rebooted and was met with a Grub shell:

(please excuse the poor OCR; it seemed better than including a bunch of screenshots.)

BdsDxe: loading Boot0009 "Fedora" from HD(1,GPT,1C8B0885-9A88-4BBB-A494-22F27C79
3B1C,0x800,0xFA000)/\EFI\fedoraGRUB version 2.12
BdsDxe: starting Boot0009 "Fedora" from HD(1,GPT,1C8B0885-9A88-4BBB-A494-22F27C7
Minimal BASH-like line editing is supported. For the first word, TAB
lists possible command completions. Anywhere else TAB lists possible
device or file completions. To enable less(1)-like paging, "set
pager=1".

grub>

Here are my available partitions:

grub> ls (hd0,gpt<tab>
Possible partitions are:

Partition hd0,gpt1: Filesystem type fat, UUID 312A-0AD0 - Partition
start at 1024KiB - Total size 512000KiB

Partition hd0,gpt2: Filesystem type ext* - Last modification time
2024-11-12 09:01:09 Tuesday, UUID 26d65031-c51b-4e2c-999f-9642f90ebcab -
Partition start at 513024KiB - Total size 4194304KiB

Partition hd0,gpt3: No known filesystem detected - Partition start at
4707328KiB - Total size 8388608KiB

Partition hd0,gpt4: Filesystem type btrfs, UUID
54b902ba-941a-495f-ae19-770ee22681c4 - Partition start at 13095936KiB - Total
size 255338496KiB

grub> ls (hd0,gpt1)/

efi/

grub> ls (hd0,gpt2)/

./ ../ lost+found/ efi/ grubenv ostree/ boot loader grub2/ loader.1/
bootupd-state.json

grub> ls (hd0,gpt3)/

error: ../../grub-core/kern/fs.c:123:unknown filesystem.

grub> ls (hd0,gpt4)/

dev/ run/ boot/ 00/ home/ ostree/ tmp/ sys/ proc/ root/

I tried loading the config file from the EFI partition:

grub> cat (hd0,gpt1)/efi/fedora/grub.cfg
search --no-floppy --fs-uuid --set=dev 26d65031-c51b-4e2c-999f-9642f90ebcab
set prefix=($dev)/grub2

export $prefix
configfile $prefix/grub.cfg

grub> configfile (hd0,gpt1)/efi/fedora/grub.cfg

but all that did was give me a blank screen. Next, I tried loading the config file from the boot partition:

grub> cat (hd0,gpt2)/loader/entries/ostree-2.conf

title Fedora Linux 41.20241110.0 (IoT Edition) (ostree:0)

version 2

options resume=UUID=25c85fb7-eadd-414c-9c56-475bf2a401f4 root=UUID=54b902ba-941
a-495f-ae19-770ee22681c4 rw ostree=/ostree/boot.1/fedora-iot/18ca81a653610fcb49
f230b5700a13c15a464cc63d8919d2508f257ce5bbbefa/0

linux /ostree/fedora-iot-18ca81a653610fcb49f230b5700a13c15a464cc63d8919d2508f25
7ce5bbbefa/vmlinuz-6.11.6-300.fc41.x86_64

initrd /ostree/fedora-iot-18ca81a653610fcb49f230b5700a13c15a464cc63d8919d2508f2
57ce5bbbefa/initramfs-6.11.6-300.fc41.x86_64.img

aboot /ostree/deploy/fedora-iot/deploy/34457c8e86c0d6d137a53d042aee425f6097183f
fb25b39877f433f55a627ce1.0/usr/lib/ostree-boot/aboot.img

abootcfg /ostree/deploy/fedora-iot/deploy/34457c8e86c0d6d137a53d042aee425f60971
83ffb25b39877f433f55a627ce1.0/usr/lib/ostree-boot/aboot.cfg

grub> configfile (hd0,gpt2)/loader/entries/ostree-2.conf

but all I got was an error message:

error: ../grub-core/script/function.c:119:can't find command `title'.
error: ../grub-core/commands/version.c:34:no arguments expected.
error: ../grub-core/script/function.c:119:can't find command `options'.
error: ../grub-core/net/net.c:1448:no server is specified.
error: ../grub-core/loader/i386/efi/linux.c:258:you need to load the kernel
error: ../grub-core/script/function.c:119:can't find command `aboot'.
error: ../grub-core/script/function.c:119:can't find command `abootcfg'.
grub>

Finally, I tried loading the kernel directly:

grub> set root=(hd0,gpt2)
grub> linux (hd0,gpt2)/ostree/fedora-iot-18ca81a653610fcb49f230b5700a13c15a464c
c63d8919d2508f257ce5bbbefa/vmlinuz-6.11.6-300.fc41.x86_64 root=/dev/vda2
grub> initrd (hd0,gpt2)/ostree/fedora-iot-18ca81a653610fcb49f230b5700a13c15a464
cc63d8919d2508f257ce5bbbefa/initramfs-6.11.6-300.fc41.x86_64.img
grub> boot

but that just gave me a Dracut shell:

Stopping systemd-vconsole-setup.service - Virtual Console Setup...

Starting systemd-vconsole-setup.service - Virtual Console Setup...
[  OK  ] Stopped systemd-vconsole-setup.service - Virtual Console Setup.

Starting systemd-vconsole-setup.service - Virtual Console Setup...

[  OK  ] Found device dev-vda2.device - /dev/vda2.
[  OK  ] Reached target initrd-root-device.target - Initrd Root Device.
[  OK  ] Finished dracut-initqueue.service - dracut initqueue hook.
[  OK  ] Reached target remote-fs-pre.target - Preparation for Remote File Systems.
[  OK  ] Reached target remote-cryptsetup.target - Remote Encrypted Volumes.
[  OK  ] Reached target remote-fs.target - Remote File Systems.
Starting dracut-pre-mount.service - dracut pre-mount hook...
[  OK  ] Finished systemd-vconsole-setup.service - Virtual Console Setup.
[  OK  ] Finished dracut-pre-mount.service - dracut pre-mount hook.
Starting systemd-fsck-root.service - File System Check on /dev/vda2...
[  OK  ] Finished systemd-fsck-root.service - File System Check on /dev/vda2.

Mounting sys-kernel-config.mount - Kernel Configuration File System...
Mounting sysroot.mount - /sysroot...
[  OK  ] Mounted sys-kernel-config.mount - Kernel Configuration File System.
[  OK  ] Reached target sysinit.target - System Initialization.
[  OK  ] Reached target basic.target - Basic System.
[    3.722971] EXT4-fs (vda2): orphan cleanup on readonly fs
[    3.723996] EXT4-fs (vda2): mounted filesystem 26d65031-c51b-4e2c-999f-9642f90ebcab ro with ordered data mode
[  OK  ] Mounted sysroot.mount - /sysroot.
[  OK  ] Reached target initrd-root-fs.target - Initrd Root File System.
Starting initrd-parse-etc.service - Mountpoints Configured in the Real Root...
[  OK  ] Finished initrd-parse-etc.service - Mountpoints Configured in the Real Root.
[  OK  ] Reached target initrd-fs.target - Initrd File Systems.
[  OK  ] Reached target initrd.target - Initrd Default Target.
Starting dracut-mount.service - dracut mount hook...
[    3.847945] EXT4-fs (vda2): unmounting filesystem 26d65031-c51b-4e2c-999f-9642f90ebcab.
[    3.958116] dracut-mount[615]: Warning: Can't mount root filesystem
[  OK  ] Stopped target initrd.target - Initrd Default Target.
[  OK  ] Stopped target ignition-subsequent.target - Subsequent (Not Ignition) boot complete.
[  OK  ] Stopped target ignition-diskful-subsequent.target - Ignition Subsequent Boot Disk Setup.
Starting dracut-emergency.service - Dracut Emergency Shell...
Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

Press Enter for maintenance
(or press Control-D to continue):

(I also tried with root=(hd0,gpt4) // root=/dev/vda4, but that gave me the same result.)
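One thing that stands out: the options line in ostree-2.conf carries root=UUID=..., rw, and an ostree=/ostree/boot.1/... argument, while the manual linux command above passed only root=. On ostree systems the ostree= argument tells the initramfs which deployment to switch into, so leaving it out may explain the "Can't mount root filesystem" warning. As a minimal sketch (bls_to_grub is a hypothetical helper, not part of GRUB or ostree, and it only understands the linux, initrd, and options keys), a BLS entry can be turned into the equivalent GRUB shell commands like this:

```shell
#!/bin/sh
# Sketch: print the GRUB shell commands equivalent to one BLS entry.
# bls_to_grub is a hypothetical helper name; it reads only the
# "linux", "initrd", and "options" keys and ignores everything else.
bls_to_grub() {
    awk '
        $1 == "linux"   { kernel = $2 }
        $1 == "initrd"  { initrd = $2 }
        $1 == "options" { sub(/^[ \t]*options[ \t]+/, ""); opts = $0 }
        END {
            # Without a kernel path there is nothing sensible to print.
            if (kernel == "") exit 1
            print "linux " kernel " " opts
            if (initrd != "") print "initrd " initrd
            print "boot"
        }
    ' "$1"
}

# Example (from a system that can read the boot partition):
#   bls_to_grub /boot/loader/entries/ostree-2.conf
```

The paths in a BLS entry are relative to the boot partition, so in the GRUB shell the printed commands would need something like set root=(hd0,gpt2) first, as in the attempt above.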

Pressing Ctrl+D doesn’t change much:

[  OK  ] Finished dracut-emergency.service - Dracut Emergency Shell.
[  167.572177] dracut-mount[615]: Warning: Can't mount root filesystem
Starting dracut-emergency.service - Dracut Emergency Shell...

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.

Type "journalctl" to view system logs.

You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot
after mounting them and attach it to a bug report.

Press Enter for maintenance
(or press Control-D to continue):

I inspected /run/initramfs/rdsosreport.txt and journalctl, but they don’t seem to contain anything useful. I can try exporting those if you want.

Next, I tried looking for any grub2-* or dracut reinstall commands, but I have nothing useful available:

sh-5.2# ls /
bin dracut-state.sh etc kernel lib64 root sbin sys tmp var
dev early_cpio init lib proc run shutdown sysroot usr

sh-5.2# ls /bin/
arping cp gawk mv systemd-sysusers
awk curl grep nm-online systemd-tmpfiles
basename cut gzip nmcli systemd-tty-ask-password-agent
bash dbus-broker ignition pgrep timeout
busctl dbus-broker-launch jose ps tpm2
cat dmesg journalctl pwmake tpm2_create
chmod dracut-cmdline kbd_mode readlink tpm2_createpolicy
chown dracut-cmdline-ask keyctl rm tpm2_createprimary
clevis dracut-emergency kmod sed tpm2_flushcontext
clevis-decrypt dracut-getarg less setfont tpm2_load
clevis-decrypt-null dracut-getargs ln setsid tpm2_pcrextend
clevis-decrypt-sss dracut-initqueue loadkeys sh tpm2_pcrread
clevis-decrypt-tang dracut-mount loginctl sleep tpm2_unseal
clevis-decrypt-tpm2 dracut-pre-mount ls stat tr
clevis-encrypt-sss dracut-pre-pivot lsblk stty true
clevis-encrypt-tang dracut-pre-trigger luksmeta systemctl udevadm
clevis-encrypt-tpm2 dracut-pre-udev mkdir systemd-ask-password umount
clevis-luks-bind dracut-util mkfifo systemd-cgls uname
clevis-luks-common-functions echo mknod systemd-cryptsetup vi
clevis-luks-list findmnt mktemp systemd-escape wdctl
clevis-luks-unlock flock mount systemd-run

sh-5.2# ls /sbin/
NetworkManager depmod fsck.ext2 groupdel losetup modprobe sgdisk useradd
arping dmsetup fsck.ext3 halt lsmod netroot sulogin userdel
blkid e2fsck fsck.ext4 ignition-kargs-helper lvm nologin swapoff usermod
cache_check era_check fsck.fat init lvm_scan pdata_tools sysctl wipefs
cache_dump era_dump fsck.minix initqueue mkfs.ext4 poweroff thin_check xfs_db
cache_repair era_invalidate fsck.msdos insmod mkfs.fat rdsosreport thin_dump xfs_metadump
cache_restore era_restore fsck.vfat insmodpost.sh mkfs.xfs reboot thin_repair xfs_repair
crypt-run-generator fsck fsck.xfs ip mkswap rmmod thin_restore
cryptsetup fsck.cramfs groupadd loginit modinfo setfiles udevadm

My boot partition seems to be okay, with one interesting exception:

sh-5.2# mkdir /boot/
sh-5.2# mount /dev/vda2 /boot/
[  564.137278] EXT4-fs (vda2): mounted filesystem 26d65031-c51b-4e2c-999f-9642f90ebcab r/w with ordered data mode. Quota mode: none.
sh-5.2# ls /boot/
boot bootupd-state.json efi grub2 grubenv loader loader.1 lost+found ostree
sh-5.2# ls -la /boot/grub2/grub.cfg
lrwxrwxrwx. 1 root 0 18 Aug 23 06:58 /boot/grub2/grub.cfg -> ../loader/grub.cfg
sh-5.2# cat /boot/grub2/grub.cfg
cat: /boot/grub2/grub.cfg: No such file or directory
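The listing above shows a symlink whose target does not exist. A small sketch for telling that case apart from a healthy config (grub_cfg_state is a hypothetical helper name; it uses only test and readlink, both of which appear in the initramfs listing above):

```shell
#!/bin/sh
# Sketch: classify a GRUB config path as a regular file, a working
# symlink, a broken symlink, or missing. grub_cfg_state is a
# hypothetical helper, not part of any GRUB or ostree tooling.
grub_cfg_state() {
    if [ -L "$1" ] && [ ! -e "$1" ]; then
        # -L is true for the link itself, -e follows it to the target.
        echo "broken-symlink -> $(readlink "$1")"
    elif [ -L "$1" ]; then
        echo "symlink -> $(readlink "$1")"
    elif [ -f "$1" ]; then
        echo "regular-file"
    else
        echo "missing"
    fi
}

# Example, with the boot partition mounted at /boot as above:
#   grub_cfg_state /boot/grub2/grub.cfg
```

On the system described here, this would be expected to report a broken symlink pointing at ../loader/grub.cfg.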

Similarly, my root partition also looks fine:

sh-5.2# mkdir /root-fs/
sh-5.2# mount /dev/vda4 /root-fs/
[  735.360991] BTRFS: device fsid 54b902ba-941a-495f-ae19-770ee22681c4 devid 1 transid 220079 /dev/vda4 (252:4) scanned by mount (760)
[  735.362685] BTRFS info (device vda4): first mount of filesystem 54b902ba-941a-495f-ae19-770ee22681c4
[  735.363364] BTRFS info (device vda4): using crc32c (crc32c-intel) checksum algorithm
[  735.363831] BTRFS info (device vda4): using free-space-tree

sh-5.2# ls /root-fs/
00 boot dev home ostree proc root run sys tmp
sh-5.2# ls /root-fs/ostree/deploy/fedora-iot/deploy/9?7f1df87£2£3732739239197aeS fbf 1bed5fc1a6bb51c64ba9ZI555e03a30I9a3.07/
boot dev etc home lib lib64 media mnt opt ostree proc root run sbin srv sys sysroot tmp usr var

I then tried running pivot_root, but I got a “Command not found” error message, so that doesn’t work. I also tried some web searches and browsing through this forum, but I couldn’t find anything relevant.

I don’t have any other ideas at this point, so can someone please help?

There’s a lot going on here, but my hunch is that the problem might be related to enabling composefs and the incompatibility with dynamic grub configs.

See 2308594 – dynamic grub2-mkconfig incompatible with composefs

If you can boot into a previously working deployment, try the steps outlined in the following comment for converting to static grub configs - Use composefs by default for Bootable Containers (#35) · Issues · fedora / Fedora Atomic Desktops / SIG Issue Tracker · GitLab

No guarantees, so tread carefully and make sure you have backups.

Unfortunately, I’m only able to boot into a Grub shell, and from there I’m only able to get to an initramfs shell. It’s definitely possible to boot into a deployment from the initramfs, but I’m not sure which commands I need to run to do that.

Thanks, there’s a comment from the linked GitHub issue that exactly describes my current state:

I tried using sudo ostree config set sysroot.bootloader none and it killed my grub… Seems due to /boot/grub2/grub.cfg being a symlink to somewhere under /boot/loader which won’t be generated anymore, it seems I installed this system before f37 (I can’t really tell).

grub2-mkconfig fail with composefs enabled · Issue #3198 · ostreedev/ostree · GitHub

That should work from a live CD I think? I’ll test it out.

Indeed, I’m definitely far off of the well-trodden path here. I’ll try some stuff out and reply with the results. Thanks for your help!

Ok, I got it to work, thanks! From a live CD:

# mkdir /x/
# mount /dev/vda4 /x/
# mount /dev/vda2 /x/boot/
# mount /dev/vda1 /x/boot/efi
# mount --bind /dev /x/dev
# mount --bind /proc /x/proc
# mount --bind /sys /x/sys
# deploy="/sysroot/ostree/boot.1/fedora-iot/9bcd9212246ae456c144c97977daf59b706493a1c791b52edb483dd838626bf2/0/"
# mkdir /x/usr && mount --bind $deploy/usr /x/usr
# mkdir /x/etc && mount --bind $deploy/etc /x/etc
# mkdir /x/bin && mount --bind $deploy/bin /x/bin
# mkdir /x/sbin && mount --bind $deploy/sbin /x/sbin
# mkdir /x/lib && mount --bind $deploy/lib /x/lib
# mkdir /x/lib64 && mount --bind $deploy/lib64 /x/lib64
# chroot /x/
# mv /etc/default/grub /etc/default/grub.save
# grub2-mkconfig -o /boot/grub2/grub.cfg
# exit
# reboot
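For anyone repeating this, here is a hedged sketch of the same mount sequence as a dry-run helper that only prints the commands instead of running them. print_chroot_plan is a hypothetical name, and every argument is a placeholder. In particular, the deployment path is passed in explicitly: the commands above use a /sysroot/... prefix, but with the disk mounted at /x the deployment may instead need to be addressed under /x, so check which path actually exists before running anything.

```shell
#!/bin/sh
# Sketch: print (do not execute) the mounts needed to chroot into an
# ostree deployment from a live system. print_chroot_plan is a
# hypothetical helper; all five arguments are placeholders.
#   $1 root device   $2 boot device   $3 EFI device
#   $4 mount point   $5 path to the deployment directory
print_chroot_plan() {
    root_dev=$1; boot_dev=$2; efi_dev=$3; mnt=$4; deploy=$5
    echo "mkdir -p $mnt"
    echo "mount $root_dev $mnt"
    echo "mount $boot_dev $mnt/boot"
    echo "mount $efi_dev $mnt/boot/efi"
    # Kernel interfaces that grub2-mkconfig and friends expect.
    for d in dev proc sys; do
        echo "mount --bind /$d $mnt/$d"
    done
    # The physical root has no /usr, /etc, ...; borrow them from the deployment.
    for d in usr etc bin sbin lib lib64; do
        echo "mkdir -p $mnt/$d && mount --bind $deploy/$d $mnt/$d"
    done
    echo "chroot $mnt"
}

# Example (review the output before pasting any of it into a root shell):
#   print_chroot_plan /dev/vda4 /dev/vda2 /dev/vda1 /x "$deploy"
```

Printing the plan first makes it easy to spot a wrong device name or deployment path before anything is mounted.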

Should I report this as a bug somewhere upstream? I think that everything that I did to get to this point was supported (albeit obscure), I was able to boot after previous rpm-ostree upgrades, and the rpm-ostree rebase gave me no error messages. Or is this probably just a fluke that no one else will ever run into?

Thanks again!

Ok, unfortunately this doesn’t quite seem to work. Once I was logged into the system, I ran rpm-ostree upgrade, which then appeared to install successfully. To make sure that the bootloader was completely fixed, I also tried running grub2-mkconfig -o /boot/grub2/grub.cfg, but I got an error message:

/usr/sbin/grub2-probe: error: failed to get canonical path of `overlay'.

And when I rebooted, I was thrown into the Grub shell again, and /boot/grub2/grub.cfg had been replaced with a symlink to the missing /boot/loader/grub.cfg.

I then booted into the Live CD and ran

bash-5.2# mkdir /x/
bash-5.2# mount /dev/vda4 /x/
bash-5.2# mount /dev/vda2 /x/boot
bash-5.2# mount /dev/vda1 /x/boot/efi
bash-5.2# grub2-mkconfig -o /x/boot/grub2/grub.cfg
Generating grub configuration file ...
Adding boot menu entry for UEFI Firmware Settings ...
done

but this didn’t work. Then, I went through the whole chroot process again and that did work.

Do you know of anything that I can do so that I can safely update my system again? Here’s the system configuration:

$ ostree admin status
* fedora-iot a58285d59b68498413b8f162090b54e69d6f4557ea4a7681d43f4d491f283efe.0
    Version: 41.20241111.0
    origin: <unknown origin type>
  fedora-iot 34457c8e86c0d6d137a53d042aee425f6097183ffb25b39877f433f55a627ce1.0 (rollback)
    Version: 41.20241110.0
    origin: <unknown origin type>
$ sudo bootupctl status
Running as unit: bootupd.service
Component EFI
  Installed: grub2-efi-x64-1:2.12-10.fc41.x86_64,shim-x64-15.8-3.x86_64
  Update: At latest version
No components are adoptable.
Boot method: EFI
$ sudo ostree config get ex-integrity.composefs
yes
$ sudo ostree config get sysroot.bootloader
none
$ cat /etc/default/grub
cat: /etc/default/grub: No such file or directory
$ rpm -q ostree grub2-tools-minimal
ostree-2024.9-1.fc41.x86_64
grub2-tools-minimal-2.12-10.fc41.x86_64

If you can come up with a simple reproducer, you can report the bug on the Fedora IoT issue tracker - Issues · fedora-iot/iot-distro · GitHub

In case anyone else runs into this, a solution is to boot into a live CD and run the chroot commands from my earlier post (the sequence ending in grub2-mkconfig -o /boot/grub2/grub.cfg), then reboot into the system, verify that /boot/grub2/grub.cfg is still a file (and not a broken symlink), then run:

 $ sudo chattr +i /boot/grub2/grub.cfg

After you’ve done this, it should be safe to run rpm-ostree upgrade again.
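To confirm that the flag actually took effect, the output of lsattr can be checked for the i attribute. A minimal sketch (has_immutable_flag is a hypothetical helper; it assumes the usual lsattr layout of a flags column followed by a space and the file name):

```shell
#!/bin/sh
# Sketch: succeed if one line of lsattr output has the immutable (i) flag.
# has_immutable_flag is a hypothetical helper name.
has_immutable_flag() {
    # Keep only the flags column (everything before the first space),
    # so letters in the file name cannot cause a false match.
    flags=${1%% *}
    case $flags in
        *i*) return 0 ;;
        *)   return 1 ;;
    esac
}

# Example:
#   has_immutable_flag "$(lsattr /boot/grub2/grub.cfg)" && echo "immutable"
```

The flag can be removed again with sudo chattr -i /boot/grub2/grub.cfg if the file ever needs to be regenerated.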


This solution is a horrible hack though, so I’ll try and find a way to reproduce this and file it on the IoT issue tracker.

Ok, I figured out the actual solution here:

$ sudo rm /boot/bootupd-state.json
$ sudo /usr/libexec/bootupd install --auto --with-static-configs --write-uuid /

If this worked correctly, then running

$ sudo tail /boot/grub2/grub.cfg

should show

[...]
# Import user defined configuration
# tracker: https://github.com/coreos/fedora-coreos-tracker/issues/805
if [ -f $prefix/user.cfg ]; then
  source $prefix/user.cfg
fi

blscfg

If it instead shows

[...]
### BEGIN /etc/grub.d/41_custom ###
if [ -f  ${config_directory}/custom.cfg ]; then
  source ${config_directory}/custom.cfg
elif [ -z "${config_directory}" -a -f  $prefix/custom.cfg ]; then
  source $prefix/custom.cfg
fi
### END /etc/grub.d/41_custom ###

then something went wrong.

You should also make sure that both of the following commands show the same UUID:

$ cat /boot/grub2/bootuuid.cfg
$ lsblk --fs | grep '/boot$'

If both of the checks were successful, then everything should be fixed.
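The two checks above can also be scripted. This is only a sketch: grub_cfg_is_static and first_uuid_in are hypothetical helper names, and the UUID is pulled out with a generic pattern match because the exact layout of bootuuid.cfg is not shown in this thread.

```shell
#!/bin/sh
# Sketch of the two verification steps. Both function names are
# hypothetical helpers, not part of bootupd or GRUB.

# Succeeds if the last non-blank line of the given grub.cfg is "blscfg",
# which is how the static config shown above ends.
grub_cfg_is_static() {
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    [ "$last" = "blscfg" ]
}

# Prints the first UUID-shaped string (8-4-4-4-12 hex digits) in a file.
first_uuid_in() {
    grep -Eo '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' "$1" | head -n 1
}

# Example (run as root on the repaired system):
#   grub_cfg_is_static /boot/grub2/grub.cfg && echo "static config in place"
#   [ "$(first_uuid_in /boot/grub2/bootuuid.cfg)" = "$(findmnt -no UUID /boot)" ] \
#       && echo "boot UUID matches"
```

Here findmnt -no UUID /boot stands in for the lsblk pipeline; either should report the UUID of the filesystem mounted at /boot.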