Downgrading (or upgrading) kernel leads to "A start job is running for /dev/disk/by-uuid" for root partition

Hello,

After downgrading or upgrading the kernel (tested with 5.8.18 and 5.10-rc5), the next boot gets stuck at “A start job is running for /dev/disk/by-uuid/x (…s / no limit)”, where the UUID is that of the root partition (/). The timer keeps counting up, but even after 5 minutes the boot does not continue.

The system is a ThinkPad T495 with a fairly stock Silverblue 33 (upgraded from F32). The on-disk swap was removed by deleting both its fstab entry and the swap partition.

The system works with no overrides. Overriding the kernel as described in https://docs.fedoraproject.org/en-US/fedora-silverblue/faq/ fails, however.

Overriding the kernel triggers a regeneration of the initramfs. Is it possible that the regenerated initramfs is broken because of the removed swap?

The exact error message is:

A start job is running for /dev/disk/by-uuid/e8673f81-4cb2-44b7-ad6a-268bdd36ff5a

/etc/fstab:

UUID=e8673f81-4cb2-44b7-ad6a-268bdd36ff5a /                       ext4    defaults        1 1
UUID=6a646515-53ef-458f-a265-c6c46c40cd12 /boot                   ext4    defaults        1 2
UUID=3240-80E3          /boot/efi               vfat    umask=0077,shortname=winnt 0 2

sudo blkid:

/dev/nvme0n1p1: SEC_TYPE="msdos" UUID="3240-80E3" BLOCK_SIZE="512" TYPE="vfat" PARTLABEL="EFI System Partition" PARTUUID="90390176-38f0-4984-a764-ecb63db0f6fc"
/dev/nvme0n1p2: UUID="6a646515-53ef-458f-a265-c6c46c40cd12" BLOCK_SIZE="1024" TYPE="ext4" PARTUUID="6d22a189-0753-426d-9e5e-344bc7840818"
/dev/nvme0n1p4: UUID="e8673f81-4cb2-44b7-ad6a-268bdd36ff5a" BLOCK_SIZE="4096" TYPE="ext4" PARTUUID="16a657f3-3fa0-4771-8b56-2b2f53455cbb"
/dev/zram0: UUID="363cc34b-d034-4a12-ba01-b7a637f2913c" TYPE="swap"
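
For anyone repeating this comparison by hand, here is a rough shell sketch that lists every UUID referenced in an fstab file but not reported by blkid. The function name `missing_fstab_uuids` is made up for illustration, and it assumes the output of `sudo blkid` has been saved to a file first. On the two listings above it should print nothing, since all three fstab UUIDs appear in the blkid output.

```shell
# Print every UUID that the fstab file references but blkid does not report.
# $1: path to an fstab-style file
# $2: path to a file holding saved `blkid` output
# (hypothetical helper, not part of any Fedora tool)
missing_fstab_uuids() {
    # Take the first field of each line that starts with UUID=.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1" |
    while read -r uuid; do
        # blkid prints filesystem UUIDs as ` UUID="..."`; the leading
        # space keeps PARTUUID="..." from matching by accident.
        grep -qF -- " UUID=\"$uuid\"" "$2" || printf '%s\n' "$uuid"
    done
}
```

Possible usage: `sudo blkid > /tmp/blkid.txt`, then `missing_fstab_uuids /etc/fstab /tmp/blkid.txt`. An empty result only rules out a stale fstab entry; it says nothing about what ended up inside the initramfs.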

My swap had already been removed when I tried the upgrade, and I got the same behaviour you are describing.

My system is encrypted, but it never asked for the password – I supposed that was the problem. Upgrading to a full Rawhide got me a working 5.10rc5 system.


Nice to know I’m not alone. So you removed the swap before the upgrade? It would be interesting to understand the underlying issue.

EDIT: So I’ve tried booting with debug and disabling log level. But the messages before the final “A start job is running…” are not helpful. What options do we have to debug this issue?

EDIT2: Tested again with a 5.9.11 override (just one patch release after 5.9.10) and the same issue shows up.

EDIT3: Ok, now it gets really confusing. 5.9.11 was released today. The exact same packages that I tried to override were installed during the update and the boot is successful.

EDIT4: The sole difference between the two operations is that rpm-ostree override generates an initramfs, while rpm-ostree update skips it. This seems to be the root cause.

The sole difference between the two operations is that rpm-ostree override generates an initramfs, while rpm-ostree update skips it. This seems to be the root cause.

Right.

Does this also reproduce with just rpm-ostree initramfs --enable?

Hmm…so the uuid that boot is stalling on isn’t listed in active block devs. I wonder if it’s something like the UUID for the previous boot’s zram0, and the dracut run somehow picked that up?

I just tried enabling zram-generator on FCOS quickly and didn’t see this happen though.

Or does that UUID look like the one of the previous swap device you enabled?

Is that UUID present in /proc/cmdline or in grep -r e8673f81-4cb2-44b7-ad6a-268bdd36ff5a /etc ?

No. The boot was successful.

Yes. As the root UUID.

BOOT_IMAGE=/ostree/fedora-f623eb2aefcf1ff8f3588d2b671830ff781be4983da3330b5d58a1c5d994245d/vmlinuz-5.9.11-200.fc33.x86_64 rhgb quiet root=UUID=e8673f81-4cb2-44b7-ad6a-268bdd36ff5a ostree=/ostree/boot.1/fedora/f623eb2aefcf1ff8f3588d2b671830ff781be4983da3330b5d58a1c5d994245d/0
# grep -r e8673f81-4cb2-44b7-ad6a-268bdd36ff5a /etc
/etc/fstab:UUID=e8673f81-4cb2-44b7-ad6a-268bdd36ff5a /                       ext4    defaults        1 1
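
The same check can be scripted. Below is a minimal sketch, assuming a POSIX shell, that pulls the UUID out of a `root=UUID=...` kernel argument and tests whether a matching entry exists under /dev/disk/by-uuid. Both function names are hypothetical helpers, not part of rpm-ostree or dracut.

```shell
# Print the UUID from the root=UUID=... argument of a kernel command line.
# $1: the whole command line as one string, e.g. "$(cat /proc/cmdline)"
# Returns non-zero if no root=UUID= argument is present.
root_uuid_from_cmdline() {
    # One argument per line, keep only the root=UUID= one, strip the prefix.
    uuid=$(printf '%s\n' "$1" | tr -s ' \t' '\n' |
           sed -n 's/^root=UUID=//p' | head -n 1)
    [ -n "$uuid" ] && printf '%s\n' "$uuid"
}

# Succeed if an entry for the given UUID exists in the by-uuid directory.
# $1: UUID to look up
# $2: optional directory override (defaults to /dev/disk/by-uuid)
uuid_is_known() {
    [ -e "${2:-/dev/disk/by-uuid}/$1" ]
}
```

On the working deployment one could run `uuid_is_known "$(root_uuid_from_cmdline "$(cat /proc/cmdline)")" && echo present`. That only confirms the running system sees the device; whether the regenerated initramfs can find it during early boot is a separate question.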

EDIT: The generated grub.cfg entries of a non-functional boot (generated from an override) and of a successful boot are exactly the same, apart from the different deployment IDs.

Wait I see you also filed https://github.com/coreos/rpm-ostree/issues/2343 - do you also see a SELinux denial when doing override replace? If you setenforce 0, does that fix it?

Unfortunately not. I’ve set SELinux permanently to “Permissive” after filing the bug report so I could continue testing. The denial is still visible in the audit log, but with a permissive=1 attribute.

I gave up on my old system. Downloaded Silverblue 33. Complete fresh system, this time with btrfs and no swap. Updated it. Reboot. Override kernel. Boot is stuck on “A start job is running for /dev/disk/by-uuid”.

Could it have something to do with my main drive being an NVMe drive (/dev/nvme)?

If you can boot into the original install before updating, you could check the uuid of the nvme drive to make sure it is the device hanging. The bootable installer image can do the same too.

Hi jakfrost,

The fresh Silverblue 33 installation is in a functional state, even after updating and rebooting. The problem appears once an override triggers an initramfs regeneration. The next boot is then stuck at “A start job is running for /dev/disk/by-uuid” with the valid UUID of the root partition.

Functional deployment:

● ostree://fedora:fedora/33/x86_64/silverblue
                   Version: 33.20201202.0 (2020-12-02T00:50:44Z)
                BaseCommit: 7398dead60608f76101354dfb74c00d911702110d8654781b48147c1abc42130
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31

Non-functional deployment:

  ostree://fedora:fedora/33/x86_64/silverblue
                   Version: 33.20201202.0 (2020-12-02T00:50:44Z)
                BaseCommit: 7398dead60608f76101354dfb74c00d911702110d8654781b48147c1abc42130
              GPGSignature: Valid signature by 963A2BEB02009608FE67EA4249FD77499570FF31
      ReplacedBasePackages: kernel-modules kernel-devel kernel kernel-modules-extra kernel-core 5.9.11-200.fc33 -> 5.10.0-0.rc6.90.fc34

I’ve tested it with several different kernels and systems, and I can only reproduce it on this notebook.

@walters rpm-ostree 2020.10 fixed this issue for me! The override replace still reports an SELinux denial (noted in https://github.com/coreos/rpm-ostree/issues/2343), but booting the overridden kernel is now successful, with exactly the same RPMs and exactly the same sequence of commands.