Sometimes boots to emergency mode

My homeserver is running Fedora 41 with automatic update + reboot via dnf5-automatic. Once in a while it fails to properly reboot and drops into emergency mode instead.

When this happens, continuing with Ctrl+D (once I notice and have physical access) works without problems.

The only thing in rdsosreport.txt that caught my eye was:

[    1.828619] mini kernel: BTRFS: device label fedora_desktop devid 1 transid 1286505 /dev/nvme0n1p3 (259:3) scanned by (udev-worker) (543)
[    1.828623] mini kernel: BTRFS: device label fedora_desktop devid 2 transid 1286505 /dev/nvme1n1p1 (259:5) scanned by (udev-worker) (585)
[    1.828628] mini systemd[1]: Found device dev-disk-by\x2duuid-14580574\x2dbfa2\x2d4fa6\x2db604\x2d615020c44431.device - KINGSTON OM8PGP41024Q-A0 fedora_desktop.
[    1.828632] mini systemd[1]: Reached target initrd-root-device.target - Initrd Root Device.
[    1.828636] mini systemd[1]: systemd-fsck-root.service: Bound to unit dev-disk-by\x2duuid-14580574\x2dbfa2\x2d4fa6\x2db604\x2d615020c44431.device, but unit isn't active.
[    1.828640] mini systemd[1]: Dependency failed for systemd-fsck-root.service - File System Check on /dev/disk/by-uuid/14580574-bfa2-4fa6-b604-615020c44431.
[    1.828645] mini systemd[1]: Dependency failed for sysroot.mount - /sysroot.
[    1.828649] mini systemd[1]: Dependency failed for initrd-root-fs.target - Initrd Root File System.
[    1.828653] mini systemd[1]: Dependency failed for initrd-parse-etc.service - Mountpoints Configured in the Real Root.
[    1.828657] mini systemd[1]: initrd-parse-etc.service: Job initrd-parse-etc.service/start failed with result 'dependency'.
[    1.828661] mini systemd[1]: initrd-parse-etc.service: Triggering OnFailure= dependencies.
[    1.828665] mini systemd[1]: initrd-root-fs.target: Job initrd-root-fs.target/start failed with result 'dependency'.
[    1.828668] mini systemd[1]: initrd-root-fs.target: Triggering OnFailure= dependencies.
[    1.828672] mini systemd[1]: sysroot.mount: Job sysroot.mount/start failed with result 'dependency'.
[    1.828676] mini systemd[1]: systemd-fsck-root.service: Job systemd-fsck-root.service/start failed with result 'dependency'.

/dev/nvme0n1p3 and /dev/nvme1n1p1 are the rootfs (+/home) on a Btrfs RAID 1.

Searching for the inactive service, it seems there are only:
systemd-fsck@dev-disk-by\x2duuid-45D1\x2dC04B.service and systemd-fsck@dev-disk-by\x2duuid-469e35dd\x2d7caa\x2d48ad\x2d95ec\x2d99d5b51f7d5d.service

I’m a bit confused about what’s going on there.

Are all partitions listed in /etc/fstab always connected to the system?
I am thinking of a USB SSD, perhaps.
Are all the mounts in /etc/fstab using UUIDs? If not, maybe you are using unstable device names.

You can find the partition with the UUID using lsblk -f.

How do you fix this when it happens?

fstab:

UUID=14580574-bfa2-4fa6-b604-615020c44431 /                       btrfs   subvol=root,compress=zstd:1 0 0
UUID=469e35dd-7caa-48ad-95ec-99d5b51f7d5d /boot                   ext4    defaults        1 2
UUID=45D1-C04B          /boot/efi               vfat    umask=0077,shortname=winnt 0 2
UUID=14580574-bfa2-4fa6-b604-615020c44431 /home                   btrfs   subvol=home,compress=zstd:1 0 0

All are on either /dev/nvme0n1 or /dev/nvme1n1. Both are internal SSDs and are always connected.

lsblk -f:

NAME        FSTYPE FSVER LABEL          UUID                                 FSAVAIL FSUSE% MOUNTPOINTS
zram0                                                                                       [SWAP]
nvme0n1                                                                                     
├─nvme0n1p1 vfat   FAT32                45D1-C04B                             591,3M     1% /boot/efi
├─nvme0n1p2 ext4   1.0                  469e35dd-7caa-48ad-95ec-99d5b51f7d5d    549M    37% /boot
└─nvme0n1p3 btrfs        fedora_desktop 14580574-bfa2-4fa6-b604-615020c44431    367G    60% /home
                                                                                            /
nvme1n1                                                                                     
└─nvme1n1p1 btrfs        fedora_desktop 14580574-bfa2-4fa6-b604-615020c44431                

How do you fix this when it happens?

Not exactly “fixing” but ignoring: Exit emergency mode with Ctrl+D. Boot continues without problems.

I wonder if it takes longer than expected for one of the disks to show up via udev?

Maybe you can find evidence in the system journal for the boot.

I’m not really sure what I’m searching for, but given that all the nvme-related messages come before the failed fsck service, I guess not?

Feb 01 06:10:56 mini kernel: nvme nvme0: 16/0/0 default/read/poll queues
Feb 01 06:10:56 mini kernel:  nvme0n1: p1 p2 p3
Feb 01 06:10:56 mini kernel: nvme nvme1: allocated 64 MiB host memory buffer.

[... some more usb stuff found...]

Feb 01 06:10:56 mini kernel: nvme nvme1: 16/0/0 default/read/poll queues
Feb 01 06:10:56 mini kernel:  nvme1n1: p1
Feb 01 06:10:56 mini kernel: BTRFS: device label fedora_desktop devid 1 transid 1286505 /dev/nvme0n1p3 (259:3) scanned by (udev-worker) (543)
Feb 01 06:10:56 mini kernel: BTRFS: device label fedora_desktop devid 2 transid 1286505 /dev/nvme1n1p1 (259:5) scanned by (udev-worker) (585)
Feb 01 06:10:56 mini systemd[1]: Found device dev-disk-by\x2duuid-14580574\x2dbfa2\x2d4fa6\x2db604\x2d615020c44431.device - KINGSTON OM8PGP41024Q-A0 fedora_desktop.
Feb 01 06:10:56 mini systemd[1]: Reached target initrd-root-device.target - Initrd Root Device.
Feb 01 06:10:56 mini systemd[1]: systemd-fsck-root.service: Bound to unit dev-disk-by\x2duuid-14580574\x2dbfa2\x2d4fa6\x2db604\x2d615020c44431.device, but unit isn't active.

The next thing about the drives is hours later, after Ctrl+D’ing:

Feb 01 13:45:16 mini systemd[1]: dracut-pre-mount.service - dracut pre-mount hook was skipped because no trigger condition checks were met.
Feb 01 13:45:16 mini systemd[1]: Starting systemd-fsck-root.service - File System Check on /dev/disk/by-uuid/14580574-bfa2-4fa6-b604-615020c44431...
Feb 01 13:45:16 mini systemd[1]: Finished systemd-fsck-root.service - File System Check on /dev/disk/by-uuid/14580574-bfa2-4fa6-b604-615020c44431.
Feb 01 13:45:16 mini systemd[1]: Mounting sysroot.mount - /sysroot...
Feb 01 13:45:16 mini kernel: BTRFS info (device nvme0n1p3): first mount of filesystem 14580574-bfa2-4fa6-b604-615020c44431
Feb 01 13:45:16 mini kernel: BTRFS info (device nvme0n1p3): using crc32c (crc32c-intel) checksum algorithm
Feb 01 13:45:16 mini kernel: BTRFS info (device nvme0n1p3): using free-space-tree
Feb 01 13:45:17 mini systemd[1]: Mounted sysroot.mount - /sysroot.
Feb 01 13:45:17 mini systemd[1]: Reached target initrd-root-fs.target - Initrd Root File System.

[... lots of stuff ...]

Feb 01 13:45:17 mini kernel: BTRFS info (device nvme0n1p3 state M): use zstd compression, level 1

A bit weird that there is only one “Found device” entry:

Feb 01 06:10:56 mini systemd[1]: Found device dev-disk-by\x2duuid-14580574\x2dbfa2\x2d4fa6\x2db604\x2d615020c44431.device - KINGSTON OM8PGP41024Q-A0 fedora_desktop.

There is nothing about the other SSD, but maybe that’s normal on Btrfs RAID 1?

Do you have a hint for a search term to look for or would sharing the full rdsosreport or journal help?

I would compare a good boot’s journal with a bad boot’s journal.
Are there logs showing the devices being detected?
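
One rough way to do that comparison: filter both journals down to the storage and root-mount lines and strip the timestamp prefix, so that diff only shows real differences. This is only a sketch. The helper names strip_prefix and storage_lines, the file names, and the boot offsets in the usage comments are placeholders, not something taken from this system; pick the real boots from journalctl --list-boots.

```shell
#!/bin/sh
# Remove the leading "Mon DD HH:MM:SS hostname " prefix from journal lines.
strip_prefix() {
    sed -E 's/^[A-Z][a-z]{2} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ //'
}

# Keep only lines about NVMe/Btrfs detection and the root mount units.
storage_lines() {
    grep -E 'nvme|BTRFS|fsck-root|sysroot|initrd-root' | strip_prefix
}

# Usage (offsets are placeholders, see: journalctl --list-boots):
#   journalctl -b -3 --no-pager | storage_lines > bad.txt
#   journalctl -b -1 --no-pager | storage_lines > good.txt
#   diff -u good.txt bad.txt
```

Expect some noise in the diff, since things like the udev-worker PIDs differ between boots.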

I also wonder if your initramfs has somehow become out of date.

Maybe force it to be rebuilt with sudo dracut --force and see if that fixes it.
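
Before or after rebuilding, it may also be worth checking that the initramfs actually contains the btrfs dracut module. A small sketch, assuming lsinitrd -m prints one module name per line; the helper name has_btrfs_module is made up for illustration.

```shell
#!/bin/sh
# Succeeds if a line consisting solely of "btrfs" appears on stdin.
has_btrfs_module() {
    grep -qx 'btrfs'
}

# Usage (needs root to read the image; with no image argument,
# lsinitrd inspects the initramfs of the running kernel):
#   sudo lsinitrd -m | has_btrfs_module && echo "btrfs module present"
```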

Hm, a good boot looks the same regarding detection of the SSDs (apart from the fsck unit not failing, of course).

Did the sudo dracut --force. At the rate this currently happens, I will know around the middle of next month whether it happens again or whether that fixed it xD

What should be true is that /usr/lib/udev/rules.d/64-btrfs.rules waits indefinitely for all Btrfs member devices to be present before systemd will even try to mount.
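
If you want to see what that rule decided for each member device, the udev properties can be inspected on the running system. A sketch, assuming the rule records its result in ID_BTRFS_READY and SYSTEMD_READY as the upstream systemd version of the rule does; btrfs_ready_props is a hypothetical helper name.

```shell
#!/bin/sh
# Filter udev property output down to the two readiness flags.
btrfs_ready_props() {
    grep -E '^(ID_BTRFS_READY|SYSTEMD_READY)='
}

# Usage, once per member device:
#   udevadm info --query=property --name=/dev/nvme0n1p3 | btrfs_ready_props
#   udevadm info --query=property --name=/dev/nvme1n1p1 | btrfs_ready_props
```

As far as I understand the rule, SYSTEMD_READY is only set (to 0) while the filesystem is incomplete, so a missing SYSTEMD_READY line on a healthy system would be expected.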

There are separate debug parameters for udev and systemd. Debug logging is expensive enough that it can change timing and mask a race condition, i.e. when you enable debug the problem never happens. Funny. But it might be worth finding out if it catches some other problem that explains what’s going on.

I suggest using them separately, they’re both pretty verbose. I’d probably start with udev.

I think this is the boot parameter to use: udev.log-priority=debug
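
For reference, systemd-udevd.service(8) documents the option as udev.log_level= (with rd.udev.log_level= applying to the initrd only), and as far as I know the older udev.log_priority= spelling is still accepted. A sketch for adding one debug option at a time with grubby; the debug_args helper is made up for illustration, and this has not been tested on Fedora 41.

```shell
#!/bin/sh
# Print the kernel argument for one debug target, so the two are never combined.
debug_args() {
    case "$1" in
        udev)    echo 'rd.udev.log_level=debug' ;;
        systemd) echo 'systemd.log_level=debug' ;;
        *)       echo "usage: debug_args udev|systemd" >&2; return 1 ;;
    esac
}

# Usage:
#   sudo grubby --update-kernel=DEFAULT --args="$(debug_args udev)"
#   (reboot until the problem shows up, then undo)
#   sudo grubby --update-kernel=DEFAULT --remove-args="$(debug_args udev)"
```

Since the failure happens in the initrd, the rd. variant keeps the extra udev logging out of the rest of the boot.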
