We need to know exactly what happens when you boot, and which events precede the error and finally lead to it.
journalctl --boot=-1 -n 700
journalctl --boot=0
→ 0 = journal of the current boot
→ -1 = journal of the previous boot (likely irrelevant, but just to rule out that something related occurs during shutdown); -n 700 limits the output to the last 700 lines instead of the whole boot.
The root account is locked by default in current Fedora installations (which is why you use sudo), although I don’t know how the current default behaves when you need to enter emergency mode (I still use the root account).
Also, I would question whether gdm or gnome (especially if they later start properly) can cause emergency mode. Let’s check the journal logs.
Hi, I searched your journal for “emergency” and found:
1897 Jan 29 09:53:53 systemname systemd: Started Emergency Shell.
After that I tried to read what happened before the message above. From skimming it, there are some warnings related to drives, as below:
1717 Jan 29 09:52:24 systemname lvm: WARNING: Couldn't find device with uuid LG0m0F-Tg51-9UPO-3Elk-Hzvr-0TAQ-FUEPd6.
1718 Jan 29 09:52:24 systemname lvm: WARNING: Couldn't find device with uuid 9xH5Vr-uypK-RPHz-wV8s-ouLa-03PJ-iuaI76.
1719 Jan 29 09:52:24 systemname lvm: WARNING: VG home_vg is missing PV LG0m0F-Tg51-9UPO-3Elk-Hzvr-0TAQ-FUEPd6 (last written to /dev/mapper/luks-8c6d63ac-8414-4772-a885-3e1c33940d17).
1720 Jan 29 09:52:24 systemname lvm: WARNING: VG home_vg is missing PV 9xH5Vr-uypK-RPHz-wV8s-ouLa-03PJ-iuaI76 (last written to /dev/md0).
1855 Jan 29 09:53:53 systemname systemd: Timed out waiting for device /dev/disk/by-uuid/57980355-78ae-4818-98fc-13b4e3dd8d48.
1858 Jan 29 09:53:53 systemname systemd: dev-disk-by\x2duuid-57980355\x2d78ae\x2d4818\x2d98fc\x2d13b4e3dd8d48.device: Job dev-disk-by\x2duuid-57980355\x2d78ae\x2d4818\x2d98fc\x2d13b4e3dd8d48.device/start failed with result 'timeout'.
Maybe first compare the /etc/fstab configuration with the current partition layout from lsblk -f.
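That comparison can be scripted roughly as follows. This is only a minimal sketch, assuming a POSIX shell with awk and grep. The function names fstab_uuids and missing_uuids are made up for illustration. It only looks at UUID= entries, so lines that refer to /dev/mapper/... paths are not checked, and it reads the known UUIDs from a file so that the output of lsblk can be fed in.

```shell
#!/bin/sh
# Print the UUIDs referenced by UUID= entries in an fstab-style file.
# Comment lines and blank lines are skipped.
fstab_uuids() {
    awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$1"
}

# Print every fstab UUID that does not appear in the list of known UUIDs
# (one UUID per line, e.g. produced by: lsblk -no UUID > known.txt).
missing_uuids() {
    fstab_file=$1
    known_file=$2
    fstab_uuids "$fstab_file" | while read -r uuid; do
        # -x: match the whole line, -F: treat the UUID as a fixed string
        grep -qxF "$uuid" "$known_file" || echo "$uuid"
    done
}
```

Possible usage on the affected system: lsblk -no UUID > /tmp/known.txt, then missing_uuids /etc/fstab /tmp/known.txt. Any UUID printed is referenced in fstab but not currently visible as a block device.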
Oprizal has already mentioned the warning that follows.
The occurrence that then finally leads to the emergency mode starts with the medium_vg error at line 1800:
Jan 29 09:53:53 systemname systemd: dev-mapper-medium_vg\x2dvar00.device: Job dev-mapper-medium_vg\x2dvar00.device/start timed out.
Jan 29 09:53:53 systemname systemd: Timed out waiting for device /dev/mapper/medium_vg-var00.
Jan 29 09:53:53 systemname systemd: Dependency failed for File System Check on /dev/mapper/medium_vg-var00.
Jan 29 09:53:53 systemname systemd: Dependency failed for /var.
So, the issue is around nvme0n1p5 and nvme0n1p4 & home_vg and medium_vg. This is where you should start to search.
In addition to Oprizal’s suggestions: was fstab changed in recent days? As you use LUKS, have changes been made to crypttab recently?
ls -l /etc/fstab
ls -l /etc/crypttab
-l includes the time of the last change.
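If you prefer a direct yes/no answer to "changed recently?", find can filter by modification time. A minimal sketch, assuming a POSIX shell and find; the function name changed_recently and the 7-day window in the usage line are just illustrative choices.

```shell
#!/bin/sh
# List those of the given files whose content was last modified
# less than DAYS days ago. Files that do not exist are silently skipped.
changed_recently() {
    days=$1
    shift
    find "$@" -mtime "-$days" 2>/dev/null
}
```

Possible usage: changed_recently 7 /etc/fstab /etc/crypttab. No output means neither file was modified within that window.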
Have you made changes to your drives/partitions yourself in recent days? Software or hardware? Does the error also appear if you boot the previous kernel? I ask the latter because a misconfiguration is usually nothing the system can fix by itself (after all, the drives become available later on, and then without errors)… just to rule out a bug.
They already become available during the boot process. This is why your system can finally leave emergency mode and boot normally. The question is why they are not available from the beginning. Let’s see how the other kernel behaves.
Can you check the older logs to identify when the problem first appeared? So,
journalctl --boot=-1 | grep "pvscan PV /dev/nvme0n1p5 online, VG home_vg incomplete"
journalctl --boot=-2 | grep "pvscan PV /dev/nvme0n1p5 online, VG home_vg incomplete"
journalctl --boot=-3 | grep "pvscan PV /dev/nvme0n1p5 online, VG home_vg incomplete"
then --boot=-4, -5 and so on until you find the first boot without the error. I assume the boot where the pvscan issue first appears is also the boot where “WARNING: VG home_vg is missing PV LG0m0F-Tg51-9UPO-3Elk-Hzvr-0TAQ-FUEPd6” first appears? The latter can be checked the same way.
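Stepping through the boots by hand gets tedious, so here is a rough loop for it. It is a sketch only, assuming a POSIX shell; the function name boots_with_message and the default of 10 boots are arbitrary choices for illustration. It uses only journalctl --boot and --no-pager plus grep.

```shell
#!/bin/sh
# Print the offset of every earlier boot (-1 = previous boot, -2 = the one
# before, ...) whose journal contains PATTERN as a fixed string.
# Checks at most MAX boots (default 10).
boots_with_message() {
    pattern=$1
    max=${2:-10}
    i=1
    while [ "$i" -le "$max" ]; do
        # A boot offset with no journal data yields no match here.
        if journalctl --boot="-$i" --no-pager 2>/dev/null | grep -qF -- "$pattern"; then
            echo "-$i"
        fi
        i=$((i + 1))
    done
}
```

Possible usage: boots_with_message "VG home_vg incomplete" 30. The largest offset printed is the oldest boot (within the checked range) that already shows the message.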
The date of the first appearance would be interesting. Do you install updates daily? All of them, or just security updates?
The output of lsblk -f would also be helpful to interpret the logs and how the devices relate, and to make suggestions.
If I got it correctly at first glance, you only have /home encrypted? If so (and as I once had a loosely comparable issue with that), you may try to disable the encrypted disk in fstab (just add a # at the beginning of the line) and check whether it then boots properly. To avoid a new user directory being created, do not log in; just check whether it boots properly up to the login screen. Then shut down and re-enable the /home mount in fstab using a live image or similar (or temporarily activate the root account with passwd root before testing, re-enable /home after testing directly in a terminal as root, without a live image, and reboot before logging in). This is just to find out whether this is the origin.
Btw, you may also search the Internet for the error lines we identified in the journal. There is a lot of troubleshooting material for issues that contain these errors. Maybe something will help you.
The two devices reported as missing belong to /home and /var. Only /home is encrypted.
I looked at the old logs, and I see that the two devices have been reported as missing ever since I installed this system, for over two months. So maybe this is not really the problem, as the two devices always become available later during the boot process.
Maybe the easiest might be to tweak when the boot process “gives up” – if I extend that by x seconds, then booting might work without friction. Do you have any pointers for that?
Feel free to try increasing the timeout. Just add this option to the entry in fstab: x-systemd.device-timeout= → e.g., x-systemd.device-timeout=300s for 300 seconds.
An example line for a btrfs@var in fstab would be:
UUID=8b481900-fb7a-4e9e-929c-e940a6b913a4 /var btrfs subvol=var,compress=zstd:1,x-systemd.device-timeout=300s 0 0
or for ext3@non-system directory:
UUID=2ed453ee-c197-4e76-860b-d8ecf5540576 /export/data ext3 acl,user_xattr,x-systemd.device-timeout=300s 1 2
So, leave the respective line as it is and just add the option x-systemd.device-timeout= with a sufficiently high value.
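If you would rather not edit the line by hand, the change can be previewed with a small filter. This is a sketch under assumptions, not a tested tool: it assumes a POSIX shell with awk, and the function name add_device_timeout is invented here. It prints the modified file to stdout and never touches the original. Note that awk rewrites the changed line with single spaces between the columns, which is still valid fstab syntax.

```shell
#!/bin/sh
# Print FSTAB with ",x-systemd.device-timeout=TIMEOUT" appended to the
# options field (4th column) of the entry mounted at MOUNTPOINT.
# Comments, other entries, and entries that already carry the option
# are printed unchanged. Nothing is modified in place.
# Usage: add_device_timeout FSTAB MOUNTPOINT TIMEOUT
add_device_timeout() {
    awk -v mp="$2" -v t="$3" '
        !/^[[:space:]]*#/ && NF >= 4 && $2 == mp && $4 !~ /x-systemd\.device-timeout=/ {
            $4 = $4 ",x-systemd.device-timeout=" t
        }
        { print }
    ' "$1"
}
```

Possible usage: add_device_timeout /etc/fstab /var 300s > /tmp/fstab.new, then compare with diff /etc/fstab /tmp/fstab.new before copying anything back as root.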
You may also check whether the warning Oprizal mentioned has also been present since the installation (… | grep "WARNING: VG home_vg is missing PV LG0m0F"), and likewise the later errors with medium_vg (… | grep "Dependency failed for File System Check on /dev/mapper/medium_vg-var00" & … | grep "Timed out waiting for device /dev/disk/by-uuid/57980355-78ae-4818-98fc-13b4e3dd8d48"). Regardless of the still unknown origin, the medium_vg events (/var & backup/system) triggered emergency mode. Let’s see if the timeout makes a difference.
One thing I note is that md127 is spread across nvme1n1p2 and nvme0n1p3.
Then home_vg-home is spread across md127 and nvme0n1p5.
Also medium_vg-var is spread across nvme1n1p3 and nvme0n1p4 with somehow medium_vg-var00 and medium_vg-var.snap mixed in there.
I would guess that the system has trouble sorting out the mdadm part and the direct-partition part of /home, as well as the mixed parts of /var, and that this causes the delay before they become available for use.
My suggestion would be to clean up the VG and LV arrangements as well as the raid arrangements to allow the system to configure itself faster.
Since the topic came up… If you reconfigure your VG/LV/partition arrangement as suggested by JV, you may also want to evaluate an alternative file system for backups. BTRFS is a great thing for system partitions, but for good reasons it is not recommended for backups/critical data storage.