I am running Fedora 33 on a UEFI laptop with kernel 5.11.16.
I ran an update around midday on April 27th, rebooted my system and was unable to progress beyond Starting Switch Root in the boot up screen. The system hangs essentially forever. If I leave the system at the Starting Switch Root message for many hours I eventually get another message that reads:
systemd[1]: Assertion 'line > 0' failed at src/shared/conf-parser.c:165, function parse_line(). Aborting.
systemd[1]: Caught <ABRT>, dumped core as pid 912.
systemd[1]: Freezing execution.
The update on April 27th includes amongst other things:
kernel
kernel-core
kernel-devel
kernel-modules
kernel-modules-extra
NetworkManager
fedora-release-common
fedora-release-identity-workstation
fedora-release-workstation
python3
I have chroot’d into the system from a live USB and tried to reinstall/reconfigure GRUB, systemd, and the kernel, and to rebuild my initramfs, as possible solutions, but nothing has worked. I have also tried running dnf upgrade --refresh to see if a new update would be released to fix my issue.
Booting into older kernels using grub does not solve the issue so I believe something else is at fault.
After removing the quiet rhgb kernel parameters from the boot parameters I have more messages at startup, but the last messages shown are still Starting Switch Root, now followed by Welcome to Fedora 33 (Workstation Edition)!.
I am unable to identify what exactly is causing the system to freeze on startup and so I am struggling to debug the situation. My guess would be something in the update broke my system but I don’t have any solid evidence. Any assistance would be greatly appreciated and I am happy to provide any data that would help.
If in addition to removing quiet rhgb, you add rd.debug rd.shell, do you get any helpful hints as to what the problem is? Does it leave you at a command prompt where you can poke around and try to figure out what is wrong?
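The parameter change suggested above can be sketched as a small string edit. This is only an illustration of what ends up on the kernel command line, not a tool that touches GRUB; the function name `edit_cmdline` and the sample command line are hypothetical.

```python
def edit_cmdline(cmdline, remove=("quiet", "rhgb"), add=("rd.debug", "rd.shell")):
    """Return a kernel command line with some parameters dropped and others appended."""
    # Keep every parameter that is not in the removal list, preserving order.
    params = [p for p in cmdline.split() if p not in remove]
    # Append the debugging parameters unless they are already present.
    for p in add:
        if p not in params:
            params.append(p)
    return " ".join(params)


# Hypothetical command line, for illustration only.
example = "root=/dev/mapper/fedora-root ro rhgb quiet"
print(edit_cmdline(example))
```

The same edit is normally made by hand: press e on the GRUB menu entry, change the line that starts with linux, and boot with Ctrl+X.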
I edited the grub options and replaced quiet rhgb with rd.debug rd.shell as you suggested. Unfortunately, I didn’t get a shell but I got a LOT more information, though not very useful.
The last few messages are now
systemd[1]: Set hostname
Welcome to Fedora 33 (Workstation Edition)!
systemd-sysv-generator[1102]: SysV service '/etc/rc.d/init.d/livesys' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
systemd-sysv-generator[1102]: SysV service '/etc/rc.d/init.d/livesys-late' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
zram_generator::generator[1104]: Creating dev-zram0.swap for /dev/zram0 (4096MB)
systemd-sysv-generator[1102]: SysV service '/etc/rc.d/init.d/ctxlogd' lacks a native systemd unit file. Automatically generating a unit file for compatibility. Please update package to include a native systemd unit file, in order to make it more safe and robust.
zram: Added device: zram0
This doesn’t look very useful to me but maybe it is.
Before that I have a lot of messages about SELinux
It looks like the last few things that try to load/run are /etc/rc.d/init.d/livesys and /etc/rc.d/init.d/livesys-late. If this is not a live image (i.e., you clicked “install to hard drive”), I think those services should have been removed during installation. Maybe try removing those files?
I just checked on my system and those files are present. I guess it is normal for them to be left in place.
/etc/rc.d/init.d/ctxlogd, however, is not present on my system. You might try moving that file somewhere else temporarily to see if the system boots without it.
[/home/gregory]$ ls /etc/rc.d/init.d/
functions livesys livesys-late README
You can try adding selinux=0 to the kernel command line to see if the problem is SELinux related. You should relabel your system after running with selinux disabled by running # echo -F > /.autorelabel.
ZRAM seems like a very unlikely culprit, but since it activates close to when the problem occurs, you can also try disabling that with # > /etc/systemd/zram-generator.conf.
One other thing you can try, if you know exactly which packages were updated when the problem began, is # dnf downgrade <packagename>. You may be able to determine which package(s) were last updated by running # grep 'Installed:' /var/log/dnf.log.
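To narrow the log down to a single day, the grep idea can be extended with a short script. This is a rough sketch that assumes each dnf.log entry begins with an ISO-style date and that package changes are recorded with markers such as Installed: and Upgraded: (the second marker and the sample lines are assumptions, so check them against your own log). The function name `packages_changed_on` is hypothetical.

```python
def packages_changed_on(lines, day, actions=("Installed:", "Upgraded:")):
    """Return the package names logged on the given day (e.g. '2021-04-27')."""
    hits = []
    for line in lines:
        # Assumption: every entry starts with its date in YYYY-MM-DD form.
        if not line.startswith(day):
            continue
        for action in actions:
            if action in line:
                # Keep only the text after the marker, i.e. the package name.
                hits.append(line.split(action, 1)[1].strip())
                break
    return hits


# Hypothetical log lines, for illustration only.
sample = [
    "2021-04-26T09:00:00+0000 DEBUG Upgraded: example-old-1.0-1.fc33.x86_64",
    "2021-04-27T12:00:00+0000 DEBUG Upgraded: example-pkg-2.0-1.fc33.x86_64",
    "2021-04-27T12:00:01+0000 DEBUG Installed: example-new-1.0-1.fc33.x86_64",
    "2021-04-27T12:00:02+0000 INFO Complete!",
]
for package in packages_changed_on(sample, "2021-04-27"):
    print(package)
```

On a real system, the lines would come from open("/var/log/dnf.log") instead of the sample list, read from inside the chroot if the system cannot boot.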
I set selinux=0 as a kernel parameter in GRUB and it stopped all the SELinux logs, but it didn’t fix the issue, so I didn’t bother relabeling the system. Should I have relabeled the system anyway?
I created the empty file /etc/systemd/zram-generator.conf and now ZRAM is not activated as expected but it also did not solve the issue.
I tried rolling back a few packages at the time this first occurred but I had no luck and I don’t know which caused the issue so I didn’t know exactly which packages I should be investigating.
Yes. When SELinux comes back online (by not specifying selinux=0), it can be even more unhappy about files not being labeled correctly.
Sorry, I’m kind of at the end of my list of ideas to try.
Well, maybe one or two more. Can you boot into rescue mode by specifying single on the kernel command line? Once at a rescue prompt on the root file system, you should be able to run rpm -Va to see a (possibly quite long) listing of modified system files. Each file will be prefixed with a code of sorts that will help to indicate how it has been modified. If the problem was caused by modifying a system configuration file of some sort, the file that is the cause of the problem will likely be in this list. You’ll just have to study the list and try to guess as to which one is the most likely culprit (if any). In particular, you want to look at any files that you might remember altering just before the problems started to occur. Below is the key for the first column of the rpm -Va output.
S file Size differs
M Mode differs (includes permissions and file type)
5 digest (formerly MD5 sum) differs
D Device major/minor number mismatch
L readLink(2) path mismatch
U User ownership differs
G Group ownership differs
T mTime differs
P caPabilities differ
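Since the rpm -Va listing can be long, the key above can be turned into a small decoder. This is a minimal sketch, assuming each output line is the flag string, an optional one-letter attribute marker (such as c for config files), and then the path; the function name `decode_verify_line` and the sample line are hypothetical, and paths containing spaces are not handled.

```python
# Flag meanings, taken from the key above.
FLAGS = {
    "S": "file size differs",
    "M": "mode differs (includes permissions and file type)",
    "5": "digest (formerly MD5 sum) differs",
    "D": "device major/minor number mismatch",
    "L": "readlink(2) path mismatch",
    "U": "user ownership differs",
    "G": "group ownership differs",
    "T": "mtime differs",
    "P": "capabilities differ",
}


def decode_verify_line(line):
    """Return (path, [reasons]) for one line of rpm -Va output, or None if it is blank."""
    fields = line.split()
    if len(fields) < 2:
        return None
    # First field is the flag string, last field is the path (no spaces assumed).
    code, path = fields[0], fields[-1]
    if code == "missing":
        return path, ["file is missing"]
    # Dots mean the check passed; only letters found in the key are reported.
    reasons = [FLAGS[c] for c in code if c in FLAGS]
    return path, reasons


# Hypothetical output line, for illustration only.
print(decode_verify_line("S.5....T.  c /etc/fstab"))
```

A file whose size and digest both differ, like the sample line, is the kind of entry worth looking at first.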
P.S. /etc/fstab is a good candidate for a config file that could be misconfigured in a way that causes the boot to hang. You might want to check that the contents of that file look correct.
Thank you so much for all the help. I have solved the problem.
The file upower.service had been corrupted and had a reported size of 305GB. This led systemd to fail to parse the file and fail to boot. This is what likely led to the lines
systemd[1]: Assertion 'line > 0' failed at src/shared/conf-parser.c:165, function parse_line(). Aborting.
systemd[1]: Caught <ABRT>, dumped core as pid 912.
systemd[1]: Freezing execution.
mentioned in the first post.
I am unsure why the file became corrupted but replacing it with a good copy fixed the problem immediately.
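For anyone hitting a similar hang, a quick scan for abnormally large files in the unit directories may find this kind of corruption faster than reading logs. The following is a minimal sketch, not the method actually used in this thread; the function name `oversized_files` and the 1 MiB threshold are assumptions (unit files are normally a few kilobytes at most).

```python
import os


def oversized_files(root, limit_bytes=1024 * 1024):
    """Return (size, path) pairs for files under root larger than limit_bytes."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat reports the size of a symlink itself, not its target.
                size = os.lstat(path).st_size
            except OSError:
                # Skip entries that cannot be inspected.
                continue
            if size > limit_bytes:
                hits.append((size, path))
    # Largest files first.
    return sorted(hits, reverse=True)
```

From a live USB chroot, calling oversized_files("/usr/lib/systemd/system") and oversized_files("/etc/systemd/system") would list any unit file whose reported size is far beyond what a text config file should be. A file reported by the scan can then be restored by reinstalling the package that owns it.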
Thank you again for the help troubleshooting this.