Fedora 34 froze after failed update, now won't boot

tl;dr: a failed dnf update has left me unable to boot. I can chroot in from a Live session, but dnf history undo/rollback fails due to a known bug. I’m tempted to run dnf distro-sync but don’t want to make things worse. How can I get my system back?


I ran a dnf update and came back to find it had failed; shortly afterwards the machine must have frozen. I could not see the details of the failure because they had already scrolled off the terminal. I rebooted, and now on every boot I see a number of failures and cannot get to the login screen. The first of the errors reads:

...
[FAILED] Failed to listen on Load/Save RF Kill Switch Status /dev/rfkill Watch
...

…but there are many more FAILED lines after that; the system looks to be in a lot of trouble.

I have tried single user mode and the emergency shell, but I cannot get into either due to the lack of a root password, which seems to have been discussed here as well: https://discussion.fedoraproject.org/t/howto-cannot-open-access-to-console-the-root-account-is-locked-in-emergency-mode-dracut-emergency-shell/2010/22

However, with a Fedora Live session and chroot (as suggested by vgaetera in https://discussion.fedoraproject.org/t/fedora-33-upgrade-broken-system-after-35-hours/9989/2) I am able to get into the broken system.

dnf history shows me the failure:

[root@localhost-live /]# dnf history 
ID     | Command line                                                    | Date and time    | Action(s)      | Altered 
---------------------------------------------------------------------------------------------------------------------- 
    63 | upgrade -y                                                      | 2021-11-11 20:45 | ?, E, I, U     |   72 ## 

…and the top of the dnf history info output for that transaction reads:

[root@localhost-live /]# sudo dnf history info 63 
Transaction ID : 63 
Begin time     : Thu 11 Nov 2021 08:45:51 PM GMT 
Begin rpmdb    : 2600:c85641ebe1986a1db54b98fac09b77151d13f366 
End time       : Thu 01 Jan 1970 01:00:00 AM BST (-1636663551 seconds) 
End rpmdb      :  ** 
User           :  <USER> 
Return-Code    : Failure: 1 
Releasever     :  
Command Line   : upgrade -y 
...

I attempted to dnf history undo this bad transaction but hit this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2010259. It means dnf cannot undo the transaction, because one of the packages simply reports “REASON CHANGE”, which dnf chokes on.

I then followed part of the DNF System Upgrade page in the Fedora Docs and found that dnf repoquery --duplicates lists a ton of duplicated packages.

What is the best way out of this? I am tempted to run dnf distro-sync but don’t want to break my system any more than it is already.

Rename xorg.conf to yyyymmdd-xorg.conf (or to something else) and Fedora should boot again. This sorted mine.

mv xorg.conf 20211113-xorg.conf

You should probably look at this post: Wine installation get "file conflict with dependencies" error - #6 by vgaetera

That seems to be the direction you’re headed.

Before that, since you have access, I would back up anything you need off the system in case it doesn’t work.

Thanks for the suggestions. Happy to say I recovered the system mostly following the process suggested by @grumpey.

For anyone else in this situation: it took a bit of fiddling to get chroot working ok in the Fedora Live session. First I had to unlock my drive in GNOME Disks; then, once I had worked out which partition was which with fdisk -l, I did this:

sudo mount /dev/mapper/luks-XXXXX /mnt/ -r -t btrfs -o subvol=root
sudo mount /dev/mapper/luks-XXXXX /mnt/home -r -t btrfs -o subvol=home
sudo mount -r /dev/nvme0n1p1 /mnt/boot/
sudo mount -o bind /dev /mnt/dev
sudo mount -o bind /proc /mnt/proc
sudo mount -o bind /sys /mnt/sys
sudo mount -o bind /run /mnt/run
sudo mount -o bind /tmp /mnt/tmp
sudo chroot /mnt

Drop the -r flags if you want write access. I made a btrfs snapshot at this point so I could return to this stage in case I messed things up further.
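A minimal sketch of taking such a snapshot, not necessarily how it was done here. It assumes the top-level btrfs volume (subvolid=5) is mounted separately at a hypothetical mount point /mnt-top, and that the subvolume is called root as in the mount commands above. The function name and the snapshot name are made up for illustration. By default it only prints the command so you can check it first.

```shell
# Hypothetical helper: print (or run) a read-only snapshot of the "root"
# subvolume. Assumes $1 is a mount of the top-level btrfs volume.
snapshot_root() {
    top="$1"                                  # e.g. /mnt-top (assumed mount point)
    stamp="${2:-$(date +%Y%m%d-%H%M%S)}"      # optional label, defaults to a timestamp
    src="$top/root"
    dest="$top/root-pre-repair-$stamp"        # illustrative snapshot name
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run (the default): only show what would be executed.
        echo "btrfs subvolume snapshot -r $src $dest"
    else
        sudo btrfs subvolume snapshot -r "$src" "$dest"
    fi
}
```

With the top level mounted (for example sudo mount /dev/mapper/luks-XXXXX /mnt-top -o subvolid=5, assuming /mnt-top exists), snapshot_root /mnt-top prints the command and DRY_RUN=0 snapshot_root /mnt-top runs it.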

Then I thought I’d try a dnf update first, on the off-chance that something as simple as that would fix it, and it managed to complete ok. It was clearly picking up from the aborted update, and it did find an aborted download, which might be a clue:

Downloading Packages: 
...
(2/26): distribution-gpg-keys-1.59-1.fc34_1.60-1.fc34.noarch.drpm                     299 kB/s |  49 kB     00:00
/usr/share/distribution-gpg-keys/adobe/RPM-GPG-KEY-adobe-linux: read error          ] 507 kB/s |  15 MB     08:23 ETA 
(tried to read 1726 bytes from offset 0) 
cannot reconstruct rpm from disk files 
...
Some packages were not downloaded. Retrying. 
distribution-gpg-keys-1.60-1.fc34.noarch.rpm

In the middle of the update there was also a bunch of these:

/sbin/ldconfig: File /lib64/libhpmud.so.0.0.6 is empty, not checked. 
/sbin/ldconfig: File /lib64/libbind9-9.16.22-RH.so is empty, not checked. 
/sbin/ldconfig: File /lib64/libdns-9.16.22-RH.so is empty, not checked. 
/sbin/ldconfig: File /lib64/libirs-9.16.22-RH.so is empty, not checked. 
/sbin/ldconfig: File /lib64/libisc-9.16.22-RH.so is empty, not checked.

I’m not sure, but it looks like some files were left empty by the earlier aborted update. Anyway, this dnf update finished ok. After this I ran dnf remove --duplicates, which removed a ton of stuff. Rebooted and back in business.
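The empty-file warnings from ldconfig above suggest a follow-up check. Here is a rough sketch, under the assumption that any zero-length shared library is damage from the interrupted transaction. The function names are made up; find and rpm -qf are the only tools it relies on. It defaults to /usr/lib64, since /lib64 is a symlink to it on Fedora.

```shell
# List zero-length shared-library files directly under a directory.
# -H makes find follow the starting path if it is itself a symlink.
list_empty_libs() {
    dir="${1:-/usr/lib64}"
    find -H "$dir" -maxdepth 1 -type f -name '*.so*' -size 0 | sort
}

# Print the packages owning those files, one per line, de-duplicated.
# rpm -qf reports which installed package owns a given file.
owners_of_empty_libs() {
    list_empty_libs "$1" | while IFS= read -r f; do
        rpm -qf "$f"
    done | sort -u
}
```

Run inside the chroot, owners_of_empty_libs gives a list of candidates for dnf reinstall, and rpm -V on a package name afterwards should show whether its files now match the package database. If both print nothing, the later update most likely replaced the empty files already.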

Looking through the logs from when the system froze, it appears that during or just after the broken update a few services stopped, such as fprintd, systemd-hostnamed and systemd-timedated, which sound fundamental. These are the very last log entries before the freeze:

Nov 11 20:44:57 python3[459153]: ansible-command Invoked with warn=False _raw_params=dnf upgrade -y _uses_shell=False stdin_add_newline=True strip_empty_ends=True argv=None chdir=None executable=None creates=None removes=None stdin=None
Nov 11 20:45:18 systemd[1]: fprintd.service: Deactivated successfully.
Nov 11 20:45:18 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=fprintd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 11 20:45:27 systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Nov 11 20:45:27 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-hostnamed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 11 20:45:27 audit: BPF prog-id=196 op=UNLOAD
Nov 11 20:45:27 audit: BPF prog-id=195 op=UNLOAD
Nov 11 20:45:27 systemd[1]: systemd-timedated.service: Deactivated successfully.
Nov 11 20:45:27 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-timedated comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 11 20:45:27 audit: BPF prog-id=199 op=UNLOAD
Nov 11 20:45:27 audit: BPF prog-id=198 op=UNLOAD
Nov 11 20:45:27 audit: BPF prog-id=197 op=UNLOAD
Nov 11 20:46:15 kernel: SELinux:  Converting 857 SID table entries...
Nov 11 20:46:15 kernel: SELinux:  policy capability network_peer_controls=1
Nov 11 20:46:15 kernel: SELinux:  policy capability open_perms=1
Nov 11 20:46:16 dbus-broker-launch[2312]: avc:  op=load_policy lsm=selinux seqno=2 res=1

Looking at DNF’s logs of the broken update, they stop just after the transaction begins:

...
2021-11-11T20:45:47+0000 INFO Running transaction check
2021-11-11T20:45:47+0000 INFO Transaction check succeeded.
2021-11-11T20:45:47+0000 INFO Running transaction test
2021-11-11T20:45:51+0000 INFO Transaction test succeeded.
2021-11-11T20:45:51+0000 DDEBUG timer: transaction test: 4299 ms
2021-11-11T20:45:51+0000 INFO Running transaction
2021-11-11T20:45:53+0000 DDEBUG RPM transaction start.
---END---

Not sure what else I can look at to understand what really happened. Anyway glad I could salvage the system in the end.
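For anyone digging into a similar failure, here is a small sketch for pulling out just the tail of the DNF log from the last point a transaction actually started. It assumes the log is at /var/log/dnf.log and that the start marker is a line ending in "INFO Running transaction", as in the excerpt above. The function name is made up.

```shell
# Print the DNF log from the last "INFO Running transaction" line onward.
# The $ anchor skips the "Running transaction check" and "... test" lines.
last_transaction_log() {
    log="${1:-/var/log/dnf.log}"
    awk '
        / INFO Running transaction$/ { buf = "" }  # restart the buffer at each start
        { buf = buf $0 "\n" }                      # collect every line after it
        END { printf "%s", buf }
    ' "$log"
}
```

From a Live session the same function can be pointed at the mounted system, for example last_transaction_log /mnt/var/log/dnf.log. The journal of the broken system can be read in a similar way with journalctl -D /mnt/var/log/journal plus --since and --until to narrow it to the minutes around the freeze, assuming persistent journalling was enabled.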