Hi there,
I followed the instructions from the Fedora Magazine to upgrade from 29 to 30, which was the first DuckDuckGo link for fedora update 29 30.
sudo dnf upgrade --refresh
sudo systemctl reboot
Then after the reboot:
sudo dnf install dnf-plugin-system-upgrade
sudo dnf system-upgrade download --releasever=30
sudo dnf system-upgrade reboot
At this point, the system rebooted. When it loaded the kernel, the screen went blank. I could not access any other terminal using control+alt+{F1 through F7}. I also could no longer access the machine via SSH.
Previous updates on this system took about 10 minutes. After an hour, I hard-reset the machine. The screen still blanks after loading the default kernel, which the boot menu now labels as Fedora 30. The boot menu lists three other kernels plus a system setup entry:
Fedora (5.1.16-300.fc30.x86_64) 30 (Thirty)
Fedora (5.1.16-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (5.1.11-200.fc29.x86_64) 29 (Twenty Nine)
Fedora (0-rescue-4a865d530b4c42be9d6878c47f7dc5d1) 30 (Thirty)
System setup
Loading the first kernel blanks the screen and the system becomes effectively unresponsive. Loading the second kernel (Fedora 29 on 5.1.16) exhibits the same symptom. Loading the third kernel (Fedora 29 on 5.1.11) blanks the screen and then reboots the machine after about 5 minutes. The fourth (rescue) kernel appears to boot fine; it drops me into emergency/maintenance mode and asks me to log in as root. The “System setup” entry brings me to the system’s UEFI configuration screen. Sometimes, booting one of the bad kernels will reboot the system after a few minutes.
The rescue kernel suggests I could do journalctl -xb to look for trouble. Of course, that’s rather pointless since the trouble isn’t in the rescue kernel. I’m not sure why the rescue kernel suggests users do something pointless.
After a few minutes of searching the wiki, I found this page describing the upgrade process better than Fedora Magazine did. It also included some troubleshooting steps.
Running rpm --rebuilddb did nothing useful.
Running dnf distro-sync is even less useful: it complains that there’s no network. I guess the rescue kernel doesn’t load the network. Why is that even recommended if it’s not going to do anything?
Touching /.autorelabel didn’t do anything either: the system still blanks the screen and fails to boot anything other than the rescue kernel.
I have run fsck and it does not report any errors. I can see my /home directory and the files are there.
I tried inserting a USB drive to copy data off of the machine, but it doesn’t appear to automount. The USB device doesn’t appear under /dev/usb* or /dev/disk/by-path either. This isn’t really a big deal: the data isn’t terribly important, I have a backup, and if push comes to shove I can just migrate the disk to another machine and mount it there. But it is rather annoying, and maybe some good (learning, bug fixes, whatever) will come of triaging this problem.
I also found this page on the wiki which is, honestly, better than the one (garbage page) in the documentation. It has a hell of a lot more useful information.
When I run rpmconf -a, bash complains that rpmconf cannot be found. I assume it’s in a package that I can’t install because the network subsystem isn’t loaded. I can’t tell you how annoying it is to have irrelevant troubleshooting suggestions.
When I run dnf check, I finally (after hours of poking around on the internet for ideas) see some useful information:
amdgpu-dkms-19.10-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-1:2.4.97-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-common-1.0.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-client-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-egl-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
libwayland-amdgpu-server-1.15.0-785425.el7.noarch has missing requires of amdgpu-core
llvm-amdgpu-libs-1:7.1-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-filesystem-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libEGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGL-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libGLES-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libOSMesa-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libgbm-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libglapi-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
mesa-amdgpu-libxatracker-1:18.3.0-785425.el7.noarch has missing requires of amdgpu-core
For context, I have an AMD RX 480 installed in this machine. I assume that’s the problem: the driver didn’t survive the upgrade process. That’s not very nice. I also assume that when the driver fails to load, the kernel just sits there and is angry instead of continuing to boot into a headless environment. But I don’t seem to have a way to inspect and debug what’s going on.
I can’t reinstall the drivers because the driver installer runs DNF to check for stuff and of course fails because the network subsystem isn’t loaded. So I opted to uninstall the drivers: dnf remove $(dnf check | cut -d ' ' -f 1)
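For anyone reading along, here is a minimal sketch of what the cut part of that command does, using two lines from the dnf check output above as hard-coded sample input so that nothing actually gets removed. The helper name first_field is made up for illustration; it is not part of dnf.

```shell
# Two sample lines copied from the dnf check output above.
sample='amdgpu-dkms-19.10-785425.el7.noarch has missing requires of amdgpu-core
libdrm-amdgpu-1:2.4.97-785425.el7.noarch has missing requires of amdgpu-core'

# first_field is a hypothetical helper: it keeps only the text before the
# first space on each line, which here is the full package identifier.
first_field() {
    cut -d ' ' -f 1
}

# Same pipeline shape as in the real command, but fed from the sample text
# instead of a live "dnf check" run.
pkgs=$(printf '%s\n' "$sample" | first_field)

# Print the list that would have been handed to "dnf remove".
printf '%s\n' "$pkgs"
```

So the command substitution expands to one package identifier per problem line, and dnf remove receives all of them as arguments. Doing a dry run like this (or just running dnf check | cut -d ' ' -f 1 on its own first) would have let me eyeball the list before removing anything.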
That command appeared to be successful; after it finished, the Fedora boot logo did its thing again and then I was dropped back into emergency mode asking me to log in as root again. Honestly that was kind of weird, but whatever. The machine did not reboot automatically: I did not see the UEFI screen or the kernel selection screen. So I did systemctl reboot.
However, that did not solve my problem: the screen still blanks after loading any kernel other than the rescue one. I still cannot access any other terminal using control+alt+{F1 through F7}. I still cannot remote into the machine using SSH.
At this point, I’m at my wit’s end and I don’t know what to do further. I could easily be convinced that the system is hosed and that I should reinstall; that certainly seems like the easy way out.
I’m a software developer by trade. I am not afraid of getting my hands dirty with technical details. But I don’t know enough about tools available for use during emergency mode to debug kernel or driver issues. Any ideas would be appreciated.