Black screen after GRUB on AMD desktop with all kernels later than 6.2.15 on F37 and F38; (no nvidia; KVM switch in use)

Hi,

I updated my Fedora 37 desktop and laptop installations. The laptop, with an Nvidia 1050 Ti, is fine: it boots into GNOME and works normally. The desktop, with an AMD 6800 XT, has no display after GRUB. I use LUKS, so I tried typing my password into the black screen; after hitting Enter I could see the HDD light working, but after 5 minutes or so the screen was still black. I didn’t try to log in, I just pressed the reset button. On the next boot I chose 6.2.15-200.fc37 from GRUB and it worked fine. I ran journalctl -b -1 and compared it side by side with journalctl -b, but I couldn’t really see any major differences. If anything, there were more bright yellow and red entries in the log of the kernel that was working…
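A side-by-side comparison like this can be made less noisy by limiting both logs to kernel messages of priority warning or worse and printing only the lines that are unique to the failing boot. This is a minimal sketch, assuming the failing boot is the previous one (-b -1); the helper name only_in_second and the file names are made up for illustration:

```shell
# only_in_second FILE_A FILE_B
# Print the lines of FILE_B that do not occur in FILE_A (exact whole-line match).
only_in_second() {
    grep -Fvx -f "$1" "$2" | sort -u
}

# Hypothetical usage on the affected machine, run from the working kernel:
#   journalctl -k -b 0  -p warning -o cat > good.txt   # current (working) boot
#   journalctl -k -b -1 -p warning -o cat > bad.txt    # previous (black screen) boot
#   only_in_second good.txt bad.txt
```

The -o cat output mode drops timestamps so identical messages compare equal; messages that embed changing numbers (PIDs, addresses) will still show up as differences.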

I could see that the system did boot though; it is as if the system is running but with the screen turned off. When I start the system I see the motherboard vendor / BIOS screen and then the monitor goes off; normally I would see Plymouth asking for my LUKS password. I’ve done some Googling but have only really found similar issues from people on Fedora 38, so nothing helpful so far. Happy to provide logs. System specs are:

inxi -Fz

System:
  Kernel: 6.2.15-200.fc37.x86_64 arch: x86_64 bits: 64 Desktop: GNOME v: 43.5
    Distro: Fedora release 37 (Thirty Seven)
Machine:
  Type: Desktop Mobo: Gigabyte model: B550 AORUS PRO V2 serial: N/A
    UEFI: American Megatrends LLC. v: F15 date: 01/12/2023
CPU:
  Info: 8-core model: AMD Ryzen 7 5800X bits: 64 type: MT MCP cache: L2: 4 MiB
  Speed (MHz): avg: 2500 min/max: 2200/4850 cores: 1: 2200 2: 2200 3: 2200
    4: 2200 5: 3800 6: 2200 7: 2200 8: 2200 9: 2200 10: 3800 11: 2200 12: 2200
    13: 2200 14: 3800 15: 2200 16: 2200
Graphics:
  Device-1: AMD Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] driver: amdgpu
    v: kernel
  Display: server: X.Org v: 1.20.14 with: Xwayland v: 22.1.9 driver: X:
    loaded: amdgpu unloaded: fbdev,modesetting,radeon,vesa dri: radeonsi
    gpu: amdgpu resolution: 2560x1440~165Hz
  API: OpenGL v: 4.6 Mesa 23.0.3 renderer: AMD Radeon RX 6800 XT (navi21
    LLVM 15.0.7 DRM 3.49 6.2.15-200.fc37.x86_64)
Audio:
  Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel
  Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
  API: ALSA v: k6.2.15-200.fc37.x86_64 status: kernel-api
Network:
  Device-1: Realtek RTL8125 2.5GbE driver: r8169
  IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: wg-mullvad state: unknown speed: N/A duplex: N/A mac: N/A
RAID:
  Device-1: md127 type: mdraid level: mirror status: active size: 3.64 TiB
    report: 2/2 UU
  Components: Online: 0: sdc1 1: sdb1
Drives:
  Local Storage: total: 10.01 TiB used: 3.93 TiB (39.3%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 1TB
    size: 931.51 GiB
  ID-2: /dev/sda vendor: Crucial model: CT2000BX500SSD1 size: 1.82 TiB
  ID-3: /dev/sdb vendor: Western Digital model: WD40EFZX-68AWUN0
    size: 3.64 TiB
  ID-4: /dev/sdc vendor: Western Digital model: WD40EFZX-68AWUN0
    size: 3.64 TiB
Partition:
  ID-1: / size: 97.86 GiB used: 31.17 GiB (31.9%) fs: ext4 dev: /dev/dm-0
  ID-2: /boot size: 973.4 MiB used: 550.8 MiB (56.6%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-3: /boot/efi size: 511 MiB used: 17.4 MiB (3.4%) fs: vfat
    dev: /dev/nvme0n1p1
  ID-4: /home size: 815.89 GiB used: 372.28 GiB (45.6%) fs: ext4
    dev: /dev/dm-1
Swap:
  ID-1: swap-1 type: zram size: 8 GiB used: 5.34 GiB (66.8%) dev: /dev/zram0
Sensors:
  System Temperatures: cpu: 49.2 C mobo: 35.0 C gpu: amdgpu temp: 43.0 C
  Fan Speeds (RPM): fan-1: 0 fan-2: 0 fan-3: 846 gpu: amdgpu fan: 633
Info:
  Processes: 567 Uptime: 2d 2h 4m Memory: available: 31.23 GiB
  used: 7.97 GiB (25.5%) Shell: Bash inxi: 3.3.27

I am just booting 6.2.15-200.fc37.x86_64 for now because it works. The last kernel issue I had was a long time ago, back in the Fedora 35 days (“kernel 5.16.* removes monitor resolutions automatically detected in kernel-5.15.*”), and F36 solved it. Does anyone have any suggestions on what to troubleshoot?

Thanks,

bc.

This could be linked to

and the related bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=2212012

Be aware that the topic is currently blurred between the issue with nvidia and the issue without nvidia. Initially, it was not clear that there are (at least) two issues. Focus on the points without nvidia (the nvidia issue is solved anyway).

I am aware that you have amd graphics, while the other user has Intel graphics, but it could still be linked.

You could also try what I suggested in this post (booting without the rhgb quiet options to get some more data) and let us know: Fedora hangs on boot after upgrading to kernel 6.3.4 - #35 by py0xc3


Does the failed boot with 6.3.5 show up in the journal? Can you provide that?

Thanks for the reply. Yeah I did read that post and the bug report and dismissed it since I am on F37, using AMD, and I don’t see an underscore just a black screen. But yeah it could be related like you say. Funny that you say there is an Nvidia issue as my Nvidia based laptop works fine with those kernels.

Yes it does seem to boot but without any display. I’ve pasted that into https://pastebin.com/2mhQ6BsY which is private and will expire in 1 month. The end of the log is when I pressed the reset button.

The kernel is the same. So it can still apply.

It does not necessarily apply to all nvidia cards. Also, there might be differences in configurations and especially driver versions. If your nvidia works fine without the noted issue, ignore the nvidia issue stuff :wink:

Indeed, the logs you provided are from a 6.3.5 kernel. In this case, it could be a different issue. Alternatively, F37 may behave differently from F38 with the same issue (or some other software/hardware differs).

I currently have no time to tackle another issue, but when skimming your log, I see drm issues, which were also the origin of the nvidia problem.

With this in mind, I suggest testing kernel 6.3.6, just to see whether it also solves your issue before putting more work into this than necessary. Please read my explanation in the other thread carefully: this one. However, the bodhi link I provided in the other thread does not apply to you. The page for you is: https://bodhi.fedoraproject.org/updates/FEDORA-2023-70b0935c41 (since you have F37, and the other user has F38). The dnf command I referred to in my explanation should appear on that page in 1-2 days.

If that does not solve the issue, please file a bug report, which I also explained in the other topic (here). Please read carefully what information you need to provide (and add the journal log!). Once you have opened a bug report, please post the link to it here.

Ah that makes sense.

Yeah I saw them too but honestly I don’t know what that means so I didn’t mention it.

Thanks for that. I had a read, went to https://bodhi.fedoraproject.org/updates/FEDORA-2023-70b0935c41, installed kernel 6.3.6 like you suggested and tried booting it, but I had the same issue. The first boot is the motherboard vendor screen followed by black; subsequent boots are the motherboard vendor screen, followed by GRUB, followed by a black screen after choosing which kernel to boot. This time I was able to enter my LUKS password, give it a minute, switch to a text console with Ctrl+Alt+F3, log in as root and then reboot. So again it seems like everything is running except the video output.

Thanks for the advice. I’ve created bug 2213179 (“6.2.15-200.fc37 is the last kernel that boots with a display. All newer kernels have a black screen. AMD GPU.”) and included a link back to here. The logs from kernel 6.3.6 are at Bugzilla.

Thank you for submitting the bug report. I am having the same issue on my F37 desktop with open source AMD drivers (RX 470).

@blindcant what is your output of cat /proc/sys/kernel/tainted? It seems the bug report template no longer asks by default for data that contains this information.

cat /proc/sys/kernel/tainted
512
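For context, this number is a bitmask: each set bit stands for one taint reason, and 512 is 2^9, which the kernel's tainted-kernels documentation lists as flag W (“kernel issued warning”). Below is a small sketch that splits a value into its set bit numbers; the function name taint_bits is made up here, and the meaning of each bit still has to be looked up in that documentation:

```shell
# taint_bits VALUE
# Print the numbers of all bits set in VALUE, space-separated, lowest first.
taint_bits() {
    value=$1
    bit=0
    out=""
    while [ "$value" -gt 0 ]; do
        if [ $((value % 2)) -eq 1 ]; then
            # Add a separating space only if something was collected already.
            out="$out${out:+ }$bit"
        fi
        value=$((value / 2))
        bit=$((bit + 1))
    done
    echo "$out"
}

taint_bits 512    # prints: 9
# On a live system: taint_bits "$(cat /proc/sys/kernel/tainted)"
```

A value of 0 prints an empty line, meaning the kernel is not tainted.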

On which kernels do you get this result? Please check both 6.2.15 and 6.3.6 (since you can still log in on a terminal with 6.3.6, right?). Is this a permanent state at each boot? If so, add a short note to the bug report that cat /proc/sys/kernel/tainted remains permanently at 512 (and on which kernels it does that).

It would be indicative if such a value also exists on 6.2.15.


Supplement: If at least 6.3.6 returns 512, I assume the origin is:

Jun 07 20:00:02 desktop kernel: ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20221020/utaddress-204)
Jun 07 20:00:02 desktop kernel: ACPI: OSL: Resource conflict; ACPI support missing from driver?

If 6.3.6 returns 512, feel free to add these lines as well to the bug report together with the short note mentioned above, just to provide some overview of the existing kernel warning.

Of course it would be interesting to know if that also exists on 6.2.15 (just like its tainted condition)


@bluepixels feel free to check on your machine if things are comparable.

This was on kernel 6.2.15-200.fc37.x86_64.

I can log in, but I have no screen output at any time, so I can’t see what I am doing. I guess I can just redirect the command’s output to a file, reboot into the kernel that works, and check the file.

Sorry I am not sure what that means. How would I know if this is permanently tainted?

I will run this command a bit later today when I get some time and I will report back. Thanks again for your time and help so far.

EDIT: I ran the command on 6.3.6 and redirected the output to a file; it returned 0.

I misunderstood you. I thought you could still use the terminal. However, we have the result :wink:

It seems your 6.2.15 kernel also has some issue, even if it seems to work. It would be interesting to know what this is, and if there could be a link (although I assume the answer is no if your 6.3.6 returns 0).

It means: is the output always the same, even across boots? So, reboot, and check again. If the output of the 6.2.15 kernel remains 512, it would be good to also have the output of journalctl -k while you run 6.2.15 (to get the 6.2.15 kernel logs).

So, just reboot, select 6.2.15, run cat /proc/sys/kernel/tainted, and if its value is still 512, then immediately get the output of journalctl -k. Since 6.3.6 returns 0, I think this is something different. But just to be sure, I would take a quick look at it. Then we can decide if it makes sense to add this to the bug report.

A supplement to my last post: We just found out that the kernel bug report template seems to be broken. I assume it did not ask you for the dmesg.

Could you get the dmesg output from 6.3.6? You can again redirect it to a file while 6.3.6 is running.

Just add a link/file to the bug report (please do not paste the content as a comment).
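Since there is no display on the affected kernel, the capture has to be typed blind or scripted. This is a minimal sketch, assuming a root shell on the text console; the function name save_diag, the output directory, and the file names are placeholders, and the taint and dmesg steps are skipped quietly if the source is unavailable:

```shell
# save_diag DIR
# Store the running kernel release, taint value, and kernel ring buffer in DIR.
save_diag() {
    dir=$1
    mkdir -p "$dir" || return 1
    rel=$(uname -r)
    echo "$rel" > "$dir/kernel-release.txt"
    # Taint bitmask of the running kernel (0 means not tainted).
    if [ -r /proc/sys/kernel/tainted ]; then
        cat /proc/sys/kernel/tainted > "$dir/tainted-$rel.txt"
    fi
    # Kernel ring buffer; reading it may require root. Drop the file on failure.
    dmesg > "$dir/dmesg-$rel.txt" 2>/dev/null || rm -f "$dir/dmesg-$rel.txt"
    return 0
}

# Hypothetical usage: save_diag /root/diag
```

Because the file names include the kernel release, results from 6.2.15 and 6.3.6 end up next to each other and can be attached to the bug report after rebooting into the working kernel.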

I just ran cat /proc/sys/kernel/tainted on 6.2.15 and it returned 0 this time. This is a different boot instance to the last time I ran the command since I booted into 6.3.6 to get the last taint report.

Yeah it didn’t ask me for dmesg, I will reboot into 6.3.6 and get the output of dmesg and report back with the attachment.

EDIT: The taint for 6.3.6 is still 0 and I will attach the output of dmesg into the bug report. The taint for 6.2.15 for this boot is 0 as well.

EDIT 2: @py0xc3 Something I forgot to mention is I have a KVM switch for Display Port 1.2 standard. All of these tests so far have been via the KVM switch. I can run this again without the KVM switch if necessary.

If this was a one-time issue, then let’s skip it for now. If it does not cause issues and is not a permanent phenomenon, it was likely a single occurrence that made the kernel trigger a warning once. This can sometimes be caused just by plugging in something that the kernel does not like (and where it becomes active to mitigate some behavior or so).

Good to have the dmesg there. Thanks. I hope the kernel bug report template is back in place soon; I expect it will be fixed shortly.

Well, behavior/issues in the kernel can have many repercussions inside and outside the kernel. This is one reason why finding bugs in an operating system kernel is such a tricky task. I have seen freezes that occurred when the screen was turned on, and the origin was the WiFi driver :wink:

I think it is unlikely that this is the origin, but since this is not a widespread combination and you are currently the only user experiencing this issue on F37, I think it is worth a try. You have to know that screens today actively interact with the kernel and are no longer just “passive receivers” of data. So things related to data transfers over HDMI/DP can indeed be related to such issues.

So, remove the KVM switch while the machine is off. Then boot 6.3.6 once and see if it makes a difference.

There is a solution for a comparable bug report:

Can you boot once with 6.3.6 and with the additional parameter module_blacklist=ucsi_acpi ?

E.g., in the grub menu you can go to the 6.3.6 entry and press “e”. Then there is one line that begins with “options”. Go to the end of this line and add module_blacklist=ucsi_acpi there (with a space before it, of course). Then press Ctrl+X to boot with this parameter.

Let us know if that makes a difference or not.
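If editing the entry blind at the grub menu is awkward, the same test could be set up from the working kernel with grubby, so the parameter is already in place at the next boot of 6.3.6. This is a sketch under the assumption that grubby is installed; KERNEL is a placeholder for the path of the 6.3.6 kernel image under /boot, and the small helper add_karg (a made-up name) only previews the resulting command line:

```shell
# add_karg "CMDLINE" PARAM
# Print CMDLINE with PARAM appended, unless PARAM is already one of its words.
add_karg() {
    case " $1 " in
        *" $2 "*) echo "$1" ;;
        *)        echo "$1 $2" ;;
    esac
}

add_karg "ro rhgb quiet" "module_blacklist=ucsi_acpi"
# prints: ro rhgb quiet module_blacklist=ucsi_acpi

# Hypothetical persistent variant (KERNEL is a placeholder path, not from this thread):
#   sudo grubby --update-kernel="$KERNEL" --args="module_blacklist=ucsi_acpi"
#   sudo grubby --update-kernel="$KERNEL" --remove-args="module_blacklist=ucsi_acpi"   # undo
```

Unlike the one-off edit at the grub menu, the grubby change stays until it is removed again, so the undo line matters once the test is done.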

That line actually is the one that begins with linux.

No. It begins with options. For example:
options root=UUID=<UUID> ro rootflags=subvol=root rd.luks.uuid=luks-<UUID> rhgb quiet amd_pstate=active module_blacklist=ucsi_acpi
That’s from a system that normally blacklists a network driver, which I changed to ucsi_acpi.

Whatever text the users have before the blacklisting, they only need to
append the blacklisting parameter to the end of the line. Everything
else shall remain as it is in their file (the above is only an example).

On my fedora workstation system (F38 with kernel 6.3.6 and using grub to boot) I see this. I have never in all my time with using fedora ever seen a line that begins with “options”

This looks like this installation has not been reinstalled from scratch
for a long time, or has been strongly customized. I haven’t seen something
like that for a long time. That could have been the default when all
entries were consolidated in one grub file, but I think that was changed
several years ago already (I don’t remember exactly what it looked like
back then).

Feel free to check a new installation in a VM or so. See also:
https://fedoraproject.org/wiki/Changes/BootLoaderSpecByDefault

A modern Fedora entry file looks like this:

title Fedora Linux (6.3.6-200.fc38.x86_64) 38 (KDE Plasma)
version 6.3.6-200.fc38.x86_64
linux /vmlinuz-6.3.6-200.fc38.x86_64
initrd /initramfs-6.3.6-200.fc38.x86_64.img
options root=UUID=<UUID> ro rootflags=subvol=root rhgb quiet
grub_users $grub_users
grub_arg --unrestricted
grub_class fedora

This is default as set from Fedora (more precisely, Anaconda) today in
one of my installations; I just replaced the UUID. And this is not KDE
specific.

If your installation is very old, it might be relevant to note that this
is the whole content of one entry file as stored in /boot/loader/entries
→ it has not always been that way, but I think this default was set
several years ago already (I don’t remember exactly when).

You can check the config files for grub in /etc/ (from which it creates
new entry files) in current installations, which by default add
parameters through “options” (the wiki page above also partly elaborates
on that).

Another explanation for your case could be that you used the network
installer or another installer that ships with blank/unaligned
configs (the network installer does not come with the aligned
configuration as determined by Fedora’s WGs/SIGs). I don’t know what
these configs look like when installed that way.

Once again I have to disagree. I have a VM that was installed new about a week ago with fedora 38 using the default first release ISO for F38, and the entry from pressing “e” at the grub menu shows this.

Unless the entries for KDE are significantly different from those for Workstation, our grub menus show radically different content on the screen during boot.

It may be you are giving content from a different file or location, but the grub menu clearly does not show an “Options” line for a standard clean install on Workstation.

I also only have the default /etc/default/grub and the default entries in /etc/grub.d

However, within the file located at /boot/loader/entries/UUID-6.3.6-200.fc38.x86_64.conf I do see the content as you show. That file is not accessible for edit from the grub menu during boot, and that content seems created when running grub2-mkconfig to recreate the grub.cfg file.

The “linux” line from the grub menu seems a combination of the “linux” line and the “options” line in the matching .conf file under /boot/loader/entries.
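That combination can be reproduced offline from an entry file. This is a rough sketch, assuming the simple “key value” layout of the entry file shown earlier in the thread; the function name bls_linux_line is made up, and what GRUB actually does when it loads these entries is more involved (for example, it expands variables):

```shell
# bls_linux_line FILE
# Print "linux <kernel path> <options>" built from a BLS entry file.
bls_linux_line() {
    awk '
        $1 == "linux"   { kernel = $2 }
        $1 == "options" { sub(/^options[ \t]+/, ""); opts = $0 }
        END             { print "linux " kernel " " opts }
    ' "$1"
}

# Hypothetical usage (reading /boot/loader/entries usually needs root):
#   bls_linux_line /boot/loader/entries/UUID-6.3.6-200.fc38.x86_64.conf
```

This would explain why both descriptions are right: the parameter is stored on the “options” line of the .conf file, but at the grub menu it is edited at the end of the “linux” line.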