Fedora 38 and Nvidia/Nouveau problems (kernel crashes, second fake monitor and more)

Hello everyone,

I recently had a very unusual experience with Fedora and Nvidia. Here’s what happened:

I freshly installed* Fedora 38 completely normally, without anything special. The only thing I did was use this tutorial to sign the Nvidia drivers so that I can use my system with Secure Boot. So far, so good. But over the last few weeks *(I reinstalled it several times during those weeks), I have had problems with the stability of my system. GNOME with Wayland works more or less so far. Here’s a list of what has happened:

GNOME

  • [Wayland + Xorg] with Nouveau / Nvidia drivers installed, I have a fake, non-existent second monitor in my system settings; from what I found it has something to do with “xrandr” (because that is what it calls itself when I open the monitor color profile options), which I don’t even have installed on my system

  • [Xorg] trying to change the screen refresh rate crashes the kernel and gnome-shell completely; the kernel crash message tells me that my kernel is “broken” and shows the taint flags P (proprietary module), W (taint on warning) and O (out-of-tree module), which refer to the Nvidia modules installed (see inserted screenshot of abrt - this was on GNOME and KDE, you’re seeing here the latest one from KDE)

  • [Xorg] I also temporarily had the problem that starting an X11 session didn’t work. The screen went black and I saw the login screen again. It happened at the same time that my Nvidia drivers stopped loading at all, which led to the device security tab in gnome-settings showing that my kernel was verifiable again. But the issue disappeared on its own after a restart for some reason. abrt also created a crash log for xorg-x11-drv-nouveau.
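
For anyone who wants to check the taint state directly rather than reading it from the abrt report, here is a minimal sketch. It assumes the standard kernel taint bit values (bit 0 = P, bit 9 = W, bit 12 = O) and decodes only those three flags; the function name decode_taint is made up for illustration.

```shell
# Decode a kernel taint bitmask into the letters seen in crash reports.
# Only the three flags mentioned in this thread are handled; other bits
# are ignored. decode_taint is a hypothetical helper name.
decode_taint() {
    value=$1
    out=""
    [ $(( $value & 1 )) -ne 0 ]    && out="${out}P "   # bit 0: proprietary module loaded
    [ $(( $value & 512 )) -ne 0 ]  && out="${out}W "   # bit 9: kernel issued a warning
    [ $(( $value & 4096 )) -ne 0 ] && out="${out}O "   # bit 12: out-of-tree module loaded
    echo "${out% }"
}

# Usage on a live system:
#   decode_taint "$(cat /proc/sys/kernel/tainted)"
```

A P or O flag on its own is expected whenever the proprietary Nvidia module is loaded; it is the W flag that indicates the kernel actually logged a warning.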

KDE (I switched to KDE yesterday via a fresh install to see if it helps - I regret it, it was worse)

  • After the first restart following the installation I saw anaconda again for a second, with the button “FINISH INSTALLATION”. But the screen went black and… nothing happened. I had to force my PC to shut down completely to reboot the system. After I logged in for the first time I found several Nouveau crash logs in abrt (and maybe Kernel ones - I may have deleted them accidentally) which were created every second (which explains why nothing happened after that). The same thing happened after I used Discover to update my system with the “automatic reboot” option at the bottom. Result: freezing screen, and I had to force a reboot again just to install the updates again. And again: Nouveau and Kernel crash logs.

  • [Xorg] After I installed the Nvidia drivers again I tried to open an X11 session. The Plasma shell stuttered heavily and I was able to watch abrt create one Kernel crash log after another (see screenshot above) with the same “Kernel is broken, PWO” messages as in GNOME (see above). It happened literally every second until it stopped after several minutes. abrt was flooded with over 100 Kernel crash logs.

  • [Wayland] I believe that Wayland also had some Kernel crashes the first few times. Now it is just one Kernel crash after I log in to my account.

  • [Wayland + Xorg] The problem with the second non-existing monitor (see above) exists also here. I just don’t know if it has also to do with “xrandr” or not.

And now I’m sitting here, writing this forum post, and have no idea what I should do now. I really love Fedora and wish to continue using it. But I must say that this was the worst experience I have had so far with Fedora. And I have already had cases where I couldn’t even boot and was just hit by a very bright green screen, for example. I appreciate any suggestion I can get. Thanks in advance.

PS: I’ve also a full dmesg log if needed. I just believe that this post is already long enough with infos.

With kind regards
Graphizs

I will say my rig has been reporting very similar problems. I have a post here also where the Nvidia driver is preventing the laptop from waking from sleep but I will often get the crash you reported also.


I encountered issues such as being stuck on a black screen, being unable to boot, or other situations where I had no control over the system with Nvidia. After reinstalling the OS multiple times because of issue after issue, I decided to give Kubuntu a try for the time being; so far it has been painless.

It’s possible that the current version of Fedora 38 KDE may not be compatible with the latest Nvidia drivers, but I’m not sure.

I feel your pain, graphizs.


One issue that many see is lack of patience before they reboot.
Rebooting before the nvidia kernel modules have been completely compiled by akmods and installed can cause corruption and failure to properly load the drivers.

One easy fix is to (from the command line) do the following.

  1. sudo dnf remove 'kmod-nvidia-*' which will remove all the currently built modules (for all kernels).
  2. sudo akmods --force which will rebuild and reinstall the modules for all the currently available kernels.
  3. Wait until the prompt returns then wait another minute or so for certainty that all has been completed.
  4. reboot
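
Step 3 above relies on waiting long enough. As a rough sketch, one can instead ask modinfo whether an nvidia module is actually available for a given kernel before rebooting. The function name nvidia_module_ready is made up for illustration; modinfo with -k and -F is the only real tool used.

```shell
# Succeeds only if modinfo can find an nvidia module for the given kernel.
# Defaults to the running kernel when no version is passed.
# nvidia_module_ready is a hypothetical helper name.
nvidia_module_ready() {
    kver=${1:-$(uname -r)}
    modinfo -k "$kver" -F version nvidia >/dev/null 2>&1
}

# Usage: poll every 10 seconds, then report once the module is there.
#   until nvidia_module_ready; do sleep 10; done; echo "nvidia module found"
```

This only confirms that a module file is installed for that kernel; it does not prove the module loads or is correctly signed for Secure Boot.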

If you search this forum you will find many posts about similar problems and the fix for almost all has been either a removal and full new install of the drivers or rebuilding the modules as shown above.

Just as an aside, related to signing the kernel modules on fedora.
The instructions at rpmfusion are much more compact and to the point than the linked tutorial.
Even better is to do dnf install akmods then follow in detail the instructions in the file /usr/share/doc/akmods/README.secureboot before installing the packages from rpmfusion.

The simplest and most complete way to install the drivers from rpmfusion is to enable the rpmfusion repo then dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda which will pull in as dependencies all those extra needed packages that are verbosely listed and not entirely correct in the tutorial linked.


Thanks for the suggestion Jeff, but I can say that I run sudo akmods --force every time I install the Nvidia drivers. It’s part of the mentioned tutorial which I use to install the Nvidia drivers together with Secure Boot. This also includes the part where I import the certs via mokutil (which is mentioned in the README.secureboot document).

But still, thanks for the suggestion.

Note that the tutorial is based on the poster’s usage with F36 and is not up to date with the way things are installed today. It also seems intended for a first-time install and not for every upgrade.

The only reason for doing the akmods --force is to rebuild drivers after their removal or a failed upgrade. Initial installs or updates of akmod-nvidia or kernels will automatically use akmods to build the drivers.

I agree. The tutorial was originally for Fedora 36. Comparing it with the current state, I must say that it shouldn’t create any problems in my opinion. The only difference I’ve noticed is that some of the packages you should install via sudo dnf install (e.g. gcc, kernel-headers, etc.) are already installed, so dnf skips them. So… there isn’t any real difference in my opinion compared with your suggestion, with the exception of akmods --force. I’ll keep that in mind.

I have an RTX3060 and it would not work with F38 KDE 1 week ago.

I tried both the Nouveau drivers and the rpmfusion nvidia drivers, each under wayland and X11.

But this weekend something changed - at a minimum, there is a new kernel.

I now have F38 KDE with rpmfusion nvidia drivers working under X11.
Under wayland it causes kernel errors and crashes plasma shell. This is no change from F37.

I have these versions on my working system:

$ rpm -q kernel akmod-nvidia 
kernel-6.2.13-300.fc38.x86_64
akmod-nvidia-530.41.03-1.fc38.x86_64

This may be a wayland or plasma bug or it may be a driver issue.
Please post the output of cat /proc/cmdline and cat /etc/default/grub


This is from the working system. 
I also have F38 KDE on a USB-SSD that I use to test H/W compat before upgrading main system.

$ cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt2)/vmlinuz-6.2.13-300.fc38.x86_64 root=UUID=f160dd82-834b-4cfa-8ee7-9c159b2a1b7b ro rootflags=subvol=root rd.luks.uuid=luks-904db66b-db23-4719-bbf6-fb596c23d831 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init

I recall I added the blacklist=nouveau options to allow me to debug building and loading the rpmfusion nvidia driver, as the fallback loading of nouveau gets in the way.

$ cat /etc/default/grub
GRUB_TIMEOUT=60
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-904db66b-db23-4719-bbf6-fb596c23d831 rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 initcall_blacklist=simpledrm_platform_driver_init"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true
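
For comparing another system against the command line above, a small sketch like the following may help. It checks a command-line string for the three driver-related parameters shown in this post; has_param and report_params are made-up helper names, and the parameter list is simply copied from the output above rather than being an authoritative requirement.

```shell
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word in CMDLINE.
has_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# report_params CMDLINE: print one status line per parameter seen in this thread.
report_params() {
    for p in rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1; do
        if has_param "$1" "$p"; then
            echo "present: $p"
        else
            echo "missing: $p"
        fi
    done
}

# Usage on a live system:
#   report_params "$(cat /proc/cmdline)"
```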

Do you want kernel stack traces when in wayland mode?

Those look correct AFAICT.
It would appear the nvidia drivers are properly configured and seem to load properly.
I cannot assist further in this since it seems beyond my expertise with the nvidia drivers.

I do not use wayland nor plasma so I have no experience there.

Interesting… I have an RTX3080 and it was the complete opposite in terms of Wayland/X11 in KDE. I’m currently back on Workstation (GNOME) and must say that I haven’t had any real problems so far (besides the second ‘xrandr’ monitor bug, which I can’t remove, and the fact that I can’t change my screen refresh rate on X11 without crashing the kernel and gnome-shell at the same time).

And the weirdest thing is that, when I used KDE for a short moment, I had constant kernel crashes on X11, not Wayland. So we have/had kind of the opposite experience regarding this part.

So… since there weren’t any further answers here, I guess it’s a good moment to report on my recent attempt to use KDE Plasma[1] again. Here’s what happened:

  • Starting a Plasma session with Wayland results in constant kernel crashes, which also make the system lag badly. In the 10 minutes I tried to use Plasma I got around 90 kernel crash logs in abrt. All of them are connected to GPU driver problems, because every time it’s the same error:

crash_function: drm_gem_vmap

WARNING: CPU: 8 PID: 55388 at drivers/gpu/drm/drm_gem_shmem_helper.c:304 drm_gem_shmem_vmap+0x1a4/0x1d0

  • The active Wayland session is still ‘usable’. But whenever I move a window, open a program, close a window, open the start menu, try to log out, etc., the kernel starts crashing again for several seconds and produces logs in abrt.

  • Starting a Plasma session with Xorg worked the first time, funnily enough. It had just a few crashes (one kernel crash included). HOWEVER: when I tried to start Xorg for the second time, Plasma took very long to load the session until a little window popped up in the top left corner, with a UI design like the first version of macOS (black/white with a font kind of like Times New Roman), telling me that it wasn’t able to start a Plasma session. And then I got back to the login screen.
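
To see whether all of these reports really come from the same place without clicking through abrt, the kernel log can be summarised by warning location. This is a minimal sketch that assumes the WARNING lines have the same layout as the one quoted above; count_warnings is a made-up helper name.

```shell
# Read kernel log text on stdin and print "COUNT SOURCE:LINE" for every
# "WARNING: CPU: ..." entry, most frequent first.
# The source location is taken as the word following "at".
count_warnings() {
    awk '/WARNING: CPU:/ {
             for (i = 1; i < NF; i++)
                 if ($i == "at") { n[$(i + 1)]++; break }
         }
         END { for (k in n) print n[k], k }' | sort -rn
}

# Usage on a live system (kernel messages from the current boot):
#   journalctl -k -b --no-pager | count_warnings
```

If a single location such as drm_gem_shmem_helper.c:304 accounts for nearly every entry, that is worth including in a bug report together with one full stack trace.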

I’m wondering at the end if there’s really nothing I could try at least to solve the problem? Or do I really have to hope that Nvidia or the kernel team (or probably both) are able to fix it one day? This issue is really just annoying…

Any suggestion is appreciated!

PS: The issue with the non-existent second monitor via xrandr still exists. After installing xrandr I only got my real primary screen in the list of active monitors. I’ve read that xrandr is capable of creating virtual monitors. But I never tried that and still have this weird second virtual monitor in my settings, with no way to remove it.
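
One way to look at this independently of xrandr and the desktop settings is to list the connectors the kernel itself exposes under /sys/class/drm. The sketch below does that; list_connectors is a made-up helper name, and whether the phantom monitor shows up there at all is an open question, not something this confirms.

```shell
# Print "CONNECTOR: STATUS" for every DRM connector the kernel exposes.
# The base directory can be overridden, mainly for testing.
# list_connectors is a hypothetical helper name.
list_connectors() {
    base=${1:-/sys/class/drm}
    for status in "$base"/card*-*/status; do
        [ -r "$status" ] || continue   # skip if the glob matched nothing
        conn=$(basename "$(dirname "$status")")
        echo "$conn: $(cat "$status")"
    done
}

# Usage on a live system:
#   list_connectors
```

A connector reported as connected that does not correspond to a physical cable would at least narrow down which driver is presenting the extra output.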


  1. GNOME currently runs well without any issues. I’m someone who wants to be able to use GNOME and KDE Plasma whenever I want because I really like both of them.

If you are trying to use both gnome and plasma on the same system it is my understanding that does not go well for most. It seems there are some interactions between them that cause problems for both at times, regardless of which one is active.

There are threads noting this recently here, and the general consensus seems to have been that a user should only use one or the other (clean and without having both installed) to avoid the issues.

Apparently there are certain parts that both use and may have somewhat differing content such that one breaks the other. I think I recall reports that if one is using workstation and also installs the kde desktop then it is impossible to remove the kde desktop later due to dependencies. Vice versa with the kde spin and also installing the gnome desktop there.

It seems that neither desktop behaves properly when both are installed.

While I understand and agree with this general point, I can say that it also happened with a completely fresh install of the KDE spin (because I wanted to see whether that runs better or not - it didn’t) without doing anything special. Besides that, I already had Fedora with GNOME and KDE installed side by side in the past and it worked without any problems (apart from some minor theme problems, but I wouldn’t count them here). And I doubt that GNOME and KDE interfere with each other so deeply that the kernel crashes because of a ‘tainted kernel’ and points to a Kernel part about the GPU with drm_gem_shmem_helper.

So I still believe it’s some deep driver issue with Nvidia.


I have done this and never seen one affect the other.

Maybe what you are referring to is that running a KDE app under Gnome or a Gnome app under KDE can be difficult. I think that is because they want to talk to DBus services that are only available on Gnome or Plasma.


I’m running KDE Plasma Wayland on F36 with kernel 6.1.18 right now (on an HP 470 G8 with nvidia), and everything is working beautifully. However, all updates since 6.1.18, meaning all the 6.2.xx kernels, have problems with KDE Plasma; none has worked since.

The latest try was kernel 6.2.15, and both Wayland and X11 have some problems.

I was able to boot to the login screen successfully. However, that is where the problems start.

If I log in in Wayland mode, it seems to just deadlock on a blank screen, with no response to any keyboard input at all… (I can’t even switch to a console with e.g. ctrl-alt-F3…) The only thing it responds to is the power switch; it can still shut down with that…

If I log in in X11 mode, it enters the desktop successfully. However, the problem comes when logging out instead! It just crashes! The following text was seen:

[FAILED] Failed to start abrtd.service - ABRT Automated Bug Reporting Tool. 
[DEPEND] Dependency failed for abrt-vmcore.service - Harvest vmcores for ABRT. 
[DEPEND] Dependency failed for abrt-xorg.service - ABRT Xorg log watcher. 
[DEPEND] Dependency failed for abrt-journal-core.service - Creates ABRT problems from coredumpctl messages.
[DEPEND] Dependency failed for abrt-oops.service - ABRT kernel log watcher.

And then not even the power switch can shut it down. A hard power-off is the only thing possible.

So right now I doubt F38 is likely to work with my rig at all… I don’t think I’d dare to try any time soon…

I’m a hardware guy using Fedora for some of my hardware work, so I’m not deeply software savvy here. Just trying to share my observations, hoping somebody sees this and can help solve the problems…

I’ve been using Fedora since f12 or 13 and still love it. The people behind it have done great work, and must be truly passionate developers. Kudos & appreciation as always. I hope it’ll stay as good as it has always been.

Do you need the Nvidia graphics? Simple workaround would be to get Fedora 38 running without the card, then if graphics is too slow, investigate support for the Nvidia card.

Does it have NVIDIA GeForce MX450 or NVIDIA GeForce MX330? Which drivers were you using (direct from Nvidia, rpmfusion nonfree, nouveau)?

It’s got the MX450, but the Info Center shows Graphics Processor: Mesa Intel® Xe Graphics… Not sure Nvidia is running… Here is what I got with more CLI:

lspci -n -n -k | grep -A 2 -e VGA -e 3D
0000:00:02.0 VGA compatible controller [0300]: Intel Corporation TigerLake-LP GT2 [Iris Xe Graphics] [8086:9a49] (rev 01)
        DeviceName: Onboard - Video
        Subsystem: Hewlett-Packard Company Device [103c:883d]
--
0000:01:00.0 3D controller [0302]: NVIDIA Corporation TU117M [GeForce MX450] [10de:1f97] (rev a1)
        Subsystem: Hewlett-Packard Company Device [103c:883d]
        Kernel driver in use: nouveau

glxinfo | grep -e OpenGL.vendor -e OpenGL.renderer
OpenGL vendor string: Intel
OpenGL renderer string: Mesa Intel(R) Xe Graphics (TGL GT2)

switcherooctl list outputs nothing, and I don’t know why…
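
To keep the GPU-to-driver mapping readable across kernel updates, the lspci output above can be reduced to one line per GPU. This is a minimal sketch that assumes the lspci -k layout shown above (device line first, indented "Kernel driver in use" line below it); gpu_drivers is a made-up helper name.

```shell
# Read `lspci -nnk` style output on stdin and print "PCI_ADDRESS DRIVER"
# for each VGA or 3D controller that has a kernel driver bound.
# gpu_drivers is a hypothetical helper name.
gpu_drivers() {
    awk '/^[0-9a-fA-F]/ { addr = "" }   # a new device entry starts: reset
         /VGA compatible controller|3D controller/ { addr = $1 }
         /Kernel driver in use:/ && addr != "" { print addr, $NF }'
}

# Usage on a live system:
#   lspci -nnk | gpu_drivers
```

A GPU that is listed by lspci but produces no line here has no kernel driver bound to it in that output.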

It’s running 6.1.18-100.fc36.x86_64 right now, any update after this kernel is not working.

@asicace: Your integrated graphics (Iris Xe) and the Nvidia card share the same Subsystem: Hewlett-Packard Company Device [103c:883d]. You may be collateral damage of the differences between Intel and Nvidia over graphics drivers.

Linux Hardware search for 10de:1f97:103c finds 11 entries, but none with your subsystem ID (883d).

Your [Iris Xe Graphics] [8086:9a49] is listed in the Intel dgpu hardware table as: “Driver support for these devices is under active development.”