F34 - Kernel 5.13.9-200 - Nvidia Kernel modules not running

Hi everyone,

I have been running Fedora for around 7 years, and this is the first time I have run into problems with my Nvidia graphics card.

For many years I used the proprietary Nvidia driver packages without problems. Around 6 months ago I switched to the RPM Fusion drivers, which also ran without problems.

Now, after the last kernel update, my Fedora no longer booted. At first I thought the Grub config was damaged and tried to repair it, but nothing changed. I wasn’t even able to boot into run level 3 anymore.

I thought I messed up something and as I have all my files backed up, I reinstalled Fedora from my live USB stick. It was running Wayland and the nouveau driver. So, I did what I also did before:

I activated the RPM Fusion repos and installed the Nvidia driver packages as described on the RPM Fusion page (Howto/NVIDIA - RPM Fusion). On reboot, my laptop got stuck on a black screen (something I knew from my time with the proprietary Nvidia driver packages, where I occasionally had to re-run the installer). So, again, I thought I had messed something up, reinstalled Fedora once more, and repeated the same steps while watching for errors. But I still got stuck on the black screen after booting.

So I tried to install the proprietary Nvidia driver package as I’ve done before. Same problem in the end.

What I have found out now is that the Nvidia kernel modules apparently weren’t built when the Nvidia packages (akmod-nvidia) were installed.

Is it possible that the current kernel 5.13.9-200 doesn’t support the available Nvidia driver packages? Whenever a new kernel version was released, the Nvidia drivers were usually compiled on my system, which I suppose isn’t happening anymore.

I am running a Nvidia Quadro M2000M graphics card in my laptop.

Best regards

Welcome to our community!

On my system running the same kernel and using the rpmfusion packages the nvidia drivers are built properly.

You have installed the nvidia driver from nvidia which does not automatically get updated and MUST be reinstalled or manually rebuilt with each kernel update. The reinstall handles the rebuild of the driver but requires manual attention.

This is one of the advantages of using the drivers from rpmfusion. The akmod-nvidia package and its supporting dependencies take care of rebuilding the kmod-nvidia package for you with each kernel or driver update so there is no hassle with needing to do it manually.
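To check whether akmods actually produced a module for the running kernel, something like the following may help. It is only a minimal sketch: the helper name `expected_kmod_pkg` is made up for illustration, and the naming pattern is inferred from the `kmod-nvidia-5.13.9-200.fc34.x86_64` package that shows up in the package listing later in this thread.

```shell
# Hypothetical helper: build the name of the kmod package that akmods is
# expected to produce for a given kernel release (as printed by `uname -r`).
expected_kmod_pkg() {
    printf 'kmod-nvidia-%s\n' "$1"
}

# Typical use on the affected machine (not run here):
#   rpm -q "$(expected_kmod_pkg "$(uname -r)")"    # is the package installed?
#   modinfo -k "$(uname -r)" -F version nvidia     # is a module present for this kernel?
#   sudo akmods --force --kernels "$(uname -r)"    # ask akmods to rebuild it
```

If `rpm -q` reports the package as installed but `modinfo` cannot find the module, the build step itself is probably not the part that failed.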

I would suggest you remove all the parts installed by the nvidia .run package, then reinstall the nvidia drivers from rpmfusion using t0xic0der’s tool:
https://copr.fedorainfracloud.org/coprs/t0xic0der/nvidia-auto-installer-for-fedora/

I think you will be pleased with the result as I have never needed to worry about graphics drivers since I began using rpmfusion packages for it.

Thanks for your reply.

Well, now I did exactly what you wrote on a completely fresh installation of Fedora. I installed the auto-installer from this mentioned link (https://copr.fedorainfracloud.org/coprs/t0xic0der/nvidia-auto-installer-for-fedora/) and I ran the

sudo nvautoinstall --rpmadd

and

sudo nvautoinstall --driver

commands.

After waiting a few minutes, I rebooted the laptop and, again, I am running into a black screen.

I haven’t installed anything else on this computer. Also, I ran two hardware checks, yesterday and today and both said the graphics card is completely fine without any problems.

I really don’t know what the problem is right now. Also, if the card didn’t work, then it obviously shouldn’t run with the nouveau driver that is installed during the initial Fedora setup either, should it?

Maybe it is related to exactly what got installed. Can you do a “Ctrl-Alt-F3” and get to the text screen?

If you can do that then please post the output of the command
"sudo dnf list installed '*nvidia*'". It can be redirected to a file, and the file can then be used to send the output.

I just checked whether I can start the alternate kernel version 5.11.12-300 from the Grub boot menu, and the splash screen says “Nvidia kernel module is missing. Falling back to nouveau”.

That’s one step further than booting with the 5.13.9-200 kernel.

At least it boots, just not with the nvidia driver, and I suspect it may be missing something needed to build the driver properly.

This is the output:

akmod-nvidia.x86_64                                       3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-5.13.9-200.fc34.x86_64.x86_64                 3:470.57.02-1.fc34                 @@commandline                   
nvidia-settings.x86_64                                    3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                                3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64                        3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.i686                             3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64                           3:470.57.02-1.fc34                 @rpmfusion-nonfree-nvidia-driver

Do you have the kernel-devel and kernel-headers packages?

Also I note you are missing several packages from my list. (I do have more than the minimum though.)
Specifically nvidia-persistenced nvidia-modprobe nvidia-xconfig xorg-x11-drv-nvidia-cuda xorg-x11-drv-nvidia-cuda-libs xorg-x11-drv-nvidia-devel

I also note that the kmod-nvidia package was actually built, but apparently the module is not getting loaded.

Try installing the other packages and see what the results are. All are available from rpmfusion so they should install with just a "dnf install … " command

One more thing to look at, when booting select the 5.13.9 kernel and press “e” for edit, then look at what is on the line beginning with “linux”. It is possible that did not get properly reconfigured and it should look something like this

linux "rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rd.lvm.lv=fedora/root rhgb quiet"
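Once the system is up in a text console, the options that were really passed to the kernel can be compared against the three from the line above. The following is a minimal sketch; the function name `missing_nvidia_args` is hypothetical and the list of expected options is taken only from this post.

```shell
# Hypothetical helper: print every expected option that is absent from the
# kernel command line passed as $1. Prints nothing when all are present.
missing_nvidia_args() {
    for opt in rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1; do
        case " $1 " in
            *" $opt "*) ;;                 # option present as a whole word
            *) printf '%s\n' "$opt" ;;     # option missing
        esac
    done
}

# Typical use after booting (not run here):
#   missing_nvidia_args "$(cat /proc/cmdline)"
```

Because the comparison is done on whole words, a misspelled option is reported as missing rather than silently accepted.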

So, I checked the Grub menu and for the kernel 5.13.9-200 I determined the following:

With the Nvidia driver settings I can’t even boot into runlevel 3. As the display drivers are not loaded in runlevel 3, I suppose that there is some kernel problem in the earlier steps, when the Nvidia drivers are (pre)loaded?

But when I remove the rd.driver.blacklist=nouveau and the modprobe.blacklist=nouveau options from the Grub boot menu, I am able to boot into runlevel 3 without problems.

Given what you just said, I suggest that you boot into run level 3, install the extra packages I listed above, and then reinstall the kernel to force everything to rebuild.
“sudo dnf reinstall kernel*5.13.9”
Hopefully that, together with the extra packages, will fix the issue of the video drivers not loading.

I booted into runlevel 3 and installed all of the packages mentioned above. I also ran the reinstall for kernel*5.13.9*, which reinstalled:

  • kernel
  • kernel-core
  • kernel-devel
  • kernel-modules
  • kernel-modules-extra

kernel-headers is already installed.

Also, the grub bootloader line looks as it should, blacklisting the nouveau drivers. And I still run into the black screen when booting the kernel with the Nvidia drivers.

I found this thread here: https://discussion.fedoraproject.org/t/fedora-34-nvidia-kernel-module-missing-falling-back-to-nouveau/68831

I tried what was suggested there, but nothing changed.

try, when booted to run level 3,
“cat /etc/default/grub” and post the content.
Also post the output of
"lsmod | grep -e 'nvidia|nouveau'"
and
“uname -a”

Edit:
One thing I had not considered since you have been using nvidia in the past was “secure boot”
Please make certain that is disabled then try again.

Also, you can watch the progress of the boot and see where it ends by removing “rhgb quiet” from the end of the linux command line in grub so it displays the progress as text.
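If the text scrolls past too quickly, the same kernel messages can usually be read back afterwards from the journal. This is a minimal sketch that assumes the journal is stored persistently, so that the previous boot is still available; the helper name `gpu_log_lines` is made up for illustration.

```shell
# Hypothetical helper: keep only log lines that mention either GPU driver.
# -i ignores case; -E enables the "a|b" alternation syntax.
gpu_log_lines() {
    grep -iE 'nvidia|nouveau'
}

# Typical use from a text console after a failed graphical boot (not run here):
#   journalctl -k -b -1 | gpu_log_lines    # kernel messages of the previous boot
#   journalctl -k -b 0  | gpu_log_lines    # kernel messages of the current boot
```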

Try stepping through the latter part of that post, one thing at a time. There are comments about several different things, and one is all it takes if you get the correct one.

Also, would you please post the output of “inxi -Gxx” so we can see the exact info that it provides about the GPU(s). You may need to install inxi.

Here are the outputs you requested.

This is the content of /etc/default/grub:

GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 nvidia-drm.modset=1"
GRUB_DISABLE_RECOVERY="true"
GRUB_ENABLE_BLSCFG=true

I noticed that some entries are repeated; as far as I can tell, they were duplicated when I reinstalled some packages.

The output of "lsmod | grep -e 'nvidia|nouveau'" is completely empty.

The output of uname -a:

Linux fedora 5.13.9-200.fc34.x86_64 #1 SMP Sun Aug 8 14:34:00 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

And the output of inxi -Gxx:

Graphics:  Device-1: NVIDIA GM107GLM [Quadro M2000M] vendor: Dell driver: nouveau v: kernel bus-ID: 01:00.0 chip-ID: 10de:13b0 
           Device-2: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo bus-ID: 1-11:5 chip-ID: 0bda:57c3 
           Display: server: X.org 1.20.11 driver: loaded: nouveau note: n/a (using device driver) - try sudo/root tty: 240x67 
           Message: Advanced graphics data unavailable in console. Try -G --display 

I already removed rhgb quiet from the grub config line, but the output during the boot sequence scrolls so fast that it’s impossible to read anything. And at the end the output disappears completely.

  1. I made a mistake in the grep command. It should have been “lsmod | grep -e nouveau -e nvidia”
  2. The line in /etc/default/grub should be (yes I have seen the entries repeated with each reinstall):
GRUB_CMDLINE_LINUX="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1 rhgb quiet"

The entries should appear only once. After trimming the line to the one given above, run “grub2-mkconfig -o /etc/grub2-efi.cfg” to regenerate the grub.cfg file, so that grub no longer passes the same option several times on the command line.
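The clean-up of the repeated options can also be done mechanically. The following is a minimal sketch, with a hypothetical helper name `dedupe_args`, that keeps the first occurrence of each option and preserves the order:

```shell
# Hypothetical helper: drop repeated options from a kernel command line,
# keeping the first occurrence of each and preserving the original order.
dedupe_args() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = 1; i <= NF; i++)
            if (!seen[$i]++)
                out = out (out == "" ? "" : " ") $i
        print out
    }'
}

# Typical use (the argument is the value of GRUB_CMDLINE_LINUX):
#   dedupe_args "rhgb quiet rhgb quiet"
```

Note that this only removes exact repeats. In the posted file the last entry is spelled `nvidia-drm.modset=1`, which looks like a misspelling of `nvidia-drm.modeset=1`; since it is a different string it would survive the de-duplication and would have to be removed by hand.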

Inxi clearly shows that the nouveau driver is loaded, so I am sure that the lsmod command above will only return nouveau results.
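As an aside on why the first grep attempt printed nothing: without `-E`, grep treats `|` as a literal character, so the pattern only matches lines containing the text `nvidia|nouveau`. Either `grep -e nouveau -e nvidia` or `grep -E 'nvidia|nouveau'` avoids that. A minimal sketch that reduces the lsmod output to a single answer could look like this (the helper name `loaded_gpu_driver` is made up for illustration):

```shell
# Hypothetical helper: read `lsmod` output on stdin and report which GPU
# kernel module is loaded: "nvidia", "nouveau", or "none".
loaded_gpu_driver() {
    awk '$1 == "nvidia"                 { found = "nvidia" }
         $1 == "nouveau" && found == "" { found = "nouveau" }
         END { print (found == "" ? "none" : found) }'
}

# Typical use (not run here):
#   lsmod | loaded_gpu_driver
```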

Apparently your laptop only has the one GPU, unless for some reason the system is not seeing the IGP. If it is supposed to have two like most of the newer ones do then maybe post the results of “lspci -nnk” so we can see if there actually are 2 GPUs.

Also, did you by chance edit /etc/gdm/custom.conf as suggested and make sure that “#WaylandEnable=false” was changed to “WaylandEnable=false”? I doubt that is needed (it is not on my system), but for some it seems to be.

This is the output of the lsmod command:

nouveau              2400256  1
drm_ttm_helper         16384  1 nouveau
ttm                    77824  2 drm_ttm_helper,nouveau
i2c_algo_bit           16384  1 nouveau
drm_kms_helper        290816  1 nouveau
mxm_wmi                16384  1 nouveau
drm                   630784  5 drm_kms_helper,drm_ttm_helper,ttm,nouveau
wmi                    36864  7 intel_wmi_thunderbolt,dell_wmi,wmi_bmof,dell_smbios,dell_wmi_descriptor,mxm_wmi,nouveau
video                  57344  3 dell_wmi,dell_laptop,nouveau

Yep, I edited the grub file and recreated the config file. Looks good now as it should.

Here’s the output of the lspci command:

00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:1910] (rev 07)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: skl_uncore
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
	Kernel driver in use: pcieport
00:04.0 Signal processing controller [1180]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem [8086:1903] (rev 07)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: proc_thermal
	Kernel modules: processor_thermal_device
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: xhci_hcd
00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: ahci
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #2 [8086:a111] (rev f1)
	Kernel driver in use: pcieport
00:1c.2 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #3 [8086:a112] (rev f1)
	Kernel driver in use: pcieport
00:1c.4 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
	Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
	Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation CM236 Chipset LPC/eSPI Controller [8086:a150] (rev 31)
	Subsystem: Dell Device [1028:06d9]
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
	Subsystem: Dell Device [1028:06d9]
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (2) I219-LM [8086:15b7] (rev 31)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: e1000e
	Kernel modules: e1000e
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M2000M] [10de:13b0] (rev a2)
	Subsystem: Dell Device [1028:16d9]
	Kernel driver in use: nouveau
	Kernel modules: nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation GM107 High Definition Audio Controller [GeForce 940MX] [10de:0fbc] (rev a1)
	Subsystem: Dell Device [1028:16d9]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
02:00.0 Network controller [0280]: Intel Corporation Wireless 8260 [8086:24f3] (rev 3a)
	Subsystem: Intel Corporation Device [8086:0050]
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
03:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader [10ec:525a] (rev 01)
	Subsystem: Dell Device [1028:06d9]
	Kernel driver in use: rtsx_pci
	Kernel modules: rtsx_pci
3d:00.0 Non-Volatile memory controller [0108]: Toshiba Corporation XG4 NVMe SSD Controller [1179:0115] (rev 01)
	Subsystem: Toshiba Corporation Device [1179:0001]
	Kernel driver in use: nvme
	Kernel modules: nvme

Yes, I already enabled WaylandEnable=false in the /etc/gdm/custom.conf file, but it made no difference. Also, with the older kernel versions I never had to enable this setting.

Afaik there is only the Nvidia Quadro GPU. I’ve never seen any systems information that there is an IGP in my system. When the Nvidia drivers refused to re-compile (which occurred two or three times in the last years) then the systems info would show that the graphics were running on LLVM Pipe which showed it was software rendering mode.

It seems you have tried almost everything, so let’s approach this differently if you are willing.

  1. install fedora fresh
  2. do a full “dnf upgrade” and reboot to ensure it is booting properly and using the video correctly.
  3. enable and install the nvidia drivers using the copr link above.
  4. reboot

Hopefully this sequence will enable the drivers with no further problems. If this fails then I really have no further suggestions since this sequence is intended to do the install and get the video working properly without interference from any other software installation attempts.

Ok, I will give this another try.

Just to be clear on two things:

  1. From the copr link, all I need to run after installation is:
sudo nvautoinstall --rpmadd

and

sudo nvautoinstall --driver

right?

  2. My GPU seems to be working, as it is recognized by the system (see the output of the lspci command) and it also seems to work with the nouveau driver. Also, the hardware test showed absolutely no problems, and in the end it would be obvious if the GPU had hardware damage.

Right?