I am trying to install nvidia drivers on Fedora 37. When I try to install using “sudo dnf -y module install nvidia-driver:latest-dkms” the installation succeeds but the driver is not running. When I execute the command nvidia-smi
it responds with NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
What do you mean it succeeds? The fact that the driver does not load and is not running clearly shows it did not succeed. In Fedora the recommended source for the nvidia driver is the rpmfusion repo, and the installation procedure uses dnf.
The rpmfusion repo can be enabled from the gnome software app or by following directions here. Installation and compiling the driver is a simple dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
which installs the packages, uses akmods to compile the modules, and also installs cuda for GPU computing with nvidia. Fedora does not use dkms to compile most kernel modules.
When I said it “succeeded” I mean the command completed without error. The dump after executing the install command:
Installed:
cuda-drivers-525.60.13-1.x86_64
kmod-nvidia-latest-dkms-3:525.60.13-1.fc36.x86_64
nvidia-driver-3:525.60.13-1.fc36.x86_64
nvidia-driver-NVML-3:525.60.13-1.fc36.x86_64
nvidia-driver-NvFBCOpenGL-3:525.60.13-1.fc36.x86_64
nvidia-driver-cuda-3:525.60.13-1.fc36.x86_64
nvidia-driver-cuda-libs-3:525.60.13-1.fc36.x86_64
nvidia-driver-devel-3:525.60.13-1.fc36.x86_64
nvidia-driver-libs-3:525.60.13-1.fc36.x86_64
nvidia-kmod-common-3:525.60.13-1.fc36.noarch
nvidia-libXNVCtrl-3:525.60.13-1.fc36.x86_64
nvidia-libXNVCtrl-devel-3:525.60.13-1.fc36.x86_64
nvidia-modprobe-3:525.60.13-1.fc36.x86_64
nvidia-persistenced-3:525.60.13-1.fc36.x86_64
nvidia-settings-3:525.60.13-1.fc36.x86_64
nvidia-xconfig-3:525.60.13-1.fc36.x86_64
Complete!
I will try removing the files installed and using the command you suggested rather than the instructions here. I’m on fedora 37
There are 2 problems there.
- Fedora still only has the 520.56.06 driver version from rpmfusion and you show the 525.60.13 version.
- You said you are on F37 but that list shows only F36 packages (and they do not appear to have come from rpmfusion) so the packages do not match the fedora release version.
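A quick way to spot the second problem is to compare the .fcNN tag in each package name against the running release. This is only a sketch: fc_mismatches is a made-up helper name, and it assumes one package name per line on stdin.

```shell
# Print every package line on stdin whose ".fcNN" dist tag differs from the
# Fedora release number passed as $1. Lines without a dist tag are ignored.
# fc_mismatches is a hypothetical helper, not part of dnf or rpm.
fc_mismatches() {
    grep -E '\.fc[0-9]+' | grep -vE "\.fc$1([^0-9]|\$)"
}
```

On a real system it could be fed with rpm -qa '*nvidia*' | fc_mismatches "$(rpm -E %fedora)"; any output means a package was built for a different release.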
What repo were they installed from? That link shows a completely 3rd party repo. Even though it appears to be a cuda site, its packages are not integrated or tested with Fedora the way the ones from rpmfusion are.
I would suggest that you first remove all those nvidia packages and cuda (but do not remove the nvidia-gpu-firmware package)
dnf remove '*nvidia*' --exclude=nvidia-gpu-firmware cuda-drivers
Then, after verifying that you have either the rpmfusion-nonfree-nvidia-driver repo or the rpmfusion-nonfree & rpmfusion-nonfree-updates repos enabled, install the nvidia drivers and cuda from there.
dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
If there is a conflict you may need to disable the other repo you have been using.
That command should install the packages and compile the matching kernel modules. It also will enable automatic updates of both driver versions and modules to match the updates of kernel versions. There are a lot of users who have nvidia GPUs and this works very well for the great majority (myself included).
After the install completes, wait 5 to 10 minutes so akmods can finish building the modules, then reboot and things should work for you.
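Rather than guessing whether the build has finished, one option is to check that a kmod-nvidia package exists for the kernel you are running. A minimal sketch, assuming package names arrive one per line on stdin; has_kmod_for is a hypothetical helper name.

```shell
# Succeed if a kmod-nvidia package built for kernel release $1 is listed on stdin.
# Uses a fixed-string match, so pass the full release string as printed by uname -r.
# has_kmod_for is a hypothetical helper, not an akmods or dnf command.
has_kmod_for() {
    grep -qF "kmod-nvidia-$1"
}
```

For example: rpm -qa 'kmod-nvidia*' | has_kmod_for "$(uname -r)" && echo built. If nothing is built yet, sudo akmods --force should trigger the build, and modinfo -F version nvidia should print the driver version once the module is in place.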
After following those instructions, the result of dnf list installed '*nvidia*'
is the following:
Installed Packages
akmod-nvidia.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
kmod-nvidia-6.0.12-300.fc37.x86_64.x86_64 3:520.56.06-1.fc37 @@commandline
nvidia-persistenced.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
nvidia-settings.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-cuda.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-libs.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-power.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
It seems to me that both problems have been resolved, since the driver version is less than 525.60.13 and the suffix indicates that I have a version verified for fedora 37.
However, the output of nvidia-smi
is still NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
Are you sure the nvidia drivers are loaded and running?
lsmod | grep nvidia
should show about 4 lines of output.
On my system I get this
# lsmod | grep nvidia
nvidia_drm 73728 9
nvidia_modeset 1187840 12 nvidia_drm
nvidia_uvm 2859008 4
nvidia 55250944 1206 nvidia_uvm,nvidia_modeset
and this
# dnf provides */nvidia-smi
Last metadata expiration check: 0:04:21 ago on Tue 13 Dec 2022 09:50:56 AM CST.
xorg-x11-drv-nvidia-cuda-3:520.56.06-1.fc37.x86_64 : CUDA driver for xorg-x11-drv-nvidia
Repo : @System
Matched from:
Filename : /usr/bin/nvidia-smi
xorg-x11-drv-nvidia-cuda-3:520.56.06-1.fc37.x86_64 : CUDA driver for xorg-x11-drv-nvidia
Repo : @System
Matched from:
Filename : /usr/bin/nvidia-smi
xorg-x11-drv-nvidia-cuda-3:520.56.06-1.fc37.x86_64 : CUDA driver for xorg-x11-drv-nvidia
Repo : rpmfusion-nonfree
Matched from:
Filename : /usr/bin/nvidia-smi
and this
# ls -l /usr/bin/nvidia-smi
-rwxr-xr-x. 1 root root 600760 Oct 6 16:22 /usr/bin/nvidia-smi
It seems quite possible that the /usr/bin/nvidia-smi file may not have been removed and thus not properly reinstalled when you removed the packages installed earlier then reinstalled from rpmfusion.
I would suggest that you manually first rename the nvidia-smi file then reinstall to ensure a matching file is in place.
sudo mv /usr/bin/nvidia-smi /usr/bin/nvidia-smi.old
sudo dnf reinstall xorg-x11-drv-nvidia-cuda
then look and verify that file was newly installed
ls -l /usr/bin/nvidia-smi
It also is possible that the install from the other site placed the nvidia-smi file in a different location and that it is being found in your path before the official /usr/bin/nvidia-smi command is reached.
locate nvidia-smi
or which nvidia-smi
may reveal that, as would find / -name nvidia-smi 2>/dev/null
# which nvidia-smi
/usr/bin/nvidia-smi
# locate nvidia-smi
/usr/bin/nvidia-smi
/usr/share/doc/xorg-x11-drv-nvidia/html/nvidia-smi.html
/usr/share/man/man1/nvidia-smi.1.gz
# find / -name nvidia-smi 2>/dev/null
/usr/bin/nvidia-smi
echo $PATH
would show the order in which directories are searched for commands since the path is searched left to right.
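To see every copy of a command that the path would find, in search order, something like the following could be used. It is a sketch only; find_in_path is a made-up name, and in bash the builtin type -a nvidia-smi gives much the same answer.

```shell
# List, in search order, every directory entry that holds an executable file
# named $1. $2 optionally overrides the search path (defaults to $PATH).
# The body runs in a subshell so the IFS and noglob changes do not leak out.
# find_in_path is a hypothetical helper name.
find_in_path() (
    IFS=':'
    set -f
    for dir in ${2:-$PATH}; do
        candidate="${dir:-.}/$1"
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
        fi
    done
)
```

The first line printed is the copy the shell would actually run; more than one line means an earlier directory is shadowing a later one.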
I reinstalled nvidia-smi and the output of which nvidia-smi, locate nvidia-smi, and find / -name nvidia-smi 2>/dev/null all match your output.
Output of path:
(base) [steffi@fedora ~]$ echo $PATH
/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin:/usr/local/cuda-11.8/bin:/home/steffi/miniconda3/bin:/home/steffi/miniconda3/condabin:/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin:/usr/local/cuda-11.8/bin:/home/steffi/.local/bin:/home/steffi/bin:/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/var/lib/snapd/snap/bin
I had added paths to my cuda libraries for an application I’m supporting (my drivers were working a few days ago and then stopped working, perhaps as a result of an automatic update).
I don’t think there are any driver binaries in the cuda lib directories that might cause a driver mismatch.
When I reboot I get a message “NVIDIA kernel module missing. Falling back to nouveau.”
I should note that although the drivers are “installed” they are not loaded and running:
(base) [steffi@fedora ~]$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
(base) [steffi@fedora ~]$ dnf list installed *nvidia*
Installed Packages
akmod-nvidia.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
kmod-nvidia-6.0.12-300.fc37.x86_64.x86_64 3:520.56.06-1.fc37 @@commandline
kmod-nvidia-6.0.7-301.fc37.x86_64.x86_64 3:520.56.06-1.fc37 @@commandline
nvidia-persistenced.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
nvidia-settings.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-cuda.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-libs.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
xorg-x11-drv-nvidia-power.x86_64 3:520.56.06-1.fc37 @rpmfusion-nonfree
(base) [steffi@fedora ~]$ lsmod | grep nvidia
(base) [steffi@fedora ~]$
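The empty lsmod result above is the real symptom: no nvidia module is loaded at all. For scripting the same check, a small counter like this might help. It is a sketch that assumes lsmod-style input on stdin; count_nvidia_modules is a hypothetical name.

```shell
# Count modules whose name starts with "nvidia" in lsmod-style input on stdin.
# Prints 0 when none are loaded (or when the input is empty).
# count_nvidia_modules is a hypothetical helper name.
count_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { n++ } END { print n + 0 }'
}
```

Used as lsmod | count_nvidia_modules, a working install would normally print about 4, matching the lsmod listing shown earlier in the thread.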
Are you by chance using secure boot? dmesg | grep -iE 'secure|nvidia'
I note in the output of the installed nvidia packages that you do not show the nvidia-gpu-firmware package, which is required. If dnf list installed '*firmware*'
does not show it as already installed then dnf install nvidia-gpu-firmware
or dnf reinstall linux-firmware
should fix that; then reboot to load the firmware and activate the driver.
Nvidia-smi will fail if the driver is not loaded. We need to get that fixed first.
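Besides grepping dmesg, the secure boot state can usually be read with mokutil --sb-state. The sketch below only classifies that text; it assumes the output contains "SecureBoot enabled" or "SecureBoot disabled", and sb_state is a hypothetical helper name.

```shell
# Read mokutil --sb-state style text on stdin and print
# "enabled", "disabled" or "unknown".
# Assumption: the tool reports "SecureBoot enabled" / "SecureBoot disabled".
# sb_state is a hypothetical helper name.
sb_state() {
    text=$(cat)
    case "$text" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}
```

For example, mokutil --sb-state | sb_state. With secure boot enabled, an unsigned akmod-built module will not load unless its signing key has been enrolled.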
Thank you, that worked! I disabled secure boot (it had previously been disabled; not sure why it was re-enabled after some updates) and installed the firmware.
Replying because this is the top Google search result for “fedora NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver.”
What solved it for me was doing the opposite of the steps here: How do I prevent a kernel module from loading automatically? - Red Hat Customer Portal
Since I had secure boot disabled, I didn’t need to deal with it.
Then, as stated in one of the replies, I ran dmesg | grep -iE 'secure|nvidia'
as sudo and found that my kernel would be “tainted” because I had nouveau running, and it was running before the nvidia driver could run. So I ran this command as sudo:
grubby --args "rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1" --update-kernel DEFAULT
This blacklists the nouveau driver, allowing the nvidia one to load.
You might want to replace DEFAULT with the kernel you are currently running. For that, see this page: Chapter 7. Making persistent changes to the GRUB boot loader | Red Hat Product Documentation. As sudo, you would run grubby --info=ALL
and then look at the line that starts with the word kernel, followed by an equals sign, a quote mark and a path that begins like /boot/vmlinuz and ends in, for example, x86_64. Then in the command grubby --args "rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1" --update-kernel DEFAULT
you would replace DEFAULT with whatever the path actually is, for example /boot/vmlinuz-6.11.4-201.fc40.x86_64.
I would instead replace DEFAULT with ALL so that all the installed kernels receive the change and so all kernel updates also get that change in the future.
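After rebooting, it may be worth confirming that the arguments actually reached the running kernel. A minimal sketch, assuming each argument appears as a separate word exactly as typed in the grubby command (it would miss comma-separated blacklist values); nouveau_blacklisted is a hypothetical helper name.

```shell
# Succeed if the kernel command line passed as $1 blacklists nouveau both in
# the initramfs (rd.driver.blacklist) and at module load (modprobe.blacklist).
# nouveau_blacklisted is a hypothetical helper name.
nouveau_blacklisted() {
    case " $1 " in
        *" rd.driver.blacklist=nouveau "*) ;;
        *) return 1 ;;
    esac
    case " $1 " in
        *" modprobe.blacklist=nouveau "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example: nouveau_blacklisted "$(cat /proc/cmdline)" && echo ok. To see what each installed kernel entry carries, grubby --info=ALL prints an args line per kernel.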
This option isn’t useful if you’re using the rpmfusion non-legacy driver; the rpmfusion driver has this enabled internally in the module.
I had to do this as the Fedora kernel devs perverted its meaning; they did this so nvidia could work (badly) with simpledrm.
I use nvidia fbdev instead to evict simpledrm, this works better than the ugly kernel hack.