nvidia-smi: Failed to initialize NVML

Hi, I’m facing an issue with an NVML version mismatch with the NVIDIA driver.
Running nvidia-smi gives this output:

Failed to initialize NVML: Driver/library version mismatch
NVML library version: 535.98

But I noticed something weird after I installed the NVIDIA driver. Running
cat /proc/driver/nvidia/version gives this output:

NVRM version: NVIDIA UNIX x86_64 Kernel Module 535.104.05 Sat Aug 19 01:15:15 UTC 2023
GCC version: gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC)

I think something went wrong during installation: the NVML library version is 535.98 but the NVRM kernel module version is 535.104.05, hence the mismatch.
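The comparison can also be done mechanically. This is a minimal sketch, assuming the two outputs look exactly like the ones quoted above; the function names nvrm_version, nvml_version and compare_versions are made up for illustration. Note that nvidia-smi may only print the "NVML library version" line when it fails like this, so the second value can be empty on a healthy system.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; they only parse text in the format quoted in this thread.

# Read /proc/driver/nvidia/version on stdin, print the kernel module version.
nvrm_version() {
    sed -n 's/.*Kernel Module *\([0-9][0-9.]*\).*/\1/p'
}

# Read nvidia-smi output on stdin, print the NVML library version if present.
nvml_version() {
    sed -n 's/^NVML library version: *\([0-9][0-9.]*\).*/\1/p'
}

# $1 = kernel module version, $2 = NVML library version
compare_versions() {
    if [ "$1" = "$2" ]; then
        echo "match: $1"
    else
        echo "mismatch: kernel module $1, NVML library $2"
    fi
}

# Usage on a live system:
#   kmod="$(nvrm_version < /proc/driver/nvidia/version)"
#   lib="$(nvidia-smi 2>&1 | nvml_version)"
#   compare_versions "$kmod" "$lib"
```

With the values from this post, compare_versions 535.104.05 535.98 reports a mismatch.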
Here are NVIDIA Settings and the About page. About shows that the NVIDIA driver is in use, but NVIDIA Settings does not, which left me confused :sweat_smile:



Thank you for your help in advance and excuse my English.

This shows a clear mismatch.
Is it possible that you have (or have had) nvidia drivers installed directly from nvidia.com as well as from rpmfusion? I have installed only from rpmfusion and I see

# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.98  Tue Aug  1 21:42:05 UTC 2023
GCC version:  gcc version 13.2.1 20230728 (Red Hat 13.2.1-1) (GCC) 

Yes, that’s right. At first I tried to install the NVIDIA driver from nvidia.com, but it caused the laptop to freeze at the logo and I couldn’t figure out why. So I set nomodeset in GRUB and removed all the NVIDIA drivers, but it looks like the uninstall did not get the job done!

I think that uninstalling drivers installed from nvidia.com requires a command like the install command, but with --uninstall added after the .run file name. I have not used drivers directly from NVIDIA for some time, so I am not 100% certain of that.
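One way to see whether the nvidia.com installer left files behind is to look for NVIDIA files that no RPM package owns. This is only a sketch: unowned_files is a made-up helper, and the paths in the usage comment are assumptions about where the files live on Fedora.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each existing path that is not owned by any RPM package.
unowned_files() {
    local f
    command -v rpm >/dev/null 2>&1 || { echo "rpm not found" >&2; return 1; }
    for f in "$@"; do
        [ -e "$f" ] || continue                    # skip globs that matched nothing
        rpm -qf "$f" >/dev/null 2>&1 || echo "$f"  # rpm -qf fails for unowned files
    done
}

# Usage (paths are assumptions, adjust as needed):
#   unowned_files /usr/lib64/libnvidia-ml.so.* /usr/bin/nvidia-smi
#
# If leftovers show up, the .run installer normally also installs an
# nvidia-uninstall program, so one of these may remove them:
#   sudo nvidia-uninstall
#   sudo sh <the .run file you installed from> --uninstall
```

Anything this prints was most likely put there by the .run installer rather than by rpmfusion.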

This link may help, but it is written for Debian/Ubuntu-based systems and the commands would need to be modified for use on Fedora. I will try to suggest the changes below.

How to Uninstall NVIDIA Drivers from Linux
To uninstall NVIDIA drivers from a Fedora Linux operating system, you can use the command line.
(I assume that you are using Workstation with GNOME.)

Switch to a text console by pressing CTRL+ALT+F3 and log in
Stop the display manager (and with it the graphical session) by running the command sudo systemctl stop gdm.service
Remove the NVIDIA driver packages by running the command sudo dnf remove \*nvidia\* --exclude nvidia-gpu-firmware
Remove any remaining NVIDIA driver kernel modules by running the command sudo rm -r /lib/modules/$(uname -r)/extra/nvidia
Reinstall the NVIDIA drivers by running the command sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda (this should show the packages being installed from either the rpmfusion-nonfree or the rpmfusion-nonfree-nvidia-driver repo. If it does not, stop here and do not complete the install.)
Wait at least 5 minutes for the kernel modules to be compiled and installed.
This is finished when dnf list installed kmod-nvidia-\* shows the package for the currently running kernel.
Reboot
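
The steps above can be sketched as a small script. It is a rough sketch, not a tested procedure: the helper names run, reinstall_nvidia and kmod_ready are made up, and kmod_ready assumes that the kmod-nvidia package name contains the running kernel version as reported by uname -r. By default it only prints the commands (dry run), so you can read them before anything is changed.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. Nothing runs for real unless DRY_RUN=0 is set.

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# The stop / remove / clean up / reinstall steps, in order.
reinstall_nvidia() {
    local kver
    kver="$(uname -r)"
    run sudo systemctl stop gdm.service
    run sudo dnf remove '*nvidia*' --exclude nvidia-gpu-firmware
    run sudo rm -r "/lib/modules/${kver}/extra/nvidia"
    # Check that dnf offers the packages from an rpmfusion repo before confirming.
    run sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
}

# True once a kmod-nvidia package for the running kernel is installed
# (assumption: the package name contains the output of uname -r).
kmod_ready() {
    rpm -qa 'kmod-nvidia-*' | grep -qF "$(uname -r)"
}

# Usage:
#   reinstall_nvidia                 # dry run, prints the four commands
#   DRY_RUN=0 reinstall_nvidia       # really run them
#   until kmod_ready; do sleep 30; done   # wait for akmods, then reboot
```

Run it from the text console, since stopping gdm ends the graphical session.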