I have seen this question asked a lot, but I believe my situation is slightly different from the ones I've seen.
I have Fedora 35, and I had installed the NVIDIA drivers successfully. My use case is deep learning, specifically PyTorch. I know a lot of people who do deep learning install cuda-toolkit, which was required to use functorch. I tried installing it but was unsuccessful, so I gave up, although I might have inadvertently installed something that is causing the current problem.
It is also worth mentioning that I had never turned that computer off until today. When I shut it down, some errors appeared, but they scrolled by so quickly that I could only glance at them. After turning it back on, I tried to use the GPU but got an error. Currently, nvidia-smi throws the following:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
When I run lspci | grep VGA, I get:
0a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
41:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
which are the two GPUs in my computer.
Also, when I run dnf list installed \*nvidia\*, I get:
akmod-nvidia.x86_64 3:495.44-1.fc35 @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-5.14.18-300.fc35.x86_64.x86_64 3:495.44-1.fc35 @@commandline
nvidia-gpu-firmware.noarch 20221012-141.fc35 @updates
nvidia-persistenced.x86_64 3:495.44-1.fc35 @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64 3:495.44-1.fc35 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64 3:495.44-4.fc35 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64 3:495.44-4.fc35 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:495.44-4.fc35 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:520.56.06-1.fc35 @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64 3:495.44-4.fc35 @rpmfusion-nonfree-nvidia-driver
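One thing I notice in that list is that xorg-x11-drv-nvidia-kmodsrc is at 520.56.06 while the other driver packages are at 495.44, and the kmod package was built for kernel 5.14.18. I don't know whether that matters. To make the version spread easier to see, here is a minimal sketch that counts packages per driver version. The function name summarize_nvidia_versions is made up for illustration, and it assumes the three-column layout shown above (name, epoch:version-release, repository). It skips nvidia-gpu-firmware, whose date-style version appears to follow a different numbering.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `dnf list installed` lines on stdin and prints
# each distinct driver version followed by the number of packages carrying it.
# Assumes three whitespace-separated columns per package line.
summarize_nvidia_versions() {
    awk 'NF >= 3 && $1 != "nvidia-gpu-firmware.noarch" {
             v = $2
             sub(/^[0-9]+:/, "", v)   # drop the epoch, e.g. "3:"
             sub(/-[^-]*$/, "", v)    # drop the release, e.g. "-1.fc35"
             count[v]++
         }
         END { for (v in count) print v, count[v] }' | sort
}

# Usage on a live system:
#   dnf list installed '*nvidia*' | summarize_nvidia_versions
```

More than one line of output would mean the installed packages do not all come from the same driver release.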
I saw that a common solution is to disable Secure Boot, but the command sudo mokutil --sb-state throws the error: EFI variables are not supported on this system. I am also not sure that Secure Boot is my problem.
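As far as I understand, Linux only creates the /sys/firmware/efi directory when the machine was booted through UEFI, so its absence could explain the mokutil error. This is a small sketch to check that, under that assumption. The function name boot_mode and its optional sysfs-root argument are hypothetical and exist only to make the check easy to try.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports whether the kernel exposes EFI firmware data.
# The optional argument overrides the sysfs root (defaults to /sys).
boot_mode() {
    local sysfs_root="${1:-/sys}"
    if [ -d "$sysfs_root/firmware/efi" ]; then
        echo "UEFI"
    else
        echo "legacy BIOS (or EFI not exposed)"
    fi
}

boot_mode
```

If this does not print UEFI, I assume Secure Boot cannot be what is blocking the module, since Secure Boot is a UEFI feature.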
What are my options here? Is there a way to fix my current drivers, or is it easier to uninstall them and install again? Thanks!