I have seen this question asked a lot, but I believe my situation is slightly different from the ones I've seen.
I have Fedora 35, and I had installed the NVIDIA drivers successfully. My use case is deep learning, specifically PyTorch. I know a lot of people who do deep learning install
cuda-toolkit, which I needed in order to use functorch. I tried installing it but was unsuccessful, so I gave up, although I might have inadvertently installed something that is causing the current problem.
Also, it is worth mentioning that I had never turned that computer off until today. When I shut it down, there were some errors that I could only glance at because they scrolled by quickly. After turning it back on, I tried to use the GPU but got an error. Currently,
nvidia-smi throws the following:
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
When I run
lspci | grep VGA I get:
0a:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
41:00.0 VGA compatible controller: NVIDIA Corporation TU102 [GeForce RTX 2080 Ti] (rev a1)
which are the two GPUs in my computer.
Also, when I run
dnf list installed \*nvidia\* I get:
akmod-nvidia.x86_64                          3:495.44-1.fc35      @rpmfusion-nonfree-nvidia-driver
kmod-nvidia-5.14.18-300.fc35.x86_64.x86_64   3:495.44-1.fc35      @@commandline
nvidia-gpu-firmware.noarch                   20221012-141.fc35    @updates
nvidia-persistenced.x86_64                   3:495.44-1.fc35      @rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64                       3:495.44-1.fc35      @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64                   3:495.44-4.fc35      @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64              3:495.44-4.fc35      @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64         3:495.44-4.fc35      @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64           3:520.56.06-1.fc35   @rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64              3:495.44-4.fc35      @rpmfusion-nonfree-nvidia-driver
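The kmod-nvidia package in this list is named after kernel 5.14.18-300.fc35.x86_64, so one thing that may be worth checking is whether the kernel that booted after the restart is still that one. The following is a minimal sketch, not a confirmed diagnosis: `kmod_kernel` is a hypothetical helper written for illustration, and the package name passed to it is copied from the list above.

```shell
#!/bin/sh
# Sketch only: compare the running kernel with the kernel release that a
# kmod-nvidia package name refers to. kmod_kernel is a hypothetical helper,
# not part of any installed package.

kmod_kernel() {
    # $1 is a package name such as kmod-nvidia-5.14.18-300.fc35.x86_64.x86_64
    rest="${1#kmod-nvidia-}"     # drop the "kmod-nvidia-" prefix
    printf '%s\n' "${rest%.*}"   # drop the trailing RPM architecture suffix
}

running="$(uname -r)"
built_for="$(kmod_kernel "kmod-nvidia-5.14.18-300.fc35.x86_64.x86_64")"

if [ "$running" = "$built_for" ]; then
    echo "kmod-nvidia matches the running kernel ($running)"
else
    echo "kmod-nvidia was built for $built_for, but the running kernel is $running"
fi
```

If the two differ, the module may simply not have been rebuilt for the newer kernel. On RPM Fusion setups the akmods tool is meant to handle that rebuild (for example `sudo akmods --force`), though whether that applies here is an assumption.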
I saw that a common solution is to disable Secure Boot, but the command
sudo mokutil --sb-state throws the error:
EFI variables are not supported on this system.
So I am not sure that Secure Boot is my problem.
What are my options here? Is there a way to fix my current drivers, or is it easier to uninstall and reinstall them? Thanks!