NVIDIA and CUDA on Fedora

Hi

I am trying to install the NVIDIA driver with CUDA,
and I am a little confused about which way is better.

I found a way here

When I ran the command sudo dnf -y module install nvidia-driver:latest-dkms
the NVIDIA driver was installed. After rebooting,
the nvidia-smi output showed that GNOME Shell, Xorg, and all apps were running on the NVIDIA GPU.

But there is another way to install it, described at
https://rpmfusion.org/Howto/NVIDIA

When I installed akmod-nvidia and xorg-x11-drv-nvidia-cuda, apps run on the NVIDIA GPU only when I launch them with these variables set: __GLX_VENDOR_LIBRARY_NAME=nvidia __NV_PRIME_RENDER_OFFLOAD=1
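
For reference, those two variables are the ones NVIDIA's PRIME render offload uses to send a single program to the NVIDIA GPU while the rest of the desktop stays on the other GPU. A minimal sketch of a wrapper, assuming a hybrid-graphics machine; the function name prime_run is made up for illustration and is not shipped by any of the packages mentioned here:

```shell
# prime_run is a hypothetical helper name.
# It sets the two PRIME render offload variables only for the command it
# launches, so the rest of the session is left untouched.
prime_run() {
    env __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Example (assumes glxinfo is installed; on Fedora it is in glx-utils):
#   prime_run glxinfo -B
```

Anything started without the wrapper keeps running on the default GPU.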

Which way is better?
Please explain the difference between these two ways,
and how I can install CUDA with the second one.

I use Wayland, by the way.

Thanks

That is not the recommended way to install the nvidia drivers and cuda on fedora.
The gnome-software app has an option to enable 3rd party repos, one of which is the rpmfusion-nonfree-nvidia-driver repo.

The rpmfusion repo contains all the nvidia and cuda packages needed for fedora, and they are specifically tweaked and packaged to work with fedora.

My suggestion is simple:

  1. remove everything you may have installed with the instructions from nvidia.
  2. disable the module that was installed in that process with
    sudo dnf module disable nvidia-driver
  3. install the packages from rpmfusion only with
    sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

After step 3 completes, wait at least 5 minutes while the system compiles and installs the nvidia modules in the background, then reboot.

Confirm the nvidia drivers were properly loaded with lsmod | grep nvidia, which should show several lines of output.
Now it all should just work for you.
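
A small sketch of that check, in case it helps. The helper name count_nvidia_modules is made up; it only counts lines of lsmod output, so it says the modules are loaded, not that they work. The modinfo call in the comment is the check the RPM Fusion howto suggests for confirming the akmod build finished:

```shell
# count_nvidia_modules is a hypothetical helper: it reads `lsmod` output on
# stdin and prints how many loaded modules have a name starting with "nvidia".
count_nvidia_modules() {
    awk '$1 ~ /^nvidia/ { n++ } END { print n + 0 }'
}

# Typical use after the reboot:
#   lsmod | count_nvidia_modules     # expect a number greater than 0
#   modinfo -F version nvidia        # prints the version of the built module
```

If the count is 0 right after installing, the background build may simply not be finished yet.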

BTW, the driver installed in this way supports cuda up to version 12.2 as we speak.

$ nvidia-smi
Tue Aug  8 15:55:52 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05              Driver Version: 535.86.05    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3050        Off | 00000000:06:00.0  On |                  N/A |
| 30%   59C    P2             104W / 130W |   1097MiB /  8192MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
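
If only the two version numbers are wanted, they can be pulled out of the banner line. A rough sketch, assuming the banner keeps the layout shown above; the helper name smi_versions is made up. Note that the "CUDA Version" field reports the highest CUDA version the driver supports, not an installed toolkit:

```shell
# smi_versions is a hypothetical helper: it reads `nvidia-smi` output on stdin
# and prints "<driver version> <CUDA version>" taken from the banner line.
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p' | head -n 1
}

# Typical use:
#   nvidia-smi | smi_versions
```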

Wait. The official way to install Cuda is to use an RPM Fusion repo, rather than the Nvidia repos that the RPM Fusion HOWTO points to?

Howto/CUDA - RPM Fusion

The HOWTO points to Nvidia repos. Maybe something changed. So is the HOWTO out of date, or are the directions in @computersavvy's message above out of date?

YES
That link is for the cuda toolkit and not for the cuda drivers. (It is also somewhat dated, so it does not apply in all cases.)

Depending upon the hardware and kernels involved, I have had some problems with the latest nvidia drivers and cuda.
On my main system I have an RTX 3050 GPU and am able to use the latest drivers (545.29.06) and cuda 12.3.
On my second system I have 2 GTX 1050 GPUs and they will not run with cuda 12.3, so that system is downgraded to the nvidia 535.129.03 driver with cuda 12.2.

Use the latest driver that works for you.
The 535.129.03 driver is in the rpmfusion-nonfree repo and the 545.29.06 driver is in both the rpmfusion-nonfree-updates and the rpmfusion-nonfree-nvidia-driver repos.

If you note the date when my instructions were posted, it was before F39 was released, and the versions in the repos have been updated since. However, if all you are doing is running ordinary apps that use cuda, my instructions are still 100% valid.

The link you posted is for the cuda toolkit, not for the cuda drivers that work with the nvidia drivers. Unless you specifically need the toolkit the package xorg-x11-drv-nvidia-cuda is probably all you need and that comes from the rpmfusion repos.

I have apps that run cuda and I have never required the toolkit be installed.

https://rpmfusion.org/Howto/CUDA?highlight=(\bCategoryHowto\b)#RPM_Fusion_CUDA

shows this

Community repositories
RPM Fusion CUDA
This repository aims to receive content dedicated for CUDA and is built with the official cuda releases. Only available for Fedora (latest supported CUDA release) so far and is still a work in progress...

and this

Which driver Package
Both "CUDA" and "RPM Fusion" repositories provide the nvidia driver packages. Unfortunately, the packaging method is way too different and can conflict.
We recommend using the public, community-based packaging method (RPM Fusion) and avoiding the NVIDIA packaged nvidia-driver. From time to time, NVIDIA uses a non-publicly released driver, so you will have to wait for a public driver for the RPM Fusion counterpart...

Hmm. I didn’t notice it when installing the basic Nvidia drivers, but there is a Cuda section there:

https://rpmfusion.org/Howto/NVIDIA#CUDA

Which is more or less what you said. That said, I don’t understand some things. The Cuda HOWTO I mentioned earlier disables what I assume is the regular Nvidia driver, and when I searched to see why it would do that all the hits basically say it’s going to install its own cuda driver:

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/cuda-fedora37.repo
sudo dnf clean all
sudo dnf module disable nvidia-driver
sudo dnf -y install cuda

Is nvidia-driver the name of the driver from Nvidia? None of the ones I have from RPM Fusion have that name. That would make more sense than what I assumed.

The other thing I don’t understand is the mention of the “rpmfusion-nonfree-nvidia-driver” repo. The Nvidia HOWTO didn’t have me install that repo, and I don’t see it mentioned elsewhere. And dnf has the xorg-x11-drv-nvidia-cuda package already, so it must be in the repos I installed. I’m new to Fedora; is there some context or terminology I’m just missing? If so, educate me.

In any event, it sounds like the basic idea is to install that package, and possibly the toolkits if there is a specific need for them (actually the one who might need them is my son, who is studying AI somewhat obsessively). Mainly the idea is to run pytorch, and I’m not sure the toolkits are of use there. But I’d like to get the rest of it straight, because I’m clearly missing some things I should know now that I’m on Fedora.
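
On the pytorch point, one way to find out whether the driver alone is enough is simply to ask pytorch. A minimal sketch, assuming pytorch was installed from the official pip wheels, which as far as I know bundle their own CUDA runtime libraries; the helper name check_torch_cuda is made up:

```shell
# check_torch_cuda is a hypothetical helper: it prints one line saying whether
# PyTorch can see a CUDA device through the installed driver.
check_torch_cuda() {
    if ! command -v python3 >/dev/null 2>&1; then
        echo "python3 not found"
        return 0
    fi
    # torch.cuda.is_available() returns True when a usable CUDA device is found.
    python3 -c '
try:
    import torch
except Exception:
    print("torch not importable")
else:
    print("cuda available" if torch.cuda.is_available() else "cuda not available")
' 2>/dev/null
}

# Typical use:
#   check_torch_cuda
```

If this reports that cuda is available with only the RPM Fusion driver packages installed, the toolkit is probably not needed for that use.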

No
That is a module name that comes from installing from the cuda-fedoraNN repo and conflicts with installing from the rpmfusion repo.

For consistency, please try to choose only one source for the nvidia drivers and cuda. The packages from rpmfusion are specifically tweaked and packaged to work with fedora, and mixing in packages from other sources often causes further problems.

You can be assured that if you only install from the rpmfusion repos, conflicts will be extremely rare or non-existent, and if you use other repos, conflicts are common. (Negativo, nvidia, cuda-fedoraNN, etc. are only a few of those.)
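
To see whether a system already mixes sources, the installed package list can be filtered by repo. A rough sketch, assuming the three-column layout that dnf list installed normally prints (package, version, source repo); wrapped or unusual lines are skipped rather than guessed at. The helper name nvidia_pkg_sources is made up:

```shell
# nvidia_pkg_sources is a hypothetical helper: it reads `dnf list installed`
# output on stdin and prints "<repo> <package>" for every package whose name
# mentions nvidia or cuda, sorted so packages from the same repo sit together.
nvidia_pkg_sources() {
    awk 'NF == 3 && $1 ~ /nvidia|cuda/ { sub(/^@/, "", $3); print $3, $1 }' | sort
}

# Typical use:
#   dnf list installed | nvidia_pkg_sources
```

If more than one repo shows up in the first column, that is the mixing described above.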

You are worrying about things that are non-existent as long as you do not dive into trying more than you understand. Documentation is often out of date and may be misleading with the rapid development pace of fedora. In fact the instructions on the link you provided only show the cuda-fedora37 repo and I believe that was the last version for which that repo was available. The cuda-fedora repos are no longer being provided AFAICT. The rpmfusion repo has taken over the task.

I haven’t installed any non-Fedora/RPM Fusion repos myself except for one package from copr (which I imagine I could have just built myself, but I’m trying to learn how Fedora works so I picked one package (nothing to do with Cuda) to be my test case for copr). I’m just trying to learn what the right way to install Cuda is before I install it, and understand why it’s the right way. Also, my son already did install it, probably from the Nvidia repos, and I may need to understand enough so he can back it out if he needs to. He’s more impulsive than I am and hasn’t had to fix as many problems as I have from installing first and researching later. He’ll learn…


Have you heard the term “analysis paralysis”?

That is a situation where someone does nothing because they always wonder “what if?” and are afraid to move forward.

Look at it this way. The worst that can happen is you break the system and have to reinstall. It costs nothing but time and effort, especially if you ensure you have your data backed up.

Dive in and make the effort. Learn by doing, not by overanalyzing everything.

If you read 10 different how-to’s there will probably be 10 different ways to achieve the same goal. Which is right? Maybe all are correct but use a different path to reach the end.

I’ve run Linux since either Yggdrasil or Slackware 1 or so, and I have fixed my share of stuff I broke by just doing things. I learned enough care to get me through writing X11 mode lines by hand and at this point I know how to do things to best satisfy myself. I like research more than fixing breakage, so that’s how I do it. I appreciate the concern, however.