Trouble installing proprietary nvidia driver

I am wiping my fedora installation and starting over because I need to use my computer.

I didn’t make much progress on this installation so what I have now shouldn’t be off from my current backup.

Thank you @computersavvy and @ankursinha for helping me, it really meant a lot to me.

I am confused / concerned about this.

So after installing the Nvidia drivers and cuda via:

sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

One still has to install the official nvidia drivers from Nvidia?

That rpmfusion document at Howto/CUDA - RPM Fusion has this code:

sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/cuda-fedora35.repo
sudo dnf clean all
sudo dnf module disable nvidia-driver
sudo dnf -y install cuda

Am I supposed to run that even after running:

sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda

???

Absolutely not. The package from rpmfusion handles cuda quite nicely and I have never needed any other source for the cuda drivers. In fact, as I understand it, the rpmfusion cuda driver is built from the official nvidia packages and is kept up to date.

There have been several posts about conflicting cuda drivers and the fix, IIRC, has always been to remove all other packages and disable those repos while only using the one from rpmfusion.

That linked how-to is sadly out of date and has not been updated for quite some time.

First, thanks for your response.

Yes, if we are talking about the same linked how to (the one specific to CUDA), that seems to be for Fedora 35.

Just adding some info for future readers who searched for the same error message:

I had the same error messages about being unable to find the akmod package and unable to find CUDA:

none of the providers can be installed

And:

All matches were filtered out by modular filtering for argument: xorg-x11-drv-nvidia-cuda
Error: Unable to find a match: xorg-x11-drv-nvidia-cuda

Then I noticed that the error messages were saying something about being unable to find a dependency version >= (greater than or equal to) the required one. In other words, DNF cannot find a new enough version of the required dependencies.

So on a whim, I did sudo dnf list "*nvidia*"

To my horror, I saw the problem:

About half of the NVIDIA packages were absolutely ancient and were being selected from the NVIDIA Fedora 35 CUDA repository. (I was using Fedora 38. But NVIDIA tells you to add older repos if you want older CUDA Toolkits.)

I disabled that repository, and ran a new attempt to install the driver: sudo dnf install akmod-nvidia --refresh

Voila. It worked.
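
For anyone repeating that check, here is a minimal sketch of filtering `dnf list` output by its repo column, so the packages coming from the old CUDA repo stand out. It assumes the usual three-column layout (package, version, repo). The helper name `packages_from_repo` and the sample rows, including their version numbers, are made up for illustration.

```shell
#!/usr/bin/env bash

# Print "package version" for every row of `dnf list` output (read from stdin)
# whose third column, the repo id, equals the given repo id.
packages_from_repo() {
    local repo_id="$1"
    awk -v repo="$repo_id" 'NF == 3 && $3 == repo { print $1, $2 }'
}

# Hypothetical sample resembling `dnf list "*nvidia*"`; versions are illustrative only.
sample_output='Available Packages
akmod-nvidia.x86_64              3:545.29.06-1.fc39   rpmfusion-nonfree-nvidia-driver
kmod-nvidia-latest-dkms.x86_64   3:520.61.05-1.fc35   cuda-fedora35-x86_64
nvidia-driver.x86_64             3:520.61.05-1.fc35   cuda-fedora35-x86_64'

printf '%s\n' "$sample_output" | packages_from_repo cuda-fedora35-x86_64

# On a real system (not run here):
#   dnf list "*nvidia*" | packages_from_repo cuda-fedora35-x86_64
```

If anything is printed for the CUDA repo id, those are the packages competing with RPM Fusion's. DNF may wrap very long rows onto two lines, in which case this simple filter would miss them.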

So here is a warning: Anyone who installs older CUDA Toolkits by adding NVIDIA’s official CUDA repo to Fedora will run into this problem.

For example, I needed CUDA Toolkit 11 since Microsoft’s popular AI framework requires that version. So I installed NVIDIA’s Fedora 35 CUDA repo, since it was the only one that still shipped that old toolkit. Normally, that’s totally fine. You can easily install old toolkits from those repos, and they work just fine on Fedora 39+ and the latest driver. The issue is that NVIDIA also includes a bunch of driver packages inside their CUDA repos, whose names conflict with the RPM Fusion driver packages, which is why DNF doesn’t know how to resolve the conflict.

The installation instructions are indeed very outdated at RPM Fusion. They mention a command, sudo dnf module disable nvidia-driver, which is supposed to prevent conflicts between the driver repos. But that command doesn’t work anymore (the module with that name doesn’t exist). (Edit: Apparently it can exist on some systems, but not mine. Jeff below says that it’s related to having installed old NVIDIA drivers via their CUDA repo at some point.)

It’s clearly not a good idea to mix old NVIDIA CUDA repos with Fedora’s package manager, since DNF lacks support for “package pinning” (i.e. packages always updating via the same source they were installed from, and dependencies always preferring same-repo if possible, to prevent these kinds of conflicts).

It’s technically possible to edit the /etc/yum.repos.d/cuda-fedoraXX.repo file to add a line similar to exclude=package package1 someotherpackagewithanasteriskwildcard*, to filter out all old driver packages (things that RPM Fusion provides instead). But that’s just gonna be a hassle, and it adds annoying long-term maintenance in case package names change in the future.

But hey, if someone wants to do that, the process is as follows. First open the repo (such as the one I’m using for CUDA Toolkit 11):

https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/

Then look at the package names. In this example, it is easy to see that all the ancient driver packages are prefixed with kmod-* and nvidia-*. So it would just be a matter of adding exclude=kmod-* nvidia-* to the repo’s config file.

I can personally vouch that when I do sudo dnf list --installed | grep "@cuda" to look at what had been installed by CUDA 11, I see that all of the necessary packages are prefixed with cuda-* gds-* lib* nsight-*. So the old driver packages (nvidia-* and kmod-*) that NVIDIA put in their repo are totally useless for CUDA Toolkit users. Therefore, the exclude-line I mentioned in the previous paragraph should work fine, but I am not willing to waste time on testing it.
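
A small sketch of that prefix check, for anyone who wants to verify it on their own machine before adding the exclude line. It counts installed packages by the part of the name before the first hyphen. The helper name `prefix_summary` is hypothetical, and the sample rows only imitate the shape of `dnf list --installed` output (the version numbers are illustrative).

```shell
#!/usr/bin/env bash

# Read `dnf list --installed` style rows from stdin and print
# "<prefix> <count>", where prefix is the package name up to the first hyphen.
prefix_summary() {
    awk 'NF > 0 { split($1, parts, "-"); count[parts[1]]++ }
         END { for (p in count) print p, count[p] }' | sort
}

# Hypothetical sample rows; on a real system the input would come from dnf.
sample_installed='cuda-cudart-11-8.x86_64   11.8.89-1     @cuda-fedora35-x86_64
cuda-nvcc-11-8.x86_64     11.8.89-1     @cuda-fedora35-x86_64
libcublas-11-8.x86_64     11.11.3.6-1   @cuda-fedora35-x86_64'

printf '%s\n' "$sample_installed" | prefix_summary

# On a real system (not run here):
#   dnf list --installed | grep "@cuda" | prefix_summary
```

If no `kmod` or `nvidia` prefix shows up in the summary, nothing installed from the CUDA repo would be affected by excluding those two patterns.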

For my own sanity, I have instead decided to uninstall all custom, older CUDA Toolkits and only install “Fedora-approved” NVIDIA stuff directly from RPM Fusion. Furthermore, since I actually NEED older CUDA versions, I am going to research how to make podman containers which run older CUDA Toolkits. I’ve read that it is doable. I am never letting NVIDIA’s ancient, official repo touch my main OS again!

(Update: I’ve researched a bit more. It’s very easy to make docker/podman containers that use older CUDA Toolkit versions. NVIDIA provides different base-images for each CUDA version. You basically just have to write a container “compose” file which refers to the correct base-image that you want to run. So yeah, just search for articles about that online and have fun.)
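
As a rough illustration of the container route, the sketch below writes a tiny Containerfile that starts from one of NVIDIA's CUDA base images. It uses a plain Containerfile rather than a compose file, which is a simplification on my part. The image tag, the helper name `write_cuda11_containerfile`, and the image name `cuda11-test` are assumptions; check NVIDIA's published tag list before relying on them.

```shell
#!/usr/bin/env bash

# Assumption: this tag is published in NVIDIA's "cuda" repository on Docker Hub.
BASE_IMAGE="docker.io/nvidia/cuda:11.8.0-devel-ubuntu22.04"

# Write a minimal Containerfile into the given directory.
write_cuda11_containerfile() {
    local target_dir="$1"
    cat > "$target_dir/Containerfile" <<EOF
FROM ${BASE_IMAGE}
# Print the toolkit version so it is easy to confirm what the image ships.
CMD ["nvcc", "--version"]
EOF
}

workdir="$(mktemp -d)"
write_cuda11_containerfile "$workdir"
echo "Wrote $workdir/Containerfile"

# Not run here (needs podman and network access):
#   podman build -t cuda11-test -f "$workdir/Containerfile" "$workdir"
#   podman run --rm cuda11-test
# GPU access inside the container additionally needs the NVIDIA Container
# Toolkit on the host; with CDI configured, the documented form is roughly
#   podman run --rm --device nvidia.com/gpu=all cuda11-test nvidia-smi
```

The host then only needs the RPM Fusion driver, and the old toolkit lives entirely inside the image.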

So yeah… if you are getting these DNF errors, it means you have multiple repositories that provide the same conflicting package, and that DNF is selecting the older version of that package.

This note about modular filtering mostly means that a user has previously installed nvidia from the cuda repository, and that they must remove the installed module with dnf module remove nvidia-driver.

After this the message about modular filtering disappears.

The inability to update can be resolved by a few other steps.

  1. verify that the rpmfusion-nonfree-nvidia-driver repo is enabled by looking at the output of dnf repolist
    If that is enabled then do the following.
  2. sudo dnf remove \*nvidia\* --exclude nvidia-gpu-firmware to remove all the older packages
  3. sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda to install the new drivers at current versions
  4. wait at least 5 minutes then reboot.

If step 1 shows the rpmfusion repo is not enabled then it can be easily enabled by using the gnome software app and enabling it through the 3rd party repos from the hamburger menu in the upper right corner of that software window.
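
The steps above can be sketched as a small script. The repo check is written so it can be tried on sample text; the function that actually calls dnf is only defined, not run. The function names and the sample repolist text are made up for illustration, and the sketch assumes `dnf repolist` prints the repo id in the first column.

```shell
#!/usr/bin/env bash

# Succeed if the given repo id appears in the first column of
# `dnf repolist` output read from stdin.
repo_is_enabled() {
    local repo_id="$1"
    awk -v repo="$repo_id" '$1 == repo { found = 1 } END { exit !found }'
}

# Steps 1 to 4 from the post. Defined only; call it yourself on a real system.
reinstall_nvidia_from_rpmfusion() {
    if ! dnf repolist | repo_is_enabled rpmfusion-nonfree-nvidia-driver; then
        echo "rpmfusion-nonfree-nvidia-driver is not enabled; enable it first" >&2
        return 1
    fi
    sudo dnf remove '*nvidia*' --exclude nvidia-gpu-firmware
    sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
    echo "Wait at least 5 minutes for the kernel module build, then reboot."
}

# Hypothetical repolist text to demonstrate the check.
sample_repolist='repo id                           repo name
fedora                            Fedora 39 - x86_64
rpmfusion-nonfree-nvidia-driver   RPM Fusion for Fedora 39 - Nonfree - NVIDIA Driver'

if printf '%s\n' "$sample_repolist" | repo_is_enabled rpmfusion-nonfree-nvidia-driver; then
    echo "repo enabled"
fi
```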

NOTE:
Fedora 36 has been EOL for most of a year and updating to either F38 or F39 is strongly encouraged.

Thanks for the advice! That’s similar to what I did.

I uninstalled the old NVIDIA drivers (and excluded nvidia-gpu-firmware), then did the OS update via Nouveau drivers instead.

Edit: Oh and I have a big warning about that. After you uninstall NVIDIA drivers, you MUST reboot your machine once, before you begin upgrading to the next Fedora version! I didn’t do that. I just went “uninstall nvidia driver, upgrade to next fedora” all in one session. As a result of that, my NVIDIA driver was not fully uninstalled. I still had kernel boot args related to it. I think I may even still have had kernel modules related to it, because Nouveau was broken too (and yes I still had nvidia-gpu-firmware). As a result, my new Fedora version always booted to a black screen. Even editing kernel args in GRUB to re-enable nouveau didn’t work, which means that the machine was still trying to initialize the NVIDIA driver. The way I solved it was to edit kernel args to go to runlevel “3” (terminal mode) and in the terminal I re-installed the NVIDIA driver from RPM fusion. Then it booted.
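
A quick way to see whether driver-related kernel arguments are still around is to filter the kernel command line. This is only a sketch: the helper name `nvidia_kernel_args` is made up, the sample command line is hypothetical, and it assumes the leftover arguments mention "nouveau" or "nvidia" (such as the nouveau blacklist entries).

```shell
#!/usr/bin/env bash

# Read a kernel command line from stdin and print, one per line,
# every argument that mentions nouveau or nvidia.
nvidia_kernel_args() {
    tr ' ' '\n' | grep -E 'nouveau|nvidia' || true
}

# Hypothetical command line for demonstration.
sample_cmdline='root=/dev/mapper/fedora-root ro rhgb quiet rd.driver.blacklist=nouveau modprobe.blacklist=nouveau'

printf '%s\n' "$sample_cmdline" | nvidia_kernel_args

# On a real system (not run here):
#   nvidia_kernel_args < /proc/cmdline
# Leftover arguments could then be dropped with grubby, for example:
#   sudo grubby --update-kernel=ALL --remove-args="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau"
```

An empty result after uninstalling the driver and rebooting would suggest the kernel arguments are clean before starting the Fedora upgrade.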

When I got into Fedora 39 (in terminal mode to rescue the system, in my situation, as mentioned in the previous paragraph), I tried to install akmod-nvidia and was baffled that it still didn’t find “>= required version” of dependencies and refused to install. Which is when I did the “list all nvidia packages” thing and saw that many of them were being prioritized from NVIDIA’s Fedora 35 repo. So I disabled that repo, and was able to install the drivers, as mentioned.

The issue is that NVIDIA’s CUDA repo is a dumb mix of drivers + CUDA toolkit, rather than just providing CUDA itself. The other issue is that they choose the same package names as RPM Fusion.

I had a look at their repo via the package manager again today:

sudo dnf repoquery --disablerepo="*" --enablerepo="cuda-fedora35-x86_64" --queryformat "Name:%{name}-%{evr} Size:%{installsize} Summary:%{summary}"  --showduplicates

I can yet again confirm that, at least in this case, you can safely tell DNF to exclude every nvidia-* and kmod-* package from that repo. That will exclude all the outdated driver packages, and will also exclude a few non-essential applications that bear the same nvidia-* prefix.

The main reason people add old NVIDIA CUDA repos is to get access to older versions of CUDA Toolkit. Those packages all have other prefixes, so they would all properly install themselves even after excluding the driver package prefixes above.

Therefore, I decided to test my repo filtering idea, and can confirm that it works. Here’s an example /etc/yum.repos.d/cuda-fedora35.repo file with the exclude-line added, which tells DNF to completely ignore the driver packages from the CUDA repo. This fixes the repo package name conflicts (the repoquery command I listed earlier no longer sees the outdated driver packages).

[cuda-fedora35-x86_64]
name=cuda-fedora35-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/D42D0685.pub
exclude=kmod-* nvidia-*

The important line is the last one, the exclude=. I have only verified it for the CUDA Toolkit 11 repo above (the one NVIDIA calls “fedora 35”). But that line should work on all newer CUDA repos too.
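
For anyone who prefers to script that edit, here is a minimal sketch that appends the exclude line only if the file has none yet. It assumes a repo file with a single section, like the one above, and works on a temporary copy here. The helper name `add_driver_exclude` is hypothetical.

```shell
#!/usr/bin/env bash

# Append "exclude=kmod-* nvidia-*" to a single-section .repo file,
# unless it already has an exclude= or excludepkgs= line.
add_driver_exclude() {
    local repo_file="$1"
    if grep -Eq '^exclude(pkgs)?[[:space:]]*=' "$repo_file"; then
        echo "exclude line already present in $repo_file; edit it by hand" >&2
        return 0
    fi
    # Make sure the file ends with a newline before appending.
    if [ -s "$repo_file" ] && [ -n "$(tail -c 1 "$repo_file")" ]; then
        printf '\n' >> "$repo_file"
    fi
    printf 'exclude=kmod-* nvidia-*\n' >> "$repo_file"
}

# Demonstration on a temporary copy of the repo definition from the post.
demo_repo="$(mktemp)"
cat > "$demo_repo" <<'EOF'
[cuda-fedora35-x86_64]
name=cuda-fedora35-x86_64
baseurl=https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64
enabled=1
gpgcheck=1
gpgkey=https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/D42D0685.pub
EOF

add_driver_exclude "$demo_repo"
cat "$demo_repo"

# On a real system, run the function as root against
# /etc/yum.repos.d/cuda-fedora35.repo, then refresh the metadata.
```

Appending at the end only affects the last section, so a file that defines several repos would need the line placed by hand.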

I still don’t like installing old CUDA Toolkit natively anymore though, and will switch to their docker image compose method, where you can pick any CUDA Toolkit version base image you want. That method won’t infect the host OS with outdated packages.

As for the “nvidia-driver” DNF module, I don’t have it on my system. But thanks for clarifying what it’s for and that it can still exist on some systems.

Even simpler would be to disable that repo since it is way out of date anyway.
Change the line enabled=1 to read enabled=0 then do the install of the nvidia drivers from rpmfusion.

Do the same with disabling any remaining cuda-fedora repos as well.
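
A minimal sketch of that edit, tried here on a temporary file instead of the real one. The helper name `disable_repo_file` is made up, and the sketch assumes every section in the file should be disabled.

```shell
#!/usr/bin/env bash

# Rewrite every "enabled=1" line in a .repo file to "enabled=0".
disable_repo_file() {
    local repo_file="$1" tmp
    tmp="$(mktemp)"
    sed 's/^enabled[[:space:]]*=[[:space:]]*1[[:space:]]*$/enabled=0/' "$repo_file" > "$tmp" \
        && cat "$tmp" > "$repo_file"
    rm -f "$tmp"
}

# Demonstration on a temporary file.
demo_repo="$(mktemp)"
printf '[cuda-fedora35-x86_64]\nname=cuda-fedora35-x86_64\nenabled=1\ngpgcheck=1\n' > "$demo_repo"
disable_repo_file "$demo_repo"
cat "$demo_repo"

# On a real system, run it as root for each /etc/yum.repos.d/cuda-fedora*.repo file.
# With DNF 4 and the config-manager plugin, this should have the same effect:
#   sudo dnf config-manager --set-disabled cuda-fedora35-x86_64
```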

But NVIDIA doesn’t ship CUDA Toolkit 11 in the newer repositories.

Their newest repository only has “cuda-toolkit-12” packages (same version that RPM Fusion ships):

https://developer.download.nvidia.com/compute/cuda/repos/fedora37/x86_64/

I had to get their fedora 35 repo to get the required toolkit.

This “going back in time” method seems to be the official way to get older toolkits as native packages, and it’s what RPM Fusion recommended doing. :confused:

NVIDIA doesn’t give any other option when picking CUDA Toolkit 11 and Fedora:

You can see “cuda-toolkit-11” packages here:

https://developer.download.nvidia.com/compute/cuda/repos/fedora35/x86_64/

With rpmfusion it is neither recommended nor required to ever install the kmod-nvidia package directly on fedora. That package is locally created and installed by the akmod-nvidia package and akmods on the system.

kmod-nvidia is a package with no files but which pulls in akmod-nvidia. It probably exists for historical reasons.

The instructions for installing nvidia drivers are at Howto/NVIDIA - RPM Fusion

While I don’t have any problem with the instructions there, it does say
sudo dnf install akmod-nvidia # rhel/centos users can use kmod-nvidia instead
and historically some users have taken that to mean they should install the kmod-nvidia package in addition to the akmod-nvidia package – with a resulting conflict of drivers. I think the kmod-nvidia package pulls in prebuilt drivers which may not be compatible with the newer kernels in fedora, but I have not tested that myself.

There have been instances where simply removing the kmod-nvidia package eliminated the driver problems.

Yeah. But that was one of the conflicts between the repos. One of the driver upgrade errors was that “kmod-nvidia” was too old. Because instead of looking at the locally built and installed package, it looked at NVIDIA’s old CUDA repo.

So excluding kmod-* nvidia-* from NVIDIA’s CUDA repo is a good way to get rid of everything problematic while still being able to install the old CUDA Toolkit, without risking any driver conflicts.

As for how someone on Fedora ends up installing old NVIDIA repos, the process is simple:

  1. First the user reads the CUDA section of Howto/NVIDIA - RPM Fusion which says “Please have a look on the dedicated CUDA Howto”.
  2. Then they read the dedicated CUDA Toolkit section of RPM Fusion’s CUDA page: Howto/CUDA - RPM Fusion, which says “Please use the Official link: Link to NVIDIA’s website”.
  3. Then they go to NVIDIA’s CUDA Toolkit website: https://developer.nvidia.com/cuda-downloads
  4. Then they click “OS: Linux”, “Architecture: x86_64”, “Distribution: Fedora”, “Version: 37” (the only choice). They might even follow those instructions.
  5. Then they realize that there’s no CUDA Toolkit 11 in the newest NVIDIA CUDA repo. So they look again at NVIDIA’s CUDA Toolkit website.
  6. They see the link to “NVIDIA CUDA Toolkit Archive”: CUDA Toolkit Archive | NVIDIA Developer
  7. They pick the final release of CUDA Toolkit 11 (which is still required by very important AI projects like Microsoft ONNX for example): https://developer.nvidia.com/cuda-11-8-0-download-archive
  8. Again they click Linux, 64-bit, Fedora, and they see that the only choice is Fedora 35: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Fedora
  9. So they Google the question and see that NVIDIA engineers say it’s no problem to use the older repo since it’s just a bunch of library files, and they aren’t specifically versioned to Fedora 35 and they work with newer drivers too.
  10. So then the final question is how to install CUDA Toolkit 11. There are three choices: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Fedora&target_version=35
  11. One option is “runfile (local)” which is terrible because it’s impossible to uninstall it later since it just puts files all over the system, and it’s also a 5 gigabyte download which installs all kinds of extra, mostly-useless programs and libraries that you don’t want. Another option is “rpm (local)” which offers easy uninstall since RPM tracks its files, but is yet again an “all in one” bundle of 5 gigabytes of junk. The final option is “rpm (network)”, which is a repo that provides packages where you can pick exactly what you need.
  12. So of course you pick the network repo variant, and you are greeted with the instructions: https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Fedora&target_version=35&target_type=rpm_network
  13. The instructions didn’t look great, apart from the first line which adds the actual repo. The other lines talk about installing a DNF nvidia-driver module and installing CUDA itself. I didn’t want either of those. I just wanted the toolkit. So I just added the repo (the sudo dnf config-manager --add-repo ... line of their instructions).
  14. Then I ran sudo dnf install --refresh cuda-toolkit-11-8.
  15. And everything was fine until the actual RPM Fusion NVIDIA driver needed to update itself. Then DNF was looking at the outdated, identically-named driver packages of the old NVIDIA CUDA repo and failed to resolve dependencies.
  16. To fix that, edit /etc/yum.repos.d/cuda-fedora*.repo and add exclude=kmod-* nvidia-* to them, as mentioned here: Trouble installing proprietary nvidia driver - #28 by arcitec
  17. Now you have the native CUDA Toolkit library without any risk of RPM Fusion driver conflicts, since the old NVIDIA driver packages are filtered out.
  18. Swear a bit that NVIDIA always makes things complicated on Linux.