Fedora 41, nvidia-open, and dkms

I’m writing this here because of nvidia-open/570.86.15 issues on Fedora 41. When dnf upgrade installs a new kernel, dkms automatically rebuilds nvidia-open. That’s expected.

To be clear, I originally installed via the official NVIDIA CUDA Installation Guide for Linux that’s located here:

I use the remote RPM repository as described there (yes, there’s an official Fedora 41 repo despite what the docs say):

It basically boils down to this …

sudo dnf config-manager addrepo --from-repofile=https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
sudo dnf clean all
sudo dnf -y install cuda-toolkit-12-8
sudo dnf -y install nvidia-open

… and everything works great the first time.

However, by default (without an explicit value for KERNEL_UNAME), the included NVIDIA Makefile falls back to uname -r, and that causes problems when a dnf upgrade installs a new kernel and dkms runs immediately afterward: the modules that get built are for the running kernel, not the newly installed one.
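To see the mismatch for yourself right after an upgrade (before rebooting), compare the running kernel with the newest installed kernel. This is just a quick hand-check, and it assumes a stock Fedora kernel package with kernel-devel installed:

# what the Makefile falls back to: the currently running kernel
uname -r

# what dkms should actually be building against: the newest installed kernel
rpm -qa kernel | sed -e 's/^kernel-//g' | sort -uV | tail -1

# the kernel source trees available under /usr/src/kernels
ls /usr/src/kernels/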

That is evident in the /var/lib/dkms/nvidia-open/*/*/*/log/make.log file(s). Notice it enters the ‘wrong’ /usr/src/kernels directory, e.g.:

DKMS (dkms-3.1.5) make.log for nvidia-open/570.86.15 for kernel 6.12.15-200.fc41.x86_64 (x86_64)
Fri Feb 21 09:50:50 AM EST 2025
Cleaning build area
# command: 'make' clean
make -C src/nvidia clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia'
rm -f -rf _out/Linux_x86_64
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia'
make -C src/nvidia-modeset clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia-modeset'
rm -f -rf _out/Linux_x86_64
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia-modeset'
make -C kernel-open clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/kernel-open'
rm -f -r conftest
make[2]: Entering directory '/usr/src/kernels/6.12.10-200.fc41.x86_64'
make[2]: Leaving directory '/usr/src/kernels/6.12.10-200.fc41.x86_64'
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/kernel-open'
...

Subsequent reboots into the new kernel fail to load the module, e.g.:

kernel: nvidia: version magic '6.12.10-200.fc41.x86_64 SMP preempt mod_unload ' should be '6.12.11-200.fc41.x86_64 SMP preempt mod_unload '
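A quick way to confirm the mismatch without rebooting again is to inspect the vermagic of the modules dkms installed for the new kernel. This is only a rough sketch; the exact paths and module compression under /lib/modules can vary:

# hypothetical check: the vermagic printed here should match the kernel
# directory the modules live under; in the broken case it shows the older,
# still-running kernel instead
find /lib/modules/6.12.11-200.fc41.x86_64 -name 'nvidia*.ko*' -exec modinfo -F vermagic {} \;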

Here’s a short work-around (for Fedora) that leverages KERNEL_UNAME and properly builds nvidia-open for the ‘latest’ kernel.

After a dnf upgrade, a variation of the following will work as root.

# do this as root
sudo su -

# fyi, version sort installed kernels
rpm -qa kernel | sed -e 's/^kernel-//g' | sort -uV

# fyi, version sort installed dkms module/module-version and kernel/arch
dkms status | sort -uV

export CURRENT_KERNEL="$(uname -r)"; echo "CURRENT_KERNEL=${CURRENT_KERNEL}"
export LATEST_KERNEL="$(rpm -qa kernel | sed -e 's/^kernel-//g' | sort -uV | tail -1)"; echo LATEST_KERNEL=${LATEST_KERNEL} # only matches uname -r if you've already rebooted into the newest kernel

# example rebuild; pick one
export KERNEL_UNAME="6.12.11-200.fc41.x86_64"
export KERNEL_UNAME=${CURRENT_KERNEL}
export KERNEL_UNAME=${LATEST_KERNEL}
echo KERNEL_UNAME=${KERNEL_UNAME}

# set proper values for dkms build, install, etc
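# (for reference, dkms 3.x status lines look roughly like
#   nvidia-open/570.86.15, 6.12.15-200.fc41.x86_64, x86_64: installed
# so the awk pipelines below split on commas to pull out the module/version,
# the kernel, and the arch)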
export DKMS_ARCH="$(dkms status | grep ${KERNEL_UNAME}, | awk -F, '{print $3}' | awk '{print $1}' | awk -F: '{print $1}')"
export DKMS_KERNEL="$(dkms status | grep ${KERNEL_UNAME}, | awk -F, '{print $2}' | awk '{print $1}')" # should be the same as KERNEL_UNAME
export DKMS_MODULE_VERSION="$(dkms status | grep ${KERNEL_UNAME}, | awk -F, '{print $1}' | awk '{print $1}')"

# manually verify values
echo DKMS_ARCH=${DKMS_ARCH}
echo DKMS_KERNEL=${DKMS_KERNEL}
echo DKMS_MODULE_VERSION=${DKMS_MODULE_VERSION}

# NOTICE! Using the LATEST_KERNEL value is easiest/safest.
# NOTICE! If you're booted with the latest kernel and the modules ARE NOT loaded, then properly rebuilding may immediately load the correct signed module and will likely reset a graphical session.
# IMPORTANT! If you're booted from the kernel you want to 'fix' then do this in a tmux, screen, or from the linux console.
KERNEL_UNAME=${DKMS_KERNEL} dkms uninstall ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}
KERNEL_UNAME=${DKMS_KERNEL} dkms build ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH} --force
KERNEL_UNAME=${DKMS_KERNEL} dkms install ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}
KERNEL_UNAME=${DKMS_KERNEL} dkms status ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}

# verify the build make.log enters the correct /usr/src/kernels directory ...
less /var/lib/dkms/nvidia-open/*/${DKMS_KERNEL}/${DKMS_ARCH}/log/make.log
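# a correct rebuild enters the matching source tree, i.e. the log should now show
#   make[2]: Entering directory '/usr/src/kernels/${DKMS_KERNEL}'
# rather than the running kernel's tree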

systemctl reboot

Defaulting to uname -r in the Makefile seems a bit too convenient a shortcut for a dynamic kernel module framework.

This has been happening for some time (I traced my own issues back for over a year, into f40) and there are SO many threads on similar issues. Still, I couldn’t find any that really detail what I’m trying to say here or any decent workarounds. I decided to spend the morning on this issue. I’m considering submitting a PR or bug report to Nvidia, too.

In the meantime, I posted this here, with a bunch of keywords and strings, hoping it would be found more easily and help someone else. Good luck.


I opened this issue, too

The rpmfusion developers package the nvidia “open” drivers for Fedora and make sure that they work. We recommend you use their version of the driver rather than the nvidia.com drivers, which many people have reported issues with.

Thanks. Personally, I’ve had more problems with rpmfusion and dnf update than with the official Nvidia drivers, especially over the past year or so. Nvidia documentation & support has improved greatly.

I think if you don’t specify the kernel version, dkms is supposed to (re)build the module for both the current kernel and the newest kernel.

Excerpted from github.com – dell/dkms:

# Here we look for the most recent kernel so that we can
# build the module for it (in addition to doing it for the
# current kernel).

It looks like dkms is using a reasonably reliable method for determining what is the newest kernel:

# Get the most recent kernel in Rhel based systems.
_get_newest_kernel_rhel() {
    rpm -q --qf="%{VERSION}-%{RELEASE}.%{ARCH}\n" --whatprovides kernel | tail -n 1
}

Thanks. Mine is working fine again. I was just trying to share a workaround here for others faced with the same issue. There’s a bug (or two) with the official f41 CUDA rpms, where the dkms.conf that gets used doesn’t pass KERNEL_UNAME to the Makefile. By default, the Makefile falls back to uname -r, which causes it to compile the modules against the sources of the running kernel (rather than the newly installed kernel after a dnf update).
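For context, dkms exports the target kernel to dkms.conf as $kernelver, so the fix amounts to forwarding that value to the Makefile. A rough sketch of the idea (not the exact packaged dkms.conf; the real flags and targets may differ):

# sketch only: pass dkms's ${kernelver} through as KERNEL_UNAME so the build
# stops falling back to uname -r
MAKE[0]="make -j$(nproc) modules KERNEL_UNAME=${kernelver}"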

I put more detail here, too: [570][Fedora41] dkms incorrectly builds updated kernels with the current kernel's source · Issue #10 · NVIDIA/yum-packaging-nvidia-driver · GitHub

Hoping they fix it one way or another. In the meantime, I’ve changed the Makefile on my filesystem and dnf update works flawlessly again. dkms automatically compiles the drivers for the new kernel as it should.


That would only work reliably for RHEL/Fedora/CentOS kernels. On my system, for example, it wouldn’t return the latest kernel, since the rpm output isn’t version-sorted:

 ❯ rpm -q --qf="%{VERSION}-%{RELEASE}.%{ARCH}\n" --whatprovides kernel
6.13.2-cachyos1.fc41.x86_64
6.12.13-200.fc41.x86_64
6.13.3-cachyos1.fc41.x86_64
6.12.15-200.fc41.x86_64

Adding sort with -k, --key helps order the output and pick the latest kernel independently of whether it follows the RHEL format, but it’s not necessarily the definitive solution, since there are a few different kernels in COPR and, unfortunately, sort -V, --version-sort only works with the Fedora versioning pattern.

❯ rpm -q --qf="%{VERSION}-%{RELEASE}.%{ARCH}\n" --whatprovides kernel | sort -t "." -k4,4 -k1,1 -k2,2 -k3,3
6.12.13-200.fc41.x86_64
6.12.15-200.fc41.x86_64
6.13.2-cachyos1.fc41.x86_64
6.13.3-cachyos1.fc41.x86_64
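If parsing version strings is the sticking point, a hedged alternative (assuming the newest kernel is also the most recently installed one) is to let rpm order by install time instead:

# hypothetical alternative: most recently installed kernel provider,
# regardless of version-naming scheme
rpm -q --last --whatprovides kernel | head -1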

Good catch. That issue should probably be reported to the repo.

I already reported it to Nvidia upstream, and they’ve accepted & confirmed it will be fixed in the new 570 builds. Workarounds shouldn’t be needed much longer.


The new 570 spec (et al.) was published here a couple of days ago …