I'm writing this here because of nvidia-open/570.86.15 issues on Fedora 41. When dnf upgrade installs a new kernel, dkms automatically builds nvidia-open under the running kernel. That's expected.
To be clear, I originally installed via the official NVIDIA CUDA Installation Guide for Linux that’s located here:
I used the remote RPM repository as described there (yes, there’s an official Fedora 41 repo despite what the docs say):
It basically boils down to this …
sudo dnf config-manager addrepo --from-repofile https://developer.download.nvidia.com/compute/cuda/repos/fedora41/x86_64/cuda-fedora41.repo
sudo dnf clean all
sudo dnf -y install cuda-toolkit-12-8
sudo dnf -y install nvidia-open
… and everything works great the first time.
However, by default (without an explicit value for KERNEL_UNAME), the included NVIDIA Makefile uses uname -r, and that causes problems when a dnf upgrade installs a new kernel and dkms runs immediately afterwards. The modules that get built are for the running kernel, not the newly installed one.
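To make that fallback concrete, here is a minimal shell sketch of the same "use KERNEL_UNAME if it is set, otherwise uname -r" logic. The function name resolve_kernel_uname is hypothetical and only mirrors the behaviour described above; it is not copied from the NVIDIA Makefile.

```shell
# Hypothetical helper (not part of the NVIDIA sources): mimics the default
# described above. A non-empty KERNEL_UNAME wins; otherwise we fall back to
# the running kernel, which is the wrong target right after a kernel upgrade.
resolve_kernel_uname() {
    printf '%s\n' "${KERNEL_UNAME:-$(uname -r)}"
}

# With nothing set, the running kernel is chosen.
echo "default target:  $(unset KERNEL_UNAME; resolve_kernel_uname)"

# With an explicit value, the build would target that kernel instead.
echo "explicit target: $(KERNEL_UNAME=6.12.11-200.fc41.x86_64 resolve_kernel_uname)"
```

The point is only that nothing in this default knows about a kernel that was installed but not yet booted, so the caller has to pass it in.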
The mismatch is evident in the /var/lib/dkms/nvidia-open/*/*/*/log/make.log file(s). Notice that the build enters the ‘wrong’ /usr/src/kernels directory, i.e.:
DKMS (dkms-3.1.5) make.log for nvidia-open/570.86.15 for kernel 6.12.15-200.fc41.x86_64 (x86_64)
Fri Feb 21 09:50:50 AM EST 2025
Cleaning build area
# command: 'make' clean
make -C src/nvidia clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia'
rm -f -rf _out/Linux_x86_64
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia'
make -C src/nvidia-modeset clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia-modeset'
rm -f -rf _out/Linux_x86_64
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/src/nvidia-modeset'
make -C kernel-open clean
make[1]: Entering directory '/var/lib/dkms/nvidia-open/570.86.15/build/kernel-open'
rm -f -r conftest
make[2]: Entering directory '/usr/src/kernels/6.12.10-200.fc41.x86_64'
make[2]: Leaving directory '/usr/src/kernels/6.12.10-200.fc41.x86_64'
make[1]: Leaving directory '/var/lib/dkms/nvidia-open/570.86.15/build/kernel-open'
...
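Checking a make.log by eye gets old quickly, so here is a rough sketch that compares the kernel named in the DKMS header line with the /usr/src/kernels directories the build actually entered. The function name check_make_log is hypothetical, and it assumes the header and "Entering directory" lines look like the excerpt above.

```shell
# Hypothetical helper: report whether a DKMS make.log entered a
# /usr/src/kernels directory other than the kernel named in its header.
# Assumes the log layout shown in the excerpt above.
check_make_log() {
    log="$1"
    # Kernel that DKMS was asked to build for, taken from the header line.
    target="$(sed -n 's/^DKMS .* for kernel \([^ ]*\) .*/\1/p' "$log" | head -n 1)"
    if [ -z "$target" ]; then
        echo "no DKMS header found in $log" >&2
        return 2
    fi
    # Every kernel source tree the build actually entered.
    entered="$(sed -n "s|.*Entering directory '/usr/src/kernels/\([^']*\)'.*|\1|p" "$log" | sort -u)"
    if [ -z "$entered" ]; then
        echo "no /usr/src/kernels directory entered in $log" >&2
        return 2
    fi
    status=0
    for k in $entered; do
        if [ "$k" != "$target" ]; then
            echo "MISMATCH: built against $k, expected $target"
            status=1
        fi
    done
    if [ "$status" -eq 0 ]; then
        echo "OK: build entered only $target"
    fi
    return "$status"
}

# Example use (path pattern taken from the post):
#   for f in /var/lib/dkms/nvidia-open/*/*/*/log/make.log; do check_make_log "$f"; done
```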
After rebooting into the new kernel, the module fails to load, e.g.:
kernel: nvidia: version magic '6.12.10-200.fc41.x86_64 SMP preempt mod_unload ' should be '6.12.11-200.fc41.x86_64 SMP preempt mod_unload '
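You don't have to reboot to find this out. As a small sketch, the check the kernel performs can be approximated by comparing a module's vermagic string with the kernel release you intend to boot. The helper name vermagic_matches is hypothetical; the modinfo call in the comment assumes a module named nvidia is installed for the target kernel.

```shell
# Hypothetical helper: succeed only if a vermagic string begins with the
# kernel release the module is supposed to belong to.
#   $1 = vermagic string, $2 = kernel release
vermagic_matches() {
    case "$1" in
        "$2 "*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Values copied from the error message above.
built="6.12.10-200.fc41.x86_64 SMP preempt mod_unload "
wanted="6.12.11-200.fc41.x86_64"
if vermagic_matches "$built" "$wanted"; then
    echo "vermagic matches $wanted"
else
    echo "vermagic does NOT match $wanted"
fi

# On a real system (assumption: the module is named nvidia):
#   vermagic_matches "$(modinfo -k "$wanted" -F vermagic nvidia)" "$wanted" \
#       || echo "nvidia module was built for a different kernel"
```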
Here’s a short workaround (for Fedora) that leverages KERNEL_UNAME and properly builds nvidia-open for the ‘latest’ kernel. After a dnf upgrade, a variation of the following will work as root.
# do this as root
sudo su -
# fyi, version sort installed kernels
rpm -qa kernel | sed -e 's/^kernel-//g' | sort -uV
# fyi, version sort installed dkms module/module-version and kernel/arch
dkms status | sort -uV
export CURRENT_KERNEL="$(uname -r)"; echo "CURRENT_KERNEL=${CURRENT_KERNEL}"
export LATEST_KERNEL="$(rpm -qa kernel | sed -e 's/^kernel-//g' | sort -uV | tail -n 1)"; echo "LATEST_KERNEL=${LATEST_KERNEL}" # same format as uname -r output
# choose the kernel to rebuild for; run only ONE of the next three exports
export KERNEL_UNAME="6.12.11-200.fc41.x86_64"
export KERNEL_UNAME=${CURRENT_KERNEL}
export KERNEL_UNAME=${LATEST_KERNEL}
echo KERNEL_UNAME=${KERNEL_UNAME}
# set proper values for dkms build, install, etc
export DKMS_ARCH="$(dkms status | grep -F "${KERNEL_UNAME}," | awk -F, '{print $3}' | awk '{print $1}' | awk -F: '{print $1}')"
export DKMS_KERNEL="$(dkms status | grep -F "${KERNEL_UNAME}," | awk -F, '{print $2}' | awk '{print $1}')" # should be the same as KERNEL_UNAME
export DKMS_MODULE_VERSION="$(dkms status | grep -F "${KERNEL_UNAME}," | awk -F, '{print $1}' | awk '{print $1}')"
# manually verify values
echo DKMS_ARCH=${DKMS_ARCH}
echo DKMS_KERNEL=${DKMS_KERNEL}
echo DKMS_MODULE_VERSION=${DKMS_MODULE_VERSION}
# NOTICE! Using the LATEST_KERNEL value is easiest/safest.
# NOTICE! If you're booted with the latest kernel and the modules ARE NOT loaded, then properly rebuilding may immediately load the correct signed module and will likely reset a graphical session.
# IMPORTANT! If you're booted from the kernel you want to 'fix', then do this in tmux or screen, or from the Linux console.
KERNEL_UNAME=${DKMS_KERNEL} dkms uninstall ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}
KERNEL_UNAME=${DKMS_KERNEL} dkms build ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH} --force
KERNEL_UNAME=${DKMS_KERNEL} dkms install ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}
KERNEL_UNAME=${DKMS_KERNEL} dkms status ${DKMS_MODULE_VERSION} -k ${DKMS_KERNEL}/${DKMS_ARCH}
# verify the build make.log enters the correct /usr/src/kernels directory ...
less /var/lib/dkms/nvidia-open/*/${DKMS_KERNEL}/${DKMS_ARCH}/log/make.log
systemctl reboot
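If you end up doing this after every kernel update, the steps above can be wrapped up roughly like this. It is only a sketch: the function names and the DRY_RUN switch are hypothetical, and the sample status line is an assumed example of the "module/version, kernel, arch: state" layout that the awk pipelines above rely on. With DRY_RUN=1 it only prints the dkms commands, so you can check them before running anything as root.

```shell
# Hypothetical helpers; names and the DRY_RUN switch are illustrative only.

# Split one "module/version, kernel, arch: state" line into three
# space-separated fields: module/version, kernel release, arch.
parse_dkms_status_line() {
    printf '%s\n' "$1" | awk -F', *' '{ sub(/:.*/, "", $3); print $1, $2, $3 }'
}

# Run (or, with DRY_RUN=1, just print) one dkms command with KERNEL_UNAME set.
#   $1 = kernel release; remaining arguments are passed to dkms
run_dkms() {
    rd_kernel="$1"
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "KERNEL_UNAME=$rd_kernel dkms $*"
    else
        KERNEL_UNAME="$rd_kernel" dkms "$@"
    fi
}

# Same uninstall/build/install/status sequence as in the post.
#   $1 = module/version, $2 = kernel release, $3 = arch
rebuild_for_kernel() {
    run_dkms "$2" uninstall "$1" -k "$2/$3"
    run_dkms "$2" build "$1" -k "$2/$3" --force
    run_dkms "$2" install "$1" -k "$2/$3"
    run_dkms "$2" status "$1" -k "$2/$3"
}

# Dry-run demo on an assumed sample line (nothing is executed by dkms here).
sample="nvidia-open/570.86.15, 6.12.15-200.fc41.x86_64, x86_64: installed"
fields="$(parse_dkms_status_line "$sample")"
# Word splitting is intentional: the three fields become three arguments.
DRY_RUN=1 rebuild_for_kernel $fields
```

Dropping DRY_RUN=1 (as root) would run the real commands, so the same warnings about tmux, screen, or the Linux console apply.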
Defaulting to uname -r in the Makefile seems a bit too convenient for a dynamically built kernel module, since the running kernel isn't always the one being built for.
This has been happening for some time (I traced my own issues back over a year, into Fedora 40) and there are SO many threads on similar issues. Still, I couldn’t find any that really detail what I’m trying to say here, or any decent workarounds. I decided to spend the morning on this issue. I’m considering submitting a PR or bug report to NVIDIA, too.
In the meantime, I posted this here, with a bunch of keywords and strings, hoping it would be found more easily and help someone else. Good luck.