Hi,
I'm trying to run some machine learning pods using Podman on Fedora 42. After an upgrade, I can no longer generate the GPU helper (CDI) config:
╭─ikke at [gpubox.ikenet] in ~ 25-06-06 - 18:03:16
╰─$ dnf list '*nvidia*' --installed
Installed packages
akmod-nvidia.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
kmod-nvidia-6.14.4-300.fc42.x86_64.x86_64 3:570.153.02-1.fc42 @commandline
kmod-nvidia-6.14.9-300.fc42.x86_64.x86_64 3:570.153.02-1.fc42 @commandline
libnvidia-container-tools.x86_64 1.17.8-1 cuda-fedora41-x86_64
libnvidia-container1.x86_64 1.17.8-1 cuda-fedora41-x86_64
libva-nvidia-driver.x86_64 0.0.13^20250419gitc2860cc-1.fc42 updates
nvidia-container-toolkit.x86_64 1.17.8-1 cuda-fedora41-x86_64
nvidia-container-toolkit-base.x86_64 1.17.8-1 cuda-fedora41-x86_64
nvidia-gpu-firmware.noarch 20250509-1.fc42 updates
nvidia-modprobe.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
nvidia-persistenced.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
nvidia-settings.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.i686 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-cuda-libs.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-kmodsrc.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.i686 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-libs.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
xorg-x11-drv-nvidia-power.x86_64 3:570.153.02-1.fc42 rpmfusion-nonfree-nvidia-driver
╭─ikke at [gpubox.ikenet] in ~ 25-06-06 - 18:03:17
╰─$ nvidia-ctk cdi list
INFO[0000] Found 3 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=GPU-ddf9d362-ba80-d466-455b-b662f9ba5596
nvidia.com/gpu=all
╭─ikke at [gpubox.ikenet] in ~ 25-06-06 - 18:03:46
╰─$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
INFO[0000] Using /usr/lib64/libnvidia-ml.so.570.153.02
INFO[0000] Auto-detected mode as 'nvml'
ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: Driver/library version mismatch
And the kernel module on disk is the same version as most of the other software:
$ modinfo nvidia
filename: /lib/modules/6.14.9-300.fc42.x86_64/extra/nvidia/nvidia.ko
alias: char-major-195-*
version: 570.153.02
supported: external
license: NVIDIA
firmware: nvidia/570.153.02/gsp_tu10x.bin
firmware: nvidia/570.153.02/gsp_ga10x.bin
srcversion: 82F23DA6F1A39DF1BF2EC42
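Note that modinfo only reads the module file on disk; it does not show which version is actually loaded in the running kernel. A minimal sketch for comparing the two, assuming the loaded driver exposes its version at /sys/module/nvidia/version (the helper name compare_versions is made up for illustration):

```shell
#!/usr/bin/env bash
# Compare the nvidia module version loaded in the running kernel
# with the version of the module file installed on disk.

# compare_versions is a hypothetical helper, not part of any NVIDIA tool.
compare_versions() {
    local loaded="$1" ondisk="$2"
    if [ -z "$loaded" ]; then
        echo "nvidia module not loaded"
    elif [ "$loaded" = "$ondisk" ]; then
        echo "match: $loaded"
    else
        echo "MISMATCH: loaded=$loaded on-disk=$ondisk"
    fi
}

# Version of the module currently loaded (empty if it is not loaded).
loaded="$(cat /sys/module/nvidia/version 2>/dev/null)"
# Version of the module file that modinfo finds for the running kernel.
ondisk="$(modinfo -F version nvidia 2>/dev/null)"

compare_versions "$loaded" "$ondisk"
```

If this reports a mismatch, the running kernel may still hold an older module than the 570.153.02 libraries, which would be one possible explanation for the NVML error above.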
I have followed these guides:
So I have the 570 version of both the libraries and the kernel module. What gives?
BTW, everything was still working a while ago. Then I had some trouble updating packages: there were conflicts, and I had to remove some of them. But I've repeated the install steps just to make sure.