NVIDIA in Docker containers seems to be broken

Hello, I am currently trying to set up Jellyfin on my Fedora 42 server machine. I would like to use my NVIDIA GTX 1050 Ti for video transcoding; however, I am hitting a very frustrating wall.

A TL;DR is available at the bottom of this post, as I am going to try to give as many details as possible here.

The machine this is running on is a QEMU virtual machine with Secure Boot enabled and PCIe passthrough for the GTX 1050 Ti. The NVIDIA proprietary drivers were installed from the RPM Fusion nonfree repo, following their instructions for both installing the NVIDIA drivers and setting up Secure Boot. To be clear, the VM itself has Secure Boot enabled, and the host machine (which is also Fedora 42) has Secure Boot enabled as well.
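In case it helps with debugging, the basic checks for that setup (Secure Boot state, GPU binding, kernel module) are something like:

# Confirm Secure Boot is active inside the VM
mokutil --sb-state

# Confirm the passed-through GPU is visible and bound to the nvidia driver
lspci -nnk | grep -iA3 nvidia

# Confirm the kernel module actually loaded
lsmod | grep nvidia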

user@localhost:~$ nvidia-smi
Tue Aug 19 06:38:25 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.05              Driver Version: 575.64.05      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1050 Ti     On  |   00000000:05:00.0 Off |                  N/A |
| 51%   27C    P8            N/A  /   75W |       3MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |

The drivers appear to be installed fine. My problem starts with trying to pass the GPU through to Docker. To do this, you have to use the nvidia-container-toolkit package. I first tried the official NVIDIA-provided package, but got this error:

user@localhost:~$ docker run --rm --runtime=nvidia --security-opt=label=disable ubuntu:latest nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "nvidia-smi": executable file not found in $PATH: unknown

Run 'docker run --help' for more information

The first thing I tried was putting SELinux in permissive mode. That was a shot in the dark, since it hadn’t complained about anything, but I wanted to be sure; the error didn’t change. I then stumbled upon this Reddit thread where a user says to use the official NVIDIA CUDA drivers, so I gave that a shot. It did nothing and I still got the error. I then found this COPR repo that’s supposed to package nvidia-container-toolkit for Fedora. So I uninstalled nvidia-container-toolkit and installed @ai-ml/nvidia-container-toolkit and nvidia-container-toolkit-selinux, only to find it gave the same issue, and someone in the comments had a very similar problem with no solution ever documented. (The thread also got closed shortly after that user made the comment.) Linked in that COPR repo, though, is another package, golang-github-nvidia-container-toolkit. So I uninstalled the COPR nvidia-container-toolkit and tried THAT package instead, which made me even more confused, because the error I get from golang-github-nvidia-container-toolkit looks as if the package doesn’t include the Docker nvidia runtime at all:

user@localhost:~$ docker run --rm --runtime=nvidia --security-opt=label=disable ubuntu:latest nvidia-smi
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/moby/68421f2ef084e9def43045155b074a69fe4ad7709b318686313f1da5e1f3de7f/log.json: no such file or directory): exec: "nvidia-container-runtime": executable file not found in $PATH: unknown
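For anyone comparing notes, the things worth checking at this point (assuming the toolkit is supposed to ship the nvidia runtime binary and register it with Docker) are roughly:

# Is the runtime binary actually installed and on the PATH?
which nvidia-container-runtime nvidia-ctk nvidia-container-cli

# Does Docker know about an "nvidia" runtime?
sudo docker info | grep -iA3 runtimes

# nvidia-ctk runtime configure writes this file; the nvidia runtime entry
# should point at nvidia-container-runtime
cat /etc/docker/daemon.json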

I’ve also found this thread, which looks to be almost my issue verbatim, but they never got a solution either. (My entire workflow is based on Docker, and switching one or two containers to Podman will throw me off when doing maintenance, so I’m really trying to stay on Docker.)

I’m not sure if there’s a really easy step I missed somewhere along the way, and my search engine skills are failing me, but I am at a complete loss as to where to go from here.

So, TL;DR: I’m 99.999% positive I’ve installed the NVIDIA drivers correctly, so why are Docker containers saying unable to start container process: error during container init: exec: "nvidia-smi": executable file not found in $PATH: unknown?

Thanks in advance

Hey, I don’t use Docker (I use Podman), but as a first step please run your container with just bash and check whether the NVIDIA driver libs made it into your container:

ls -la /usr/lib64 | grep nvidia

You could also try the manual way of passing the devices to Docker and installing the driver into your container (sketched below). Just make sure the driver versions match.
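Roughly something like this; the device nodes can differ between systems, so treat it as a starting point rather than a recipe:

docker run --rm -it \
  --device /dev/nvidiactl \
  --device /dev/nvidia0 \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia-uvm-tools \
  ubuntu bash
# then install (or bind-mount) the same driver userspace version as the host inside the container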

Also, going by the documentation, you are missing the --gpus flag to actually pass the GPU, and they run it with sudo:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Finally, this is how I install the toolkit from NVIDIA:

curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

sudo dnf install nvidia-container-toolkit -y
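After that, the same guide has you register the runtime with Docker (this is the step that writes the nvidia entry into /etc/docker/daemon.json), so don’t skip it:

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker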

Please refer to the install instructions in this guide.

Hi! Thanks for your response. I did actually follow the official NVIDIA guide for installing and configuring nvidia-container-toolkit at first. At some point I guess I got the test command wrong, though, and it stuck. This is the result I’m seeing from your suggested command within the container:

user@localhost:~$ sudo docker run --rm --runtime=nvidia --gpus all -it ubuntu bash
root@a792d663e792:/# ls -la /usr/lib64 | grep nvidia
root@a792d663e792:/# 

It looks like it doesn’t find anything.

Can you try Podman real quick just to make sure? Or even just

toolbox create

to run a Fedora container with NVIDIA and check whether it’s Docker or the NVIDIA toolkit that’s failing.
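If you do end up testing with Podman and the toolkit is installed on the host, the CDI route is the documented one; roughly:

# generate the CDI spec from the installed driver
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# then request the GPU via CDI
podman run --rm --device nvidia.com/gpu=all --security-opt=label=disable ubuntu nvidia-smi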

I have been experimenting for the past couple of days and I am genuinely not sure what was going on. I feel like something specific was wrong with my Docker install, but I couldn’t figure it out. I removed the /var/lib/docker and /etc/docker directories and reinstalled Docker, I reinstalled the driver, I reinstalled the toolkit; I have no idea what it was, but I could not get it to work in the same virtual machine. When I created a new virtual machine from scratch and ran the commands in the following order, it finally worked. Afterwards I moved the volumes directory back over and everything was functional.

## Install RPMFusion Package
sudo dnf install https://mirrors.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm https://mirrors.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-$(rpm -E %fedora).noarch.rpm

## Enable the fedora-cisco-openh264 repo (per the RPM Fusion instructions)
sudo dnf config-manager setopt fedora-cisco-openh264.enabled=1

# Update
sudo dnf update -y
sudo systemctl reboot

# Secure boot custom key
## Install deps
sudo dnf install kmodtool akmods mokutil openssl
## Generate key
sudo kmodgenca -a
## Enroll MOK  & set password
sudo mokutil --import /etc/pki/akmods/certs/public_key.der
## Reboot to complete enrollment
sudo systemctl reboot

# Install NVIDIA driver
sudo dnf update -y # and reboot if you are not on the latest kernel
sudo dnf install akmod-nvidia -y # rhel/centos users can use kmod-nvidia instead
sudo dnf install xorg-x11-drv-nvidia-cuda -y #optional for cuda/nvdec/nvenc support
## wait 5-10 minutes
modinfo -F version nvidia
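## should print the driver version; in my case it reported 575.64.05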

## if successful
sudo systemctl reboot
nvidia-smi

# Install docker CE
sudo dnf -y install dnf-plugins-core
sudo dnf-3 config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo systemctl enable --now docker
sudo docker run hello-world

# Install nvidia container toolkit
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
sudo dnf install -y \
      nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker


# Test nvidia
sudo docker run --rm --runtime=nvidia --security-opt=label=disable --gpus all ubuntu nvidia-smi
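And since the whole point was Jellyfin: once that test passes, the same flags carry over to the actual container. A rough example, using the official jellyfin/jellyfin image with placeholder volume paths:

sudo docker run -d --name jellyfin \
  --runtime=nvidia --gpus all \
  --security-opt=label=disable \
  -p 8096:8096 \
  -v /srv/jellyfin/config:/config \
  -v /srv/media:/media \
  jellyfin/jellyfin
# then enable NVENC hardware acceleration under Dashboard > Playback > Transcoding in Jellyfin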