Thank you @brogos and @FranciscoD for your inputs. I tried to switch from podman to docker, but docker broke the networking of my QEMU/KVM setup. This is a known issue with a workaround. It also requires switching back to cgroups v1 as part of the docker installation. So I was not very satisfied with it.
Solution:
I was able to get podman to work using the links proposed by @brogos. The important steps are:
- Install the NVIDIA driver on the host. Currently it is at version 440.100 for Fedora 32 (a minimal install sketch follows this list).
- Install `nvidia-container-toolkit` following the instructions here. Note: you will get an error that Fedora 32 is an unsupported distribution, so just set `distribution=rhel8.2` (see the repo-setup sketch below).
- Edit `/etc/nvidia-container-runtime/config.toml` to set `no-cgroups = true` (the relevant excerpt is shown below). This stops the runtime from trying to set up device cgroups, which does not work with rootless podman.
- Whatever container image you want to run should match the CUDA version supported by the NVIDIA driver installed on the host. For driver 440.100 that is CUDA 10.2. The `nvidia/cuda:latest` docker image is already at CUDA 11, so it will not work; I was making this mistake at first.
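For reference, a minimal sketch of the driver install from RPMFusion (assuming the RPMFusion nonfree repo is already enabled; `xorg-x11-drv-nvidia-cuda` is what provides `nvidia-smi`):

```
sudo dnf install akmod-nvidia xorg-x11-drv-nvidia-cuda
```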
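The toolkit setup then looks roughly like this. I am paraphrasing the upstream instructions from memory, so treat the repo URL as an assumption and verify it against the linked guide:

```
# Pretend to be RHEL 8.2, since Fedora 32 is not in the supported list
distribution=rhel8.2
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo \
  | sudo tee /etc/yum.repos.d/nvidia-docker.repo
sudo dnf install nvidia-container-toolkit
```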
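After the edit, the relevant part of `/etc/nvidia-container-runtime/config.toml` should look like this (all other keys left at their defaults):

```
[nvidia-container-cli]
# ... other defaults unchanged ...
no-cgroups = true
```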
To test the installation you can run the following command, which uses `nvidia-smi`, provided on the host machine by the `xorg-x11-drv-nvidia-cuda` package from RPMFusion:
```
$ podman run -it --rm --security-opt=label=disable nvidia/cuda:10.2-base nvidia-smi
Sat Aug  1 15:43:00 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100      Driver Version: 440.100      CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 960M    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P8    N/A /  N/A |     36MiB /  2004MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
If you don’t get this output, you will most likely see some error message; check that you have followed all the steps correctly. All of this is based on the discussion in this GitHub issue.
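If you need to debug, one thing you can try (my addition, not from the linked discussion) is asking the container CLI that ships with the toolkit what it sees on the host; if this fails, the problem is on the host side rather than in podman:

```
nvidia-container-cli info
```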
Alternate solution:
I came across another solution, proposed by u/Abraxis_Dragon in this comment on r/Fedora. I find it the most hassle-free, since it uses Singularity containers. I followed these steps:
- Install NVIDIA drivers from RPMFusion as explained here.
- Install Singularity, which is available in the official repos: `sudo dnf install singularity`.
- Build the TensorFlow GPU docker image into a Singularity container: `singularity build mytensorflow.sif docker://tensorflow/tensorflow:latest-gpu`.
- Run the container using the `--nv` flag to allow direct access to the NVIDIA GPU. We can then check that the GPU is actually available inside the container by running the `python3` one-liner shown in the session below:

```
$ singularity run --nv mytensorflow.sif
INFO:    Could not find any nv files on this host!

________                               _______________
___  __/__________________________________  ____/__  /________      __
__  /  _  _ \_  __ \_  ___/  __ \_  ___/_  /_   __  /_  __ \_ | /| / /
_  /   /  __/  / / /(__  )/ /_/ /  /   _  __/   _  / / /_/ /_ |/ |/ /
/_/    \___//_/ /_//____/ \____//_/    /_/      /_/  \____/____/|__/

You are running this container as user with ID 1000 and group 1000,
which should map to the ID and group for your user on the Docker host. Great!

Singularity> python3 -c "import tensorflow as tf; tf.config.list_physical_devices('GPU')"
2020-08-01 14:42:30.562914: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-01 14:42:32.529449: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-01 14:42:32.537876: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-01 14:42:32.538595: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 960M computeCapability: 5.0
coreClock: 1.0975GHz coreCount: 5 deviceMemorySize: 1.96GiB deviceMemoryBandwidth: 74.65GiB/s
2020-08-01 14:42:32.538636: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-08-01 14:42:32.584239: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-08-01 14:42:32.609743: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-08-01 14:42:32.618569: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-08-01 14:42:32.666673: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-08-01 14:42:32.676543: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-08-01 14:42:32.765228: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-08-01 14:42:32.765425: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-01 14:42:32.766202: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-08-01 14:42:32.766547: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
```
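As an extra sanity check (my own suggestion, not part of the original steps), you can run a small computation from the same prompt; with a working GPU you get the usual device-placement logs followed by a scalar result:

```
Singularity> python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
```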
So TensorFlow is actually able to use the GPU without any configuration tweaks or workarounds! I hope this helps somebody!