Install Nvidia 410 on Silverblue

I lost a week with Ubuntu and other distros because I work with different codebases that require different versions of the same frameworks, libraries and dependencies. I then considered installing CoreOS and Kubernetes, found out about Silverblue, and I like it a lot.

I just don't know how to install legacy drivers on it.

I need to run older ML code that requires TensorFlow 1.4 and CUDA 10. To run CUDA 10, I need to install the Nvidia 410 driver.

I know I can install the newest drivers with:
rpm-ostree install kmod-nvidia xorg-x11-drv-nvidia

Is there any way to install specific versions with rpm-ostree?

Thanks!

Hello @littlenode,
If you know which repo packages the driver version you want, you should be able to add that repo to your Silverblue install with sudo ostree remote add <repo-name> <repo-url> (I'm not certain whether you need to systemctl reboot afterwards). Then add the desired package with rpm-ostree install <package-name>, as you would for the currently available versions. If the software you are going to use is a Flatpak, you can use the Nvidia 410 driver extensions available on Flathub. Query them with flatpak remote-ls flathub --runtime | grep nvidia.
If the driver is part of the current base image, you would need to replace it with rpm-ostree override replace; otherwise it would simply be updated with the next base image update.
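A minimal sketch of the override route, assuming you have already downloaded an RPM of the version you want (the file and package names in the test-style usage below are hypothetical). rpm-ostree override replace only applies to packages that ship in the base image; packages you layered yourself, like kmod-nvidia, are managed with rpm-ostree install/uninstall instead. DRY_RUN=1 prints the commands instead of running them.

```shell
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Replace a package that ships in the base image with a local RPM file.
# The path is whatever you downloaded; no specific RPM is assumed here.
pin_base_package() {
    run rpm-ostree override replace "$1"
}

# Return a base package to the version in the current base image.
unpin_base_package() {
    run rpm-ostree override reset "$1"
}
```

For example, DRY_RUN=1 pin_base_package ./example-1.0-1.fc31.x86_64.rpm shows what would run; without DRY_RUN, follow it with systemctl reboot to boot into the new deployment.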

Thank you for your answer.
I have an Nvidia RTX 2080, but to run CUDA 10 I need an older driver (410 instead of 440).

Can I specify the version, like this?
rpm-ostree install kmod-nvidia=410

I'm using an alias called status for rpm-ostree status and one called installit for rpm-ostree install.
I'm trying to install cuda-toolkit; I already added the repo.
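For reference, the two aliases described above could be defined like this, for example in ~/.bashrc (the names come from the post; nothing else is assumed):

```shell
# Shorthands used in the transcripts in this thread.
alias status='rpm-ostree status'
alias installit='rpm-ostree install'
```

installit then accepts anything rpm-ostree install does, such as a package name or an RPM URL.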

[node@localhost ~]$ status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/31/x86_64/silverblue
                   Version: 31.20200213.0 (2020-02-13T01:53:53Z)
                BaseCommit: 29d5c5f1bac9fa9784c14cf4029100da7fa843d4ba1f9fe34dea640dbaaab81b
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: fedora-workstation-repositories kmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-31-1.noarch rpmfusion-nonfree-release-31-1.noarch
                            google-chrome-stable-80.0.3987.106-1.x86_64 cuda-repo-fedora29-10.2.89-1.x86_64

  ostree://fedora:fedora/31/x86_64/silverblue
                   Version: 31.20200213.0 (2020-02-13T01:53:53Z)
                BaseCommit: 29d5c5f1bac9fa9784c14cf4029100da7fa843d4ba1f9fe34dea640dbaaab81b
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: fedora-workstation-repositories kmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-31-1.noarch rpmfusion-nonfree-release-31-1.noarch
                            google-chrome-stable-80.0.3987.106-1.x86_64

I'm trying to install the CUDA toolkit this way:

[node@localhost ~]$ installit https://developer.download.nvidia.com/compute/cuda/repos/fedora29/x86_64/cuda-toolkit-10-2-10.2.89-1.x86_64.rpm
Downloading 'https://developer.download.nvidia.com/compute/cuda/repos/fedora29/x86_64/cuda-toolkit-10-2-10.2.89-1.x86_64.rpm'... done!
Checking out tree 29d5c5f... done
Enabled rpm-md repositories: fedora rpmfusion-nonfree-updates rpmfusion-nonfree fedora-modular updates google-chrome rpmfusion-free-updates rpmfusion-free cuda
rpm-md repo 'fedora' (cached); generated: 2019-10-23T22:52:47Z
rpm-md repo 'rpmfusion-nonfree-updates' (cached); generated: 2020-02-11T12:07:00Z
rpm-md repo 'rpmfusion-nonfree' (cached); generated: 2019-10-22T10:43:47Z
rpm-md repo 'fedora-modular' (cached); generated: 2019-10-23T22:53:13Z
rpm-md repo 'updates' (cached); generated: 2020-02-14T01:09:42Z
rpm-md repo 'google-chrome' (cached); generated: 2020-02-13T19:29:25Z
rpm-md repo 'rpmfusion-free-updates' (cached); generated: 2020-02-11T11:47:02Z
rpm-md repo 'rpmfusion-free' (cached); generated: 2019-10-22T10:21:36Z
rpm-md repo 'cuda' (cached); generated: 2020-01-10T22:24:41Z
Importing rpm-md... done
Resolving dependencies... done
Will download: 81 packages (2.5 GB)
Downloading from 'fedora'... done
Downloading from 'updates'... done
Downloading from 'cuda'... done
Importing packages... done
error: package nsight-systems-2019.5.2-2019.5.2.16_b54ef97-0.x86_64 cannot be verified and repo cuda is GPG enabled: failed to lookup digest in keyring for /var/cache/rpm-ostree/repomd/cuda-31-x86_64/packages/NsightSystems-linux-public-2019.5.2.16-b54ef97.rpm

Am I doing this correctly? Why isn't it working?

I also tried downloading cuda-toolkit and installing it locally. Getting cuda-toolkit installed is really important for me.

[node@localhost ~]$ status
State: idle
AutomaticUpdates: disabled
Deployments:
● ostree://fedora:fedora/31/x86_64/silverblue
                   Version: 31.20200213.0 (2020-02-13T01:53:53Z)
                BaseCommit: 29d5c5f1bac9fa9784c14cf4029100da7fa843d4ba1f9fe34dea640dbaaab81b
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: fedora-workstation-repositories kmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda
             LocalPackages: cuda-repo-fedora29-10.2.89-1.x86_64 rpmfusion-free-release-31-1.noarch
                            cuda-repo-fedora29-10-2-local-10.2.89-440.33.01-1.0-1.x86_64
                            code-1.42.1-1581433057.el7.x86_64 rpmfusion-nonfree-release-31-1.noarch
                            google-chrome-stable-80.0.3987.106-1.x86_64

  ostree://fedora:fedora/31/x86_64/silverblue
                   Version: 31.20200213.0 (2020-02-13T01:53:53Z)
                BaseCommit: 29d5c5f1bac9fa9784c14cf4029100da7fa843d4ba1f9fe34dea640dbaaab81b
              GPGSignature: Valid signature by 7D22D5867F2A4236474BF7B850CB390B3C3359C4
           LayeredPackages: fedora-workstation-repositories kmod-nvidia xorg-x11-drv-nvidia xorg-x11-drv-nvidia-cuda
             LocalPackages: rpmfusion-free-release-31-1.noarch rpmfusion-nonfree-release-31-1.noarch
                            google-chrome-stable-80.0.3987.106-1.x86_64 cuda-repo-fedora29-10.2.89-1.x86_64
[node@localhost ~]$ installit cuda-toolkit
Checking out tree 29d5c5f... done
Enabled rpm-md repositories: rpmfusion-nonfree-nvidia-driver fedora rpmfusion-nonfree-updates cuda-10-2-local-10.2.89-440.33.01 rpmfusion-nonfree fedora-modular updates google-chrome rpmfusion-free-updates rpmfusion-free cuda
rpm-md repo 'rpmfusion-nonfree-nvidia-driver' (cached); generated: 2020-02-06T10:14:14Z
rpm-md repo 'fedora' (cached); generated: 2019-10-23T22:52:47Z
rpm-md repo 'rpmfusion-nonfree-updates' (cached); generated: 2020-02-11T12:07:00Z
Updating metadata for 'cuda-10-2-local-10.2.89-440.33.01'... done
error: repodata cuda-10-2-local-10.2.89-440.33.01 was not complete: Cannot open /var/cuda-repo-10-2-local-10.2.89-440.33.01/repodata/repomd.xml: No such file or directory
[node@localhost ~]$
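The error says the local repository's metadata file is missing. A small helper to check a local repo directory before pointing rpm-ostree at it, using the directory from the error message. Why the directory is missing isn't settled here; one thing worth checking (an assumption, not confirmed in this thread) is whether the local-repo package's files under /var were actually created on the Silverblue host.

```shell
# Check that a local rpm-md repo directory contains repodata/repomd.xml,
# the file named in the error above. Prints "ok: DIR" or what is missing.
check_local_repo() {
    if [ -f "$1/repodata/repomd.xml" ]; then
        echo "ok: $1"
    else
        echo "missing: $1/repodata/repomd.xml"
        return 1
    fi
}
```

Usage: check_local_repo /var/cuda-repo-10-2-local-10.2.89-440.33.01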

When I check the CUDA version, I don't get any version number:

[node@localhost ~]$ nvcc --version
bash: nvcc: command not found
[node@localhost ~]$ cat /usr/local/cuda/version.txt
cat: /usr/local/cuda/version.txt: No such file or directory
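nvcc is usually not on PATH after installing the toolkit; NVIDIA's Linux installation guide has you add the toolkit's bin directory to PATH (and lib64 to LD_LIBRARY_PATH). Here /usr/local/cuda/version.txt is also missing, so the toolkit may not be installed at all yet. A sketch of a helper that sets the paths only if nvcc is really there, assuming the default prefix /usr/local/cuda (pass another one, such as /usr/local/cuda-10.0, if yours differs):

```shell
# Put a CUDA toolkit on PATH if its nvcc exists, then print which nvcc is used.
# The default prefix is an assumption; pass your own as the first argument.
use_cuda() {
    local cuda_home=${1:-/usr/local/cuda}
    if [ ! -x "$cuda_home/bin/nvcc" ]; then
        echo "no nvcc under $cuda_home/bin" >&2
        return 1
    fi
    export CUDA_HOME="$cuda_home"
    export PATH="$cuda_home/bin${PATH:+:$PATH}"
    export LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    command -v nvcc
}
```

After use_cuda succeeds, nvcc --version should report the toolkit release.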

I only see a CUDA version in nvidia-smi:

[node@localhost ~]$ nvidia-smi
Fri Feb 14 13:49:07 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.59       Driver Version: 440.59       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:08:00.0  On |                  N/A |
|  0%   46C    P8     9W / 250W |    207MiB /  7981MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1200      G   /usr/libexec/Xorg                             18MiB |
|    0      1535      G   /usr/libexec/Xorg                             64MiB |
|    0      1655      G   /usr/bin/gnome-shell                          73MiB |
+-----------------------------------------------------------------------------+
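The "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, not an installed toolkit, which is why it shows 10.2 even though nvcc is missing. A small sketch that pulls both numbers out of that header; on a live system, nvidia-smi --query-gpu=driver_version --format=csv,noheader prints just the driver version.

```shell
# Read nvidia-smi output on stdin and print the driver version and the
# highest CUDA version that driver supports, e.g.: nvidia-smi | smi_versions
smi_versions() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/driver=\1 cuda=\2/p'
}
```

For the output above this prints driver=440.59 cuda=10.2.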

The error at the end means the GPG key for the nsight-systems package isn't available while the cuda repo has GPG verification enabled. When a repo has no GPG verification available, remove it with ostree first, then re-add it with sudo ostree remote add <repo-name> <repo-url> --no-gpg-verify, which disables GPG verification. GPG is used for package signing to ensure the software you install is trusted. If you are comfortable that the software you are trying to install is okay, disabling GPG verification should be alright. Once you have done that, retry the install with rpm-ostree; I believe it should complete.
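In the log, cuda is listed among the rpm-md repositories, and rpm-ostree reads those from .repo files in /etc/yum.repos.d. A sketch of turning off GPG checking for just that repo in its .repo file, assuming its section is named [cuda] like the repo id in the log; the exact file name is an assumption, so locate it with grep -l '^\[cuda\]' /etc/yum.repos.d/*.repo first.

```shell
# Set gpgcheck=0 in one [section] of a .repo file, leaving other sections alone.
# Set SUDO=sudo when the file lives under /etc/yum.repos.d.
disable_gpgcheck() {
    local repo_file=$1 section=$2
    # Address range: from the "[section]" header up to the next "[" header.
    ${SUDO:-} sed -i "/^\[$section\]/,/^\[/ s/^gpgcheck *= *1/gpgcheck=0/" "$repo_file"
}
```

Usage, with a hypothetical file name: SUDO=sudo disable_gpgcheck /etc/yum.repos.d/cuda.repo cuda, then run installit again.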

I made some progress: I successfully installed CUDA 10 from a local RPM. I'm still trying to figure out how to install the older driver from its .run file. When I run the .run file inside a toolbox container, I get:

ERROR: An NVIDIA kernel module 'nvidia-uvm' appears to already be loaded in your kernel.  This may be because it   
         is in use (for example, by an X server, a CUDA program, or the NVIDIA Persistence Daemon), but this may     
         also happen if your kernel was configured without support for module unloading.  Please be sure to exit any 
         programs that may be using the GPU(s) before attempting to upgrade your driver.  If no GPU-based programs   
         are running, you know that your kernel supports module unloading, and you still receive this message, then  
         an error may have occured that has corrupted an NVIDIA kernel module's usage count, for which the simplest  
         remedy is to reboot your computer.

I searched for it and found a similar issue.
In my case it's nvidia-uvm, not nvidia-drm.

I tried sudo modprobe -r nvidia-uvm,
but I still get the same error when running the .run file.

When I try to stop systemd-logind, I get:

sudo systemctl stop systemd-logind
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

I assume that containers aren't booted with systemd.
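That matches the error message. A quick way to see which process is PID 1 (systemd on the Silverblue host, something else inside a toolbox container). /proc/1/comm is the standard location; the optional path argument exists only so the function can be tested.

```shell
# Print the command name of PID 1.
init_name() {
    cat "${1:-/proc/1/comm}"
}

# Succeed only when PID 1 is systemd.
is_systemd_init() {
    [ "$(init_name "$@")" = systemd ]
}
```

Usage: is_systemd_init || echo "PID 1 is $(init_name), not systemd"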

Any ideas how I can install 410 via the .run file, yum or dnf?

Edit2:

OK, I have a suspicion. In the container I get the same error even without installing CUDA first. Silverblue has the newest Nvidia drivers installed. Could that be causing the error?

How is it possible that nvidia-uvm is already loaded in the container's kernel? I thought toolbox containers had their own kernel that starts from scratch when you fire one up. Right?
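Containers, toolbox included, don't get their own kernel: they run on the host's kernel, so uname -r prints the same release inside and outside the toolbox, and the nvidia modules the installer complains about are the host's. That also means modules can only be loaded or unloaded on the host. A sketch that lists the loaded nvidia modules and their use counts from the kernel's module table (the path argument is only there for testing):

```shell
# List loaded NVIDIA kernel modules with their use counts.
# /proc/modules columns: name, size, use count, dependencies, state, address.
nvidia_modules() {
    awk '$1 ~ /^nvidia/ { print $1, "used=" $3 }' "${1:-/proc/modules}"
}
```

A module with a non-zero use count can't be unloaded with modprobe -r.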

Edit3:

After adding a blacklist nvidia line to /etc/modprobe.d/blacklist.conf, I'm getting the nvidia-drm error instead of the nvidia-uvm one.

⬢[node@toolbox Downloads]$ lsof /dev/nvidia*
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/host/run/user/42/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /var/home/node/.local/share/containers/storage/overlay/076dec9164d5b18ee75a2a5a61c773cc027d0c45d5a1ac18ada7993d90956192/merged/run/host/run/user/42/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/host/var/home/node/.local/share/containers/storage/overlay/076dec9164d5b18ee75a2a5a61c773cc027d0c45d5a1ac18ada7993d90956192/merged/run/host/run/user/42/gvfs
      Output information may be incomplete.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/host/var/home/node/.local/share/containers/storage/overlay/076dec9164d5b18ee75a2a5a61c773cc027d0c45d5a1ac18ada7993d90956192/merged/var/home/node/.local/share/containers/storage/overlay/076dec9164d5b18ee75a2a5a61c773cc027d0c45d5a1ac18ada7993d90956192/merged/run/host/run/user/42/gvfs
      Output information may be incomplete.

Here I tried to kill every nvidia process inside the container before running the .run file to install the driver:

⬢[node@toolbox Downloads]$ ps -ef | grep nvidia
nobody       913       2  0 03:13 ?        00:00:00 [nvidia-modeset/]
nobody       914       2  0 03:13 ?        00:00:00 [nvidia-modeset/]
nobody      1214       2  0 03:13 ?        00:00:08 [irq/106-nvidia]
nobody      1215       2  0 03:13 ?        00:00:00 [nvidia]
node        9454    8341  0 03:28 pts/2    00:00:00 grep --color=auto nvidia
⬢[node@toolbox Downloads]$ sudo pkill -f nvidia
pkill: killing pid 913 failed: Operation not permitted
pkill: killing pid 914 failed: Operation not permitted
pkill: killing pid 1214 failed: Operation not permitted
pkill: killing pid 1215 failed: Operation not permitted
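The processes shown in [brackets] are kernel threads belonging to the host's nvidia driver, not user programs, and pkill generally can't stop them. A sketch that filters ps -ef output down to user-space processes mentioning nvidia, the only kind pkill could act on:

```shell
# Read `ps -ef` output on stdin; print "PID command" for user-space processes
# whose line mentions nvidia. Column 8 is the command; [bracketed] commands
# are kernel threads, and the grep used to search is skipped too.
nvidia_userspace_procs() {
    awk '$8 !~ /^\[/ && $8 !~ /grep/ && /nvidia/ { print $2, $8 }'
}
```

Usage: ps -ef | nvidia_userspace_procs. Empty output means there is nothing left to kill from user space.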

Edit4:
I suspected that Silverblue's Nvidia driver was interfering with the driver installation inside the container. Using
rpm-ostree kargs --editor
I commented out the line
rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1
which I had added on Silverblue after installing the newest Nvidia drivers there. Once I commented it out, the change was committed and I had to reboot.
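The same three arguments can be removed (or restored later) without the editor, using rpm-ostree kargs --delete and --append, each of which can be given several times in one call. In the editor, lines starting with # appear to be treated as comments and dropped, which would explain a commented-out line disappearing. A sketch, with DRY_RUN=1 printing the command instead of running it; each change creates a new deployment that takes effect after a reboot.

```shell
# The kernel arguments from the post above.
NVIDIA_KARGS="rd.driver.blacklist=nouveau modprobe.blacklist=nouveau nvidia-drm.modeset=1"

# kargs_cmd append|delete: add or remove all of NVIDIA_KARGS in one rpm-ostree call.
kargs_cmd() {
    local action karg
    case $1 in
        append|delete) action=$1 ;;
        *) echo "usage: kargs_cmd append|delete" >&2; return 2 ;;
    esac
    set --
    # Unquoted on purpose: split NVIDIA_KARGS on spaces.
    for karg in $NVIDIA_KARGS; do
        set -- "$@" "--$action=$karg"
    done
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ rpm-ostree kargs $*"
    else
        rpm-ostree kargs "$@"
    fi
}
```

Usage: DRY_RUN=1 kargs_cmd delete to preview, then kargs_cmd delete and systemctl reboot.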

What's curious is that when I entered my container and ran the Nvidia driver .run file again, I got a new kind of error:

  ERROR: The Nouveau kernel driver is currently in use by your system.  This driver is incompatible with the NVIDIA  
         driver, and must be disabled before proceeding.  Please consult the NVIDIA driver README and your Linux     
         distribution's documentation for details on how to correctly disable the Nouveau kernel driver. 

and further down:

 WARNING: One or more modprobe configuration files to disable Nouveau are already present at:
           /usr/lib/modprobe.d/nvidia-installer-disable-nouveau.conf,
           /etc/modprobe.d/nvidia-installer-disable-nouveau.conf.  Please be sure you have rebooted your system      
           since these files were written.  If you have rebooted, then Nouveau may be enabled for other reasons,     
           such as being included in the system initial ramdisk or in your X configuration file.  Please consult the 
           NVIDIA driver README and your Linux distribution's documentation for details on how to correctly disable  
           the Nouveau kernel driver.

I did restart the container with:

podman restart TF1 
4085c10869665ba9a9276bd6e8dc91fb61f147b0ae75385f9bb65fc2bfc8e429

What's bizarre is that when I rechecked my kargs on Silverblue, the entire line I had commented out with the pound sign was gone.

I'm tempted to disable both the nvidia and nouveau drivers on Silverblue and see whether I can then install the Nvidia 410 driver inside the container.

The goal is to have the following structure:

Silverblue host: Nvidia 440, CUDA 10.2
Container1: Nvidia 410, CUDA 10, TensorFlow 1.4
Container2: Nvidia 440, CUDA 10.2, TensorFlow 2.1

The existing code I'm working with demands specific versions, and I don't want to dual boot into two different systems.

Is what I'm trying to do even possible, or am I chasing something impossible?

I think I have successfully installed CUDA 10.0 in my container, and I know for a fact that I have installed CUDA 10.2 on my Silverblue host. What about the drivers?

Is it possible to have two containers, each running a different version of the Nvidia driver and CUDA? If so, can both containers run in parallel?
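NVIDIA drivers are backward compatible with older CUDA toolkits: a driver only has to meet the minimum version for a given toolkit. Since every container uses the host's kernel module, the driver's user-space libraries inside a container have to match the host driver version, so in practice containers share one driver and differ in their CUDA toolkit and framework versions. A sketch that checks a driver version against NVIDIA's published minimums for the two releases in this thread (CUDA 10.0.130 needs 410.48, CUDA 10.2.89 needs 440.33, per the CUDA release notes); it relies on GNU sort -V.

```shell
# Minimum Linux driver for the CUDA releases mentioned in this thread.
min_driver_for_cuda() {
    case $1 in
        10.0) echo 410.48 ;;
        10.2) echo 440.33 ;;
        *) return 1 ;;
    esac
}

# Succeed if version $1 >= version $2 (GNU sort -V orders version strings).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_supports_cuda DRIVER CUDA, e.g. driver_supports_cuda 440.59 10.0
driver_supports_cuda() {
    local min
    min=$(min_driver_for_cuda "$2") || return 2
    version_ge "$1" "$min"
}
```

With the host driver from the nvidia-smi output, driver_supports_cuda 440.59 10.0 && echo ok prints ok, as does the same check for 10.2.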

EDIT5:

I found the Nvidia driver 410 as a RPM:
https://rpmfind.net/linux/RPM/mageia/6/x86_64/media/nonfree/backports/x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64.html

I tried installing it with
sudo yum localinstall x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64.rpm

⬢[node@toolbox Downloads]$ sudo yum localinstall x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64.rpm
Last metadata expiration check: 0:17:47 ago on Sun Feb 16 16:14:58 2020.
Error: 
 Problem: conflicting requests
  - nothing provides kmod(nvidia-current.ko) = 410.57 needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
  - nothing provides ldetect-lst >= 0.3.7.9-2 needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
  - nothing provides xserver-abi(videodrv) < 25 needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
  - nothing provides lib64vdpau1 needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
  - nothing provides x11-server-common needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
  - nothing provides update-alternatives needed by x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64
(try to add '--skip-broken' to skip uninstallable packages)
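The mga6 in the file name marks a package built for Mageia 6, and the unmet requirements (ldetect-lst, lib64vdpau1, x11-server-common) are Mageia package names, so Fedora's repos can't satisfy them. rpm -qp --requires FILE lists a package's requirements before you try to install it. A small sketch that pulls the distribution tag out of an RPM file name, assuming the usual name-version-release.arch.rpm layout with the tag right after the first dot of the release:

```shell
# Print the distribution tag from an RPM file name, e.g. mga6 or fc31.
# Fails when the release field has no dotted tag (e.g. plain "1").
rpm_dist_tag() {
    local base release
    base=$(basename "$1" .rpm)   # drop directory and .rpm
    base=${base%.*}              # drop .arch
    release=${base##*-}          # release is after the last dash
    case $release in
        *.*) release=${release#*.}; echo "${release%%.*}" ;;
        *) return 1 ;;
    esac
}
```

Usage: rpm_dist_tag x11-driver-video-nvidia-current-410.57-1.mga6.nonfree.x86_64.rpm prints mga6.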

Hello @littlenode,

For kernel arguments, use kargs; there are a couple of discussions about it here and there. Try those for your answer, and if all else fails, check the kargs section of the rpm-ostree man page.