Supporting the nvidia drivers on silverblue

I’m looking at adding support for the nvidia drivers on top of silverblue, using package layering.
As part of this I’ve modified kmodtool and akmods to build the driver during the %post script here:
https://src.fedoraproject.org/fork/alexl/rpms/akmods/tree/silverblue-nvidia
https://src.fedoraproject.org/fork/alexl/rpms/kmodtool/tree/silverblue-nvidia

I also have some local changes to the actual nvidia drivers which are enough to make them build and install with the above changes. However, for some reason the driver isn’t loading even though it is there, so I’m not making that public yet. Instead I want to talk about some general issues with silverblue and this approach.

When building the kernel module we need the kernel-devel package, and it needs to match the kernel version that is installed in the ostree image. This causes two problems:

First of all, the kernel-devel dependencies are not strictly tied to the kernel package (i.e. you can have kernel and kernel-devel installed at different versions, even several of each). So, every time you layer a kernel-devel on top of the base silverblue image you need to manually specify the right version.

Secondly, even if the rpm dependencies did pull in the right version, that version may not be the most recent one, which means it may no longer be available in the dnf repository. So, you have to manually dig for the right build in koji and layer it as a local rpm.
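To make the first problem concrete, here is a minimal Python sketch of the "manually specify the right version" step. It only builds the package spec and the command string; it does not run anything. The helper names `kernel_devel_spec` and `layering_command` are made up for this example, and it assumes the kernel you want to build against is the running one (which may not be true for a pending deployment). It does nothing about the second problem: if that build is gone from the repo, the command will still fail.

```python
import platform
import shlex


def kernel_devel_spec(kernel_release: str) -> str:
    """Return the kernel-devel package spec matching a kernel release string."""
    return f"kernel-devel-{kernel_release}"


def layering_command(kernel_release: str) -> str:
    """Build (but do not run) an rpm-ostree command that layers the matching kernel-devel."""
    return shlex.join(["rpm-ostree", "install", kernel_devel_spec(kernel_release)])


if __name__ == "__main__":
    # platform.release() reports the running kernel, the same string as `uname -r`.
    print(layering_command(platform.release()))
```

Pinning the full version-release-arch string is the point here: a bare `kernel-devel` would resolve to whatever is newest in the repo, not to what the image actually ships.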

This is far from ideal, especially the fact that the matching kernel-devel may not even be available, which means you can’t even script the dependencies reliably. The kernel tends to rev pretty often, and the ABI is incompatible on changes, so I think the only reasonable way this could ever work is that we bundle the matching kernel-devel with the base image. That seems to be about 50 megs, which doesn’t strike me as prohibitive.

Then we have a problem with the other dependencies that get pulled in. There are not actually that many (I’ve listed them all at the end of this post), so size-wise it’s not really a problem. However, some of them are -devel packages which are version-tied to the ones in the base image. For example, there is glibc-devel and elfutils-libelf-devel, and if either glibc or elfutils-libelf has been updated in the yum repo since the base image was created then the package layering will fail.

I think this is less of a problem, because these packages are few and don’t rev so often. But, in practice it means that layering a new package or otherwise re-deploying will fail sometimes, and you will then need to first rpm-ostree upgrade, and then do the layering.
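The failure mode can be sketched as a simple version comparison. This is only an illustration of the logic, not how rpm-ostree resolves dependencies: the function name `stale_devel_packages` and the dict layout are invented here, and the `-27` glibc build in the usage example is a hypothetical newer version, not a real one.

```python
def stale_devel_packages(base, repo_devel, ties):
    """Return the -devel packages whose repo version differs from the base image.

    base:       runtime package name -> version-release in the base image
    repo_devel: -devel package name  -> version-release currently in the repo
    ties:       -devel package name  -> runtime package it is version-tied to
    """
    stale = []
    for devel, runtime in ties.items():
        if devel in repo_devel and runtime in base:
            # A version-tied -devel package must match its runtime package exactly.
            if repo_devel[devel] != base[runtime]:
                stale.append(devel)
    return sorted(stale)


if __name__ == "__main__":
    base = {"glibc": "2.28.9000-26.fc30", "elfutils-libelf": "0.175-2.fc30"}
    # Hypothetical repo state: glibc has moved on since the image was composed.
    repo = {"glibc-devel": "2.28.9000-27.fc30", "elfutils-libelf-devel": "0.175-2.fc30"}
    ties = {"glibc-devel": "glibc", "elfutils-libelf-devel": "elfutils-libelf"}
    print(stale_devel_packages(base, repo, ties))
```

A non-empty result corresponds to the case where you first have to rpm-ostree upgrade before the layering can succeed.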

So, is there anything we could do to help here?

These are the layered packages I get when the nvidia driver is layered:

  akmod-nvidia-3:415.22-1.silverblue1.fc30.x86_64
  akmods-0.5.6-17.silverblue.fc30.noarch
  annobin-8.65-1.fc30.x86_64
  cpp-8.2.1-5.fc30.x86_64
  dwz-0.12-9.fc29.x86_64
  efi-srpm-macros-4-1.fc30.noarch
  egl-wayland-1.1.1-3.fc30.x86_64
  elfutils-libelf-devel-0.175-2.fc30.x86_64
  fakeroot-1.23-2.fc30.x86_64
  fakeroot-libs-1.23-2.fc30.x86_64
  fpc-srpm-macros-1.1-5.fc29.noarch
  gc-7.6.4-4.fc29.x86_64
  gcc-8.2.1-5.fc30.x86_64
  gdb-headless-8.2.50.20181130-11.fc30.x86_64
  ghc-srpm-macros-1.4.2-8.fc29.noarch
  glibc-devel-2.28.9000-26.fc30.x86_64
  glibc-headers-2.28.9000-26.fc30.x86_64
  gnat-srpm-macros-4-4.fc30.noarch
  go-srpm-macros-2-18.fc29.noarch
  guile-5:2.0.14-12.fc29.x86_64
  ima-evm-utils-1.1-4.fc29.x86_64
  isl-0.16.1-7.fc29.x86_64
  kernel-devel-4.20.0-0.rc6.git1.1.fc30.x86_64
  kernel-headers-4.20.0-0.rc6.git0.1.fc30.x86_64
  kmodtool-1-31.fc30.noarch
  libatomic_ops-7.6.6-1.fc29.x86_64
  libbabeltrace-1.5.6-1.fc29.x86_64
  libglvnd-opengl-1:1.1.0-2.fc30.x86_64
  libipt-2.0-1.fc30.x86_64
  libmpc-1.1.0-2.fc29.x86_64
  libva-vdpau-driver-0.7.4-22.fc29.x86_64
  libvdpau-1.1.1-10.fc29.x86_64
  libxcrypt-devel-4.4.1-1.fc30.x86_64
  make-1:4.2.1-10.fc29.x86_64
  nim-srpm-macros-1-3.fc29.noarch
  nvidia-driver-3:415.22-1.silverblue1.fc30.x86_64
  nvidia-driver-cuda-libs-3:415.22-1.silverblue1.fc30.x86_64
  nvidia-driver-libs-3:415.22-1.silverblue1.fc30.x86_64
  ocaml-srpm-macros-5-4.fc29.noarch
  openblas-srpm-macros-2-4.fc29.noarch
  perl-srpm-macros-1-28.fc29.noarch
  python-srpm-macros-3-39.fc30.noarch
  python3-rpm-4.14.2.1-3.fc30.x86_64
  qt5-srpm-macros-5.11.3-1.fc30.noarch
  redhat-rpm-config-125-1.fc30.noarch
  rpm-build-4.14.2.1-3.fc30.x86_64
  rpm-build-libs-4.14.2.1-3.fc30.x86_64
  rpm-sign-libs-4.14.2.1-3.fc30.x86_64
  rpmdevtools-8.10-7.fc30.noarch
  rust-srpm-macros-6-1.fc30.noarch
  xemacs-filesystem-21.5.34-31.20171230hg92757c2b8239.fc30.noarch
  zlib-devel-1.2.11-14.fc30.x86_64
  zstd-1.3.6-1.fc30.x86_64

Does this indicate we should have something like flatpak has: an sdk that matches the OS? I.e. a collection of devel packages and tools that match the base image and can be deployed as a unit? That would have to be created on the server side when composing the base image.

Of course, rpm-ostree doesn’t currently have the infrastructure for this kind of layering. But if we had it, there would be more interesting use cases for it. People have always wanted to layer the desktop on top of a small atomic-host like core image, and swap out gnome for kde, for example.


I’d like a reference check for the following:

People have always wanted to layer the desktop on top of a small atomic-host like core image, and swap out gnome for kde, for example.

I don’t think that is the right approach. For regular development the idea is that you spin up a container with an sdk, and in that you can install whichever devel packages you want without problem.

The only problem happens when you want to build something against the host abi, which in practice only should be needed for kernel modules. So, the pragmatic solution to me seems to be to just add enough of the -devel packages needed for building a kernel module to the base (but not the build tools).

Looking at my list that would be:
elfutils-libelf-devel, glibc-devel, glibc-headers, kernel-devel, kernel-headers, zlib-devel, libxcrypt-devel
(Not sure why libxcrypt is in there??)
Which adds up to 52 meg of on-disk size of which kernel-devel is 49 meg.

Adding 50 megs to a 4 gb image is imho not a large price to pay to be able to cater to nvidia users and other kmod things. It’s the pragmatic, ugly solution.


Adding 50 megs to a 4 gb image is imho not a large price to pay to be able to cater to nvidia users and other kmod things. It’s the pragmatic, ugly solution.

Yes, seems fine.

Turns out libxcrypt-devel is a dependency of glibc-devel.

Also zlib-devel is a dependency of elfutils-libelf-devel, and glibc-headers is from glibc-devel.
So, a shorter list of dependencies to add would be:

elfutils-libelf-devel (needed by kernel build)
glibc-devel (general requirement to build shit)
kernel-devel (needed to build a kernel)
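The pruning from seven packages down to three can be sketched in Python as "drop anything that another requested package already pulls in". The dependency map below only has the edges mentioned in this thread, plus one labeled assumption: that kernel-headers comes in via glibc-headers, which the thread does not state. The function names are made up for the example, and it assumes the dependency graph has no cycles.

```python
def transitive_deps(pkg, requires):
    """Return every package reachable from pkg through the requires map."""
    seen = set()
    stack = list(requires.get(pkg, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(requires.get(dep, ()))
    return seen


def minimal_request(wanted, requires):
    """Drop packages from wanted that another wanted package already pulls in."""
    implied = set()
    for pkg in wanted:
        implied |= transitive_deps(pkg, requires)
    return sorted(set(wanted) - implied)


REQUIRES = {
    "glibc-devel": ["glibc-headers", "libxcrypt-devel"],
    "glibc-headers": ["kernel-headers"],  # assumption, not stated in the thread
    "elfutils-libelf-devel": ["zlib-devel"],
}

WANTED = [
    "elfutils-libelf-devel", "glibc-devel", "glibc-headers", "kernel-devel",
    "kernel-headers", "zlib-devel", "libxcrypt-devel",
]

if __name__ == "__main__":
    print(minimal_request(WANTED, REQUIRES))
    # ['elfutils-libelf-devel', 'glibc-devel', 'kernel-devel']
```

Without the assumed kernel-headers edge, kernel-headers would stay in the result as a fourth explicit package.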

I’ll plus-one for other desktops on a small core, although I don’t have any problem with GNOME 3 - it’s the desktop I prefer on machines with the horsepower to support it. But NVidia drivers are a must-have - nouveau black-screens on my HP Omen (NVidia 1050Ti).

There’s something to be said for being able to walk up to any Linux machine - Ubuntu, Fedora, Antergos, Debian, etc. - and have the same GNOME 3 desktop, Nautilus, GNOME Software, LibreOffice, Firefox, etc. Sure, at some point you’ll end up learning a different package management scheme under the hood, but it’s nice not to have to re-learn everything.

Sure, but by the same logic (taken to its illogical extreme): “There’s something to be said for being able to walk up to any computer and have the same Windows desktop, Explorer, Windows Update, Office, Internet Explorer, etc.” #JustSayin


On that laptop, it’s just a reboot for me. And Antergos defaults to Chromium for browsing. :wink: