Building container images from rpms
We are seeing initiatives like “bootable containers” and “container builds of rpms” (Konflux). An important piece of this puzzle is the Dockerfile / Containerfile. A typical example with rpms starts like this:
RUN dnf install -y stuff other-stuff && dnf clean all
Is this wrong?
1. This approach is imperative. In the 21st century we prefer declarative.
2. The whole command has to be squished into a single logical line because of layering. This leads to convoluted syntax full of continued lines and silly bash pipelines.
3. The combination of 1 + 2 leads to poor extensibility: if we want to add any itsy-bitsy package on top of this, we either need to create another layer or completely rewrite the commands. With some extensible declarative package list, we would be able to just insert another name in a list.
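To make point 3 concrete (the package names here are arbitrary), adding a single extra package means either stacking another layer on top or going back and editing the original command:

```Dockerfile
# Option A: a new layer just for one package, with another round of install + cleanup
RUN dnf install -y yet-another-package && dnf clean all

# Option B: rewrite the original line instead, invalidating the cached layer
RUN dnf install -y stuff other-stuff yet-another-package && dnf clean all
```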
But those are not the biggest problems; they are more like poor implementation decisions along the way. The whole approach is wrong. dnf and other tools are not part of the delivered payload, so they should never be in the image in the first place. If the tools are not in the image, we will never need to clean up.
Let’s consider a different way to do things:
1. We want a declarative configuration with key-value settings, in particular a list of rpms to install. This syntax must allow additions and removals. All the “heavy lifting” like dependency resolution and comps groups must be done by the package manager.
2. All tools must operate from the outside, adding and updating files in a buildroot directory. (Think dnf install --installroot=/our/temporary/directory.)
3. Once step 2 is done, package the buildroot into whatever formats we need. Again, we do this “from the outside”. If the output format is a tarball, we just tar things up. If the output format is a file system image, we call mkfs --rootdir=/our/temporary/directory /our/temporary/filesystem.img to spit out a ready-made file system with the appropriate contents. If the output format is a disk image, we blit the filesystem image into a new file at an offset, and add the GPT header and footer around it. (systemd-repart takes a declarative configuration in the style of CSS and creates disk images, completely offline.)
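Spelled out as shell commands, steps 2 and 3 might look roughly like the following sketch. The paths, the release number, the package names, and the choice of mkfs.erofs are just examples, and the exact systemd-repart flags depend on its version:

```sh
# Step 2: install the requested packages from the outside into a throwaway directory
dnf install -y --installroot=/our/temporary/directory --releasever=40 \
    systemd util-linux openssh-server

# Step 3a: output format "tarball": just tar up the directory
tar -C /our/temporary/directory -cf /our/output/image.tar .

# Step 3b: output format "file system image": mkfs.erofs (like several other
#          mkfs tools) can populate a brand-new file system directly from a directory
mkfs.erofs /our/temporary/filesystem.img /our/temporary/directory

# Step 3c: output format "disk image": systemd-repart reads declarative
#          repart.d/*.conf definitions and assembles a GPT disk image, offline
systemd-repart --empty=create --size=auto --dry-run=no \
    --definitions=repart.d \
    --copy-source=/our/temporary/directory \
    /our/output/disk.img
```

A minimal repart definition for the last step could be a single root partition filled from the buildroot:

```ini
# repart.d/10-root.conf
[Partition]
Type=root
Format=erofs
CopyFiles=/
```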
This approach is a natural extension of the systemd configuration approach. All config (service files, daemon config, system settings) is simple text files in the appropriate locations in the filesystem. This means that “configuring things” is just dropping simple text files into the right place. Declarative, legible, extensible. In fact, to make this easier, all systemd tools support --root=/our/temporary/directory, to operate on any image from the outside. (Or --image=… to operate on a disk image.)
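For example (the unit name is only an illustration), enabling a service or creating declared system users can be done against the buildroot directly, without booting or chrooting into it:

```sh
# Create the enablement symlinks inside the buildroot
systemctl --root=/our/temporary/directory enable sshd.service

# Apply sysusers.d definitions found inside the buildroot
systemd-sysusers --root=/our/temporary/directory
```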
A tool that nicely implements this philosophy is mkosi, but a similar approach is used by KIWI NG. Various Fedora image builds have been switching to kiwi.
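A mkosi configuration is a small ini-style file. Something along the lines of the following sketch (distribution, release, and package names are just examples) is enough to describe an image, and drop-ins can extend the package list without touching the original file:

```ini
# mkosi.conf
[Distribution]
Distribution=fedora
Release=40

[Output]
Format=disk

[Content]
Packages=
        systemd
        util-linux
        openssh-server
```

```ini
# mkosi.conf.d/10-extra.conf: a drop-in that only appends to the package list
[Content]
Packages=
        strace
```

Running mkosi then installs the packages into a fresh buildroot and packs it up into the requested format, without modifying the host system.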
What are the benefits:
- The image contents are tailored to its purpose, no extra tools, no cleanups necessary, no leftovers possible.
- The user operates at a higher conceptual level, essentially giving a list of features, and the tools handle the annoying details.
- No privileges or special locations are necessary. We just copy files to and from a temporary directory, blit some bytes from one file to another. We can run this as a normal unprivileged user, or in a container, or as part of some build pipeline with identical results.
- Things are fast. Since we’re just copying bits, we can use reflinks to avoid actually moving any data. Things may become even faster with RPM zero-copy installations in the future.
- In general the “image builder” is simple and dumb: it gathers a list of rpms/groups to install, and passes it off to dnf/apt/zypper/pacman/…, then passes the temporary work dir to mkfs or systemd-repart. No rocket science.
- We get easy portability between distributions and architectures. Package naming and splitting differs between distros, so we need to conditionalize that part, but most of the config can be shared.
- Everything is “local”, so the test build on a laptop and the official build in the cloud are equivalent.
- Debugging is easy. Since we are just calling well-established tools, we can take any part of that pipeline and run it interactively.
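For example (the path is the hypothetical buildroot from above), any intermediate result can be inspected directly with ordinary tools:

```sh
# The buildroot is just a plain directory
ls /our/temporary/directory/usr/bin

# Query the rpm database inside the buildroot from the outside
rpm --root=/our/temporary/directory -qa | sort | head
```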
Building rpms
OK, I was going on and on about how to build things from rpms. How are the rpms themselves produced? Is the build process declarative, from the outside, minimalistic? The answer is … complicated. Let’s consider some properties:
- rpm builds use a temporary build directory and a temporary install root, confusingly called %buildroot. The tools to operate on both of those subhierarchies come from the outside, the “host system” (nowadays usually a container created for the build). So we are pretty close. Things can get confusing for example with tests, which sometimes need to access files in the %buildroot, but for a majority of packages the separation is maintained.
  In fact, the confusing naming points to the evolution that happened. In prehistoric times, rpms would be built by installing files onto a running system and then carefully picking them out. To this day rpm tools start with a check whether %buildroot is empty or /, just in case we went back to 1995 or so.
- rpm spec files are … complicated. A macro language with a complicated syntax and a minefield of caveats. But modern builds are ever more declarative. With rpm declarative build systems, we may have spec files where the Name, Version, License, and %description are the majority of the file, and the rest is just some easy boilerplate.
- rpm spec files used to do a lot of “heavy lifting”, listing build and runtime dependencies in detail, with %build and %install sections taking hundreds of lines. But now we’re offloading most of that work to language-specific build systems. We have dynamic build requirements, where the build system calls cargo or pip to spit out a list of build dependencies based on the declarative configuration provided by upstream. We use the likes of %meson_build and %cargo_install to do the complicated parts.
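To illustrate, a spec file for a simple Python project written in this style can be almost entirely declarative. The following is a sketch using Fedora’s pyproject macros; the name, version, and URL are made up, and a real package would need at least correct metadata and a changelog:

```
Name:           python-example
Version:        1.2.3
Release:        1%{?dist}
Summary:        An example Python library
License:        MIT
URL:            https://example.com/example
Source:         %{url}/archive/v%{version}/example-%{version}.tar.gz
BuildArch:      noarch
BuildRequires:  python3-devel

%description
An example Python library packaged with a mostly declarative spec file.

%prep
%autosetup -n example-%{version}

# Dynamic build requirements: the build system is asked for the dependencies
# declared upstream in pyproject.toml, instead of listing them by hand.
%generate_buildrequires
%pyproject_buildrequires

%build
%pyproject_wheel

%install
%pyproject_install
%pyproject_save_files example

%files -f %{pyproject_files}
```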
(Yes, most rpm spec files are far from this ideal. But maintaining backward compatibility is a feature, not a bug. We can and should move things towards simpler, more declarative templates in the future, but some tools still need to improve to make this smooth and successful. I think rpm builds have a bright future where we’ll be able to consistently and easily convert Python wheels, Rust crates, and other upstream packaging formats into rpms, with automatic handling of versioned dependencies and other metadata. And then those rpms can be used as a common intermediate delivery format for software.)
It is important to note that the build itself is “hermetic” — there is no access to the network and the package itself is completely dependent on the surrounding system to provide the build dependencies. It just spits out a list of build requirements, and e.g. mock takes care of finding the right packages and unpacking them into the expected place. The package does not specify the exact versions (because we want to use the latest without having to tread water updating references), it does not specify architectures (because it cannot know where and how we want to build), and it does not specify any destinations for the build artifacts, just the metadata internal to the package.
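Concretely, a local hermetic rebuild can look like the following sketch (the chroot config and srpm name are placeholders):

```sh
# mock sets up a clean buildroot, resolves the BuildRequires (including the
# dynamically generated ones), installs them, and runs the rpm build inside.
mock -r fedora-rawhide-x86_64 --rebuild example-1.2.3-1.fc42.src.rpm
```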
It is also important to note that the build process generates a “software bill of materials”, in the form of specific rpm names and versions. We have the general rule that any rpm that was ever used to build other rpms, and any rpm that was ever delivered to users, must forever remain downloadable. This means that it’s possible to recreate the build environments of any historical rpm builds. This is exactly what the Reproducible Builds initiative is doing: verifying that builds can be repeated, and fixing build definitions to make build products bit-for-bit identical.
If we consider the evolution of rpm builds, we’re clearly moving in the direction of minimalistic and declarative build specifications. The idea of operating on a “build root” was implemented ~25 years ago. We’re making progress on the other parts.
Summary
Where am I going with all this? There are various proposals being floated to change how we build rpms… When thinking about new systems, I think it is important to keep in mind the journey that rpm builds have travelled over the last 25 years. The pipelines that we have now implement “hermetic” builds with no network access and tight control over what is available in the build environment, but in a way that allows upstream metadata to be used to acquire this configuration, and that provides good logging and reliable “software bills of materials”.
The way we build images is undergoing the same evolution: we used to do “live” builds of images, but we are switching to offline builds, where some declarative configuration is used to blit contents of packages into a buildroot and pack this up into the right formats. We install only the things we want in the image. We reuse existing tools and formats and the build pipeline remains relatively simple. We can easily produce a reliable list of components that went into the image, in whatever format is desired.
The imperative, complicated — messy — Dockerfiles / Containerfiles, executed in a “live” environment, are not the solution.