Containerfiles considered harmful

Building container images from rpms

We are seeing initiatives like “bootable containers” and “container builds of rpms” (Konflux). An important piece of this puzzle are Containerfiles (née Dockerfiles). A typical example with rpms starts like this:

RUN dnf install -y stuff other-stuff && dnf clean all

Is this wrong?

  1. This approach is imperative. In the 21st century we prefer declarative.
  2. The whole command has to be squished into a single logical line because of layering. This leads to convoluted syntax full of continued lines and silly bash pipelines.
  3. The combination of 1 and 2 leads to poor extensibility: if we want to add any itsy-bitsy package on top of this, we either need to create another layer or completely rewrite the commands. With an extensible declarative package list, we would be able to just insert another name into a list.

But those are not the biggest problems. Those are more like poor implementation decisions along the way. The whole approach is wrong.

dnf and other tools are not part of the delivered payload, so they should never be in the image in the first place. If the tools are not in the image, we will never need to clean up.

Let’s consider a different way to do things:

  1. We want a declarative configuration with key-value settings, in particular a list of rpms to install. This syntax must allow additions and removals. All the “heavy lifting” like dependency resolution and comps groups must be done by the package manager.
  2. All tools must operate from the outside, adding and updating files in a buildroot directory. (Think dnf install --installroot=/our/temporary/directory.)
  3. Once step 2. is done, package the buildroot into whatever formats we need. Again, we do this “from the outside”. If the output format is a tarball, we just tar things up. If the output format is a file system image, we call mkfs --rootdir=/our/temporary/directory /our/temporary/filesystem.img to spit out a ready-made file system with the appropriate contents. If the output format is a disk image, we blit the filesystem image into a new file at an offset, and add the GPT header and footer around it. (systemd-repart takes declarative ini-style configuration and creates disk images, completely offline.)
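As a rough sketch (the package names, paths, Fedora release, and sizes below are placeholders, and exact flags vary between tool versions), the whole pipeline can be expressed with stock commands:

# 1. Populate a buildroot from the outside; dnf never ends up inside the image.
dnf install -y --installroot=/our/temporary/directory --releasever=41 \
    systemd util-linux openssh-server

# 2a. Tarball output: just tar things up.
tar -C /our/temporary/directory -cf image.tar .

# 2b. File system image output: let mkfs populate the image directly.
truncate -s 1G /our/temporary/filesystem.img
mkfs.ext4 -F -d /our/temporary/directory /our/temporary/filesystem.img

# 2c. Disk image output: systemd-repart assembles GPT + partitions offline,
#     driven by declarative repart.d definitions, e.g. repart.d/10-root.conf:
#       [Partition]
#       Type=root
#       Format=ext4
#       CopyFiles=/
systemd-repart --definitions=repart.d --empty=create --size=auto \
    --root=/our/temporary/directory disk.img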

This approach is a natural extension of the systemd configuration approach. All config (service files, daemon config, system settings) consists of simple text files in the appropriate locations in the filesystem. This means that “configuring things” is just dropping simple text files into the right place. Declarative, legible, extensible. In fact, to make this easier, all systemd tools support --root=/our/temporary/directory, to operate on any image from the outside. (Or --image=… to operate on a disk image.)
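For example (a sketch; the unit, locale, and timezone are arbitrary), the usual systemd tooling can be pointed at the buildroot instead of at the running system:

# Enable a service inside the buildroot, not on the build host.
systemctl --root=/our/temporary/directory enable sshd.service

# Apply sysusers.d/ and tmpfiles.d/ definitions inside the buildroot.
systemd-sysusers --root=/our/temporary/directory
systemd-tmpfiles --root=/our/temporary/directory --create

# Pre-seed basic settings offline.
systemd-firstboot --root=/our/temporary/directory --locale=C.UTF-8 --timezone=UTC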

A tool that nicely implements this philosophy is mkosi, and a similar approach is used by KIWI NG. Various Fedora image builds have been switching to kiwi.

What are the benefits?

  1. The image contents are tailored to its purpose, no extra tools, no cleanups necessary, no leftovers possible.
  2. The user operates at a higher conceptual level, essentially giving a list of features, and the tools handle the annoying details.
  3. No privileges or special locations are necessary. We just copy files to and from a temporary directory, blit some bytes from one file to another. We can run this as a normal unprivileged user, or in a container, or as part of some build pipeline with identical results.
  4. Things are fast. Since we’re just copying bits, we can use reflinks to avoid actually moving any data (see the one-liner after this list). Things may become even faster with RPM zero-copy installations in the future.
  5. In general the “image builder” is simple and dumb: it gathers a list of rpms/groups to install, and passes it off to dnf/apt/zypper/pacman/…, then passes the temporary work dir to mkfs or systemd-repart. No rocket science.
  6. We get easy portability between distributions and architectures. Package naming and splitting differs between distros, so we need to conditionalize that part, but most of the config can be shared.
  7. Everything is “local”, so the test build on a laptop and the official build in the cloud are equivalent.
  8. Debugging is easy. Since we are just calling well-established tools, we can take any part of that pipeline and run it interactively.
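For instance, on a file system that supports reflinks (btrfs, XFS), the copy into the buildroot can share data blocks instead of duplicating them; a sketch, with a hypothetical local cache of unpacked package contents:

cp -a --reflink=auto /var/cache/unpacked-packages/. /our/temporary/directory/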

Building rpms

OK, I was going on and on about how to build things from rpms. How are the rpms themselves produced? Is the build process declarative, from the outside, minimalistic? The answer is … complicated. Let’s consider some properties:

  1. rpm builds use a temporary build directory and a temporary install root, confusingly called %buildroot. The tools to operate on both of those subhierarchies come from the outside, the “host system” (nowadays usually a container created for the build). So we are pretty close. Things can get confusing for example with tests, which sometimes need to access files in the %buildroot, but for a majority of packages the separation is maintained.

    In fact, the confusing naming points to the evolution that happened. In prehistoric times, rpms would be built by installing files onto a running system and then carefully picking them out. To this day, rpm tools start with a check whether %buildroot is empty or /, just in case we went back to 1995 or so.
  2. rpm spec files are … complicated. A macro language with convoluted syntax and a minefield of caveats. But modern builds are ever more declarative. With rpm declarative build systems, we may have spec files where the Name, Version, License, and %description are the majority of the file, and the rest is just easy boilerplate.
  3. rpm spec files used to do a lot of “heavy lifting”, listing build and runtime dependencies in detail, with %build and %install sections taking hundreds of lines. But now we’re offloading most of that work to language-specific build systems. We have dynamic build requirements, where the build system calls cargo or pip to spit out a list of build dependencies based on the declarative configuration provided by upstream. We use the likes of %meson_build and %cargo_install to do the complicated parts. (A minimal sketch of such a spec file follows this list.)

    (Yes, most rpm spec files are far from this ideal. But maintaining backward compatibility is a feature, not a bug. We can and should move things towards the simpler, more declarative templates in the future, but some tools still need to improve to make this smooth and successful. Still, I think rpm builds have a bright future where we’ll be able to consistently and easily convert Python wheels, Rust crates, and other upstream packaging formats into rpms, with automatic handling of versioned dependencies and other metadata. And then those rpms can be used as a common intermediate delivery format of software.)
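To make item 3 concrete, here is a minimal sketch of a spec file for a hypothetical Python project, leaning on the pyproject macros and dynamic build requirements; the package name and URL are made up, and a real Fedora package would normally add a python3- subpackage and license files:

Name:           python3-example
Version:        1.0.0
Release:        1%{?dist}
Summary:        Example package
License:        MIT
URL:            https://example.org/example
Source:         %{pypi_source example}
BuildArch:      noarch
BuildRequires:  python3-devel

%description
Example package built almost entirely from upstream metadata.

%prep
%autosetup -n example-%{version}

%generate_buildrequires
%pyproject_buildrequires

%build
%pyproject_wheel

%install
%pyproject_install
%pyproject_save_files example

%files -f %{pyproject_files}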

It is important to note that the build itself is “hermetic” — there is no access to the network and the package itself is completely dependent on the surrounding system to provide the build dependencies. It just spits out a list of build requirements, and e.g. mock takes care of finding the right packages and unpacking them into the expected place. The package does not specify the exact versions (because we want to use the latest without having to tread water updating references), it does not specify architectures (because it cannot know where and how we want to build), and it does not specify any destinations for the build artifacts, just the metadata internal to the package.
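For example (a sketch; the chroot config and srpm name are placeholders), this is what such a build looks like from the outside:

# Resolve the (possibly dynamic) BuildRequires, populate a clean buildroot,
# and run the build in an isolated environment.
mock -r fedora-rawhide-x86_64 --rebuild example-1.0.0-1.fc42.src.rpm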

It is also important to note that the build process generates a “software bill of materials”, in the form of specific rpm names and versions. We have the general rule that any rpm that was ever used to build other rpms, and any rpm that was ever delivered to users, must forever remain downloadable. This means that it’s possible to recreate the build environments of any historical rpm builds. This is exactly what the Reproducible Builds initiative is doing: verifying that builds can be repeated, and fixing build definitions to make build products bit-for-bit identical.
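For example, the exact set of rpms that ended up in a buildroot or image can be listed with rpm itself, operating from the outside (a sketch; the path is a placeholder):

rpm --root=/our/temporary/directory -qa | sort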

If we consider the evolution of rpm builds, we’re clearly moving in the direction of minimalistic and declarative build specifications. The idea of operating on a “build root” was implemented ~25 years ago. We’re making progress on the other parts.

Summary

Where am I going with all this? There are various proposals being floated to change how we build rpms… When thinking about new systems, I think it is important to consider the journey that rpm builds have travelled over the last 25 years. The pipelines that we have now implement “hermetic” builds with no network access and tight control over what is available in the build environment, but in a way that allows upstream metadata to be used to acquire this configuration, and that provides good logging and reliable “software bills of materials”.

The way we build images is undergoing the same evolution: we used to do “live” builds of images, but we are switching to offline builds, where some declarative configuration is used to blit the contents of packages into a buildroot and pack this up into the right formats. We install only the things we want in the image. We reuse existing tools and formats, and the build pipeline remains relatively simple. We can easily produce a reliable list of components that went into the image, in whatever format is desired.

The imperative, complicated, messy Containerfiles (a.k.a. Dockerfiles), executed in a “live” environment, are not the solution.


Note that we’re moving even further in this direction: RPM 4.18 added dynamic subpackage generation based on packaging behavior, and RPM 4.20 adds declarative ways of describing the build process with Declarative Builds.

The larger trend is to subsume imperative features of the RPM recipe format with declarative mechanisms and reserve imperative steps for advanced packaging needs.


It’s common to see the pattern you describe, but many sophisticated container builds actually run a tool which operates on declarative input. Now, which declarative input to use is a huge, interesting topic. Personally, I like the “low tech” example we have in e.g. coreos-assembler/Dockerfile at 146cf623d6036fee3b7607d663bbc7a8a7300af9 · coreos/coreos-assembler · GitHub, which ultimately parses this text file.

I also experimented with fedora / bootc / osbuild-cfg · GitLab a while back.

And yes, while mkosi has a nice declarative input format, it also has plenty of “execute this arbitrary code” hooks.

We actually do want dnf in the target images in many cases to support e.g. transient or even persistent overlays. More in Local package layering story with bootc & dnf5 (#4) · Issues · fedora / bootc / Issue Tracker · GitLab

This is a very interesting topic but it’s tricky because in many scenarios we do want the build to be isolated, and such a thing implies trust of the cache, which requires careful design.

Konflux also supports hermetic builds, and specifically for RPMs it uses GitHub - konflux-ci/rpm-lockfile-prototype


Agreed. However the thing is, there’s an extremely long tail of things that one might want to do that simply do not operate this way. One example today is update-crypto-policies. It probably wouldn’t be too hard to fix it to do so, but it just doesn’t. And it’s highly relevant for my employer’s use cases.

I’d be the first to agree that Containerfile has its flaws; bigger picture, I’d really consider us allies in the larger movement towards image-based updates, with some differences in implementation choices.

Containerfiles (yes, Dockerfiles) have become the de facto way to build a Linux userspace over the last 10+ years. I agree they certainly have flaws [0].

Is mkosi a replacement for Containerfile? Can it build (OCI) containers? I’d be interested in trying it out for my container images.

What is the “git commit”-able declarative format for mkosi? I don’t see a *.5.md here:

Interested in replacements for Containerfile … and efforts for a better way to declare and build the (tens? of) millions of containers in existence.

Stef

[0] Much like any build system I’ve ever seen, and their related file formats: Each one is flawed, and together represent a glorious venn diagram of flaws. But that doesn’t keep everyone from trying to do better, over and over again.

I didn’t want to make this about specific tools. I only mentioned mkosi and KIWI NG in passing as examples of a declarative approach.

It is true that Containerfiles are a widely spread standard. But the same could be said about bash scripts… Bash scripts certainly have their uses, as do Containerfiles. The Containerfile is essentially a script, just with a worse syntax than pure bash. A script is great to solve an immediate problem, particularly if something is done interactively and extensive error handling is not necessary. The point I was trying to make is that if we are building a new system, it shouldn’t be based on scripts. A lot of problems that appear at scale in packaging are directly caused by the imperative and ad-hoc nature of spec files. And I see a pattern of change towards “smarter tools, simpler specification of contents” across different levels: packages, user accounts, images, servers, etc, etc.

Is mkosi a replacement for Containerfile? Can it build (OCI) containers? I’d be interested in trying it out for my container images.

There is an oci output format that produces a “directory compatible with the OCI image specification”. AFAIK, there is no support for other steps like uploading that anywhere, so it’s probably not directly usable.
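For anyone who wants to experiment anyway, a generic tool like skopeo should be able to push such an OCI layout directory; a sketch, with a placeholder output path and registry:

skopeo copy oci:./image-output docker://registry.example.com/myimage:latest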

What is the “git commit”-able declarative format for mkosi?

The whole config is a bunch of ini files. The options are documented in mkosi/mkosi/resources/man/mkosi.1.md at main · systemd/mkosi · GitHub, but good, comprehensive introductory docs remain TBD.
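For a concrete feel, a minimal sketch of such a config (the package list and release are placeholders; see the linked documentation for the full set of options):

# mkosi.conf
[Distribution]
Distribution=fedora
Release=41

[Output]
Format=disk

[Content]
Packages=systemd kernel-core openssh-server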

Interested in replacements for Containerfile … and efforts for a better way to declare and build the (tens? of) millions of containers in existence.

I think the pattern described by Colin for coreos above is a nice example of sidestepping the design mess inherited from Docker: the Containerfile launches a tool that does the actual work. Containerfiles are not going to go away, but ideally they will just contain a single line to offload to a better tool.
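A hedged sketch of that pattern (the base image and script name are placeholders): the Containerfile stays trivial, and all the real logic lives in a tool driven by declarative input checked into the same repository:

# Containerfile: nothing but a hand-off to the real build tool.
FROM registry.fedoraproject.org/fedora:41
COPY . /src
# build.sh (hypothetical) reads a declarative manifest and does the actual work.
RUN /src/build.sh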

(x-post from Akkoma)

We want a declarative configuration

Yes! Yes!

with key-value settings

No-o-o!! No, no, no! Just spend a week with #NixOS already before reinventing the wheel, I beg you. There’s no key-value schema that’d get you an rsyslog compiled and running against a patched gnutls, let alone any actually complicated system setup.

There’s simply no building a configurable scriptlets-free system without a powerful, flexible system composition mechanism like the NixOS module system. That thing that composes loose packages into a configured image according to a spec is the distro. 20th century distros could skimp on that by shoving those steps into the scriptlets of random packages, extracting them into runtime configuration ugliness like crypto-policies, and forcing users to hammer their systems into shape with imperative scripts like bash or Ansible. A 21st century immutable image-based distro is a configuration system at heart. The flexibility of image composition is the flexibility of the result. Unless you’re designing a bespoke dumb appliance with a dozen parameters, there’s no handwaving the centerpiece of its design as a bash script, a Containerfile or an ini file.

It hurts so much to read such texts. NixOS is 21 years old. Declarative configuration, true composable cacheable immutability, seamless overriding of 100% of the package building where needed, building dozens of image formats, declarative VM management, impermanence, factory resets, rebootless change application: those few of the above that weren’t solved back in 2003 were solved last decade. Wanna know where the can-do attitude of “I’ll willfully ignore all those lessons and hammer Fedora into shape in order to emulate a fraction of the desired NixOS properties” leads? One smart engineer did just that, very recently. Now we have bootc, Containerfiles for a configuration mechanism and systems where we can’t even securely distrust a root CA in a way that survives an update.

The package building segment really grinds my shattered heart debris into the asphalt. Like, wouldn’t it be nice if, upon finding out your chrony is compiled without a feature you need, you could just quickly take a custom one compiled --with-missing-feature for a spin? Take a dive from the system configuration language right into the intricacies of its autotools-driven compilation process? Guess what, that’s a long-solved problem as well, and all you need is something like:

systemd.services.chrony.serviceConfig.ExecStart = lib.mkForce "${pkgs.chrony.overrideAttrs (old: { configureFlags = old.configureFlags ++ [ "--with-missing-feature" ]; })}/bin/chronyd";

Imagine that once you wrote such an override, you can put it into different places of your configuration and beyond, and the scope will change accordingly. It can become a dependency of a customized revdep. It can go into the package set and all the revdeps will be recompiled and re-tested against it. Or just into the unit, no rebuilding needed. Or it can go into a separate unit without affecting the main chrony. A development environment. A throwaway shell definition. Your .bashrc. A set of packages available for a specific user only. Or, IDK, you could place that systemd unit into a lightweight diskless VM just by cutting and pasting the line into a VM definition.

Now imagine having all of that at your disposal for years and then seeing a post from a person who’s apparently so naturally great at designing distros that he goes on a short tangent in a post about configuration systems and independently arrives at a vision of package building recipes that are hermetic, reproducible, declarative, and autogenerated from upstream… just to stop right there and deny them external customizability?

That’s exactly how flummoxed I am at the idea that key-value settings are somehow enough to design an OS configuration system around. I mean, yes, yes you can overhaul the entire philosophy of how an OS gets configured, effectively designing a new distro around a brand-new configuration system, yet stop just an inch away from making it customizable for real. Not that I could stop you. Just… why.

crypto-policies maintainer here. Love that we seem to be in agreement that in a wonderful world of external manipulation, where the image is built according to its configuration, crypto-policies should operate from the outside. That’d let it graduate from the runtime /etc-writing config-generation hack it currently is and move into the configuration system where it belongs. Hook into the image composition process, consume users’ algorithm selections from the system-wide config and generate the backend configs that’d then end up in the resulting image.

Whether it is or is not hard to implement this better crypto-policies as a part of the external image-build-time configuration system depends entirely on what that configuration system would be. If said system can just invoke the existing Python implementation, writing the output to a customizable location sounds simple indeed, if not trivial. On the other end of the spectrum, say, porting crypto-policies to Nix to turn it into a NixOS module would be a monumental, yes, but ultimately a feasible task. I know it can be done. But porting crypto-policies to an image-build-time configuration system that doesn’t exist yet is impossible, so that assessment will have to wait until there is one.

We need one.

Like, wouldn’t it be nice if, upon finding out your chrony is compiled without a feature you need, you could just quickly take a custom one compiled --with-missing-feature for a spin?

I don’t know, maybe? The problem with this approach is that it doesn’t scale. If it’s just one package, then, hey, I can take on maintainership and apply some override. But as soon as it’s three things that need to be patched, if I plan to use those things not just once, but keep them updated, then this stops being fun. And from what I’ve seen, this is very much a problem with Nix packages… Essentially, they make vendoring very easy, and this can easily lead to dozens of versions of dependencies with “this-was-patched-so-we-can’t-update” and “version-locked-but-we-forgot-the-exact-reason”.

Nix is a very cool technology, in the sense that it allows those very complex systems of dependencies and overrides to be built. For actual deployments, I find flat and simple much more appealing. But I’ll admit I haven’t spent enough time with NixOS to have an informed opinion. It’s been on my list of things to look at for a while :wink:

I don’t know, maybe? The problem with this approach is that it doesn’t scale. If it’s just one package, then, hey, I can take on maintainership and apply some override.

The problem with all kinds of convenient flexibility is that it’s gonna be abused, yes =D That doesn’t automatically mean one shouldn’t strive to offer it.

But as soon as it’s three things that need to be patched, if I plan to use those things not just once, but keep them updated, then this stops being fun.

Precisely the thing Nix/NixOS avoids. Your lockfile marches on, your packages update, your override is reapplied every rebuild, and the fragility of it is entirely on you. You only fix it once it breaks, i.e., the --with-missing-feature configure flag ceases to exist or the project switches away from autotools, which might as well be never. Yet you always know exactly what you did to get your system working, as the hack is reproducibly defined right there in the config.

I find flat and simple much more appealing.

I mean, yeah, who wouldn’t? If only we could meaningfully declaratively configure real-life systems with ini-file flexibility alone. But that’s not enough, and we keep inventing very complex configuration systems, like Ansible or Nix, because only they get real-life things done.

But I’ll admit I haven’t spend enough time with NixOs to have an informed opinion. It’s been on my list of things to look at for a while.

Yeah, I guess my core point is, you’ve gotten so close to inventing it you might as well check it out and save yourself some time. I can’t promise that you will like it, but it’ll definitely give you a solid bonus viewpoint for reasoning about a ton of things. For example, you’re advocating for key-value configs; I find that inflexible. I don’t think you need to know a thing about Nix in order to follow along with this or this configuration file and get a good idea of how far I can get on key-value configuration alone, when that model would start to push its limits, and what kind of complexity I’d naturally ask you to reinvent next were you my configuration system designer =)

Just to level set, I know you know this, but it is very much worth repeating that from bootc’s PoV, it accepts OCI images and is agnostic to how they’re built. And there are multiple build systems that output OCI without running through Dockerfile.

Further on the bootc side related to configuration we will be investing in remote config via configmap and secrets · Issue #22 · bootc-dev/bootc · GitHub at some point.

What do you mean? Are you referring to that internal bug/RFE about running update-crypto-policies at both build and runtime? We can certainly securely distrust a root CA in a way that survives an update by default; the problem is mixing “modalities”, which is fundamentally hard.

Is there a problem with Containerfiles being used “under the hood” with a declarative frontend file? If not, BlueBuild has a solution.

SerpentOS’s moss and boulder are working on a similar thing for packages, but they aren’t yet ready for images…

I agree with the issue…

Colin, you owe me a beer. I told you people were going to reinvent Helm. Here we are. :smiley:


KIWI predates Helm by nine years. :wink:
