F40 Change Proposal: Build Fedora Cloud Edition Images Using Kiwi in Koji (System-Wide)

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Summary

Fedora Cloud Edition images will be built with Kiwi, which will replace the unmaintained ImageFactory tooling that is currently being used to build the cloud base images.

We can already build Fedora Cloud Edition images outside of Koji using composite Kiwi definitions. However, the integration with Koji must be enabled to fulfill our goal of building official images within the Fedora infrastructure and to fully replace the current usage of ImageFactory.

This transition is consistent with the direction of the Cloud Product Requirements Description (PRD). Kiwi gives the Cloud Working Group a tool that preserves our previous choice to build images from composable configurations and provides a reproducible process for building images related to the Cloud Edition, including Fedora Cloud Base images for Vagrant, Azure, AWS, GCP, and generic images. It also opens up the ability to run container builds and WSL2 builds using the composable image definitions: we maintain a base image and then add the specifics needed for each specialized image through a smaller configuration file.

Owner

Detailed Description

While working on the production of cloud images for Fedora Linux 38 and Fedora Linux 39, the cloud-sig team did significant work to support the transition from the current ImageFactory-based build tools, which are outdated (but still functioning), to a tool that is supported by a broader community. The cloud team has successfully built and tested images with the kiwi application. Builds and tests supporting all of the previous change proposals and configuration changes to the Fedora Cloud base images have been validated and can be reproduced using the kiwi descriptions. The Cloud Edition WG finds that kiwi provides the most consistent experience with the fewest concerns for our current deliverables. The Cloud Working Group continues to focus on building support for specific requirements around specialized images that are planned parts of the Cloud Edition PRD, included in section 2.3.

Feedback

We have evaluated a number of existing image build tools as part of this Change. Ultimately, the Fedora Cloud WG chose to adopt kiwi because it retains the ideal qualities of our current tooling in a way that benefits the cloud-sig and the community at large. We have cultivated a strong relationship with the upstream project, which has been receptive to our needs and has made improvements based on our requirements. Kiwi is not a disruption but an opportunity to immediately reduce the complexity of producing our current and additional use cases, and to ensure that builds are executed securely.

We are aware of Fedora Workstation’s trial of osbuild (the upstream project for Red Hat Image Builder) for their live image, and have closely evaluated it as an option for Fedora Cloud as well. Discussions with members of the image builder team have been promising, but their mission doesn’t directly align with the Cloud Working Group’s goals immediately. Without that alignment, we are not prioritizing the same goals today. This is not a shortcoming of the cloud working group or the osbuild tools, it is a difference in timing of feature delivery. Fedora Workstation and Fedora Cloud are two different groups. We use different tools for building images today so their changes are typically independent of those we make. Currently, Fedora Workstation uses Lorax and Fedora Cloud uses ImageFactory and Oz. The cloud working group is working aggressively to eliminate our usage of ImageFactory because it is legacy code and not easily extended.

We also evaluated mkosi and decided not to pursue it due to the lack of flexibility to support all the image types we are aiming to offer. Its highly opinionated view of how images should be structured and limited framework for customization make it difficult to recommend as a framework for our builds. Additionally, it cannot support all of Fedora’s architectures due to requiring GPT, nor can it fully support Fedora Cloud’s preferred disk setup due to the aforementioned opinions of how images should be structured. Finally, when testing the generated images, the results did not line up with how we expected images to be laid out and it caused difficulties when dealing with certain classes of package upgrades (such as bootloader or kernel packages). There is also no Koji plugin at this time for running mkosi builds.

Benefit to Fedora

Most importantly, the kiwi builders eliminate a series of legacy build tools for Fedora Cloud Base images.

Visible to advanced users:

  • Allows Fedora Images to be built on many different platforms and distributions without modification to the runners
  • Extends the composition strategies available to users
  • Keeps a base image configuration that can be managed to ensure that it meets standard requirements for local virt installations
  • Includes the ability to leverage user-defined scripting in the image definition.
  • Adds a Koji builder and image definitions that are simple to update and modify
  • Frees up time to prioritize features in the Fedora images according to user feedback and user requirements
  • Supports multiple build types, from ISO to raw disk images, and all the way to WSL2 and containers.

This also aligns with the Fedora Asahi Remix and its usage of kiwi to build its images, as this lays the groundwork for those images to eventually be built in Fedora infrastructure as support for Apple Silicon Macs gets upstreamed.

Scope

Submit image build requirements as kiwi descriptions

  • Release engineering: #11854

  • Policies and guidelines: Fedora Cloud Edition documentation should be updated to reflect the usage of the new tooling and how to use and contribute to it.

  • Trademark approval: N/A (not needed for this Change)

  • Alignment with Community Initiatives:

All software and requests are consistent with the decision process and similar exceptions across other groups in Fedora.

Upgrade/compatibility impact

The previous methodologies for using Fedora Quickstarts for Fedora Cloud Edition will be retired. The kiwi descriptions will support builds. We will use Toddler and Ansible to deliver images to the various public cloud targets (GCP, AWS, Azure, OCI, etc.)

How To Test

Test by working with the various images

  1. Import the image into a test account for the associated cloud provider(s)
  2. Start an instance from that image
  3. Log in to the instance successfully.

User Experience

This provides a simplified method for creating composable image definitions and overlays. Users will find additional images supporting targeted workloads and build methods, and they will find that those images are more readily available.

Dependencies

This Change depends on work in pungi to enable the use of the KiwiBuild Koji task as part of composes. It also depends on release engineering to enable the kiwi plugin in Koji.

Contingency Plan

  • Contingency mechanism: Revert back to ImageFactory and continue to support builds using the kickstart (.ks) files for image builds.
  • Contingency deadline: Beta freeze
  • Blocks release? Yes

Documentation

Documentation for kiwi is available from the upstream site. Once the Koji plugin is enabled, we will create accompanying documentation for SIG members on using the functionality.

Release Notes

Fedora Cloud Images are now built with the kiwi image build tool, using definitions from the fedora-kiwi-descriptions repository.

This has enabled Fedora Cloud to introduce 64-bit ARM cloud images for Azure and Google Cloud, as well as 64-bit ARM Vagrant images.

This change proposal has now been submitted to FESCo with ticket #3137 for voting.

To find out more, please visit our Changes Policy documentation.

I have no particular opinion on kiwi in specific, but I am generally concerned with proliferation of image-creation tools. I know ImageFactory is fraught with problems, but it has one big inherent advantage: by using anaconda directly, it directly inherits all the slightly-weird stuff anaconda does — which is, unfortunately, more than just “install rpms to a chroot”.

This has led to real problems in the past, like CVE-2013-2069. These bugs often arrive by this process:

  1. Appliance creation tool author realizes anaconda has weird quirks, implements all known quirks.
  2. Time passes, anaconda grows new quirks.
  3. New tool does not. More time passes.
  4. Ooops.

I’m also concerned with increased fragmentation of tooling making it harder for people to collaborate across groups – and for more to go wrong at last-minute compose time.

None of these need block the change — and in general I’m in favor of flexibility for Working Groups, but… they do make me worry!

Are we sure we can’t get what we need from osbuild / imagebuilder?

If we can’t now, does this leave the possibility of switching to osbuild or similar in the future if that situation improves?

I want to address the last question first, because when I started this change proposal, it was imperative that it be structured to ensure that osbuild support is not excluded as a method in the future. The osbuild limitations related to cloud edition are addressed in the change proposal.

We don’t plan to move away from osbuild, we are moving towards it, but we need several updates to ensure that we have support and we just don’t think that we are prioritizing the same features in the two projects right now. Eventually, we will converge.

I agree that we want to ultimately move away from the anaconda requirements, but today we would like to include additional builds for WSL2 images, containers, and ISO to raw disk images. We also want to be able to create extended configurations for custom images as outlined in the PRD.

We will definitely continue to work with the osbuild team to ensure that the Fedora Images they can produce and the ones that are being produced for the Cloud Edition are the same across multiple tools.

More from curiosity since I use Cockpit to make images locally, will this affect Cockpit? Isn’t ImageBuilder a part of it? Or is this simply for the infra build system?

This affects only the infra images. The Cockpit images don’t support features like btrfs or extended builds, and that’s what we are working to ensure we have full support for. Cockpit and all of its awesomeness is not in any way affected.

The solution here is to extract the weird quirks from Anaconda and either eliminate them or move them somewhere else. That problem affects everything we do. Note that only Fedora live media and cloud images are created using Anaconda right now, everything else is not. And both of those are in the process of moving away from something that uses Anaconda to do the install because it’s heavyweight, slow, and difficult to debug.

Directly using Anaconda has been more of a disadvantage than an advantage because it’s really hard to see what’s going on and influence the process in the way we need.

We already use seven tools for image creation in Fedora:

  • Pungi+something? (Server install DVD)
  • Lorax (netboot ISO and live ISOs for the spins)
  • CoreOS Assembler (Fedora CoreOS)
  • OSBuild (Fedora IoT, Fedora Workstation preview)
  • ImageFactory (Fedora Cloud)
  • flatpak-module-tools (Fedora Flatpak)
  • kiwi (Fedora Asahi)

Of these seven, only three have working documentation for using them outside of Fedora Infrastructure: Lorax, flatpak-module-tools, and kiwi. Furthermore, CoreOS Assembler runs on its own infrastructure, where we have no project-level tracking of its artifacts and logs in Koji.

ImageFactory has no maintainer and has been dead for three years.

I will also point out that only Lorax and ImageFactory use Anaconda for image creation. Everything else doesn’t.

If OSBuild reaches a point where Fedora Cloud is satisfied with its capabilities and interaction model, sure.

But today, kiwi is also supported by a team of developers that have a long track record of developing and supporting image creation processes. The current implementation of kiwi (previously known as kiwi-ng) has been around for a decade and has been in Fedora for almost six years. Lorax is only slightly older than the current implementation of kiwi.

I completely understand why Cloud folks would want to move away from ImageFactory. :slight_smile:

The problem is that this adds another stack of things and completely different configuration language for a release blocking deliverable. So, we need to set it all up, get it working, and then be able to maintain it in working order all the time. ;(
So, while it may be easier on the cloud folks, it’s harder on QE/releng/infrastructure.

If this meant that we could drop ImageFactory, I would be all for it, but it doesn’t. We still use ImageFactory for container images (base/base-minimal/toolbox). So it’s a net add-on. ;(

Are there specific technical gains here? Or just potential ones down the road?
By that I mean, over just not using IF and using a nicer tool, is there something that you plan to do that cannot be done in the current setup and push on osbuild folks to try and handle this case?

That said, if we do this:

I see the pungi ticket hasn’t had any comment yet… we would need pungi support.

I assume the repo with the xml (ugh) can be setup to obey freezes, have releng management, etc?

Perhaps we could also continue to produce the IF version and have a fallback (and also confirm it behaves the same).

Perhaps @adamwill could chime in from a QE perspective here?

Building the containers from the kiwi descriptions is totally possible and easy to include! This could be part of the same change proposal. I don’t mind including the modifications. It would just end up requiring a single kiwi definition file for the builds to include with the base.

Well, the base and base-minimal ones likely wouldn’t be too hard to do, but toolbox is a lot more involved, and the toolbox folks also just moved to a kickstart, so that’s another change right after the last one.

I’d want to hear from @rishi about that possibility…

I could port over those definitions to kiwi pretty easily. If there’s no “owner” for those definitions, the Cloud WG can own them and maintain them in these descriptions.

I’m open to supporting any image we currently produce with kiwi if people want to.

We have several technical gains here:

  • We maintain a semi-declarative configuration structure that can be easily overlaid and extended for derivative builds, including remixes.
  • Our image builds can now easily be run locally, which is important for enabling contributions.
  • Our new repo has CI for pull requests working, which is great to get testing and feedback.
  • We can start supporting new image types for new places to run Fedora that are on our roadmap pretty much immediately (such as WSL) and other existing types (like Vagrant) are much easier to support.
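As a rough illustration of the “overlay a base description with a smaller one” idea, here is a minimal Python sketch using only the standard library. The XML is a heavily simplified, hypothetical kiwi-like description (the real fedora-kiwi-descriptions files carry much more), and `overlay_packages` is a made-up helper: it is not how kiwi or xmlmerge actually merge documents, just the shape of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified base description shared by every cloud image.
BASE = """<image schemaversion="7.5" name="Fedora-Cloud-Base">
  <packages type="image">
    <package name="kernel"/>
    <package name="cloud-init"/>
  </packages>
</image>"""

# Hypothetical overlay for one specialized image: only its differences.
OVERLAY = """<image schemaversion="7.5" name="Fedora-Cloud-Base-Vagrant">
  <packages type="image">
    <package name="cloud-init"/>
    <package name="rsync"/>
  </packages>
</image>"""


def overlay_packages(base_xml: str, overlay_xml: str) -> ET.Element:
    """Return the base description with the overlay's name and extra packages."""
    base = ET.fromstring(base_xml)
    overlay = ET.fromstring(overlay_xml)
    # The overlay's name identifies the derived image.
    base.set("name", overlay.get("name", base.get("name")))
    base_pkgs = base.find("packages[@type='image']")
    if base_pkgs is None:
        raise ValueError("base description has no <packages type='image'> section")
    present = {p.get("name") for p in base_pkgs.findall("package")}
    # Append only packages the base does not already list, keeping their order.
    for pkg in overlay.iterfind("packages[@type='image']/package"):
        name = pkg.get("name")
        if name not in present:
            ET.SubElement(base_pkgs, "package", name=name)
            present.add(name)
    return base


def package_names(image: ET.Element) -> list:
    """List package names in document order."""
    return [p.get("name") for p in image.iterfind("packages/package")]


if __name__ == "__main__":
    merged = overlay_packages(BASE, OVERLAY)
    print(merged.get("name"), package_names(merged))
```

The overlay stays small because it only states what differs from the base; duplicates such as `cloud-init` are ignored rather than listed twice.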

Yes. We can do whatever we want there. I’ve added the releng group and Adam to the repo and we can have that setup accordingly.

(Technically, kiwi supports XML, YAML, and JSON, and we’re looking to add TOML upstream soon; but, I don’t like writing YAML or JSON, and TOML isn’t here yet…)

I could port over those definitions to kiwi pretty easily. If there’s no “owner” for those definitions, the Cloud WG can own them and maintain them in these descriptions.

Well, toolbox folks just moved to kickstart… so I would check with them to see if they were willing to move before sinking a lot of work into it.

We don’t really need an ‘owner’. We need people to care about them and suggest improvements, etc. The container images do get used a LOT by CI systems and as the base for other images. I’m not sure if it makes sense to lump them into Cloud, or just leave them with releng merging sensible improvements, or revive the Container SIG, or what.

I’m open to supporting any image we currently produce with kiwi if people want to.

That’s great…

We have several technical gains here:

  • We maintain a semi-declarative configuration structure that can be easily overlaid and extended for derivative builds, including remixes.

That’s not a gain in the image itself… that’s a gain for you who make the images, right? (And “easily” isn’t a word I use much with XML.)

  • Our image builds are now easily possible to run locally, which is important for enabling contributions.

This could be indeed useful.

  • Our new repo has CI for pull requests working, which is great to get testing and feedback.

There is actually CI for the kickstarts repo too, but it’s pretty primitive.

  • We can start supporting new image types for new places to run Fedora that are on our roadmap pretty much immediately (such as WSL) and other existing types (like Vagrant) are much easier to support.

Uh, I thought WSL is blocked by legal issues? Has that changed?

Yes. We can do whatever we want there. I’ve added the releng group and Adam to the repo and we can have that setup accordingly.

(Technically, kiwi supports XML, YAML, and JSON, and we’re looking to add TOML upstream soon; but, I don’t like writing YAML or JSON, and TOML isn’t here yet…)

Ah, good to know. toml would be lovely.

Sure. Even if we don’t use them officially, I think it’s useful to have something people can easily run on their own to make their own toolboxes (like I’d like one for my own use).

I was able to pretty quickly port the definitions over.

Caretaker, then. I think Fedora Cloud is as good as anyone else at this. And our mission of supporting both base and layered projects fits well with both container and VM workflows. One of the reasons we’re doing this is explicitly to enable those workflows for VM image builds.

Most of the other options we looked at don’t have this property. We want to enable consumers and producers with our tooling, so this was very important to us.

Insofar as the choice for XML, it is fairly easy to do automated document merges in derivative pipelines with tools like xmlmerge. And Koji very easily handles manipulating the XML so that it can point the image builds to compose repos and such. And kiwi includes a schema definition so they can be validated before running the build. Unfortunately, we no longer have any sophisticated XML validator tools in Fedora since the great Java implosion (kiwi will automatically use jing if it’s present).

There are actually a lot of advantages to XML, even if it’s a bit on the verbose side.
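To make the “Koji can point the image builds at compose repos” point concrete, here is a minimal sketch of that kind of rewrite. It assumes a simplified kiwi-like description whose `<repository type="rpm-md">` elements each hold a `<source path=…>` child; the `point_at_compose` helper, the `compose` alias, and the example.invalid URLs are hypothetical, and this is not the Koji plugin’s actual code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified description that builds against public mirrors.
DESCRIPTION = """<image schemaversion="7.5" name="Fedora-Cloud-Base">
  <preferences>
    <version>40</version>
  </preferences>
  <repository type="rpm-md" alias="fedora">
    <source path="https://example.invalid/fedora/releases/$releasever/Everything/$basearch/os/"/>
  </repository>
  <repository type="rpm-md" alias="updates">
    <source path="https://example.invalid/fedora/updates/$releasever/Everything/$basearch/"/>
  </repository>
  <packages type="image">
    <package name="kernel"/>
  </packages>
</image>"""


def point_at_compose(description_xml: str, compose_repo: str) -> ET.Element:
    """Replace every <repository> with a single one pointing at a compose repo."""
    root = ET.fromstring(description_xml)
    repos = root.findall("repository")
    if not repos:
        raise ValueError("description has no <repository> elements")
    # Remember where the first repository sat so element order is preserved.
    position = list(root).index(repos[0])
    for repo in repos:
        root.remove(repo)
    compose = ET.Element("repository", type="rpm-md", alias="compose")
    ET.SubElement(compose, "source", path=compose_repo)
    root.insert(position, compose)
    return root


if __name__ == "__main__":
    rewritten = point_at_compose(
        DESCRIPTION, "https://example.invalid/compose/Everything/x86_64/os/")
    print(ET.tostring(rewritten, encoding="unicode"))
```

Because the rewrite is a structural edit on parsed XML rather than text substitution, the result can still be checked against kiwi’s schema before the build runs.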

On the CentOS side, the CentOS Hyperscale SIG built a prototype OpenStack image in their CBS (SIG Koji instance) to prove it all worked.

We can do everything short of actually publishing it on the Microsoft Store. The kiwi image build tool supports producing the appx bundle required for publishing, but we can easily make it available for “sideload” installation instead.

I’d say this list is wrong in some details.

Pungi is underneath most things, for a start; there are layers, it’s not just one tool → one image. The Server DVD (DVDs in general) is just a netinst image with a package repo strapped on, really (I don’t remember if it’s Koji or Pungi that does the strapping-on of the repo).

flatpak-module-tools isn’t building an installable/deployable image, so I’d say it’s out of place in the list. It’s a different thing doing a different job.

Asahi is not part of Fedora, it’s a remix. So, as @kevin says, adding kiwi is adding a new thing for Fedora - that is, releng, QA and everyone else - to care about. Right now only Asahi has to care about it, and it’s not blocking Fedora releases or Fedora processes.

In general I agree with Kevin’s concerns. The proliferation of tools is a problem. Everyone has a reason to want a special tool for their thing, but for releng and QA it’s a struggle to stay on top of all these varying processes and tools and how to know what’s going on with each, and how to fix problems in them.

If you think trying to isolate some of anaconda’s knowledge about how to do stuff from anaconda is a way to go, great, but then it would be nice to see that actually happening before too many more new tools and processes to do similar things (deploy a Fedora system to a disk image) get introduced.

edit: to be clear, I hate imagefactory too. Of the current things it is easily the least nice to work with in terms of the “oh no, that image just broke, what do we do?” workflow. Everything else is better.

It’s just that I remember the time (ah, so long ago) when every image we cared about was built by lorax or liveimage-creator, and we only had to know how those worked and how to debug them. Having to know about lorax and livemedia-creator and imagefactory and kiwi and osbuild is…not better. (Thankfully the FCOS folks keep a pretty good eye on those images, so I really have no idea how COSA works).

I know that Pungi is underneath most of these things, since it runs the commands to invoke these tools. I don’t know what it does for the Server DVD, hence the question mark.

I have been working on this for several cycles now. I wrote livesys-scripts to pull our live media initialization process out of the kickstarts and I’ve been untangling what Anaconda actually does during installation to ensure the Fedora Asahi Remix images build correctly. @davdunc only proposed this Change once we had working cloud images. I have a TODO list of things to pull out in a permanent way and adjust Anaconda to account for that.

Our hope is that we can eliminate ImageFactory for this. A 1-for-1 swap that simplifies things for everyone and we have a tool that has community knowledge and an engaged upstream.

I wish I remembered that time too. Ever since Fedora Cloud spun up in 2012, we’ve had ImageFactory, and for the non-ImageFactory stuff, we had appliance-creator for the ARM images and livecd-creator for the live images. Then Lorax with livemedia-creator replaced the live images along with the weird process we used for creating the netinstall image, then the process for the install DVD changed with Pungi 4, and so on. And of course, from that point on we got more and more stuff with more and more tools.

Speaking purely for the fedora-toolbox OCI images for Toolbx, I certainly don’t want to stand in the way of progress. :slight_smile:

I haven’t yet played much with KIWI and haven’t looked closely at Neal’s port of the OCI images. So, here are some general thoughts.

The current Kickstart image definition for fedora-toolbox isn’t terribly more complex than that of the fedora image. I believe it got simpler once the former stopped being layered on top of the latter, because there’s no longer any need to carefully undo the minimization done for the latter.

It’s just a longer list of packages because it’s by definition a more fully-featured image. We had some last minute fixes for Fedora 39, because I didn’t carefully review the migration from Dockerfile to Kickstarts earlier in the cycle. That was my fault.

Toolbx does have an ever-growing set of upstream test cases that are constantly being run against the fedora-toolbox images. So they can help catch more and more regressions.

Our Dockerfile also used to have some tests built-into it, which would fail the OCI image build if something unexpectedly changed. We have managed to add them to the Kickstart, and I suspect that’s what makes it look unusually complex. However, I think it’s worth the trouble to catch problems closer to their source.

When we migrated the fedora-toolbox images from Container/Dockerfile to Pungi and Kickstart, my only complaint was the significant difficulty in building them locally. It’s a lot easier to do podman build ... than koji image-build ..., which has implications for Toolbx contributors trying to alter the images. From what Neal said, it sounds like it’s easier to do that with KIWI, so that’s wonderful.

We include a little helper tool in the repo to make it simpler to invoke the build to create the tarball, and steps for running it are documented in the README. You can also see how the CI builds the images in the TMT definitions in the repository.
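For comparison, a local build without the repo’s helper tool would invoke kiwi directly. The sketch below only assembles the command line (and prints it in dry-run mode). It assumes the standard `kiwi-ng [--profile=…] [--type=…] system build --description DIR --target-dir DIR` form and that builds run as root via `sudo`; the profile name in the usage line is hypothetical, so check the README for the helper tool’s real usage and profile names.

```python
import shlex
import subprocess


def kiwi_build_argv(description_dir, target_dir, profile=None, image_type=None):
    """Assemble a kiwi-ng 'system build' command line."""
    argv = ["kiwi-ng"]
    # Global options go before the 'system build' subcommand.
    if profile:
        argv.append(f"--profile={profile}")
    if image_type:
        argv.append(f"--type={image_type}")
    argv += ["system", "build",
             "--description", description_dir,
             "--target-dir", target_dir]
    return argv


def run_build(argv, dry_run=True):
    """Print the command (dry run) or execute it as root; return the exit code."""
    cmd = ["sudo", *argv]
    if dry_run:
        print(shlex.join(cmd))
        return 0
    return subprocess.run(cmd, check=True).returncode


if __name__ == "__main__":
    # "Cloud-Base-Generic" is a hypothetical profile name.
    run_build(kiwi_build_argv(".", "/var/tmp/kiwi-out", profile="Cloud-Base-Generic"))
```

Keeping argument assembly separate from execution makes the command easy to review (or reuse in CI) before anything runs with root privileges.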

It should be extremely straightforward to add extra tests for the toolbox image CI if you want them, too. :wink:

Toolbx does have an ever-growing set of upstream test cases that are constantly being run against the fedora-toolbox images. So they can help catch more and more regressions.

Could we possibly make an image via kiwi and run the tests against it?

Our Dockerfile also used to have some tests built-into it, which would fail the OCI image build if something unexpectedly changed. We have managed to add them to the Kickstart, and I suspect that’s what makes it look unusually complex. However, I think it’s worth the trouble to catch problems closer to their source.

Yeah. Some of this could be added to CI so it runs before changes get merged.

When we migrated the fedora-toolbox images from Container/Dockerfile to Pungi and Kickstart, my only complaint was the significant difficulty in building them locally. It’s a lot easier to do podman build ... than koji image-build ..., which has implications for Toolbx contributors trying to alter the images. From what Neal said, it sounds like it’s easier to do that with KIWI, so that’s wonderful.

So, it sounds like you might not be opposed to moving to this?

Sure! We can have it done both ways: Toolbox can build the image in CI upstream to validate stuff there, and we can downstream also fetch the tests and run things too, provided the harness on their end is easy to execute in TMT/Zuul.

Currently the kiwi description does not generate a Docker/OCI formatted image directly because I don’t think it would work in the existing image build+publish process our compose tooling uses, but I could change that or add an alternative target that generates it so it can be loaded into Podman for testing right away.
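One way such an alternative target could look: keep the existing profile and declare a second profile whose `<type>` is `oci`, so the same package set can also be built as an OCI archive for loading into Podman. This is a sketch against a simplified, hypothetical description; the profile names and the `tbz` tarball type used as the starting point are assumptions, not the actual content of the toolbox description.

```python
import xml.etree.ElementTree as ET

# Hypothetical description with one profile that produces a tarball.
DESCRIPTION = """<image schemaversion="7.5" name="Fedora-Toolbox">
  <profiles>
    <profile name="Container-Toolbox" description="Toolbox root filesystem tarball"/>
  </profiles>
  <preferences profiles="Container-Toolbox">
    <type image="tbz"/>
  </preferences>
</image>"""


def add_oci_profile(description_xml: str, base_profile: str, oci_profile: str) -> ET.Element:
    """Declare a new profile and give it a <preferences> block whose type is OCI."""
    root = ET.fromstring(description_xml)
    profiles = root.find("profiles")
    base_prefs = root.find(f"preferences[@profiles='{base_profile}']")
    if profiles is None or base_prefs is None:
        raise ValueError(f"no <profiles> or <preferences> for {base_profile!r}")
    # Declare the new profile next to the existing one.
    ET.SubElement(profiles, "profile", name=oci_profile,
                  description=f"{base_profile} as an OCI archive")
    # Insert the new <preferences> directly after the base profile's block.
    prefs = ET.Element("preferences", profiles=oci_profile)
    ET.SubElement(prefs, "type", image="oci")
    root.insert(list(root).index(base_prefs) + 1, prefs)
    return root


if __name__ == "__main__":
    updated = add_oci_profile(DESCRIPTION, "Container-Toolbox", "Container-Toolbox-OCI")
    print(ET.tostring(updated, encoding="unicode"))
```

Adding a profile rather than changing the existing type leaves the current tarball path used by the compose tooling untouched.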

We include a little helper tool in the repo to make it simpler to invoke the build to create the tarball, and steps for running it are documented in the README. You can also see how the CI builds the images in the TMT definitions in the repository.

It should be extremely straightforward to add extra tests for the toolbox image CI if you want them, too. :wink:

Sounds good to me!