We need to come up with a consistent approach for generating and publishing containers: both 'traditional' and atomic desktop containers, both stable and unstable releases

  • Describe the issue
    We have grown a bit of a mess around how we build and publish container images. We need to straighten it out.

Here’s the 10,000 foot view, as I understand it:

  • For unstable releases, we build both ‘traditional’ (generic, generic minimal, toolbox) and atomic desktop OCI containers in the nightly compose. We also build atomic desktop ostrees. When the compose completes, we run sync-latest-container-base-image.sh - which publishes the ‘traditional’ containers to registries - and sync-ostree-base-containers.sh - which converts the silverblue, kinoite and sericea ostrees to containers and publishes those to registries. We don’t actually publish the native atomic desktop OCI containers anywhere.

  • For stable releases, we have a Fedora-Container compose that builds ‘traditional’ containers and should publish them (only it doesn’t, because of the thing @kevin is fixing in PR#1267: f39: fix container-nightly.sh script to sync the right thing - pungi-fedora - Pagure.io). That compose does not build atomic desktop containers. Instead, Bodhi creates atomic desktop ostrees daily, which is how people get updates. But it does not produce native OCI containers, or run sync-ostree-base-containers.sh to convert the ostrees it creates into containers and publish those.

There are several problems here:

  1. We shouldn’t have two janky bash scripts for publishing containers to registries. We should have one tool in a sensible language (Python!) which can be properly tested. Also, it should use compose metadata to find the images (not weirdly hardcoded Koji searches, like the current sync-latest-container-base-image.sh does) - although this is a bit complicated if we’re building things in Bodhi, which doesn’t produce productmd metadata (AFAIK).
  2. Stable release ostree builds being off in Bodhi while everything else is in composes is a bit awkward, especially since we are trying to move away from ostrees towards native OCI containers for atomic desktops. Do we want to move more container builds into Bodhi, or move the stable release nightly ones out of Bodhi? Do we need to teach Bodhi to build OCI containers? Publish to registries?
  3. It would be good to have the ability to gate registry pushes. We can test all these images to some extent; it would be good to set things up such that we can gate publishing to the registry tags used to update user systems on test results.
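To make the "one tool using compose metadata" idea concrete, here is a minimal Python sketch that finds container images in productmd-style images.json metadata instead of running hardcoded Koji searches. The metadata shape here is simplified, and the type/format values are assumptions; a real tool would load the compose via the productmd library rather than a hand-written dict.

```python
# Sketch: locate container images from productmd-style images.json
# metadata instead of Koji searches.  The structure and the
# type/format values below are simplified assumptions.

CONTAINER_FORMATS = {"ociarchive", "tar.gz", "tar.xz"}  # assumed formats

def find_container_images(images_metadata):
    """Return (variant, arch, path) for every container-type image."""
    found = []
    for variant, arches in images_metadata["payload"]["images"].items():
        for arch, images in arches.items():
            for image in images:
                if (image.get("type") == "container"
                        and image.get("format") in CONTAINER_FORMATS):
                    found.append((variant, arch, image["path"]))
    return found

# Example with a hand-written metadata snippet (paths are made up):
sample = {
    "payload": {
        "images": {
            "Container": {
                "x86_64": [
                    {"type": "container", "format": "tar.xz",
                     "path": "Container/x86_64/images/Fedora-Container-Base-40.tar.xz"},
                    {"type": "boot", "format": "iso",
                     "path": "Container/x86_64/iso/boot.iso"},
                ]
            }
        }
    }
}

print(find_container_images(sample))
# → [('Container', 'x86_64', 'Container/x86_64/images/Fedora-Container-Base-40.tar.xz')]
```

The point is that the compose already tells us exactly which artifacts are containers; no guessing against Koji is needed.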

CC @kevin @siosm @walters @ngompa @davdunc

We shouldn’t be doing any builds in Bodhi. These should be in the compose process rather than there. Gating registry pushes should probably be there though…

@kevin what was the thinking behind putting the atomic desktop nightly ostree builds in bodhi? just curious if there’s an advantage to it we hadn’t thought of.

Thinking about doing gating via Bodhi…mmm. I dunno. I mean, we always had the idea that greenwave was meant to be a neutral service consumed by Things That Want To Do Gating, not just a feeder for Bodhi. In a way, if I’m just thinking about “how to sync container images to registries”, Bodhi doesn’t feel like an obvious part of the process. My natural thought I guess would be just to write a message consumer that can sync container images from composes, and set it up so it can just fire when the compose is complete, or fire in response to CI messages, and have it do the thing where it checks the gating status after every test then syncs if it’s ‘passed’.
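The decision logic such a consumer would need is small. A sketch, using only stdlib (the message topics and the shape of the Greenwave decision payload are illustrative assumptions; real message consumption would go through fedora-messaging, and the decision would come from a request to Greenwave's decision endpoint):

```python
# Sketch of a message-consumer gating check: on each compose-complete
# or CI-result message, consult the Greenwave decision and sync only
# when gating has passed.  Topic names and the decision dict shape are
# assumptions for illustration.

TRIGGER_TOPICS = {
    "org.fedoraproject.prod.pungi.compose.status.change",  # compose done
    "org.fedoraproject.prod.ci.compose.test.complete",     # hypothetical CI topic
}

def should_sync(topic, greenwave_decision):
    """Sync only for relevant messages whose gating decision passed."""
    if topic not in TRIGGER_TOPICS:
        return False
    return bool(greenwave_decision.get("policies_satisfied", False))

# In a real consumer, greenwave_decision would be fetched per compose;
# here we just exercise the logic:
print(should_sync(
    "org.fedoraproject.prod.pungi.compose.status.change",
    {"policies_satisfied": True},
))
# → True
```

The consumer would call this on every message and run the registry sync only when it returns True.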

Doing it via Bodhi I guess gives us some of that logic already ‘baked in’, though we’d have to translate and extend a few things, I think. But I think we’d be back at having to find the Koji builds for each container build (since Bodhi wants you to submit Koji builds as the update components), which is not something that’s in any metadata, so we’re back to the kind of dumb logic in the current shell script (only worse because we have to somehow make sure we find the task that matches the compose we’re trying to publish, I guess). meh.

  • Describe the issue
    We have grown a bit of a mess around how we build and publish container images. We need to straighten it out.

Here’s the 10,000 foot view, as I understand it:

  • For unstable releases, we build both ‘traditional’ (generic, generic minimal, toolbox) and atomic desktop OCI containers in the nightly compose. We also build atomic desktop ostrees. When the compose completes, we run sync-latest-container-base-image.sh - which publishes the ‘traditional’ containers to registries - and sync-ostree-base-containers.sh - which converts the silverblue, kinoite and sericea ostrees to containers and publishes those to registries. We don’t actually publish the native atomic desktop OCI containers anywhere.

Yeah, that seems correct.

  • For stable releases, we have a Fedora-Container compose that builds ‘traditional’ containers and should publish them (only it doesn’t, because of the thing @kevin is fixing in PR#1267: f39: fix container-nightly.sh script to sync the right thing - pungi-fedora - Pagure.io). That compose does not build atomic desktop containers. Instead, Bodhi creates atomic desktop ostrees daily, which is how people get updates. But it does not produce native OCI containers, or run sync-ostree-base-containers.sh to convert the ostrees it creates into containers and publish those.

Correct.

There are several problems here:

  1. We shouldn’t have two janky bash scripts for publishing containers to registries. We should have one tool in a sensible language (Python!) which can be properly tested. Also, it should use compose metadata to find the images (not weirdly hardcoded Koji searches, like the current sync-latest-container-base-image.sh does) - although this is a bit complicated if we’re building things in Bodhi, which doesn’t produce productmd metadata (AFAIK).

Agreed.

  1. Stable release ostree builds being off in Bodhi while everything else is in composes is a bit awkward, especially since we are trying to move away from ostrees towards native OCI containers for atomic desktops. Do we want to move more container builds into Bodhi, or move the stable release nightly ones out of Bodhi? Do we need to teach Bodhi to build OCI containers? Publish to registries?

Well, the reason it’s there is that bodhi is already calling pungi to
compose the rpm updates/updates-testing repos. Those are the very things
we need to make to then update the ostrees. So, moving this out of bodhi
seems like it adds complexity… it means we have to have something wait
for updates/updates-testing composes in bodhi to fully sync out, then
after whatever delay, call pungi and generate the thing that bodhi could
have in the same flow.

Additionally, if bodhi is doing both rpms and ostrees and there’s a
problem in one or the other, the compose fails and we can fix it and
resume or whatever. If they are disconnected processes, you might get,
say, rpms updating and ostree not, or vice versa.

So you would need something to coordinate… it just seems cleaner to
just do it in bodhi composes.

  1. It would be good to have the ability to gate registry pushes. We can test all these images to some extent; it would be good to set things up such that we can gate publishing to the registry tags used to update user systems on test results.

Agreed. Bodhi does have a ‘container’ flow from back when we were making
a number of containers. That flow is basically to push from koji →
candidate-registry for updates-testing, then copy from
candidate-registry to registry for stable updates.

So, perhaps we could just extend bodhi to detect when containers are
built, and for unstable updates auto-create an update, which then can
get testing, etc. For stable releases we would have to either get it to
also do that or have some kind of process to manually submit them.
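That candidate-registry → registry flow is easy to sketch. A minimal example (registry hostnames match the flow described above, but the image name is a placeholder; this only builds the skopeo command and does not run anything):

```python
# Sketch of the two-step registry flow: images land in a candidate
# registry for updates-testing, then get copied candidate -> stable on
# promotion.  Only the skopeo command is constructed here.

def promote_command(image, tag,
                    candidate="candidate-registry.fedoraproject.org",
                    stable="registry.fedoraproject.org"):
    """Build the skopeo copy command that promotes one image tag."""
    return [
        "skopeo", "copy",
        f"docker://{candidate}/{image}:{tag}",
        f"docker://{stable}/{image}:{tag}",
    ]

print(" ".join(promote_command("fedora-toolbox", "40")))
```

A sync tool would run one such command per image once the update goes stable.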

Additional items around this:

  1. We want to ‘move’ to quay.io. My plan was to make sure everything was
    at quay.io and then set registry.fedoraproject.org to just be a
    redirect. This allows us to control things in case we ever need to
    repoint it or move it back. I guess we also need a
    quay.io/fedora-candidate to replace our candidate registry.
    But when we do this we should really add some error checking in…if we
    can’t push to quay.io, it should error out.

  2. What’s our desired end state here, format-wise? I guess we want to
    move to OCI containers everywhere?
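On the error-checking point: the essential fix is to run the push in a way that a failure raises instead of being silently swallowed, which is where shell scripts tend to go wrong. A sketch (the skopeo arguments are placeholders, and the failure demonstration uses a stand-in command rather than a real registry push):

```python
import subprocess
import sys

# Sketch: run the registry push with check=True so a non-zero exit
# from skopeo aborts the sync instead of being ignored.

def push(source_ref, dest_ref):
    """Push one image; raises CalledProcessError on any failure."""
    subprocess.run(
        ["skopeo", "copy", source_ref, dest_ref],
        check=True,  # error out if we can't push to quay.io
    )

# Demonstrate the failure path with a stand-in failing command
# (a real test would use a real push against a test registry):
try:
    subprocess.run([sys.executable, "-c", "raise SystemExit(1)"], check=True)
except subprocess.CalledProcessError:
    print("push failed, aborting sync")
```

With check=True the caller has to handle the failure explicitly, so a broken quay.io push can never look like success.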


@kevin what was the thinking behind putting the atomic desktop nightly ostree builds in bodhi? just curious if there’s an advantage to it we hadn’t thought of.

Well, it’s a compose, no?
We need to compose new ostree commits from updates/updates-testing rpms
which we are right there calling pungi to create. Also we need to
coordinate them so if one fails the other does too.

So, I wouldn’t think of it as bodhi building anything, it’s just
composing.

Thinking about doing gating via Bodhi…mmm. I dunno. I mean, we always had the idea that greenwave was meant to be a neutral service consumed by Things That Want To Do Gating, not just a feeder for Bodhi. In a way, if I’m just thinking about “how to sync container images to registries”, Bodhi doesn’t feel like an obvious part of the process. My natural thought I guess would be just to write a message consumer that can sync container images from composes, and set it up so it can just fire when the compose is complete, or fire in response to CI messages, and have it do the thing where it checks the gating status after every test then syncs if it’s ‘passed’.

Yeah, we could, but without bodhi there’s not really a lot of visibility
there. Also, with bodhi we could even get users +1/-1ing…

Doing it via Bodhi I guess gives us some of that logic already ‘baked in’, though we’d have to translate and extend a few things, I think. But I think we’d be back at having to find the Koji builds for each container build (since Bodhi wants you to submit Koji builds as the update components), which is not something that’s in any metadata, so we’re back to the kind of dumb logic in the current shell script (only worse because we have to somehow make sure we find the task that matches the compose we’re trying to publish, I guess). meh.

yeah, there must be a better way. :wink:

I mean, bodhi can handle containers now, perhaps we should look at what’s
there already…?

This is likely out of scope (so feel free to ignore it), but it would be nice if there were a couple of additional tags for some of the existing containers such as “latest-1”? and “branched”?, that would (today) represent F38 and F40 for CI uses without having to explicitly name the versions. Right now I have to go in and manually update my workflows with the new numbers every six months (or so) for testing on supported (or soon to be beta/production) fedoras. Thanks for any consideration.
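To illustrate, here is a tiny sketch of how such alias tags could be derived from release numbers. The tag names are just the ones suggested in this post, and the inputs are illustrative; nothing here is an official scheme:

```python
# Sketch: derive moving alias tags ("latest-1", "branched") from the
# current release numbers so CI workflows never hardcode a Fedora
# version.  Tag names follow the suggestion above; nothing official.

def alias_tags(latest, branched=None):
    """Map alias tag names to concrete release tags."""
    tags = {"latest": str(latest), "latest-1": str(latest - 1)}
    if branched is not None:
        tags["branched"] = str(branched)
    return tags

# With F39 as latest and F40 branched, this matches the F38/F40 example:
print(alias_tags(latest=39, branched=40))
# → {'latest': '39', 'latest-1': '38', 'branched': '40'}
```

The sync tool would recompute this mapping each cycle and repoint the alias tags, so downstream workflows never need editing.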


Those are the very things
we need to make to then update the ostrees. So, moving this out of bodhi
seems like it adds complexity… it means we have to have something wait
for updates/updates-testing composes in bodhi to fully sync out, then
after whatever delay, call pungi and generate the thing that bodhi could
have in the same flow.

Okay…but then, why don’t we do the same for the other stable composes that run nightly? We have nightly Cloud and Container composes that are just run out of scripts like the branched and rawhide nightlies. I just can’t quite figure out the overall organizing principle here :smiley: If Bodhi can run composes, should we have Bodhi run…all the composes?

In the past there was a ‘container sig’ that was going to produce
containers of a bunch of applications. Since not many people wanted to
do that and our container build system was a horror, all those people
wandered off, although there’s a faction that produces container
images on quay.io. (They even build them there.)

So, when that was supposed to be a thing, you couldn’t just build
containers at updates/updates-testing time normally, you wanted to let
maintainers build them whenever and make updates in bodhi, etc.

We built a daily update to the base container, but didn’t push it
anywhere without a human saying ‘ok, this one is good, I tested it,
let’s push it out’. (Which is a lot of waste, to make them every day
and only use one every month or so, but anyhow…)

At least that’s what I can recall. :slight_smile:

I think it might make sense to just have bodhi compose them all.
How would that work though for updates/updates-testing and gating?
And what would happen if, say, the toolbox container with updates
composed fine, but failed some gating tests? And right now we just have
the one ‘stream’, no updates-testing versions anyhow. If we added those,
that’s more artifacts. :wink:

Also, if we do the tests in the bodhi updates compose path, that makes
them take somewhat longer, probably?

Also, another wrinkle… the fedora base/minimal/toolbox containers
probably don’t change a lot of the time… but yet we recompose them.
I’m not sure on the atomic desktops side: if we recompose them and the
rpms didn’t change, do the ostree commits we push change?

Ideally all of these would just happen if something changes, but that’s
a pretty hard problem.

We want to only have OCI containers in the end, but it’s likely going to take a few releases until we get there. So this is how I think we should proceed:

  • Remove the current ostree-to-container conversion script and replace it with another (unified) script that pushes the new ostree native container images to quay.
  • Add Atomic composes to Bodhi as well and sync those to quay.
  • Then we need to start the transition from ostree remotes to containers on end-user systems.
  • Let at least 2 releases go by, then remove the ostree composes.
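The first step (a unified script pushing the ostree native container images to quay) could be sketched roughly like this. The quay namespace, variant names, and archive paths are placeholder assumptions, and only the skopeo commands are constructed, not executed:

```python
# Sketch of a unified sync: one skopeo copy per atomic desktop
# variant, from the compose's native container artifact to quay.io.
# Namespace, variant names, and paths are placeholders.

QUAY_NS = "quay.io/fedora"  # assumed destination namespace

def sync_commands(variant_archives, release):
    """Build one skopeo copy command per variant archive."""
    cmds = []
    for variant, archive in variant_archives.items():
        cmds.append([
            "skopeo", "copy",
            f"oci-archive:{archive}",
            f"docker://{QUAY_NS}/fedora-{variant}:{release}",
        ])
    return cmds

for cmd in sync_commands({"silverblue": "Fedora-Silverblue-40.ociarchive"}, 40):
    print(" ".join(cmd))
```

One script with one code path replaces the two shell scripts, since the native container images need no ostree-to-container conversion step at all.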

Yes, I agree this would be nice. I’ll look at it after we’ve setup the initial sync.


Thanks for the future consideration (it was all I was hoping for).

btw, thinking about it a bit more, rather than “branched”, perhaps the name should be “next”, which would point to rawhide until branch point, then to the branched variant until final release, and then back to rawhide when latest gets updated (I don’t know how hard that would be to accomplish). However, I don’t really care about the name(s) (although I am sure someone does); I would just like the name to be mostly stable and usable across my workflows.

Thanks.

We want to only have OCI containers in the end, but it’s likely going to take a few releases until we get there. So this is how I think we should proceed:

ok

  • Remove the current ostree-to-container conversion script and replace it with another (unified) script that pushes the new ostree native container images to quay.
  • Add Atomic composes to Bodhi as well and sync those to quay.

Sounds ok. So in that plan, though, we don’t do any CI/gating/testing on
them? Just assume that the packages should be ok since they are going
out…

I suppose we could sync all of them to a candidate area and then have
testing/CI on them and it promotes them to release? But then we get into
the updates not being in sync with the ostrees, so that’s not a great
thing.

  • Then we need to start the transition from ostree remotes to containers on end-user systems.
  • Let at least 2 releases go by, then remove the ostree composes.

Sure, to quote my fav superhero: Don’t be hasty. :slight_smile:

I think we should do CI gating after the initial setup is in place.

We can start by pushing all composes under their own tag (i.e. 40.20240326.0) and automatically update the latest/fXY tags.

Then we can add CI gating to the latest/fXY tag updates to only push them if the compose passes CI/openQA.
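That tagging scheme can be sketched as a small helper. The tag formats follow the 40.20240326.0 example above, and the gating flag is a stand-in for a real CI/openQA result:

```python
# Sketch: every compose gets its own immutable date tag; the moving
# latest/fXY tags only follow it when gating passed.  Tag formats
# follow the example in the discussion; gating is a boolean stand-in.

def tags_to_push(release, compose_date, respin, gating_passed):
    """Return the tags to publish for one compose."""
    versioned = f"{release}.{compose_date}.{respin}"
    tags = [versioned]            # always keep the per-compose tag
    if gating_passed:             # only move user-facing tags on pass
        tags += [f"f{release}", "latest"]
    return tags

print(tags_to_push(40, "20240326", 0, gating_passed=True))
# → ['40.20240326.0', 'f40', 'latest']
```

A failed gating run still leaves the date tag in the repo for debugging; only the tags user systems follow stay pinned to the last good compose.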

Ideally, the repo on Quay would look more like fedora-ostree-desktops/silverblue (where we can find all composes by date) rather than what fedora/fedora-silverblue is right now (where you only have the latest tag, which makes it hard to diagnose issues with builds or regressions).

Yeah. I like that idea… definitely more clear.


I’m only realizing this now, but doing the same with “application” container images would also be nice. The images are much smaller so the storage cost would be low. Ideally we would rebuild those more regularly than we build them now.

I’m not sure how / when we rebuild those container images now.

Actions speak louder than words but I am going to try to do what I can to push for Red Hat to apply more resources to container-based infrastructure in Fedora. We will see what happens from that.

A notable sub-thread in this is that Fedora CoreOS maintains a custom Jenkins pipeline to build containers too.