We’re currently in the process of migrating all of our nodes from CoreOS to Fedora CoreOS (FCOS), and one challenge that we’re facing is determining how to properly version control our updates.
Given that we have roughly 20 CoreOS clusters spread across Dev, QA, and Prod, it can take us 2-4 weeks to complete an upgrade across all of them, mainly due to strict change management controls, limited maintenance window availability, and a staggered rollout approach.
With that said, once we start an upgrade on a particular FCOS version, we want to ensure that the same version is deployed to all of our clusters. Because Zincati polls the FCOS Cincinnati servers for the latest available release, it is possible that new FCOS versions will be introduced midway through our internal upgrade cycle.
For CoreOS (Container Linux), we were able to work around this concern by implementing something similar to the following gist, where we'd download the desired CoreOS version's update.gz file locally, create the necessary Omaha XML, and then point each CoreOS node's update-engine service at these files:
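For context, the node-side piece of that workaround was just an update.conf override pointing update_engine at our internal Omaha endpoint; roughly like this (the server URL here is a placeholder, not our real host):

```ini
# /etc/coreos/update.conf (Container Linux)
GROUP=stable
# Point update_engine at an internal Omaha server instead of the
# public CoreOS update service; URL below is a placeholder.
SERVER=https://updates.example.internal/v1/update/
```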
Does anyone know if a similar approach could be used to version-control FCOS? If nothing exists, then I suppose the next best option is to look into deploying our own local FCOS Cincinnati server by reverse engineering something like:
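If we did stand up an internal Cincinnati instance, my understanding from the Zincati configuration docs is that a node can be pointed at it with a small config dropin; something along these lines (the URL is a placeholder):

```toml
# /etc/zincati/config.d/50-local-cincinnati.toml
# Have Zincati poll an internal Cincinnati server instead of the
# public FCOS update endpoint; base_url below is a placeholder.
[cincinnati]
base_url = "https://fcos-updates.example.internal"
```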
Please let me know if you need any additional information.
This is a complex topic, and one that has also come up for OpenShift (which also uses Cincinnati), where users want to replicate the same upgrade graph internally. I think the current plan there is to ship the graph data as part of the release image.
For now, we were planning to keep the upgrade architecture as simple as possible (avoiding airlock). The idea is to enable Zincati's automatic update (and immediate reboot) feature on a node only as a maintenance window becomes available for its cluster, then serially work through the remaining nodes in that cluster in the same fashion until all of them are on the desired FCOS version.
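To sketch what we had in mind per node (key names are from my reading of the Zincati config docs, so treat the exact spelling as an assumption): nodes would ship with updates disabled, and during the window we'd drop in an override like this and restart Zincati:

```toml
# /etc/zincati/config.d/55-updates-enabled.toml
# Flip this one node on during its maintenance window; the "immediate"
# strategy finalizes and reboots as soon as an update is staged.
[updates]
enabled = true
strategy = "immediate"
```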
We were trying to avoid driving rpm-ostree directly, as I thought the community had recommended against that, but it could be another potential option for us! Thanks!
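For completeness, if we did go the rpm-ostree route, my understanding is that pinning a node to an explicit release would look roughly like the following (the version string is purely illustrative):

```shell
# Stage a specific FCOS release instead of whatever Zincati would pull.
# Version string below is an example, not a recommendation.
sudo systemctl stop zincati.service      # keep Zincati from racing the deploy
sudo rpm-ostree deploy 31.20200310.3.0   # stage the pinned version
sudo systemctl reboot                    # boot into the new deployment
```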