Since our kind of deployment doesn’t get a lot of press these days compared to clusters where a container doesn’t care which nodes, or how many, it runs on, I thought I would describe how we want to use Fedora CoreOS (FCOS) and containers, and then ask a couple of questions that came up as I read through the issue tracker discussions.
We currently run about 7,000 geographically dispersed (mostly bare-metal) nodes, and future plans call for significantly increasing both the number of nodes and the degree of dispersion. With some exceptions, each node is configured individually and can be thought of as a distinct single-node installation. Some of this is due to geographical location and network topology, and some to functional needs. Our interest in FCOS stems from characteristics like the immutable core; fast, automated updates; iPXE+Ignition as a way to do initial configuration; the fallback on boot failure (a big deal when nodes are 2,000 km away); the option to seamlessly integrate node-local applications with computationally intensive functions via something like a Kubernetes cluster set up in parallel; the coordination and management protocols and tools that go into wide deployments; and finally, simply because it’s Fedora.
We want to use containers to provide a degree of separation and independence between the node OS and the applications running on a node, to separate applications running on the same node from each other, and to make application development more efficient without (too much) concern for host system library compatibility, etc. In short, all of the classic, pre-cluster motivations for containers.
So my first question is whether people think this even makes sense, either on its own or within the context of your plans and vision for Fedora CoreOS. If not, it would be good to know that sooner rather than later!
Then, looking at the network management issue, for example, there seems to be some disagreement about supporting the single-node case, multi-NIC hosts, or changing the host configuration after Ignition has run (without requiring a reboot or re-initialization). Although I’m sure there are many use cases where such things don’t matter much, lack of support for any of these would be fatal for us, so it would be good to know what people are thinking and planning.
I thought I would post this and see what people say. If FCOS isn’t the right way to go, that would be valuable to know; but if it is a good choice, then maybe knowing that there are use cases like this will be helpful going forward.
Thanks very much for any feedback!