Openshift Origin on Silverblue, story of one thousand cuts

rant

#1

This weekend, I decided to see if I could install a wordpress instance on my Fedora Silverblue laptop for some unrelated activities. Usually, I would just pop a VM somewhere using Ansible, but I decided to see how far I could get using the system as it is supposed to be used. I will skip the part of the story where I tried to just use podman (keeping that rant for Flock), and go directly to the part where I said to myself “Maybe openshift is a good match for that”.

TLDR:
There are a few roadblocks before Silverblue can be the perfect platform for openshift, and I think we should fix them.

So there are a few ways to install openshift: using “oc cluster up”, using minishift, using openshift-ansible, using the rpm, or using the all-in-one binary directly from github.

I used openshift-ansible in the past, and I knew that trying to use it would just result in me finding something that breaks. I did have a cluster somewhere, and my tendency to use different versions of the components (such as the OS, ansible and the playbooks) means that I always find some bugs. That’s a fine way to keep myself busy on a weekend, but not my goal today.

The rpms, last time I looked, were out of date, but they seem to now be at 3.9.0, which is good. But I didn’t know that when I started to look around, so I skipped that choice. There is no official Fedora package for minishift, so I decided to go the “oc cluster up” way, since that’s supposed to do everything in containers.

So, first cut: finding where to download the binary. The documentation assumes you know where to find ‘oc’ (cf https://docs.openshift.org/latest/getting_started/administrators.html#running-in-a-docker-container) but does not link to it. Of course, you have to read the rest of the documentation, which says where to find the all-in-one binary: on github.

Then, I have the choice between 3.10rc and 3.9.0. I decided to take 3.9.0, because I knew that getting a RC would mean stumbling on some bugs. It always goes like this, and that wasn’t my goal today. I download, untar, and run “oc cluster up”.
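For reference, the dance looks roughly like this (a sketch; the exact tarball name on the github releases page includes a version and a commit hash, so adjust accordingly):

```shell
# Download the "client tools" tarball for v3.9.0 from the origin releases
# page on github, then unpack and run the all-in-one binary:
tar xzf openshift-origin-client-tools-*.tar.gz
cd openshift-origin-client-tools-*/
./oc cluster up
```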

First error message:

is docker client running on this host.

Indeed, it is not. And that error message should count as the 2nd cut, because it is also shown when docker is running but you can’t access the socket. I knew about it, so I ran it as root, but that isn’t listed in the requirements. Again, that is supposed to be obvious if you know how it works, but that’s kinda the whole point of using “oc cluster up”: you should have something without friction. And the requirements linked in the documentation are for a production server, not for a simple development setup, as people can see on https://docs.openshift.org/latest/install/prerequisites.html#install-config-install-prerequisites

So I run “oc cluster up” as root, after having started the docker daemon.

Then appears cut number 3, with a nice error message reminding me that I also need to add “--insecure-registry 172.30.0.0/16” to the docker daemon command line. In fact, that should count as multiple cuts: why should the registry be insecure in the first place (it could, for example, be secured out of the box with a custom CA), and why is oc not smart enough to add the option by itself?

I do understand that the second point could run into all kinds of edge cases (port forwarding using socat, etc). But again, that means that someone wanting to use oc cluster up has to find where to add that option (in /etc/sysconfig/docker), then edit a file as root (hello vi usage), know enough bash syntax for that, and restart docker.

Again, nothing that would block me, but not everybody is fluent enough in bash and Linux for that.
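For the record, the manual fix amounts to something like this (a sketch; the OPTIONS line below mirrors Fedora's default docker package config at the time, so treat the exact pre-existing flags as an assumption):

```shell
# In /etc/sysconfig/docker, append the flag to the existing OPTIONS line,
# for example:
#   OPTIONS='--selinux-enabled --log-driver=journald --insecure-registry 172.30.0.0/16'
sudo vi /etc/sysconfig/docker
# Then restart the daemon so it picks up the new option:
sudo systemctl restart docker
```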

And then, the deployment blocks. I mean it just blocks, then times out with:

-- Installing web console ... 
FAIL Error: failed to start the web console server: timed out waiting for the condition

And that’s cut number 5: no mention of a log, and a rather unhelpful message. But port 127.0.0.1:8443 was open, just showing nothing.

Again, I know what to do, and indeed there is a wall of text in journald for docker (again, you need to know to look there), but I didn’t have the patience to go through it, and just put “cut 6” on said wall of text. The lines are too long for my screen (partially because docker duplicates information in the text, and kubernetes does the same), there are 4000 lines of logs (again, because docker does not use proper syslog levels, so systemd can’t filter them), and systemd shows lots of stuff in red for some reason. There are some usability issues for sure.
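Priority-based filtering is exactly what gets lost here: since docker logs everything at the same syslog level, journalctl’s -p filter is useless, and I am left grepping the wall of text by hand. A runnable sketch of that, on fake log lines standing in for the journald output:

```shell
# Fake docker-style log lines standing in for the 4000-line wall of text:
logs='level=info msg="starting container"
level=error msg="failed to pull image"
level=info msg="container started"'
# Since every line arrives at the same syslog priority, filtering has to
# happen on the message text itself:
printf '%s\n' "$logs" | grep 'level=error'
# prints: level=error msg="failed to pull image"
```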

Usually, I would just dig, but I have been burned too often by that, and decided to try my luck with 3.10.0rc. It should be more recent and have bug fixes, and worst case, if it doesn’t work, I can file bug reports. So I download, remove the old images, etc. And this time, it works fine.

So I should be happy, of course? Nope, because here comes “cut number 7”.

The webconsole is showing a self-signed certificate. And I think we have been trying to stop training people to accept invalid certificates. I know that’s likely a hard problem, but given that we are already generating certificates for the whole of openshift, maybe we can just extend that to some shared CA system at the OS level and sign for the current host. Or we can just decide that 127.0.0.1 does not require SSL, and skip the problem.

To conclude, I do not think the problems would be hard to fix, in the sense that none of them are hard engineering issues. Using cleartext for the console could be done. Having some fixes in the doc too. Getting docker/kubernetes to work better with journald seems doable. I am not sure if I should just open bugs for all of this, because I am sure 90% of them will be ignored.

In the current state, it seems no one is focusing on improving the experience of running Openshift on a developer workstation, and that’s kinda detrimental to the adoption of the project. I really hope that silverblue will one day bring the polish I can see in Gnome to the experience of developers.


#2

So, since I managed to fill my main partition with my newly installed openshift cluster (a rant for another time), I had to reinstall my laptop (because it seemed faster and safer). So now, I had to do the whole process from 0 again.

And it turns out that:

  • I found out where the real doc is: https://github.com/openshift/origin/blob/master/docs/cluster_up_down.md .
  • oc cluster up now displays: “error: did not detect an --insecure-registry argument on the Docker daemon”. Which is slightly less useful than the previous message. And since it was less useful, I had to search for the value of the option, which led me to discover the previous doc.

The rest of my rant is still valid, and on top of that, I am a bit surprised to see there is no default stream for Fedora containers.

I also looked a bit more at minishift, and now I remember why I wasn’t happy with it. Having a copr is fine, but I usually draw the line at “downloading a random binary to execute as root” (aka: https://docs.openshift.org/latest/minishift/getting-started/setting-up-virtualization-environment.html#setting-up-kvm-driver )


#3

Yeah… I’m not sure why the Docker machine driver isn’t packaged; though one thing I suspect would work here is to leak the libvirt socket into a dev container and install the driver there.

I personally use vagrant inside my primary dev container; I just bind in /run/libvirt.
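That bind-mount trick, sketched out (the image name and the use of podman here are my assumptions; any dev container tooling would do the same):

```shell
# Leak the host's libvirt socket directory into a throwaway container:
sudo podman run -it --rm \
    -v /run/libvirt:/run/libvirt \
    registry.fedoraproject.org/fedora:28 /bin/bash
# Inside the container, a libvirt client can then talk to the host daemon
# over the shared socket, e.g. (after installing libvirt-client):
#   virsh -c qemu:///system list
```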

Agree about the papercuts for oc cluster up… it seems likely that at some point it will not require Docker, and at that point a lot gets simpler. But there are also a whole lot of maintenance problems upstream with oc cluster up - it’s just an entirely different installation and management codepath for a very, very complex project whose primary goal is production clusters. It seems possible that things will circle back to making the minishift path better.

Also…whether we should include oc in Silverblue is a discussion here.