This weekend, I decided to see if I could install a WordPress instance on my Fedora Silverblue laptop for some unrelated activities. Usually, I would just pop a VM somewhere using Ansible, but I decided to see how far I could get just using the system as it is supposed to be used. I will skip the part of the story where I tried to just use podman (keeping that rant for Flock), and go directly to the part where I said to myself “Maybe OpenShift is a good match for that”.
TL;DR:
There are a few roadblocks preventing Silverblue from being the perfect platform for OpenShift, and I think we should fix them.
So there are a few ways to install OpenShift: using “oc cluster up”, using Minishift, using openshift-ansible, using the RPM, or using the all-in-one binary directly from GitHub.
I have used openshift-ansible in the past, and I knew that trying it would just result in me finding something that breaks. I did have a cluster somewhere, and my tendency to use different versions of the components (such as the OS, Ansible and the playbooks) means that I always find some bugs. That’s a fine way to keep myself busy on a weekend, but not my goal today.
The RPMs, last time I looked, were out of date, but they now seem to ship 3.9.0, which is good. But I didn’t know that when I started to look around, so I skipped that choice. There is no official Minishift package for Fedora, so I just decided to go the “oc cluster up” way, since that’s supposed to do everything in containers.
So, first cut: finding where to download the binary. The documentation assumes you know where to find ‘oc’ (cf. https://docs.openshift.org/latest/getting_started/administrators.html#running-in-a-docker-container) but does not link to it. Of course, you have to read the rest of the documentation, which says where to find the all-in-one binary: on GitHub.
Then I had the choice between 3.10rc and 3.9.0. I decided to take 3.9.0, because I knew that getting an RC would mean stumbling on some bugs. It always goes like this, and that wasn’t my goal today. I downloaded, untarred, and ran “oc cluster up”.
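For the record, fetching the all-in-one client from the GitHub releases page looks roughly like this. The exact tarball name carries a short commit hash, so the `<hash>` placeholder below is illustrative; check https://github.com/openshift/origin/releases for the real file name:

```shell
# Illustrative file name: the real tarball on the releases page includes
# a short commit hash in place of <hash>
curl -LO https://github.com/openshift/origin/releases/download/v3.9.0/openshift-origin-client-tools-v3.9.0-<hash>-linux-64bit.tar.gz

# Unpack and run the client from the extracted directory
tar xzf openshift-origin-client-tools-v3.9.0-*-linux-64bit.tar.gz
cd openshift-origin-client-tools-v3.9.0-*/
./oc cluster up
```

Nothing fancy, but as noted above, you have to piece this together yourself from several documentation pages.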
First error message:
is docker client running on this host.
Indeed, it is not. And that error message should count as the 2nd cut, because it is also shown when docker is running but you can’t access the socket. I knew about it, so I ran the command as root, but that isn’t listed in the requirements. Again, that is supposed to be obvious if you know how it works, but that’s kind of the whole point of using “oc cluster up”: you should have something without friction. And the requirements linked in the documentation are for a production server, not for a simple development setup, as people can see on https://docs.openshift.org/latest/install/prerequisites.html#install-config-install-prerequisites
So I ran “oc cluster up” as root, after having started the docker daemon.
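For completeness, the sequence that gets past this cut looks like the following sketch (assuming the extracted oc binary is on your $PATH, and that you accept running as root rather than fiddling with socket permissions):

```shell
# The docker daemon is not running by default on Silverblue
sudo systemctl start docker

# Run as root so oc can talk to /var/run/docker.sock; adding your user
# to the docker group also works, with the security implications that has
sudo oc cluster up
```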
Then appeared cut number 3, a nice error message reminding me that I also needed to add “--insecure-registry 172.30.0.0/16” to the docker daemon command line. In fact, that should count as multiple cuts, because why should the registry be insecure in the first place (it could, for example, be secured out of the box with a custom CA), and why is oc not smart enough to add the option by itself?
I do understand that the second point could run into all kinds of edge cases (port forwarding using socat, etc.). But again, that means that someone wanting to use oc cluster up has to find where to add that option (in /etc/sysconfig/docker), then edit a file as root (hello vi usage), know enough shell syntax to get it right, and restart docker.
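Concretely, the edit amounts to something like this; a sketch assuming the stock Fedora docker package, where the daemon flags live in the OPTIONS variable of /etc/sysconfig/docker:

```shell
# Prepend the flag to the existing OPTIONS='...' line
# (assumes the stock Fedora /etc/sysconfig/docker layout)
sudo sed -i "s|^OPTIONS='|OPTIONS='--insecure-registry 172.30.0.0/16 |" /etc/sysconfig/docker

# The daemon only picks up the new flag after a restart
sudo systemctl restart docker
```

172.30.0.0/16 is the default service network that oc cluster up asks for; adjust it if your setup differs.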
Again, nothing that would block me, but not everybody is fluent enough in bash and Linux for that.
And then, the deployment blocks. I mean, it just blocks, then times out with:
-- Installing web console ...
FAIL Error: failed to start the web console server: timed out waiting for the condition
And that’s cut number 5. No mention of a log file, a rather unhelpful message. But port 127.0.0.1:8443 was open, just showing nothing.
Again, I know what to do, and indeed there is a wall of text in journald for docker (again, you need to know to look there). But I didn’t have the patience to go through it, and just put “cut 6” on said wall of text: the lines are too long for my screen (partly because docker duplicates information in the text, and kubernetes does the same), there are 4000 lines of logs (again, because docker does not use proper syslog levels, systemd can’t filter them), and systemd shows lots of stuff in red for some reason. There are some usability issues for sure.
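For reference, digging through that wall of text goes roughly like this; a sketch assuming docker is logging to journald (the journald log driver is the Fedora default):

```shell
# Dump everything the docker unit logged during the failed attempt
journalctl -u docker.service --since "15 min ago" --no-pager

# In theory this would filter down to errors, but since docker logs
# almost everything at the same priority, it does not help much here
journalctl -u docker.service -p err --since "15 min ago"

# Grepping for the component that failed is usually faster
journalctl -u docker.service --since "15 min ago" | grep -i console
```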
Usually, I would just dig in, but I have been burned too often by that, and decided to try my luck with 3.10.0rc. It should be more recent and have bug fixes, and in the worst case, if it doesn’t work, I can file bug reports. So I downloaded it, removed the old images, etc. And this time, it worked fine.
So I should be happy, right? Nope, because here comes “cut number 7”.
The web console is served with a self-signed certificate. And I thought we had been trying to stop training people to accept invalid certificates. I know that’s likely a hard problem, but given that we are already generating certificates for all of OpenShift, maybe we could extend that to some shared CA system at the OS level and just sign for the current host. Or we could also just decide that 127.0.0.1 does not require SSL, and skip the problem entirely.
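If you want to see exactly what the browser is complaining about, you can pull the certificate off the console port yourself; a sketch assuming the default 127.0.0.1:8443 endpoint from above:

```shell
# Fetch the console's certificate and print who issued it and for whom;
# for the generated certificate, the issuer is a locally created CA
# that no browser trusts
openssl s_client -connect 127.0.0.1:8443 </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```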
To conclude, I do not think these problems would be hard to fix, in the sense that none of them are hard engineering issues. Using cleartext for the console could be done. So could some fixes in the documentation. Getting docker/kubernetes to work better with journald seems doable. I am not sure I should just open bugs for all of this, because I am sure 90% of them would be ignored.
In the current state, it seems no one is focusing on improving the experience of running OpenShift on a developer workstation, and that’s kind of detrimental to the adoption of the project. I really hope that Silverblue will one day bring the polish I can see in GNOME to the experience of developers.