I am trying to set up CoreOS with K3s & Cilium, but I can't get it to work unless I disable SELinux. So far I haven't found anything that explains why this is, and the Cilium docs say it should work with SELinux as long as I set envoy.enabled=false during install. Does anyone know how I can debug this or, maybe, have something I could try? In case it helps, I did file an issue with Cilium at the suggestion of a community member: Cilium Agent fails on Fedora Core 40 if SELinux is enabled · Issue #34068 · cilium/cilium · GitHub
So, interestingly, it seems adding --set securityContext.privileged=true
makes it work, but I don’t yet know why this is. Here is how I am installing Cilium:
cilium install \
--version 1.16.0 \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.42.0.0/16" \
--set kubeProxyReplacement=true \
--set operator.replicas=1 \
--set envoy.enabled=false \
--set k8sServicePort=6443 \
--set k8sServiceHost=127.0.0.1 \
--set securityContext.privileged=true
In securityContext there are also SELinux options. The comments suggest that privileged=true isn't necessary if the spc_t type is set:
securityContext:
  # -- User to run the pod with
  # runAsUser: 0
  # -- Run the pod with elevated privileges
  privileged: false
  # -- SELinux options for the `cilium-agent` and init containers
  seLinuxOptions:
    level: 's0'
    # Running with spc_t since we have removed the privileged mode.
    # Users can change it to a different type as long as they have the
    # type available on the system.
    type: 'spc_t'
It looks like spc_t might need to be set. Also, did you install k3s with --selinux?
It looks like spc_t comes with container-selinux, which should also get pulled in as a dependency of k3s-selinux when you install k3s with --selinux.
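A quick way to check both of those things, i.e. whether the type actually exists in the loaded policy and what domain the agent ends up running in, might be something like this (just a sketch; seinfo comes from setools-console, which may need to be installed first):

# does the loaded policy know about spc_t? (seinfo is part of setools-console)
seinfo -t spc_t

# what label is the cilium-agent process actually running with on the host?
ps -eZ | grep cilium-agent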
Here are the selinux packages installed right now:
gene@fcos-vm1:~$ rpm -qa |grep -i selinux |sort
container-selinux-2.232.1-1.fc40.noarch
k3s-selinux-1.5-1.coreos.noarch
libselinux-3.6-4.fc40.x86_64
libselinux-utils-3.6-4.fc40.x86_64
passt-selinux-0^20240624.g1ee2eca-1.fc40.noarch
rpm-plugin-selinux-4.19.1.1-1.fc40.x86_64
selinux-policy-40.20-1.fc40.noarch
selinux-policy-targeted-40.20-1.fc40.noarch
Here is what the security contexts look like right now:
gene@fcos-vm1:~$ ls -lhdZ /opt/cni
drwxr-xr-x. 3 root root system_u:object_r:var_t:s0 17 Jul 31 19:56 /opt/cni
gene@fcos-vm1:~$ ls -lhdZ /opt/cni/bin
drwxr-xr-x. 2 root root system_u:object_r:var_t:s0 40 Jul 31 19:57 /opt/cni/bin
gene@fcos-vm1:~$ ls -lhZ /opt/cni/bin/
total 59M
-rwxr-xr-x. 1 root root system_u:object_r:var_t:s0 57M Jul 31 19:57 cilium-cni
-rwxr-xr-x. 1 root root system_u:object_r:var_t:s0 2.4M Jul 31 19:57 loopback
After installing Cilium with privileged I went to install the Linkerd CNI and it failed as well. I’m starting to think the context on /opt/cni/bin
needs adjusting, or should have been adjusted by something before I started.
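One way I could check that theory is to compare the on-disk labels with what the loaded policy thinks those paths should be, and reset them if they differ. A rough sketch (I'm not sure yet what type the policy actually expects here):

# what the loaded policy says the default labels for these paths should be
matchpathcon /opt/cni /opt/cni/bin /opt/cni/bin/cilium-cni

# reset the labels to the policy defaults if they differ
sudo restorecon -Rv /opt/cni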
Edit: yes, I installed K3s with --selinux. More specifically, I installed K3s with this in /etc/rancher/k3s/config.yaml:
cluster-init: true
disable:
- servicelb
- traefik
disable-kube-proxy: true
disable-network-policy: true
flannel-backend: none
selinux: true
write-kubeconfig-mode: "0644"
Here is the error from the Linkerd CNI:
kubectl -n linkerd-cni logs linkerd-cni-jpcwr
[2024-07-31 20:30:42] /host/opt/cni/bin is non-writeable, failure
Like with Cilium, setting the cni installer to privileged worked.
linkerd install-cni --set privileged=true | kubectl apply -f -
Hi @genebean
I went back and forth a bit since it was my first time installing CoreOS, but I can tell you about my attempt to reproduce your steps… I noticed in your Ignition file that you install k3s-selinux, which is great. But when I checked the VM, I noticed the most important paths outlined here were not actually labeled, so I had to enable the k3s SELinux module and relabel the entire filesystem:
rpm-ostree install helm
systemctl reboot
semodule -v -e k3s
fixfiles -F -f relabel
systemctl reboot
After that, these well-known paths were finally labeled properly. When I checked the process table, I noticed that the k3s process was now labeled correctly as "container_runtime_t" and not "unconfined_service_t" like before.
ls -laZ /var/lib/rancher/k3s
ps -auxZ | grep k3s
Finally, I decided to install Cilium as a Helm chart rather than via the binary. It will still mount some host paths and do a few nsenter calls, but all of that happens within the confinement of the k3s-selinux policy, which is not something we can say about the cilium binary.
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
helm repo add cilium https://helm.cilium.io
helm install mycilium cilium/cilium \
--set ipam.operator.clusterPoolIPv4PodCIDRList="10.0.0.0/8" \
--set ipam.operator.clusterPoolIPv4MaskSize=24 \
--set kubeProxyReplacement=true --set operator.replicas=1 \
--set envoy.enabled=false \
--set k8sServicePort=6443 \
--set k8sServiceHost=127.0.0.1
So far we have:
k3s process running as container_runtime_t
cilium-operator pod running as container_t
cilium-agent pod running as spc_t
The bpf filesystem is still labeled bpf_t. I do not see any SELinux errors relating to it, and we can see that the spc_t domain has quite a few permissions on files labeled bpf_t:
ls -laZ /sys/fs/bpf
sesearch -A -s spc_t -t bpf_t
All the pods are running with allocated IPs and I see no AVC errors in journalctl. At least, those are my thoughts on reproducing this setup. Why k3s-selinux was not enabled as part of the installation, I do not know…
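For reference, this is roughly how I look for denials on CoreOS (a sketch; AVC messages end up in the kernel log, which journald captures):

# SELinux denials since boot, from the journal's kernel messages
sudo journalctl -k -b | grep -i 'avc.*denied'

# the same thing straight from the kernel ring buffer
sudo dmesg | grep -i 'avc.*denied'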
Hmmm, I'm just now remembering that k3s ships with flannel by default (which you rightfully disabled) and Cilium needs to be installed after the fact (whereas it can be installed with RKE2 out of the box). So I'm thinking you might be right about possibly needing to adjust the context for Cilium, since k3s-selinux is probably not going to grok it.
Here’s the rke2-selinux ref. for the CNI paths:
I’ll check both of your discoveries out, thanks! For what it’s worth, here’s where I’m working on building this setup: Redoing my setup after many years by genebean · Pull Request #4 · genebean/kubebag · GitHub
As of this moment things work, but still via the privilege escalation. What you both shared sounds quite promising.
Regarding having to relabel, does that seem like an issue with the k3s-selinux RPM or something else?
That is a good question, and as always, a good question is already half the answer… The scripts I mentioned above are part of the RPM and get triggered in the %post section of the package. We can also double-check the scripts to confirm the installed package carries the same ones:
rpm --query --scripts k3s-selinux
...
...
%define k3s_relabel_files() \
mkdir -p /var/lib/cni; \
mkdir -p /var/lib/kubelet/pods; \
mkdir -p /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots; \
mkdir -p /var/lib/rancher/k3s/data; \
mkdir -p /var/run/flannel; \
mkdir -p /var/run/k3s; \
restorecon -R -i /etc/systemd/system/k3s.service; \
restorecon -R -i /usr/lib/systemd/system/k3s.service; \
restorecon -R /var/lib/cni; \
restorecon -R /var/lib/kubelet; \
restorecon -R /var/lib/rancher; \
restorecon -R /var/run/k3s; \
restorecon -R /var/run/flannel
...
...
%post
%selinux_modules_install %{_datadir}/selinux/packages/k3s.pp
if /usr/sbin/selinuxenabled ; then
/usr/sbin/load_policy
%k3s_relabel_files
fi;
However, after starting a clean CoreOS instance and installing k3s-selinux manually, I noticed that it did not create the above-mentioned folders and consequently did not relabel them (restorecon), despite the logs saying the post-install script ran successfully in 65ms. So what happens is this: the folders are not properly labeled; when k3s.service is installed it picks up the existing labels and the process ends up labeled unconfined_service_t; then, during the Cilium installation, one of the forks of this process tries to create a mount-cgroup container for mounting bpf (used heavily by Cilium for performance reasons). That process tries to transition to spc_t, since spc_t is allowed to access bpf_t, but the transition fails because unconfined_service_t is not allowed to transition that way, and that is the permission error that I (and probably you) have seen in the logs.
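A quick way to see which of those two states a node is in (these are just the checks I used, nothing more):

# which domain is the k3s server actually running in?
ps -eZ | grep -w k3s

# are the well-known paths labeled, or still carrying generic labels like var_t?
ls -dZ /var/lib/rancher/k3s /var/lib/cni /var/lib/kubelet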
I would suggest you add a new systemd directive to the service that installs k3s-selinux and have it run a script with these steps (one possible way to wire this up is sketched after the list):
mkdir -p /var/lib/cni;
mkdir -p /var/lib/kubelet/pods;
mkdir -p /var/lib/rancher/k3s/agent/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots;
mkdir -p /var/lib/rancher/k3s/data;
mkdir -p /var/run/flannel;
mkdir -p /var/run/k3s;
restorecon -R -i /etc/systemd/system/k3s.service;
restorecon -R -i /usr/lib/systemd/system/k3s.service;
restorecon -R /var/lib/cni;
restorecon -R /var/lib/kubelet;
restorecon -R /var/lib/rancher;
restorecon -R /var/run/k3s;
restorecon -R /var/run/flannel;
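For example, the wiring could look roughly like this. This is only a sketch: the script name, the drop-in path, and the choice of ExecStartPre= on k3s.service are all assumptions on my side.

# put the mkdir/restorecon steps above into a script, e.g. /usr/local/bin/relabel-k3s.sh
sudo install -m 0755 relabel-k3s.sh /usr/local/bin/relabel-k3s.sh

# run it right before k3s starts, via a drop-in on k3s.service
sudo mkdir -p /etc/systemd/system/k3s.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/k3s.service.d/10-selinux-relabel.conf
[Service]
ExecStartPre=/usr/local/bin/relabel-k3s.sh
EOF
sudo systemctl daemon-reload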
That is, unless someone who knows CoreOS and rpm-ostree better than I do can help me understand why the post-install scripts were not executed for this package in the first place. Of course, I hope these are all the paths that need to be relabeled; in my setup I just relabeled the entire filesystem to be sure.
This may well be an "I am new to FCOS" question, but how does this help with /opt/cni/? Is that mapped to one of the /var/lib paths, or is the real issue that k3s is running with the wrong context?
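(I suppose one quick way for me to check the first part is to see whether /opt resolves to somewhere under /var on this box:)

# does /opt/cni actually live under /var on this node?
readlink -f /opt /opt/cni/bin
ls -ldZ /opt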
These two commands seem to be the only thing that gets k3s running as container_runtime_t:

system_u:system_r:container_runtime_t:s0 root 3825 102 6.4 16347828 389024 ? Ssl 19:38 0:19 /usr/local/bin/k3s server

A simple service restart is enough to get that to be the case. So it seems like the module is not being activated before k3s is installed… or so it seems to me.
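(For what it's worth, this is how I've been checking whether the module is actually present and enabled; the "full" listing shows a disabled flag for modules that are installed but not active:)

# is the k3s policy module installed, and is it enabled?
sudo semodule --list-modules=full | grep k3s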
Hi Gene, when dealing with complex and intertwined systems, what the real cause of something is becomes a philosophical question for me… I don't have an FCOS instance right now and I don't remember clearly, but I think the module was already active. So I think relabeling the filesystem with the fixfiles command is the only important part. What is funny is that I tried restorecon -Rv / to relabel everything, but to my surprise it was not consistently labeling the files (unless I was drunk), so I replaced it with the fixfiles command, which was consistent and reproducible and labeled /var/lib/rancher properly… Anyway, I am sorry I cannot dedicate more time to this, but I think Cilium should work properly now in BPF mode. You will probably also see some files in the host path /sys/fs/bpf.
No problem and I really appreciate the help! I’m going to get my systemd units in order and try the revised process without setting any charts to privileged. Seriously, thank you!
I think adding this ExecStartPost may have done it!
[Unit]
Description=Install k3s dependencies
Wants=network-online.target
After=network-online.target
Before=zincati.service
ConditionPathExists=|!/usr/bin/kubectl
ConditionPathExists=|!/usr/share/selinux/packages/k3s.pp
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=rpm-ostree install --apply-live --allow-inactive --assumeyes kubectl k3s-selinux
ExecStartPost=/usr/bin/bash -c '/usr/sbin/semodule -v -e k3s && /usr/sbin/fixfiles -F -f relabel'
[Install]
WantedBy=multi-user.target
Hello again
You will want to start k3s.service after the dependencies service, so make sure you update the After directive as shown below:
- name: "k3s.service"
enabled: true
contents: |
[Unit]
Description=Run K3s
Wants=network-online.target
After=network-online.target rpm-ostree-install-k3s-dependencies.service
Also, relabeling the entire filesystem takes time… you can check with "rpm-ostree status" and wait until it gets to the idle state. For me it took about 4 minutes after the restart…
To make relabeling faster, you might want to label not the entire filesystem but only the following paths:
ExecStartPost=/usr/bin/bash -c '/usr/sbin/semodule -v -e k3s && restorecon -Rv /var /etc /usr'
CoreOS is interesting; I'll try to give it a chance in the future.
Thanks so much! I will give this a try shortly