CoreOS fails to boot with composefs after upgrade to 41

Currently we only have an FCOS 39 template available for provisioning systems.
When I create an instance from this template and upgrade it to the latest stable release with rpm-ostree upgrade, the upgrade procedure itself seems to work fine.
However, the system will not boot afterwards and fails with:

ostree-prepare-root: composefs: failed to mount: No such file or directory

I got the system to boot again by disabling composefs in the kernel arguments, but I would really like to fix the system properly.
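Roughly how I disabled it (reconstructed from memory, so treat this as a sketch rather than exact commands):

# one-off: at the GRUB menu, press 'e' and append to the linux line:
#   ostree.prepare-root.composefs=0
# or, to persist the workaround across reboots (it has to be removed again later):
sudo rpm-ostree kargs --append='ostree.prepare-root.composefs=0'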

I guess some migration should run when I do the upgrade - can this be verified somehow? Is there a manual way to fix this?

Hello @dansch and welcome to :fedora: !

Is this your Butane config? If so, would you share it?

Does this mean you have disabled automatic updates (Zincati)? Along with the Butane config, could you also share the output of rpm-ostree status from the Fedora CoreOS 39 deployment before upgrading it?

Hello @hricky

Sure and thx for helping!
I anonymized some things, but this is the Butane config we use…
We disable Zincati and trigger updates manually.

variant: fcos
version: 1.5.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-rsa XXXXX
    #user running all custom services
    - name: forgejo
      groups:
        - systemd-journal # add our gateway user to this group purely so it would be able to read journald, which we need so fluent-bit can forward it to elasticsearch
systemd:
  units:
    - name: zincati.service
      enabled: false
      mask: true
    - name: rpm-ostree-countme.timer
      enabled: false
      mask: true
    # we would disable this one during hardening anyway; by disabling it during Ignition,
    # not even the first boot will have the CoreOS version listed upon login
    - name: console-login-helper-messages-gensnippet-os-release.service
      enabled: false
      mask: true
storage:
  filesystems:
    #main hd 30GB (os, image,logs)
    #2nd hd 30GB (all forgejo data) - grow afterward to required size
    - device: /dev/sdb
      path: /var/forgejo-data
      format: xfs
      label: nx-data
      with_mount_unit: true
  files:
    - path: /var/lib/systemd/linger/forgejo # instead of sudo loginctl enable-linger $USER
      mode: 0644
      contents:
        inline: ''
    - path: /etc/hostname
      mode: 0420
      overwrite: true
      contents:
        inline: xxx.foo.bar
    - path: /etc/ostree/remotes.d/fedora.conf
      overwrite: true
      contents:
        inline: |
          [remote "fedora"]
          url=https://artifactproxy.foo.bar/repository/raw-ostree.fedoraproject.org/
          gpg-verify=true
          gpgkeypath=/etc/pki/rpm-gpg/
    - path: /etc/zincati/config.d/90-disable-auto-updates.toml
      contents:
        inline: |
          [updates]
          enabled = false
    - path: /etc/NetworkManager/system-connections/ens192.nmconnection
      mode: 0600
      contents:
        inline: |
          [connection]
          id=ens192
          type=ethernet
          interface-name=ens192
          [ipv4]
          address1=1.4.1.6/24,1.4.1.5
          dns=1.4.1.5
          dns-search=foo.bar
          may-fail=false
          method=manual
          [ipv6]
          method=disabled

and this is the rpm-ostree status right after provisioning (we currently provision using a template in VMware)

# rpm-ostree status
State: idle
Deployments:
● fedora:fedora/x86_64/coreos/stable
                  Version: 39.20231119.3.0 (2023-12-04T16:21:28Z)
                   Commit: cd3ab5975ace83aa36f687d2f2d58ea59b4d1cceef6d0cd18a231248ce6ae207
             GPGSignature: Valid signature by E8F23996F23218640CB44CBE75CF5AC418B8E74C

Hmm interesting - the system fixes itself when:

  • booting with ostree.prepare-root.composefs=0
  • performing rpm-ostree install python

Then the reboot really works and composefs is set up properly:

mount
/dev/sda4 on /sysroot type xfs (ro,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
composefs on / type overlay (ro,relatime,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on)
/dev/sda4 on /etc type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)
/dev/sda4 on /sysroot/ostree/deploy/fedora-coreos/var type xfs (rw,relatime,seclabel,attr2,inode64,logbufs=8,logbsize=32k,prjquota)

I assume it would not matter what gets installed.

I haven’t tried to reproduce the issue yet. It’s likely related to upgrading from 39 to 41, which, as far as I know, is not supported. Usually you have to rebase from 39 to 40 and then to 41.
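If you prefer to drive the update manually with rpm-ostree instead of Zincati, you can in principle step through the intermediate releases yourself. Something along these lines should work; the version string is only a placeholder, pick the actual barrier/intermediate releases from the stream metadata:

sudo rpm-ostree deploy 40.2024XXXX.3.0   # placeholder for an intermediate release on the stable stream
sudo systemctl reboot
sudo rpm-ostree upgrade                  # then continue to the latest stable release
sudo systemctl reboot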

Regarding composefs, as far as I know, it doesn’t matter what is installed. Here’s what it looks like on one of my virtual machines.

core@sysexts-vm:~$ cat /usr/lib/ostree/prepare-root.conf 
[composefs]
enabled = true
core@sysexts-vm:~$ findmnt /
TARGET SOURCE    FSTYPE  OPTIONS
/      composefs overlay ro,relatime,seclabel,lowerdir+=/run/ostree/.private/cfsroot-lower,datadir+=/sysroot/ostree/repo/objects,redirect_dir=on,metacopy=on
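One more note: if you made the ostree.prepare-root.composefs=0 workaround persistent with rpm-ostree kargs, remember to drop it again once the system boots cleanly, e.g.:

sudo rpm-ostree kargs --delete='ostree.prepare-root.composefs=0'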

If you are booting from an old boot image, you should let Zincati update the system to the latest release and handle the reboots. It will go through all the barrier releases that are required to fix issues before major updates.
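In your case Zincati is masked and also disabled via a config drop-in, so re-enabling it would look roughly like this (untested against your exact setup):

sudo rm /etc/zincati/config.d/90-disable-auto-updates.toml
sudo systemctl unmask zincati.service
sudo systemctl enable --now zincati.service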


Good point @siosm, thx. I thought rpm-ostree upgrade would respect barrier releases as well. I will try using Zincati next time.
Are these the barrier releases: fedora-coreos-streams/updates/stable.json at main · coreos/fedora-coreos-streams · GitHub? And is there a way to see that in rpm-ostree, or is this only handled by Zincati? (Sorry, I did not find much documentation on that and I still need to learn a lot about FCOS…)

@hricky How can I rebase to 40 and then to 41 in a proper way?

Just follow Timothée’s recommendation. Also, keep in mind that if an issue cannot be reproduced in the latest released versions (released every two weeks), then the maintainers probably won’t spend time on it. My suggestion would be to activate Zincati whenever it is appropriate for your use case – the more often the better.
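Regarding the barriers: yes, that updates/stable.json in coreos/fedora-coreos-streams is where they are defined, and as far as I understand rpm-ostree itself does not consult them; only Zincati (via the update graph) follows them. If you just want to list them, something like this should work, assuming the barrier information sits under each release's metadata in that file (double-check the field names):

curl -s https://raw.githubusercontent.com/coreos/fedora-coreos-streams/main/updates/stable.json \
  | jq -r '.releases[] | select(.metadata.barrier != null) | .version'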

Thinking more about it, if you don’t want to use Zincati at all, maybe you could consider re-provisioning your nodes with each new release. If you make /var persistent and also set other storage configs in Butane correctly, the system will be provisioned without touching the configured devices/partitions. I haven’t used it, at least not regularly, but I know of a community member who runs Fedora CoreOS entirely from RAM and uses such an approach to keep his nodes updated. If you think it would be suitable for your use case, I can try to find the GitHub repo and post the link here.

re-provisioning your nodes with each new release

Yes, that sounds like a neat approach. It gives us full control over when the upgrade happens and ensures the system is properly upgraded, imho.
If you could find that link, that would be nice - thx!

I found the repo, but it only contains a simple Bash script to download Fedora CoreOS artifacts. I can try to replicate your setup and test some Butane configs. As far as I can see, you have one disk for OS and logs and another for data. If this is correct, do you need any modifications to the main system disk where FCOS is installed?

Ah thx! Apart from the Ignition/Butane config, we have made a few changes in /etc, like configuring chrony. I assume these changes would remain when re-provisioning - correct?
Besides that, we also modify the home directory /var/home/forgejo. I guess re-provisioning would not retain /var/home and this would need to be moved to the second disk? wdyt?

I can try to replicate your setup and test some Butane configs.

If it is not too much effort, it would certainly be interesting to know whether re-provisioning is a viable option.

Fedora CoreOS (FCOS) is upstream of Red Hat Enterprise Linux CoreOS (RHCOS), which is the default operating system for all OpenShift Container Platform cluster machines. RHCOS is designed as a single-purpose container operating system with automated remote upgrade features. FCOS can be used as a general-purpose server operating system, but it still requires automated remote upgrades, and this is the only supported and documented way to keep your system up to date. At least that’s how I understand it.

With the correct config, /var should not be touched during re-provisioning. I’ll try /var on a separate partition on the same primary disk or mount it from a partition on the second disk, so you can decide which would be appropriate.
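As a rough sketch, a filesystems entry like this should keep the data across re-provisioning (device and label are only examples, here using the second disk; wipe_filesystem: false tells Ignition to reuse the existing filesystem instead of recreating it):

    - device: /dev/sdb
      path: /var
      format: xfs
      label: var
      wipe_filesystem: false
      with_mount_unit: true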

I’m not entirely sure about /etc, but I’ll test it. Perhaps you can include all the necessary /etc modifications in the Butane config, since you already have some.
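For example, the chrony settings could be captured by adding something like this to the files section of your existing config (server name purely illustrative):

    - path: /etc/chrony.conf
      overwrite: true
      contents:
        inline: |
          pool ntp.foo.bar iburst
          driftfile /var/lib/chrony/drift
          makestep 1.0 3
          rtcsync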

I have already done similar Butane configs and will initially test them on a virtual machine. As you said, it would be valuable to know for sure whether re-provisioning is a reliable update option, even though it is neither officially supported nor documented.