Using a post-install script during CoreOS PXE install

Greetings! I am deploying CoreOS using iPXE and the latest initramfs, and I am hitting the issue described in “1997805 – coreos install boot order”.

On my Dell servers, EFI boot entries that point to a disk are automatically removed when the system detects that there is no longer an EFI partition on that disk (so “it’s a feature”).
Following the CoreOS installation, the node reboots into PXE again, triggering the installation again and again and again.

This problem is now tracked by “Stale UEFI boot entry left behind after reprovisioning” (issue #946 in coreos/fedora-coreos-tracker on GitHub), but I would like to know if there is a way around it using pxe customize --post-install, or maybe custom post-install-hook services.
Simply put, I would like to run an efibootmgr -o in a post install script so that the Fedora entry gets first in the list.
As far as I can tell, the post-install-hook service is not installed in the live ramdisk; every customization I managed to make (through a Butane config) is executed after the reboot.
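
For reference, a minimal sketch of what such a post-install script could look like. It assumes the entry for the installed system is labelled Fedora and that efibootmgr is available in the live environment. The function names are made up for illustration; the only real commands are efibootmgr without arguments (list entries) and efibootmgr -o (set BootOrder).

```shell
#!/bin/bash
# Sketch only: move the boot entry with a given label to the front of BootOrder.
# Assumption: the label is a plain string such as "Fedora" (no regex characters).

# Print the 4-digit number of the first entry whose label starts with $1,
# reading `efibootmgr` output from stdin.
find_entry() {
    sed -nE "s/^Boot([0-9A-Fa-f]{4})[* ] $1.*/\1/p" | head -n 1
}

# Print a comma-separated boot order with entry $1 first, followed by the
# entries of the current order $2 (minus $1, to avoid a duplicate).
new_boot_order() {
    local first="$1" current="$2" out="$1" entry
    local IFS=','
    for entry in $current; do
        [ "$entry" = "$first" ] || out="$out,$entry"
    done
    printf '%s\n' "$out"
}

# Look up the entry labelled $1 and rewrite BootOrder so that it comes first.
put_entry_first() {
    local label="$1" listing entry order
    listing="$(efibootmgr)" || return 1
    entry="$(printf '%s\n' "$listing" | find_entry "$label")"
    if [ -z "$entry" ]; then
        echo "no boot entry labelled $label" >&2
        return 1
    fi
    order="$(printf '%s\n' "$listing" | sed -n 's/^BootOrder: //p')"
    efibootmgr -o "$(new_boot_order "$entry" "$order")"
}
```

The script body would then end with a call such as put_entry_first Fedora. If no matching entry exists yet, the function returns non-zero and leaves the boot order untouched.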

coreos-installer pxe customize --post-install visibly modifies the ramdisk file, but I don’t know how to check what exactly it does. I could not find an option for more logs, and since the ramdisk is a series of cpio and gzipped files concatenated together, I am not sure how to open it, nor how this modification is used on the live system. I call it like this:

podman run --privileged -ti --rm -v `pwd`:/imgs quay.io/coreos/coreos-installer:release pxe customize -o /imgs/fedora-coreos-36.20220605.3.0-live-initramfs.x86_64.img.4 /imgs/fedora-coreos-36.20220605.3.0-live-initramfs.x86_64.img --post-install /imgs/post.sh

The iPXE options are as follows:

kernel --timeout 60000 http://192.168.25.6:8088/446af683-b062-4879-bcd2-a32092cfbe58/kernel text nofb nomodeset vga=normal ipa-debug=1 coreos.live.rootfs_url=http://192.168.25.4:8088/fedora-coreos-36.20220605.3.0-live-rootfs.x86_64.img coreos.inst.install_dev=/dev/sda coreos.inst.ignition_url=http://192.168.25.4:8088/init-141.ign initrd=ramdisk || goto boot_ramdisk

I am wondering if it is possible at all to use the --post-install option for my case. I would be grateful if anyone could help me troubleshoot :)
At the moment I have to manually fix the boot order, then the system runs fine until next install.

Thanks!

--post-install is the right approach here. You can use coreos-installer pxe ignition unwrap <initramfs> to see the Ignition config that pxe customize embedded in the live image. In your case, --post-install should create /usr/local/bin/post-install-post.sh and post-install-post.sh.service in the live system. You can add the coreos.inst.skip_reboot kernel argument to automatically drop to a shell after installation, and then run journalctl -u post-install-post.sh.service to see what the post-install script did.

Thanks for your response. Indeed, coreos-installer pxe ignition unwrap lets me see that new Ignition files were embedded in the ramdisk, thank you.
However, neither post-install-post.sh.service nor anything under /usr/local/bin is present in the live installation. I had already tested coreos.inst.skip_reboot, and I just checked one more time: there is nothing there. The Ignition files added by --post-install are not used during the installation.

EDIT: found it! I needed
ignition.firstboot ignition.platform.id=metal
on the cmdline (as documented)

I have a follow-up issue. I tried setting up software RAID following the documentation, with some extra clean-up. This seems to be how things work:

1- pxe boot
2- the “live ignition” file is applied
3- coreos-installer /dev/sda completes
4- the post script is run (where I run some sgdisk --zap-all /dev/sdb and efibootmgr --create --disk /dev/sda etc.)
5- the system reboots
6- the dest-ignition file is applied, containing the reconfiguration with disk mirroring and RAID-1
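
Step 4 could be sketched as below. The ESP partition number (2), the label and the loader path are assumptions about the installed layout and may differ on RHCOS or on other images, so check them with lsblk and the contents of the ESP before relying on them. The run helper and the DRY_RUN variable are hypothetical conveniences for previewing the commands.

```shell
#!/bin/bash
# Sketch of a post-install script for step 4.
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# $1: disk that coreos-installer just wrote to, $2: second disk to wipe.
post_install() {
    local target="$1" spare="$2"
    # Destroy the partition tables on the disk that will join the mirror later.
    run sgdisk --zap-all "$spare"
    # Register a boot entry for the installed system.
    # ASSUMPTIONS: ESP is partition 2, label "Fedora", shim under \EFI\fedora.
    run efibootmgr --create --disk "$target" --part 2 \
        --label Fedora --loader '\EFI\fedora\shimx64.efi'
}
```

Running DRY_RUN=1 with post_install /dev/sda /dev/sdb prints the two commands without touching the disks or the firmware. As described in the next paragraph, an entry created this way stops being valid once the RAID reconfiguration changes the partition UUID.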

After the RAID reconfiguration completes, the EFI boot entry is no longer valid because the partition UUID has changed. When the system reboots, it goes into PXE boot again.

My understanding is that software RAID can only be applied after the installation, so I cannot define the RAID configuration directly in the “live ignition” file (if I do that, coreos-installer no longer knows how to partition the disks). I am fine with reconfiguring the disks after the installation (which is what is documented for OpenShift as well), so I am looking for a way to run an efibootmgr command once the RAID reconfiguration completes.

RAID is set up as part of the Ignition run on first boot, not in the LiveISO.
coreos-installer does not create an EFI boot entry. It’s created on first boot by the shim. You need to boot from the disk or disable PXE once you have installed your system.
If you are doing multiple installations on the same disk, there might be stale boot entries left in your firmware config. You should clean them up before re-installing.
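
Cleaning up stale entries before a reinstall could look like this minimal sketch. It assumes the stale entries share a known label (Fedora is only an example) and that nothing else with that label needs to be kept. The function names are hypothetical; --bootnum and --delete-bootnum are efibootmgr options.

```shell
#!/bin/bash
# Print the numbers of all boot entries whose label starts with $1,
# reading `efibootmgr` output from stdin. Matches active ("*") and
# inactive entries. Assumption: $1 contains no regex characters.
matching_entries() {
    sed -nE "s/^Boot([0-9A-Fa-f]{4})[* ] $1.*/\1/p"
}

# Delete every firmware boot entry labelled $1.
delete_stale_entries() {
    local num
    efibootmgr | matching_entries "$1" | while read -r num; do
        efibootmgr --bootnum "$num" --delete-bootnum </dev/null
    done
}
```

Piping efibootmgr output through matching_entries Fedora first, without deleting anything, is a cheap way to review which entries would be removed.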

“RAID is set up as part of the Ignition run on first boot, not in the LiveISO.”

Thanks for the confirmation!

Here is what happens on Dell servers:

  • The iDRAC detects that there is no bootable entry, so a PXE entry is added automatically.
  • Then you install the server. coreos-installer does not create the EFI boot entry, so I create one myself with a post-install script.
  • The iDRAC detects that PXE boot is still available, so it adds a boot entry for PXE anyway, which ends up second in the boot order.
  • Now the server reboots, and the first boot creates the file systems. The boot entry I created initially is kept but is invalid. PXE is second in the list. Following a reboot, the iDRAC detects that there are 2 more ESP partitions and automatically adds boot entries for them, which end up third and fourth in the list.
  • So the system boots into PXE following the RAID set-up.

I managed to make it work (in RHCOS, I did not try on Fedora CoreOS) by adding a service that runs after coreos-copy-firstboot-network.service, so that the EFI boot entry is fixed before the reboot:

  systemd:
    units:
      - name: fix-raid-efi.service
        enabled: true
        contents: |
          [Unit]
          Description=Fix efibootmgr after raid config
          After=coreos-copy-firstboot-network.service
          [Service]
          Type=oneshot
          RemainAfterExit=yes
          ExecStart=/usr/local/bin/post.sh
          [Install]
          WantedBy=multi-user.target