Hi, we are doing “complete reinstalls” on some systems. This has worked pretty well so far, but I think we somehow broke a dependency chain or the like when our manifest locks became invalid. Let me explain:
How are we updating
We are using kexec to boot into a new kernel, initramfs and rootfs image to do essentially what coreos install also does. We provide some kargs to the next boot which specify the Ignition file to provision with. Below are the commands, which I hope are pretty self-explanatory:
sudo curl http://192.168.0.10:3001/new/kernel -o /var/reinstall/kernel
sudo curl http://192.168.0.10:3001/new/initramfs.img -o /var/reinstall/initrd.img
sudo curl http://192.168.0.10:3001/new/rootfs.img | sudo dd oflag=append conv=notrunc of=/var/reinstall/initrd.img
sudo kexec -l /var/reinstall/kernel --initrd="/var/reinstall/initrd.img" --append="coreos.inst.install_dev=/dev/mmcblk0 console=ttyS0 coreos.inst.ignition_url=http://192.168.0.10:3001/config.ign coreos.inst.insecure ip=192.168.0.150:::24:BBHUB553LKW3:enp1s0:none"
sudo systemctl kexec
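For reference, here is the same sequence as a rough script sketch. The function names (build_kargs, fetch, stage_reinstall) are made up for illustration and this is not what we actually run. It differs from the commands above in two ways: curl is called with --fail so an HTTP error stops the sequence instead of being saved as the artifact, and the rootfs is downloaded to its own file before being appended, so a failed download cannot silently end up inside the initrd.

```shell
# Sketch only: all function names here are hypothetical.

# Assemble the kernel arguments passed to kexec via --append.
# Args: install device, Ignition URL, dracut-style ip= value.
build_kargs() {
    local install_dev="$1" ignition_url="$2" ip_karg="$3"
    printf '%s' "coreos.inst.install_dev=${install_dev} console=ttyS0 coreos.inst.ignition_url=${ignition_url} coreos.inst.insecure ip=${ip_karg}"
}

# Download one artifact. --fail makes curl exit non-zero on an HTTP error
# instead of writing the server's error page to the destination file.
fetch() {
    local url="$1" dest="$2"
    sudo curl --fail --silent --show-error "$url" -o "$dest"
}

# Fetch kernel, initramfs and rootfs, append the rootfs to the initrd,
# then load the new kernel. The && chain stops at the first failing step.
stage_reinstall() {
    local base="$1" dir="$2" kargs="$3"
    fetch "${base}/kernel" "${dir}/kernel" &&
    fetch "${base}/initramfs.img" "${dir}/initrd.img" &&
    fetch "${base}/rootfs.img" "${dir}/rootfs.img" &&
    sudo dd if="${dir}/rootfs.img" of="${dir}/initrd.img" oflag=append conv=notrunc &&
    sudo kexec -l "${dir}/kernel" --initrd="${dir}/initrd.img" --append="${kargs}"
}

# Usage, with the values from the commands above:
#   kargs="$(build_kargs /dev/mmcblk0 http://192.168.0.10:3001/config.ign \
#       192.168.0.150:::24:BBHUB553LKW3:enp1s0:none)"
#   stage_reinstall http://192.168.0.10:3001/new /var/reinstall "$kargs" \
#       && sudo systemctl kexec
```

Nothing runs when the block is sourced; the commented usage lines show how the pieces would be called.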
Some links where basically the same is done:
Reinstall POC, entrypoint
Relevant PR
Manual equivalent
Our image
We build our image (the “initramfs”, “rootfs” and “kernel” files you see above) with CoreOS Assembler. We use pretty much the same base as official CoreOS, but have added some extra packages and an “overlay.d” layer.
What is failing
Everything worked flawlessly until I recently added some updates to our overlay.d layer and we tried rebuilding the image. At that point the build failed, reporting that some packages could not be found; I assumed some package repo no longer exists. Some googling led us to this. This led me to “clean things up” and remove the “lock” files (e.g. src/config/manifest-lock.x86_64.json), the cache folder and overrides. This restored the ability to build an image.
It seems however that this new image is incompatible with the “old” one, old meaning the image built before the “cleanup”.
Previously we could rebuild images and reinstall them just fine. In the HTTP server hosting all of these files we could see all requests passing by, including the retrieval of the Ignition file, config.ign. This last request we no longer see. This leads me to believe that this new image, or some part of it, is incompatible with the old one, causing the “initramfs initialization process” to fail to make that request (am I correct that the initramfs is indeed the “part” making this request?). The question is why…
What could help me forward
- First and foremost, the only things I have to debug this with are the request logging in the HTTP server and the timing of a screen getting power/backlight. I’d very much like some pointers as to which component could be at fault here and how to even start properly debugging these, possibly early, stages.
- Some clarification on what stages are being executed in my workflow. Where does the ignition request come from? Has the machine rebooted fully into the new kernel at this time? Etc…
- Some clarification on what these lock files are, along with the cache and overrides directories. What have I actually done when “cleaning up”? And how could this “Package not found” error appear suddenly; is a package repo missing? Are there alternative ways of fixing this besides “cleaning up”?
- Any and all thoughts on what could be causing this “incompatibility”.
Your help is greatly appreciated!