Draft for Quick Re-provisioning of Existing CoreOS Node

Greetings folks!

On my bare-metal servers, which have no PXE environment, I wasn’t satisfied with the cumbersome re-provisioning via USB boot media. It also takes quite some time to copy the entire OS to the system’s drive, even though merely re-creating /etc and /var could be quite efficient with an appropriate persistence strategy.

Inspired by the factory reset discussion on GitHub, I have therefore attempted to build a quick re-provisioning mechanism for already installed nodes. I have only tested it with simple configs so far, but I would like to get some feedback on my implementation and on the limitations of such an approach.

At its heart is a service ordered after ostree-remount.service and before local-fs.target. It runs only if a /var/REPROVISION.ign file exists, and it reboots the system if it succeeds. Before executing the main script, it renames the Ignition file to /var/REPROVISION-FAILED.ign to prevent boot loops.

/etc/systemd/system/quick-reprovision.service

[Unit]
Description=Quick Reprovision
DefaultDependencies=no
Requires=quick-reprovision-cleanup.service
After=ostree-remount.service
Before=local-fs.target
ConditionPathExists=/sysroot/ostree/deploy/fedora-coreos/var/REPROVISION.ign
SuccessAction=reboot

[Service]
Type=oneshot
ExecStartPre=/usr/bin/mv /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION.ign /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION-FAILED.ign
ExecStart=/usr/bin/bash /usr/local/lib/quick-reprovision.sh
TimeoutStartSec=0

[Install]
WantedBy=local-fs.target
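
The rename-before-run guard can be tried outside of systemd. The following is only a sketch under assumptions: run_once and the scratch directory are made up for illustration, and the command passed to it stands in for the main script.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical stand-in for the unit: ConditionPathExists= plus ExecStartPre=mv.
# Usage: run_once STATE-DIR COMMAND [ARGS...]
run_once() {
  local dir="$1"; shift
  # condition not met: the unit would be skipped
  [ -e "$dir/REPROVISION.ign" ] || { echo 'skipped'; return 0; }
  # disarm the trigger before doing any real work
  mv "$dir/REPROVISION.ign" "$dir/REPROVISION-FAILED.ign"
  if "$@"; then echo 'succeeded'; else echo 'failed'; fi
}

scratch="$(mktemp -d)"      # stands in for the real /var
echo '{}' > "$scratch/REPROVISION.ign"
run_once "$scratch" false   # first boot: the payload fails
run_once "$scratch" false   # next boot: the trigger is gone, nothing runs
rm -rf "$scratch"
```

Because the trigger is renamed before the payload starts, a failing payload leaves only REPROVISION-FAILED.ign behind and the next boot proceeds normally.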

The service’s main script extracts the relevant boot options from the current deployment and then creates a new OSTree deployment from the same commit with a fresh /etc. It then installs the config as /boot/ignition/config.ign and creates /boot/ignition.firstboot. Since some files in /var will already be open at this point, everything in /var is moved to a “DELETE” sub-directory for now.

/usr/local/lib/quick-reprovision.sh

set -euo pipefail
export PATH="/usr/bin"

# get current deployment info
ostree_status="$(ostree admin status -J)"
current_check="$(jq -r '.deployments[] | select(.booted) | .checksum' <<< "$ostree_status")"
current_serial="$(jq -r '.deployments[] | select(.booted) | .serial' <<< "$ostree_status")"

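# find the boot entry whose ostree= link resolves to the booted deployment
current_boot=""
for entry in /boot/loader/entries/*.conf; do
  ostree="$(sed -En 's/^options.* ostree=([^ ]+).*$/\1/p' "$entry")"
  depl="$(basename "$(readlink "$ostree")")"
  [[ "$depl" != "${current_check}.${current_serial}" ]] || { current_boot="$entry"; break; }
done
[[ -n "$current_boot" ]] || { echo >&2 '[error] booted deployment has no boot entry'; exit 1; }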

# extract options to take over
new_options="$(sed -En 's/^options ?(.*) ostree=[^ ]+ ?(.*) root=.*$/\1 \2/p' "$current_boot")"

# create new deployment
args=()
for opt in $new_options; do args+=("--karg=$opt"); done
ostree admin deploy --retain --no-merge "${args[@]}" "$current_check"

# configure ignition execution
mount -o remount,rw /boot
install -m 0600 -D /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION-FAILED.ign /boot/ignition/config.ign
touch /boot/ignition.firstboot
mount -o remount,ro /boot

# clear state
mkdir -p /sysroot/ostree/deploy/fedora-coreos/var/DELETE
find /sysroot/ostree/deploy/fedora-coreos/var -mindepth 1 -maxdepth 1 \( '!' -name DELETE \) \
  -execdir mv -t /sysroot/ostree/deploy/fedora-coreos/var/DELETE/ -- '{}' + || true
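
To see what the option-extraction expression keeps, it can be run against a sample boot entry. This is a sketch under assumptions: carried_kargs is a hypothetical helper, and the entry below is made up rather than copied from a real system.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: print the kernel arguments that would be carried over,
# one per line, for the boot entry file given as the first argument.
carried_kargs() {
  local entry="$1" opts opt
  opts="$(sed -En 's/^options ?(.*) ostree=[^ ]+ ?(.*) root=.*$/\1 \2/p' "$entry")"
  # unquoted on purpose: split on whitespace like the main script does
  for opt in $opts; do printf '%s\n' "$opt"; done
}

# Made-up sample entry for illustration only.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
title Example entry
options ignition.platform.id=metal console=tty0 ostree=/ostree/boot.1/fedora-coreos/0123abcd/0 root=UUID=1111 rw rootflags=prjquota
EOF

carried_kargs "$sample"
rm -f "$sample"
```

The expression drops the ostree= argument and everything from root= onward, so only the arguments before root= are passed on as --karg options. An entry without a root= argument would not match at all and would yield no options.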

A clean-up service will remove this /var/DELETE directory during the next boot before /var is mounted to its final location.

/etc/systemd/system/quick-reprovision-cleanup.service

[Unit]
Description=Quick Reprovision Clean-up
DefaultDependencies=no
After=sysroot-ostree-deploy-fedora\x2dcoreos-var.mount
Before=var.mount
ConditionPathExists=/sysroot/ostree/deploy/fedora-coreos/var/DELETE

[Service]
Type=oneshot
ExecStart=/usr/bin/rm -rf /sysroot/ostree/deploy/fedora-coreos/var/DELETE
TimeoutStartSec=0
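
The stage-now, delete-later pattern can be tried on a scratch directory. This is a sketch under assumptions: stage_for_deletion is a hypothetical helper, the scratch directory stands in for the real /var, and GNU find and mv are assumed.

```shell
#!/usr/bin/env bash
set -eu
# find refuses -execdir if PATH contains relative entries, so pin it
export PATH=/usr/bin:/bin

# Hypothetical helper: move every top-level entry of DIR into DIR/DELETE.
stage_for_deletion() {
  local dir="$1"
  mkdir -p "$dir/DELETE"
  find "$dir" -mindepth 1 -maxdepth 1 '!' -name DELETE \
    -execdir mv -t "$dir/DELETE/" -- '{}' + || true
}

scratch="$(mktemp -d)"          # stands in for the real /var
mkdir -p "$scratch/lib/demo" "$scratch/log"
touch "$scratch/lib/demo/state" "$scratch/.hidden"

stage_for_deletion "$scratch"   # renames within one filesystem
ls -A "$scratch"                # only DELETE is left at the top level
rm -rf "$scratch/DELETE"        # the part deferred to the clean-up unit
rm -rf "$scratch"
```

Moving entries within the same filesystem only renames them, so files that are still open keep working while the actual removal is left for the next boot.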

A convenience script checks that a new config is a valid JSON object and places it at /var/REPROVISION.ign.

/usr/local/sbin/quick-reprovision

#!/usr/bin/bash
set -euo pipefail

# parse args until positional
args="$(getopt -l now,help -o nh -- "$@")"
eval set -- "$args"

now=false
while [[ "$1" != '--' ]]; do
  case "$1" in
    -n | --now  ) now=true;;
    -h | --help ) echo 'usage: quick-reprovision [-n|--now] [-h|--help] [IGNITION-FILE]'; exit;;
  esac; shift
done; shift

# only continue if root
(( EUID == 0 )) || { echo >&2 '[error] root privs required'; exit 1; }

# handle positional args
(( $# < 2 )) || { echo >&2 '[error] excess positional params'; exit 1; }
path="${1:-"-"}"
[[ "$path" != '-' ]] || path="/dev/stdin"

# read config and make sure it is valid JSON object
config="$(< "$path")"
jq -er 'type == "object"' <<< "$config" >/dev/null 2>&1 || { echo >&2 '[error] invalid JSON'; exit 1; }

# write configuration
echo "$config" > /var/REPROVISION.ign

# reboot if requested
! $now || systemctl reboot
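
One possible refinement, offered as a sketch rather than as part of the script above: since the mere existence of /var/REPROVISION.ign arms the service, the config could be written under a temporary name and renamed into place, so that a partially written file never carries the trigger name. write_atomically is a hypothetical helper and the JSON content is a placeholder.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: write CONTENT to DEST via a temporary file in the
# same directory, then rename it into place.
write_atomically() {
  local dest="$1" content="$2" tmp
  tmp="$(mktemp "${dest}.XXXXXX")"   # created in the same directory as DEST
  printf '%s\n' "$content" > "$tmp"
  mv -f -- "$tmp" "$dest"            # rename within one filesystem
}

scratch="$(mktemp -d)"   # stands in for /var
write_atomically "$scratch/REPROVISION.ign" '{"ignition": {"version": "3.4.0"}}'
cat "$scratch/REPROVISION.ign"
rm -rf "$scratch"
```

Note that mktemp creates the file readable by its owner only, which differs from the plain redirection in the script above.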

I can now remotely apply a new Ignition file by running the following:

ssh core@coreos-machine 'sudo quick-reprovision --now' < config.ign

If the new config messes things up, I can boot into the old deployment to get back my old /etc. My /var will already be purged though.

Once it’s honed, I’m planning to publish it as a pyromaniac library that can easily be imported into one’s configs.

What are your thoughts?