Draft for Quick Re-provisioning of Existing CoreOS Node

Greetings folks!

On my bare-metal servers, which have no PXE environment, I wasn’t satisfied with the cumbersome re-provisioning via USB boot media. It also takes quite some time to copy the entire OS to the system’s drive, even though re-creating just /etc and /var could be quite efficient with an appropriate persistence strategy.

I have therefore – inspired by the factory reset discussion on GitHub – attempted to build a quick re-provisioning mechanism for already installed nodes. I have only tested it with simple configs but would like to get some feedback about my implementation and the limitations of such an approach.

At the heart is a service running between ostree-remount and local-fs. It runs if a /var/REPROVISION.ign file exists and reboots the system if it succeeds. Before executing the main script it moves the ignition file somewhere else to prevent boot-loops.

/etc/systemd/system/quick-reprovision.service

[Unit]
Description=Quick Reprovision
DefaultDependencies=no
Requires=quick-reprovision-cleanup.service
After=ostree-remount.service
Before=local-fs.target
ConditionPathExists=/sysroot/ostree/deploy/fedora-coreos/var/REPROVISION.ign
SuccessAction=reboot

[Service]
Type=oneshot
ExecStartPre=/usr/bin/mv /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION.ign /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION-FAILED.ign
ExecStart=/usr/bin/bash /usr/local/lib/quick-reprovision.sh
TimeoutStartSec=0

[Install]
WantedBy=local-fs.target
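
To illustrate why the trigger file is moved away before the main script runs, here is a minimal sketch of the same sequence in plain bash. The function name reprovision_once and the temporary directory are made up for the example; on the real system the condition check and the rename are done by systemd, not by a shell function.

```shell
#!/usr/bin/bash
# Hypothetical stand-in for ConditionPathExists= / ExecStartPre= / ExecStart=.
reprovision_once() {
  local trigger="$1"; shift
  local failed="${trigger%.ign}-FAILED.ign"
  [[ -e "$trigger" ]] || return 0      # like ConditionPathExists=: nothing to do
  mv "$trigger" "$failed" || return 1  # disarm the trigger before doing any work
  "$@"                                 # the actual provisioning command
}

# Throwaway directory standing in for /var.
dir="$(mktemp -d)"
echo '{}' > "$dir/REPROVISION.ign"

# 'false' simulates a provisioning script that fails.
reprovision_once "$dir/REPROVISION.ign" false || echo "first boot: provisioning failed"
reprovision_once "$dir/REPROVISION.ign" false && echo "second boot: nothing to do"
```

Because the file is renamed first, a failing script leaves REPROVISION-FAILED.ign behind and the next boot skips the service instead of retrying forever.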

The service’s main script extracts the relevant boot options from the current deployment and then creates a new OSTree deployment with a fresh /etc from the same ref. It then installs the config.ign and ignition.firstboot files in the /boot directory. Since some files in /var will already be open at this point, everything in /var is moved to a “DELETE” sub-directory for now.

/usr/local/lib/quick-reprovision.sh

set -euo pipefail
export PATH="/usr/bin"

# get current deployment info
ostree_status="$(ostree admin status -J)"
current_check="$(jq -r '.deployments[] | select(.booted) | .checksum' <<< "$ostree_status")"
current_serial="$(jq -r '.deployments[] | select(.booted) | .serial' <<< "$ostree_status")"

# find current boot entry
for current_boot in /boot/loader/entries/*.conf; do
  ostree="$(sed -En 's/^options.* ostree=([^ ]+).*$/\1/p' "$current_boot")"
  depl="$(basename "$(readlink "$ostree")")"
  [[ "$depl" != "${current_check}.${current_serial}" ]] || break
done

# extract options to take over
new_options="$(sed -En 's/^options ?(.*) ostree=[^ ]+ ?(.*) root=.*$/\1 \2/p' "$current_boot")"

# create new deployment
args=()
for opt in $new_options; do args+=("--karg=$opt"); done
ostree admin deploy --retain --no-merge "${args[@]}" "$current_check"

# configure ignition execution
mount -o remount,rw /boot
install -m 0600 -D /sysroot/ostree/deploy/fedora-coreos/var/REPROVISION-FAILED.ign /boot/ignition/config.ign
touch /boot/ignition.firstboot
mount -o remount,ro /boot

# clear state
mkdir -p /sysroot/ostree/deploy/fedora-coreos/var/DELETE
find /sysroot/ostree/deploy/fedora-coreos/var -mindepth 1 -maxdepth 1 \( '!' -name DELETE \) \
  -execdir mv -t /sysroot/ostree/deploy/fedora-coreos/var/DELETE/ -- '{}' + || true
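
To make the option extraction a bit more tangible, here is the same sed expression applied to a made-up "options" line. The entry below is only an assumption about what such a line can look like; real boot loader entries will differ.

```shell
#!/usr/bin/bash
# Hypothetical 'options' line from a boot loader entry (values are invented).
entry='options mitigations=auto,nosmt ignition.platform.id=metal ostree=/ostree/boot.1/fedora-coreos/abc123/0 root=UUID=1234 rw rootflags=prjquota boot=UUID=5678'

# Same expression as in the script: keep what surrounds ostree=,
# drop the ostree= argument itself and everything from root= onwards.
new_options="$(sed -En 's/^options ?(.*) ostree=[^ ]+ ?(.*) root=.*$/\1 \2/p' <<< "$entry")"

# Word splitting is intentional: one --karg per whitespace-separated option.
args=()
for opt in $new_options; do args+=("--karg=$opt"); done
printf '%s\n' "${args[@]}"
```

With this sample only the two options in front of ostree= are taken over; root=, rw, rootflags= and boot= are not passed to the new deployment.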

A clean-up service will remove this /var/DELETE directory during the next boot before /var is mounted to its final location.

/etc/systemd/system/quick-reprovision-cleanup.service

[Unit]
Description=Quick Reprovision Clean-up
DefaultDependencies=no
After=sysroot-ostree-deploy-fedora\x2dcoreos-var.mount
Before=var.mount
ConditionPathExists=/sysroot/ostree/deploy/fedora-coreos/var/DELETE

[Service]
Type=oneshot
ExecStart=/usr/bin/rm -rf /sysroot/ostree/deploy/fedora-coreos/var/DELETE
TimeoutStartSec=0
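
The two phases (park everything in DELETE before the reboot, remove DELETE on the next boot) can be tried out safely in a scratch directory. This is only a sketch: the directory contents are invented, and it assumes GNU find and GNU mv (for the -t option).

```shell
#!/usr/bin/bash
# Throwaway directory standing in for the real /var; the contents are made up.
var="$(mktemp -d)"
mkdir -p "$var/lib/app" "$var/log"
touch "$var/lib/app/state.db" "$var/log/messages" "$var/.hidden"

# Phase 1 (before the reboot): move every top-level entry into DELETE.
# find also picks up dot-files, which a plain shell glob would miss.
mkdir -p "$var/DELETE"
find "$var" -mindepth 1 -maxdepth 1 ! -name DELETE \
  -exec mv -t "$var/DELETE/" -- '{}' +
remaining="$(find "$var" -mindepth 1 -maxdepth 1 ! -name DELETE | wc -l)"

# Phase 2 (next boot): remove the parked data in one go.
rm -rf "$var/DELETE"
ls -A "$var"
```

Moving within the same file system is only a rename, so phase 1 is quick even for a large /var; the expensive deletion happens later.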

A convenience script validates a new config’s JSON format and places it at /var/REPROVISION.ign.

/usr/local/sbin/quick-reprovision

#!/usr/bin/bash
set -euo pipefail

# parse args until positional
args="$(getopt -l now,help -o nh -- "$@")"
eval set -- "$args"

now=false
while [[ "$1" != '--' ]]; do
  case "$1" in
    -n | --now  ) now=true;;
    -h | --help ) echo 'usage: quick-reprovision [-n|--now] [-h|--help] [IGNITION-FILE]'; exit;;
  esac; shift
done; shift

# only continue if root
(( EUID == 0 )) || { echo >&2 '[error] root privs required'; exit 1; }

# handle positional args
(( $# < 2 )) || { echo >&2 '[error] excess positional params'; exit 1; }
path="${1:-"-"}"
[[ "$path" != '-' ]] || path="/dev/stdin"

# read config and make sure it is valid JSON object
config="$(< "$path")"
jq -er 'type == "object"' <<< "$config" >/dev/null 2>&1 || { echo >&2 '[error] invalid JSON'; exit 1; }

# write configuration
echo "$config" > /var/REPROVISION.ign

# reboot if requested
! $now || systemctl reboot
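
For readers unfamiliar with the getopt idiom used above, here is a reduced sketch of just the argument handling, wrapped in a hypothetical parse_args function so it can be run without root and without touching /var. It assumes the util-linux version of getopt, which supports long options and reorders arguments.

```shell
#!/usr/bin/bash
# Sketch: normalise options with util-linux getopt, then split flags from positionals.
parse_args() {
  local normalised
  normalised="$(getopt -l now,help -o nh -- "$@")" || return 1
  eval set -- "$normalised"   # getopt output is shell-quoted, hence the eval
  now=false
  while [[ "$1" != '--' ]]; do
    case "$1" in
      -n | --now ) now=true;;
    esac
    shift
  done
  shift                       # drop the '--' separator
  path="${1:--}"              # default to '-' (stdin) when no file is given
}

# Options may come after the positional argument; getopt moves them to the front.
parse_args config.ign --now
echo "now=$now path=$path"
```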

I can now remotely apply a new Ignition file by running the following.

ssh core@coreos-machine 'sudo quick-reprovision --now' < config.ign

If the new config messes things up, I can boot into the old deployment to get back my old /etc. My /var will already be purged though.

Once it’s honed I’m planning to publish it as a pyromaniac library to be easily imported into one’s configs.

What are your thoughts?

The above approach relies on the initrd applying the ignition file and breaks pretty soon with custom storage layouts. Here is a new approach using systemd-nspawn. I have tested it (exclusively) with a RAID + LUKS setup.

The ignition fetch phases are executed inside a container while the system is running. During shutdown, when most services are already down, a new deployment is created and the mount and files stages are executed. If any errors occur, the changes are rolled back. The system then reboots directly into the new deployment.

All required files are stored in /etc, such that they can be rolled back by booting into a previous deployment.

/etc/local/sbin/quick-reprovision

#!/usr/bin/bash
set -euo pipefail

readonly RUN_DIR=/run/quick-reprovision

# parse args until positional
args="$(getopt -l now,help -o nh -- "$@")"
eval set -- "$args"

now=false
while [[ "$1" != '--' ]]; do
  case "$1" in
    -n | --now  ) now=true;;
    -h | --help ) echo 'usage: quick-reprovision [-n|--now] [-h|--help] [IGNITION-FILE]'; exit;;
  esac; shift
done; shift

# handle positional args
(( $# < 2 )) || { echo >&2 '[error] excess positional params'; exit 1; }
path="${1:-"-"}"
[[ "$path" != '-' ]] || path="/dev/stdin"

# only continue if root
(( EUID == 0 )) || { echo >&2 '[error] root privs required'; exit 1; }

# only continue if initrd root doesn't exist already
! [[ -d "$RUN_DIR" ]] || { echo >&2 '[error] reprovisioning already scheduled'; exit 1; }

# read new config and make sure it is valid JSON object
echo >&2 '[info] reading and validating JSON file'
config="$(< "$path")"
jq -er 'type == "object"' <<< "$config" >/dev/null 2>&1 || {
  echo >&2 '[error] invalid JSON'; exit 1
}

# extract initrd path from current kernel command line
boot_dir="$(sed -En 's|^(.* )?BOOT_IMAGE=[^/]*(/boot/[^ ]+)/.*$|\2|p' /proc/cmdline)"
initrd="$(find "$boot_dir" -type f -name 'initr*.img')"

# make container root directory and register removal on failure
install -dm 700 "$RUN_DIR"
success=false
cleanup() { $success || rm --one-file-system -rf "$RUN_DIR"; }
trap cleanup EXIT

# extract initrd to container root directory
echo >&2 '[info] preparing initrd container'
mkdir "$RUN_DIR/container"
/usr/lib/dracut/skipcpio "$initrd" | zstdcat | cpio --quiet -idmD "$RUN_DIR/container"

# create dev links
mkdir -p "$RUN_DIR/ignition/dev_aliases" && ln -sf /dev "$RUN_DIR/ignition/dev_aliases/dev"

# place new ignition file
install -Dm 600 /dev/stdin "$RUN_DIR/container/usr/lib/ignition/user.ign" <<< "$config"

# container wrapper
container() {
  systemd-nspawn --quiet --register=no --as-pid2 -D "$RUN_DIR/container" \
    --bind=/run/systemd --bind="$RUN_DIR/ignition:/run/ignition" -- "$@"
}

# run ignition fetch phases inside container
echo >&2 '[info] running ignition fetch phases'
pf="$(. /run/ignition.env && echo "$PLATFORM_ID")"
common=(--log-to-stdout --platform "$pf" --root /sysroot --config-cache /run/ignition/ignition.json)
container /usr/bin/ignition "${common[@]}" --stage fetch-offline |
  sed -En 's/CRITICAL\s+:\s+(.*)$/[error] \1/p' >&2
container /usr/bin/ignition "${common[@]}" --stage fetch |
  sed -En 's/CRITICAL\s+:\s+(.*)$/[error] \1/p' >&2

# add LUKS key files
if [[ -d /etc/luks ]]; then
  args=(); for f in /etc/luks/*; do args+=(--arg "$(basename "$f")" "$(base64 < "$f")"); done
  jq --compact-output <<< "$(< "$RUN_DIR/ignition/state")" > "$RUN_DIR/ignition/state" \
    "${args[@]}" 'setpath(["luksPersistKeyFiles"]; $ARGS.named | map_values("data:;base64,\(.)"))'
fi

# signal success to clean-up
echo >&2 '[info] successfully prepared ignition environment'
success=true

# reboot if requested
if $now; then
  echo >&2 '[info] rebooting to apply new config'
  systemctl reboot
else
  echo >&2 '[info] you need to reboot to apply the new config'
fi
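
As a small illustration of how the script locates the initrd, here is the BOOT_IMAGE expression applied to a sample kernel command line. The command line below is invented for the example; on a real node the input is /proc/cmdline and the values will differ.

```shell
#!/usr/bin/bash
# Hypothetical kernel command line (device, hash and kernel version are made up).
cmdline='BOOT_IMAGE=(hd0,gpt3)/boot/ostree/fedora-coreos-abc123/vmlinuz-6.8.5 mitigations=auto,nosmt ostree=/ostree/boot.1/fedora-coreos/abc123/0 rw'

# Same expression as in the script: strip the boot loader's device prefix
# and the kernel file name, keeping only the directory below /boot.
boot_dir="$(sed -En 's|^(.* )?BOOT_IMAGE=[^/]*(/boot/[^ ]+)/.*$|\2|p' <<< "$cmdline")"
echo "$boot_dir"
```

The script then searches that directory for a file matching initr*.img, so the expression only has to yield the directory, not the kernel path.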

quick-reprovision.service

[Unit]
Description=Quick Reprovision
DefaultDependencies=no
Conflicts=shutdown.target
After=boot.mount tmp.mount ostree-remount.service
Before=local-fs.target shutdown.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/usr/bin/sh -c 'rm -rf /sysroot/ostree/deploy/fedora-coreos/var/delete.????'
ExecStop=/usr/bin/bash /etc/local/libexec/quick-reprovision.sh
TimeoutStopSec=infinity

[Install]
WantedBy=basic.target

/etc/local/libexec/quick-reprovision.sh

set -euo pipefail

readonly RUN_DIR=/run/quick-reprovision
readonly DEPLOY=/sysroot/ostree/deploy/fedora-coreos
readonly VAR="$DEPLOY/var"
readonly LOG="$VAR/reprovision.log"

# only run if ignition environment prepared and local-fs not active
[[ -d "$RUN_DIR" ]] && ! systemctl -q is-active local-fs.target || exit 0

# write stdout and stderr to log file
exec >"$LOG" 2>&1

###########################
## Create New Deployment ##
###########################

# get current deployment checksum and serial
ostree_status="$(ostree admin status -J)"
current_check="$(jq -r '.deployments[] | select(.booted) | .checksum' <<< "$ostree_status")"
current_serial="$(jq -r '.deployments[] | select(.booted) | .serial' <<< "$ostree_status")"
current_deploy="$DEPLOY/deploy/${current_check}.${current_serial}"

# find current boot entry
for current_boot in /boot/loader/entries/*.conf; do
  ostree="$(sed -En 's/^options.* ostree=([^ ]+).*$/\1/p' "$current_boot")"
  depl="$(basename "$(readlink "$ostree")")"
  [[ "$depl" != "${current_check}.${current_serial}" ]] || break
done

# create new deployment
new_options="$(sed -En 's/^options ?(.*) ostree=[^ ]+ ?(.*)$/\1 \2/p' "$current_boot")"
args=(); for arg in $new_options; do [[ "$arg" == ostree=* ]] || args+=("--karg=$arg"); done
ostree admin deploy --retain --no-merge \
  --origin-file "${current_deploy}.origin" "${args[@]}" "$current_check"

# get new deployment info
ostree_status="$(ostree admin status -J)"
new_check="$(jq -r '.deployments[] | select(.pending) | .checksum' <<< "$ostree_status")"
new_serial="$(jq -r '.deployments[] | select(.pending) | .serial' <<< "$ostree_status")"
new_index="$(jq -r '.deployments[] | select(.pending) | .index' <<< "$ostree_status")"
new_deploy="$DEPLOY/deploy/${new_check}.${new_serial}"

# register undeployment in case of failure
success=false
cleanup_deployment() { $success || ostree admin undeploy "$new_index"; }
trap cleanup_deployment EXIT

#######################
## Execute Ignition  ##
#######################

# remount new deployment's /etc writable and register unmounting
mount -o bind,rw "$new_deploy/etc" "$new_deploy/etc"
cleanup_etc() { umount -l "$new_deploy/etc"; cleanup_deployment; }
trap cleanup_etc EXIT

# create new /var and register removal in case of failure
new_var="$(mktemp -dp "$VAR" reprovision.XXXX)" && chmod go+rx "$new_var"
cleanup_var() { rm --one-file-system -rf "$new_var"; cleanup_etc; }
trap cleanup_var EXIT

# assemble bind arguments for disks
mapfile -t disks < <(find /dev/disk -type l -exec realpath '{}' \; | sort -u)
disks+=(/dev/disk /dev/mapper)
[[ ! -d /dev/md ]] || disks+=(/dev/md)
disk_mnts=(); for disk in "${disks[@]}"; do disk_mnts+=(--bind="$disk"); done

# container wrapper
container() {
  systemd-nspawn --quiet --keep-unit --register=no --as-pid2 \
    --private-network -D "$RUN_DIR/container" \
    --bind=/run/systemd "${disk_mnts[@]}" --bind="$RUN_DIR/ignition:/run/ignition" \
    --bind="${new_deploy}:/sysroot" --bind="${new_var}:/sysroot/var" -- "$@"
}

# populate /var and run ignition
pf="$(. /run/ignition.env && echo "$PLATFORM_ID")"
common="--log-to-stdout --platform $pf --root /sysroot --config-cache /run/ignition/ignition.json"
container /usr/bin/sh -c \
  "ignition $common --stage mount && ignition-ostree-populate-var && ignition $common --stage files"

####################
## Swap Out Files ##
####################

# swap out files in /var
del="$(mktemp -dp "$VAR" delete.XXXX)"
find "$VAR" -mindepth 1 -maxdepth 1 \! \( -path "$del" -o -path "$new_var" -o -path "$LOG" \) \
  -exec mv -t "$del" -- "{}" +
find "$new_var" -mindepth 1 -maxdepth 1 -exec mv -t "$VAR" -- "{}" +

# signal success to clean-up
echo >&2 '[info] successfully applied new ignition file'
success=true
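
The rollback logic in this script relies on re-registering the EXIT trap after every step, with each new handler calling the previous one. Here is a stripped-down sketch of that pattern with echo statements in place of the real ostree, mount and rm calls. The function run_steps and its fail_at argument are made up for the demonstration.

```shell
#!/usr/bin/bash
# Sketch of the chained EXIT-trap rollback pattern (step names are placeholders).
run_steps() (
  # The function body is a subshell, so the EXIT trap fires when it ends.
  fail_at="$1"
  success=false

  echo "create deployment"
  undo_deployment() { $success || echo "undo deployment"; }
  trap undo_deployment EXIT
  [[ "$fail_at" != deployment ]] || exit 1

  echo "mount etc"
  undo_etc() { echo "unmount etc"; undo_deployment; }
  trap undo_etc EXIT
  [[ "$fail_at" != etc ]] || exit 1

  echo "create var"
  undo_var() { echo "remove var"; undo_etc; }
  trap undo_var EXIT
  [[ "$fail_at" != var ]] || exit 1

  success=true
)

# Simulate a failure right after the second step.
run_steps etc || true
```

Only the handlers registered so far run, in reverse order. As in the script, the unmount and the removal of the temporary directory happen on every exit, while the undeploy step is skipped once success is set.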

I’m still planning to release this as a pyromaniac config after successfully deploying it to my own production systems.