Metis
Metis produces fully configured recovery SD cards for any node in the lab (RPi 4/5 workers, control plane Pis, amd64 nodes like tethys, titan-db, titan-jh, future titan-20/21, and non-cluster hosts). Goal: 1 command + insert SD → node rejoins with identical identity, network, k3s role/labels/taints, and pre-baked log/GC drop-ins.
Objectives
- Cross-platform (Linux + Windows) CLI/GUI with dead-simple UX.
- Pull class-specific golden images from Harbor (or other artifact store), inject per-node config, and write/verify SD cards.
- Minimal image set via node classes; inject per-node deltas at burn time.
- Idempotent bootstraps: hostname/IP, k3s server/agent setup, labels/taints, journald/log GC drop-ins, Longhorn mount validation, SSH keys/users.
- Works offline once artifacts are cached; verifies hashes/signatures before writing.
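The "node classes plus per-node deltas" objective can be pictured as a simple merge. The following is a minimal sketch, not the actual Metis inventory schema: the type names, field names, label values, and the example IP (taken from the documentation range 192.0.2.0/24) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeClass holds settings shared by every node that boots the same golden image.
// Hypothetical shape; the real inventory format may differ.
type NodeClass struct {
	Image  string
	Labels map[string]string
	Taints []string
}

// Node holds the per-node delta applied at burn time (hypothetical shape).
type Node struct {
	Hostname string
	IP       string
	Class    string
	Labels   map[string]string
	Taints   []string
}

// Resolved is the merged view an injector would write to the card.
type Resolved struct {
	Hostname string
	IP       string
	Image    string
	Labels   map[string]string
	Taints   []string
}

// Resolve merges a node over its class: class values act as defaults,
// node labels override class labels, and taints are concatenated.
func Resolve(classes map[string]NodeClass, n Node) (Resolved, error) {
	c, ok := classes[n.Class]
	if !ok {
		return Resolved{}, fmt.Errorf("node %q: unknown class %q", n.Hostname, n.Class)
	}
	labels := make(map[string]string, len(c.Labels)+len(n.Labels))
	for k, v := range c.Labels {
		labels[k] = v
	}
	for k, v := range n.Labels {
		labels[k] = v
	}
	taints := append(append([]string{}, c.Taints...), n.Taints...)
	return Resolved{Hostname: n.Hostname, IP: n.IP, Image: c.Image, Labels: labels, Taints: taints}, nil
}

func main() {
	classes := map[string]NodeClass{
		"rpi4-armbian-longhorn": {
			Image:  "rpi4-armbian-longhorn.img", // illustrative artifact name
			Labels: map[string]string{"arch": "arm64", "storage": "longhorn"},
		},
	}
	node := Node{
		Hostname: "titan-13",
		IP:       "192.0.2.13", // documentation-range address, not a real lab IP
		Class:    "rpi4-armbian-longhorn",
		Labels:   map[string]string{"storage": "longhorn-hdd"},
	}
	r, err := Resolve(classes, node)
	if err != nil {
		fmt.Println("resolve failed:", err)
		return
	}
	keys := make([]string, 0, len(r.Labels))
	for k := range r.Labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	fmt.Printf("%s (%s) boots %s\n", r.Hostname, r.IP, r.Image)
	for _, k := range keys {
		fmt.Printf("  label %s=%s\n", k, r.Labels[k])
	}
}
```

Keeping the class as the default layer means a new node usually needs only a hostname, an IP, and a class name, which is what keeps the golden image set small.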
Planned high-level workflow
- Select target node (from inventory) + target disk.
- Tool downloads/caches the right golden image for that node class.
- Injects per-node config (net, k3s tokens/roles/labels/taints, SSH keys, runtime drop-ins, Longhorn mount metadata) and writes SD.
- Verifies write; prints next-step: "insert and power on." No manual follow-up.
Early design notes
- Implemented in Go for easy static builds and a lightweight GUI (e.g., Fyne or Wails) plus CLI.
- Inventory-driven: node classes (rpi5-ubuntu, rpi4-armbian-longhorn, rpi4-armbian-std, control-plane, amd64-agents, external hosts).
- Extensible per-node hooks for special hardware (Longhorn HDD UUIDs on titan-13/15/17/19; future titan-20/21; oceanus/titan-23; tethys/titan-jh/titan-db).
- Secure defaults: verifies hashes of downloaded images, never prints secrets, and prepares k3s tokens/certs/keys via a sealed source.
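The hash check on downloaded images could look roughly like the sketch below, which streams an image through SHA-256 rather than loading it into memory. The function names `VerifySHA256` and `VerifyBytes` are hypothetical and are not taken from the Metis codebase; only standard-library calls are used.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"io"
)

// VerifySHA256 streams r through SHA-256 and compares the digest with wantHex.
// Streaming keeps memory use flat even for multi-gigabyte images.
func VerifySHA256(r io.Reader, wantHex string) error {
	want, err := hex.DecodeString(wantHex)
	if err != nil || len(want) != sha256.Size {
		return fmt.Errorf("expected digest must be %d hex characters", sha256.Size*2)
	}
	h := sha256.New()
	if _, err := io.Copy(h, r); err != nil {
		return fmt.Errorf("hashing image: %w", err)
	}
	got := h.Sum(nil)
	if subtle.ConstantTimeCompare(got, want) != 1 {
		return fmt.Errorf("sha256 mismatch: got %s", hex.EncodeToString(got))
	}
	return nil
}

// VerifyBytes is a convenience wrapper for in-memory data.
func VerifyBytes(data []byte, wantHex string) error {
	return VerifySHA256(bytes.NewReader(data), wantHex)
}

func main() {
	payload := []byte("stand-in for golden image bytes")
	sum := sha256.Sum256(payload)
	want := hex.EncodeToString(sum[:])

	if err := VerifyBytes(payload, want); err != nil {
		fmt.Println("unexpected failure:", err)
		return
	}
	fmt.Println("digest matches; safe to write")

	if err := VerifyBytes([]byte("tampered"), want); err != nil {
		fmt.Println("refusing to write:", err)
	}
}
```

In a real burn path the reader would be the cached image file opened with `os.Open`, and the write step would only start after this check returns nil.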
Repo layout (initial)
- cmd/ – CLI/GUI entrypoints
- pkg/ – shared lib (inventory, imaging, injectors, platform abstraction)
- docs/ – user/operator docs (this will stay light; working notes live in AGENTS.md untracked)
- AGENTS.md – local, untracked working notes (do not add to git)
Current modes
- metis plan --inventory inv.yaml --node titan-13 --device /dev/sdz --cache /tmp/metis-cache prints the burn plan (respects --boot/--root or METIS_* envs for injection steps).
- metis burn ... --yes downloads/verifies the golden image, writes it (dd for /dev/*, file copy otherwise), and injects node config when mounts are provided.
  - Pass --boot /mnt/boot --root /mnt/root (or set METIS_BOOT_PATH/METIS_ROOT_PATH) to drop hostname, k3s config, ssh keys, NoCloud user-data, and a debug etc/metis/node.json into the mounted card. If unset, injection is skipped (write-only).
  - --auto-mount attempts to mount /dev/* partitions (or loop images) automatically for injection on Linux (requires privileges).
- metis image --inventory inv.yaml --node titan-13 --output artifacts/titan-13.img produces a fully injected raw image artifact without writing to removable media.
- metis serve runs the operator-facing Metis service:
  - web UI for build/flash workflows
  - Prometheus metrics on /metrics
  - internal sentinel snapshot + watch endpoints
- Container images are split for gentler cluster operation:
  - metis carries the flash/build toolchain and is intended to run on titan-22
  - metis-sentinel stays slim for the DaemonSet that samples node facts
- Class overlays: define boot_overlay/root_overlay on a class to merge static files into boot/root at burn time (e.g., cloud-init/netplan drop-ins, GPU driver configs). Per-node config still injects hostname/IP/k3s/SSH/Longhorn.
- Linux loop-mount helper (losetup/mount) exists for automation; wiring into CLI burn is next. Windows writer/GUI stub forthcoming.
- Vault: Metis can read per-node secrets from secret/data/nodes/<hostname> using VAULT_ADDR plus either VAULT_TOKEN or AppRole (VAULT_ROLE_ID/VAULT_SECRET_ID). Expected fields: ssh_password, k3s_token, cloud_init, extra map.
- Sentinel: metis-sentinel collects host facts and can either print them, write local history, or push them into the Metis service. The intended deployment shape is a DaemonSet on cluster nodes plus an Ariadne-triggered Metis watch that recomputes recommended class targets and drift history.
- Facts aggregation: metis facts --inventory inv.yaml --snapshots ./snapshots reads sentinel snapshot JSON files and prints per-class drift summary (kernels, containerd, k3s, package samples). Use exported ConfigMaps or METIS_SENTINEL_OUT history as input.
- metis config --inventory inv.yaml --node titan-13 prints the merged node config (hostname/IP/k3s labels/taints/Longhorn UUIDs).
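The rule that burn injects only when boot and root mounts are known, and is otherwise write-only, can be sketched as below. This is an illustration rather than the actual CLI code: the type and function names are hypothetical, and two behaviours are assumptions not stated above, namely that an explicit flag takes precedence over the environment variable and that both paths must be set for injection to run.

```go
package main

import (
	"fmt"
	"os"
)

// InjectionTargets holds the mounted boot and root paths used for config injection.
type InjectionTargets struct {
	Boot string
	Root string
}

// Enabled reports whether both mounts are known. When it returns false the
// burn is treated as write-only (assumption: both paths are required).
func (t InjectionTargets) Enabled() bool {
	return t.Boot != "" && t.Root != ""
}

// ResolveTargets uses the flag values when given and otherwise falls back to
// METIS_BOOT_PATH / METIS_ROOT_PATH. getenv is passed in so the lookup can be
// exercised without touching the process environment.
func ResolveTargets(bootFlag, rootFlag string, getenv func(string) string) InjectionTargets {
	t := InjectionTargets{Boot: bootFlag, Root: rootFlag}
	if t.Boot == "" {
		t.Boot = getenv("METIS_BOOT_PATH")
	}
	if t.Root == "" {
		t.Root = getenv("METIS_ROOT_PATH")
	}
	return t
}

func main() {
	// Simulates `--root /mnt/root` with no --boot flag.
	targets := ResolveTargets("", "/mnt/root", os.Getenv)
	if !targets.Enabled() {
		fmt.Println("injection skipped (write-only)")
		return
	}
	fmt.Printf("injecting into boot=%s root=%s\n", targets.Boot, targets.Root)
}
```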
Service direction
- Deployed UI protected by Atlas SSO headers (admin/maintainer)
- Default flash host support for titan-22
- Recent build / flash / sentinel change history
- Ariadne-driven sentinel watch cadence
- Prometheus/Grafana visibility for Metis runs and tests
  - CI test metrics share the ariadne_ci_* series and are distinguished by repo="metis"
Current deployment note: the service can fetch and verify the rpi4 base image from an official URL via METIS_IMAGE_RPI4_ARMBIAN_LONGHORN and METIS_IMAGE_RPI4_ARMBIAN_LONGHORN_SHA256, then cache it locally on the flash host. A mirrored Harbor-backed base image is still preferable long term, but it is no longer a prerequisite for Texas-side builds.
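As an illustration of the environment-driven image source described above, the sketch below reads the URL and digest variables as a pair and rejects obviously unusable values before any download starts. The `ImageSource` type and `ImageSourceFromEnv` function are hypothetical names, and requiring an https URL is an assumption of this sketch rather than documented Metis behaviour.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/url"
	"os"
)

// ImageSource pairs a download URL with the SHA-256 digest it must match.
type ImageSource struct {
	URL    string
	SHA256 string
}

// ImageSourceFromEnv reads <prefix> and <prefix>_SHA256 through getenv and
// validates both values. It does not download anything.
func ImageSourceFromEnv(prefix string, getenv func(string) string) (ImageSource, error) {
	src := ImageSource{URL: getenv(prefix), SHA256: getenv(prefix + "_SHA256")}
	if src.URL == "" {
		return src, fmt.Errorf("%s is not set", prefix)
	}
	u, err := url.Parse(src.URL)
	if err != nil || u.Scheme != "https" || u.Host == "" {
		// Assumption: only https sources are accepted.
		return src, fmt.Errorf("%s must be an https URL", prefix)
	}
	raw, err := hex.DecodeString(src.SHA256)
	if err != nil || len(raw) != 32 {
		return src, fmt.Errorf("%s_SHA256 must be 64 hex characters", prefix)
	}
	return src, nil
}

func main() {
	src, err := ImageSourceFromEnv("METIS_IMAGE_RPI4_ARMBIAN_LONGHORN", os.Getenv)
	if err != nil {
		fmt.Println("image source not usable:", err)
		return
	}
	fmt.Println("would fetch", src.URL, "and verify against", src.SHA256)
}
```

Validating the pair up front means a missing or malformed digest fails fast on the flash host instead of after a long download.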
Next steps: publish the service images, add the SCM remote/repo for Metis, and broaden inventory coverage beyond the current Titan recovery classes.