# ananke `ananke` is the host-side power + bootstrap orchestrator for Titan. It runs outside Kubernetes (systemd on host), so it can: - shut the cluster down gracefully before battery/runtime redlines - bring the cluster back after power returns - recover common Flux/Kustomize startup deadlocks - validate service health from the outside before declaring startup done ## Why `ananke` I wanted a name that fits Titan/mythology, but also describes what this service actually does. In Greek myth, **Ananke** is inevitability/necessity. That matches this tool: when power events happen, graceful sequencing is not optional. UPS names in this cluster are also part of the story: - `Statera`: powers `titan-23`, `titan-24`, `titan-jh` - `Pyrphoros`: powers all other nodes ## Breakglass reminder Vault unseal breakglass is wired for remote retrieval (magic mirror host). If local key retrieval fails, Ananke can use the configured breakglass command. ## What “startup complete” means now Ananke does **not** stop at “Flux says Ready”. Startup only completes when all configured gates pass: - node inventory preflight passes (host mapping + ssh user + port sanity) - node SSH auth gate passes (real command execution, not just TCP) - Flux source drift guard passes (`expected_flux_source_url` + branch expectation) - Flux kustomizations are healthy - controller convergence is healthy (deployments/statefulsets/daemonsets) - ingress checklist passes (all discovered ingress hosts reachable with accepted status) - external service checklist passes (Gitea, Grafana, Keycloak OIDC, Harbor registry auth challenge, Longhorn auth redirect) - stability soak window passes (no regressions, no CrashLoop/ImagePull failures) During startup, Ananke also auto-heals known failure patterns (stuck controller pods, immutable Flux Jobs, critical workloads scaled to zero) and writes a report: - `/var/lib/ananke/last-startup-report.json` If any gate fails, startup is blocked with a concrete reason. ## Command quick sheet From `titan-db` (coordinator): ```bash sudo /usr/local/bin/ananke status --config /etc/ananke/ananke.yaml sudo /usr/local/bin/ananke startup --config /etc/ananke/ananke.yaml --execute --force-flux-branch main sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason graceful-maintenance --mode cluster-only sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason emergency-power --mode poweroff --skip-drain --skip-etcd-snapshot ``` From `titan-24` (`tethys` peer): ```bash sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason graceful-maintenance --mode cluster-only ``` Systemd: ```bash sudo systemctl status ananke.service sudo systemctl start ananke-bootstrap.service sudo systemctl start ananke-update.service ``` ## Shutdown modes (explicit) `ananke shutdown` now supports explicit mode selection: - default behavior is `cluster-only` (host poweroff is not performed) - `--mode config`: use config default (`shutdown.poweroff_enabled`) - `--mode cluster-only`: stop cluster services only (no host poweroff) - `--mode poweroff`: include host poweroff path (explicit only) This removes ambiguity during drills. ## Config file Primary path: - `/etc/ananke/ananke.yaml` Core settings to keep accurate: - `expected_flux_branch` - `expected_flux_source_url` - `startup.require_node_ssh_auth` - `startup.require_ingress_checklist` - `startup.service_checklist` - `startup.service_checklist_stability_seconds` - `startup.ignore_unavailable_nodes` (for planned temporary node outages) - `coordination.role`, `coordination.peer_hosts` ## Install / update ```bash sudo ./scripts/install.sh ``` Installer behavior: - builds and installs `/usr/local/bin/ananke` - installs `ananke*.service` units - migrates and enforces current `ananke` config/state paths ## Notes - Apply changes through Git/Flux manifests; avoid manual in-cluster edits for durable changes. - For controlled shutdown/startup drills, treat any manual intervention as a bug and fold the logic back into Ananke.