# titan-iac

Flux-managed Kubernetes cluster config for bstein.dev.

Canonical repo URL:
- `ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`

## Why `ananke`

`Ananke` is inevitability and constraint. That is exactly what this tooling is for:
- power events happen
- recovery windows are finite
- bootstrap has to be deterministic

The point is not clever automation. The point is boring, repeatable recovery.

## Power Domains

Two UPS domains matter during shutdown/startup drills:
- `Statera`: `titan-23`, `titan-24`, `titan-jh`
- `Pyrphoros`: all other nodes

Default UPS checks in Ananke read from `Pyrphoros` (`pyrphoros@localhost`) unless overridden.

## Breakglass

If primary operator access is lost, breakglass is on the remote Magic Mirror.

## Ananke Commands

Ananke is the recovery orchestrator. Flux desired-state source remains `titan-iac.git`.

Use `titan-db` as the canonical control host. `tethys` (`titan-24`) is the backup operator host.

From `titan-db`:

```bash
~/ananke-cluster-power status
~/ananke-cluster-power prepare --execute
~/ananke-cluster-power shutdown --execute --require-ups-battery
~/ananke-cluster-power startup --execute --force-flux-branch main --require-ups-battery
```

From `tethys` / `titan-24` (delegating to `titan-db`):

```bash
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db status
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db prepare --execute
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db shutdown --execute --require-ups-battery
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db startup --execute --force-flux-branch main --require-ups-battery
```

## Shutdown Modes

`cluster_power_recovery.sh` supports two shutdown behaviors:
- `--shutdown-mode host-poweroff` (default): graceful cluster shutdown plus scheduled host poweroff.
- `--shutdown-mode cluster-only`: graceful cluster shutdown without host poweroff (stops `k3s` / `k3s-agent` only).

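To make the two shutdown modes and their default concrete, here is a minimal sketch of how a wrapper might validate the value before passing it along. The function name `resolve_shutdown_mode` is hypothetical and is not taken from `cluster_power_recovery.sh`; only the two mode names and the `host-poweroff` default come from this README.

```bash
# Hypothetical helper, not part of cluster_power_recovery.sh.
# Prints the effective shutdown mode; an empty or missing argument
# falls back to the documented default (host-poweroff).
resolve_shutdown_mode() {
  case "${1:-host-poweroff}" in
    host-poweroff|cluster-only)
      printf '%s\n' "${1:-host-poweroff}"
      ;;
    *)
      # Anything else is rejected rather than silently defaulted.
      printf 'unknown shutdown mode: %s\n' "$1" >&2
      return 1
      ;;
  esac
}
```

Rejecting unknown values instead of defaulting keeps a typo from turning an intended `cluster-only` drill into a host poweroff.
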
## Startup Completion Rules

Ananke startup is not “done” just because Flux says green once. Startup now completes only after:
- Flux source drift checks pass (expected URL and branch)
- all non-optional Flux kustomizations report `Ready=True`
- external service checklist passes (default includes Gitea, Grafana, Harbor)
- generated ingress reachability checks pass (default accepted statuses: `200,301,302,307,308,401,403,404`)
- a stability soak window passes with no `CrashLoopBackOff` / image-pull failures and checklist still healthy

If you intentionally need to correct Flux source during recovery, use:
- `--force-flux-url ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
- `--force-flux-branch main`

`--force-flux-url` is breakglass-only and requires `--allow-flux-source-mutation`.

The defaults live in:
- `scripts/bootstrap/recovery-config.env`

Detailed runbook:
- `knowledge/runbooks/cluster-power-recovery.md`
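
The ingress reachability rule under Startup Completion Rules can be illustrated with a small sketch. It only shows how a status code might be matched against the default accepted list; `ACCEPTED_STATUSES` and `status_accepted` are hypothetical names, not identifiers from Ananke, and the real check may work differently. Presumably codes such as `401` or `404` are accepted because any of them shows the ingress answered at all.

```bash
# Default accepted statuses, copied from the list in this README.
ACCEPTED_STATUSES="200,301,302,307,308,401,403,404"

# Hypothetical helper: succeeds if the status code ($1) appears in the
# comma-separated list ($2, defaulting to ACCEPTED_STATUSES).
status_accepted() {
  # Wrapping both sides in commas forces a whole-token match,
  # so "20" does not match "200".
  case ",${2:-$ACCEPTED_STATUSES}," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example probe (the hostname is a placeholder, not a real ingress):
# code="$(curl -sk -o /dev/null -w '%{http_code}' https://example.invalid/)"
# status_accepted "$code" && echo "reachable"
```

A connection failure makes `curl` report `000`, which is not in the list, so an unreachable ingress fails the check under this sketch.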