titan-iac
Flux-managed Kubernetes cluster config for bstein.dev.
Canonical repo URL:
ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
Why ananke
Ananke is inevitability and constraint. That is exactly what this tooling is for:
- power events happen
- recovery windows are finite
- bootstrap has to be deterministic
The point is not clever automation. The point is boring, repeatable recovery.
Power Domains
Two UPS domains matter during shutdown/startup drills:
- Statera: titan-23, titan-24, titan-jh
- Pyrphoros: all other nodes
Default UPS checks in Ananke read from Pyrphoros (pyrphoros@localhost) unless overridden.
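For a quick manual look at the default UPS, a sketch like the following may help. It assumes the UPS is exposed through Network UPS Tools (the `name@host` form suggests this, but that is an assumption, not something this README states). The helper name `ups_on_battery` is hypothetical and not part of Ananke.

```shell
# Hypothetical helper, not part of Ananke: decide from a NUT-style
# ups.status string whether the UPS is currently running on battery.
# NUT reports space-separated flags, e.g. "OL" (on line power),
# "OB" (on battery), "LB" (low battery).
ups_on_battery() {
  case " $1 " in
    *" OB "*) return 0 ;;  # the "on battery" flag is present
    *) return 1 ;;         # anything else counts as not on battery
  esac
}

# Usage (assumes the NUT client `upsc` is installed on the host):
#   status="$(upsc pyrphoros@localhost ups.status)"
#   if ups_on_battery "$status"; then echo "Pyrphoros is on battery"; fi
```

Padding the status with spaces before matching keeps the check from firing on a flag that merely contains the letters "OB".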
Breakglass
If primary operator access is lost, breakglass is on the remote Magic Mirror.
Ananke Commands
Ananke is the recovery orchestrator. Flux desired-state source remains titan-iac.git.
Use titan-db as the canonical control host. tethys (titan-24) is the backup operator host.
From titan-db:
~/ananke-cluster-power status
~/ananke-cluster-power prepare --execute
~/ananke-cluster-power shutdown --execute --require-ups-battery
~/ananke-cluster-power startup --execute --force-flux-branch main --require-ups-battery
From tethys / titan-24 (delegating to titan-db):
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db status
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db prepare --execute
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db shutdown --execute --require-ups-battery
~/ananke-tools/cluster_power_console.sh --delegate-host titan-db startup --execute --force-flux-branch main --require-ups-battery
Shutdown Modes
cluster_power_recovery.sh supports two shutdown behaviors:
- --shutdown-mode host-poweroff (default): graceful cluster shutdown plus scheduled host poweroff.
- --shutdown-mode cluster-only: graceful cluster shutdown without host poweroff (stops k3s / k3s-agent only).
Startup Completion Rules
Ananke startup is not “done” just because Flux says green once.
Startup now completes only after:
- Flux source drift checks pass (expected URL and branch)
- all non-optional Flux kustomizations report Ready=True
- external service checklist passes (default includes Gitea, Grafana, Harbor)
- generated ingress reachability checks pass (default accepted statuses: 200, 301, 302, 307, 308, 401, 403, 404)
- a stability soak window passes with no CrashLoopBackOff / image-pull failures and checklist still healthy
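The ingress reachability rule above can be sketched as a small status filter. This is an illustration only: the helper name `status_accepted` and the variable `ACCEPTED_STATUSES` are hypothetical, and the real check in Ananke may be implemented differently. One plausible reading of the default list is that 401, 403, and 404 still prove the ingress answered, which is all a reachability check needs.

```shell
# Default accepted statuses, copied from the list above.
ACCEPTED_STATUSES="200,301,302,307,308,401,403,404"

# Hypothetical helper: succeed if HTTP status $1 appears in the
# comma-separated list $2. Wrapping both sides in commas avoids
# partial matches (for example "20" against "200").
status_accepted() {
  case ",$2," in
    *",$1,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (HOST is a placeholder; curl prints 000 when the connection fails):
#   code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://HOST/")"
#   status_accepted "$code" "$ACCEPTED_STATUSES" || echo "unreachable: $code"
```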
If you intentionally need to correct Flux source during recovery, use:
- --force-flux-url ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
- --force-flux-branch main
--force-flux-url is breakglass-only and requires --allow-flux-source-mutation.
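Before forcing anything, it can help to see what the drift actually is. The sketch below compares an observed URL and branch against the expected values. The helper name `flux_source_drift` is hypothetical, and the expected branch `main` is inferred from the commands in this README rather than read from the recovery config.

```shell
# Expected Flux source. The URL is the canonical repo URL from this README;
# the branch is an assumption based on the --force-flux-branch examples.
EXPECTED_URL="ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git"
EXPECTED_BRANCH="main"

# Hypothetical helper: prints "ok" and succeeds when $1 (URL) and
# $2 (branch) match the expected source; otherwise prints the first
# mismatch it finds and fails.
flux_source_drift() {
  if [ "$1" != "$EXPECTED_URL" ]; then
    echo "url drift: $1 (expected $EXPECTED_URL)"
    return 1
  fi
  if [ "$2" != "$EXPECTED_BRANCH" ]; then
    echo "branch drift: $2 (expected $EXPECTED_BRANCH)"
    return 1
  fi
  echo "ok"
}

# Usage (assumes the GitRepository is named flux-system in the flux-system
# namespace, which is the Flux bootstrap default but not confirmed here):
#   url="$(kubectl -n flux-system get gitrepository flux-system -o jsonpath='{.spec.url}')"
#   branch="$(kubectl -n flux-system get gitrepository flux-system -o jsonpath='{.spec.ref.branch}')"
#   flux_source_drift "$url" "$branch"
```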
The defaults live in:
scripts/bootstrap/recovery-config.env
Detailed runbook:
knowledge/runbooks/cluster-power-recovery.md