ananke

Ananke is the host-side power and bootstrap orchestrator for Titan.

It runs outside Kubernetes (as a systemd service on the host), so it can:

  • shut the cluster down gracefully before battery/runtime redlines
  • bring the cluster back after power returns
  • recover common Flux/Kustomize startup deadlocks
  • validate service health from the outside before declaring startup done

Why ananke

I wanted a name that fits the Titan mythology theme but also describes what this service actually does.

In Greek myth, Ananke is inevitability/necessity. That matches this tool: when power events happen, graceful sequencing is not optional.

UPS names in this cluster are also part of the story:

  • Statera: powers titan-23, titan-24, titan-jh
  • Pyrphoros: powers all other nodes

Breakglass reminder

The Vault unseal breakglass is set up for remote retrieval from the magic mirror host. If local key retrieval fails, Ananke falls back to the configured breakglass command.

What “startup complete” means now

Ananke does not stop at “Flux says Ready”. Startup only completes when all configured gates pass:

  • Flux source drift guard passes (the source URL matches expected_flux_source_url and the branch matches expected_flux_branch)
  • Flux kustomizations are healthy
  • controller convergence is healthy (deployments/statefulsets/daemonsets)
  • external service checklist passes (Gitea, Grafana, Keycloak OIDC, Harbor registry auth challenge, Longhorn auth redirect)
  • stability soak window passes (no regressions, no CrashLoop/ImagePull failures)

If any gate fails, startup is blocked with a concrete reason.

Command quick sheet

From titan-db (coordinator):

sudo /usr/local/bin/ananke status --config /etc/ananke/ananke.yaml
sudo /usr/local/bin/ananke startup --config /etc/ananke/ananke.yaml --execute --force-flux-branch main
sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason graceful-maintenance --mode cluster-only
sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason emergency-power --mode poweroff --skip-drain --skip-etcd-snapshot

From titan-24 (tethys peer):

sudo /usr/local/bin/ananke shutdown --config /etc/ananke/ananke.yaml --execute --reason graceful-maintenance --mode cluster-only

Systemd:

sudo systemctl status ananke.service
sudo systemctl start ananke-bootstrap.service
sudo systemctl start ananke-update.service

Shutdown modes (explicit)

ananke shutdown now supports explicit mode selection:

  • default behavior is cluster-only (host poweroff is not performed)
  • --mode config: use the config default (shutdown.poweroff_enabled)
  • --mode cluster-only: stop cluster services only (no host poweroff)
  • --mode poweroff: include host poweroff path (explicit only)

This removes ambiguity during drills.
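
Restated as a decision function, the mode rules look like this. `resolvePoweroff` is a hypothetical name, and the empty string stands for "no --mode flag given"; Ananke's actual flag handling may differ. Only an explicit --mode poweroff, or --mode config with shutdown.poweroff_enabled set, reaches the host poweroff path.

```go
package main

import "fmt"

// resolvePoweroff maps the --mode value to whether the host poweroff path runs.
// configPoweroff stands in for shutdown.poweroff_enabled.
func resolvePoweroff(mode string, configPoweroff bool) (bool, error) {
	switch mode {
	case "", "cluster-only": // no flag defaults to cluster-only
		return false, nil
	case "config":
		return configPoweroff, nil
	case "poweroff":
		return true, nil
	default:
		return false, fmt.Errorf("unknown --mode %q", mode)
	}
}

func main() {
	for _, m := range []string{"", "config", "cluster-only", "poweroff"} {
		p, _ := resolvePoweroff(m, true)
		fmt.Printf("mode=%q poweroff=%v\n", m, p)
	}
}
```

Rejecting unknown values instead of silently falling back keeps a typo during a drill from turning into an unexpected poweroff or a missed one.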

Config file

Primary path:

  • /etc/ananke/ananke.yaml

Core settings to keep accurate:

  • expected_flux_branch
  • expected_flux_source_url
  • startup.service_checklist
  • startup.service_checklist_stability_seconds
  • startup.ignore_unavailable_nodes (for planned temporary node outages)
  • coordination.role, coordination.peer_hosts
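
A sketch of a pre-flight check over those settings, assuming the dotted keys map onto nested structs. The Go types, the `validate` function, and the accepted role values (coordinator and peer, taken from the host labels in the command sheet) are assumptions rather than Ananke's actual schema; startup.ignore_unavailable_nodes is left out because its shape is not documented here. errors.Join requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical mirror of the core settings in /etc/ananke/ananke.yaml.
type Config struct {
	ExpectedFluxBranch    string
	ExpectedFluxSourceURL string
	Startup               struct {
		ServiceChecklist                 []string
		ServiceChecklistStabilitySeconds int
	}
	Coordination struct {
		Role      string
		PeerHosts []string
	}
}

// validate collects every problem instead of stopping at the first one.
func validate(c Config) error {
	var errs []error
	if c.ExpectedFluxBranch == "" {
		errs = append(errs, errors.New("expected_flux_branch is empty"))
	}
	if c.ExpectedFluxSourceURL == "" {
		errs = append(errs, errors.New("expected_flux_source_url is empty"))
	}
	if len(c.Startup.ServiceChecklist) == 0 {
		errs = append(errs, errors.New("startup.service_checklist is empty"))
	}
	if c.Startup.ServiceChecklistStabilitySeconds <= 0 {
		errs = append(errs, errors.New("startup.service_checklist_stability_seconds must be > 0"))
	}
	switch c.Coordination.Role {
	case "coordinator", "peer":
	default:
		errs = append(errs, fmt.Errorf("coordination.role %q is not coordinator or peer", c.Coordination.Role))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	var c Config
	c.ExpectedFluxBranch = "main"
	c.ExpectedFluxSourceURL = "https://git.example.com/flux.git" // placeholder URL
	c.Startup.ServiceChecklist = []string{"gitea", "grafana"}
	c.Startup.ServiceChecklistStabilitySeconds = 120 // hypothetical value
	c.Coordination.Role = "coordinator"
	fmt.Println(validate(c)) // <nil>
}
```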

Install / update

sudo ./scripts/install.sh

Installer behavior:

  • builds and installs /usr/local/bin/ananke
  • installs ananke*.service units
  • migrates and enforces current ananke config/state paths

Notes

  • Apply changes through Git/Flux manifests; avoid manual in-cluster edits for durable changes.
  • For controlled shutdown/startup drills, treat any manual intervention as a bug and fold the logic back into Ananke.

Description
atlas cluster UPS manager and start/stop orchestration