Hecate

Hecate is the host-level bootstrap and power-protection service for Titan.

It runs on titan-db and handles:

  • Staged startup, including a fallback for the Flux/Gitea bootstrap deadlock
  • Graceful shutdown
  • Automatic shutdown decisions driven by UPS discharge state and remaining runtime

Why host-level

A service inside Kubernetes cannot start a cluster that is fully down. Hecate runs outside the cluster under systemd, so it can orchestrate bring-up even when no cluster component is running.

Commands

  • hecate startup --config /etc/hecate/hecate.yaml --execute --force-flux-branch main
  • hecate shutdown --config /etc/hecate/hecate.yaml --execute
  • hecate daemon --config /etc/hecate/hecate.yaml
  • hecate status --config /etc/hecate/hecate.yaml

Manual install on titan-db

git clone git@gitea-admin:bstein/hecate.git
cd hecate
sudo ./scripts/install.sh
sudoedit /etc/hecate/hecate.yaml
sudo systemctl restart hecate.service

To run the bootstrap immediately, without rebooting:

sudo systemctl start hecate-bootstrap.service

Preconditions on titan-db

  • kubectl installed and configured (the kubeconfig path is set in the Hecate config)
  • SSH reachability to all cluster nodes
  • Remote sudo rights to run:
    • systemctl start/stop k3s
    • systemctl start/stop k3s-agent
  • UPS telemetry available via NUT (upsc)
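
As a rough illustration of the telemetry precondition, the sketch below parses the "key: value" lines that upsc prints and extracts the remaining runtime. The helper names parseUPSC and runtimeSeconds are hypothetical, and which NUT variables Hecate actually reads is an assumption; battery.runtime and ups.status are standard NUT variable names.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseUPSC turns upsc output ("key: value" per line) into a map.
// Hypothetical helper, not taken from the Hecate source.
func parseUPSC(out string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ": ")
		if !ok {
			continue // skip lines that are not "key: value"
		}
		vars[strings.TrimSpace(key)] = strings.TrimSpace(val)
	}
	return vars
}

// runtimeSeconds reads battery.runtime, which NUT reports in seconds.
func runtimeSeconds(vars map[string]string) (float64, error) {
	raw, ok := vars["battery.runtime"]
	if !ok {
		return 0, fmt.Errorf("battery.runtime not reported by UPS")
	}
	return strconv.ParseFloat(raw, 64)
}

func main() {
	// Sample text in upsc format; the values are made up for the example.
	sample := "battery.charge: 100\nbattery.runtime: 1505\nups.status: OL\n"
	vars := parseUPSC(sample)
	secs, err := runtimeSeconds(vars)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("status=%s runtime=%.0fs\n", vars["ups.status"], secs)
}
```

If upsc does not report battery.runtime for a given UPS model, this sketch returns an error rather than guessing a value.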

Config

See configs/hecate.example.yaml.

The UPS auto-shutdown trigger uses:

  • a runtime threshold: runtime_safety_factor * estimated_shutdown_budget
  • a default safety factor of 1.10
  • debouncing across multiple polls, so a single noisy reading does not trigger a shutdown
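
The trigger rule above can be sketched as a small state machine. This is a minimal illustration, not Hecate's actual code: the Trigger type, the Required field (number of consecutive low polls), and the onBattery input are assumptions introduced for the example. Only the threshold formula and the 1.10 default come from this README.

```go
package main

import "fmt"

// Trigger is a hypothetical debounced low-runtime detector.
type Trigger struct {
	SafetyFactor  float64 // runtime_safety_factor (README default: 1.10)
	BudgetSeconds float64 // estimated_shutdown_budget, in seconds
	Required      int     // consecutive low polls needed (assumed knob)
	lowPolls      int     // current streak of low polls
}

// Threshold returns runtime_safety_factor * estimated_shutdown_budget.
func (t *Trigger) Threshold() float64 {
	return t.SafetyFactor * t.BudgetSeconds
}

// Observe records one poll. It returns true once Required consecutive
// polls were on battery with runtime below the threshold; any other
// poll resets the streak.
func (t *Trigger) Observe(onBattery bool, runtimeSeconds float64) bool {
	if onBattery && runtimeSeconds < t.Threshold() {
		t.lowPolls++
	} else {
		t.lowPolls = 0
	}
	return t.Required > 0 && t.lowPolls >= t.Required
}

func main() {
	tr := &Trigger{SafetyFactor: 1.10, BudgetSeconds: 300, Required: 3}
	fmt.Printf("threshold: %.0fs\n", tr.Threshold())
	// Made-up runtime readings, all taken while on battery.
	for i, runtime := range []float64{900, 320, 310, 305} {
		fmt.Printf("poll %d runtime=%.0fs fire=%v\n", i+1, runtime, tr.Observe(true, runtime))
	}
}
```

Resetting the streak on any healthy poll is one design choice among several; a real implementation might instead use a time window or hysteresis.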

The estimated shutdown budget is derived from historical successful shutdown runs recorded in /var/lib/hecate/runs.json, falling back to a default from the config when no usable history exists.
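
One way such an estimate could be computed is sketched below. The Run record shape (action, success, duration_seconds) is hypothetical, since the real runs.json schema is not documented here, and using the median is an assumption made for illustration; Hecate may aggregate history differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Run is a hypothetical record shape; the real runs.json schema may differ.
type Run struct {
	Action          string  `json:"action"`
	Success         bool    `json:"success"`
	DurationSeconds float64 `json:"duration_seconds"`
}

// estimateBudget returns the median duration of successful shutdown runs,
// or fallback when the data is unreadable or holds no such runs.
func estimateBudget(data []byte, fallback float64) float64 {
	var runs []Run
	if err := json.Unmarshal(data, &runs); err != nil {
		return fallback
	}
	var durations []float64
	for _, r := range runs {
		if r.Action == "shutdown" && r.Success && r.DurationSeconds > 0 {
			durations = append(durations, r.DurationSeconds)
		}
	}
	if len(durations) == 0 {
		return fallback
	}
	sort.Float64s(durations)
	mid := len(durations) / 2
	if len(durations)%2 == 1 {
		return durations[mid]
	}
	return (durations[mid-1] + durations[mid]) / 2
}

func main() {
	// Made-up history: one failed run and one startup run are ignored.
	history := []byte(`[
		{"action": "shutdown", "success": true,  "duration_seconds": 100},
		{"action": "shutdown", "success": false, "duration_seconds": 900},
		{"action": "startup",  "success": true,  "duration_seconds": 50},
		{"action": "shutdown", "success": true,  "duration_seconds": 140}
	]`)
	fmt.Printf("budget: %.0fs\n", estimateBudget(history, 300))
}
```

A more conservative variant could use the maximum or a high percentile instead of the median, at the cost of shutting down earlier.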

Notes

  • By default, startup and shutdown run as dry-runs; pass --execute to perform the actions.
  • hecate-bootstrap.service is enabled to run at host boot and perform staged startup automatically.
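
The dry-run default can be illustrated with a small gating function. This is a sketch under assumptions: the Step type, runPlan, and the injected run callback are hypothetical names, and the log format is invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Step is one hypothetical orchestration action.
type Step struct {
	Name string
	Argv []string
}

// runPlan walks the steps in order. Without execute it only records what
// would run; with execute it calls run for each step and stops on the
// first error.
func runPlan(steps []Step, execute bool, run func(argv []string) error) ([]string, error) {
	var log []string
	for _, s := range steps {
		cmd := strings.Join(s.Argv, " ")
		if !execute {
			log = append(log, "DRY-RUN "+s.Name+": "+cmd)
			continue
		}
		if err := run(s.Argv); err != nil {
			return log, fmt.Errorf("%s: %w", s.Name, err)
		}
		log = append(log, "OK "+s.Name+": "+cmd)
	}
	return log, nil
}

func main() {
	steps := []Step{
		{Name: "stop-agent", Argv: []string{"systemctl", "stop", "k3s-agent"}},
		{Name: "stop-server", Argv: []string{"systemctl", "stop", "k3s"}},
	}
	// The runner is a stub here; a real one would execute the command over SSH.
	stub := func(argv []string) error { return nil }
	lines, _ := runPlan(steps, false, stub)
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

Injecting the runner keeps the dry-run path free of side effects and makes the gating logic easy to test.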