Hecate

Hecate is the host-level bootstrap and power-protection service for Titan.

It runs on titan-db and handles:

  • Staged startup (including Flux/Gitea bootstrap deadlock fallback)
  • Graceful shutdown
  • UPS-driven automatic shutdown decisions based on discharge/runtime
  • Multi-UPS operation via multiple Hecate instances (for example titan-db + tethys)
  • Full hardware poweroff sequencing after graceful Kubernetes shutdown

Why host-level

A service inside Kubernetes cannot start a cluster that is fully down. Hecate runs outside the cluster under systemd, so it can always orchestrate bring-up.

Commands

  • hecate startup --config /etc/hecate/hecate.yaml --execute --force-flux-branch main
  • hecate shutdown --config /etc/hecate/hecate.yaml --execute
  • hecate daemon --config /etc/hecate/hecate.yaml
  • hecate status --config /etc/hecate/hecate.yaml
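All four commands take `--config`, and startup also accepts `--force-flux-branch`. The sketch below parses these flags with Go's standard `flag` package, which accepts both `-flag` and `--flag`. The `cliOptions` struct, the `parseArgs` function, and the default config path are illustrative assumptions, not Hecate's actual code.

```go
package main

import (
	"flag"
	"fmt"
)

// cliOptions holds the flags shown in the command list (hypothetical struct;
// only the flag names come from the README).
type cliOptions struct {
	Command         string
	Config          string
	Execute         bool
	ForceFluxBranch string
}

// parseArgs parses "hecate <command> [flags]" with a fresh FlagSet per call.
func parseArgs(args []string) (cliOptions, error) {
	if len(args) == 0 {
		return cliOptions{}, fmt.Errorf("missing command")
	}
	opts := cliOptions{Command: args[0]}
	switch opts.Command {
	case "startup", "shutdown", "daemon", "status":
	default:
		return cliOptions{}, fmt.Errorf("unknown command %q", opts.Command)
	}
	fs := flag.NewFlagSet(opts.Command, flag.ContinueOnError)
	// Assumed default; the README always passes --config explicitly.
	fs.StringVar(&opts.Config, "config", "/etc/hecate/hecate.yaml", "path to config file")
	// Without --execute, startup and shutdown only report what they would do.
	fs.BoolVar(&opts.Execute, "execute", false, "perform actions instead of a dry-run")
	fs.StringVar(&opts.ForceFluxBranch, "force-flux-branch", "", "Flux branch to force during startup")
	if err := fs.Parse(args[1:]); err != nil {
		return cliOptions{}, err
	}
	return opts, nil
}
```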

Manual install on titan-db

git clone git@gitea-admin:bstein/hecate.git
cd hecate
sudo HECATE_ENABLE_BOOTSTRAP=1 ./scripts/install.sh
sudoedit /etc/hecate/hecate.yaml
sudo systemctl restart hecate.service

The installer is idempotent:

  • Re-runs safely on every update
  • Preserves existing /etc/hecate/hecate.yaml
  • Ensures required dependencies are installed (kubectl, nut-*, ssh, go, etc.)
  • Installs/refreshes systemd units and enables boot-time self-update
  • Applies declarative NUT + udev UPS configuration by default (can be tuned via env vars)

Installer knobs (optional):

  • HECATE_ENABLE_BOOTSTRAP=1 enables hecate-bootstrap.service on this host.
  • HECATE_MANAGE_NUT=0 skips writing NUT/udev files.
  • HECATE_NUT_UPS_NAME (default atlasups)
  • HECATE_NUT_VENDOR_ID / HECATE_NUT_PRODUCT_ID (defaults 0764 / 0601)
  • HECATE_NUT_MONITOR_USER / HECATE_NUT_MONITOR_PASSWORD (defaults monuser / atlasupsmon)

To run the bootstrap immediately, without rebooting:

sudo systemctl start hecate-bootstrap.service

Preconditions on titan-db

  • kubectl installed and configured (kubeconfig path in config)
  • SSH reachability to all cluster nodes
  • Remote sudo rights to run:
    • systemctl start/stop k3s
    • systemctl start/stop k3s-agent
  • UPS telemetry available via NUT (upsc)
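Staged shutdown needs exactly the remote commands listed above. On k3s, server nodes run the `k3s` systemd unit and agent nodes run `k3s-agent`. Below is a minimal sketch of building such an ssh invocation. The `remoteK3sCommand` helper, its server/agent flag, and the use of `BatchMode=yes` and `sudo -n` are assumptions for illustration. Those two options make ssh and sudo fail instead of prompting, since nobody is at the keyboard during a power event.

```go
package main

import (
	"fmt"
	"os/exec"
)

// remoteK3sCommand builds (but does not run) an ssh invocation that starts or
// stops k3s on one node. Server nodes use the k3s unit, agents use k3s-agent.
func remoteK3sCommand(host string, server bool, action string) (*exec.Cmd, error) {
	if action != "start" && action != "stop" {
		return nil, fmt.Errorf("unsupported action %q", action)
	}
	unit := "k3s-agent"
	if server {
		unit = "k3s"
	}
	// BatchMode=yes: ssh fails instead of asking for a password.
	// sudo -n: sudo fails instead of prompting (non-interactive).
	return exec.Command("ssh", "-o", "BatchMode=yes", host,
		"sudo", "-n", "systemctl", action, unit), nil
}
```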

Multi-UPS topology

Recommended:

  • titan-db runs Hecate as the shutdown coordinator (local UPS target + local shutdown execution).
  • tethys runs Hecate with local UPS target and forwards shutdown triggers to titan-db.
  • If forwarding fails, fallback local shutdown can remain enabled.

Config

See configs/hecate.example.yaml.

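The UPS auto-shutdown trigger uses:

  • a runtime threshold = runtime_safety_factor * estimated_shutdown_budget, against which the UPS's remaining runtime is compared
  • a default safety factor of 1.10
  • debouncing across multiple polls, so a single noisy reading does not trigger a shutdown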

The estimated shutdown budget is derived from historical successful shutdown runs (/var/lib/hecate/runs.json), falling back to the default set in the config.

Power metrics:

  • Hecate exposes Prometheus metrics on :9560/metrics by default.
  • This is intended for a dedicated Grafana power dashboard and a high-level overview row.

Notes

  • Default behavior for startup and shutdown is dry-run unless --execute is set.
  • When enabled, hecate-bootstrap.service runs at host boot and performs staged startup automatically.
  • Set HECATE_ENABLE_BOOTSTRAP=1 to enable hecate-bootstrap.service. This is recommended on titan-db; keep it disabled on non-coordinator hosts.
  • hecate-update.timer runs on boot and periodically to pull the latest main and reinstall Hecate declaratively.