# Hecate

Hecate is the host-level bootstrap and power-protection service for Titan.

It runs on `titan-db` and handles:

- Staged **startup** (including Flux/Gitea bootstrap deadlock fallback)
- Graceful **shutdown**
- UPS-driven automatic shutdown decisions based on discharge/runtime
- Multi-UPS operation via multiple Hecate instances (for example `titan-db` + `tethys`)
- Full hardware poweroff sequencing after graceful Kubernetes shutdown

## Why host-level

A service inside Kubernetes cannot start a cluster that is fully down. Hecate runs outside the cluster under systemd, so it can always orchestrate bring-up.

## Commands

- `hecate startup --config /etc/hecate/hecate.yaml --execute --force-flux-branch main`
- `hecate shutdown --config /etc/hecate/hecate.yaml --execute`
- `hecate daemon --config /etc/hecate/hecate.yaml`
- `hecate status --config /etc/hecate/hecate.yaml`

## Manual install on titan-db

```bash
git clone git@gitea-admin:bstein/hecate.git
cd hecate
sudo HECATE_ENABLE_BOOTSTRAP=1 ./scripts/install.sh
sudoedit /etc/hecate/hecate.yaml
sudo systemctl restart hecate.service
```

The installer is idempotent:

- Re-runs safely on every update
- Preserves existing `/etc/hecate/hecate.yaml`
- Ensures required dependencies are installed (`kubectl`, `nut-*`, `ssh`, `go`, etc.)
- Installs/refreshes systemd units and enables boot-time self-update
- Applies declarative NUT + udev UPS configuration by default (can be tuned via env vars)

Installer knobs (optional):

- `HECATE_ENABLE_BOOTSTRAP=1` enables `hecate-bootstrap.service` on this host.
- `HECATE_ENABLE_BOOTSTRAP=0` disables it; default `auto` preserves current bootstrap enablement state.
- `HECATE_MANAGE_NUT=0` skips writing NUT/udev files.
- `HECATE_NUT_UPS_NAME` (default `atlasups`)
- `HECATE_NUT_VENDOR_ID` / `HECATE_NUT_PRODUCT_ID` (defaults `0764` / `0601`)
- `HECATE_NUT_MONITOR_USER` / `HECATE_NUT_MONITOR_PASSWORD` (defaults `monuser` / `atlasupsmon`)

Bootstrap now (without reboot):

```bash
sudo systemctl start hecate-bootstrap.service
```

## Preconditions on titan-db

- `kubectl` installed and configured (`kubeconfig` path in config)
- SSH reachability to all cluster nodes
- Remote sudo rights to run:
  - `systemctl start/stop k3s`
  - `systemctl start/stop k3s-agent`
- UPS telemetry available via NUT (`upsc`)

## Multi-UPS topology

Recommended:

- `titan-db` runs Hecate as the shutdown coordinator (local UPS target + local shutdown execution).
- `tethys` runs Hecate with local UPS target and forwards shutdown triggers to `titan-db`.
- If forwarding fails, fallback local shutdown can remain enabled.

## Config

See `configs/hecate.example.yaml`.

UPS auto-shutdown trigger uses:

- runtime threshold = `runtime_safety_factor * estimated_shutdown_budget`
- default safety factor `1.10`
- debounce across multiple polls to avoid noise

Estimated shutdown budget is derived from historical successful shutdown runs (`/var/lib/hecate/runs.json`) with default fallback from config.

Power metrics:

- Hecate exposes Prometheus metrics on `:9560/metrics` by default.
- This is intended for a dedicated Grafana power dashboard and a high-level overview row.

## Notes

- Default behavior for `startup` and `shutdown` is dry-run unless `--execute` is set.
- `hecate-bootstrap.service` is enabled to run at host boot and perform staged startup automatically.
- `HECATE_ENABLE_BOOTSTRAP=1` enables `hecate-bootstrap.service` (recommended on `titan-db`; keep disabled on non-coordinator hosts).
- `hecate-update.timer` runs on boot and periodically to pull latest `main` and reinstall Hecate declaratively.
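
## Example: auto-shutdown trigger sketch

The trigger rule in the Config section (runtime threshold = `runtime_safety_factor * estimated_shutdown_budget`, debounced across polls) can be illustrated with a short Python sketch. This is not Hecate's implementation. The names `estimated_shutdown_budget`, `ShutdownTrigger`, and `debounce_polls` are hypothetical, the use of a median over past run durations is an assumption (the README only says the budget is derived from historical successful runs), and the debounce count of 3 is illustrative.

```python
from statistics import median


def estimated_shutdown_budget(run_durations_s, fallback_s):
    """Estimate how long a graceful shutdown takes, in seconds.

    Assumption: the budget is the median of past successful run durations.
    When there is no history, the configured fallback is used instead.
    """
    durations = [d for d in run_durations_s if d > 0]
    return median(durations) if durations else fallback_s


class ShutdownTrigger:
    """Fire only after several consecutive polls fall below the threshold."""

    def __init__(self, budget_s, safety_factor=1.10, debounce_polls=3):
        # runtime threshold = runtime_safety_factor * estimated_shutdown_budget
        self.threshold_s = safety_factor * budget_s
        self.debounce_polls = debounce_polls  # hypothetical knob
        self._consecutive = 0

    def observe(self, on_battery, runtime_s):
        """Record one UPS poll; return True when shutdown should start."""
        if on_battery and runtime_s <= self.threshold_s:
            self._consecutive += 1
        else:
            # Any healthy poll resets the counter, which filters out noise.
            self._consecutive = 0
        return self._consecutive >= self.debounce_polls
```

With a history of 280 s, 300 s, and 320 s, the budget is 300 s and the threshold is about 330 s, so three consecutive on-battery polls reporting 330 s of runtime or less would fire the trigger, while a single low reading followed by a healthy one would not.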