Hecate
Hecate is the host-level bootstrap and power-protection service for Titan.
It runs on titan-db and handles:
- Staged startup (including Flux/Gitea bootstrap deadlock fallback)
- Graceful shutdown
- UPS-driven automatic shutdown decisions based on discharge/runtime
- Multi-UPS operation via multiple Hecate instances (for example titan-db+tethys)
- Full hardware poweroff sequencing after graceful Kubernetes shutdown
Why host-level
A service inside Kubernetes cannot start a cluster that is fully down. Hecate runs outside the cluster under systemd, so it can always orchestrate bring-up.
Commands
hecate startup --config /etc/hecate/hecate.yaml --execute --force-flux-branch main
hecate shutdown --config /etc/hecate/hecate.yaml --execute
hecate daemon --config /etc/hecate/hecate.yaml
hecate status --config /etc/hecate/hecate.yaml
Manual install on titan-db
git clone git@gitea-admin:bstein/hecate.git
cd hecate
sudo HECATE_ENABLE_BOOTSTRAP=1 ./scripts/install.sh
sudoedit /etc/hecate/hecate.yaml
sudo systemctl restart hecate.service
The installer is idempotent:
- Re-runs safely on every update
- Preserves existing /etc/hecate/hecate.yaml
- Ensures required dependencies are installed (kubectl, nut-*, ssh, go, etc.)
- Installs/refreshes systemd units and enables boot-time self-update
Bootstrap now (without reboot):
sudo systemctl start hecate-bootstrap.service
Preconditions on titan-db
- kubectl installed and configured (kubeconfig path in config)
- SSH reachability to all cluster nodes
- Remote sudo rights to run:
  - systemctl start/stop k3s
  - systemctl start/stop k3s-agent
- UPS telemetry available via NUT (upsc)
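Part of the list above can be checked before installing. The following is a minimal sketch, not part of Hecate: it only verifies that the named executables are on PATH, and the helper name missing_tools is hypothetical. SSH reachability and remote sudo rights still have to be verified separately.

```python
import shutil


def missing_tools(tools=("kubectl", "ssh", "upsc")):
    """Return the required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    print("missing:", ", ".join(missing) if missing else "none")
```

An empty result means the binaries exist; it does not prove that kubectl has a working kubeconfig or that upsc can reach a UPS.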
Multi-UPS topology
Recommended:
- titan-db runs Hecate as the shutdown coordinator (local UPS target + local shutdown execution).
- tethys runs Hecate with local UPS target and forwards shutdown triggers to titan-db.
- If forwarding fails, fallback local shutdown can remain enabled.
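The forward-then-fall-back behavior recommended above can be pictured with a short sketch. This is an illustration under assumptions, not Hecate's implementation: the function name, the two callables, and the returned labels are all hypothetical, and how a trigger is actually forwarded to titan-db is not specified here.

```python
from typing import Callable


def handle_local_ups_trigger(
    forward_to_coordinator: Callable[[], None],
    shutdown_locally: Callable[[], None],
    local_fallback_enabled: bool = True,
) -> str:
    """Forward a shutdown trigger; fall back to local shutdown on failure.

    Returns "forwarded", "local-fallback", or "failed" (labels are
    hypothetical and only meant for logging).
    """
    try:
        forward_to_coordinator()
        return "forwarded"
    except Exception:
        # Coordinator unreachable or refused the trigger.
        if local_fallback_enabled:
            shutdown_locally()
            return "local-fallback"
        return "failed"
```

Passing the two actions as callables keeps the decision logic testable without touching SSH or a real UPS.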
Config
See configs/hecate.example.yaml.
UPS auto-shutdown trigger uses:
- runtime threshold = runtime_safety_factor * estimated_shutdown_budget
- default safety factor 1.10
- debounce across multiple polls to avoid noise
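As a rough illustration of how these three rules combine, here is a minimal sketch. It is not Hecate's code: the class name, the required_polls value of 3, and the on_battery input are assumptions made for the example. Only the threshold formula and the 1.10 default come from the list above.

```python
from dataclasses import dataclass


@dataclass
class ShutdownTrigger:
    """Debounced runtime-threshold check (illustrative only)."""

    estimated_shutdown_budget_s: float   # expected graceful shutdown time, seconds
    runtime_safety_factor: float = 1.10  # default safety factor
    required_polls: int = 3              # hypothetical debounce count
    _low_streak: int = 0                 # consecutive polls below threshold

    @property
    def threshold_s(self) -> float:
        return self.runtime_safety_factor * self.estimated_shutdown_budget_s

    def observe(self, on_battery: bool, runtime_s: float) -> bool:
        """Feed one UPS poll; return True once shutdown should start."""
        if on_battery and runtime_s <= self.threshold_s:
            self._low_streak += 1
        else:
            self._low_streak = 0  # any healthy poll resets the debounce
        return self._low_streak >= self.required_polls
```

With a 600 second budget the threshold is about 660 seconds of remaining runtime, and a single noisy reading below it does not trigger anything until it repeats on consecutive polls.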
Estimated shutdown budget is derived from historical successful shutdown runs (/var/lib/hecate/runs.json) with default fallback from config.
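The sketch below shows one way such a budget could be derived. The layout of runs.json is not documented here, so the record fields (action, success, duration_seconds) are hypothetical, and using the median is an assumption of this example rather than Hecate's confirmed method.

```python
import json
from pathlib import Path
from statistics import median


def estimate_shutdown_budget(history_path: Path, default_budget_s: float) -> float:
    """Return a shutdown budget in seconds from past successful runs.

    Assumed (hypothetical) file shape: a JSON list of objects such as
    {"action": "shutdown", "success": true, "duration_seconds": 412.0}.
    """
    try:
        runs = json.loads(history_path.read_text())
    except (OSError, ValueError):
        return default_budget_s  # missing or unreadable history: config default
    if not isinstance(runs, list):
        return default_budget_s

    durations = [
        float(run["duration_seconds"])
        for run in runs
        if isinstance(run, dict)
        and run.get("action") == "shutdown"
        and run.get("success") is True
        and isinstance(run.get("duration_seconds"), (int, float))
    ]
    # No usable history yet: fall back to the configured default.
    return median(durations) if durations else default_budget_s
```

A median is less sensitive than a mean to one unusually slow shutdown; a more conservative variant could take the maximum instead.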
Power metrics:
- Hecate exposes Prometheus metrics on :9560/metrics by default.
- This is intended for a dedicated Grafana power dashboard and a high-level overview row.
Notes
- Default behavior for startup and shutdown is dry-run unless --execute is set.
- hecate-bootstrap.service is enabled to run at host boot and perform staged startup automatically.
- HECATE_ENABLE_BOOTSTRAP=1 enables hecate-bootstrap.service (recommended on titan-db; keep disabled on non-coordinator hosts).
- hecate-update.timer runs on boot and periodically to pull latest main and reinstall Hecate declaratively.
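The dry-run default in the first note follows a common pattern, sketched here for illustration. This is not Hecate's source: only the --execute flag comes from this README, while the program name dryrun-sketch and the helper run_step are hypothetical.

```python
import argparse
import subprocess


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dryrun-sketch")
    parser.add_argument(
        "--execute",
        action="store_true",  # absent flag means dry-run
        help="actually run each step; the default only prints the plan",
    )
    return parser


def run_step(cmd, execute):
    """Run cmd when execute is True; otherwise only print what would run."""
    if not execute:
        print("[dry-run] would run:", " ".join(cmd))
        return None
    return subprocess.run(cmd, check=True).returncode
```

Making the destructive path opt-in means a mistyped or exploratory invocation prints a plan instead of stopping cluster nodes.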