ananke/README.md

62 lines
1.8 KiB
Markdown
Raw Normal View History

# Hecate
Hecate is the host-level bootstrap and power-protection service for Titan.
It runs on `titan-db` and handles:
- Staged **startup** (including Flux/Gitea bootstrap deadlock fallback)
- Graceful **shutdown**
- UPS-driven automatic shutdown decisions based on discharge/runtime
## Why host-level
A service inside Kubernetes cannot start a cluster that is fully down.
Hecate runs outside the cluster under systemd, so it can always orchestrate bring-up.
## Commands
- `hecate startup --config /etc/hecate/hecate.yaml --execute --force-flux-branch main`
- `hecate shutdown --config /etc/hecate/hecate.yaml --execute`
- `hecate daemon --config /etc/hecate/hecate.yaml`
- `hecate status --config /etc/hecate/hecate.yaml`
## Manual install on titan-db
```bash
git clone git@gitea-admin:bstein/hecate.git
cd hecate
sudo ./scripts/install.sh
sudoedit /etc/hecate/hecate.yaml
sudo systemctl restart hecate.service
```
Bootstrap now (without reboot):
```bash
sudo systemctl start hecate-bootstrap.service
```
## Preconditions on titan-db
- `kubectl` installed and configured (`kubeconfig` path in config)
- SSH reachability to all cluster nodes
- Remote sudo rights to run:
- `systemctl start/stop k3s`
- `systemctl start/stop k3s-agent`
- UPS telemetry available via NUT (`upsc`)
## Config
See `configs/hecate.example.yaml`.
UPS auto-shutdown trigger uses:
- runtime threshold = `runtime_safety_factor * estimated_shutdown_budget`
- default safety factor `1.10`
- debounce across multiple polls to avoid noise
Estimated shutdown budget is derived from historical successful shutdown runs (`/var/lib/hecate/runs.json`) with default fallback from config.
## Notes
- Default behavior for `startup` and `shutdown` is dry-run unless `--execute` is set.
- `hecate-bootstrap.service` is enabled to run at host boot and perform staged startup automatically.