docs: restore titan-iac README scope and split ananke guidance

Brad Stein 2026-04-08 19:22:56 -03:00
parent cfdd5a377d
commit cf6252c55a

@@ -1,80 +1,29 @@
 # titan-iac
-Flux-managed Kubernetes cluster config for bstein.dev.
+Flux-managed Kubernetes desired-state config for `bstein.dev`.
-Canonical repo URL:
+Canonical source URL:
 - `ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
-## Why `ananke`
-`Ananke` is inevitability and constraint. That is exactly what this tooling is for:
-- power events happen
-- recovery windows are finite
-- bootstrap has to be deterministic
-The point is not clever automation. The point is boring, repeatable recovery.
+## Scope
+This repo contains cluster configuration consumed by Flux:
+- platform/infrastructure manifests
+- service manifests and kustomizations
+- operational scripts for render/reconcile workflows
+This repo is **not** the Ananke application source repo.
+Ananke lives in `bstein/ananke` and orchestrates host-side shutdown/startup behavior around this desired state.
-## Power Domains
-Two UPS domains matter during shutdown/startup drills:
-- `Statera`: `titan-23`, `titan-24`, `titan-jh`
-- `Pyrphoros`: all other nodes
-Default UPS checks in Ananke read from `Pyrphoros` (`pyrphoros@localhost`) unless overridden.
-## Breakglass
-If primary operator access is lost, breakglass is on the remote Magic Mirror.
-## Ananke Commands
-Ananke is the recovery orchestrator. Flux desired-state source remains `titan-iac.git`.
-Use `titan-db` as the canonical control host. `tethys` (`titan-24`) is the backup operator host.
-From `titan-db`:
+## Validation workflow
 ```bash
-~/ananke-cluster-power status
-~/ananke-cluster-power prepare --execute
-~/ananke-cluster-power shutdown --execute --require-ups-battery
-~/ananke-cluster-power startup --execute --force-flux-branch main --require-ups-battery
+kustomize build services/<app>
+kubectl apply --server-side --dry-run=client -k services/<app>
+flux reconcile kustomization <name> --namespace flux-system --with-source
 ```
-From `tethys` / `titan-24` (delegating to `titan-db`):
-```bash
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db status
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db prepare --execute
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db shutdown --execute --require-ups-battery
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db startup --execute --force-flux-branch main --require-ups-battery
-```
-## Shutdown Modes
-`cluster_power_recovery.sh` supports two shutdown behaviors:
-- `--shutdown-mode host-poweroff` (default): graceful cluster shutdown plus scheduled host poweroff.
-- `--shutdown-mode cluster-only`: graceful cluster shutdown without host poweroff (stops `k3s` / `k3s-agent` only).
-## Startup Completion Rules
-Ananke startup is not “done” just because Flux says green once.
-Startup now completes only after:
-- Flux source drift checks pass (expected URL and branch)
-- all non-optional Flux kustomizations report `Ready=True`
-- external service checklist passes (default includes Gitea, Grafana, Harbor)
-- generated ingress reachability checks pass (default accepted statuses: `200,301,302,307,308,401,403,404`)
-- a stability soak window passes with no `CrashLoopBackOff` / image-pull failures and checklist still healthy
-If you intentionally need to correct Flux source during recovery, use:
-- `--force-flux-url ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
-- `--force-flux-branch main`
-`--force-flux-url` is breakglass-only and requires `--allow-flux-source-mutation`.
-The defaults live in:
-- `scripts/bootstrap/recovery-config.env`
-Detailed runbook:
-- `knowledge/runbooks/cluster-power-recovery.md`
+## Apply model
+Use Git + Flux as the source of truth.
+Avoid manual in-cluster edits for durable changes.
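
The validation workflow that this commit adds to the README could be wrapped in a small script. The following is a minimal sketch and not part of the commit: the `run` and `validate_app` helpers and the `RUN` variable are hypothetical names. It also swaps the README's `--dry-run=client` for `--dry-run=server`, on the assumption that kubectl refuses to combine a client-side dry run with `--server-side`; that assumption should be checked against the kubectl version in use.

```bash
#!/bin/sh
# Sketch only: prints each validation step; set RUN=1 to also execute it.
set -eu

# Hypothetical helper: echo the command, then run it only when RUN=1.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# Hypothetical wrapper. $1 = directory under services/, $2 = Flux Kustomization name.
validate_app() {
  run kustomize build "services/$1"
  # Assumption: --dry-run=server replaces the README's --dry-run=client here.
  run kubectl apply --server-side --dry-run=server -k "services/$1"
  run flux reconcile kustomization "$2" --namespace flux-system --with-source
}
```

After sourcing the file, `validate_app <app> <name>` prints the three steps without touching the cluster. With `RUN=1` set, the same call executes them in order.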