docs: restore titan-iac README scope and split ananke guidance
parent cfdd5a377d
commit cf6252c55a
README.md | 83
@@ -1,80 +1,29 @@
 # titan-iac
 
-Flux-managed Kubernetes cluster config for bstein.dev.
+Flux-managed Kubernetes desired-state config for `bstein.dev`.
 
-Canonical repo URL:
+Canonical source URL:
 - `ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
 
-## Why `ananke`
+## Scope
 
-`Ananke` is inevitability and constraint. That is exactly what this tooling is for:
-- power events happen
-- recovery windows are finite
-- bootstrap has to be deterministic
+This repo contains cluster configuration consumed by Flux:
+- platform/infrastructure manifests
+- service manifests and kustomizations
+- operational scripts for render/reconcile workflows
 
-The point is not clever automation. The point is boring, repeatable recovery.
+This repo is **not** the Ananke application source repo.
+Ananke lives in `bstein/ananke` and orchestrates host-side shutdown/startup behavior around this desired state.
 
-## Power Domains
+## Validation workflow
 
-Two UPS domains matter during shutdown/startup drills:
-- `Statera`: `titan-23`, `titan-24`, `titan-jh`
-- `Pyrphoros`: all other nodes
-
-Default UPS checks in Ananke read from `Pyrphoros` (`pyrphoros@localhost`) unless overridden.
-
-## Breakglass
-
-If primary operator access is lost, breakglass is on the remote Magic Mirror.
-
-## Ananke Commands
-
-Ananke is the recovery orchestrator. Flux desired-state source remains `titan-iac.git`.
-
-Use `titan-db` as the canonical control host. `tethys` (`titan-24`) is the backup operator host.
-
-From `titan-db`:
-
 ```bash
-~/ananke-cluster-power status
-~/ananke-cluster-power prepare --execute
-~/ananke-cluster-power shutdown --execute --require-ups-battery
-~/ananke-cluster-power startup --execute --force-flux-branch main --require-ups-battery
+kustomize build services/<app>
+kubectl apply --server-side --dry-run=client -k services/<app>
+flux reconcile kustomization <name> --namespace flux-system --with-source
 ```
 
-From `tethys` / `titan-24` (delegating to `titan-db`):
+## Apply model
 
-```bash
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db status
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db prepare --execute
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db shutdown --execute --require-ups-battery
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db startup --execute --force-flux-branch main --require-ups-battery
-```
-
-## Shutdown Modes
-
-`cluster_power_recovery.sh` supports two shutdown behaviors:
-- `--shutdown-mode host-poweroff` (default): graceful cluster shutdown plus scheduled host poweroff.
-- `--shutdown-mode cluster-only`: graceful cluster shutdown without host poweroff (stops `k3s` / `k3s-agent` only).
-
-## Startup Completion Rules
-
-Ananke startup is not “done” just because Flux says green once.
-
-Startup now completes only after:
-- Flux source drift checks pass (expected URL and branch)
-- all non-optional Flux kustomizations report `Ready=True`
-- external service checklist passes (default includes Gitea, Grafana, Harbor)
-- generated ingress reachability checks pass (default accepted statuses: `200,301,302,307,308,401,403,404`)
-- a stability soak window passes with no `CrashLoopBackOff` / image-pull failures and checklist still healthy
-
-If you intentionally need to correct Flux source during recovery, use:
-- `--force-flux-url ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
-- `--force-flux-branch main`
-
-`--force-flux-url` is breakglass-only and requires `--allow-flux-source-mutation`.
-
-The defaults live in:
-- `scripts/bootstrap/recovery-config.env`
-
-Detailed runbook:
-- `knowledge/runbooks/cluster-power-recovery.md`
+Use Git + Flux as the source of truth.
+Avoid manual in-cluster edits for durable changes.
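The new "Validation workflow" in README.md renders an overlay, dry-runs it against the cluster, and then asks Flux to reconcile. Below is a minimal sketch of the first two steps wrapped as a reusable shell function. `validate_app` and its optional repo-root argument are hypothetical names and are not shipped by titan-iac. The sketch makes one deliberate deviation: kubectl rejects `--dry-run=client` when combined with `--server-side` and suggests `--dry-run=server` instead. For that reason the sketch uses `--dry-run=server`, which needs access to the API server.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped by titan-iac): runs the render and
# dry-run steps of the README "Validation workflow" for one app.
set -u

# validate_app APP [ROOT]
#   APP  - directory name under services/ (the README's <app>)
#   ROOT - repo checkout to validate from (assumption; defaults to ".")
validate_app() {
  local app="$1"
  local root="${2:-.}"
  local dir="${root}/services/${app}"

  if [ ! -d "$dir" ]; then
    echo "validate_app: no such overlay: ${dir}" >&2
    return 1
  fi

  # Step 1: render locally; fails on invalid YAML or missing referenced resources.
  if ! kustomize build "$dir" > /dev/null; then
    echo "validate_app: kustomize build failed for ${dir}" >&2
    return 1
  fi

  # Step 2: dry-run through the server-side apply path; nothing is persisted.
  # --dry-run=server is used because kubectl refuses --dry-run=client
  # together with --server-side.
  if ! kubectl apply --server-side --dry-run=server -k "$dir" > /dev/null; then
    echo "validate_app: dry-run apply failed for ${dir}" >&2
    return 1
  fi

  echo "validate_app: ${app} ok"
}
```

For example, run `validate_app <app>` from the repo root. Once the change is pushed, follow it with `flux reconcile kustomization <name> --namespace flux-system --with-source` as in the README. Keeping the render and the dry-run as separate checks shows whether a failure comes from the manifests themselves or from cluster-side validation.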