From cf6252c55aadf70f639a6726e396d2290a8464df Mon Sep 17 00:00:00 2001
From: Brad Stein
Date: Wed, 8 Apr 2026 19:22:56 -0300
Subject: [PATCH] docs: restore titan-iac README scope and split ananke guidance

---
 README.md | 83 +++++++++++--------------------------------------------
 1 file changed, 16 insertions(+), 67 deletions(-)

diff --git a/README.md b/README.md
index 859ac256..c99975c7 100644
--- a/README.md
+++ b/README.md
@@ -1,80 +1,29 @@
 # titan-iac
 
-Flux-managed Kubernetes cluster config for bstein.dev.
+Flux-managed Kubernetes desired-state config for `bstein.dev`.
 
-Canonical repo URL:
+Canonical source URL:
 - `ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
 
-## Why `ananke`
+## Scope
 
-`Ananke` is inevitability and constraint. That is exactly what this tooling is for:
-- power events happen
-- recovery windows are finite
-- bootstrap has to be deterministic
+This repo contains cluster configuration consumed by Flux:
+- platform/infrastructure manifests
+- service manifests and kustomizations
+- operational scripts for render/reconcile workflows
 
-The point is not clever automation. The point is boring, repeatable recovery.
+This repo is **not** the Ananke application source repo.
+Ananke lives in `bstein/ananke` and orchestrates host-side shutdown/startup behavior around this desired state.
 
-## Power Domains
-
-Two UPS domains matter during shutdown/startup drills:
-- `Statera`: `titan-23`, `titan-24`, `titan-jh`
-- `Pyrphoros`: all other nodes
-
-Default UPS checks in Ananke read from `Pyrphoros` (`pyrphoros@localhost`) unless overridden.
-
-## Breakglass
-
-If primary operator access is lost, breakglass is on the remote Magic Mirror.
-
-## Ananke Commands
-
-Ananke is the recovery orchestrator. Flux desired-state source remains `titan-iac.git`.
-
-Use `titan-db` as the canonical control host. `tethys` (`titan-24`) is the backup operator host.
-
-From `titan-db`:
+## Validation workflow
 
 ```bash
-~/ananke-cluster-power status
-~/ananke-cluster-power prepare --execute
-~/ananke-cluster-power shutdown --execute --require-ups-battery
-~/ananke-cluster-power startup --execute --force-flux-branch main --require-ups-battery
+kustomize build services/
+kubectl apply --server-side --dry-run=client -k services/
+flux reconcile kustomization --namespace flux-system --with-source
 ```
 
-From `tethys` / `titan-24` (delegating to `titan-db`):
+## Apply model
 
-```bash
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db status
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db prepare --execute
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db shutdown --execute --require-ups-battery
-~/ananke-tools/cluster_power_console.sh --delegate-host titan-db startup --execute --force-flux-branch main --require-ups-battery
-```
-
-## Shutdown Modes
-
-`cluster_power_recovery.sh` supports two shutdown behaviors:
-- `--shutdown-mode host-poweroff` (default): graceful cluster shutdown plus scheduled host poweroff.
-- `--shutdown-mode cluster-only`: graceful cluster shutdown without host poweroff (stops `k3s` / `k3s-agent` only).
-
-## Startup Completion Rules
-
-Ananke startup is not “done” just because Flux says green once.
-
-Startup now completes only after:
-- Flux source drift checks pass (expected URL and branch)
-- all non-optional Flux kustomizations report `Ready=True`
-- external service checklist passes (default includes Gitea, Grafana, Harbor)
-- generated ingress reachability checks pass (default accepted statuses: `200,301,302,307,308,401,403,404`)
-- a stability soak window passes with no `CrashLoopBackOff` / image-pull failures and checklist still healthy
-
-If you intentionally need to correct Flux source during recovery, use:
-- `--force-flux-url ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git`
-- `--force-flux-branch main`
-
-`--force-flux-url` is breakglass-only and requires `--allow-flux-source-mutation`.
-
-The defaults live in:
-- `scripts/bootstrap/recovery-config.env`
-
-Detailed runbook:
-- `knowledge/runbooks/cluster-power-recovery.md`
+Use Git + Flux as the source of truth.
+Avoid manual in-cluster edits for durable changes.
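
Review note on the `## Validation workflow` block this patch adds to `README.md`: two of its three commands look unlikely to run as written. `flux reconcile kustomization` expects the name of a Kustomization object, and `kubectl apply` is understood to reject `--server-side` combined with `--dry-run=client`. The sketch below is one possible shape for the workflow, not the repo's actual tooling. The variables `KUSTOMIZE_PATH`, `KUSTOMIZATION_NAME`, and `EXECUTE`, the default name `services`, and the switch to `--dry-run=server` are all assumptions made for illustration.

```bash
#!/bin/sh
# Sketch of the README validation workflow as a reviewable plan.
# Assumptions (not from the patch): KUSTOMIZE_PATH, KUSTOMIZATION_NAME,
# EXECUTE, and the --dry-run=server substitution.

# Path to render; "services/" is the path the README uses.
KUSTOMIZE_PATH="${KUSTOMIZE_PATH:-services/}"
# Hypothetical Flux Kustomization name; replace with a real object name.
KUSTOMIZATION_NAME="${KUSTOMIZATION_NAME:-services}"

# Print one command per line so the plan can be read before it is run.
validation_plan() {
  printf '%s\n' \
    "kustomize build ${KUSTOMIZE_PATH}" \
    "kubectl apply --server-side --dry-run=server -k ${KUSTOMIZE_PATH}" \
    "flux reconcile kustomization ${KUSTOMIZATION_NAME} --namespace flux-system --with-source"
}

# Run each planned command in order and stop at the first failure.
run_validation() {
  validation_plan | while IFS= read -r cmd; do
    echo "+ ${cmd}"
    # Word splitting is intentional: the plan holds no quoted arguments.
    ${cmd} </dev/null || exit 1
  done
}

# Default to printing the plan; run it only when EXECUTE=1 is set.
if [ "${EXECUTE:-0}" = "1" ]; then
  run_validation
else
  validation_plan
fi
```

Printing the plan by default keeps the script safe to run on a host without cluster credentials. Setting `EXECUTE=1 KUSTOMIZATION_NAME=<name>` runs the same three steps against the current kubeconfig context.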