# soteria

Soteria is the backup and restore console for Atlas PVCs.

Right now it is mainly built around Longhorn. It lists bound PVCs, starts backups, restores a backup into a new PVC, runs namespace-wide backup/restore jobs, and exposes backup health metrics for Grafana. It also has a small React UI so the common restore path does not require remembering the API by hand.

Soteria never overwrites an existing target PVC. Restore work is meant to be explicit and reversible.
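Here is a minimal sketch of that guard. It assumes the service already has the set of existing PVC names for the namespace. `RestoreRequest`, `checkRestoreTarget`, and `ErrTargetExists` are hypothetical names for illustration, not Soteria's code.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTargetExists marks a restore that would land on an existing PVC.
// Hypothetical: Soteria's real error values are not documented here.
var ErrTargetExists = errors.New("target PVC already exists")

// RestoreRequest is a hypothetical shape for a single-PVC restore.
type RestoreRequest struct {
	Namespace string
	Backup    string // backup to restore from
	TargetPVC string // name of the new PVC to create
}

// checkRestoreTarget refuses any restore whose target PVC already exists.
// existing holds "namespace/name" keys for PVCs currently in the cluster.
func checkRestoreTarget(req RestoreRequest, existing map[string]bool) error {
	if req.TargetPVC == "" {
		return errors.New("target PVC name is required")
	}
	key := req.Namespace + "/" + req.TargetPVC
	if existing[key] {
		return fmt.Errorf("%w: %s", ErrTargetExists, key)
	}
	return nil
}

func main() {
	existing := map[string]bool{"apps/data": true}
	// Restoring over the live PVC is refused.
	fmt.Println(checkRestoreTarget(RestoreRequest{Namespace: "apps", Backup: "b1", TargetPVC: "data"}, existing))
	// Restoring into a new name is allowed (prints <nil>).
	fmt.Println(checkRestoreTarget(RestoreRequest{Namespace: "apps", Backup: "b1", TargetPVC: "data-restore"}, existing))
}
```

The Kubernetes API already rejects creating a PVC whose name is taken (an `AlreadyExists` error). A pre-check like this mostly buys a clearer error before any Longhorn work starts; it does not replace that server-side guarantee.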
## How it works

The service runs in-cluster and talks to the Kubernetes API and the Longhorn backend. For each PVC it resolves the backing volume, asks Longhorn to snapshot and back it up, and records enough inventory for humans and dashboards to see whether the backup is fresh.
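The README does not define "fresh". This sketch assumes a PVC is fresh when its last successful backup finished within a fixed maximum age (24 hours in the demo). `BackupRecord`, `isFresh`, `staleCount`, and the sample data are all hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// BackupRecord is a hypothetical inventory entry for one PVC.
type BackupRecord struct {
	Namespace  string
	PVC        string
	Volume     string    // backing Longhorn volume, as resolved from the PV
	LastBackup time.Time // finish time of the last successful backup; zero if none
}

// isFresh reports whether the last successful backup is no older than maxAge.
// A PVC that has never been backed up is never fresh.
func isFresh(r BackupRecord, now time.Time, maxAge time.Duration) bool {
	if r.LastBackup.IsZero() {
		return false
	}
	return now.Sub(r.LastBackup) <= maxAge
}

// staleCount counts records that fail isFresh, the kind of number a
// dashboard panel or alert could be built on.
func staleCount(records []BackupRecord, now time.Time, maxAge time.Duration) int {
	n := 0
	for _, r := range records {
		if !isFresh(r, now, maxAge) {
			n++
		}
	}
	return n
}

// demoStaleCount evaluates three made-up records against a 24h threshold.
func demoStaleCount() int {
	now := time.Date(2026, 6, 19, 12, 0, 0, 0, time.UTC)
	records := []BackupRecord{
		{Namespace: "apps", PVC: "db", LastBackup: now.Add(-2 * time.Hour)},     // fresh
		{Namespace: "apps", PVC: "cache", LastBackup: now.Add(-30 * time.Hour)}, // too old
		{Namespace: "apps", PVC: "scratch"},                                     // never backed up
	}
	return staleCount(records, now, 24*time.Hour)
}

func main() {
	fmt.Println("stale PVCs:", demoStaleCount())
}
```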
Policies are stored in a Kubernetes secret and evaluated on a timer. Metrics are published at `/metrics`; the UI and API share the same backend.
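One way to evaluate policies on a timer is a `time.Ticker` loop that reloads the policy list each round, so an edited secret takes effect without a restart. This is a sketch under assumptions. The `Policy` type, its fields, and the `load`/`evaluate` callbacks are hypothetical, and the README does not say what Soteria does per policy.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Policy is a hypothetical backup policy as it might be decoded from the secret.
type Policy struct {
	Namespace string
	Interval  time.Duration // desired gap between backups; used by evaluate, not the loop
}

// runPolicyLoop evaluates every policy once right away and then once per tick,
// until ctx is cancelled. load is called each round so edits to the policy
// source are picked up without restarting the service.
func runPolicyLoop(ctx context.Context, every time.Duration, load func() []Policy, evaluate func(Policy)) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		for _, p := range load() {
			evaluate(p)
		}
		// Check cancellation first so an already-pending tick cannot start another round.
		if ctx.Err() != nil {
			return
		}
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		}
	}
}

// demoRounds runs the loop with one policy and cancels after three evaluations.
func demoRounds() int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	policies := []Policy{{Namespace: "apps", Interval: 24 * time.Hour}}
	calls := 0
	runPolicyLoop(ctx, time.Millisecond, func() []Policy { return policies }, func(_ Policy) {
		calls++
		if calls == 3 {
			cancel()
		}
	})
	return calls
}

func main() {
	fmt.Println("evaluations:", demoRounds())
}
```

The explicit `ctx.Err()` check matters. When a cancellation and a tick are both ready, `select` picks one at random, so without the check the loop could run an extra round after shutdown was requested.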
The following are notes for future Brad.

## Operational Dependencies

Soteria needs:
- Kubernetes API, service DNS, and Soteria's service account/RBAC
- Longhorn managers/backend service reachable at `SOTERIA_LONGHORN_URL`
- Longhorn backup target already configured and reachable if backups/restores are expected to work
- the `soteria` namespace and the Kubernetes secret used for backup policies
- ingress/proxy/auth headers if the UI is exposed with auth enabled
- B2 credentials and bucket access only if B2 usage reporting is enabled
- Prometheus/Grafana only for visibility; the service can run without dashboards

In a total bring-up, fix nodes, disks, Longhorn, and Flux first. Start using Soteria once PVC inventory is visible and Longhorn can make backups without being nursed.
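A quick preflight for the Longhorn dependency could look like this hypothetical helper. It checks that `SOTERIA_LONGHORN_URL` is set to an absolute http(s) URL and that an HTTP GET to it gets a non-5xx answer within five seconds. It says nothing about whether the Longhorn backup target is configured. `parseLonghornURL` and `probe` are illustrative names, not part of Soteria.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"os"
	"time"
)

// parseLonghornURL checks that raw is a non-empty, absolute http(s) URL.
func parseLonghornURL(raw string) (*url.URL, error) {
	if raw == "" {
		return nil, fmt.Errorf("SOTERIA_LONGHORN_URL is not set")
	}
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("SOTERIA_LONGHORN_URL is invalid: %w", err)
	}
	if (u.Scheme != "http" && u.Scheme != "https") || u.Host == "" {
		return nil, fmt.Errorf("SOTERIA_LONGHORN_URL must be an absolute http(s) URL, got %q", raw)
	}
	return u, nil
}

// probe sends one GET with a short timeout and treats any non-5xx reply as
// "something is listening"; it does not validate the Longhorn API itself.
func probe(u *url.URL) error {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(u.String())
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 500 {
		return fmt.Errorf("longhorn answered %s", resp.Status)
	}
	return nil
}

func main() {
	u, err := parseLonghornURL(os.Getenv("SOTERIA_LONGHORN_URL"))
	if err != nil {
		fmt.Println("preflight:", err)
		return
	}
	if err := probe(u); err != nil {
		fmt.Println("preflight:", err)
		return
	}
	fmt.Println("preflight: Longhorn reachable at", u)
}
```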
## API

Main endpoints:
- `GET /healthz`, `GET /readyz`, `GET /metrics`
- `GET /v1/inventory`
- `GET /v1/backups?namespace=<ns>&pvc=<name>`
- `POST /v1/backup`
- `POST /v1/backup/namespace`
- `POST /v1/restores`
- `POST /v1/restores/namespace`
- `GET|POST|DELETE /v1/policies`
- `GET /v1/b2`

When auth is enabled, Soteria expects trusted headers from the fronting proxy and checks `SOTERIA_ALLOWED_GROUPS`.
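Here is a minimal `net/http` sketch of a group check in front of `GET /v1/backups`. Only the route, its `namespace`/`pvc` query parameters, and `SOTERIA_ALLOWED_GROUPS` come from this README. Several details are assumptions: the `X-Auth-Request-Groups` header name, the comma-separated format of both that header and the allow list, and the plain-text response.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"os"
	"strings"
)

// groupsHeader is a hypothetical name; use whatever the fronting proxy actually sets.
const groupsHeader = "X-Auth-Request-Groups"

// parseGroups splits a comma-separated list, trimming spaces and dropping empties.
func parseGroups(s string) map[string]bool {
	out := map[string]bool{}
	for _, g := range strings.Split(s, ",") {
		if g = strings.TrimSpace(g); g != "" {
			out[g] = true
		}
	}
	return out
}

// requireGroup passes a request on only if its groups header shares an entry with allowed.
func requireGroup(allowed map[string]bool, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		for g := range parseGroups(r.Header.Get(groupsHeader)) {
			if allowed[g] {
				next.ServeHTTP(w, r)
				return
			}
		}
		http.Error(w, "forbidden", http.StatusForbidden)
	})
}

// listBackups reads the namespace and pvc query parameters of GET /v1/backups.
func listBackups(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodGet {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	q := r.URL.Query()
	ns, pvc := q.Get("namespace"), q.Get("pvc")
	if ns == "" || pvc == "" {
		http.Error(w, "namespace and pvc are required", http.StatusBadRequest)
		return
	}
	// A real handler would ask Longhorn for this PVC's backups here.
	fmt.Fprintf(w, "backups for %s/%s\n", ns, pvc)
}

// newHandler wires the route behind the group check; allowedEnv mimics SOTERIA_ALLOWED_GROUPS.
func newHandler(allowedEnv string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/v1/backups", listBackups)
	return requireGroup(parseGroups(allowedEnv), mux)
}

// call sends one GET through h with the given groups header and returns the status code.
func call(h http.Handler, target, groups string) int {
	req := httptest.NewRequest(http.MethodGet, target, nil)
	if groups != "" {
		req.Header.Set(groupsHeader, groups)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	h := newHandler(os.Getenv("SOTERIA_ALLOWED_GROUPS"))
	fmt.Println(call(h, "/v1/backups?namespace=apps&pvc=data", "admin"))
}
```

The sketch fails closed: with the allow list unset or empty, every request gets 403. This README does not say whether Soteria behaves the same way.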
## Development

```bash
go test ./...
./scripts/check.sh
```

The local deploy manifests live in `deploy/`. Production wiring should still go through the Flux repo, not one-off cluster edits.