# soteria
Soteria is the backup and restore console for Atlas PVCs.
Right now it is mainly built around Longhorn. It lists bound PVCs, starts backups, restores a backup into a new PVC, runs namespace-wide backup/restore jobs, and exposes backup health metrics for Grafana. It also has a small React UI so the common restore path does not require remembering the API by hand.
Soteria never overwrites an existing target PVC. Restore work is meant to be explicit and reversible.
## How it works
The service runs in-cluster and talks to Kubernetes plus the Longhorn backend. For each PVC it resolves the backing volume, asks Longhorn to snapshot/backup it, and records enough inventory for humans and dashboards to see whether the backup is fresh.
Policies are stored in a Kubernetes secret and evaluated on a timer. Metrics are published at `/metrics`; the UI and API share the same backend.
The following are notes for future Brad.
Main endpoints:
- `GET /healthz`, `GET /readyz`, `GET /metrics`
- `GET /v1/inventory`
- `GET /v1/backups?namespace=<ns>&pvc=<name>`
- `POST /v1/backup`
- `POST /v1/backup/namespace`
- `POST /v1/restores`
- `POST /v1/restores/namespace`
- `GET|POST|DELETE /v1/policies`
- `GET /v1/b2`

When auth is enabled, Soteria expects trusted headers from the fronting proxy and checks `SOTERIA_ALLOWED_GROUPS`.
## Development
```bash
go test ./...
./scripts/check.sh
```
The local deploy manifests live in `deploy/`. Production wiring should still go through the Flux repo, not one-off cluster edits.