soteria
Soteria is the backup and restore console for Atlas PVCs.
Right now it is mainly built around Longhorn. It lists bound PVCs, starts backups, restores a backup into a new PVC, runs namespace-wide backup/restore jobs, and exposes backup health metrics for Grafana. It also has a small React UI so the common restore path does not require remembering the API by hand.
Soteria never overwrites an existing target PVC. Restore work is meant to be explicit and reversible.
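The no-overwrite rule can be pictured as a guard that runs before any restore starts. This is only a sketch: `checkRestoreTarget`, `errTargetExists`, and the `namespace/name` map are hypothetical stand-ins, and the real service presumably asks the Kubernetes API whether the target exists rather than consulting an in-memory map.

```go
package main

import (
	"errors"
	"fmt"
)

// errTargetExists is a hypothetical sentinel error for this sketch.
var errTargetExists = errors.New("target PVC already exists")

// checkRestoreTarget refuses a restore when the target PVC is already present.
// existing is keyed by "namespace/name"; it stands in for a real cluster lookup.
func checkRestoreTarget(existing map[string]bool, namespace, name string) error {
	if existing[namespace+"/"+name] {
		return fmt.Errorf("%s/%s: %w", namespace, name, errTargetExists)
	}
	return nil
}

func main() {
	existing := map[string]bool{"media/data": true}

	// Restoring over the live PVC is rejected.
	fmt.Println(checkRestoreTarget(existing, "media", "data"))

	// Restoring into a new name is accepted (nil error).
	fmt.Println(checkRestoreTarget(existing, "media", "data-restore"))
}
```

Refusing early keeps the original PVC untouched, so a bad restore can be thrown away by deleting the new PVC.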
How it works
The service runs in-cluster and talks to Kubernetes plus the Longhorn backend. For each PVC it resolves the backing volume, asks Longhorn to snapshot/backup it, and records enough inventory for humans and dashboards to see whether the backup is fresh.
Policies are stored in a Kubernetes secret and evaluated on a timer. Metrics are published at /metrics; the UI and API share the same backend.
The following are notes for future Brad.
Operational Dependencies
Soteria needs:
- Kubernetes API, service DNS, and Soteria's service account/RBAC
- Longhorn managers/backend service reachable at SOTERIA_LONGHORN_URL
- Longhorn backup target already configured and reachable if backups/restores are expected to work
- the soteria namespace and the Kubernetes secret used for backup policies
- ingress/proxy/auth headers if the UI is exposed with auth enabled
- B2 credentials and bucket access only if B2 usage reporting is enabled
- Prometheus/Grafana only for visibility; the service can run without dashboards
In a total bring-up, fix nodes, disks, Longhorn, and Flux first. Start using Soteria once PVC inventory is visible and Longhorn can make backups without being nursed.
Main endpoints:
- GET /healthz, GET /readyz, GET /metrics
- GET /v1/inventory
- GET /v1/backups?namespace=<ns>&pvc=<name>
- POST /v1/backup
- POST /v1/backup/namespace
- POST /v1/restores
- POST /v1/restores/namespace
- GET|POST|DELETE /v1/policies
- GET /v1/b2
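For scripting against the API, the backup listing URL can be built with proper query escaping instead of string concatenation. A minimal sketch: the helper name `backupsURL` and the base URL `http://soteria.example` are made up for illustration, and the base is assumed to have no path prefix. Only the path and the two query parameters come from the endpoint list above.

```go
package main

import (
	"fmt"
	"net/url"
)

// backupsURL builds the URL for GET /v1/backups for one PVC.
// base is the scheme and host Soteria is reachable at (assumed, no path prefix).
func backupsURL(base, namespace, pvc string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/v1/backups"

	// url.Values escapes the values and encodes keys in sorted order.
	q := url.Values{}
	q.Set("namespace", namespace)
	q.Set("pvc", pvc)
	u.RawQuery = q.Encode()

	return u.String(), nil
}

func main() {
	u, err := backupsURL("http://soteria.example", "media", "data")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```

The POST endpoints are left out on purpose; their request bodies are not described here, so check the handlers before scripting against them.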
When auth is enabled, Soteria expects trusted headers from the fronting proxy and checks SOTERIA_ALLOWED_GROUPS.
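The group check amounts to intersecting the caller's groups with the allowed list. The sketch below assumes both the proxy header value and SOTERIA_ALLOWED_GROUPS are comma-separated; that format and the `groupAllowed` helper are assumptions, not the actual implementation, and the real header name is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// groupAllowed reports whether any group from the trusted proxy header
// appears in the allowed list. Both inputs are assumed to be comma-separated.
func groupAllowed(headerGroups, allowedGroups string) bool {
	allowed := make(map[string]struct{})
	for _, g := range strings.Split(allowedGroups, ",") {
		if g = strings.TrimSpace(g); g != "" {
			allowed[g] = struct{}{}
		}
	}
	for _, g := range strings.Split(headerGroups, ",") {
		if _, ok := allowed[strings.TrimSpace(g)]; ok {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical values; the real list comes from SOTERIA_ALLOWED_GROUPS.
	fmt.Println(groupAllowed("dev, admins", "admins,ops"))
	fmt.Println(groupAllowed("guests", "admins,ops"))
}
```

In this sketch an empty allowed list denies everyone, which is the safer default when the UI is exposed. Trusted headers are only safe if the proxy strips any client-supplied copies before forwarding.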
Development
go test ./...
./scripts/check.sh
The local deploy manifests live in deploy/. Production wiring should still go through the Flux repo, not one-off cluster edits.