# soteria

Soteria is the backup and restore console for Atlas PVCs.

Right now it is mainly built around Longhorn. It lists bound PVCs, starts backups, restores a backup into a new PVC, runs namespace-wide backup/restore jobs, and exposes backup health metrics for Grafana. It also has a small React UI, so the common restore path does not require calling the API by hand.

Soteria never overwrites an existing target PVC. Restore work is meant to be explicit and reversible.

## How it works

The service runs in-cluster and talks to Kubernetes plus the Longhorn backend. For each PVC it resolves the backing volume, asks Longhorn to snapshot/backup it, and records enough inventory for humans and dashboards to see whether the backup is fresh.

Policies are stored in a Kubernetes secret and evaluated on a timer. Metrics are published at `/metrics`; the UI and API share the same backend.

The following are notes for future Brad.

## Operational Dependencies

Soteria needs:

- Kubernetes API, service DNS, and Soteria's service account/RBAC
- Longhorn managers/backend service reachable at `SOTERIA_LONGHORN_URL`
- Longhorn backup target already configured and reachable if backups/restores are expected to work
- the `soteria` namespace and the Kubernetes secret used for backup policies
- ingress/proxy/auth headers if the UI is exposed with auth enabled
- B2 credentials and bucket access only if B2 usage reporting is enabled
- Prometheus/Grafana only for visibility; the service can run without dashboards

In a full cluster bring-up, fix nodes, disks, Longhorn, and Flux first. Start using Soteria once PVC inventory is visible and Longhorn can make backups without being nursed.

Main endpoints:

- `GET /healthz`, `GET /readyz`, `GET /metrics`
- `GET /v1/inventory`
- `GET /v1/backups?namespace=<ns>&pvc=<name>`
- `POST /v1/backup`
- `POST /v1/backup/namespace`
- `POST /v1/restores`
- `POST /v1/restores/namespace`
- `GET|POST|DELETE /v1/policies`
- `GET /v1/b2`
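For scripting against the list above, the query for `GET /v1/backups` can be built with the standard library so that the namespace and PVC name are escaped correctly. This is only a sketch: the base URL, namespace, and PVC name are placeholders, and `backupsURL` is a hypothetical helper, not part of Soteria.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// backupsURL builds the GET /v1/backups URL for one PVC.
// base is wherever the Soteria service is reachable (assumed by the caller).
func backupsURL(base, namespace, pvc string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = strings.TrimRight(u.Path, "/") + "/v1/backups"

	q := url.Values{}
	q.Set("namespace", namespace)
	q.Set("pvc", pvc)
	u.RawQuery = q.Encode() // Encode sorts keys, so namespace comes before pvc

	return u.String(), nil
}

func main() {
	// The host, namespace, and PVC name below are made-up examples.
	s, err := backupsURL("http://soteria.example.internal", "media", "library-data")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```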

When auth is enabled, Soteria expects trusted headers from the fronting proxy and checks `SOTERIA_ALLOWED_GROUPS`.
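The group check can be sketched as a set intersection between the groups the proxy asserts and the groups Soteria allows. The comma-separated format of both values is an assumption made for this example, as are the helper names; this README does not specify the header name or the exact format.

```go
package main

import (
	"fmt"
	"strings"
)

// splitList splits a comma-separated string, trims whitespace,
// and drops empty entries.
func splitList(s string) []string {
	var out []string
	for _, part := range strings.Split(s, ",") {
		if p := strings.TrimSpace(part); p != "" {
			out = append(out, p)
		}
	}
	return out
}

// groupAllowed reports whether at least one group from the proxy header
// appears in the allowed list (for example, the value of SOTERIA_ALLOWED_GROUPS).
// Both arguments are assumed to be comma-separated; that format is hypothetical.
func groupAllowed(headerGroups, allowedGroups string) bool {
	allowed := make(map[string]bool)
	for _, g := range splitList(allowedGroups) {
		allowed[g] = true
	}
	for _, g := range splitList(headerGroups) {
		if allowed[g] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(groupAllowed("users, admins", "admins,ops")) // true
	fmt.Println(groupAllowed("users", "admins,ops"))         // false
	fmt.Println(groupAllowed("", "admins,ops"))              // false
}
```

A check like this is only safe when the headers come from the trusted proxy, since a client that can reach the service directly could set them itself.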

## Development

```bash
go test ./...
./scripts/check.sh
```

The local deploy manifests live in `deploy/`. Production wiring should still go through the Flux repo, not one-off cluster edits.