soteria
Soteria is an in-cluster service for PVC backup and restore operations. The current production baseline focuses on Longhorn-backed PVCs and provides:
- Namespace-grouped PVC inventory for backup and restore selection.
- On-demand backup creation for Longhorn volumes.
- Namespace-wide backup and restore batch execution.
- Restore into a new target PVC with conflict checks and best-effort cleanup on failure.
- Policy-based scheduled backups (per PVC or all PVCs in a namespace), persisted in-cluster.
- A built-in React + TypeScript UI (dark-mode default) suitable for publishing behind an authenticated ingress.
- Prometheus-format backup freshness and B2 consumption telemetry for Grafana rollups.
For Longhorn, backups are crash-consistent at the volume level and delegated to the Longhorn control plane.
Endpoints
Public endpoints:
- GET /healthz
- GET /readyz
- GET /metrics
Protected endpoints when SOTERIA_AUTH_REQUIRED=true:
- GET / (UI console)
- GET /v1/whoami
- GET /v1/inventory
- GET /v1/backups?namespace=<ns>&pvc=<name>
- POST /v1/backup
- POST /v1/backup/namespace
- POST /v1/restores
- POST /v1/restores/namespace
- POST /v1/restore-test (legacy alias for /v1/restores)
- GET /v1/policies
- POST /v1/policies
- DELETE /v1/policies/<policy-id>
- GET /v1/b2
API examples
POST /v1/backup
{
  "namespace": "ai",
  "pvc": "llm-cache",
  "tags": ["namespace=ai", "service=llm"],
  "dry_run": false
}
Longhorn response:
{
  "driver": "longhorn",
  "volume": "pvc-1234abcd",
  "backup": "soteria-backup-ai-llm-cache-20260412-153000",
  "namespace": "ai",
  "requested_by": "brad",
  "dry_run": false
}
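As a rough client-side sketch, the request above could be assembled with Python's standard library. The base URL, the helper name build_backup_request, and the use of a standard Authorization: Bearer header for token access are assumptions made for illustration; this README does not specify them.

```python
import json
import urllib.request

# Hypothetical in-cluster address; substitute your own Service URL.
BASE_URL = "http://soteria.example.svc:8080"


def build_backup_request(namespace, pvc, tags=None, dry_run=False, token=None):
    """Build (but do not send) a POST /v1/backup request."""
    payload = {
        "namespace": namespace,
        "pvc": pvc,
        "tags": list(tags or []),
        "dry_run": dry_run,
    }
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: bearer tokens travel in the standard Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"{BASE_URL}/v1/backup",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_backup_request("ai", "llm-cache", tags=["namespace=ai", "service=llm"])
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```

Setting dry_run to True is a reasonable first step when wiring up automation, since it exercises the request path without asking for a real backup.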
GET /v1/inventory
Response shape:
{
  "generated_at": "2026-04-12T15:30:00Z",
  "namespaces": [
    {
      "name": "ai",
      "pvcs": [
        {
          "namespace": "ai",
          "pvc": "llm-cache",
          "volume": "pvc-1234abcd",
          "storage_class": "longhorn",
          "capacity": "50Gi",
          "driver": "longhorn",
          "last_backup_at": "2026-04-12T14:55:00Z",
          "last_backup_age_hours": 0.58,
          "backup_count": 14,
          "healthy": true,
          "health_reason": "fresh"
        }
      ]
    }
  ]
}
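Automation that consumes this response might want a list of PVCs whose healthy flag is false. The following is a minimal sketch over the response shape shown above; the helper name unhealthy_pvcs, the second sample PVC, and its health_reason value are hypothetical.

```python
def unhealthy_pvcs(inventory):
    """Return (namespace, pvc, health_reason) for each PVC whose healthy flag is false."""
    flagged = []
    for ns in inventory.get("namespaces", []):
        for item in ns.get("pvcs", []):
            if not item.get("healthy", False):
                flagged.append(
                    (item["namespace"], item["pvc"], item.get("health_reason", ""))
                )
    return flagged


# Trimmed sample in the documented shape. The "scratch" PVC and its
# health_reason value are made up for illustration.
sample = {
    "namespaces": [
        {
            "name": "ai",
            "pvcs": [
                {"namespace": "ai", "pvc": "llm-cache",
                 "healthy": True, "health_reason": "fresh"},
                {"namespace": "ai", "pvc": "scratch",
                 "healthy": False, "health_reason": "example-reason"},
            ],
        }
    ]
}

for namespace, pvc, reason in unhealthy_pvcs(sample):
    print(f"{namespace}/{pvc}: {reason}")
```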
GET /v1/backups
/v1/backups?namespace=ai&pvc=llm-cache
Returns the resolved volume name and backup records so the UI or automation can select a restore source.
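When building this URL from automation, it is safer to encode the query parameters than to concatenate strings. A small sketch, assuming a hypothetical base URL and helper name:

```python
from urllib.parse import urlencode


def backups_url(base_url, namespace, pvc):
    """Build the GET /v1/backups URL with properly encoded query parameters."""
    query = urlencode({"namespace": namespace, "pvc": pvc})
    return f"{base_url.rstrip('/')}/v1/backups?{query}"


# The host below is a placeholder, not a documented address.
url = backups_url("http://soteria.example.svc:8080", "ai", "llm-cache")
```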
POST /v1/restores
{
  "namespace": "ai",
  "pvc": "llm-cache",
  "snapshot": "latest",
  "target_namespace": "ai",
  "target_pvc": "restore-llm-cache",
  "dry_run": false
}
Notes:
- namespace and pvc identify the source PVC.
- target_pvc is required.
- target_namespace defaults to namespace.
- Soteria refuses to overwrite an existing target PVC.
- If Longhorn volume creation succeeds but PVC creation fails, Soteria attempts to delete the just-created restore volume.
- You may provide backup_url directly instead of snapshot.
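A client can apply the documented rules before sending a restore request. The sketch below mirrors only two of the notes (target_pvc is required, target_namespace defaults to namespace). The function name is hypothetical, and this is a client-side pre-check, not Soteria's own validation code.

```python
def normalize_restore_request(req):
    """Return a copy of a restore body with the documented default applied."""
    out = dict(req)
    if not out.get("target_pvc"):
        raise ValueError("target_pvc is required")
    if not out.get("target_namespace"):
        # Documented behavior: target_namespace defaults to namespace.
        out["target_namespace"] = out.get("namespace", "")
    return out


body = normalize_restore_request({
    "namespace": "ai",
    "pvc": "llm-cache",
    "snapshot": "latest",
    "target_pvc": "restore-llm-cache",
    "dry_run": True,
})
```

The conflict check against an existing target PVC still happens server-side, so a request that passes this pre-check can be refused by Soteria.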
POST /v1/backup/namespace
{
  "namespace": "ai",
  "dry_run": false
}
Runs a backup for every currently bound PVC in the namespace and returns a per-PVC result list.
POST /v1/restores/namespace
{
  "namespace": "ai",
  "target_namespace": "ai-restore",
  "target_prefix": "restore-20260412-",
  "snapshot": "",
  "dry_run": true
}
Runs restore planning/execution for every bound PVC in the source namespace. snapshot is optional; when it is blank, the latest completed backup is used for each PVC.
Policy API
Create or update a policy:
POST /v1/policies
{
  "namespace": "ai",
  "pvc": "llm-cache",
  "interval_hours": 6,
  "enabled": true
}
- Leave pvc empty to target all PVCs in that namespace.
- Policies are stored in secret SOTERIA_POLICY_SECRET_NAME under key policies.json.
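To make interval_hours concrete, here is one plausible way a scheduler could decide that a PVC is due for a backup. This README does not describe how Soteria's policy evaluator actually makes that decision, so the function name and the rule itself (due once interval_hours have elapsed since the last backup, or immediately if there is none) are assumptions.

```python
from datetime import datetime, timedelta, timezone


def backup_due(last_backup_at, interval_hours, now):
    """Assumed rule: due when no backup exists or the interval has elapsed."""
    if last_backup_at is None:
        return True
    return now - last_backup_at >= timedelta(hours=interval_hours)


now = datetime(2026, 4, 12, 15, 30, tzinfo=timezone.utc)
last = datetime(2026, 4, 12, 8, 0, tzinfo=timezone.utc)

due = backup_due(last, 6, now)       # 7.5 hours elapsed against a 6-hour interval
not_due = backup_due(last, 12, now)  # same gap against a 12-hour interval
```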
GET /v1/b2
Returns B2 account/bucket consumption based on S3-compatible object scans.
{
  "enabled": true,
  "available": true,
  "endpoint": "https://s3.us-west-004.backblazeb2.com",
  "region": "us-west-004",
  "scanned_at": "2026-04-12T16:00:00Z",
  "scan_duration_ms": 824,
  "total_objects": 1324,
  "total_bytes": 18407542931,
  "recent_objects_24h": 18,
  "recent_bytes_24h": 12245812,
  "buckets": [
    {
      "name": "atlas-backups",
      "object_count": 1240,
      "total_bytes": 18288473811,
      "recent_objects_24h": 12,
      "recent_bytes_24h": 8542198,
      "last_modified_at": "2026-04-12T15:43:19Z"
    }
  ]
}
Recent 24h values are an object-change proxy and do not represent full B2 billing egress totals.
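One simple use of this response is to see how much of the account total each bucket holds. A minimal sketch using only fields from the example response; the helper name bucket_shares is hypothetical.

```python
def bucket_shares(report):
    """Map each bucket name to its fraction of the account's total_bytes."""
    total = report.get("total_bytes", 0)
    shares = {}
    for bucket in report.get("buckets", []):
        shares[bucket["name"]] = bucket["total_bytes"] / total if total else 0.0
    return shares


# Values copied from the example response above.
report = {
    "total_bytes": 18407542931,
    "buckets": [{"name": "atlas-backups", "total_bytes": 18288473811}],
}

for name, share in bucket_shares(report).items():
    print(f"{name}: {share:.1%} of scanned bytes")
```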
Authentication and authorization
When SOTERIA_AUTH_REQUIRED=true, Soteria expects trusted auth headers from a fronting proxy such as oauth2-proxy:
- X-Auth-Request-User
- X-Auth-Request-Email
- X-Auth-Request-Groups
- X-Forwarded-User (fallback)
- X-Forwarded-Email (fallback)
- X-Forwarded-Groups (fallback)
Allowed groups are configured with SOTERIA_ALLOWED_GROUPS and compared after normalizing leading / prefixes, so both maintenance and /maintenance are accepted. Group lists may be comma- or semicolon-separated.
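The normalization described above can be sketched as follows. This illustrates the stated rules (split on commas or semicolons, drop leading slashes) and is not Soteria's actual code; the function names are hypothetical, and the comparison is assumed to be case-sensitive.

```python
import re


def normalize_groups(raw):
    """Split on commas or semicolons, trim whitespace, strip leading slashes."""
    groups = set()
    for part in re.split(r"[;,]", raw):
        name = part.strip().lstrip("/")
        if name:
            groups.add(name)
    return groups


def is_allowed(groups_header, allowed_groups):
    """True when the user shares at least one normalized group with the allowlist."""
    return bool(normalize_groups(groups_header) & normalize_groups(allowed_groups))


allowed = "admin,maintenance"
print(is_allowed("/maintenance;dev", allowed))  # shares "maintenance"
print(is_allowed("dev", allowed))               # no group in common
```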
Optional machine-to-machine access can be enabled with SOTERIA_AUTH_BEARER_TOKENS, which accepts a comma-separated list of bearer tokens.
Prometheus metrics
Soteria exports Prometheus-format metrics at GET /metrics.
Implemented metrics:
- soteria_backup_requests_total{driver,result}
- soteria_restore_requests_total{driver,result}
- soteria_policy_backups_total{result}
- soteria_namespace_backup_requests_total{driver,result}
- soteria_namespace_restore_requests_total{driver,result}
- soteria_authz_denials_total{reason}
- soteria_inventory_refresh_failures_total
- soteria_inventory_refresh_timestamp_seconds
- pvc_backup_age_hours{namespace,pvc,volume,driver}
- pvc_backup_health{namespace,pvc,volume,driver}
- pvc_backup_health_reason{namespace,pvc,volume,driver,reason}
- pvc_backup_last_success_timestamp_seconds{namespace,pvc,volume,driver}
- pvc_backup_count{namespace,pvc,volume,driver}
- pvc_backup_completed_count{namespace,pvc,volume,driver}
- pvc_backup_last_size_bytes{namespace,pvc,volume,driver}
- pvc_backup_total_size_bytes{namespace,pvc,volume,driver}
- soteria_b2_scan_success
- soteria_b2_scan_timestamp_seconds
- soteria_b2_scan_duration_seconds
- soteria_b2_account_objects
- soteria_b2_account_bytes
- soteria_b2_account_recent_objects_24h
- soteria_b2_account_recent_bytes_24h
- soteria_b2_bucket_objects{bucket}
- soteria_b2_bucket_bytes{bucket}
- soteria_b2_bucket_recent_objects_24h{bucket}
- soteria_b2_bucket_recent_bytes_24h{bucket}
- soteria_b2_bucket_last_modified_timestamp_seconds{bucket}
pvc_backup_health is 1 when the most recent successful backup is within SOTERIA_BACKUP_MAX_AGE_HOURS, otherwise 0.
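The health rule can be written as a small function. This is a sketch of the rule as stated, not the exporter's code; treating an age exactly equal to the limit as healthy, and a PVC with no successful backup as unhealthy, are assumptions.

```python
def backup_health(age_hours, max_age_hours=24.0):
    """Return 1 when the newest successful backup is within the limit, else 0.

    age_hours is None when no successful backup exists (assumed unhealthy).
    The default limit mirrors the documented SOTERIA_BACKUP_MAX_AGE_HOURS default.
    """
    if age_hours is None:
        return 0
    return 1 if age_hours <= max_age_hours else 0


fresh = backup_health(0.58)   # age taken from the inventory example
stale = backup_health(30.0)
```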
Configuration
Environment variables:
- SOTERIA_BACKUP_DRIVER default longhorn, allowed longhorn, restic
- SOTERIA_LONGHORN_URL default http://longhorn-backend.longhorn-system.svc:9500
- SOTERIA_LONGHORN_BACKUP_MODE default incremental, allowed incremental, full
- SOTERIA_RESTIC_REPOSITORY required for restic driver
- SOTERIA_RESTIC_SECRET_NAME default soteria-restic
- SOTERIA_SECRET_NAMESPACE default service namespace
- SOTERIA_RESTIC_IMAGE default restic/restic:0.16.4
- SOTERIA_RESTIC_BACKUP_ARGS optional extra args for restic backup
- SOTERIA_RESTIC_FORGET_ARGS optional extra args for restic forget
- SOTERIA_S3_ENDPOINT optional S3-compatible endpoint
- SOTERIA_S3_REGION optional region
- SOTERIA_JOB_TTL_SECONDS default 86400
- SOTERIA_JOB_NODE_SELECTOR optional comma-separated key=value list
- SOTERIA_JOB_SERVICE_ACCOUNT optional ServiceAccount for restic Jobs
- SOTERIA_LISTEN_ADDR default :8080
- SOTERIA_AUTH_REQUIRED default false
- SOTERIA_ALLOWED_GROUPS default admin,maintenance
- SOTERIA_AUTH_BEARER_TOKENS optional comma-separated bearer tokens
- SOTERIA_BACKUP_MAX_AGE_HOURS default 24
- SOTERIA_METRICS_REFRESH_SECONDS default 300
- SOTERIA_POLICY_EVAL_SECONDS default 300
- SOTERIA_POLICY_SECRET_NAME default soteria-policies
- SOTERIA_B2_ENABLED default false (auto-enabled if endpoint/secret are set)
- SOTERIA_B2_ENDPOINT optional S3-compatible endpoint (for B2, usually https://s3.<region>.backblazeb2.com)
- SOTERIA_B2_REGION optional region override (auto-inferred for Backblaze endpoint patterns)
- SOTERIA_B2_BUCKETS optional comma-separated bucket allowlist (defaults to scanning all accessible buckets)
- SOTERIA_B2_ACCESS_KEY_ID optional static key (can come from secret instead)
- SOTERIA_B2_SECRET_ACCESS_KEY optional static secret key (can come from secret instead)
- SOTERIA_B2_SECRET_NAMESPACE optional secret namespace (defaults to service namespace when secret name is set)
- SOTERIA_B2_SECRET_NAME optional secret containing B2 keys
- SOTERIA_B2_ACCESS_KEY_FIELD default AWS_ACCESS_KEY_ID
- SOTERIA_B2_SECRET_KEY_FIELD default AWS_SECRET_ACCESS_KEY
- SOTERIA_B2_ENDPOINT_FIELD default AWS_ENDPOINTS
- SOTERIA_B2_SCAN_INTERVAL_SECONDS default 900
- SOTERIA_B2_SCAN_TIMEOUT_SECONDS default 120
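Several of these variables take comma-separated values. As an illustration of the key=value list format accepted by SOTERIA_JOB_NODE_SELECTOR, the sketch below parses such a string into a dictionary. It is not Soteria's parser; the function name, the example labels, and the choice to reject malformed pairs are assumptions.

```python
def parse_node_selector(raw):
    """Parse 'k1=v1,k2=v2' into a dict, rejecting entries without '='."""
    selector = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        selector[key.strip()] = value.strip()
    return selector


# Example labels chosen for illustration only.
selector = parse_node_selector("kubernetes.io/arch=arm64, storage=fast")
```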
Secrets
Create a secret named soteria-restic in the Soteria namespace, or set SOTERIA_RESTIC_SECRET_NAME, when using the restic driver. Required keys:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- RESTIC_PASSWORD
The service copies this secret into the target namespace per job and attaches an owner reference so it is cleaned up with the Job.
For B2 scanning, you can point Soteria at a secret via SOTERIA_B2_SECRET_NAME. Expected keys by default:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_ENDPOINTS (optional if SOTERIA_B2_ENDPOINT is set)
A template is in deploy/secret-example.yaml. Do not commit real credentials.
Deployment
The deploy/ folder includes Kustomize-ready manifests for namespace, RBAC, config, deployment, and service.
Apply with:
kubectl apply -k deploy
The example Service is annotated for Prometheus scraping of /metrics.
Notes
- Longhorn inventory and metrics are based on discovered backup records per PVC.
- Inventory Restore buttons load source context into the restore planner; restore execution happens from the planner panel.
- Scheduled policy execution currently applies to the Longhorn driver.
- Restic backup and restore execution exists, but inventory-style telemetry is currently Longhorn-focused.
- For Atlas production, place Soteria behind an authenticated ingress and trust only proxy-injected auth headers.