# Soteria PVC Restore Drill (backup.bstein.dev)

Use this runbook for a minimal production-safe restore drill after each meaningful Soteria change.

## Preconditions

- `maintenance` kustomization is reconciled and healthy in Flux.
- `soteria` and `oauth2-proxy-soteria` Deployments are ready in `maintenance`.
- Operator account is in Keycloak group `admin` or `maintenance`.
- Source PVC is not ephemeral/test throwaway storage that should be excluded from backup policy.

## Operator Flow (UI)

1. Open `https://backup.bstein.dev` and sign in through Keycloak.
2. In `PVC Inventory`, pick source namespace/PVC.
3. Click `Backup now` and wait for success response in `Last Action`.
4. Click `Restore`, choose a completed backup snapshot, and set:
   - `Target namespace`: destination namespace (defaults to source)
   - `Target PVC name`: unique drill PVC name (`restore--`)
5. Click `Create restore PVC`.

## Verification

1. Confirm restore target exists:
   - `kubectl -n get pvc `
2. Confirm backup telemetry is present:
   - `kubectl -n monitoring port-forward svc/victoria-metrics-k8s-stack 8428:8428`
   - `curl -fsS 'http://127.0.0.1:8428/api/v1/query?query=max%20by%20(namespace%2Cpvc)(pvc_backup_age_hours)'`
3. Confirm alerting input stays healthy:
   - `pvc_backup_health{namespace="",pvc=""} == 1`

## Cleanup

1. Remove drill PVC after validation:
   - `kubectl -n delete pvc `
2. If a detached restore Longhorn volume remains, remove it in Longhorn UI/API.

## Failure Triage

- `401/403` on UI/API:
  - Verify oauth2-proxy group claims include `admin` or `maintenance`.
- Restore conflict:
  - Target PVC already exists; pick a new target PVC name.
- Freshness alert firing (`maint-soteria-refresh-stale`):
  - Check Soteria pod health and `/metrics` scrape reachability from `monitoring`.
- Unhealthy PVC alert firing (`maint-soteria-backup-unhealthy`):
  - Inspect `pvc_backup_health` and `pvc_backup_age_hours` for stale/missing backup coverage.
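
The checks in Verification can also be wrapped in a few shell helpers so a drill runs the same way each time. The following is a minimal sketch, not part of Soteria itself. The function names (`health_query`, `vm_query`, `verify_drill`) and the `VM_URL` variable are assumptions made for illustration, and the sketch assumes the `kubectl -n monitoring port-forward` from Verification step 2 is already running. It only prints the raw query response; reading the value (expected `1` for a healthy PVC) is left to the operator.

```shell
#!/bin/sh
# Sketch only: VM_URL and all function names here are illustrative assumptions.
# The default matches the local port-forward from Verification step 2.
VM_URL="${VM_URL:-http://127.0.0.1:8428}"

# Print the pvc_backup_health selector from Verification step 3
# for one namespace/PVC pair.
health_query() {
  printf 'pvc_backup_health{namespace="%s",pvc="%s"}' "$1" "$2"
}

# Run an instant query against /api/v1/query.
# curl -G with --data-urlencode percent-encodes the PromQL expression.
vm_query() {
  curl -fsS -G --data-urlencode "query=$1" "$VM_URL/api/v1/query"
}

# Confirm the restore target exists, then print the raw health series
# for the source PVC.
# Args: target namespace, target PVC, source namespace, source PVC.
verify_drill() {
  kubectl -n "$1" get pvc "$2" || return 1
  vm_query "$(health_query "$3" "$4")"
}
```

With hypothetical names, a call looks like `verify_drill demo restore-data-drill demo data`. Letting curl do the encoding avoids hand-escaping spaces and commas as in the literal URL shown in Verification step 2.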