Soteria PVC Restore Drill (backup.bstein.dev)

Use this runbook for a minimal, production-safe restore drill after each meaningful Soteria change.

Preconditions

  • The maintenance kustomization is reconciled and healthy in Flux.
  • The soteria and oauth2-proxy-soteria Deployments are ready in the maintenance namespace.
  • The operator account is in the Keycloak group admin or maintenance.
  • The source PVC is not ephemeral or throwaway test storage, which should be excluded from backup policy.
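
The cluster-side preconditions above can be checked from a terminal before opening the UI. This is a minimal sketch, not part of Soteria: it assumes the Flux Kustomization object lives in the flux-system namespace (the Flux default; adjust if this cluster differs), and the function name check_preconditions is hypothetical. Keycloak group membership is not covered here.

```shell
#!/usr/bin/env bash
# Sketch: verify the drill preconditions with kubectl.
# Assumption: the Flux Kustomization "maintenance" lives in flux-system.
FLUX_NS="${FLUX_NS:-flux-system}"

check_preconditions() {
  # The Flux Kustomization must report the Ready condition.
  kubectl -n "$FLUX_NS" wait \
    kustomizations.kustomize.toolkit.fluxcd.io/maintenance \
    --for=condition=Ready --timeout=60s || return 1

  # Both Deployments must finish rolling out in the maintenance namespace.
  local deploy
  for deploy in soteria oauth2-proxy-soteria; do
    kubectl -n maintenance rollout status "deployment/${deploy}" \
      --timeout=60s || return 1
  done
}

# Usage: check_preconditions && echo "preconditions OK"
```

The function stops at the first failing check, so a non-zero exit means the drill should not start yet.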

Operator Flow (UI)

  1. Open https://backup.bstein.dev and sign in through Keycloak.
  2. In PVC Inventory, pick source namespace/PVC.
  3. Click Backup now and wait for a success response in Last Action.
  4. Click Restore, choose a completed backup snapshot, and set:
    • Target namespace: destination namespace (defaults to source)
    • Target PVC name: unique drill PVC name (restore-<source-pvc>-<date>)
  5. Click Create restore PVC.
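
To keep drill PVC names unique and recognizable, the restore-<source-pvc>-<date> pattern from step 4 can be generated rather than typed. The helper below is a hypothetical sketch; the runbook does not fix a date format, so YYYYMMDD in UTC is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: build a drill PVC name of the form restore-<source-pvc>-<date>.
# Assumption: the date is rendered as YYYYMMDD in UTC.
drill_pvc_name() {
  local source_pvc="$1"
  # An explicit second argument overrides today's date.
  local stamp="${2:-$(date -u +%Y%m%d)}"
  printf 'restore-%s-%s\n' "$source_pvc" "$stamp"
}

# Usage: drill_pvc_name <source-pvc>
```

Paste the printed name into the Target PVC name field.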

Verification

  1. Confirm restore target exists:
    • kubectl -n <target-namespace> get pvc <target-pvc>
  2. Confirm backup telemetry is present:
    • kubectl -n monitoring port-forward svc/victoria-metrics-k8s-stack 8428:8428
    • curl -fsS 'http://127.0.0.1:8428/api/v1/query?query=max%20by%20(namespace%2Cpvc)(pvc_backup_age_hours)'
  3. Confirm the alerting input stays healthy; this PromQL expression should return a sample for the source PVC:
    • pvc_backup_health{namespace="<source-namespace>",pvc="<source-pvc>"} == 1
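
The verification steps above can be scripted as follows. This is a sketch under stated assumptions: the port-forward from step 2 is already running, pvc_backup_health is served by the same VictoriaMetrics endpoint, and the function names are hypothetical. The health check looks for a "value":[ fragment in the compact JSON response, so an unexpected response format is reported as unhealthy rather than healthy.

```shell
#!/usr/bin/env bash
# Assumes the port-forward from step 2 is listening on 127.0.0.1:8428.
VM_URL="${VM_URL:-http://127.0.0.1:8428}"

# Run a PromQL instant query; curl URL-encodes the expression.
vm_query() {
  curl -fsS -G "${VM_URL}/api/v1/query" --data-urlencode "query=$1"
}

# Succeeds only if "pvc_backup_health == 1" returns at least one sample.
pvc_backup_healthy() {
  local ns="$1" pvc="$2"
  vm_query "pvc_backup_health{namespace=\"${ns}\",pvc=\"${pvc}\"} == 1" \
    | grep -q '"value":\['
}

# Prints the phase of the restore PVC (for example Bound or Pending).
restore_pvc_phase() {
  kubectl -n "$1" get pvc "$2" -o jsonpath='{.status.phase}'
}

# Usage:
#   restore_pvc_phase <target-namespace> <target-pvc>
#   pvc_backup_healthy <source-namespace> <source-pvc> && echo healthy
```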

Cleanup

  1. Remove drill PVC after validation:
    • kubectl -n <target-namespace> delete pvc <target-pvc>
  2. If a detached Longhorn volume from the restore remains, remove it through the Longhorn UI or API.
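
Because cleanup deletes storage, a small guard can reduce the risk of removing a real PVC by mistake. The sketch below is hypothetical: it refuses any name without the restore- prefix used for drill PVCs, and it assumes Longhorn runs in the longhorn-system namespace (the Longhorn default).

```shell
#!/usr/bin/env bash
# Sketch: delete a drill PVC, refusing names that lack the restore- prefix.
delete_drill_pvc() {
  local ns="$1" pvc="$2"
  case "$pvc" in
    restore-*) ;;
    *)
      echo "refusing to delete '${pvc}': not a restore- drill PVC" >&2
      return 1
      ;;
  esac
  kubectl -n "$ns" delete pvc "$pvc" --wait=true
}

# List Longhorn volumes so leftover detached restore volumes can be spotted.
# Assumption: Longhorn is installed in longhorn-system.
list_longhorn_volumes() {
  kubectl -n "${LONGHORN_NS:-longhorn-system}" get volumes.longhorn.io
}

# Usage: delete_drill_pvc <target-namespace> <target-pvc>
```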

Failure Triage

  • 401/403 on UI/API:
    • Verify oauth2-proxy group claims include admin or maintenance.
  • Restore conflict:
    • The target PVC already exists; pick a new target PVC name.
  • Freshness alert firing (maint-soteria-refresh-stale):
    • Check Soteria pod health and /metrics scrape reachability from monitoring.
  • Unhealthy PVC alert firing (maint-soteria-backup-unhealthy):
    • Inspect pvc_backup_health and pvc_backup_age_hours for stale/missing backup coverage.
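
A first pass at the triage cases above might use the commands below. They are a sketch, not an official Soteria tool; the function names are hypothetical, and only standard kubectl subcommands are used.

```shell
#!/usr/bin/env bash
# Sketch: first-pass triage helpers for the failure cases above.

# Restore conflict: succeeds if the target PVC already exists.
pvc_exists() {
  kubectl -n "$1" get pvc "$2" >/dev/null 2>&1
}

# Alerts firing: show Deployment readiness and recent Soteria logs.
soteria_status() {
  kubectl -n maintenance get deployment soteria oauth2-proxy-soteria
  kubectl -n maintenance logs deployment/soteria --tail=50
}

# Usage:
#   pvc_exists <target-namespace> <target-pvc> && echo "pick a new name"
#   soteria_status
```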