backup: add pvc inventory, restore UI, and metrics baseline

This commit is contained in:
Brad Stein 2026-04-12 11:09:49 -03:00
parent b54df95db8
commit 09ab3ec889
14 changed files with 1662 additions and 160 deletions

187
README.md
View File

@ -1,16 +1,36 @@
# soteria
Soteria is a small in-cluster service that orchestrates Longhorn backups for PVCs. It is intended to be called by Ariadne (or another controller) and focuses on:
Soteria is an in-cluster service for PVC backup and restore operations. The current production baseline focuses on Longhorn-backed PVCs and provides:
- Longhorn-managed backups to an S3-compatible backend (Backblaze B2 by default).
- On-demand restore tests into a target PVC.
- Minimal long-running footprint (the backup work happens in Longhorn).
- Namespace-grouped PVC inventory for backup and restore selection.
- On-demand backup creation for Longhorn volumes.
- Restore into a new target PVC with conflict checks and best-effort cleanup on failure.
- A simple built-in UI suitable for publishing behind an authenticated ingress.
- Prometheus-format backup freshness telemetry for Grafana rollups.
Snapshots are managed by Longhorn; backups are crash-consistent for the PVC as mounted.
For Longhorn, backups are crash-consistent at the volume level and delegated to the Longhorn control plane.
## API
## Endpoints
### POST /v1/backup (Longhorn)
Public endpoints:
- `GET /healthz`
- `GET /readyz`
- `GET /metrics`
Protected endpoints when `SOTERIA_AUTH_REQUIRED=true`:
- `GET /` UI console
- `GET /v1/whoami`
- `GET /v1/inventory`
- `GET /v1/backups?namespace=<ns>&pvc=<name>`
- `POST /v1/backup`
- `POST /v1/restores`
- `POST /v1/restore-test` legacy alias for `/v1/restores`
## API examples
### POST /v1/backup
```json
{
@ -21,78 +41,149 @@ Snapshots are managed by Longhorn; backups are crash-consistent for the PVC as m
}
```
Response:
Longhorn response:
```json
{
"job_name": "soteria-backup-llm-cache-20260131-013001",
"driver": "longhorn",
"volume": "pvc-1234abcd",
"backup": "soteria-backup-ai-llm-cache-20260412-153000",
"namespace": "ai",
"secret": "soteria-soteria-backup-llm-cache-20260131-013001-restic",
"requested_by": "brad",
"dry_run": false
}
```
### POST /v1/restore-test (Longhorn)
### GET /v1/inventory
Response shape:
```json
{
"generated_at": "2026-04-12T15:30:00Z",
"namespaces": [
{
"name": "ai",
"pvcs": [
{
"namespace": "ai",
"pvc": "llm-cache",
"volume": "pvc-1234abcd",
"storage_class": "longhorn",
"capacity": "50Gi",
"driver": "longhorn",
"last_backup_at": "2026-04-12T14:55:00Z",
"last_backup_age_hours": 0.58,
"backup_count": 14,
"healthy": true,
"health_reason": "fresh"
}
]
}
]
}
```
### GET /v1/backups
```text
/v1/backups?namespace=ai&pvc=llm-cache
```
Returns the resolved volume name and backup records so the UI or automation can select a restore source.
### POST /v1/restores
```json
{
"namespace": "ai",
"pvc": "llm-cache",
"snapshot": "latest",
"pvc": "ollama-models",
"target_pvc": "restore-sandbox",
"target_namespace": "ai",
"target_pvc": "restore-llm-cache",
"dry_run": false
}
```
Notes:
- `pvc` is required to resolve the Longhorn volume and locate the latest backup.
- `snapshot` may be `latest` or a specific backup/snapshot name. You can also pass `backup_url`.
- `namespace` and `pvc` identify the source PVC.
- `target_pvc` is required.
- `target_namespace` defaults to `namespace`.
- Soteria refuses to overwrite an existing target PVC.
- If Longhorn volume creation succeeds but PVC creation fails, Soteria attempts to delete the just-created restore volume.
- You may provide `backup_url` directly instead of `snapshot`.
## Authentication and authorization
When `SOTERIA_AUTH_REQUIRED=true`, Soteria expects trusted auth headers from a fronting proxy such as `oauth2-proxy`:
- `X-Auth-Request-User`
- `X-Auth-Request-Email`
- `X-Auth-Request-Groups`
Allowed groups are configured with `SOTERIA_ALLOWED_GROUPS` and compared after normalizing leading `/` prefixes, so both `maintenance` and `/maintenance` are accepted.
Optional machine-to-machine access can be enabled with `SOTERIA_AUTH_BEARER_TOKENS`, which accepts a comma-separated list of bearer tokens.
## Prometheus metrics
Soteria exports Prometheus-format metrics at `GET /metrics`.
Implemented metrics:
- `soteria_backup_requests_total{driver,result}`
- `soteria_restore_requests_total{driver,result}`
- `soteria_authz_denials_total{reason}`
- `soteria_inventory_refresh_failures_total`
- `soteria_inventory_refresh_timestamp_seconds`
- `pvc_backup_age_hours{namespace,pvc,volume,driver}`
- `pvc_backup_health{namespace,pvc,volume,driver}`
- `pvc_backup_last_success_timestamp_seconds{namespace,pvc,volume,driver}`
- `pvc_backup_count{namespace,pvc,volume,driver}`
`pvc_backup_health` is `1` when the most recent successful backup is within `SOTERIA_BACKUP_MAX_AGE_HOURS`, otherwise `0`.
## Configuration
Environment variables:
- `SOTERIA_BACKUP_DRIVER` (default: `longhorn`, allowed: `longhorn`, `restic`)
- `SOTERIA_LONGHORN_URL` (default: `http://longhorn-backend.longhorn-system.svc:9500`)
- `SOTERIA_LONGHORN_BACKUP_MODE` (default: `incremental`, allowed: `incremental`, `full`)
- `SOTERIA_RESTIC_REPOSITORY` (required for restic driver) Example: `s3:s3.us-west-004.backblazeb2.com/atlas-backups`
- `SOTERIA_RESTIC_SECRET_NAME` (default: `soteria-restic`)
- `SOTERIA_SECRET_NAMESPACE` (default: service namespace)
- `SOTERIA_RESTIC_IMAGE` (default: `restic/restic:0.16.4`)
- `SOTERIA_RESTIC_BACKUP_ARGS` (optional) Extra args for `restic backup`
- `SOTERIA_RESTIC_FORGET_ARGS` (optional) Extra args for `restic forget` (include `--prune` if desired)
- `SOTERIA_S3_ENDPOINT` (optional) Example: `s3.us-west-004.backblazeb2.com`
- `SOTERIA_S3_REGION` (optional) Example: `us-west-004`
- `SOTERIA_JOB_TTL_SECONDS` (default: 86400)
- `SOTERIA_JOB_NODE_SELECTOR` (optional) Comma-separated node selector, e.g. `kubernetes.io/arch=arm64,node-role.kubernetes.io/worker=true`
- `SOTERIA_JOB_SERVICE_ACCOUNT` (optional) ServiceAccount for backup Jobs
- `SOTERIA_LISTEN_ADDR` (default: `:8080`)
The restic repository is encrypted with `RESTIC_PASSWORD` from the secret below.
- `SOTERIA_BACKUP_DRIVER` default `longhorn`, allowed `longhorn`, `restic`
- `SOTERIA_LONGHORN_URL` default `http://longhorn-backend.longhorn-system.svc:9500`
- `SOTERIA_LONGHORN_BACKUP_MODE` default `incremental`, allowed `incremental`, `full`
- `SOTERIA_RESTIC_REPOSITORY` required for restic driver
- `SOTERIA_RESTIC_SECRET_NAME` default `soteria-restic`
- `SOTERIA_SECRET_NAMESPACE` default service namespace
- `SOTERIA_RESTIC_IMAGE` default `restic/restic:0.16.4`
- `SOTERIA_RESTIC_BACKUP_ARGS` optional extra args for `restic backup`
- `SOTERIA_RESTIC_FORGET_ARGS` optional extra args for `restic forget`
- `SOTERIA_S3_ENDPOINT` optional S3-compatible endpoint
- `SOTERIA_S3_REGION` optional region
- `SOTERIA_JOB_TTL_SECONDS` default `86400`
- `SOTERIA_JOB_NODE_SELECTOR` optional comma-separated `key=value` list
- `SOTERIA_JOB_SERVICE_ACCOUNT` optional ServiceAccount for restic Jobs
- `SOTERIA_LISTEN_ADDR` default `:8080`
- `SOTERIA_AUTH_REQUIRED` default `false`
- `SOTERIA_ALLOWED_GROUPS` default `admin,maintenance`
- `SOTERIA_AUTH_BEARER_TOKENS` optional comma-separated bearer tokens
- `SOTERIA_BACKUP_MAX_AGE_HOURS` default `24`
- `SOTERIA_METRICS_REFRESH_SECONDS` default `300`
## Secrets
Create a secret named `soteria-restic` in the Soteria namespace (or set `SOTERIA_RESTIC_SECRET_NAME`) if using the restic driver. Keys required:
Create a secret named `soteria-restic` in the Soteria namespace, or set `SOTERIA_RESTIC_SECRET_NAME`, when using the restic driver. Required keys:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `RESTIC_PASSWORD`
The service copies this secret into the target namespace per job and attaches an owner reference so it gets cleaned up with the Job.
The service copies this secret into the target namespace per job and attaches an owner reference so it is cleaned up with the Job.
A template is in `deploy/secret-example.yaml` (do not commit real credentials).
A template is in `deploy/secret-example.yaml`. Do not commit real credentials.
## Deployment
The `deploy/` folder includes Kustomize-ready manifests:
- `namespace.yaml`
- `configmap.yaml` (set your repository and endpoint)
- `serviceaccount.yaml`
- `clusterrole.yaml`
- `clusterrolebinding.yaml`
- `deployment.yaml`
- `service.yaml`
The `deploy/` folder includes Kustomize-ready manifests for namespace, RBAC, config, deployment, and service.
Apply with:
@ -100,8 +191,10 @@ Apply with:
kubectl apply -k deploy
```
The example Service is annotated for Prometheus scraping of `/metrics`.
## Notes
- Backups mount the PVC read-only at `/data` and run `restic backup /data`.
- Restore tests write into `/restore` (either an emptyDir or a target PVC).
- For Backblaze B2, use the S3 endpoint and region for your bucket (example: `s3.us-west-004.backblazeb2.com`).
- Longhorn inventory and metrics are based on discovered backup records per PVC.
- Restic backup and restore execution exists, but inventory-style telemetry is currently Longhorn-focused.
- For Atlas production, place Soteria behind an authenticated ingress and trust only proxy-injected auth headers.

View File

@ -5,7 +5,6 @@ import (
"errors"
"log"
"net/http"
"os"
"os/signal"
"syscall"
"time"
@ -18,6 +17,8 @@ import (
func main() {
log.SetFlags(log.LstdFlags | log.LUTC)
runCtx, stop := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
defer stop()
cfg, err := config.Load()
if err != nil {
@ -31,6 +32,7 @@ func main() {
longhornClient := longhorn.New(cfg.LonghornURL)
srv := server.New(cfg, client, longhornClient)
srv.Start(runCtx)
httpServer := &http.Server{
Addr: cfg.ListenAddr,
Handler: srv.Handler(),
@ -46,12 +48,9 @@ func main() {
errCh <- httpServer.ListenAndServe()
}()
signalCh := make(chan os.Signal, 1)
signal.Notify(signalCh, syscall.SIGINT, syscall.SIGTERM)
select {
case sig := <-signalCh:
log.Printf("shutdown signal: %s", sig)
case <-runCtx.Done():
log.Printf("shutdown signal: %v", runCtx.Err())
case err := <-errCh:
if err != nil && !errors.Is(err, http.ErrServerClosed) {
log.Fatalf("server error: %v", err)

View File

@ -5,9 +5,12 @@ metadata:
labels:
app.kubernetes.io/name: soteria
rules:
- apiGroups: [""]
resources: ["persistentvolumeclaims", "persistentvolumes"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "list", "create", "update"]
verbs: ["get", "list", "create", "update", "delete"]
- apiGroups: ["batch"]
resources: ["jobs"]
verbs: ["get", "list", "create"]

View File

@ -9,3 +9,7 @@ data:
SOTERIA_BACKUP_DRIVER: "longhorn"
SOTERIA_LONGHORN_URL: "http://longhorn-backend.longhorn-system.svc:9500"
SOTERIA_LONGHORN_BACKUP_MODE: "incremental"
SOTERIA_AUTH_REQUIRED: "false"
SOTERIA_ALLOWED_GROUPS: "admin,maintenance"
SOTERIA_BACKUP_MAX_AGE_HOURS: "24"
SOTERIA_METRICS_REFRESH_SECONDS: "300"

View File

@ -5,6 +5,10 @@ metadata:
labels:
app.kubernetes.io/name: soteria
app.kubernetes.io/component: api
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "80"
prometheus.io/path: "/metrics"
spec:
type: ClusterIP
selector:

View File

@ -9,31 +9,86 @@ type BackupRequest struct {
}
type BackupResponse struct {
Driver string `json:"driver,omitempty"`
Volume string `json:"volume,omitempty"`
Backup string `json:"backup,omitempty"`
JobName string `json:"job_name,omitempty"`
Namespace string `json:"namespace,omitempty"`
Secret string `json:"secret,omitempty"`
DryRun bool `json:"dry_run"`
Driver string `json:"driver,omitempty"`
Volume string `json:"volume,omitempty"`
Backup string `json:"backup,omitempty"`
JobName string `json:"job_name,omitempty"`
Namespace string `json:"namespace,omitempty"`
Secret string `json:"secret,omitempty"`
RequestedBy string `json:"requested_by,omitempty"`
DryRun bool `json:"dry_run"`
}
type RestoreTestRequest struct {
Namespace string `json:"namespace"`
PVC string `json:"pvc,omitempty"`
Snapshot string `json:"snapshot,omitempty"`
BackupURL string `json:"backup_url,omitempty"`
TargetPVC string `json:"target_pvc,omitempty"`
DryRun bool `json:"dry_run"`
Namespace string `json:"namespace"`
PVC string `json:"pvc,omitempty"`
Snapshot string `json:"snapshot,omitempty"`
BackupURL string `json:"backup_url,omitempty"`
TargetNamespace string `json:"target_namespace,omitempty"`
TargetPVC string `json:"target_pvc,omitempty"`
DryRun bool `json:"dry_run"`
}
type RestoreTestResponse struct {
Driver string `json:"driver,omitempty"`
Volume string `json:"volume,omitempty"`
TargetPVC string `json:"target_pvc,omitempty"`
BackupURL string `json:"backup_url,omitempty"`
JobName string `json:"job_name,omitempty"`
Namespace string `json:"namespace,omitempty"`
Secret string `json:"secret,omitempty"`
DryRun bool `json:"dry_run"`
Driver string `json:"driver,omitempty"`
Volume string `json:"volume,omitempty"`
TargetNamespace string `json:"target_namespace,omitempty"`
TargetPVC string `json:"target_pvc,omitempty"`
BackupURL string `json:"backup_url,omitempty"`
JobName string `json:"job_name,omitempty"`
Namespace string `json:"namespace,omitempty"`
Secret string `json:"secret,omitempty"`
RequestedBy string `json:"requested_by,omitempty"`
DryRun bool `json:"dry_run"`
}
type InventoryResponse struct {
GeneratedAt string `json:"generated_at"`
Namespaces []NamespaceInventory `json:"namespaces"`
}
type NamespaceInventory struct {
Name string `json:"name"`
PVCs []PVCInventory `json:"pvcs"`
}
type PVCInventory struct {
Namespace string `json:"namespace"`
PVC string `json:"pvc"`
Volume string `json:"volume,omitempty"`
Phase string `json:"phase,omitempty"`
StorageClass string `json:"storage_class,omitempty"`
Capacity string `json:"capacity,omitempty"`
AccessModes []string `json:"access_modes,omitempty"`
Driver string `json:"driver,omitempty"`
LastBackupAt string `json:"last_backup_at,omitempty"`
LastBackupAgeHours float64 `json:"last_backup_age_hours,omitempty"`
BackupCount int `json:"backup_count"`
Healthy bool `json:"healthy"`
HealthReason string `json:"health_reason,omitempty"`
Error string `json:"error,omitempty"`
}
type BackupListResponse struct {
Namespace string `json:"namespace"`
PVC string `json:"pvc"`
Volume string `json:"volume"`
Backups []BackupRecord `json:"backups"`
}
type BackupRecord struct {
Name string `json:"name"`
SnapshotName string `json:"snapshot_name,omitempty"`
Created string `json:"created,omitempty"`
State string `json:"state,omitempty"`
URL string `json:"url,omitempty"`
Size string `json:"size,omitempty"`
Latest bool `json:"latest,omitempty"`
}
type AuthInfoResponse struct {
Authenticated bool `json:"authenticated"`
User string `json:"user,omitempty"`
Email string `json:"email,omitempty"`
Groups []string `json:"groups,omitempty"`
}

View File

@ -5,36 +5,45 @@ import (
"os"
"strconv"
"strings"
"time"
)
const (
defaultBackupDriver = "longhorn"
defaultResticImage = "restic/restic:0.16.4"
defaultJobTTLSeconds = 86400
defaultListenAddr = ":8080"
defaultResticSecret = "soteria-restic"
defaultLonghornURL = "http://longhorn-backend.longhorn-system.svc:9500"
defaultLonghornMode = "incremental"
serviceNamespacePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
defaultBackupDriver = "longhorn"
defaultResticImage = "restic/restic:0.16.4"
defaultJobTTLSeconds = 86400
defaultListenAddr = ":8080"
defaultResticSecret = "soteria-restic"
defaultLonghornURL = "http://longhorn-backend.longhorn-system.svc:9500"
defaultLonghornMode = "incremental"
defaultAllowedGroups = "admin,maintenance"
defaultMetricsRefresh = 300 * time.Second
defaultBackupMaxAge = 24 * time.Hour
serviceNamespacePath = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"
)
type Config struct {
Namespace string
SecretNamespace string
BackupDriver string
ResticImage string
ResticRepository string
ResticSecretName string
ResticBackupArgs []string
ResticForgetArgs []string
S3Endpoint string
S3Region string
JobTTLSeconds int32
JobNodeSelector map[string]string
WorkerServiceAccount string
ListenAddr string
LonghornURL string
LonghornBackupMode string
Namespace string
SecretNamespace string
BackupDriver string
ResticImage string
ResticRepository string
ResticSecretName string
ResticBackupArgs []string
ResticForgetArgs []string
S3Endpoint string
S3Region string
JobTTLSeconds int32
JobNodeSelector map[string]string
WorkerServiceAccount string
ListenAddr string
LonghornURL string
LonghornBackupMode string
AuthRequired bool
AllowedGroups []string
AuthBearerTokens []string
MetricsRefreshInterval time.Duration
BackupMaxAge time.Duration
}
func Load() (*Config, error) {
@ -67,6 +76,11 @@ func Load() (*Config, error) {
cfg.JobNodeSelector = parseNodeSelector(getenv("SOTERIA_JOB_NODE_SELECTOR"))
cfg.LonghornURL = getenvDefault("SOTERIA_LONGHORN_URL", defaultLonghornURL)
cfg.LonghornBackupMode = getenvDefault("SOTERIA_LONGHORN_BACKUP_MODE", defaultLonghornMode)
cfg.AuthRequired = getenvBool("SOTERIA_AUTH_REQUIRED")
cfg.AllowedGroups = parseCSV(getenvDefault("SOTERIA_ALLOWED_GROUPS", defaultAllowedGroups))
cfg.AuthBearerTokens = parseCSV(getenv("SOTERIA_AUTH_BEARER_TOKENS"))
cfg.MetricsRefreshInterval = defaultMetricsRefresh
cfg.BackupMaxAge = defaultBackupMaxAge
if ttl, ok := getenvInt("SOTERIA_JOB_TTL_SECONDS"); ok {
cfg.JobTTLSeconds = int32(ttl)
@ -74,6 +88,13 @@ func Load() (*Config, error) {
cfg.JobTTLSeconds = defaultJobTTLSeconds
}
if seconds, ok := getenvInt("SOTERIA_METRICS_REFRESH_SECONDS"); ok {
cfg.MetricsRefreshInterval = time.Duration(seconds) * time.Second
}
if hours, ok := getenvFloat("SOTERIA_BACKUP_MAX_AGE_HOURS"); ok {
cfg.BackupMaxAge = time.Duration(hours * float64(time.Hour))
}
if cfg.ResticRepository == "" {
if cfg.BackupDriver == "restic" {
return nil, errors.New("SOTERIA_RESTIC_REPOSITORY is required for restic driver")
@ -106,6 +127,12 @@ func Load() (*Config, error) {
return nil, errors.New("SOTERIA_LONGHORN_BACKUP_MODE must be incremental or full")
}
}
if cfg.MetricsRefreshInterval <= 0 {
return nil, errors.New("SOTERIA_METRICS_REFRESH_SECONDS must be greater than zero")
}
if cfg.BackupMaxAge <= 0 {
return nil, errors.New("SOTERIA_BACKUP_MAX_AGE_HOURS must be greater than zero")
}
return cfg, nil
}
@ -133,6 +160,28 @@ func getenvInt(key string) (int, bool) {
return parsed, true
}
func getenvFloat(key string) (float64, bool) {
val := getenv(key)
if val == "" {
return 0, false
}
parsed, err := strconv.ParseFloat(val, 64)
if err != nil {
return 0, false
}
return parsed, true
}
func getenvBool(key string) bool {
val := strings.ToLower(getenv(key))
switch val {
case "1", "true", "yes", "on":
return true
default:
return false
}
}
func parseNodeSelector(raw string) map[string]string {
raw = strings.TrimSpace(raw)
if raw == "" {
@ -160,3 +209,19 @@ func parseNodeSelector(raw string) map[string]string {
return selector
}
func parseCSV(raw string) []string {
if strings.TrimSpace(raw) == "" {
return nil
}
parts := strings.Split(raw, ",")
values := make([]string, 0, len(parts))
for _, part := range parts {
part = strings.TrimSpace(part)
if part == "" {
continue
}
values = append(values, part)
}
return values
}

View File

@ -3,11 +3,23 @@ package k8s
import (
"context"
"fmt"
"sort"
corev1 "k8s.io/api/core/v1"
apierrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
type PVCSummary struct {
Namespace string
Name string
VolumeName string
Phase string
StorageClass string
Capacity string
AccessModes []string
}
func (c *Client) ResolvePVCVolume(ctx context.Context, namespace, pvcName string) (string, *corev1.PersistentVolumeClaim, *corev1.PersistentVolume, error) {
pvc, err := c.Clientset.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, pvcName, metav1.GetOptions{})
if err != nil {
@ -24,3 +36,64 @@ func (c *Client) ResolvePVCVolume(ctx context.Context, namespace, pvcName string
return pvc.Spec.VolumeName, pvc, pv, nil
}
func (c *Client) ListBoundPVCs(ctx context.Context) ([]PVCSummary, error) {
list, err := c.Clientset.CoreV1().PersistentVolumeClaims("").List(ctx, metav1.ListOptions{})
if err != nil {
return nil, fmt.Errorf("list pvcs: %w", err)
}
items := make([]PVCSummary, 0, len(list.Items))
for _, pvc := range list.Items {
if pvc.Spec.VolumeName == "" || pvc.Status.Phase != corev1.ClaimBound {
continue
}
storageClass := ""
if pvc.Spec.StorageClassName != nil {
storageClass = *pvc.Spec.StorageClassName
}
capacity := ""
if quantity, ok := pvc.Status.Capacity[corev1.ResourceStorage]; ok {
capacity = quantity.String()
} else if quantity, ok := pvc.Spec.Resources.Requests[corev1.ResourceStorage]; ok {
capacity = quantity.String()
}
accessModes := make([]string, 0, len(pvc.Spec.AccessModes))
for _, mode := range pvc.Spec.AccessModes {
accessModes = append(accessModes, string(mode))
}
items = append(items, PVCSummary{
Namespace: pvc.Namespace,
Name: pvc.Name,
VolumeName: pvc.Spec.VolumeName,
Phase: string(pvc.Status.Phase),
StorageClass: storageClass,
Capacity: capacity,
AccessModes: accessModes,
})
}
sort.Slice(items, func(i, j int) bool {
if items[i].Namespace == items[j].Namespace {
return items[i].Name < items[j].Name
}
return items[i].Namespace < items[j].Namespace
})
return items, nil
}
func (c *Client) PersistentVolumeClaimExists(ctx context.Context, namespace, pvcName string) (bool, error) {
_, err := c.Clientset.CoreV1().PersistentVolumeClaims(namespace).Get(ctx, pvcName, metav1.GetOptions{})
if err == nil {
return true, nil
}
if apierrors.IsNotFound(err) {
return false, nil
}
return false, fmt.Errorf("get pvc %s/%s: %w", namespace, pvcName, err)
}

View File

@ -142,6 +142,11 @@ func (c *Client) CreatePVC(ctx context.Context, volumeName, namespace, pvcName s
return c.doJSON(ctx, http.MethodPost, path, input, nil)
}
func (c *Client) DeleteVolume(ctx context.Context, volumeName string) error {
path := fmt.Sprintf("%s/v1/volumes/%s", c.baseURL, url.PathEscape(volumeName))
return c.doJSON(ctx, http.MethodDelete, path, nil, nil)
}
func (c *Client) GetBackupVolume(ctx context.Context, volumeName string) (*BackupVolume, error) {
path := fmt.Sprintf("%s/v1/backupvolumes/%s", c.baseURL, url.PathEscape(volumeName))
var out BackupVolume

203
internal/server/metrics.go Normal file
View File

@ -0,0 +1,203 @@
package server
import (
"fmt"
"net/http"
"sort"
"strings"
"sync"
"time"
"scm.bstein.dev/bstein/soteria/internal/api"
)
type metricSample struct {
labels map[string]string
value float64
}
type telemetry struct {
mu sync.RWMutex
backupRequests map[string]metricSample
restoreRequests map[string]metricSample
authzDenials map[string]metricSample
inventoryRefreshFailure float64
inventoryRefreshTime float64
pvcBackupAgeHours map[string]metricSample
pvcBackupHealth map[string]metricSample
pvcBackupLastSuccess map[string]metricSample
pvcBackupCount map[string]metricSample
}
func newTelemetry() *telemetry {
return &telemetry{
backupRequests: map[string]metricSample{},
restoreRequests: map[string]metricSample{},
authzDenials: map[string]metricSample{},
pvcBackupAgeHours: map[string]metricSample{},
pvcBackupHealth: map[string]metricSample{},
pvcBackupLastSuccess: map[string]metricSample{},
pvcBackupCount: map[string]metricSample{},
}
}
func (t *telemetry) Handler() http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
_, _ = w.Write([]byte(t.render()))
})
}
func (t *telemetry) RecordBackupRequest(driver, result string) {
t.mu.Lock()
defer t.mu.Unlock()
incMetric(t.backupRequests, map[string]string{"driver": driver, "result": result})
}
func (t *telemetry) RecordRestoreRequest(driver, result string) {
t.mu.Lock()
defer t.mu.Unlock()
incMetric(t.restoreRequests, map[string]string{"driver": driver, "result": result})
}
func (t *telemetry) RecordAuthzDenied(reason string) {
t.mu.Lock()
defer t.mu.Unlock()
incMetric(t.authzDenials, map[string]string{"reason": reason})
}
func (t *telemetry) RecordInventoryFailure() {
t.mu.Lock()
defer t.mu.Unlock()
t.inventoryRefreshFailure++
}
func (t *telemetry) RecordInventory(inv api.InventoryResponse) {
t.mu.Lock()
defer t.mu.Unlock()
t.pvcBackupAgeHours = map[string]metricSample{}
t.pvcBackupHealth = map[string]metricSample{}
t.pvcBackupLastSuccess = map[string]metricSample{}
t.pvcBackupCount = map[string]metricSample{}
for _, namespace := range inv.Namespaces {
for _, pvc := range namespace.PVCs {
labels := map[string]string{
"namespace": pvc.Namespace,
"pvc": pvc.PVC,
"volume": pvc.Volume,
"driver": pvc.Driver,
}
setMetric(t.pvcBackupCount, labels, float64(pvc.BackupCount))
if pvc.Healthy {
setMetric(t.pvcBackupHealth, labels, 1)
} else {
setMetric(t.pvcBackupHealth, labels, 0)
}
if pvc.LastBackupAt == "" {
continue
}
setMetric(t.pvcBackupAgeHours, labels, pvc.LastBackupAgeHours)
if ts, ok := parseBackupTime(pvc.LastBackupAt); ok {
setMetric(t.pvcBackupLastSuccess, labels, float64(ts.Unix()))
}
}
}
t.inventoryRefreshTime = float64(time.Now().Unix())
}
func (t *telemetry) render() string {
t.mu.RLock()
defer t.mu.RUnlock()
var b strings.Builder
writeMetricFamily(&b, "soteria_backup_requests_total", "counter", "Backup requests handled by Soteria.", metricValues(t.backupRequests))
writeMetricFamily(&b, "soteria_restore_requests_total", "counter", "Restore requests handled by Soteria.", metricValues(t.restoreRequests))
writeMetricFamily(&b, "soteria_authz_denials_total", "counter", "Authorization denials emitted by Soteria.", metricValues(t.authzDenials))
writeMetricFamily(&b, "soteria_inventory_refresh_failures_total", "counter", "Inventory refresh failures while computing PVC backup telemetry.", []metricSample{{value: t.inventoryRefreshFailure}})
writeMetricFamily(&b, "soteria_inventory_refresh_timestamp_seconds", "gauge", "Unix timestamp of the last successful inventory refresh.", []metricSample{{value: t.inventoryRefreshTime}})
writeMetricFamily(&b, "pvc_backup_age_hours", "gauge", "Age in hours of the latest successful PVC backup known to Soteria.", metricValues(t.pvcBackupAgeHours))
writeMetricFamily(&b, "pvc_backup_health", "gauge", "PVC backup health according to Soteria: 1=fresh backup within policy, 0=missing/stale/error.", metricValues(t.pvcBackupHealth))
writeMetricFamily(&b, "pvc_backup_last_success_timestamp_seconds", "gauge", "Unix timestamp of the latest successful PVC backup known to Soteria.", metricValues(t.pvcBackupLastSuccess))
writeMetricFamily(&b, "pvc_backup_count", "gauge", "Count of backup records discovered for a PVC.", metricValues(t.pvcBackupCount))
return b.String()
}
func metricValues(source map[string]metricSample) []metricSample {
keys := make([]string, 0, len(source))
for key := range source {
keys = append(keys, key)
}
sort.Strings(keys)
values := make([]metricSample, 0, len(keys))
for _, key := range keys {
values = append(values, source[key])
}
return values
}
func writeMetricFamily(b *strings.Builder, name, metricType, help string, samples []metricSample) {
b.WriteString("# HELP ")
b.WriteString(name)
b.WriteString(" ")
b.WriteString(help)
b.WriteString("\n")
b.WriteString("# TYPE ")
b.WriteString(name)
b.WriteString(" ")
b.WriteString(metricType)
b.WriteString("\n")
for _, sample := range samples {
b.WriteString(name)
b.WriteString(renderLabels(sample.labels))
b.WriteString(" ")
b.WriteString(fmt.Sprintf("%g", sample.value))
b.WriteString("\n")
}
}
func renderLabels(labels map[string]string) string {
if len(labels) == 0 {
return ""
}
keys := make([]string, 0, len(labels))
for key := range labels {
keys = append(keys, key)
}
sort.Strings(keys)
parts := make([]string, 0, len(keys))
for _, key := range keys {
parts = append(parts, fmt.Sprintf("%s=%q", key, labels[key]))
}
return "{" + strings.Join(parts, ",") + "}"
}
func metricKey(labels map[string]string) string {
return renderLabels(labels)
}
func incMetric(target map[string]metricSample, labels map[string]string) {
key := metricKey(labels)
sample, ok := target[key]
if !ok {
target[key] = metricSample{labels: cloneLabels(labels), value: 1}
return
}
sample.value++
target[key] = sample
}
func setMetric(target map[string]metricSample, labels map[string]string, value float64) {
key := metricKey(labels)
target[key] = metricSample{labels: cloneLabels(labels), value: value}
}
func cloneLabels(labels map[string]string) map[string]string {
out := make(map[string]string, len(labels))
for key, value := range labels {
out[key] = value
}
return out
}

View File

@ -1,9 +1,13 @@
package server
import (
"context"
"encoding/json"
"fmt"
"log"
"math"
"net/http"
"sort"
"strings"
"time"
@ -11,43 +15,196 @@ import (
"scm.bstein.dev/bstein/soteria/internal/config"
"scm.bstein.dev/bstein/soteria/internal/k8s"
"scm.bstein.dev/bstein/soteria/internal/longhorn"
corev1 "k8s.io/api/core/v1"
)
type kubeClient interface {
ResolvePVCVolume(ctx context.Context, namespace, pvcName string) (string, *corev1.PersistentVolumeClaim, *corev1.PersistentVolume, error)
CreateBackupJob(ctx context.Context, cfg *config.Config, req api.BackupRequest) (string, string, error)
CreateRestoreJob(ctx context.Context, cfg *config.Config, req api.RestoreTestRequest) (string, string, error)
ListBoundPVCs(ctx context.Context) ([]k8s.PVCSummary, error)
PersistentVolumeClaimExists(ctx context.Context, namespace, pvcName string) (bool, error)
}
type longhornClient interface {
SnapshotBackup(ctx context.Context, volume, name string, labels map[string]string, backupMode string) (*longhorn.Volume, error)
GetVolume(ctx context.Context, volume string) (*longhorn.Volume, error)
CreateVolumeFromBackup(ctx context.Context, name, size string, replicas int, backupURL string) (*longhorn.Volume, error)
CreatePVC(ctx context.Context, volumeName, namespace, pvcName string) error
DeleteVolume(ctx context.Context, volumeName string) error
FindBackup(ctx context.Context, volumeName, snapshot string) (*longhorn.Backup, error)
ListBackups(ctx context.Context, volumeName string) ([]longhorn.Backup, error)
}
type Server struct {
cfg *config.Config
client *k8s.Client
longhorn *longhorn.Client
mux *http.ServeMux
client kubeClient
longhorn longhornClient
metrics *telemetry
handler http.Handler
}
type authIdentity struct {
Authenticated bool
User string
Email string
Groups []string
}
type ctxKey string
const authContextKey ctxKey = "soteria-auth"
func New(cfg *config.Config, client *k8s.Client, lh *longhorn.Client) *Server {
s := &Server{
cfg: cfg,
client: client,
longhorn: lh,
mux: http.NewServeMux(),
metrics: newTelemetry(),
}
s.mux.HandleFunc("/healthz", s.handleHealth)
s.mux.HandleFunc("/readyz", s.handleReady)
s.mux.HandleFunc("/v1/backup", s.handleBackup)
s.mux.HandleFunc("/v1/restore-test", s.handleRestore)
s.handler = http.HandlerFunc(s.route)
return s
}
func (s *Server) Handler() http.Handler {
return s.mux
func (s *Server) Start(ctx context.Context) {
s.refreshTelemetry(ctx)
ticker := time.NewTicker(s.cfg.MetricsRefreshInterval)
go func() {
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
s.refreshTelemetry(ctx)
}
}
}()
}
func (s *Server) handleHealth(w http.ResponseWriter, r *http.Request) {
func (s *Server) Handler() http.Handler {
return s.handler
}
func (s *Server) route(w http.ResponseWriter, r *http.Request) {
switch r.URL.Path {
case "/healthz":
s.handleHealth(w, r)
return
case "/readyz":
s.handleReady(w, r)
return
case "/metrics":
s.metrics.Handler().ServeHTTP(w, r)
return
}
identity, status, err := s.authorize(r)
if err != nil {
s.metrics.RecordAuthzDenied(authzReason(status, err))
writeError(w, status, err.Error())
return
}
r = r.WithContext(context.WithValue(r.Context(), authContextKey, identity))
switch r.URL.Path {
case "/":
s.handleUI(w, r)
case "/v1/whoami":
s.handleWhoAmI(w, r)
case "/v1/inventory":
s.handleInventory(w, r)
case "/v1/backups":
s.handleBackups(w, r)
case "/v1/backup":
s.handleBackup(w, r)
case "/v1/restores", "/v1/restore-test":
s.handleRestore(w, r)
default:
writeError(w, http.StatusNotFound, "not found")
}
}
func (s *Server) handleHealth(w http.ResponseWriter, _ *http.Request) {
writeJSON(w, http.StatusOK, map[string]string{"status": "ok"})
}
func (s *Server) handleReady(w http.ResponseWriter, r *http.Request) {
func (s *Server) handleReady(w http.ResponseWriter, _ *http.Request) {
writeJSON(w, http.StatusOK, map[string]string{"status": "ready"})
}
func (s *Server) handleUI(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
w.Header().Set("Content-Type", "text/html; charset=utf-8")
_, _ = w.Write([]byte(uiHTML))
}
func (s *Server) handleWhoAmI(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
identity := requesterFromContext(r.Context())
writeJSON(w, http.StatusOK, api.AuthInfoResponse{
Authenticated: identity.Authenticated,
User: identity.User,
Email: identity.Email,
Groups: identity.Groups,
})
}
func (s *Server) handleInventory(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
inventory, err := s.buildInventory(r.Context())
if err != nil {
writeError(w, http.StatusBadGateway, err.Error())
return
}
writeJSON(w, http.StatusOK, inventory)
}
func (s *Server) handleBackups(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodGet {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
return
}
namespace := strings.TrimSpace(r.URL.Query().Get("namespace"))
pvcName := strings.TrimSpace(r.URL.Query().Get("pvc"))
if namespace == "" || pvcName == "" {
writeError(w, http.StatusBadRequest, "namespace and pvc are required")
return
}
volumeName, _, _, err := s.client.ResolvePVCVolume(r.Context(), namespace, pvcName)
if err != nil {
writeError(w, http.StatusBadRequest, err.Error())
return
}
backups, err := s.longhorn.ListBackups(r.Context(), volumeName)
if err != nil {
writeError(w, http.StatusBadGateway, err.Error())
return
}
writeJSON(w, http.StatusOK, api.BackupListResponse{
Namespace: namespace,
PVC: pvcName,
Volume: volumeName,
Backups: buildBackupRecords(backups),
})
}
func (s *Server) handleBackup(w http.ResponseWriter, r *http.Request) {
if r.Method != http.MethodPost {
writeError(w, http.StatusMethodNotAllowed, "method not allowed")
@ -57,61 +214,83 @@ func (s *Server) handleBackup(w http.ResponseWriter, r *http.Request) {
r.Body = http.MaxBytesReader(w, r.Body, 1<<20)
var req api.BackupRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
s.metrics.RecordBackupRequest(s.cfg.BackupDriver, "invalid_json")
writeError(w, http.StatusBadRequest, fmt.Sprintf("invalid JSON: %v", err))
return
}
if strings.TrimSpace(req.Namespace) == "" || strings.TrimSpace(req.PVC) == "" {
s.metrics.RecordBackupRequest(s.cfg.BackupDriver, "validation_error")
writeError(w, http.StatusBadRequest, "namespace and pvc are required")
return
}
requester := currentRequester(r.Context())
switch s.cfg.BackupDriver {
case "longhorn":
volumeName, _, _, err := s.client.ResolvePVCVolume(r.Context(), req.Namespace, req.PVC)
if err != nil {
s.metrics.RecordBackupRequest("longhorn", "validation_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
backupName := backupName("backup", req.PVC)
backupName := backupName("backup", req.Namespace+"-"+req.PVC)
if req.DryRun {
s.metrics.RecordBackupRequest("longhorn", "dry_run")
writeJSON(w, http.StatusOK, api.BackupResponse{
Driver: "longhorn",
Volume: volumeName,
Backup: backupName,
Namespace: req.Namespace,
DryRun: true,
Driver: "longhorn",
Volume: volumeName,
Backup: backupName,
Namespace: req.Namespace,
RequestedBy: requester,
DryRun: true,
})
return
}
labels := map[string]string{
"soteria.bstein.dev/namespace": req.Namespace,
"soteria.bstein.dev/pvc": req.PVC,
"soteria.bstein.dev/namespace": req.Namespace,
"soteria.bstein.dev/pvc": req.PVC,
"soteria.bstein.dev/requested-by": requester,
}
if _, err := s.longhorn.SnapshotBackup(r.Context(), volumeName, backupName, labels, s.cfg.LonghornBackupMode); err != nil {
writeError(w, http.StatusBadRequest, err.Error())
s.metrics.RecordBackupRequest("longhorn", "backend_error")
writeError(w, http.StatusBadGateway, err.Error())
return
}
s.metrics.RecordBackupRequest("longhorn", "success")
writeJSON(w, http.StatusOK, api.BackupResponse{
Driver: "longhorn",
Volume: volumeName,
Backup: backupName,
Namespace: req.Namespace,
DryRun: false,
Driver: "longhorn",
Volume: volumeName,
Backup: backupName,
Namespace: req.Namespace,
RequestedBy: requester,
DryRun: false,
})
case "restic":
jobName, secretName, err := s.client.CreateBackupJob(r.Context(), s.cfg, req)
if err != nil {
s.metrics.RecordBackupRequest("restic", "backend_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
if req.DryRun {
s.metrics.RecordBackupRequest("restic", "dry_run")
} else {
s.metrics.RecordBackupRequest("restic", "success")
}
writeJSON(w, http.StatusOK, api.BackupResponse{
Driver: "restic",
JobName: jobName,
Namespace: req.Namespace,
Secret: secretName,
DryRun: req.DryRun,
Driver: "restic",
JobName: jobName,
Namespace: req.Namespace,
Secret: secretName,
RequestedBy: requester,
DryRun: req.DryRun,
})
default:
s.metrics.RecordBackupRequest(s.cfg.BackupDriver, "unsupported_driver")
writeError(w, http.StatusBadRequest, "unsupported backup driver")
}
}
@ -125,23 +304,48 @@ func (s *Server) handleRestore(w http.ResponseWriter, r *http.Request) {
r.Body = http.MaxBytesReader(w, r.Body, 1<<20)
var req api.RestoreTestRequest
if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
s.metrics.RecordRestoreRequest(s.cfg.BackupDriver, "invalid_json")
writeError(w, http.StatusBadRequest, fmt.Sprintf("invalid JSON: %v", err))
return
}
if strings.TrimSpace(req.Namespace) == "" {
s.metrics.RecordRestoreRequest(s.cfg.BackupDriver, "validation_error")
writeError(w, http.StatusBadRequest, "namespace is required")
return
}
if strings.TrimSpace(req.PVC) == "" {
s.metrics.RecordRestoreRequest(s.cfg.BackupDriver, "validation_error")
writeError(w, http.StatusBadRequest, "pvc is required")
return
}
if strings.TrimSpace(req.TargetPVC) == "" {
s.metrics.RecordRestoreRequest(s.cfg.BackupDriver, "validation_error")
writeError(w, http.StatusBadRequest, "target_pvc is required")
return
}
if strings.TrimSpace(req.TargetNamespace) == "" {
req.TargetNamespace = req.Namespace
}
requester := currentRequester(r.Context())
switch s.cfg.BackupDriver {
case "longhorn":
if req.TargetPVC == "" {
writeError(w, http.StatusBadRequest, "target_pvc is required")
exists, err := s.client.PersistentVolumeClaimExists(r.Context(), req.TargetNamespace, req.TargetPVC)
if err != nil {
s.metrics.RecordRestoreRequest("longhorn", "validation_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
if req.PVC == "" {
writeError(w, http.StatusBadRequest, "pvc is required to locate backup volume")
if exists {
s.metrics.RecordRestoreRequest("longhorn", "conflict")
writeError(w, http.StatusConflict, fmt.Sprintf("target pvc %s/%s already exists", req.TargetNamespace, req.TargetPVC))
return
}
volumeName, _, _, err := s.client.ResolvePVCVolume(r.Context(), req.Namespace, req.PVC)
if err != nil {
s.metrics.RecordRestoreRequest("longhorn", "validation_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
@ -150,32 +354,38 @@ func (s *Server) handleRestore(w http.ResponseWriter, r *http.Request) {
if backupURL == "" {
backup, err := s.longhorn.FindBackup(r.Context(), volumeName, req.Snapshot)
if err != nil {
s.metrics.RecordRestoreRequest("longhorn", "validation_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
backupURL = backup.URL
}
if backupURL == "" {
s.metrics.RecordRestoreRequest("longhorn", "validation_error")
writeError(w, http.StatusBadRequest, "backup_url is required")
return
}
restoreVolumeName := backupName("restore", req.TargetPVC)
restoreVolumeName := backupName("restore", req.TargetNamespace+"-"+req.TargetPVC)
if req.DryRun {
s.metrics.RecordRestoreRequest("longhorn", "dry_run")
writeJSON(w, http.StatusOK, api.RestoreTestResponse{
Driver: "longhorn",
Volume: restoreVolumeName,
TargetPVC: req.TargetPVC,
BackupURL: backupURL,
Namespace: req.Namespace,
DryRun: true,
Driver: "longhorn",
Volume: restoreVolumeName,
TargetNamespace: req.TargetNamespace,
TargetPVC: req.TargetPVC,
BackupURL: backupURL,
Namespace: req.Namespace,
RequestedBy: requester,
DryRun: true,
})
return
}
sourceVolume, err := s.longhorn.GetVolume(r.Context(), volumeName)
if err != nil {
writeError(w, http.StatusBadRequest, err.Error())
s.metrics.RecordRestoreRequest("longhorn", "backend_error")
writeError(w, http.StatusBadGateway, err.Error())
return
}
replicas := sourceVolume.NumberOfReplicas
@ -184,41 +394,322 @@ func (s *Server) handleRestore(w http.ResponseWriter, r *http.Request) {
}
if _, err := s.longhorn.CreateVolumeFromBackup(r.Context(), restoreVolumeName, sourceVolume.Size, replicas, backupURL); err != nil {
writeError(w, http.StatusBadRequest, err.Error())
s.metrics.RecordRestoreRequest("longhorn", "backend_error")
writeError(w, http.StatusBadGateway, err.Error())
return
}
if err := s.longhorn.CreatePVC(r.Context(), restoreVolumeName, req.Namespace, req.TargetPVC); err != nil {
writeError(w, http.StatusBadRequest, err.Error())
if err := s.longhorn.CreatePVC(r.Context(), restoreVolumeName, req.TargetNamespace, req.TargetPVC); err != nil {
cleanupErr := s.longhorn.DeleteVolume(r.Context(), restoreVolumeName)
if cleanupErr != nil {
log.Printf("restore cleanup failed for %s: %v", restoreVolumeName, cleanupErr)
writeError(w, http.StatusBadGateway, fmt.Sprintf("create restore pvc: %v (cleanup failed: %v)", err, cleanupErr))
} else {
writeError(w, http.StatusBadGateway, fmt.Sprintf("create restore pvc: %v", err))
}
s.metrics.RecordRestoreRequest("longhorn", "backend_error")
return
}
s.metrics.RecordRestoreRequest("longhorn", "success")
writeJSON(w, http.StatusOK, api.RestoreTestResponse{
Driver: "longhorn",
Volume: restoreVolumeName,
TargetPVC: req.TargetPVC,
BackupURL: backupURL,
Namespace: req.Namespace,
DryRun: false,
Driver: "longhorn",
Volume: restoreVolumeName,
TargetNamespace: req.TargetNamespace,
TargetPVC: req.TargetPVC,
BackupURL: backupURL,
Namespace: req.Namespace,
RequestedBy: requester,
DryRun: false,
})
case "restic":
jobName, secretName, err := s.client.CreateRestoreJob(r.Context(), s.cfg, req)
if err != nil {
s.metrics.RecordRestoreRequest("restic", "backend_error")
writeError(w, http.StatusBadRequest, err.Error())
return
}
if req.DryRun {
s.metrics.RecordRestoreRequest("restic", "dry_run")
} else {
s.metrics.RecordRestoreRequest("restic", "success")
}
writeJSON(w, http.StatusOK, api.RestoreTestResponse{
Driver: "restic",
JobName: jobName,
Namespace: req.Namespace,
Secret: secretName,
DryRun: req.DryRun,
Driver: "restic",
JobName: jobName,
Namespace: req.Namespace,
TargetNamespace: req.TargetNamespace,
TargetPVC: req.TargetPVC,
Secret: secretName,
RequestedBy: requester,
DryRun: req.DryRun,
})
default:
s.metrics.RecordRestoreRequest(s.cfg.BackupDriver, "unsupported_driver")
writeError(w, http.StatusBadRequest, "unsupported backup driver")
}
}
func (s *Server) buildInventory(ctx context.Context) (api.InventoryResponse, error) {
pvcs, err := s.client.ListBoundPVCs(ctx)
if err != nil {
return api.InventoryResponse{}, err
}
groups := make(map[string][]api.PVCInventory)
for _, summary := range pvcs {
entry := api.PVCInventory{
Namespace: summary.Namespace,
PVC: summary.Name,
Volume: summary.VolumeName,
Phase: summary.Phase,
StorageClass: summary.StorageClass,
Capacity: summary.Capacity,
AccessModes: summary.AccessModes,
Driver: s.cfg.BackupDriver,
}
s.enrichPVCInventory(ctx, &entry)
groups[summary.Namespace] = append(groups[summary.Namespace], entry)
}
namespaceNames := make([]string, 0, len(groups))
for namespace := range groups {
namespaceNames = append(namespaceNames, namespace)
}
sort.Strings(namespaceNames)
response := api.InventoryResponse{
GeneratedAt: time.Now().UTC().Format(time.RFC3339),
Namespaces: make([]api.NamespaceInventory, 0, len(namespaceNames)),
}
for _, namespace := range namespaceNames {
response.Namespaces = append(response.Namespaces, api.NamespaceInventory{
Name: namespace,
PVCs: groups[namespace],
})
}
return response, nil
}
func (s *Server) enrichPVCInventory(ctx context.Context, entry *api.PVCInventory) {
switch s.cfg.BackupDriver {
case "longhorn":
backups, err := s.longhorn.ListBackups(ctx, entry.Volume)
if err != nil {
entry.Healthy = false
entry.HealthReason = "lookup_failed"
entry.Error = err.Error()
return
}
entry.BackupCount = len(backups)
latest, latestTime, ok := latestCompletedBackup(backups)
if !ok {
entry.Healthy = false
if len(backups) == 0 {
entry.HealthReason = "missing"
} else {
entry.HealthReason = "no_completed"
}
return
}
entry.LastBackupAt = latest.Created
if latestTime.IsZero() {
entry.Healthy = false
entry.HealthReason = "unknown_timestamp"
return
}
entry.LastBackupAgeHours = roundHours(time.Since(latestTime).Hours())
entry.Healthy = time.Since(latestTime) <= s.cfg.BackupMaxAge
if entry.Healthy {
entry.HealthReason = "fresh"
} else {
entry.HealthReason = "stale"
}
case "restic":
entry.Healthy = false
entry.HealthReason = "inventory_unavailable"
entry.Error = "restic inventory telemetry is not implemented yet"
default:
entry.Healthy = false
entry.HealthReason = "unsupported_driver"
}
}
func (s *Server) refreshTelemetry(ctx context.Context) {
refreshCtx, cancel := context.WithTimeout(ctx, 2*time.Minute)
defer cancel()
inventory, err := s.buildInventory(refreshCtx)
if err != nil {
log.Printf("inventory refresh failed: %v", err)
s.metrics.RecordInventoryFailure()
return
}
s.metrics.RecordInventory(inventory)
}
func (s *Server) authorize(r *http.Request) (authIdentity, int, error) {
if !s.cfg.AuthRequired {
return authIdentity{}, http.StatusOK, nil
}
authorization := strings.TrimSpace(r.Header.Get("Authorization"))
if strings.HasPrefix(strings.ToLower(authorization), "bearer ") {
token := strings.TrimSpace(authorization[7:])
for _, expected := range s.cfg.AuthBearerTokens {
if token != "" && token == expected {
return authIdentity{Authenticated: true, User: "service-token", Groups: []string{"service-token"}}, http.StatusOK, nil
}
}
}
identity := authIdentity{
Authenticated: true,
User: strings.TrimSpace(r.Header.Get("X-Auth-Request-User")),
Email: strings.TrimSpace(r.Header.Get("X-Auth-Request-Email")),
Groups: normalizeGroups(strings.Split(strings.TrimSpace(r.Header.Get("X-Auth-Request-Groups")), ",")),
}
if identity.User == "" && identity.Email == "" {
return authIdentity{}, http.StatusUnauthorized, fmt.Errorf("authentication required")
}
if len(s.cfg.AllowedGroups) == 0 {
return identity, http.StatusOK, nil
}
if hasAllowedGroup(identity.Groups, s.cfg.AllowedGroups) {
return identity, http.StatusOK, nil
}
return authIdentity{}, http.StatusForbidden, fmt.Errorf("access requires one of: %s", strings.Join(s.cfg.AllowedGroups, ", "))
}
func requesterFromContext(ctx context.Context) authIdentity {
identity, _ := ctx.Value(authContextKey).(authIdentity)
return identity
}
func currentRequester(ctx context.Context) string {
identity := requesterFromContext(ctx)
if identity.User != "" {
return identity.User
}
if identity.Email != "" {
return identity.Email
}
if identity.Authenticated {
return "authenticated"
}
return "anonymous"
}
func authzReason(status int, err error) string {
if err == nil {
return "unknown"
}
switch status {
case http.StatusUnauthorized:
return "unauthenticated"
case http.StatusForbidden:
return "forbidden_group"
default:
return "error"
}
}
func hasAllowedGroup(actual, allowed []string) bool {
allowedSet := make(map[string]struct{}, len(allowed))
for _, group := range normalizeGroups(allowed) {
allowedSet[group] = struct{}{}
}
for _, group := range normalizeGroups(actual) {
if _, ok := allowedSet[group]; ok {
return true
}
}
return false
}
func normalizeGroups(values []string) []string {
groups := make([]string, 0, len(values))
for _, value := range values {
value = strings.TrimSpace(strings.TrimPrefix(value, "/"))
if value == "" {
continue
}
groups = append(groups, value)
}
return groups
}
func buildBackupRecords(backups []longhorn.Backup) []api.BackupRecord {
records := make([]api.BackupRecord, 0, len(backups))
latestName := ""
if latest, _, ok := latestCompletedBackup(backups); ok {
latestName = latest.Name
}
sort.Slice(backups, func(i, j int) bool {
left, lok := parseBackupTime(backups[i].Created)
right, rok := parseBackupTime(backups[j].Created)
switch {
case lok && rok:
return left.After(right)
case lok:
return true
case rok:
return false
default:
return backups[i].Name > backups[j].Name
}
})
for _, backup := range backups {
records = append(records, api.BackupRecord{
Name: backup.Name,
SnapshotName: backup.SnapshotName,
Created: backup.Created,
State: backup.State,
URL: backup.URL,
Size: backup.Size,
Latest: backup.Name == latestName,
})
}
return records
}
func latestCompletedBackup(backups []longhorn.Backup) (longhorn.Backup, time.Time, bool) {
var selected longhorn.Backup
var selectedTime time.Time
found := false
for _, backup := range backups {
if backup.State != "Completed" {
continue
}
createdAt, ok := parseBackupTime(backup.Created)
if !ok {
if !found {
selected = backup
found = true
}
continue
}
if !found || createdAt.After(selectedTime) {
selected = backup
selectedTime = createdAt
found = true
}
}
return selected, selectedTime, found
}
func parseBackupTime(raw string) (time.Time, bool) {
layouts := []string{time.RFC3339Nano, time.RFC3339}
for _, layout := range layouts {
parsed, err := time.Parse(layout, raw)
if err == nil {
return parsed, true
}
}
return time.Time{}, false
}
func writeJSON(w http.ResponseWriter, status int, payload any) {
w.Header().Set("Content-Type", "application/json")
w.WriteHeader(status)
@ -254,3 +745,7 @@ func sanitizeName(value string) string {
value = strings.Trim(value, "-")
return value
}
func roundHours(value float64) float64 {
return math.Round(value*100) / 100
}

View File

@ -0,0 +1,161 @@
package server
import (
"context"
"encoding/json"
"net/http"
"net/http/httptest"
"strings"
"testing"
"time"
"scm.bstein.dev/bstein/soteria/internal/api"
"scm.bstein.dev/bstein/soteria/internal/config"
"scm.bstein.dev/bstein/soteria/internal/k8s"
"scm.bstein.dev/bstein/soteria/internal/longhorn"
corev1 "k8s.io/api/core/v1"
)
type fakeKubeClient struct {
pvcs []k8s.PVCSummary
targetExists bool
}
func (f *fakeKubeClient) ResolvePVCVolume(_ context.Context, namespace, pvcName string) (string, *corev1.PersistentVolumeClaim, *corev1.PersistentVolume, error) {
return namespace + "-" + pvcName + "-pv", nil, nil, nil
}
func (f *fakeKubeClient) CreateBackupJob(_ context.Context, _ *config.Config, _ api.BackupRequest) (string, string, error) {
return "backup-job", "backup-secret", nil
}
func (f *fakeKubeClient) CreateRestoreJob(_ context.Context, _ *config.Config, _ api.RestoreTestRequest) (string, string, error) {
return "restore-job", "restore-secret", nil
}
func (f *fakeKubeClient) ListBoundPVCs(_ context.Context) ([]k8s.PVCSummary, error) {
return f.pvcs, nil
}
func (f *fakeKubeClient) PersistentVolumeClaimExists(_ context.Context, _, _ string) (bool, error) {
return f.targetExists, nil
}
type fakeLonghornClient struct {
backups []longhorn.Backup
}
func (f *fakeLonghornClient) SnapshotBackup(_ context.Context, volume, name string, labels map[string]string, backupMode string) (*longhorn.Volume, error) {
return &longhorn.Volume{Name: volume}, nil
}
func (f *fakeLonghornClient) GetVolume(_ context.Context, volume string) (*longhorn.Volume, error) {
return &longhorn.Volume{Name: volume, Size: "1073741824", NumberOfReplicas: 2}, nil
}
func (f *fakeLonghornClient) CreateVolumeFromBackup(_ context.Context, name, size string, replicas int, backupURL string) (*longhorn.Volume, error) {
return &longhorn.Volume{Name: name, Size: size, NumberOfReplicas: replicas}, nil
}
func (f *fakeLonghornClient) CreatePVC(_ context.Context, volumeName, namespace, pvcName string) error {
return nil
}
func (f *fakeLonghornClient) DeleteVolume(_ context.Context, volumeName string) error {
return nil
}
func (f *fakeLonghornClient) FindBackup(_ context.Context, volumeName, snapshot string) (*longhorn.Backup, error) {
return &longhorn.Backup{Name: "backup-latest", URL: "s3://bucket/backup-latest", State: "Completed"}, nil
}
func (f *fakeLonghornClient) ListBackups(_ context.Context, volumeName string) ([]longhorn.Backup, error) {
return f.backups, nil
}
func TestProtectedInventoryRequiresAuth(t *testing.T) {
srv := &Server{
cfg: &config.Config{AuthRequired: true, AllowedGroups: []string{"admin", "maintenance"}, BackupDriver: "longhorn"},
client: &fakeKubeClient{},
longhorn: &fakeLonghornClient{},
metrics: newTelemetry(),
}
srv.handler = http.HandlerFunc(srv.route)
req := httptest.NewRequest(http.MethodGet, "/v1/inventory", nil)
res := httptest.NewRecorder()
srv.Handler().ServeHTTP(res, req)
if res.Code != http.StatusUnauthorized {
t.Fatalf("expected 401, got %d", res.Code)
}
}
func TestProtectedInventoryAllowsMaintenanceGroup(t *testing.T) {
srv := &Server{
cfg: &config.Config{AuthRequired: true, AllowedGroups: []string{"admin", "maintenance"}, BackupDriver: "longhorn", BackupMaxAge: 24 * time.Hour},
client: &fakeKubeClient{pvcs: []k8s.PVCSummary{{Namespace: "apps", Name: "data", VolumeName: "pv-apps-data", Phase: "Bound"}}},
longhorn: &fakeLonghornClient{backups: []longhorn.Backup{{Name: "backup-1", Created: "2026-04-12T00:00:00Z", State: "Completed", URL: "s3://bucket/backup-1"}}},
metrics: newTelemetry(),
}
srv.handler = http.HandlerFunc(srv.route)
req := httptest.NewRequest(http.MethodGet, "/v1/inventory", nil)
req.Header.Set("X-Auth-Request-User", "brad")
req.Header.Set("X-Auth-Request-Groups", "/maintenance")
res := httptest.NewRecorder()
srv.Handler().ServeHTTP(res, req)
if res.Code != http.StatusOK {
t.Fatalf("expected 200, got %d: %s", res.Code, res.Body.String())
}
var payload api.InventoryResponse
if err := json.NewDecoder(strings.NewReader(res.Body.String())).Decode(&payload); err != nil {
t.Fatalf("decode inventory: %v", err)
}
if len(payload.Namespaces) != 1 || payload.Namespaces[0].Name != "apps" {
t.Fatalf("unexpected inventory payload: %#v", payload)
}
}
func TestRestoreRejectsExistingTargetPVC(t *testing.T) {
srv := &Server{
cfg: &config.Config{AuthRequired: false, BackupDriver: "longhorn"},
client: &fakeKubeClient{targetExists: true},
longhorn: &fakeLonghornClient{},
metrics: newTelemetry(),
}
srv.handler = http.HandlerFunc(srv.route)
body := `{"namespace":"apps","pvc":"data","target_namespace":"apps","target_pvc":"restore-data"}`
req := httptest.NewRequest(http.MethodPost, "/v1/restores", strings.NewReader(body))
res := httptest.NewRecorder()
srv.Handler().ServeHTTP(res, req)
if res.Code != http.StatusConflict {
t.Fatalf("expected 409, got %d: %s", res.Code, res.Body.String())
}
}
func TestMetricsStayPublic(t *testing.T) {
srv := &Server{
cfg: &config.Config{AuthRequired: true, AllowedGroups: []string{"admin"}},
client: &fakeKubeClient{},
longhorn: &fakeLonghornClient{},
metrics: newTelemetry(),
}
srv.handler = http.HandlerFunc(srv.route)
req := httptest.NewRequest(http.MethodGet, "/metrics", nil)
res := httptest.NewRecorder()
srv.Handler().ServeHTTP(res, req)
if res.Code != http.StatusOK {
t.Fatalf("expected 200, got %d", res.Code)
}
if !strings.Contains(res.Body.String(), "soteria_backup_requests_total") {
t.Fatalf("expected prometheus metrics body, got %q", res.Body.String())
}
}

6
internal/server/ui.go Normal file
View File

@ -0,0 +1,6 @@
package server
import _ "embed"
//go:embed ui.html
var uiHTML string

336
internal/server/ui.html Normal file
View File

@ -0,0 +1,336 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>Soteria Backup Console</title>
<style>
:root {
color-scheme: light;
--bg: #f4efe4;
--card: #fffaf1;
--ink: #1d1d1b;
--muted: #5f625b;
--line: #d8cdb8;
--accent: #0f766e;
--warn: #c2410c;
--good: #166534;
--bad: #991b1b;
}
* { box-sizing: border-box; }
body {
margin: 0;
font-family: "Iowan Old Style", "Palatino Linotype", serif;
color: var(--ink);
background: radial-gradient(circle at top right, #f9e8c8 0, var(--bg) 45%), var(--bg);
}
header {
padding: 24px;
border-bottom: 1px solid var(--line);
background: rgba(255,250,241,0.92);
backdrop-filter: blur(8px);
position: sticky;
top: 0;
z-index: 10;
}
h1 { margin: 0 0 6px; font-size: 2rem; }
.sub { color: var(--muted); margin: 0; }
.topbar, main {
width: min(1200px, calc(100vw - 32px));
margin: 0 auto;
}
.topbar {
display: flex;
gap: 16px;
align-items: center;
justify-content: space-between;
flex-wrap: wrap;
}
main {
display: grid;
grid-template-columns: 1.4fr 1fr;
gap: 20px;
padding: 20px 0 40px;
}
.panel {
background: var(--card);
border: 1px solid var(--line);
border-radius: 18px;
padding: 18px;
box-shadow: 0 10px 24px rgba(38, 35, 25, 0.06);
}
.panel h2, .panel h3 { margin-top: 0; }
.badge {
display: inline-flex;
align-items: center;
gap: 8px;
border-radius: 999px;
padding: 6px 10px;
font-size: 0.92rem;
background: #efe7d6;
}
.badge.good { background: #dcfce7; color: var(--good); }
.badge.bad { background: #fee2e2; color: var(--bad); }
button {
border: 0;
border-radius: 999px;
padding: 10px 14px;
font: inherit;
cursor: pointer;
background: var(--accent);
color: white;
}
button.secondary {
background: #efe7d6;
color: var(--ink);
}
button:disabled { opacity: 0.6; cursor: wait; }
.namespace {
margin-bottom: 18px;
padding-bottom: 18px;
border-bottom: 1px dashed var(--line);
}
.namespace:last-child { border-bottom: 0; margin-bottom: 0; padding-bottom: 0; }
.pvc {
border: 1px solid var(--line);
border-radius: 14px;
padding: 14px;
margin: 10px 0;
background: rgba(255,255,255,0.55);
}
.row, .actions {
display: flex;
gap: 10px;
align-items: center;
flex-wrap: wrap;
}
.meta {
color: var(--muted);
font-size: 0.95rem;
}
.stack { display: grid; gap: 12px; }
label { display: grid; gap: 6px; font-weight: 600; }
input, select {
width: 100%;
border: 1px solid var(--line);
border-radius: 12px;
padding: 10px 12px;
font: inherit;
background: white;
}
pre {
margin: 0;
padding: 12px;
border-radius: 12px;
background: #171717;
color: #f7f7f7;
overflow: auto;
font-size: 0.9rem;
}
.muted { color: var(--muted); }
.error { color: var(--bad); }
@media (max-width: 900px) {
main { grid-template-columns: 1fr; }
h1 { font-size: 1.7rem; }
}
</style>
</head>
<body>
<header>
<div class="topbar">
<div>
<h1>Soteria Backup Console</h1>
<p class="sub">Namespace-grouped PVC backup and restore control plane for Atlas.</p>
</div>
<div class="row">
<span id="auth-pill" class="badge">Checking access...</span>
<button id="refresh-btn" class="secondary">Refresh inventory</button>
</div>
</div>
</header>
<main>
<section class="panel">
<div class="row" style="justify-content: space-between; margin-bottom: 12px;">
<h2 style="margin-bottom: 0;">PVC Inventory</h2>
<span id="generated-at" class="muted"></span>
</div>
<div id="inventory" class="stack">
<p class="muted">Loading PVC inventory...</p>
</div>
</section>
<aside class="stack">
<section class="panel">
<h2>Restore Workspace</h2>
<div id="details" class="muted">Choose a PVC to inspect backups or prepare a restore.</div>
</section>
<section class="panel">
<h2>Last Action</h2>
<pre id="result">No action yet.</pre>
</section>
</aside>
</main>
<script>
const inventoryEl = document.getElementById('inventory');
const detailsEl = document.getElementById('details');
const resultEl = document.getElementById('result');
const generatedAtEl = document.getElementById('generated-at');
const authPillEl = document.getElementById('auth-pill');
const refreshBtn = document.getElementById('refresh-btn');
function escapeHtml(value) {
return String(value || '')
.replaceAll('&', '&amp;')
.replaceAll('<', '&lt;')
.replaceAll('>', '&gt;')
.replaceAll('"', '&quot;')
.replaceAll("'", '&#39;');
}
function showResult(payload) {
resultEl.textContent = typeof payload === 'string' ? payload : JSON.stringify(payload, null, 2);
}
async function fetchJSON(url, options) {
const response = await fetch(url, options);
const text = await response.text();
let payload = text;
try { payload = text ? JSON.parse(text) : {}; } catch (_) {}
if (!response.ok) {
const message = payload && payload.error ? payload.error : response.status + ' ' + response.statusText;
throw new Error(message);
}
return payload;
}
async function loadWhoAmI() {
try {
const who = await fetchJSON('/v1/whoami');
const label = who.authenticated
? (who.user || who.email || 'authenticated') + ' (' + ((who.groups || []).join(', ') || 'no groups') + ')'
: 'anonymous';
authPillEl.textContent = label;
authPillEl.className = 'badge ' + (who.authenticated ? 'good' : '');
} catch (error) {
authPillEl.textContent = error.message;
authPillEl.className = 'badge bad';
}
}
async function triggerBackup(namespace, pvc) {
try {
const payload = await fetchJSON('/v1/backup', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ namespace, pvc, dry_run: false })
});
showResult(payload);
await loadInventory();
} catch (error) {
showResult({ error: error.message, namespace, pvc });
}
}
async function showRestore(namespace, pvc) {
detailsEl.innerHTML = '<p class="muted">Loading backups...</p>';
try {
const payload = await fetchJSON('/v1/backups?namespace=' + encodeURIComponent(namespace) + '&pvc=' + encodeURIComponent(pvc));
const options = payload.backups
.filter((backup) => backup.state === 'Completed')
.map((backup) => '<option value="' + escapeHtml(backup.url) + '">' + escapeHtml(backup.name) + ' | ' + escapeHtml(backup.created || 'unknown time') + '</option>')
.join('');
detailsEl.innerHTML = [
'<div class="stack">',
'<div><h3 style="margin-bottom: 6px;">' + escapeHtml(namespace) + '/' + escapeHtml(pvc) + '</h3><p class="muted" style="margin-top: 0;">Source volume: ' + escapeHtml(payload.volume) + '</p></div>',
'<label>Backup snapshot<select id="restore-backup">' + (options || '<option value="">No completed backups available</option>') + '</select></label>',
'<label>Target namespace<input id="restore-namespace" value="' + escapeHtml(namespace) + '"></label>',
'<label>Target PVC name<input id="restore-pvc" value="restore-' + escapeHtml(pvc) + '"></label>',
'<div class="actions">',
'<button id="restore-submit"' + (options ? '' : ' disabled') + '>Create restore PVC</button>',
'<button id="restore-view" class="secondary">Show backup JSON</button>',
'</div></div>'
].join('');
document.getElementById('restore-submit').onclick = async () => {
const backupURL = document.getElementById('restore-backup').value;
const targetNamespace = document.getElementById('restore-namespace').value.trim();
const targetPVC = document.getElementById('restore-pvc').value.trim();
try {
const result = await fetchJSON('/v1/restores', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
namespace,
pvc,
backup_url: backupURL,
target_namespace: targetNamespace,
target_pvc: targetPVC,
dry_run: false
})
});
showResult(result);
await loadInventory();
} catch (error) {
showResult({ error: error.message, namespace, pvc, target_namespace: targetNamespace, target_pvc: targetPVC });
}
};
document.getElementById('restore-view').onclick = () => showResult(payload);
} catch (error) {
detailsEl.innerHTML = '<p class="error">' + escapeHtml(error.message) + '</p>';
}
}
function renderInventory(payload) {
generatedAtEl.textContent = payload.generated_at ? 'Updated ' + payload.generated_at : '';
if (!payload.namespaces || payload.namespaces.length === 0) {
inventoryEl.innerHTML = '<p class="muted">No bound PVCs found.</p>';
return;
}
inventoryEl.innerHTML = payload.namespaces.map((group) => {
const pvcs = group.pvcs.map((pvc) => {
const healthClass = pvc.healthy ? 'good' : 'bad';
const healthText = pvc.healthy ? 'Healthy backup window' : (pvc.health_reason || 'Needs attention');
const ageText = pvc.last_backup_at
? Number(pvc.last_backup_age_hours || 0).toFixed(1) + 'h since last success'
: 'No successful backup yet';
return [
'<article class="pvc">',
'<div class="row" style="justify-content: space-between; align-items: flex-start;">',
'<div><h3 style="margin: 0 0 4px;">' + escapeHtml(pvc.pvc) + '</h3><div class="meta">' + escapeHtml(pvc.volume) + ' • ' + escapeHtml(pvc.storage_class || 'no storage class') + ' • ' + escapeHtml(pvc.capacity || 'size unknown') + '</div></div>',
'<span class="badge ' + healthClass + '">' + escapeHtml(healthText) + '</span>',
'</div>',
'<p class="meta">' + escapeHtml(ageText) + (pvc.error ? ' • ' + escapeHtml(pvc.error) : '') + '</p>',
'<div class="actions">',
'<button data-action="backup" data-namespace="' + escapeHtml(pvc.namespace) + '" data-pvc="' + escapeHtml(pvc.pvc) + '">Backup now</button>',
'<button class="secondary" data-action="restore" data-namespace="' + escapeHtml(pvc.namespace) + '" data-pvc="' + escapeHtml(pvc.pvc) + '">Restore</button>',
'</div></article>'
].join('');
}).join('');
return '<section class="namespace"><h3>' + escapeHtml(group.name) + '</h3>' + pvcs + '</section>';
}).join('');
inventoryEl.querySelectorAll('button[data-action="backup"]').forEach((button) => {
button.addEventListener('click', () => triggerBackup(button.dataset.namespace, button.dataset.pvc));
});
inventoryEl.querySelectorAll('button[data-action="restore"]').forEach((button) => {
button.addEventListener('click', () => showRestore(button.dataset.namespace, button.dataset.pvc));
});
}
async function loadInventory() {
refreshBtn.disabled = true;
inventoryEl.innerHTML = '<p class="muted">Refreshing inventory...</p>';
try {
const payload = await fetchJSON('/v1/inventory');
renderInventory(payload);
} catch (error) {
inventoryEl.innerHTML = '<p class="error">' + escapeHtml(error.message) + '</p>';
} finally {
refreshBtn.disabled = false;
}
}
refreshBtn.addEventListener('click', loadInventory);
loadWhoAmI();
loadInventory();
</script>
</body>
</html>