Compare commits
31 Commits
65d389193f...9f226c1584
| SHA1 |
|---|
| 9f226c1584 |
| 319b515882 |
| cb2b2ec1cd |
| 20cd185c0b |
| 2f368f6975 |
| 6c62d42f7a |
| a7e9f1f7d8 |
| ceb692f7ee |
| 24fbaad040 |
| 04aa32a762 |
| 25ee698021 |
| 4a089876ba |
| 20bb776625 |
| 5e59f20bc3 |
| dbede55ad4 |
| 27e5c9391c |
| 8d5e6c267c |
| a55502fe27 |
| 598bdfc727 |
| 88c7a1c2aa |
| f4da27271e |
| 141c05b08f |
| f0a8f6d35e |
| 1b01052eda |
| 1d346edd28 |
| b14a9dcb98 |
| 47caf08885 |
| 0db149605d |
| f64e60c5a2 |
| 61c5db5c99 |
| 2db550afdd |
68 AGENTS.md
@@ -1,68 +0,0 @@
# Repository Guidelines
## Project Structure & Module Organization
- `infrastructure/`: cluster-scoped building blocks (core, flux-system, traefik, longhorn). Add new platform features by mirroring this layout.
- `services/`: workload manifests per app (`services/gitea/`, etc.) with `kustomization.yaml` plus one file per kind; keep diffs small and focused.
- `dockerfiles/` hosts bespoke images, while `scripts/` stores operational Fish/Bash helpers—extend these directories instead of relying on ad-hoc commands.
## Build, Test, and Development Commands
- `kustomize build services/<app>` (or `kubectl kustomize ...`) renders manifests exactly as Flux will.
- `kubectl apply --server-side --dry-run=client -k services/<app>` checks schema compatibility without touching the cluster.
- `flux reconcile kustomization <name> --namespace flux-system --with-source` pulls the latest Git state after merges or hotfixes.
- `fish scripts/flux_hammer.fish --help` explains the recovery tool; read it before running against production workloads.
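A minimal sketch tying these commands into one pre-PR check; the app name `gitea` and the kustomization name `services-gitea` are only examples, adjust for the service you touched:

```bash
#!/usr/bin/env bash
# Sketch: validate one service before opening a PR (names are examples).
set -euo pipefail

app="gitea"

# Render the manifests exactly as Flux will (fails loudly on errors).
kustomize build "services/${app}" > /dev/null

# Schema check without touching the cluster.
kubectl apply --server-side --dry-run=client -k "services/${app}"

# After merge, ask Flux to pull the latest Git state.
flux reconcile kustomization "services-${app}" --namespace flux-system --with-source
```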
## Coding Style & Naming Conventions
- YAML uses two-space indents; retain the leading path comment (e.g. `# services/gitea/deployment.yaml`) to speed code review.
- Keep resource names lowercase kebab-case, align labels/selectors, and mirror namespaces with directory names.
- List resources in `kustomization.yaml` from namespace/config, through storage, then workloads and networking for predictable diffs.
- Scripts start with `#!/usr/bin/env fish` or bash, stay executable, and follow snake_case names such as `flux_hammer.fish`.
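A sketch of the resource ordering convention; the app name and filenames are illustrative:

```yaml
# Sketch: services/<app>/kustomization.yaml resource ordering (filenames illustrative).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: example-app
resources:
- namespace.yaml      # namespace/config first
- configmap.yaml
- pvc.yaml            # then storage
- deployment.yaml     # then workloads
- service.yaml        # networking last
- ingress.yaml
```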
## Testing Guidelines
- Run `kustomize build` and the dry-run apply for every service you touch; capture failures before opening a PR.
- `flux diff kustomization <name> --path services/<app>` previews reconciliations—link notable output when behavior shifts.
- Docker edits: `docker build -f dockerfiles/Dockerfile.monerod .` (swap the file you changed) to verify image builds.
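Both checks in one short sketch; the kustomization, path, and Dockerfile names are the examples used in this repo:

```bash
# Sketch: preview a reconciliation and verify a changed image still builds.
flux diff kustomization services-gitea --path services/gitea
docker build -f dockerfiles/Dockerfile.monerod .
```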
## Commit & Pull Request Guidelines
- Keep commit subjects short, present-tense, and optionally scoped (`gpu(titan-24): add RuntimeClass`); squash fixups before review.
- Describe linked issues, affected services, and required operator steps (e.g. `flux reconcile kustomization services-gitea`) in the PR body.
- Focus each PR on one kustomization or service and update `infrastructure/flux-system` when Flux must track new folders.
- Record the validation you ran (dry-runs, diffs, builds) and add screenshots only when ingress or UI behavior changes.
## Security & Configuration Tips
- Never commit credentials; use Vault workflows (`services/vault/`) or SOPS-encrypted manifests wired through `infrastructure/flux-system`.
- Node selectors and tolerations gate workloads to hardware like `hardware: rpi4`; confirm labels before scaling or renaming nodes.
- Pin external images by digest or rely on Flux image automation to follow approved tags and avoid drift.
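A minimal sketch of both practices in one workload; the names, namespace, and digest are placeholders:

```yaml
# Sketch: gate a workload to rpi4 nodes and pin its image by digest (values are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      nodeSelector:
        hardware: rpi4
      containers:
        - name: app
          # Digest pin avoids tag drift; the digest below is a placeholder.
          image: registry.bstein.dev/example/app@sha256:<digest>
```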
## Dashboard roadmap / context (2025-12-02)
- Atlas dashboards are generated via `scripts/dashboards_render_atlas.py --build`, which writes JSON under `services/monitoring/dashboards/` and ConfigMaps under `services/monitoring/`. Keep the Grafana manifests in sync by regenerating after edits.
- Atlas Overview panels are paired with internal dashboards (pods, nodes, storage, network, GPU). A new `atlas-gpu` internal dashboard holds the detailed GPU metrics that feed the overview share pie.
- Old Grafana folders (`Atlas Storage`, `Atlas SRE`, `Atlas Public`, `Atlas Nodes`) should be removed in the Grafana UI when convenient; only `Atlas Overview` and `Atlas Internal` should remain provisioned.
- Future work: add a separate generator (e.g. `dashboards_render_oceanus.py`) for SUI/oceanus validation dashboards, mirroring the atlas pattern of internal dashboards feeding a public overview.
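A regeneration sketch; the `python3` invocation is an assumption (the script may also be directly executable), and the rendered-overlay check just reuses the standard kustomize validation:

```bash
# Sketch: regenerate Atlas dashboards, then confirm the monitoring overlay still renders.
python3 scripts/dashboards_render_atlas.py --build
kustomize build services/monitoring > /dev/null && echo "monitoring overlay renders"
```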
## Monitoring state (2025-12-03)
- dcgm-exporter DaemonSet pulls `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04` with the nvidia runtime and imagePullSecret; titan-24 exports metrics, titan-22 remains NotReady.
- Atlas Overview is the Grafana home (1h range, 1m refresh); the Overview folder has UID `overview`, the internal folder is `atlas-internal` (oceanus-internal stub).
- Panels are standardized via the generator; hottest row compressed, worker/control rows taller, root disk row taller with a labeled top-12 bar gauge. The GPU share pie uses 1h avg_over_time so idle activity persists.
- Internal dashboards are provisioned without the Viewer role; if anonymous users can still see them, restart Grafana and tighten auth if needed.
## Upcoming priorities (SSO/storage/mail)
- Establish SSO (Keycloak or similar) and federate Grafana, Gitea, Zot, Nextcloud, Pegasus/Jellyfin; keep Vaultwarden separate until it is safe to migrate.
- Add Nextcloud (limited to rpi5 workers) with an office suite; integrate with SSO; plan the storage class and ingress.
- Plan mail: mostly self-hosted, relayed through a trusted provider for outbound; integrate with services (Nextcloud, Vaultwarden, etc.) for notifications and account flows.
## SSO plan sketch (2025-12-03)
- IdP: use Keycloak (preferred) in a new `sso` namespace, via the Bitnami or codecentric chart with a Postgres backing store (single PVC), ingress `sso.bstein.dev`, admin user bound to brad@bstein.dev; stick with the local DB initially (no external IdP).
- Auth flow goals: Grafana (OIDC), Gitea (OAuth2/Keycloak), Zot (via Traefik forward-auth/oauth2-proxy), Jellyfin/Pegasus via the Jellyfin OAuth/OpenID plugin (map existing usernames; run a migration to pre-create users in Keycloak with the same usernames/emails and temporary passwords); Pegasus keeps using Jellyfin tokens.
- Steps to implement:
  1) Add a service folder `services/keycloak/` (namespace, PVC, HelmRelease, ingress, secret for admin creds). Verify with kustomize + Flux reconcile.
  2) Seed realm `atlas` with users (import CSV/realm). Create clients for Grafana (public/implicit) and Gitea (confidential), plus a "jellyfin" client for the OAuth plugin; set brad@bstein.dev as admin.
  3) Reconfigure Grafana to OIDC (disable anonymous access to internal folders, leave Overview public via folder permissions). Reconfigure Gitea to OIDC (app.ini).
  4) Add Traefik forward-auth (oauth2-proxy) in front of Zot and any other services needing header-based auth.
  5) Deploy the Jellyfin OpenID plugin; map Keycloak users to existing Jellyfin usernames; communicate the password reset path.
- Migration caution: do not delete existing local creds until SSO is validated; keep Pegasus working via Jellyfin tokens during the transition.
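Step 2 can be scripted with Keycloak's bundled admin CLI. A hedged sketch, assuming the raw-manifest deployment further down this diff; the password variable is the `keycloak-admin` secret value and the client options (client ID, redirect URI, confidential client) are illustrative and will need tuning per client:

```bash
# Sketch: seed the atlas realm with kcadm.sh from inside the Keycloak pod (options illustrative).
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh config credentials \
  --server http://localhost:8080 --realm master \
  --user brad@bstein.dev --password "$KC_ADMIN_PASSWORD"
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh create realms \
  -s realm=atlas -s enabled=true
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh create clients -r atlas \
  -s clientId=grafana -s protocol=openid-connect -s publicClient=false \
  -s 'redirectUris=["https://metrics.bstein.dev/login/generic_oauth"]'
```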
## Postgres centralization (2025-12-03)
- Prefer a shared in-cluster Postgres deployment with per-service databases to reduce resource sprawl on Pi nodes. Use it for services that can easily point at an external DB.
- Candidates to migrate to shared Postgres: Keycloak (realm DB), Gitea (git DB), Nextcloud (app DB), possibly Grafana (if persistence is needed beyond the current provisioner), Jitsi prosody/JVB state (if an external DB is supported). Keep tightly-coupled or lightweight embedded DBs as-is when migration is painful or not supported.
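Provisioning a per-service role and database on the shared instance could look like the sketch below; the pod name is a placeholder, the role/database names follow the keycloak example from the README added later in this diff, and the password is a placeholder:

```bash
# Sketch: create a dedicated role and database for one service on the shared Postgres.
kubectl -n postgres exec -it <postgres-pod> -- psql -U postgres -c \
  "CREATE ROLE keycloak WITH LOGIN PASSWORD '<DB_PASSWORD>';"
kubectl -n postgres exec -it <postgres-pod> -- psql -U postgres -c \
  "CREATE DATABASE keycloak OWNER keycloak;"
```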
@@ -0,0 +1,15 @@
# clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: keycloak
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/keycloak
  targetNamespace: sso
  timeout: 2m
@@ -13,3 +13,5 @@ resources:
 - jellyfin/kustomization.yaml
 - xmr-miner/kustomization.yaml
 - sui-metrics/kustomization.yaml
+- keycloak/kustomization.yaml
+- oauth2-proxy/kustomization.yaml
@@ -0,0 +1,15 @@
# clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: oauth2-proxy
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/oauth2-proxy
  targetNamespace: sso
  timeout: 2m
@@ -8,7 +8,7 @@ metadata:
 spec:
   interval: 1m0s
   ref:
-    branch: feature/atlas-monitoring
+    branch: feature/sso
   secretRef:
     name: flux-system-gitea
   url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
@@ -7,7 +7,7 @@ metadata:
   annotations:
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
     traefik.ingress.kubernetes.io/router.tls: "true"
-    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-basicauth@kubernetescrd,longhorn-system-longhorn-headers@kubernetescrd
+    traefik.ingress.kubernetes.io/router.middlewares: ""
 spec:
   ingressClassName: traefik
   tls:
@@ -21,6 +21,6 @@ spec:
             pathType: Prefix
             backend:
               service:
-                name: longhorn-frontend
+                name: oauth2-proxy-longhorn
                 port:
                   number: 80
@@ -4,3 +4,4 @@ kind: Kustomization
 resources:
 - middleware.yaml
 - ingress.yaml
+- oauth2-proxy-longhorn.yaml
@@ -20,3 +20,20 @@ spec:
   headers:
     customRequestHeaders:
       X-Forwarded-Proto: "https"
+
+---
+
+apiVersion: traefik.io/v1alpha1
+kind: Middleware
+metadata:
+  name: longhorn-forward-auth
+  namespace: longhorn-system
+spec:
+  forwardAuth:
+    address: https://auth.bstein.dev/oauth2/auth
+    trustForwardHeader: true
+    authResponseHeaders:
+      - Authorization
+      - X-Auth-Request-Email
+      - X-Auth-Request-User
+      - X-Auth-Request-Groups
102 infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
@@ -0,0 +1,102 @@
# infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
apiVersion: v1
kind: Service
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  ports:
    - name: http
      port: 80
      targetPort: 4180
  selector:
    app: oauth2-proxy-longhorn

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oauth2-proxy-longhorn
  template:
    metadata:
      labels:
        app: oauth2-proxy-longhorn
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
      containers:
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          imagePullPolicy: IfNotPresent
          args:
            - --provider=oidc
            - --redirect-url=https://longhorn.bstein.dev/oauth2/callback
            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
            - --scope=openid profile email groups
            - --email-domain=*
            - --allowed-group=admin
            - --set-xauthrequest=true
            - --pass-access-token=true
            - --set-authorization-header=true
            - --cookie-secure=true
            - --cookie-samesite=lax
            - --cookie-refresh=20m
            - --cookie-expire=168h
            - --insecure-oidc-allow-unverified-email=true
            - --upstream=http://longhorn-frontend.longhorn-system.svc.cluster.local
            - --http-address=0.0.0.0:4180
            - --skip-provider-button=true
            - --skip-jwt-bearer-tokens=true
            - --oidc-groups-claim=groups
            - --cookie-domain=longhorn.bstein.dev
          env:
            - name: OAUTH2_PROXY_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_id
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_secret
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: cookie_secret
          ports:
            - containerPort: 4180
              name: http
          readinessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 20
            periodSeconds: 20
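The Deployment above reads its OIDC client credentials from an `oauth2-proxy-longhorn-oidc` secret that is not part of this change. A minimal creation sketch; the client ID and secret values are placeholders for the Keycloak client, and the cookie-secret recipe follows the oauth2-proxy documentation:

```bash
# Sketch: create the OIDC secret the Longhorn oauth2-proxy Deployment references (values are placeholders).
cookie_secret="$(openssl rand -base64 32 | tr -- '+/' '-_')"
kubectl -n longhorn-system create secret generic oauth2-proxy-longhorn-oidc \
  --from-literal=client_id='<KEYCLOAK_CLIENT_ID>' \
  --from-literal=client_secret='<KEYCLOAK_CLIENT_SECRET>' \
  --from-literal=cookie_secret="${cookie_secret}"
```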
@@ -232,7 +232,7 @@ NAMESPACE_GPU_ALLOC = (
     ' or kube_pod_container_resource_limits{namespace!="",resource="nvidia.com/gpu"})) by (namespace)'
 )
 NAMESPACE_GPU_USAGE_SHARE = (
-    'sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))'
+    'sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))'
 )
 NAMESPACE_GPU_USAGE_INSTANT = 'sum(DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}) by (namespace)'
 NAMESPACE_GPU_RAW = (
27 services/keycloak/README.md
@@ -0,0 +1,27 @@
# services/keycloak

Keycloak is deployed via raw manifests and backed by the shared Postgres (`postgres-service.postgres.svc.cluster.local:5432`). Create these secrets before applying:

```bash
# DB creds (per-service DB/user in shared Postgres)
kubectl -n sso create secret generic keycloak-db \
  --from-literal=username=keycloak \
  --from-literal=password='<DB_PASSWORD>' \
  --from-literal=database=keycloak

# Admin console creds (maps to KC admin user)
kubectl -n sso create secret generic keycloak-admin \
  --from-literal=username=brad@bstein.dev \
  --from-literal=password='<ADMIN_PASSWORD>'
```

Apply:

```bash
kubectl apply -k services/keycloak
```

Notes
- Service: `keycloak.sso.svc:80` (Ingress `sso.bstein.dev`, TLS via cert-manager).
- Uses Postgres schema `public`; DB/user should be provisioned in the shared Postgres instance.
- Health endpoints on :9000 are wired for probes.
132 services/keycloak/deployment.yaml
@@ -0,0 +1,132 @@
# services/keycloak/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: sso
  labels:
    app: keycloak
spec:
  replicas: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values: ["titan-24"]
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5"]
            - weight: 70
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi4"]
      securityContext:
        runAsUser: 1000
        runAsGroup: 0
        fsGroup: 1000
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.0.7
          imagePullPolicy: IfNotPresent
          args:
            - start
          env:
            - name: KC_DB
              value: postgres
            - name: KC_DB_URL_HOST
              value: postgres-service.postgres.svc.cluster.local
            - name: KC_DB_URL_DATABASE
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: database
            - name: KC_DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: username
            - name: KC_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: password
            - name: KC_DB_SCHEMA
              value: public
            - name: KC_HOSTNAME
              value: sso.bstein.dev
            - name: KC_HOSTNAME_URL
              value: https://sso.bstein.dev
            - name: KC_PROXY
              value: edge
            - name: KC_PROXY_HEADERS
              value: xforwarded
            - name: KC_HTTP_ENABLED
              value: "true"
            - name: KC_HTTP_MANAGEMENT_PORT
              value: "9000"
            - name: KC_HTTP_MANAGEMENT_BIND_ADDRESS
              value: 0.0.0.0
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_METRICS_ENABLED
              value: "true"
            - name: KEYCLOAK_ADMIN
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin
                  key: username
            - name: KEYCLOAK_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin
                  key: password
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 9000
              name: metrics
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9000
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9000
            initialDelaySeconds: 60
            periodSeconds: 15
            failureThreshold: 6
          volumeMounts:
            - name: data
              mountPath: /opt/keycloak/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: keycloak-data
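A hedged post-deploy check for the Deployment above; the rollout and health paths come straight from the manifest, the short port-forward is just one convenient way to reach the :9000 management port:

```bash
# Sketch: confirm Keycloak is serving before wiring clients to it.
kubectl -n sso rollout status deployment/keycloak
kubectl -n sso port-forward deploy/keycloak 9000:9000 >/dev/null &
pf=$!
sleep 3
curl -fsS http://127.0.0.1:9000/health/ready; echo
kill "$pf"
```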
24 services/keycloak/ingress.yaml
@@ -0,0 +1,24 @@
# services/keycloak/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: sso
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: traefik
  rules:
    - host: sso.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 80
  tls:
    - hosts: [sso.bstein.dev]
      secretName: keycloak-tls

10 services/keycloak/kustomization.yaml
@@ -0,0 +1,10 @@
# services/keycloak/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sso
resources:
- namespace.yaml
- pvc.yaml
- deployment.yaml
- service.yaml
- ingress.yaml

5 services/keycloak/namespace.yaml
@@ -0,0 +1,5 @@
# services/keycloak/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sso

12 services/keycloak/pvc.yaml
@@ -0,0 +1,12 @@
# services/keycloak/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: keycloak-data
  namespace: sso
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: astreae

15 services/keycloak/service.yaml
@@ -0,0 +1,15 @@
# services/keycloak/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: sso
  labels:
    app: keycloak
spec:
  selector:
    app: keycloak
  ports:
    - name: http
      port: 80
      targetPort: http
@@ -20,7 +20,7 @@
       },
       "targets": [
         {
-          "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+          "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
           "refId": "A",
           "legendFormat": "{{namespace}}"
         }

The identical expr substitution (avg_over_time(DCGM_FI_DEV_GPU_UTIL{...}[1h]) replaced by max_over_time(DCGM_FI_DEV_GPU_UTIL{...}[$__range])) repeats in the remaining generated-dashboard hunks:

@@ -975,7 +975,7 @@
@@ -29,7 +29,7 @@ data:
@@ -984,7 +984,7 @@ data:
@@ -249,9 +249,27 @@ spec:
     service:
       type: ClusterIP
     env:
-      GF_AUTH_ANONYMOUS_ENABLED: "true"
-      GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
+      GF_AUTH_ANONYMOUS_ENABLED: "false"
       GF_SECURITY_ALLOW_EMBEDDING: "true"
+      GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
+      GF_AUTH_GENERIC_OAUTH_NAME: "Keycloak"
+      GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP: "true"
+      GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email groups"
+      GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/auth"
+      GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/token"
+      GF_AUTH_GENERIC_OAUTH_API_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/userinfo"
+      GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: "contains(groups, 'admin') && 'Admin' || 'Viewer'"
+      GF_AUTH_GENERIC_OAUTH_TLS_SKIP_VERIFY_INSECURE: "false"
+      GF_AUTH_SIGNOUT_REDIRECT_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/logout?redirect_uri=https://metrics.bstein.dev/"
+    envValueFrom:
+      GF_AUTH_GENERIC_OAUTH_CLIENT_ID:
+        secretKeyRef:
+          name: grafana-oidc
+          key: client_id
+      GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET:
+        secretKeyRef:
+          name: grafana-oidc
+          key: client_secret
     grafana.ini:
       server:
         domain: metrics.bstein.dev
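Grafana now reads its OIDC client credentials from a `grafana-oidc` secret referenced in the hunk above. A hedged creation sketch; the `monitoring` namespace and the `grafana` client ID are assumptions, and the secret value comes from the Keycloak client created for Grafana:

```bash
# Sketch: create the secret the Grafana HelmRelease references (namespace and client ID are assumptions).
kubectl -n monitoring create secret generic grafana-oidc \
  --from-literal=client_id=grafana \
  --from-literal=client_secret='<KEYCLOAK_GRAFANA_CLIENT_SECRET>'
```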
82 services/oauth2-proxy/deployment.yaml
@@ -0,0 +1,82 @@
# services/oauth2-proxy/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy
  namespace: sso
  labels:
    app: oauth2-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oauth2-proxy
  template:
    metadata:
      labels:
        app: oauth2-proxy
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
      containers:
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          imagePullPolicy: IfNotPresent
          args:
            - --provider=oidc
            - --redirect-url=https://auth.bstein.dev/oauth2/callback
            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
            - --scope=openid profile email groups
            - --email-domain=*
            - --set-xauthrequest=true
            - --pass-access-token=true
            - --set-authorization-header=true
            - --cookie-secure=true
            - --cookie-samesite=lax
            - --cookie-refresh=20m
            - --cookie-expire=168h
            - --upstream=static://200
            - --http-address=0.0.0.0:4180
            - --skip-provider-button=true
            - --skip-jwt-bearer-tokens=true
            - --oidc-groups-claim=groups
          env:
            - name: OAUTH2_PROXY_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: client_id
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: client_secret
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: cookie_secret
          ports:
            - containerPort: 4180
              name: http
          readinessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 20
            periodSeconds: 20
25 services/oauth2-proxy/ingress.yaml
@@ -0,0 +1,25 @@
# services/oauth2-proxy/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oauth2-proxy
  namespace: sso
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-errors@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: auth.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: oauth2-proxy
                port:
                  number: 80
  tls:
    - hosts: [auth.bstein.dev]
      secretName: auth-tls

10 services/oauth2-proxy/kustomization.yaml
@@ -0,0 +1,10 @@
# services/oauth2-proxy/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sso
resources:
- deployment.yaml
- service.yaml
- ingress.yaml
- middleware.yaml
- middleware-errors.yaml

15 services/oauth2-proxy/middleware-errors.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/middleware-errors.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-proxy-errors
  namespace: sso
spec:
  errors:
    status:
      - "401"
      - "403"
    service:
      name: oauth2-proxy
      port: 80
    query: /oauth2/start?rd={url}

15 services/oauth2-proxy/middleware.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-proxy-forward-auth
  namespace: sso
spec:
  forwardAuth:
    address: http://oauth2-proxy.sso.svc.cluster.local:4180/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
      - Authorization
      - X-Auth-Request-Email
      - X-Auth-Request-User
      - X-Auth-Request-Groups
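Any other Ingress can be put behind this forward-auth middleware by referencing it as `sso-oauth2-proxy-forward-auth@kubernetescrd`, the same `<namespace>-<name>@kubernetescrd` convention used elsewhere in this diff. A hedged sketch for fronting Zot; the hostname, backend service name, and port 5000 are assumptions:

```yaml
# Sketch: attach the sso forward-auth middleware to another Ingress (Zot shown as an example).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zot
  namespace: zot
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-forward-auth@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: registry.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zot
                port:
                  number: 5000
```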
15 services/oauth2-proxy/service.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: oauth2-proxy
  namespace: sso
  labels:
    app: oauth2-proxy
spec:
  selector:
    app: oauth2-proxy
  ports:
    - name: http
      port: 80
      targetPort: 4180
@@ -7,7 +7,6 @@ metadata:
   annotations:
     kubernetes.io/ingress.class: traefik
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
-    traefik.ingress.kubernetes.io/router.middlewares: vault-vault-basicauth@kubernetescrd
     traefik.ingress.kubernetes.io/service.serversscheme: https
     traefik.ingress.kubernetes.io/service.serversTransport: vault-vault-to-https@kubernetescrd
 spec:
@@ -7,5 +7,4 @@ resources:
 - helmrelease.yaml
 - certificate.yaml
 - ingress.yaml
-- middleware.yaml
 - serverstransport.yaml
@@ -1,9 +0,0 @@
-# services/vault/middleware.yaml
-apiVersion: traefik.io/v1alpha1
-kind: Middleware
-metadata:
-  name: vault-basicauth
-  namespace: vault
-spec:
-  basicAuth:
-    secret: vault-basic-auth