diff --git a/AGENTS.md b/AGENTS.md
deleted file mode 100644
index a8d49c8..0000000
--- a/AGENTS.md
+++ /dev/null
@@ -1,68 +0,0 @@
-
-
-Repository Guidelines
-
-## Project Structure & Module Organization
-- `infrastructure/`: cluster-scoped building blocks (core, flux-system, traefik, longhorn). Add new platform features by mirroring this layout.
-- `services/`: workload manifests per app (`services/gitea/`, etc.) with `kustomization.yaml` plus one file per kind; keep diffs small and focused.
-- `dockerfiles/` hosts bespoke images, while `scripts/` stores operational Fish/Bash helpers—extend these directories instead of relying on ad-hoc commands.
-
-## Build, Test, and Development Commands
-- `kustomize build services/` (or `kubectl kustomize ...`) renders manifests exactly as Flux will.
-- `kubectl apply --server-side --dry-run=client -k services/` checks schema compatibility without touching the cluster.
-- `flux reconcile kustomization --namespace flux-system --with-source` pulls the latest Git state after merges or hotfixes.
-- `fish scripts/flux_hammer.fish --help` explains the recovery tool; read it before running against production workloads.
-
-## Coding Style & Naming Conventions
-- YAML uses two-space indents; retain the leading path comment (e.g. `# services/gitea/deployment.yaml`) to speed code review.
-- Keep resource names lowercase kebab-case, align labels/selectors, and mirror namespaces with directory names.
-- List resources in `kustomization.yaml` from namespace/config, through storage, then workloads and networking for predictable diffs.
-- Scripts start with `#!/usr/bin/env fish` or bash, stay executable, and follow snake_case names such as `flux_hammer.fish`.
-
-## Testing Guidelines
-- Run `kustomize build` and the dry-run apply for every service you touch; capture failures before opening a PR.
-- `flux diff kustomization --path services/` previews reconciliations—link notable output when behavior shifts.
-- Docker edits: `docker build -f dockerfiles/Dockerfile.monerod .` (swap the file you changed) to verify image builds.
-
-## Commit & Pull Request Guidelines
-- Keep commit subjects short, present-tense, and optionally scoped (`gpu(titan-24): add RuntimeClass`); squash fixups before review.
-- Describe linked issues, affected services, and required operator steps (e.g. `flux reconcile kustomization services-gitea`) in the PR body.
-- Focus each PR on one kustomization or service and update `infrastructure/flux-system` when Flux must track new folders.
-- Record the validation you ran (dry-runs, diffs, builds) and add screenshots only when ingress or UI behavior changes.
-
-## Security & Configuration Tips
-- Never commit credentials; use Vault workflows (`services/vault/`) or SOPS-encrypted manifests wired through `infrastructure/flux-system`.
-- Node selectors and tolerations gate workloads to hardware like `hardware: rpi4`; confirm labels before scaling or renaming nodes.
-- Pin external images by digest or rely on Flux image automation to follow approved tags and avoid drift.
-
-## Dashboard roadmap / context (2025-12-02)
-- Atlas dashboards are generated via `scripts/dashboards_render_atlas.py --build`, which writes JSON under `services/monitoring/dashboards/` and ConfigMaps under `services/monitoring/`. Keep the Grafana manifests in sync by regenerating after edits.
-- Atlas Overview panels are paired with internal dashboards (pods, nodes, storage, network, GPU). A new `atlas-gpu` internal dashboard holds the detailed GPU metrics that feed the overview share pie.
-- Old Grafana folders (`Atlas Storage`, `Atlas SRE`, `Atlas Public`, `Atlas Nodes`) should be removed in Grafana UI when convenient; only `Atlas Overview` and `Atlas Internal` should remain provisioned.
-- Future work: add a separate generator (e.g., `dashboards_render_oceanus.py`) for SUI/oceanus validation dashboards, mirroring the atlas pattern of internal dashboards feeding a public overview.
-
-## Monitoring state (2025-12-03)
-- dcgm-exporter DaemonSet pulls `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04` with nvidia runtime/imagePullSecret; titan-24 exports metrics, titan-22 remains NotReady.
-- Atlas Overview is the Grafana home (1h range, 1m refresh), Overview folder UID `overview`, internal folder `atlas-internal` (oceanus-internal stub).
-- Panels standardized via generator; hottest row compressed, worker/control rows taller, root disk row taller and top12 bar gauge with labels. GPU share pie uses 1h avg_over_time to persist idle activity.
-- Internal dashboards are provisioned without Viewer role; if anonymous still sees them, restart Grafana and tighten auth if needed.
-
-## Upcoming priorities (SSO/storage/mail)
-- Establish SSO (Keycloak or similar) and federate Grafana, Gitea, Zot, Nextcloud, Pegasus/Jellyfin; keep Vaultwarden separate until safe.
-- Add Nextcloud (limit to rpi5 workers) with office suite; integrate with SSO; plan storage class and ingress.
-- Plan mail: mostly self-hosted, relay through trusted provider for outbound; integrate with services (Nextcloud, Vaultwarden, etc.) for notifications and account flows.
-
-## SSO plan sketch (2025-12-03)
-- IdP: use Keycloak (preferred) in a new `sso` namespace, Bitnami or codecentric chart with Postgres backing store (single PVC), ingress `sso.bstein.dev`, admin user bound to brad@bstein.dev; stick with local DB initially (no external IdP).
-- Auth flow goals: Grafana (OIDC), Gitea (OAuth2/Keycloak), Zot (via Traefik forward-auth/oauth2-proxy), Jellyfin/Pegasus via Jellyfin OAuth/OpenID plugin (map existing usernames; run migration to pre-create users in Keycloak with same usernames/emails and temporary passwords), Pegasus keeps using Jellyfin tokens.
-- Steps to implement:
-  1) Add service folder `services/keycloak/` (namespace, PVC, HelmRelease, ingress, secret for admin creds). Verify with kustomize + Flux reconcile.
-  2) Seed realm `atlas` with users (import CSV/realm). Create client for Grafana (public/implicit), Gitea (confidential), and a “jellyfin” client for the OAuth plugin; set email for brad@bstein.dev as admin.
-  3) Reconfigure Grafana to OIDC (disable anonymous to internal folders, leave Overview public via folder permissions). Reconfigure Gitea to OIDC (app.ini).
-  4) Add Traefik forward-auth (oauth2-proxy) in front of Zot and any other services needing headers-based auth.
-  5) Deploy Jellyfin OpenID plugin; map Keycloak users to existing Jellyfin usernames; communicate password reset path.
-- Migration caution: do not delete existing local creds until SSO validated; keep Pegasus working via Jellyfin tokens during transition.
-
-## Postgres centralization (2025-12-03)
-- Prefer a shared in-cluster Postgres deployment with per-service databases to reduce resource sprawl on Pi nodes. Use it for services that can easily point at an external DB.
-- Candidates to migrate to shared Postgres: Keycloak (realm DB), Gitea (git DB), Nextcloud (app DB), possibly Grafana (if persistence needed beyond current provisioner), Jitsi prosody/JVB state (if external DB supported). Keep tightly-coupled or lightweight embedded DBs as-is when migration is painful or not supported.
diff --git a/clusters/atlas/flux-system/applications/keycloak/kustomization.yaml b/clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
new file mode 100644
index 0000000..4634b5c
--- /dev/null
+++ b/clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
@@ -0,0 +1,15 @@
+# clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  name: keycloak
+  namespace: flux-system
+spec:
+  interval: 10m
+  prune: true
+  sourceRef:
+    kind: GitRepository
+    name: flux-system
+  path: ./services/keycloak
+  targetNamespace: sso
+  timeout: 2m
diff --git a/clusters/atlas/flux-system/applications/kustomization.yaml b/clusters/atlas/flux-system/applications/kustomization.yaml
index 7d2f8ee..1bc2700 100644
--- a/clusters/atlas/flux-system/applications/kustomization.yaml
+++ b/clusters/atlas/flux-system/applications/kustomization.yaml
@@ -13,3 +13,5 @@ resources:
   - jellyfin/kustomization.yaml
   - xmr-miner/kustomization.yaml
   - sui-metrics/kustomization.yaml
+  - keycloak/kustomization.yaml
+  - oauth2-proxy/kustomization.yaml
diff --git a/clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml b/clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
new file mode 100644
index 0000000..187572d
--- /dev/null
+++ b/clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
@@ -0,0 +1,15 @@
+# clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  name: oauth2-proxy
+  namespace: flux-system
+spec:
+  interval: 10m
+  prune: true
+  sourceRef:
+    kind: GitRepository
+    name: flux-system
+  path: ./services/oauth2-proxy
+  targetNamespace: sso
+  timeout: 2m
diff --git a/clusters/atlas/flux-system/gotk-sync.yaml b/clusters/atlas/flux-system/gotk-sync.yaml
index 46f65d3..4076ef6 100644
--- a/clusters/atlas/flux-system/gotk-sync.yaml
+++ b/clusters/atlas/flux-system/gotk-sync.yaml
@@ -8,7 +8,7 @@ metadata:
 spec:
   interval: 1m0s
   ref:
-    branch: feature/atlas-monitoring
+    branch: feature/sso
   secretRef:
     name: flux-system-gitea
   url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
diff --git a/infrastructure/longhorn/ui-ingress/ingress.yaml b/infrastructure/longhorn/ui-ingress/ingress.yaml
index 6250cfa..94daeed 100644
--- a/infrastructure/longhorn/ui-ingress/ingress.yaml
+++ b/infrastructure/longhorn/ui-ingress/ingress.yaml
@@ -7,7 +7,7 @@ metadata:
   annotations:
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
     traefik.ingress.kubernetes.io/router.tls: "true"
-    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-basicauth@kubernetescrd,longhorn-system-longhorn-headers@kubernetescrd
+    traefik.ingress.kubernetes.io/router.middlewares: ""
 spec:
   ingressClassName: traefik
   tls:
@@ -21,6 +21,6 @@ spec:
         pathType: Prefix
         backend:
           service:
-            name: longhorn-frontend
+            name: oauth2-proxy-longhorn
            port:
              number: 80
diff --git a/infrastructure/longhorn/ui-ingress/kustomization.yaml b/infrastructure/longhorn/ui-ingress/kustomization.yaml
index 1d497dc..a2ae5f3 100644
--- a/infrastructure/longhorn/ui-ingress/kustomization.yaml
+++ b/infrastructure/longhorn/ui-ingress/kustomization.yaml
@@ -4,3 +4,4 @@ kind: Kustomization
 resources:
   - middleware.yaml
   - ingress.yaml
+  - oauth2-proxy-longhorn.yaml
diff --git a/infrastructure/longhorn/ui-ingress/middleware.yaml b/infrastructure/longhorn/ui-ingress/middleware.yaml
index c670cef..3bf2ff5 100644
--- a/infrastructure/longhorn/ui-ingress/middleware.yaml
+++ b/infrastructure/longhorn/ui-ingress/middleware.yaml
@@ -20,3 +20,20 @@ spec:
   headers:
     customRequestHeaders:
       X-Forwarded-Proto: "https"
+
+---
+
+apiVersion: traefik.io/v1alpha1
+kind: Middleware
+metadata:
+  name: longhorn-forward-auth
+  namespace: longhorn-system
+spec:
+  forwardAuth:
+    address: https://auth.bstein.dev/oauth2/auth
+    trustForwardHeader: true
+    authResponseHeaders:
+      - Authorization
+      - X-Auth-Request-Email
+      - X-Auth-Request-User
+      - X-Auth-Request-Groups
diff --git a/infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml b/infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
new file mode 100644
index 0000000..b8d4f34
--- /dev/null
+++ b/infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
@@ -0,0 +1,102 @@
+# infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: oauth2-proxy-longhorn
+  namespace: longhorn-system
+  labels:
+    app: oauth2-proxy-longhorn
+spec:
+  ports:
+    - name: http
+      port: 80
+      targetPort: 4180
+  selector:
+    app: oauth2-proxy-longhorn
+
+---
+
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: oauth2-proxy-longhorn
+  namespace: longhorn-system
+  labels:
+    app: oauth2-proxy-longhorn
+spec:
+  replicas: 2
+  selector:
+    matchLabels:
+      app: oauth2-proxy-longhorn
+  template:
+    metadata:
+      labels:
+        app: oauth2-proxy-longhorn
+    spec:
+      nodeSelector:
+        node-role.kubernetes.io/worker: "true"
+      affinity:
+        nodeAffinity:
+          preferredDuringSchedulingIgnoredDuringExecution:
+            - weight: 90
+              preference:
+                matchExpressions:
+                  - key: hardware
+                    operator: In
+                    values: ["rpi5","rpi4"]
+      containers:
+        - name: oauth2-proxy
+          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
+          imagePullPolicy: IfNotPresent
+          args:
+            - --provider=oidc
+            - --redirect-url=https://longhorn.bstein.dev/oauth2/callback
+            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
+            - --scope=openid profile email groups
+            - --email-domain=*
+            - --allowed-group=admin
+            - --set-xauthrequest=true
+            - --pass-access-token=true
+            - --set-authorization-header=true
+            - --cookie-secure=true
+            - --cookie-samesite=lax
+            - --cookie-refresh=20m
+            - --cookie-expire=168h
+            - --insecure-oidc-allow-unverified-email=true
+            - --upstream=http://longhorn-frontend.longhorn-system.svc.cluster.local
+            - --http-address=0.0.0.0:4180
+            - --skip-provider-button=true
+            - --skip-jwt-bearer-tokens=true
+            - --oidc-groups-claim=groups
+            - --cookie-domain=longhorn.bstein.dev
+          env:
+            - name: OAUTH2_PROXY_CLIENT_ID
+              valueFrom:
+                secretKeyRef:
+                  name: oauth2-proxy-longhorn-oidc
+                  key: client_id
+            - name: OAUTH2_PROXY_CLIENT_SECRET
+              valueFrom:
+                secretKeyRef:
+                  name: oauth2-proxy-longhorn-oidc
+                  key: client_secret
+            - name: OAUTH2_PROXY_COOKIE_SECRET
+              valueFrom:
+                secretKeyRef:
+                  name: oauth2-proxy-longhorn-oidc
+                  key: cookie_secret
+          ports:
+            - containerPort: 4180
+              name: http
+          readinessProbe:
+            httpGet:
+              path: /ping
+              port: 4180
+            initialDelaySeconds: 5
+            periodSeconds: 10
+          livenessProbe:
+            httpGet:
+              path: /ping
+              port: 4180
+            initialDelaySeconds: 20
+            periodSeconds: 20
diff --git a/scripts/dashboards_render_atlas.py b/scripts/dashboards_render_atlas.py
index 93de006..f577eab 100644
--- a/scripts/dashboards_render_atlas.py
+++ b/scripts/dashboards_render_atlas.py
@@ -232,7 +232,7 @@ NAMESPACE_GPU_ALLOC = (
     ' or kube_pod_container_resource_limits{namespace!="",resource="nvidia.com/gpu"})) by (namespace)'
 )
 NAMESPACE_GPU_USAGE_SHARE = (
-    'sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))'
+    'sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))'
 )
 NAMESPACE_GPU_USAGE_INSTANT = 'sum(DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}) by (namespace)'
 NAMESPACE_GPU_RAW = (
diff --git a/services/keycloak/README.md b/services/keycloak/README.md
new file mode 100644
index 0000000..bf7c21b
--- /dev/null
+++ b/services/keycloak/README.md
@@ -0,0 +1,27 @@
+# services/keycloak
+
+Keycloak is deployed via raw manifests and backed by the shared Postgres (`postgres-service.postgres.svc.cluster.local:5432`).
+Create these secrets before applying:
+
+```bash
+# DB creds (per-service DB/user in shared Postgres)
+kubectl -n sso create secret generic keycloak-db \
+  --from-literal=username=keycloak \
+  --from-literal=password='' \
+  --from-literal=database=keycloak
+
+# Admin console creds (maps to KC admin user)
+kubectl -n sso create secret generic keycloak-admin \
+  --from-literal=username=brad@bstein.dev \
+  --from-literal=password=''
+```
+
+Apply:
+
+```bash
+kubectl apply -k services/keycloak
+```
+
+Notes
+- Service: `keycloak.sso.svc:80` (Ingress `sso.bstein.dev`, TLS via cert-manager).
+- Uses Postgres schema `public`; DB/user should be provisioned in the shared Postgres instance.
+- Health endpoints on :9000 are wired for probes.
diff --git a/services/keycloak/deployment.yaml b/services/keycloak/deployment.yaml
new file mode 100644
index 0000000..af7839f
--- /dev/null
+++ b/services/keycloak/deployment.yaml
@@ -0,0 +1,132 @@
+# services/keycloak/deployment.yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: keycloak
+  namespace: sso
+  labels:
+    app: keycloak
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: keycloak
+  template:
+    metadata:
+      labels:
+        app: keycloak
+    spec:
+      affinity:
+        nodeAffinity:
+          requiredDuringSchedulingIgnoredDuringExecution:
+            nodeSelectorTerms:
+              - matchExpressions:
+                  - key: hardware
+                    operator: In
+                    values: ["rpi5","rpi4"]
+                  - key: node-role.kubernetes.io/worker
+                    operator: Exists
+              - matchExpressions:
+                  - key: kubernetes.io/hostname
+                    operator: In
+                    values: ["titan-24"]
+          preferredDuringSchedulingIgnoredDuringExecution:
+            - weight: 90
+              preference:
+                matchExpressions:
+                  - key: hardware
+                    operator: In
+                    values: ["rpi5"]
+            - weight: 70
+              preference:
+                matchExpressions:
+                  - key: hardware
+                    operator: In
+                    values: ["rpi4"]
+      securityContext:
+        runAsUser: 1000
+        runAsGroup: 0
+        fsGroup: 1000
+        fsGroupChangePolicy: OnRootMismatch
+      containers:
+        - name: keycloak
+          image: quay.io/keycloak/keycloak:26.0.7
+          imagePullPolicy: IfNotPresent
+          args:
+            - start
+          env:
+            - name: KC_DB
+              value: postgres
+            - name: KC_DB_URL_HOST
+              value: postgres-service.postgres.svc.cluster.local
+            - name: KC_DB_URL_DATABASE
+              valueFrom:
+                secretKeyRef:
+                  name: keycloak-db
+                  key: database
+            - name: KC_DB_USERNAME
+              valueFrom:
+                secretKeyRef:
+                  name: keycloak-db
+                  key: username
+            - name: KC_DB_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: keycloak-db
+                  key: password
+            - name: KC_DB_SCHEMA
+              value: public
+            - name: KC_HOSTNAME
+              value: sso.bstein.dev
+            - name: KC_HOSTNAME_URL
+              value: https://sso.bstein.dev
+            - name: KC_PROXY
+              value: edge
+            - name: KC_PROXY_HEADERS
+              value: xforwarded
+            - name: KC_HTTP_ENABLED
+              value: "true"
+            - name: KC_HTTP_MANAGEMENT_PORT
+              value: "9000"
+            - name: KC_HTTP_MANAGEMENT_BIND_ADDRESS
+              value: 0.0.0.0
+            - name: KC_HEALTH_ENABLED
+              value: "true"
+            - name: KC_METRICS_ENABLED
+              value: "true"
+            - name: KEYCLOAK_ADMIN
+              valueFrom:
+                secretKeyRef:
+                  name: keycloak-admin
+                  key: username
+            - name: KEYCLOAK_ADMIN_PASSWORD
+              valueFrom:
+                secretKeyRef:
+                  name: keycloak-admin
+                  key: password
+          ports:
+            - containerPort: 8080
+              name: http
+            - containerPort: 9000
+              name: metrics
+          readinessProbe:
+            httpGet:
+              path: /health/ready
+              port: 9000
+            initialDelaySeconds: 15
+            periodSeconds: 10
+            failureThreshold: 6
+          livenessProbe:
+            httpGet:
+              path: /health/live
+              port: 9000
+            initialDelaySeconds: 60
+            periodSeconds: 15
+            failureThreshold: 6
+          volumeMounts:
+            - name: data
+              mountPath: /opt/keycloak/data
+      volumes:
+        - name: data
+          persistentVolumeClaim:
+            claimName: keycloak-data
diff --git a/services/keycloak/ingress.yaml b/services/keycloak/ingress.yaml
new file mode 100644
index 0000000..39f6cb0
--- /dev/null
+++ b/services/keycloak/ingress.yaml
@@ -0,0 +1,24 @@
+# services/keycloak/ingress.yaml
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: keycloak
+  namespace: sso
+  annotations:
+    cert-manager.io/cluster-issuer: letsencrypt
+spec:
+  ingressClassName: traefik
+  rules:
+    - host: sso.bstein.dev
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: keycloak
+                port:
+                  number: 80
+  tls:
+    - hosts: [sso.bstein.dev]
+      secretName: keycloak-tls
diff --git a/services/keycloak/kustomization.yaml b/services/keycloak/kustomization.yaml
new file mode 100644
index 0000000..a65715c
--- /dev/null
+++ b/services/keycloak/kustomization.yaml
@@ -0,0 +1,10 @@
+# services/keycloak/kustomization.yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+namespace: sso
+resources:
+  - namespace.yaml
+  - pvc.yaml
+  - deployment.yaml
+  - service.yaml
+  - ingress.yaml
diff --git a/services/keycloak/namespace.yaml b/services/keycloak/namespace.yaml
new file mode 100644
index 0000000..b4c731d
--- /dev/null
+++ b/services/keycloak/namespace.yaml
@@ -0,0 +1,5 @@
+# services/keycloak/namespace.yaml
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: sso
diff --git a/services/keycloak/pvc.yaml b/services/keycloak/pvc.yaml
new file mode 100644
index 0000000..b57ec61
--- /dev/null
+++ b/services/keycloak/pvc.yaml
@@ -0,0 +1,12 @@
+# services/keycloak/pvc.yaml
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: keycloak-data
+  namespace: sso
+spec:
+  accessModes: ["ReadWriteOnce"]
+  resources:
+    requests:
+      storage: 10Gi
+  storageClassName: astreae
diff --git a/services/keycloak/service.yaml b/services/keycloak/service.yaml
new file mode 100644
index 0000000..5d93ef6
--- /dev/null
+++ b/services/keycloak/service.yaml
@@ -0,0 +1,15 @@
+# services/keycloak/service.yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: keycloak
+  namespace: sso
+  labels:
+    app: keycloak
+spec:
+  selector:
+    app: keycloak
+  ports:
+    - name: http
+      port: 80
+      targetPort: http
diff --git a/services/monitoring/dashboards/atlas-gpu.json b/services/monitoring/dashboards/atlas-gpu.json
index e67b3d2..9071b0a 100644
--- a/services/monitoring/dashboards/atlas-gpu.json
+++ b/services/monitoring/dashboards/atlas-gpu.json
@@ -20,7 +20,7 @@
       },
       "targets": [
         {
-          "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+          "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
           "refId": "A",
           "legendFormat": "{{namespace}}"
         }
diff --git a/services/monitoring/dashboards/atlas-overview.json b/services/monitoring/dashboards/atlas-overview.json
index 9eda81d..beb676e 100644
--- a/services/monitoring/dashboards/atlas-overview.json
+++ b/services/monitoring/dashboards/atlas-overview.json
@@ -975,7 +975,7 @@
       },
       "targets": [
         {
-          "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+          "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
           "refId": "A",
           "legendFormat": "{{namespace}}"
         }
diff --git a/services/monitoring/grafana-dashboard-gpu.yaml b/services/monitoring/grafana-dashboard-gpu.yaml
index 3af8717..b5c2c18 100644
--- a/services/monitoring/grafana-dashboard-gpu.yaml
+++ b/services/monitoring/grafana-dashboard-gpu.yaml
@@ -29,7 +29,7 @@ data:
           },
           "targets": [
             {
-              "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+              "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
               "refId": "A",
               "legendFormat": "{{namespace}}"
             }
diff --git a/services/monitoring/grafana-dashboard-overview.yaml b/services/monitoring/grafana-dashboard-overview.yaml
index 928098e..ef17ebf 100644
--- a/services/monitoring/grafana-dashboard-overview.yaml
+++ b/services/monitoring/grafana-dashboard-overview.yaml
@@ -984,7 +984,7 @@ data:
           },
           "targets": [
             {
-              "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+              "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
              "refId": "A",
              "legendFormat": "{{namespace}}"
            }
diff --git a/services/monitoring/helmrelease.yaml b/services/monitoring/helmrelease.yaml
index 2546dc1..d7d7579 100644
--- a/services/monitoring/helmrelease.yaml
+++ b/services/monitoring/helmrelease.yaml
@@ -249,9 +249,27 @@ spec:
       service:
         type: ClusterIP
       env:
-        GF_AUTH_ANONYMOUS_ENABLED: "true"
-        GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
+        GF_AUTH_ANONYMOUS_ENABLED: "false"
         GF_SECURITY_ALLOW_EMBEDDING: "true"
+        GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
+        GF_AUTH_GENERIC_OAUTH_NAME: "Keycloak"
+        GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP: "true"
+        GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email groups"
+        GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/auth"
+        GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/token"
+        GF_AUTH_GENERIC_OAUTH_API_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/userinfo"
+        GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: "contains(groups, 'admin') && 'Admin' || 'Viewer'"
+        GF_AUTH_GENERIC_OAUTH_TLS_SKIP_VERIFY_INSECURE: "false"
+        GF_AUTH_SIGNOUT_REDIRECT_URL:
"https://sso.bstein.dev/realms/atlas/protocol/openid-connect/logout?redirect_uri=https://metrics.bstein.dev/" + envValueFrom: + GF_AUTH_GENERIC_OAUTH_CLIENT_ID: + secretKeyRef: + name: grafana-oidc + key: client_id + GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET: + secretKeyRef: + name: grafana-oidc + key: client_secret grafana.ini: server: domain: metrics.bstein.dev diff --git a/services/oauth2-proxy/deployment.yaml b/services/oauth2-proxy/deployment.yaml new file mode 100644 index 0000000..7c22a93 --- /dev/null +++ b/services/oauth2-proxy/deployment.yaml @@ -0,0 +1,82 @@ +# services/oauth2-proxy/deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: oauth2-proxy + namespace: sso + labels: + app: oauth2-proxy +spec: + replicas: 2 + selector: + matchLabels: + app: oauth2-proxy + template: + metadata: + labels: + app: oauth2-proxy + spec: + nodeSelector: + node-role.kubernetes.io/worker: "true" + affinity: + nodeAffinity: + preferredDuringSchedulingIgnoredDuringExecution: + - weight: 90 + preference: + matchExpressions: + - key: hardware + operator: In + values: ["rpi5","rpi4"] + containers: + - name: oauth2-proxy + image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0 + imagePullPolicy: IfNotPresent + args: + - --provider=oidc + - --redirect-url=https://auth.bstein.dev/oauth2/callback + - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas + - --scope=openid profile email groups + - --email-domain=* + - --set-xauthrequest=true + - --pass-access-token=true + - --set-authorization-header=true + - --cookie-secure=true + - --cookie-samesite=lax + - --cookie-refresh=20m + - --cookie-expire=168h + - --upstream=static://200 + - --http-address=0.0.0.0:4180 + - --skip-provider-button=true + - --skip-jwt-bearer-tokens=true + - --oidc-groups-claim=groups + env: + - name: OAUTH2_PROXY_CLIENT_ID + valueFrom: + secretKeyRef: + name: oauth2-proxy-oidc + key: client_id + - name: OAUTH2_PROXY_CLIENT_SECRET + valueFrom: + secretKeyRef: + name: oauth2-proxy-oidc + key: 
client_secret + - name: OAUTH2_PROXY_COOKIE_SECRET + valueFrom: + secretKeyRef: + name: oauth2-proxy-oidc + key: cookie_secret + ports: + - containerPort: 4180 + name: http + readinessProbe: + httpGet: + path: /ping + port: 4180 + initialDelaySeconds: 5 + periodSeconds: 10 + livenessProbe: + httpGet: + path: /ping + port: 4180 + initialDelaySeconds: 20 + periodSeconds: 20 diff --git a/services/oauth2-proxy/ingress.yaml b/services/oauth2-proxy/ingress.yaml new file mode 100644 index 0000000..0f5830c --- /dev/null +++ b/services/oauth2-proxy/ingress.yaml @@ -0,0 +1,25 @@ +# services/oauth2-proxy/ingress.yaml +apiVersion: networking.k8s.io/v1 +kind: Ingress +metadata: + name: oauth2-proxy + namespace: sso + annotations: + cert-manager.io/cluster-issuer: letsencrypt + traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-errors@kubernetescrd +spec: + ingressClassName: traefik + rules: + - host: auth.bstein.dev + http: + paths: + - path: / + pathType: Prefix + backend: + service: + name: oauth2-proxy + port: + number: 80 + tls: + - hosts: [auth.bstein.dev] + secretName: auth-tls diff --git a/services/oauth2-proxy/kustomization.yaml b/services/oauth2-proxy/kustomization.yaml new file mode 100644 index 0000000..ff4705a --- /dev/null +++ b/services/oauth2-proxy/kustomization.yaml @@ -0,0 +1,10 @@ +# services/oauth2-proxy/kustomization.yaml +apiVersion: kustomize.config.k8s.io/v1beta1 +kind: Kustomization +namespace: sso +resources: + - deployment.yaml + - service.yaml + - ingress.yaml + - middleware.yaml + - middleware-errors.yaml diff --git a/services/oauth2-proxy/middleware-errors.yaml b/services/oauth2-proxy/middleware-errors.yaml new file mode 100644 index 0000000..55e092a --- /dev/null +++ b/services/oauth2-proxy/middleware-errors.yaml @@ -0,0 +1,15 @@ +# services/oauth2-proxy/middleware-errors.yaml +apiVersion: traefik.io/v1alpha1 +kind: Middleware +metadata: + name: oauth2-proxy-errors + namespace: sso +spec: + errors: + status: + - "401" + - "403" + 
service: + name: oauth2-proxy + port: 80 + query: /oauth2/start?rd={url} diff --git a/services/oauth2-proxy/middleware.yaml b/services/oauth2-proxy/middleware.yaml new file mode 100644 index 0000000..db5f3a4 --- /dev/null +++ b/services/oauth2-proxy/middleware.yaml @@ -0,0 +1,15 @@ +# services/oauth2-proxy/middleware.yaml +apiVersion: traefik.io/v1alpha1 +kind: Middleware +metadata: + name: oauth2-proxy-forward-auth + namespace: sso +spec: + forwardAuth: + address: http://oauth2-proxy.sso.svc.cluster.local:4180/oauth2/auth + trustForwardHeader: true + authResponseHeaders: + - Authorization + - X-Auth-Request-Email + - X-Auth-Request-User + - X-Auth-Request-Groups diff --git a/services/oauth2-proxy/service.yaml b/services/oauth2-proxy/service.yaml new file mode 100644 index 0000000..1eb5481 --- /dev/null +++ b/services/oauth2-proxy/service.yaml @@ -0,0 +1,15 @@ +# services/oauth2-proxy/service.yaml +apiVersion: v1 +kind: Service +metadata: + name: oauth2-proxy + namespace: sso + labels: + app: oauth2-proxy +spec: + selector: + app: oauth2-proxy + ports: + - name: http + port: 80 + targetPort: 4180 diff --git a/services/vault/ingress.yaml b/services/vault/ingress.yaml index 306556d..91d9ca4 100644 --- a/services/vault/ingress.yaml +++ b/services/vault/ingress.yaml @@ -7,7 +7,6 @@ metadata: annotations: kubernetes.io/ingress.class: traefik traefik.ingress.kubernetes.io/router.entrypoints: websecure - traefik.ingress.kubernetes.io/router.middlewares: vault-vault-basicauth@kubernetescrd traefik.ingress.kubernetes.io/service.serversscheme: https traefik.ingress.kubernetes.io/service.serversTransport: vault-vault-to-https@kubernetescrd spec: diff --git a/services/vault/kustomization.yaml b/services/vault/kustomization.yaml index 4c3fbc5..1d7af87 100644 --- a/services/vault/kustomization.yaml +++ b/services/vault/kustomization.yaml @@ -7,5 +7,4 @@ resources: - helmrelease.yaml - certificate.yaml - ingress.yaml - - middleware.yaml - serverstransport.yaml diff --git 
a/services/vault/middleware.yaml b/services/vault/middleware.yaml deleted file mode 100644 index 0a41961..0000000 --- a/services/vault/middleware.yaml +++ /dev/null @@ -1,9 +0,0 @@ -# services/vault/middleware.yaml -apiVersion: traefik.io/v1alpha1 -kind: Middleware -metadata: - name: vault-basicauth - namespace: vault -spec: - basicAuth: - secret: vault-basic-auth
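Note for reviewers: the diff above defines the `oauth2-proxy-forward-auth` middleware but no ingress in this change set references it yet. A protected service would attach it the same way the error middleware is attached here, via Traefik's `<namespace>-<name>@kubernetescrd` annotation convention. The sketch below is illustrative only; the host `example.bstein.dev`, the ingress/service names, and the port are hypothetical, not part of this diff.

```yaml
# Hypothetical sketch: protecting an app ingress with the new forward-auth
# middleware. Traefik calls oauth2-proxy's /oauth2/auth endpoint before
# routing, and the errors middleware redirects 401/403 to /oauth2/start.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-app            # assumption: placeholder name
  namespace: example           # assumption: placeholder namespace
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    traefik.ingress.kubernetes.io/router.middlewares: >-
      sso-oauth2-proxy-forward-auth@kubernetescrd,sso-oauth2-proxy-errors@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: example.bstein.dev  # assumption: placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-app
                port:
                  number: 80
```

The middleware names (`oauth2-proxy-forward-auth`, `oauth2-proxy-errors`) and the `sso` namespace come from the manifests in this diff; everything else is a stand-in.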