Compare commits
31 Commits
65d389193f...9f226c1584
| SHA1 |
|---|
| 9f226c1584 |
| 319b515882 |
| cb2b2ec1cd |
| 20cd185c0b |
| 2f368f6975 |
| 6c62d42f7a |
| a7e9f1f7d8 |
| ceb692f7ee |
| 24fbaad040 |
| 04aa32a762 |
| 25ee698021 |
| 4a089876ba |
| 20bb776625 |
| 5e59f20bc3 |
| dbede55ad4 |
| 27e5c9391c |
| 8d5e6c267c |
| a55502fe27 |
| 598bdfc727 |
| 88c7a1c2aa |
| f4da27271e |
| 141c05b08f |
| f0a8f6d35e |
| 1b01052eda |
| 1d346edd28 |
| b14a9dcb98 |
| 47caf08885 |
| 0db149605d |
| f64e60c5a2 |
| 61c5db5c99 |
| 2db550afdd |
68 AGENTS.md
@@ -1,68 +0,0 @@
# Repository Guidelines
## Project Structure & Module Organization
- `infrastructure/`: cluster-scoped building blocks (core, flux-system, traefik, longhorn). Add new platform features by mirroring this layout.
- `services/`: workload manifests per app (`services/gitea/`, etc.) with `kustomization.yaml` plus one file per kind; keep diffs small and focused.
- `dockerfiles/` hosts bespoke images, while `scripts/` stores operational Fish/Bash helpers—extend these directories instead of relying on ad-hoc commands.
## Build, Test, and Development Commands
- `kustomize build services/<app>` (or `kubectl kustomize ...`) renders manifests exactly as Flux will.
- `kubectl apply --server-side --dry-run=client -k services/<app>` checks schema compatibility without touching the cluster.
- `flux reconcile kustomization <name> --namespace flux-system --with-source` pulls the latest Git state after merges or hotfixes.
- `fish scripts/flux_hammer.fish --help` explains the recovery tool; read it before running against production workloads.
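A minimal sketch tying these commands into one pre-PR check; the app name `gitea` and the kustomization name `services-gitea` are only examples, adjust for the service you touched:

```bash
#!/usr/bin/env bash
# Sketch: validate one service before opening a PR (names are examples).
set -euo pipefail

app="gitea"

# Render the manifests exactly as Flux will (fails loudly on errors).
kustomize build "services/${app}" > /dev/null

# Schema check without touching the cluster.
kubectl apply --server-side --dry-run=client -k "services/${app}"

# After merge, ask Flux to pull the latest Git state.
flux reconcile kustomization "services-${app}" --namespace flux-system --with-source
```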
## Coding Style & Naming Conventions
- YAML uses two-space indents; retain the leading path comment (e.g. `# services/gitea/deployment.yaml`) to speed code review.
- Keep resource names lowercase kebab-case, align labels/selectors, and mirror namespaces with directory names.
- List resources in `kustomization.yaml` from namespace/config, through storage, then workloads and networking for predictable diffs.
- Scripts start with `#!/usr/bin/env fish` or bash, stay executable, and follow snake_case names such as `flux_hammer.fish`.
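A sketch of the resource ordering convention; the app name and filenames are illustrative:

```yaml
# Sketch: services/<app>/kustomization.yaml resource ordering (filenames illustrative).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: example-app
resources:
- namespace.yaml      # namespace/config first
- configmap.yaml
- pvc.yaml            # then storage
- deployment.yaml     # then workloads
- service.yaml        # networking last
- ingress.yaml
```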
## Testing Guidelines
- Run `kustomize build` and the dry-run apply for every service you touch; capture failures before opening a PR.
- `flux diff kustomization <name> --path services/<app>` previews reconciliations—link notable output when behavior shifts.
- Docker edits: `docker build -f dockerfiles/Dockerfile.monerod .` (swap the file you changed) to verify image builds.
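Both checks in one short sketch; the kustomization, path, and Dockerfile names are the examples used in this repo:

```bash
# Sketch: preview a reconciliation and verify a changed image still builds.
flux diff kustomization services-gitea --path services/gitea
docker build -f dockerfiles/Dockerfile.monerod .
```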
## Commit & Pull Request Guidelines
- Keep commit subjects short, present-tense, and optionally scoped (`gpu(titan-24): add RuntimeClass`); squash fixups before review.
- Describe linked issues, affected services, and required operator steps (e.g. `flux reconcile kustomization services-gitea`) in the PR body.
- Focus each PR on one kustomization or service and update `infrastructure/flux-system` when Flux must track new folders.
- Record the validation you ran (dry-runs, diffs, builds) and add screenshots only when ingress or UI behavior changes.
## Security & Configuration Tips
- Never commit credentials; use Vault workflows (`services/vault/`) or SOPS-encrypted manifests wired through `infrastructure/flux-system`.
- Node selectors and tolerations gate workloads to hardware like `hardware: rpi4`; confirm labels before scaling or renaming nodes.
- Pin external images by digest or rely on Flux image automation to follow approved tags and avoid drift.
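A minimal sketch of both practices in one workload; the names, namespace, and digest are placeholders:

```yaml
# Sketch: gate a workload to rpi4 nodes and pin its image by digest (values are placeholders).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
  namespace: example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      nodeSelector:
        hardware: rpi4
      containers:
        - name: app
          # Digest pin avoids tag drift; the digest below is a placeholder.
          image: registry.bstein.dev/example/app@sha256:<digest>
```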
## Dashboard roadmap / context (2025-12-02)
- Atlas dashboards are generated via `scripts/dashboards_render_atlas.py --build`, which writes JSON under `services/monitoring/dashboards/` and ConfigMaps under `services/monitoring/`. Keep the Grafana manifests in sync by regenerating after edits.
- Atlas Overview panels are paired with internal dashboards (pods, nodes, storage, network, GPU). A new `atlas-gpu` internal dashboard holds the detailed GPU metrics that feed the overview share pie.
- Old Grafana folders (`Atlas Storage`, `Atlas SRE`, `Atlas Public`, `Atlas Nodes`) should be removed in the Grafana UI when convenient; only `Atlas Overview` and `Atlas Internal` should remain provisioned.
- Future work: add a separate generator (e.g. `dashboards_render_oceanus.py`) for SUI/oceanus validation dashboards, mirroring the atlas pattern of internal dashboards feeding a public overview.
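A regeneration sketch; the `python3` invocation is an assumption (the script may also be directly executable), and the rendered-overlay check just reuses the standard kustomize validation:

```bash
# Sketch: regenerate Atlas dashboards, then confirm the monitoring overlay still renders.
python3 scripts/dashboards_render_atlas.py --build
kustomize build services/monitoring > /dev/null && echo "monitoring overlay renders"
```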
## Monitoring state (2025-12-03)
- dcgm-exporter DaemonSet pulls `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04` with the nvidia runtime and imagePullSecret; titan-24 exports metrics, titan-22 remains NotReady.
- Atlas Overview is the Grafana home (1h range, 1m refresh); the Overview folder has UID `overview`, the internal folder is `atlas-internal` (oceanus-internal stub).
- Panels are standardized via the generator; hottest row compressed, worker/control rows taller, root disk row taller with a labeled top-12 bar gauge. The GPU share pie uses 1h avg_over_time so idle activity persists.
- Internal dashboards are provisioned without the Viewer role; if anonymous users can still see them, restart Grafana and tighten auth if needed.
## Upcoming priorities (SSO/storage/mail)
- Establish SSO (Keycloak or similar) and federate Grafana, Gitea, Zot, Nextcloud, Pegasus/Jellyfin; keep Vaultwarden separate until it is safe to migrate.
- Add Nextcloud (limited to rpi5 workers) with an office suite; integrate with SSO; plan the storage class and ingress.
- Plan mail: mostly self-hosted, relayed through a trusted provider for outbound; integrate with services (Nextcloud, Vaultwarden, etc.) for notifications and account flows.
## SSO plan sketch (2025-12-03)
- IdP: use Keycloak (preferred) in a new `sso` namespace, via the Bitnami or codecentric chart with a Postgres backing store (single PVC), ingress `sso.bstein.dev`, admin user bound to brad@bstein.dev; stick with the local DB initially (no external IdP).
- Auth flow goals: Grafana (OIDC), Gitea (OAuth2/Keycloak), Zot (via Traefik forward-auth/oauth2-proxy), Jellyfin/Pegasus via the Jellyfin OAuth/OpenID plugin (map existing usernames; run a migration to pre-create users in Keycloak with the same usernames/emails and temporary passwords); Pegasus keeps using Jellyfin tokens.
- Steps to implement:
  1) Add a service folder `services/keycloak/` (namespace, PVC, HelmRelease, ingress, secret for admin creds). Verify with kustomize + Flux reconcile.
  2) Seed realm `atlas` with users (import CSV/realm). Create clients for Grafana (public/implicit) and Gitea (confidential), plus a "jellyfin" client for the OAuth plugin; set brad@bstein.dev as admin.
  3) Reconfigure Grafana to OIDC (disable anonymous access to internal folders, leave Overview public via folder permissions). Reconfigure Gitea to OIDC (app.ini).
  4) Add Traefik forward-auth (oauth2-proxy) in front of Zot and any other services needing header-based auth.
  5) Deploy the Jellyfin OpenID plugin; map Keycloak users to existing Jellyfin usernames; communicate the password reset path.
- Migration caution: do not delete existing local creds until SSO is validated; keep Pegasus working via Jellyfin tokens during the transition.
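Step 2 can be scripted with Keycloak's bundled admin CLI. A hedged sketch, assuming the raw-manifest deployment further down this diff; the password variable is the `keycloak-admin` secret value and the client options (client ID, redirect URI, confidential client) are illustrative and will need tuning per client:

```bash
# Sketch: seed the atlas realm with kcadm.sh from inside the Keycloak pod (options illustrative).
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh config credentials \
  --server http://localhost:8080 --realm master \
  --user brad@bstein.dev --password "$KC_ADMIN_PASSWORD"
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh create realms \
  -s realm=atlas -s enabled=true
kubectl -n sso exec deploy/keycloak -- /opt/keycloak/bin/kcadm.sh create clients -r atlas \
  -s clientId=grafana -s protocol=openid-connect -s publicClient=false \
  -s 'redirectUris=["https://metrics.bstein.dev/login/generic_oauth"]'
```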
## Postgres centralization (2025-12-03)
- Prefer a shared in-cluster Postgres deployment with per-service databases to reduce resource sprawl on Pi nodes. Use it for services that can easily point at an external DB.
- Candidates to migrate to shared Postgres: Keycloak (realm DB), Gitea (git DB), Nextcloud (app DB), possibly Grafana (if persistence is needed beyond the current provisioner), Jitsi prosody/JVB state (if an external DB is supported). Keep tightly-coupled or lightweight embedded DBs as-is when migration is painful or not supported.
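Provisioning a per-service role and database on the shared instance could look like the sketch below; the pod name is a placeholder, the role/database names follow the keycloak example from the README added later in this diff, and the password is a placeholder:

```bash
# Sketch: create a dedicated role and database for one service on the shared Postgres.
kubectl -n postgres exec -it <postgres-pod> -- psql -U postgres -c \
  "CREATE ROLE keycloak WITH LOGIN PASSWORD '<DB_PASSWORD>';"
kubectl -n postgres exec -it <postgres-pod> -- psql -U postgres -c \
  "CREATE DATABASE keycloak OWNER keycloak;"
```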
@@ -0,0 +1,15 @@
# clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: keycloak
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/keycloak
  targetNamespace: sso
  timeout: 2m
@@ -13,3 +13,5 @@ resources:
 - jellyfin/kustomization.yaml
 - xmr-miner/kustomization.yaml
 - sui-metrics/kustomization.yaml
+- keycloak/kustomization.yaml
+- oauth2-proxy/kustomization.yaml
@@ -0,0 +1,15 @@
# clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: oauth2-proxy
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/oauth2-proxy
  targetNamespace: sso
  timeout: 2m
@@ -8,7 +8,7 @@ metadata:
 spec:
   interval: 1m0s
   ref:
-    branch: feature/atlas-monitoring
+    branch: feature/sso
   secretRef:
     name: flux-system-gitea
   url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
@@ -7,7 +7,7 @@ metadata:
   annotations:
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
     traefik.ingress.kubernetes.io/router.tls: "true"
-    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-basicauth@kubernetescrd,longhorn-system-longhorn-headers@kubernetescrd
+    traefik.ingress.kubernetes.io/router.middlewares: ""
 spec:
   ingressClassName: traefik
   tls:
@@ -21,6 +21,6 @@ spec:
             pathType: Prefix
             backend:
               service:
-                name: longhorn-frontend
+                name: oauth2-proxy-longhorn
                 port:
                   number: 80
@@ -4,3 +4,4 @@ kind: Kustomization
 resources:
 - middleware.yaml
 - ingress.yaml
+- oauth2-proxy-longhorn.yaml
@@ -20,3 +20,20 @@ spec:
   headers:
     customRequestHeaders:
       X-Forwarded-Proto: "https"
+
+---
+
+apiVersion: traefik.io/v1alpha1
+kind: Middleware
+metadata:
+  name: longhorn-forward-auth
+  namespace: longhorn-system
+spec:
+  forwardAuth:
+    address: https://auth.bstein.dev/oauth2/auth
+    trustForwardHeader: true
+    authResponseHeaders:
+      - Authorization
+      - X-Auth-Request-Email
+      - X-Auth-Request-User
+      - X-Auth-Request-Groups
102 infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
@@ -0,0 +1,102 @@
# infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
apiVersion: v1
kind: Service
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  ports:
    - name: http
      port: 80
      targetPort: 4180
  selector:
    app: oauth2-proxy-longhorn

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oauth2-proxy-longhorn
  template:
    metadata:
      labels:
        app: oauth2-proxy-longhorn
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
      containers:
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          imagePullPolicy: IfNotPresent
          args:
            - --provider=oidc
            - --redirect-url=https://longhorn.bstein.dev/oauth2/callback
            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
            - --scope=openid profile email groups
            - --email-domain=*
            - --allowed-group=admin
            - --set-xauthrequest=true
            - --pass-access-token=true
            - --set-authorization-header=true
            - --cookie-secure=true
            - --cookie-samesite=lax
            - --cookie-refresh=20m
            - --cookie-expire=168h
            - --insecure-oidc-allow-unverified-email=true
            - --upstream=http://longhorn-frontend.longhorn-system.svc.cluster.local
            - --http-address=0.0.0.0:4180
            - --skip-provider-button=true
            - --skip-jwt-bearer-tokens=true
            - --oidc-groups-claim=groups
            - --cookie-domain=longhorn.bstein.dev
          env:
            - name: OAUTH2_PROXY_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_id
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_secret
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: cookie_secret
          ports:
            - containerPort: 4180
              name: http
          readinessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 20
            periodSeconds: 20
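The Deployment above reads its OIDC client credentials from an `oauth2-proxy-longhorn-oidc` secret that is not part of this change. A minimal creation sketch; the client ID and secret values are placeholders for the Keycloak client, and the cookie-secret recipe follows the oauth2-proxy documentation:

```bash
# Sketch: create the OIDC secret the Longhorn oauth2-proxy Deployment references (values are placeholders).
cookie_secret="$(openssl rand -base64 32 | tr -- '+/' '-_')"
kubectl -n longhorn-system create secret generic oauth2-proxy-longhorn-oidc \
  --from-literal=client_id='<KEYCLOAK_CLIENT_ID>' \
  --from-literal=client_secret='<KEYCLOAK_CLIENT_SECRET>' \
  --from-literal=cookie_secret="${cookie_secret}"
```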
@@ -232,7 +232,7 @@ NAMESPACE_GPU_ALLOC = (
     ' or kube_pod_container_resource_limits{namespace!="",resource="nvidia.com/gpu"})) by (namespace)'
 )
 NAMESPACE_GPU_USAGE_SHARE = (
-    'sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))'
+    'sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))'
 )
 NAMESPACE_GPU_USAGE_INSTANT = 'sum(DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}) by (namespace)'
 NAMESPACE_GPU_RAW = (
27 services/keycloak/README.md
@@ -0,0 +1,27 @@
# services/keycloak

Keycloak is deployed via raw manifests and backed by the shared Postgres (`postgres-service.postgres.svc.cluster.local:5432`). Create these secrets before applying:

```bash
# DB creds (per-service DB/user in shared Postgres)
kubectl -n sso create secret generic keycloak-db \
  --from-literal=username=keycloak \
  --from-literal=password='<DB_PASSWORD>' \
  --from-literal=database=keycloak

# Admin console creds (maps to KC admin user)
kubectl -n sso create secret generic keycloak-admin \
  --from-literal=username=brad@bstein.dev \
  --from-literal=password='<ADMIN_PASSWORD>'
```

Apply:

```bash
kubectl apply -k services/keycloak
```

Notes
- Service: `keycloak.sso.svc:80` (Ingress `sso.bstein.dev`, TLS via cert-manager).
- Uses Postgres schema `public`; DB/user should be provisioned in the shared Postgres instance.
- Health endpoints on :9000 are wired for probes.
132 services/keycloak/deployment.yaml
@@ -0,0 +1,132 @@
# services/keycloak/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: keycloak
  namespace: sso
  labels:
    app: keycloak
spec:
  replicas: 1
  selector:
    matchLabels:
      app: keycloak
  template:
    metadata:
      labels:
        app: keycloak
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values: ["titan-24"]
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5"]
            - weight: 70
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi4"]
      securityContext:
        runAsUser: 1000
        runAsGroup: 0
        fsGroup: 1000
        fsGroupChangePolicy: OnRootMismatch
      containers:
        - name: keycloak
          image: quay.io/keycloak/keycloak:26.0.7
          imagePullPolicy: IfNotPresent
          args:
            - start
          env:
            - name: KC_DB
              value: postgres
            - name: KC_DB_URL_HOST
              value: postgres-service.postgres.svc.cluster.local
            - name: KC_DB_URL_DATABASE
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: database
            - name: KC_DB_USERNAME
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: username
            - name: KC_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-db
                  key: password
            - name: KC_DB_SCHEMA
              value: public
            - name: KC_HOSTNAME
              value: sso.bstein.dev
            - name: KC_HOSTNAME_URL
              value: https://sso.bstein.dev
            - name: KC_PROXY
              value: edge
            - name: KC_PROXY_HEADERS
              value: xforwarded
            - name: KC_HTTP_ENABLED
              value: "true"
            - name: KC_HTTP_MANAGEMENT_PORT
              value: "9000"
            - name: KC_HTTP_MANAGEMENT_BIND_ADDRESS
              value: 0.0.0.0
            - name: KC_HEALTH_ENABLED
              value: "true"
            - name: KC_METRICS_ENABLED
              value: "true"
            - name: KEYCLOAK_ADMIN
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin
                  key: username
            - name: KEYCLOAK_ADMIN_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: keycloak-admin
                  key: password
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 9000
              name: metrics
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9000
            initialDelaySeconds: 15
            periodSeconds: 10
            failureThreshold: 6
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9000
            initialDelaySeconds: 60
            periodSeconds: 15
            failureThreshold: 6
          volumeMounts:
            - name: data
              mountPath: /opt/keycloak/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: keycloak-data
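A hedged post-deploy check for the Deployment above; the rollout and health paths come straight from the manifest, the short port-forward is just one convenient way to reach the :9000 management port:

```bash
# Sketch: confirm Keycloak is serving before wiring clients to it.
kubectl -n sso rollout status deployment/keycloak
kubectl -n sso port-forward deploy/keycloak 9000:9000 >/dev/null &
pf=$!
sleep 3
curl -fsS http://127.0.0.1:9000/health/ready; echo
kill "$pf"
```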
24 services/keycloak/ingress.yaml
@@ -0,0 +1,24 @@
# services/keycloak/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: sso
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: traefik
  rules:
    - host: sso.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 80
  tls:
    - hosts: [sso.bstein.dev]
      secretName: keycloak-tls

10 services/keycloak/kustomization.yaml
@@ -0,0 +1,10 @@
# services/keycloak/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sso
resources:
- namespace.yaml
- pvc.yaml
- deployment.yaml
- service.yaml
- ingress.yaml

5 services/keycloak/namespace.yaml
@@ -0,0 +1,5 @@
# services/keycloak/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sso

12 services/keycloak/pvc.yaml
@@ -0,0 +1,12 @@
# services/keycloak/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: keycloak-data
  namespace: sso
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: astreae

15 services/keycloak/service.yaml
@@ -0,0 +1,15 @@
# services/keycloak/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: sso
  labels:
    app: keycloak
spec:
  selector:
    app: keycloak
  ports:
    - name: http
      port: 80
      targetPort: http
@@ -20,7 +20,7 @@
       },
       "targets": [
         {
-          "expr": "100 * ( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (avg_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[1h]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
+          "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
           "refId": "A",
           "legendFormat": "{{namespace}}"
         }

The identical expr substitution (avg_over_time(DCGM_FI_DEV_GPU_UTIL{...}[1h]) replaced by max_over_time(DCGM_FI_DEV_GPU_UTIL{...}[$__range])) repeats in the remaining generated-dashboard hunks:

@@ -975,7 +975,7 @@
@@ -29,7 +29,7 @@ data:
@@ -984,7 +984,7 @@ data:
@@ -249,9 +249,27 @@ spec:
     service:
       type: ClusterIP
     env:
-      GF_AUTH_ANONYMOUS_ENABLED: "true"
-      GF_AUTH_ANONYMOUS_ORG_ROLE: Viewer
+      GF_AUTH_ANONYMOUS_ENABLED: "false"
       GF_SECURITY_ALLOW_EMBEDDING: "true"
+      GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
+      GF_AUTH_GENERIC_OAUTH_NAME: "Keycloak"
+      GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP: "true"
+      GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email groups"
+      GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/auth"
+      GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/token"
+      GF_AUTH_GENERIC_OAUTH_API_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/userinfo"
+      GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: "contains(groups, 'admin') && 'Admin' || 'Viewer'"
+      GF_AUTH_GENERIC_OAUTH_TLS_SKIP_VERIFY_INSECURE: "false"
+      GF_AUTH_SIGNOUT_REDIRECT_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/logout?redirect_uri=https://metrics.bstein.dev/"
+    envValueFrom:
+      GF_AUTH_GENERIC_OAUTH_CLIENT_ID:
+        secretKeyRef:
+          name: grafana-oidc
+          key: client_id
+      GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET:
+        secretKeyRef:
+          name: grafana-oidc
+          key: client_secret
     grafana.ini:
       server:
         domain: metrics.bstein.dev
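Grafana now reads its OIDC client credentials from a `grafana-oidc` secret referenced in the hunk above. A hedged creation sketch; the `monitoring` namespace and the `grafana` client ID are assumptions, and the secret value comes from the Keycloak client created for Grafana:

```bash
# Sketch: create the secret the Grafana HelmRelease references (namespace and client ID are assumptions).
kubectl -n monitoring create secret generic grafana-oidc \
  --from-literal=client_id=grafana \
  --from-literal=client_secret='<KEYCLOAK_GRAFANA_CLIENT_SECRET>'
```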
82 services/oauth2-proxy/deployment.yaml
@@ -0,0 +1,82 @@
# services/oauth2-proxy/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy
  namespace: sso
  labels:
    app: oauth2-proxy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oauth2-proxy
  template:
    metadata:
      labels:
        app: oauth2-proxy
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
      containers:
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          imagePullPolicy: IfNotPresent
          args:
            - --provider=oidc
            - --redirect-url=https://auth.bstein.dev/oauth2/callback
            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
            - --scope=openid profile email groups
            - --email-domain=*
            - --set-xauthrequest=true
            - --pass-access-token=true
            - --set-authorization-header=true
            - --cookie-secure=true
            - --cookie-samesite=lax
            - --cookie-refresh=20m
            - --cookie-expire=168h
            - --upstream=static://200
            - --http-address=0.0.0.0:4180
            - --skip-provider-button=true
            - --skip-jwt-bearer-tokens=true
            - --oidc-groups-claim=groups
          env:
            - name: OAUTH2_PROXY_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: client_id
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: client_secret
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-oidc
                  key: cookie_secret
          ports:
            - containerPort: 4180
              name: http
          readinessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 20
            periodSeconds: 20
25 services/oauth2-proxy/ingress.yaml
@@ -0,0 +1,25 @@
# services/oauth2-proxy/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oauth2-proxy
  namespace: sso
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
    traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-errors@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: auth.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: oauth2-proxy
                port:
                  number: 80
  tls:
    - hosts: [auth.bstein.dev]
      secretName: auth-tls

10 services/oauth2-proxy/kustomization.yaml
@@ -0,0 +1,10 @@
# services/oauth2-proxy/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sso
resources:
- deployment.yaml
- service.yaml
- ingress.yaml
- middleware.yaml
- middleware-errors.yaml

15 services/oauth2-proxy/middleware-errors.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/middleware-errors.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-proxy-errors
  namespace: sso
spec:
  errors:
    status:
      - "401"
      - "403"
    service:
      name: oauth2-proxy
      port: 80
    query: /oauth2/start?rd={url}

15 services/oauth2-proxy/middleware.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/middleware.yaml
apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: oauth2-proxy-forward-auth
  namespace: sso
spec:
  forwardAuth:
    address: http://oauth2-proxy.sso.svc.cluster.local:4180/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
      - Authorization
      - X-Auth-Request-Email
      - X-Auth-Request-User
      - X-Auth-Request-Groups
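Any other Ingress can be put behind this forward-auth middleware by referencing it as `sso-oauth2-proxy-forward-auth@kubernetescrd`, the same `<namespace>-<name>@kubernetescrd` convention used elsewhere in this diff. A hedged sketch for fronting Zot; the hostname, backend service name, and port 5000 are assumptions:

```yaml
# Sketch: attach the sso forward-auth middleware to another Ingress (Zot shown as an example).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: zot
  namespace: zot
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-forward-auth@kubernetescrd
spec:
  ingressClassName: traefik
  rules:
    - host: registry.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: zot
                port:
                  number: 5000
```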
15 services/oauth2-proxy/service.yaml
@@ -0,0 +1,15 @@
# services/oauth2-proxy/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: oauth2-proxy
  namespace: sso
  labels:
    app: oauth2-proxy
spec:
  selector:
    app: oauth2-proxy
  ports:
    - name: http
      port: 80
      targetPort: 4180
@@ -7,7 +7,6 @@ metadata:
   annotations:
     kubernetes.io/ingress.class: traefik
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
-    traefik.ingress.kubernetes.io/router.middlewares: vault-vault-basicauth@kubernetescrd
     traefik.ingress.kubernetes.io/service.serversscheme: https
     traefik.ingress.kubernetes.io/service.serversTransport: vault-vault-to-https@kubernetescrd
 spec:
@@ -7,5 +7,4 @@ resources:
 - helmrelease.yaml
 - certificate.yaml
 - ingress.yaml
-- middleware.yaml
 - serverstransport.yaml
@@ -1,9 +0,0 @@
-# services/vault/middleware.yaml
-apiVersion: traefik.io/v1alpha1
-kind: Middleware
-metadata:
-  name: vault-basicauth
-  namespace: vault
-spec:
-  basicAuth:
-    secret: vault-basic-auth