Compare commits
No commits in common. "main" and "restructure/hybrid-clusters" have entirely different histories.
.gitignore (6 changes, vendored)
@@ -1,5 +1 @@
# Ignore markdown by default, but keep top-level docs
*.md
!README.md
!AGENTS.md
!**/NOTES.md
AGENTS.md
AGENTS.md (81 changes)
@@ -1,81 +0,0 @@
Repository Guidelines

> Local-only note: apply changes through Flux-tracked manifests, not by manual kubectl edits in-cluster—manual tweaks will be reverted by Flux.

## Project Structure & Module Organization
- `infrastructure/`: cluster-scoped building blocks (core, flux-system, traefik, longhorn). Add new platform features by mirroring this layout.
- `services/`: workload manifests per app (`services/gitea/`, etc.) with `kustomization.yaml` plus one file per kind; keep diffs small and focused.
- `dockerfiles/` hosts bespoke images, while `scripts/` stores operational Fish/Bash helpers—extend these directories instead of relying on ad-hoc commands.

## Build, Test, and Development Commands
- `kustomize build services/<app>` (or `kubectl kustomize ...`) renders manifests exactly as Flux will.
- `kubectl apply --server-side --dry-run=client -k services/<app>` checks schema compatibility without touching the cluster.
- `flux reconcile kustomization <name> --namespace flux-system --with-source` pulls the latest Git state after merges or hotfixes.
- `fish scripts/flux_hammer.fish --help` explains the recovery tool; read it before running against production workloads.

## Coding Style & Naming Conventions
- YAML uses two-space indents; retain the leading path comment (e.g. `# services/gitea/deployment.yaml`) to speed code review.
- Keep resource names lowercase kebab-case, align labels/selectors, and mirror namespaces with directory names.
- List resources in `kustomization.yaml` from namespace/config, through storage, then workloads and networking for predictable diffs.
- Scripts start with `#!/usr/bin/env fish` or bash, stay executable, and follow snake_case names such as `flux_hammer.fish`.

## Testing Guidelines
- Run `kustomize build` and the dry-run apply for every service you touch; capture failures before opening a PR.
- `flux diff kustomization <name> --path services/<app>` previews reconciliations—link notable output when behavior shifts.
- Docker edits: `docker build -f dockerfiles/Dockerfile.monerod .` (swap the file you changed) to verify image builds.

## Commit & Pull Request Guidelines
- Keep commit subjects short, present-tense, and optionally scoped (`gpu(titan-24): add RuntimeClass`); squash fixups before review.
- Describe linked issues, affected services, and required operator steps (e.g. `flux reconcile kustomization services-gitea`) in the PR body.
- Focus each PR on one kustomization or service and update `infrastructure/flux-system` when Flux must track new folders.
- Record the validation you ran (dry-runs, diffs, builds) and add screenshots only when ingress or UI behavior changes.

## Security & Configuration Tips
- Never commit credentials; use Vault workflows (`services/vault/`) or SOPS-encrypted manifests wired through `infrastructure/flux-system`.
- Node selectors and tolerations gate workloads to hardware like `hardware: rpi4`; confirm labels before scaling or renaming nodes.
- Pin external images by digest or rely on Flux image automation to follow approved tags and avoid drift.

## Dashboard roadmap / context (2025-12-02)
- Atlas dashboards are generated via `scripts/dashboards_render_atlas.py --build`, which writes JSON under `services/monitoring/dashboards/` and ConfigMaps under `services/monitoring/`. Keep the Grafana manifests in sync by regenerating after edits.
- Atlas Overview panels are paired with internal dashboards (pods, nodes, storage, network, GPU). A new `atlas-gpu` internal dashboard holds the detailed GPU metrics that feed the overview share pie.
- Old Grafana folders (`Atlas Storage`, `Atlas SRE`, `Atlas Public`, `Atlas Nodes`) should be removed in Grafana UI when convenient; only `Atlas Overview` and `Atlas Internal` should remain provisioned.
- Future work: add a separate generator (e.g., `dashboards_render_oceanus.py`) for SUI/oceanus validation dashboards, mirroring the atlas pattern of internal dashboards feeding a public overview.

## Monitoring state (2025-12-03)
- dcgm-exporter DaemonSet pulls `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04` with nvidia runtime/imagePullSecret; titan-24 exports metrics, titan-22 remains NotReady.
- Atlas Overview is the Grafana home (1h range, 1m refresh), Overview folder UID `overview`, internal folder `atlas-internal` (oceanus-internal stub).
- Panels standardized via generator; hottest row compressed, worker/control rows taller, root disk row taller and top12 bar gauge with labels. GPU share pie uses 1h avg_over_time to persist idle activity.
- Internal dashboards are provisioned without Viewer role; if anonymous still sees them, restart Grafana and tighten auth if needed.
- GPU share panel updated (feature/sso) to use `max_over_time(…[$__range])`, so longer ranges (e.g., 12h) keep recent activity visible. Flux tracking `feature/sso`.

## Upcoming priorities (SSO/storage/mail)
- Establish SSO (Keycloak or similar) and federate Grafana, Gitea, Zot, Nextcloud, Pegasus/Jellyfin; keep Vaultwarden separate until safe.
- Add Nextcloud (limit to rpi5 workers) with office suite; integrate with SSO; plan storage class and ingress.
- Plan mail: mostly self-hosted, relay through trusted provider for outbound; integrate with services (Nextcloud, Vaultwarden, etc.) for notifications and account flows.

## SSO plan sketch (2025-12-03)
- IdP: use Keycloak (preferred) in a new `sso` namespace, Bitnami or codecentric chart with Postgres backing store (single PVC), ingress `sso.bstein.dev`, admin user bound to brad@bstein.dev; stick with local DB initially (no external IdP).
- Auth flow goals: Grafana (OIDC), Gitea (OAuth2/Keycloak), Zot (via Traefik forward-auth/oauth2-proxy), Jellyfin/Pegasus via Jellyfin OAuth/OpenID plugin (map existing usernames; run migration to pre-create users in Keycloak with same usernames/emails and temporary passwords), Pegasus keeps using Jellyfin tokens.
- Steps to implement:
  1) Add service folder `services/keycloak/` (namespace, PVC, HelmRelease, ingress, secret for admin creds). Verify with kustomize + Flux reconcile.
  2) Seed realm `atlas` with users (import CSV/realm). Create client for Grafana (public/implicit), Gitea (confidential), and a “jellyfin” client for the OAuth plugin; set email for brad@bstein.dev as admin.
  3) Reconfigure Grafana to OIDC (disable anonymous to internal folders, leave Overview public via folder permissions). Reconfigure Gitea to OIDC (app.ini).
  4) Add Traefik forward-auth (oauth2-proxy) in front of Zot and any other services needing headers-based auth.
  5) Deploy Jellyfin OpenID plugin; map Keycloak users to existing Jellyfin usernames; communicate password reset path.
- Migration caution: do not delete existing local creds until SSO validated; keep Pegasus working via Jellyfin tokens during transition.

## Postgres centralization (2025-12-03)
- Prefer a shared in-cluster Postgres deployment with per-service databases to reduce resource sprawl on Pi nodes. Use it for services that can easily point at an external DB.
- Candidates to migrate to shared Postgres: Keycloak (realm DB), Gitea (git DB), Nextcloud (app DB), possibly Grafana (if persistence needed beyond current provisioner), Jitsi prosody/JVB state (if external DB supported). Keep tightly-coupled or lightweight embedded DBs as-is when migration is painful or not supported.

## SSO integration snapshot (2025-12-08)
- Current blockers: Zot still prompts for basic auth/double-login; Vault still wants the token UI after Keycloak (previously 502/404 when vault-0 sealed). Forward-auth middleware on Zot Ingress likely still causing the 401/Found hop; Vault OIDC mount not completing UI flow unless unsealed and preferred login is set.
- Flux-only changes required: remove zot forward-auth middleware from Ingress (let oauth2-proxy handle redirect), ensure Vault OIDC mount is preferred UI login and bound to admin group; keep all edits in repo so Flux enforces them.
- Secrets present (per user): `zot-oidc-client` (client_secret only), `oauth2-proxy-zot-oidc`, `oauth2-proxy-vault-oidc`, `vault-oidc-admin-token`. Zot needs its regcred in the zot namespace if image pulls fail.
- Cluster validation blocked here: `kubectl get nodes` fails (403/permission) and DNS to `*.bstein.dev` fails in this session, so no live curl verification could be run. Re-test on a host with cluster/DNS access after Flux applies fixes.

## Docs hygiene
- Do not add per-service `README.md` files; use `NOTES.md` if documentation is needed inside service folders. Keep only the top-level repo README.
- Keep comments succinct and in a human voice—no AI-sounding notes. Use `NOTES.md` for scratch notes instead of sprinkling reminders into code or extra READMEs.
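Reviewer sketch (not part of the diff): the render and dry-run steps that AGENTS.md asks for per service could be collected by a small helper like the one below. The function name `validation_commands` is hypothetical. The sketch also drops `--server-side` from the dry-run, on the assumption that current kubectl releases refuse to combine it with `--dry-run=client`; treat that as something to verify against your kubectl version rather than as a statement about the repository.

```python
import shlex


def validation_commands(app: str) -> list[list[str]]:
    """Return the render and dry-run commands for one service directory."""
    path = f"services/{app}"
    return [
        # Render the manifests the same way Flux would.
        ["kustomize", "build", path],
        # Client-side dry run; --server-side is intentionally omitted (assumption, see note above).
        ["kubectl", "apply", "--dry-run=client", "-k", path],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in validation_commands("gitea"):
        print(shlex.join(cmd))
```

Printing rather than executing keeps the helper safe to run on a machine without cluster access; wiring it to `subprocess.run` would be the obvious next step.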
NOTES.md (3 changes)
@@ -1,3 +0,0 @@
# Rotation reminders (temporary secrets set by automation)

- Weave GitOps UI (`cd.bstein.dev`) admin: `admin` / `G1tOps!2025` — rotate immediately after first login.
@@ -1,3 +0,0 @@
# titan-iac

Flux-managed Kubernetes cluster for bstein.dev services.
@@ -1,15 +0,0 @@
# clusters/atlas/flux-system/applications/keycloak/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: keycloak
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/keycloak
  targetNamespace: sso
  timeout: 2m
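Reviewer sketch (not part of the diff): the Keycloak and oauth2-proxy Flux Kustomizations removed in this comparison share one shape, differing only in name and path. A generator along the following lines would produce that shape; the function name `flux_kustomization` and its defaults are hypothetical, and the output is JSON, which Flux-compatible YAML tooling can read because JSON is a subset of YAML.

```python
import json


def flux_kustomization(name: str, target_namespace: str,
                       interval: str = "10m", timeout: str = "2m") -> dict:
    """Build a Flux Kustomization object mirroring the manifests shown in this diff."""
    return {
        "apiVersion": "kustomize.toolkit.fluxcd.io/v1",
        "kind": "Kustomization",
        "metadata": {"name": name, "namespace": "flux-system"},
        "spec": {
            "interval": interval,
            "prune": True,
            "sourceRef": {"kind": "GitRepository", "name": "flux-system"},
            "path": f"./services/{name}",
            "targetNamespace": target_namespace,
            "timeout": timeout,
        },
    }


if __name__ == "__main__":
    print(json.dumps(flux_kustomization("keycloak", "sso"), indent=2))
```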
@@ -13,6 +13,3 @@ resources:
  - jellyfin/kustomization.yaml
  - xmr-miner/kustomization.yaml
  - sui-metrics/kustomization.yaml
  - keycloak/kustomization.yaml
  - oauth2-proxy/kustomization.yaml
  - mailu/kustomization.yaml
@@ -1,18 +0,0 @@
# clusters/atlas/flux-system/applications/mailu/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: mailu
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  path: ./services/mailu
  targetNamespace: mailu-mailserver
  prune: true
  wait: true
  dependsOn:
    - name: helm
@@ -1,15 +0,0 @@
# clusters/atlas/flux-system/applications/oauth2-proxy/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: oauth2-proxy
  namespace: flux-system
spec:
  interval: 10m
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  path: ./services/oauth2-proxy
  targetNamespace: sso
  timeout: 2m
@@ -8,7 +8,7 @@ metadata:
spec:
  interval: 1m0s
  ref:
    branch: feature/mailu
    branch: restructure/hybrid-clusters
  secretRef:
    name: flux-system-gitea
  url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
@@ -1,20 +0,0 @@
# clusters/atlas/flux-system/platform/gitops-ui/kustomization.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: gitops-ui
  namespace: flux-system
spec:
  interval: 10m
  timeout: 10m
  path: ./services/gitops-ui
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
    namespace: flux-system
  targetNamespace: flux-system
  dependsOn:
    - name: helm
    - name: traefik
  wait: true
@@ -5,6 +5,5 @@ resources:
  - core/kustomization.yaml
  - helm/kustomization.yaml
  - traefik/kustomization.yaml
  - gitops-ui/kustomization.yaml
  - monitoring/kustomization.yaml
  - longhorn-ui/kustomization.yaml
@@ -11,4 +11,4 @@ spec:
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: false
  wait: true
clusters/oceanus/README.md (5 changes, normal file)
@@ -0,0 +1,5 @@
# Oceanus Cluster Scaffold

This directory prepares the Flux and Kustomize layout for a future Oceanus-managed cluster.
Populate `flux-system/` with `gotk-components.yaml` and related manifests after running `flux bootstrap`.
Define node-specific resources under `infrastructure/modules/profiles/oceanus-validator/` and reference workloads in `applications/` as they come online.
@@ -2,14 +2,15 @@

| Hostname | Role / Function | Managed By | Notes |
|------------|--------------------------------|---------------------|-------|
| titan-db | HA control plane database | Ansible | PostgreSQL / etcd backing services |
| titan-0a | Kubernetes control-plane | Flux (atlas cluster)| HA leader, tainted for control only |
| titan-0b | Kubernetes control-plane | Flux (atlas cluster)| Standby control node |
| titan-0c | Kubernetes control-plane | Flux (atlas cluster)| Standby control node |
| titan-04-19| Raspberry Pi workers | Flux (atlas cluster)| Workload nodes, labelled per hardware |
| titan-20&21| NVIDIA Jetson workers | Flux (atlas cluster)| Workload nodes, labelled per hardware |
| titan-22 | GPU mini-PC (Jellyfin) | Flux + Ansible | NVIDIA runtime managed via `modules/profiles/atlas-ha` |
| titan-23 | Dedicated SUI validator Oceanus| Manual + Ansible | Baremetal validator workloads, exposes metrics to atlas |
| titan-24 | Tethys hybrid node | Flux + Ansible | Runs SUI metrics via K8s, validator via Ansible |
| titan-jh | Jumphost & bastion & lesavka | Ansible | Entry point / future KVM services / custom kvm - lesavaka |
| titan-db | HA control plane database | Ansible | PostgreSQL / etcd backing services |
| titan-jh | Jumphost & bastion | Ansible | Entry point / future KVM services |
| oceanus | Dedicated SUI validator host | Ansible / Flux prep | Baremetal validator workloads, exposes metrics to atlas; Kustomize scaffold under `clusters/oceanus/` |
| styx | Air-gapped workstation | Manual / Scripts | Remains isolated, scripts tracked in `hosts/styx` |

Use the `clusters/` directory for cluster-scoped state and the `hosts/` directory for baremetal orchestration.
@@ -5,4 +5,3 @@ resources:
  - ../modules/base
  - ../modules/profiles/atlas-ha
  - ../sources/cert-manager/letsencrypt.yaml
  - ../sources/cert-manager/letsencrypt-prod.yaml
@@ -7,7 +7,7 @@ metadata:
  annotations:
    traefik.ingress.kubernetes.io/router.entrypoints: websecure
    traefik.ingress.kubernetes.io/router.tls: "true"
    traefik.ingress.kubernetes.io/router.middlewares: ""
    traefik.ingress.kubernetes.io/router.middlewares: longhorn-system-longhorn-basicauth@kubernetescrd,longhorn-system-longhorn-headers@kubernetescrd
spec:
  ingressClassName: traefik
  tls:
@@ -21,6 +21,6 @@ spec:
            pathType: Prefix
            backend:
              service:
                name: oauth2-proxy-longhorn
                name: longhorn-frontend
                port:
                  number: 80
@@ -4,4 +4,3 @@ kind: Kustomization
resources:
  - middleware.yaml
  - ingress.yaml
  - oauth2-proxy-longhorn.yaml
@@ -20,20 +20,3 @@ spec:
  headers:
    customRequestHeaders:
      X-Forwarded-Proto: "https"

---

apiVersion: traefik.io/v1alpha1
kind: Middleware
metadata:
  name: longhorn-forward-auth
  namespace: longhorn-system
spec:
  forwardAuth:
    address: https://auth.bstein.dev/oauth2/auth
    trustForwardHeader: true
    authResponseHeaders:
      - Authorization
      - X-Auth-Request-Email
      - X-Auth-Request-User
      - X-Auth-Request-Groups
@@ -1,102 +0,0 @@
# infrastructure/longhorn/ui-ingress/oauth2-proxy-longhorn.yaml
apiVersion: v1
kind: Service
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  ports:
    - name: http
      port: 80
      targetPort: 4180
  selector:
    app: oauth2-proxy-longhorn

---

apiVersion: apps/v1
kind: Deployment
metadata:
  name: oauth2-proxy-longhorn
  namespace: longhorn-system
  labels:
    app: oauth2-proxy-longhorn
spec:
  replicas: 2
  selector:
    matchLabels:
      app: oauth2-proxy-longhorn
  template:
    metadata:
      labels:
        app: oauth2-proxy-longhorn
    spec:
      nodeSelector:
        node-role.kubernetes.io/worker: "true"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 90
              preference:
                matchExpressions:
                  - key: hardware
                    operator: In
                    values: ["rpi5","rpi4"]
      containers:
        - name: oauth2-proxy
          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
          imagePullPolicy: IfNotPresent
          args:
            - --provider=oidc
            - --redirect-url=https://longhorn.bstein.dev/oauth2/callback
            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
            - --scope=openid profile email groups
            - --email-domain=*
            - --allowed-group=admin
            - --set-xauthrequest=true
            - --pass-access-token=true
            - --set-authorization-header=true
            - --cookie-secure=true
            - --cookie-samesite=lax
            - --cookie-refresh=20m
            - --cookie-expire=168h
            - --insecure-oidc-allow-unverified-email=true
            - --upstream=http://longhorn-frontend.longhorn-system.svc.cluster.local
            - --http-address=0.0.0.0:4180
            - --skip-provider-button=true
            - --skip-jwt-bearer-tokens=true
            - --oidc-groups-claim=groups
            - --cookie-domain=longhorn.bstein.dev
          env:
            - name: OAUTH2_PROXY_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_id
            - name: OAUTH2_PROXY_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: client_secret
            - name: OAUTH2_PROXY_COOKIE_SECRET
              valueFrom:
                secretKeyRef:
                  name: oauth2-proxy-longhorn-oidc
                  key: cookie_secret
          ports:
            - containerPort: 4180
              name: http
          readinessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /ping
              port: 4180
            initialDelaySeconds: 20
            periodSeconds: 20
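Reviewer sketch (not part of the diff): the Deployment above reads `OAUTH2_PROXY_COOKIE_SECRET` from the `cookie_secret` key of the `oauth2-proxy-longhorn-oidc` Secret, but the diff does not show how that value was produced. oauth2-proxy expects a secret of 16, 24, or 32 bytes, and one common way to generate one is shown below. The helper name `new_cookie_secret` is hypothetical, and nothing here is taken from the repository itself.

```python
import base64
import secrets


def new_cookie_secret(num_bytes: int = 32) -> str:
    """Return a URL-safe base64 string wrapping num_bytes of random data."""
    if num_bytes not in (16, 24, 32):
        # These are the sizes oauth2-proxy accepts for AES cookie encryption.
        raise ValueError("cookie secret must be 16, 24, or 32 bytes")
    return base64.urlsafe_b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    print(new_cookie_secret())
```

The value would then go into the Secret through whatever workflow the repository uses for credentials (AGENTS.md mentions Vault or SOPS), never into a committed manifest in plain text.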
@@ -1,14 +0,0 @@
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: brad.stein@gmail.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik
@@ -1,10 +0,0 @@
# infrastructure/sources/helm/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - grafana.yaml
  - hashicorp.yaml
  - jetstack.yaml
  - mailu.yaml
  - prometheus.yaml
  - victoria-metrics.yaml
@@ -1,9 +0,0 @@
# infrastructure/sources/helm/mailu.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: mailu
  namespace: flux-system
spec:
  interval: 1h
  url: https://mailu.github.io/helm-charts
@@ -39,12 +39,6 @@ items:
          - --metrics.prometheus.addEntryPointsLabels=true
          - --metrics.prometheus.addRoutersLabels=true
          - --metrics.prometheus.addServicesLabels=true
          - --entrypoints.web.transport.respondingTimeouts.readTimeout=0s
          - --entrypoints.web.transport.respondingTimeouts.writeTimeout=0s
          - --entrypoints.web.transport.respondingTimeouts.idleTimeout=0s
          - --entrypoints.websecure.transport.respondingTimeouts.readTimeout=0s
          - --entrypoints.websecure.transport.respondingTimeouts.writeTimeout=0s
          - --entrypoints.websecure.transport.respondingTimeouts.idleTimeout=0s
          - --entrypoints.metrics.address=:9100
          - --metrics.prometheus.entryPoint=metrics
        image: traefik:v3.3.3
File diff suppressed because it is too large
@@ -1,204 +0,0 @@
#!/usr/bin/env python3
"""
Sync Keycloak users to Mailu mailboxes.
- Generates/stores a mailu_app_password attribute in Keycloak (admin-only)
- Upserts the mailbox in Mailu Postgres using that password
"""

import os
import sys
import json
import time
import secrets
import string
import datetime
import requests
import psycopg2
from psycopg2.extras import RealDictCursor
from passlib.hash import bcrypt_sha256


KC_BASE = os.environ["KEYCLOAK_BASE_URL"].rstrip("/")
KC_REALM = os.environ["KEYCLOAK_REALM"]
KC_CLIENT_ID = os.environ["KEYCLOAK_CLIENT_ID"]
KC_CLIENT_SECRET = os.environ["KEYCLOAK_CLIENT_SECRET"]

MAILU_DOMAIN = os.environ["MAILU_DOMAIN"]
MAILU_DEFAULT_QUOTA = int(os.environ.get("MAILU_DEFAULT_QUOTA", "20000000000"))

DB_CONFIG = {
    "host": os.environ["MAILU_DB_HOST"],
    "port": int(os.environ.get("MAILU_DB_PORT", "5432")),
    "dbname": os.environ["MAILU_DB_NAME"],
    "user": os.environ["MAILU_DB_USER"],
    "password": os.environ["MAILU_DB_PASSWORD"],
}

SESSION = requests.Session()


def log(msg):
    sys.stdout.write(f"{msg}\n")
    sys.stdout.flush()


def get_kc_token():
    resp = SESSION.post(
        f"{KC_BASE}/realms/{KC_REALM}/protocol/openid-connect/token",
        data={
            "grant_type": "client_credentials",
            "client_id": KC_CLIENT_ID,
            "client_secret": KC_CLIENT_SECRET,
        },
        timeout=15,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]


def kc_get_users(token):
    users = []
    first = 0
    max_results = 200
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        resp = SESSION.get(
            f"{KC_BASE}/admin/realms/{KC_REALM}/users",
            params={"first": first, "max": max_results, "enabled": "true"},
            headers=headers,
            timeout=20,
        )
        resp.raise_for_status()
        batch = resp.json()
        users.extend(batch)
        if len(batch) < max_results:
            break
        first += max_results
    return users


def kc_update_attributes(token, user, attributes):
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    payload = {
        "firstName": user.get("firstName"),
        "lastName": user.get("lastName"),
        "email": user.get("email"),
        "enabled": user.get("enabled", True),
        "username": user["username"],
        "emailVerified": user.get("emailVerified", False),
        "attributes": attributes,
    }
    user_url = f"{KC_BASE}/admin/realms/{KC_REALM}/users/{user['id']}"
    resp = SESSION.put(user_url, headers=headers, json=payload, timeout=20)
    resp.raise_for_status()
    verify = SESSION.get(
        user_url,
        headers={"Authorization": f"Bearer {token}"},
        params={"briefRepresentation": "false"},
        timeout=15,
    )
    verify.raise_for_status()
    attrs = verify.json().get("attributes") or {}
    if not attrs.get("mailu_app_password"):
        raise Exception(f"attribute not persisted for {user.get('email') or user['username']}")


def random_password():
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(24))


def ensure_mailu_user(cursor, email, password, display_name):
    localpart, domain = email.split("@", 1)
    if domain.lower() != MAILU_DOMAIN.lower():
        return
    hashed = bcrypt_sha256.hash(password)
    now = datetime.datetime.utcnow()
    cursor.execute(
        """
        INSERT INTO "user" (
            email, localpart, domain_name, password,
            quota_bytes, quota_bytes_used,
            global_admin, enabled, enable_imap, enable_pop, allow_spoofing,
            forward_enabled, forward_destination, forward_keep,
            reply_enabled, reply_subject, reply_body, reply_startdate, reply_enddate,
            displayed_name, spam_enabled, spam_mark_as_read, spam_threshold,
            change_pw_next_login, created_at, updated_at, comment
        )
        VALUES (
            %(email)s, %(localpart)s, %(domain)s, %(password)s,
            %(quota)s, 0,
            false, true, true, true, false,
            false, '', true,
            false, NULL, NULL, DATE '1900-01-01', DATE '2999-12-31',
            %(display)s, true, true, 80,
            false, CURRENT_DATE, %(now)s, ''
        )
        ON CONFLICT (email) DO UPDATE
        SET password = EXCLUDED.password,
            enabled = true,
            updated_at = EXCLUDED.updated_at
        """,
        {
            "email": email,
            "localpart": localpart,
            "domain": domain,
            "password": hashed,
            "quota": MAILU_DEFAULT_QUOTA,
            "display": display_name or localpart,
            "now": now,
        },
    )


def main():
    token = get_kc_token()
    users = kc_get_users(token)
    if not users:
        log("No users found; exiting.")
        return

    conn = psycopg2.connect(**DB_CONFIG)
    conn.autocommit = True
    cursor = conn.cursor(cursor_factory=RealDictCursor)

    for user in users:
        attrs = user.get("attributes", {}) or {}
        app_pw_value = attrs.get("mailu_app_password")
        if isinstance(app_pw_value, list):
            app_pw = app_pw_value[0] if app_pw_value else None
        elif isinstance(app_pw_value, str):
            app_pw = app_pw_value
        else:
            app_pw = None

        email = user.get("email")
        if not email:
            email = f"{user['username']}@{MAILU_DOMAIN}"

        if not app_pw:
            app_pw = random_password()
            attrs["mailu_app_password"] = app_pw
            kc_update_attributes(token, user, attrs)
            log(f"Set mailu_app_password for {email}")

        display_name = " ".join(
            part for part in [user.get("firstName"), user.get("lastName")] if part
        ).strip()

        ensure_mailu_user(cursor, email, app_pw, display_name)
        log(f"Synced mailbox for {email}")

    cursor.close()
    conn.close()


if __name__ == "__main__":
    try:
        main()
    except Exception as exc:
        log(f"ERROR: {exc}")
        sys.exit(1)
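Reviewer sketch (not part of the diff): `kc_get_users` in the removed script pages through the Keycloak users endpoint with `first`/`max` until a short batch comes back. The same loop can be isolated from the HTTP layer so that it is testable offline. In the sketch below, `fetch_all` and the `fetch_page` callback are hypothetical names; the callback stands in for the `SESSION.get(...)` call.

```python
from typing import Callable


def fetch_all(fetch_page: Callable[[int, int], list], page_size: int = 200) -> list:
    """Collect every item from an offset-paginated source.

    fetch_page(first, max_results) must return at most max_results items
    starting at offset first; a short (or empty) batch ends the loop.
    """
    items = []
    first = 0
    while True:
        batch = fetch_page(first, page_size)
        items.extend(batch)
        if len(batch) < page_size:
            break
        first += page_size
    return items


if __name__ == "__main__":
    # Stand-in data source: 450 fake users served in slices.
    data = [{"username": f"user{i}"} for i in range(450)]
    print(len(fetch_all(lambda first, n: data[first:first + n])))
```

When the total is an exact multiple of the page size, the loop makes one extra request that returns an empty batch, which matches the behavior of the original function.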
@@ -1,49 +0,0 @@
#!/bin/bash
set -euo pipefail

KC_BASE="${KC_BASE:?}"
KC_REALM="${KC_REALM:?}"
KC_ADMIN_USER="${KC_ADMIN_USER:?}"
KC_ADMIN_PASS="${KC_ADMIN_PASS:?}"

if ! command -v jq >/dev/null 2>&1; then
  apt-get update && apt-get install -y jq curl >/dev/null
fi

account_exists() {
  # Skip if the account email is already present in the mail app.
  runuser -u www-data -- php occ mail:account:list 2>/dev/null | grep -Fq " ${1}" || \
    runuser -u www-data -- php occ mail:account:list 2>/dev/null | grep -Fq "${1} "
}

token=$(
  curl -s -d "grant_type=password" \
    -d "client_id=admin-cli" \
    -d "username=${KC_ADMIN_USER}" \
    -d "password=${KC_ADMIN_PASS}" \
    "${KC_BASE}/realms/master/protocol/openid-connect/token" | jq -r '.access_token'
)

if [[ -z "${token}" || "${token}" == "null" ]]; then
  echo "Failed to obtain admin token"
  exit 1
fi

users=$(curl -s -H "Authorization: Bearer ${token}" \
  "${KC_BASE}/admin/realms/${KC_REALM}/users?max=2000")

echo "${users}" | jq -c '.[]' | while read -r user; do
  username=$(echo "${user}" | jq -r '.username')
  email=$(echo "${user}" | jq -r '.email // empty')
  app_pw=$(echo "${user}" | jq -r '.attributes.mailu_app_password[0] // empty')
  [[ -z "${email}" || -z "${app_pw}" ]] && continue
  if account_exists "${email}"; then
    echo "Skipping ${email}, already exists"
    continue
  fi
  echo "Syncing ${email}"
  runuser -u www-data -- php occ mail:account:create \
    "${username}" "${username}" "${email}" \
    mail.bstein.dev 993 ssl "${email}" "${app_pw}" \
    mail.bstein.dev 587 tls "${email}" "${app_pw}" login || true
done
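Reviewer sketch (not part of the diff): the loop in the removed shell script skips any Keycloak user that lacks an email or a `mailu_app_password` attribute. The same selection rule, written as a pure function over the decoded JSON, might look like this. The name `syncable_users` is hypothetical, and the sketch assumes attributes arrive as lists of strings, as the `[0]` in the jq filter suggests.

```python
def syncable_users(users: list[dict]) -> list[tuple[str, str, str]]:
    """Return (username, email, app_password) for users that can be synced."""
    selected = []
    for user in users:
        email = user.get("email") or ""
        # Attribute values are assumed to be lists; take the first entry if any.
        values = (user.get("attributes") or {}).get("mailu_app_password") or []
        app_pw = values[0] if values else ""
        if not email or not app_pw:
            continue  # mirrors the `[[ -z ... ]] && continue` guard
        selected.append((user["username"], email, app_pw))
    return selected


if __name__ == "__main__":
    sample = [
        {"username": "alice", "email": "alice@example.org",
         "attributes": {"mailu_app_password": ["s3cret"]}},
        {"username": "bob", "email": "bob@example.org"},
    ]
    print(syncable_users(sample))
```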
@@ -1,65 +0,0 @@
#!/bin/bash
set -euo pipefail

NC_URL="${NC_URL:-https://cloud.bstein.dev}"
ADMIN_USER="${ADMIN_USER:?}"
ADMIN_PASS="${ADMIN_PASS:?}"

export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y -qq curl jq >/dev/null

run_occ() {
  runuser -u www-data -- php occ "$@"
}

log() { echo "[$(date -Is)] $*"; }

log "Applying Atlas theming"
run_occ theming:config name "Atlas Cloud"
run_occ theming:config slogan "Unified access to Atlas services"
run_occ theming:config url "https://cloud.bstein.dev"
run_occ theming:config color "#0f172a"
run_occ theming:config disable-user-theming yes

log "Setting default quota to 200 GB"
run_occ config:app:set files default_quota --value "200 GB"

API_BASE="${NC_URL}/ocs/v2.php/apps/external/api/v1"
AUTH=(-u "${ADMIN_USER}:${ADMIN_PASS}" -H "OCS-APIRequest: true")

log "Removing existing external links"
existing=$(curl -sf "${AUTH[@]}" "${API_BASE}?format=json" | jq -r '.ocs.data[].id // empty')
for id in ${existing}; do
  curl -sf "${AUTH[@]}" -X DELETE "${API_BASE}/sites/${id}?format=json" >/dev/null || true
done

SITES=(
  "Vaultwarden|https://vault.bstein.dev"
  "Jellyfin|https://stream.bstein.dev"
  "Gitea|https://scm.bstein.dev"
  "Jenkins|https://ci.bstein.dev"
  "Zot|https://registry.bstein.dev"
  "Vault|https://secret.bstein.dev"
  "Jitsi|https://meet.bstein.dev"
  "Grafana|https://metrics.bstein.dev"
  "Chat LLM|https://chat.ai.bstein.dev"
  "Vision|https://draw.ai.bstein.dev"
  "STT/TTS|https://talk.ai.bstein.dev"
)

log "Seeding external links"
for entry in "${SITES[@]}"; do
  IFS="|" read -r name url <<<"${entry}"
  curl -sf "${AUTH[@]}" -X POST "${API_BASE}/sites?format=json" \
    -d "name=${name}" \
    -d "url=${url}" \
    -d "lang=" \
    -d "type=link" \
    -d "device=" \
    -d "icon=" \
    -d "groups[]=" \
    -d "redirect=1" >/dev/null
done

log "Maintenance run completed"
@@ -1,575 +0,0 @@
|
||||
#!/usr/bin/env bash
|
||||
set -euo pipefail
|
||||
|
||||
# --- CONFIG (edit if needed) ---
|
||||
# Leave NVME empty → script will auto-detect the SSK dock.
|
||||
NVME="${NVME:-}"
|
||||
FLAVOR="${FLAVOR:-desktop}"
|
||||
# Persistent cache so the image survives reboots.
|
||||
IMG_DIR="${IMG_DIR:-/var/cache/styx-rpi}"
|
||||
IMG_FILE="${IMG_FILE:-ubuntu-24.04.3-preinstalled-${FLAVOR}-arm64+raspi.img}"
|
||||
IMG_BOOT_MNT="${IMG_BOOT_MNT:-/mnt/img-boot}"
|
||||
IMG_ROOT_MNT="${IMG_ROOT_MNT:-/mnt/img-root}"
|
||||
TGT_ROOT="/mnt/target-root"
|
||||
TGT_BOOT="/mnt/target-boot"
|
||||
|
||||
STYX_USER="styx"
|
||||
STYX_HOSTNAME="titan-ag"
|
||||
STYX_PASS="TempPass#123" # will be forced to change on first login via cloud-init
|
||||
SSH_PUBKEY="ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIOb8oMX6u0z3sH/p/WBGlvPXXdbGETCKzWYwR/dd6fZb titan-bastion"
|
||||
|
||||
# Video / input prefs
|
||||
DSI_FLAGS="video=DSI-1:800x480@60D video=HDMI-A-1:off video=HDMI-A-2:off"
|
||||
|
||||
# --- Helpers ---
|
||||
fatal(){ echo "ERROR: $*" >&2; exit 1; }
|
||||
need(){ command -v "$1" >/dev/null || fatal "Missing tool: $1"; }
|
||||
|
||||
require_root(){ [[ $EUID -eq 0 ]] || exec sudo -E "$0" "$@"; }
|
||||
|
||||
part() {
|
||||
local n="$1"
|
||||
if [[ "$NVME" =~ [0-9]$ ]]; then
|
||||
echo "${NVME}p${n}"
|
||||
else
|
||||
echo "${NVME}${n}"
|
||||
fi
|
||||
}
|
||||
|
||||
auto_detect_target_disk() {
|
||||
# If user already set NVME, validate and return
|
||||
if [[ -n "${NVME:-}" ]]; then
|
||||
[[ -b "$NVME" ]] || fatal "NVME='$NVME' is not a block device"
|
||||
return
|
||||
fi
|
||||
|
||||
# Prefer stable by-id symlinks
|
||||
local byid
|
||||
byid=$(ls -1 /dev/disk/by-id/usb-SSK* 2>/dev/null | head -n1 || true)
|
||||
if [[ -n "$byid" ]]; then
|
||||
NVME=$(readlink -f "$byid")
|
||||
else
|
||||
# Heuristic via lsblk -S: look for USB with SSK/Ingram/Storage in vendor/model
|
||||
NVME=$(lsblk -S -p -o NAME,TRAN,VENDOR,MODEL | \
|
||||
awk '/ usb / && (toupper($3) ~ /SSK|INGRAM/ || toupper($4) ~ /SSK|STORAGE/) {print $1; exit}')
|
||||
fi
|
||||
|
||||
[[ -n "${NVME:-}" && -b "$NVME" ]] || fatal "Could not auto-detect SSK USB NVMe dock. Export NVME=/dev/sdX and re-run."
|
||||
echo "Auto-detected target disk: $NVME"
|
||||
}
|
||||
|
||||
preflight_cleanup() {
|
||||
local img="$IMG_DIR/$IMG_FILE"
|
||||
|
||||
# 1) Unmount image mountpoints and detach only loops for this IMG
|
||||
umount -lf "$IMG_BOOT_MNT" "$IMG_ROOT_MNT" 2>/dev/null || true
|
||||
# losetup -j exits non-zero if no association → tolerate it
|
||||
{ losetup -j "$img" | cut -d: -f1 | xargs -r losetup -d; } 2>/dev/null || true
|
||||
|
||||
# 2) Unmount our target mounts
|
||||
umount -lf "$TGT_ROOT/boot/firmware" "$TGT_BOOT" "$TGT_ROOT" 2>/dev/null || true
|
||||
|
||||
# 3) Unmount the actual target partitions if mounted anywhere (tolerate 'not found')
|
||||
for p in "$(part 1)" "$(part 2)"; do
|
||||
# findmnt returns 1 when no match → capture and iterate if any
|
||||
while read -r mnt; do
|
||||
[ -n "$mnt" ] && umount -lf "$mnt" 2>/dev/null || true
|
||||
done < <(findmnt -rno TARGET -S "$p" 2>/dev/null || true)
|
||||
done
|
||||
|
||||
# 4) Close dm-crypt mapping (if it exists)
|
||||
cryptsetup luksClose cryptroot 2>/dev/null || true
|
||||
dmsetup remove -f cryptroot 2>/dev/null || true
|
||||
|
||||
# 5) Let udev settle
|
||||
command -v udevadm >/dev/null && udevadm settle || true
|
||||
}
|
||||
|
||||
guard_target_device() {
|
||||
# Refuse to operate if NVME appears to be the current system disk
|
||||
local root_src root_disk
|
||||
root_src=$(findmnt -no SOURCE /)
|
||||
root_disk=$(lsblk -no pkname "$root_src" 2>/dev/null || true)
|
||||
if [[ -n "$root_disk" && "/dev/$root_disk" == "$NVME" ]]; then
|
||||
fatal "Refusing to operate on system disk ($NVME). Pick the external NVMe."
|
||||
fi
|
||||
}
|
||||
|
||||
need_host_fido2() {
|
||||
if ! command -v fido2-token >/dev/null 2>&1; then
|
||||
echo "Host is missing fido2-token. On Arch: sudo pacman -S libfido2"
|
||||
echo "On Debian/Ubuntu host: sudo apt-get install fido2-tools"
|
||||
exit 1
|
||||
fi
|
||||
}
|
||||
|
||||
ensure_image() {
|
||||
mkdir -p "$IMG_DIR"
|
||||
chmod 755 "$IMG_DIR"
|
||||
|
||||
local BASE="https://cdimage.ubuntu.com/releases/noble/release"
|
||||
local XZ="ubuntu-24.04.3-preinstalled-${FLAVOR}-arm64+raspi.img.xz"
|
||||
|
||||
# If the decompressed .img is missing, fetch/decompress into the cache.
|
||||
if [[ ! -f "$IMG_DIR/$IMG_FILE" ]]; then
|
||||
need curl # Arch: pacman -S curl xz | Ubuntu: apt-get install curl xz-utils (unxz/xz is checked below)
|
||||
if [[ ! -f "$IMG_DIR/$XZ" ]]; then
|
||||
echo "Fetching image…"
|
||||
curl -fL -o "$IMG_DIR/$XZ" "$BASE/$XZ"
|
||||
fi
|
||||
echo "Decompressing to $IMG_DIR/$IMG_FILE …"
|
||||
# Keep the .xz for future runs; stream-decompress to the .img
|
||||
if command -v unxz >/dev/null 2>&1; then
|
||||
unxz -c "$IMG_DIR/$XZ" > "$IMG_DIR/$IMG_FILE"
|
||||
else
|
||||
need xz
|
||||
xz -dc "$IMG_DIR/$XZ" > "$IMG_DIR/$IMG_FILE"
|
||||
fi
|
||||
sync
|
||||
else
|
||||
echo "Using cached image: $IMG_DIR/$IMG_FILE"
|
||||
fi
|
||||
}
|
||||
|
||||
ensure_binfmt_aarch64(){
|
||||
# Register qemu-aarch64 for chrooted ARM64 apt runs
|
||||
if [[ ! -e /proc/sys/fs/binfmt_misc/qemu-aarch64 ]]; then
|
||||
need docker
|
||||
systemctl enable --now docker >/dev/null 2>&1 || true
|
||||
docker run --rm --privileged tonistiigi/binfmt --install arm64 >/dev/null
|
||||
fi
|
||||
if [[ ! -x /usr/local/bin/qemu-aarch64-static ]]; then
|
||||
docker rm -f qemu-static >/dev/null 2>&1 || true
|
||||
docker create --name qemu-static docker.io/multiarch/qemu-user-static:latest >/dev/null
|
||||
docker cp qemu-static:/usr/bin/qemu-aarch64-static /usr/local/bin/
|
||||
chmod 755 /usr/local/bin/qemu-aarch64-static
|
||||
docker rm qemu-static >/dev/null
|
||||
fi
|
||||
}
|
||||
|
||||
open_image() {
|
||||
[[ -r "$IMG_DIR/$IMG_FILE" ]] || fatal "Image not found: $IMG_DIR/$IMG_FILE"
|
||||
mkdir -p "$IMG_BOOT_MNT" "$IMG_ROOT_MNT"
|
||||
|
||||
# Pre-clean: detach any previous loop(s) for this image (tolerate absence)
|
||||
umount -lf "$IMG_BOOT_MNT" 2>/dev/null || true
|
||||
umount -lf "$IMG_ROOT_MNT" 2>/dev/null || true
|
||||
# If no loop is attached, losetup -j returns non-zero → swallow it
|
||||
mapfile -t OLD < <({ losetup -j "$IMG_DIR/$IMG_FILE" | cut -d: -f1; } 2>/dev/null || true)
|
||||
for L in "${OLD[@]:-}"; do losetup -d "$L" 2>/dev/null || true; done
|
||||
command -v udevadm >/dev/null && udevadm settle || true
|
||||
|
||||
# Attach with partition scan; wait for partition nodes to exist
|
||||
LOOP=$(losetup --find --show --partscan "$IMG_DIR/$IMG_FILE") || fatal "losetup failed"
|
||||
command -v udevadm >/dev/null && udevadm settle || true
|
||||
for _ in {1..25}; do
|
||||
[[ -b "${LOOP}p1" && -b "${LOOP}p2" ]] && break
|
||||
sleep 0.1
|
||||
command -v udevadm >/dev/null && udevadm settle || true
|
||||
done
|
||||
[[ -b "${LOOP}p1" ]] || fatal "loop partitions not present for $LOOP"
|
||||
|
||||
# Cleanup on exit: unmount first, then detach loop (tolerate absence)
|
||||
trap 'umount -lf "'"$IMG_BOOT_MNT"'" "'"$IMG_ROOT_MNT"'" 2>/dev/null; losetup -d "'"$LOOP"'" 2>/dev/null' EXIT
|
||||
|
||||
# Mount image partitions read-only
|
||||
mount -o ro "${LOOP}p1" "$IMG_BOOT_MNT"
|
||||
mount -o ro "${LOOP}p2" "$IMG_ROOT_MNT"
|
||||
|
||||
# Sanity checks without using failing pipelines
|
||||
# start*.elf must exist
|
||||
if ! compgen -G "$IMG_BOOT_MNT/start*.elf" > /dev/null; then
|
||||
fatal "start*.elf not found in image"
|
||||
fi
|
||||
# vmlinuz-* must exist
|
||||
if ! compgen -G "$IMG_ROOT_MNT/boot/vmlinuz-*" > /dev/null; then
|
||||
fatal "vmlinuz-* not found in image root"
|
||||
fi
|
||||
}
|
||||
|
||||
confirm_and_wipe(){
|
||||
lsblk -o NAME,SIZE,MODEL,TRAN,LABEL "$NVME"
|
||||
read -rp "Type EXACTLY 'WIPE' to destroy ALL DATA on $NVME: " ACK
|
||||
[[ "$ACK" == "WIPE" ]] || fatal "Aborted"
|
||||
wipefs -a "$NVME"
|
||||
sgdisk -Zo "$NVME"
|
||||
# GPT: 1: 1MiB..513MiB vfat ESP; 2: rest LUKS
|
||||
parted -s "$NVME" mklabel gpt \
|
||||
mkpart system-boot fat32 1MiB 513MiB set 1 esp on \
|
||||
mkpart cryptroot 513MiB 100%
|
||||
partprobe "$NVME"; sleep 1
|
||||
mkfs.vfat -F32 -n system-boot "$(part 1)"
|
||||
}
|
||||
|
||||
setup_luks(){
|
||||
echo "Create LUKS2 on $(part 2) (you will be prompted for a passphrase; keep it as fallback)"
|
||||
need cryptsetup
|
||||
cryptsetup luksFormat --type luks2 "$(part 2)"
|
||||
cryptsetup open "$(part 2)" cryptroot
|
||||
mkfs.ext4 -L rootfs /dev/mapper/cryptroot
|
||||
}
|
||||
|
||||
mount_targets(){
|
||||
mkdir -p "$TGT_ROOT" "$TGT_BOOT"
|
||||
mount /dev/mapper/cryptroot "$TGT_ROOT"
|
||||
mkdir -p "$TGT_ROOT/boot/firmware"
|
||||
mount "$(part 1)" "$TGT_BOOT"
|
||||
mount --bind "$TGT_BOOT" "$TGT_ROOT/boot/firmware"
|
||||
}
|
||||
|
||||
rsync_root_and_boot(){
|
||||
need rsync
|
||||
rsync -aAXH --numeric-ids --delete \
|
||||
--exclude='/boot/firmware' --exclude='/boot/firmware/**' \
|
||||
--exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
|
||||
--exclude='/run/*' --exclude='/tmp/*' --exclude='/mnt/*' \
|
||||
--exclude='/media/*' --exclude='/lost+found' \
|
||||
"$IMG_ROOT_MNT"/ "$TGT_ROOT"/
|
||||
rsync -aH --delete "$IMG_BOOT_MNT"/ "$TGT_ROOT/boot/firmware"/
|
||||
}
|
||||
|
||||
write_crypttab_fstab(){
|
||||
LUUID=$(blkid -s UUID -o value "$(part 2)")
|
||||
printf 'cryptroot UUID=%s none luks,discard,fido2-device=auto\n' "$LUUID" > "$TGT_ROOT/etc/crypttab"
|
||||
cat > "$TGT_ROOT/etc/fstab" <<EOF
|
||||
/dev/mapper/cryptroot / ext4 defaults,discard,errors=remount-ro 0 1
|
||||
LABEL=system-boot /boot/firmware vfat defaults,umask=0077 0 1
|
||||
EOF
|
||||
}
|
||||
|
||||
fix_firmware_files(){
|
||||
local C="$TGT_ROOT/boot/firmware/config.txt"
|
||||
local CL="$TGT_ROOT/boot/firmware/cmdline.txt"
|
||||
[[ -f "$C" ]] || fatal "missing $C"
|
||||
|
||||
# Always boot the uncompressed Pi 5 kernel
|
||||
if grep -q '^kernel=' "$C"; then
|
||||
sed -i 's#^kernel=.*#kernel=kernel_2712.img#' "$C"
|
||||
else
|
||||
sed -i '1i kernel=kernel_2712.img' "$C"
|
||||
fi
|
||||
|
||||
# Ensure initramfs and cmdline indirection are set
|
||||
grep -q '^initramfs ' "$C" || echo 'initramfs initrd.img followkernel' >> "$C"
|
||||
grep -q '^cmdline=cmdline.txt' "$C" || sed -i '1i cmdline=cmdline.txt' "$C"
|
||||
|
||||
# Display & buses (Pi 5)
|
||||
grep -q '^dtoverlay=vc4-kms-v3d-pi5' "$C" || echo 'dtoverlay=vc4-kms-v3d-pi5' >> "$C"
|
||||
grep -q '^dtparam=i2c_arm=on' "$C" || echo 'dtparam=i2c_arm=on' >> "$C"
|
||||
grep -q '^dtparam=pciex1=on' "$C" || echo 'dtparam=pciex1=on' >> "$C"
|
||||
grep -q '^dtparam=pciex1_gen=2' "$C" || echo 'dtparam=pciex1_gen=2' >> "$C"
|
||||
grep -q '^enable_uart=1' "$C" || echo 'enable_uart=1' >> "$C"
|
||||
|
||||
# Minimal, correct dracut hints using the bare UUID
|
||||
local LUUID; LUUID=$(blkid -s UUID -o value "$(part 2)")
|
||||
: > "$CL"
|
||||
{
|
||||
echo -n "rd.luks.uuid=$LUUID rd.luks.name=$LUUID=cryptroot "
|
||||
echo -n "root=/dev/mapper/cryptroot rootfstype=ext4 rootwait fixrtc "
|
||||
echo "console=serial0,115200 console=tty1 ds=nocloud;s=file:///boot/firmware/ ${DSI_FLAGS} rd.debug"
|
||||
} >> "$CL"
|
||||
}
|
||||
|
||||
seed_cloud_init(){
|
||||
# NoCloud seed to create user, lock down SSH, set hostname, and enable avahi.
|
||||
cat > "$TGT_ROOT/boot/firmware/user-data" <<EOF
|
||||
#cloud-config
|
||||
hostname: $STYX_HOSTNAME
|
||||
manage_etc_hosts: true
|
||||
users:
|
||||
- name: $STYX_USER
|
||||
gecos: "$STYX_USER"
|
||||
shell: /bin/bash
|
||||
groups: [sudo,video,i2c]
|
||||
sudo: ALL=(ALL) NOPASSWD:ALL
|
||||
lock_passwd: false
|
||||
ssh_authorized_keys:
|
||||
- $SSH_PUBKEY
|
||||
chpasswd:
|
||||
list: |
|
||||
$STYX_USER:$STYX_PASS
|
||||
expire: true
|
||||
ssh_pwauth: false
|
||||
package_update: true
|
||||
packages: [openssh-server, avahi-daemon]
|
||||
runcmd:
|
||||
- systemctl enable --now ssh
|
||||
- systemctl enable --now avahi-daemon || true
|
||||
EOF
|
||||
|
||||
# Minimal meta-data for NoCloud
|
||||
date +%s | awk '{print "instance-id: iid-titan-ag-"$1"\nlocal-hostname: '"$STYX_HOSTNAME"'"}' \
|
||||
> "$TGT_ROOT/boot/firmware/meta-data"
|
||||
}
|
||||
|
||||
prep_chroot_mounts(){
|
||||
for d in dev proc sys; do mount --bind "/$d" "$TGT_ROOT/$d"; done
|
||||
mount -t devpts devpts "$TGT_ROOT/dev/pts"
|
||||
# Replace the usual resolv.conf symlink with a real file for apt to work
|
||||
rm -f "$TGT_ROOT/etc/resolv.conf"
|
||||
cp /etc/resolv.conf "$TGT_ROOT/etc/resolv.conf"
|
||||
|
||||
# Block service starts (no systemd in chroot)
|
||||
cat > "$TGT_ROOT/usr/sbin/policy-rc.d" <<'EOP'
|
||||
#!/bin/sh
|
||||
exit 101
|
||||
EOP
|
||||
chmod +x "$TGT_ROOT/usr/sbin/policy-rc.d"
|
||||
|
||||
# Ensure qemu static is present inside chroot
|
||||
install -D -m755 /usr/local/bin/qemu-aarch64-static "$TGT_ROOT/usr/bin/qemu-aarch64-static"
|
||||
}
|
||||
|
||||
in_chroot(){
|
||||
chroot "$TGT_ROOT" /usr/bin/qemu-aarch64-static /bin/bash -lc '
|
||||
set -euo pipefail
|
||||
export DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC
|
||||
|
||||
# --- APT sources (ports) ---
|
||||
cat > /etc/apt/sources.list <<'"'"'EOS'"'"'
|
||||
deb http://ports.ubuntu.com/ubuntu-ports noble main restricted universe multiverse
|
||||
deb http://ports.ubuntu.com/ubuntu-ports noble-updates main restricted universe multiverse
|
||||
deb http://ports.ubuntu.com/ubuntu-ports noble-security main restricted universe multiverse
|
||||
EOS
|
||||
|
||||
apt-get update
|
||||
|
||||
# --- Remove snaps and pin them off ---
|
||||
apt-get -y purge snapd || true
|
||||
rm -rf /snap /var/snap /var/lib/snapd /home/*/snap || true
|
||||
mkdir -p /etc/apt/preferences.d
|
||||
cat > /etc/apt/preferences.d/nosnap.pref <<'"'"'EOS'"'"'
|
||||
Package: snapd
|
||||
Pin: release *
|
||||
Pin-Priority: -10
|
||||
EOS
|
||||
|
||||
# --- Base tools (no flash-kernel; we use dracut) ---
|
||||
apt-get install -y --no-install-recommends \
|
||||
openssh-client openssh-server openssh-sftp-server avahi-daemon \
|
||||
cryptsetup dracut fido2-tools libfido2-1 i2c-tools \
|
||||
python3-smbus python3-pil zbar-tools qrencode lm-sensors \
|
||||
file zstd lz4 || true
|
||||
|
||||
# Camera apps: try rpicam-apps; otherwise basic libcamera tools
|
||||
apt-get install -y rpicam-apps || apt-get install -y libcamera-tools || true
|
||||
|
||||
# --- Persistent journal so we can read logs after failed boot ---
|
||||
mkdir -p /etc/systemd/journald.conf.d
|
||||
cat > /etc/systemd/journald.conf.d/99-persistent.conf <<'"'"'EOS'"'"'
|
||||
[Journal]
|
||||
Storage=persistent
|
||||
EOS
|
||||
|
||||
# --- SSH hardening (ensure file exists even if package was half-installed) ---
|
||||
if [ ! -f /etc/ssh/sshd_config ]; then
|
||||
mkdir -p /etc/ssh
|
||||
cat > /etc/ssh/sshd_config <<'"'"'EOS'"'"'
|
||||
PermitRootLogin no
|
||||
PasswordAuthentication no
|
||||
KbdInteractiveAuthentication no
|
||||
PubkeyAuthentication yes
|
||||
# Accept defaults for the rest
|
||||
EOS
|
||||
fi
|
||||
sed -i -e "s/^#\?PasswordAuthentication .*/PasswordAuthentication no/" \
|
||||
-e "s/^#\?KbdInteractiveAuthentication .*/KbdInteractiveAuthentication no/" \
|
||||
-e "s/^#\?PermitRootLogin .*/PermitRootLogin no/" \
|
||||
-e "s/^#\?PubkeyAuthentication .*/PubkeyAuthentication yes/" /etc/ssh/sshd_config || true
|
||||
|
||||
# --- Hostname & hosts ---
|
||||
echo "'"$STYX_HOSTNAME"'" > /etc/hostname
|
||||
if grep -q "^127\\.0\\.1\\.1" /etc/hosts; then
|
||||
sed -i "s/^127\\.0\\.1\\.1.*/127.0.1.1\t'"$STYX_HOSTNAME"'/" /etc/hosts
|
||||
else
|
||||
echo -e "127.0.1.1\t'"$STYX_HOSTNAME"'" >> /etc/hosts
|
||||
fi
|
||||
|
||||
# --- Enable services on first boot ---
|
||||
mkdir -p /etc/systemd/system/multi-user.target.wants
|
||||
ln -sf /lib/systemd/system/ssh.service /etc/systemd/system/multi-user.target.wants/ssh.service
|
||||
ln -sf /lib/systemd/system/avahi-daemon.service /etc/systemd/system/multi-user.target.wants/avahi-daemon.service || true
|
||||
|
||||
# --- Ensure i2c group ---
|
||||
getent group i2c >/dev/null || groupadd i2c
|
||||
|
||||
# --- Dracut configuration (generic, not host-only) ---
|
||||
mkdir -p /etc/dracut.conf.d
|
||||
cat > /etc/dracut.conf.d/00-hostonly.conf <<'"'"'EOS'"'"'
|
||||
hostonly=no
|
||||
EOS
|
||||
cat > /etc/dracut.conf.d/10-systemd-crypt.conf <<'"'"'EOS'"'"'
|
||||
add_dracutmodules+=" systemd crypt "
|
||||
EOS
|
||||
cat > /etc/dracut.conf.d/20-drivers.conf <<'"'"'EOS'"'"'
|
||||
add_drivers+=" nvme xhci_pci xhci_hcd usbhid hid_generic hid "
|
||||
EOS
|
||||
cat > /etc/dracut.conf.d/30-fido2.conf <<'"'"'EOS'"'"'
|
||||
install_items+=" /usr/bin/systemd-cryptsetup /usr/bin/fido2-token /usr/lib/*/libfido2.so* /usr/lib/*/libcbor.so* "
|
||||
EOS
|
||||
|
||||
# --- Build initramfs and place it where firmware expects it ---
|
||||
KVER=$(ls -1 /lib/modules | sort -V | tail -n1)
|
||||
dracut --force /boot/initramfs-$KVER.img $KVER
|
||||
ln -sf initramfs-$KVER.img /boot/initrd.img
|
||||
ln -sf initramfs-$KVER.img /boot/initrd.img-$KVER
|
||||
cp -a /boot/initramfs-$KVER.img /boot/firmware/initrd.img
|
||||
|
||||
# --- Create uncompressed kernel for Pi 5 firmware ---
|
||||
if [ -f "/usr/lib/linux-image-$KVER/Image" ]; then
|
||||
cp -a "/usr/lib/linux-image-$KVER/Image" /boot/firmware/kernel_2712.img
|
||||
else
|
||||
FMT=$(file -b "/boot/vmlinuz-$KVER" || true)
|
||||
case "$FMT" in
|
||||
*Zstandard*|*zstd*) zstd -dc "/boot/vmlinuz-$KVER" > /boot/firmware/kernel_2712.img ;;
|
||||
*LZ4*) lz4 -dc "/boot/vmlinuz-$KVER" > /boot/firmware/kernel_2712.img ;;
|
||||
*gzip*) zcat "/boot/vmlinuz-$KVER" > /boot/firmware/kernel_2712.img ;;
|
||||
*) cp -a "/boot/vmlinuz-$KVER" /boot/firmware/kernel_2712.img ;;
|
||||
esac
|
||||
fi
|
||||
|
||||
# --- Ensure Pi 5 DTB is present on the boot partition ---
|
||||
DTB=$(find /lib/firmware -type f -name "bcm2712-rpi-5-b.dtb" | sort | tail -n1 || true)
|
||||
[ -n "$DTB" ] && cp -a "$DTB" /boot/firmware/
|
||||
|
||||
# --- Dracut hook to copy rdsosreport.txt to the FAT partition on failure ---
|
||||
mkdir -p /usr/lib/dracut/modules.d/99copylog
|
||||
cat > /usr/lib/dracut/modules.d/99copylog/module-setup.sh <<'"'"'EOS'"'"'
|
||||
#!/bin/bash
|
||||
check() { return 0; }
|
||||
depends() { echo base; return 0; }
|
||||
install() {
|
||||
# Guard $moddir for nounset; derive if absent
|
||||
local mdir="${moddir:-$(cd -- "$(dirname -- "${BASH_SOURCE[0]}")" && pwd)}"
|
||||
inst_hook emergency 99 "$mdir/copylog.sh"
|
||||
}
|
||||
EOS
|
||||
chmod +x /usr/lib/dracut/modules.d/99copylog/module-setup.sh
|
||||
|
||||
cat > /usr/lib/dracut/modules.d/99copylog/copylog.sh <<'"'"'EOS'"'"'
|
||||
#!/bin/sh
|
||||
set -e
|
||||
for dev in /dev/nvme0n1p1 /dev/sda1 /dev/sdb1 /dev/mmcblk0p1; do
|
||||
[ -b "$dev" ] || continue
|
||||
mkdir -p /mnt/bootfat
|
||||
if mount -t vfat "$dev" /mnt/bootfat 2>/dev/null; then
|
||||
if [ -s /run/initramfs/rdsosreport.txt ]; then
|
||||
cp -f /run/initramfs/rdsosreport.txt /mnt/bootfat/rdsosreport.txt 2>/dev/null || true
|
||||
sync || true
|
||||
fi
|
||||
umount /mnt/bootfat || true
|
||||
break
|
||||
fi
|
||||
done
|
||||
EOS
|
||||
chmod +x /usr/lib/dracut/modules.d/99copylog/copylog.sh
|
||||
|
||||
# Rebuild to ensure the copylog module is included
|
||||
dracut --force /boot/initramfs-$KVER.img $KVER
|
||||
ln -sf initramfs-$KVER.img /boot/initrd.img
|
||||
cp -a /boot/initramfs-$KVER.img /boot/firmware/initrd.img
|
||||
|
||||
true
|
||||
'
|
||||
}
|
||||
|
||||
verify_boot_assets(){
|
||||
echo "---- verify boot assets on FAT ----"
|
||||
file "$TGT_ROOT/boot/firmware/kernel_2712.img" || true
|
||||
ls -lh "$TGT_ROOT/boot/firmware/initrd.img" || true
|
||||
echo "-- config.txt (key lines) --"
|
||||
grep -E '^(kernel|initramfs|cmdline)=|^dtoverlay=|^dtparam=' "$TGT_ROOT/boot/firmware/config.txt" || true
|
||||
echo "-- cmdline.txt --"
|
||||
cat "$TGT_ROOT/boot/firmware/cmdline.txt" || true
|
||||
echo "-- firmware blobs (sample) --"
|
||||
ls -1 "$TGT_ROOT/boot/firmware"/start*.elf "$TGT_ROOT/boot/firmware"/fixup*.dat | head -n 8 || true
|
||||
echo "-- Pi5 DTB --"
|
||||
ls -l "$TGT_ROOT/boot/firmware/"*rpi-5-b.dtb || true
|
||||
}
|
||||
|
||||
enroll_fido_tokens(){
|
||||
echo "Enrolling FIDO2 Solo keys into $(part 2) ..."
|
||||
need systemd-cryptenroll
|
||||
need fido2-token
|
||||
|
||||
# Collect all hidraw paths from both output styles (some distros print 'Device: /dev/hidrawX')
|
||||
mapfile -t DEVS < <(
|
||||
fido2-token -L \
|
||||
| sed -n 's,^\(/dev/hidraw[0-9]\+\):.*,\1,p; s,^Device:[[:space:]]\+/dev/hidraw\([0-9]\+\).*,/dev/hidraw\1,p' \
|
||||
| sort -u
|
||||
)
|
||||
|
||||
if (( ${#DEVS[@]} == 0 )); then
|
||||
echo "No FIDO2 tokens detected; skipping enrollment (you can enroll later)."
|
||||
echo "Example later: systemd-cryptenroll $(part 2) --fido2-device=/dev/hidrawX --fido2-with-client-pin=no"
|
||||
return 0
|
||||
fi
|
||||
|
||||
# Recommend keeping exactly ONE key plugged during first enrollment to avoid ambiguity.
|
||||
if (( ${#DEVS[@]} > 1 )); then
|
||||
echo "Note: multiple FIDO2 tokens present: ${DEVS[*]}"
|
||||
echo "If enrollment fails, try with only one key inserted."
|
||||
fi
|
||||
|
||||
local rc=0
|
||||
for D in "${DEVS[@]}"; do
|
||||
echo "-> Enrolling $D (you should be asked to touch the key)"
|
||||
if ! SYSTEMD_LOG_LEVEL=debug systemd-cryptenroll "$(part 2)" \
|
||||
--fido2-device="$D" \
|
||||
--fido2-with-client-pin=no \
|
||||
--fido2-with-user-presence=yes \
|
||||
--fido2-with-user-verification=no \
|
||||
--label="solo-$(basename "$D")"; then
|
||||
echo "WARN: enrollment failed for $D"
|
||||
rc=1
|
||||
fi
|
||||
done
|
||||
|
||||
echo "Tokens enrolled (if any):"
|
||||
systemd-cryptenroll "$(part 2)" --list || true
|
||||
return $rc
|
||||
}
|
||||
|
||||
cleanup(){
|
||||
rm -f "$TGT_ROOT/usr/sbin/policy-rc.d" || true
|
||||
umount -lf "$TGT_ROOT/dev/pts" 2>/dev/null || true
|
||||
for d in dev proc sys; do umount -lf "$TGT_ROOT/$d" 2>/dev/null || true; done
|
||||
umount -lf "$TGT_ROOT/boot/firmware" 2>/dev/null || true
|
||||
umount -lf "$TGT_BOOT" 2>/dev/null || true
|
||||
umount -lf "$TGT_ROOT" 2>/dev/null || true
|
||||
cryptsetup close cryptroot 2>/dev/null || true
|
||||
umount -lf "$IMG_BOOT_MNT" 2>/dev/null || true
|
||||
umount -lf "$IMG_ROOT_MNT" 2>/dev/null || true
|
||||
}
|
||||
|
||||
main(){
|
||||
require_root "$@"
|
||||
need losetup; need parted; need rsync
|
||||
auto_detect_target_disk
|
||||
echo "Target disk: $NVME"
|
||||
ensure_binfmt_aarch64
|
||||
ensure_image
|
||||
preflight_cleanup
|
||||
guard_target_device
|
||||
open_image
|
||||
confirm_and_wipe
|
||||
setup_luks
|
||||
mount_targets
|
||||
rsync_root_and_boot
|
||||
write_crypttab_fstab
|
||||
fix_firmware_files
|
||||
seed_cloud_init
|
||||
prep_chroot_mounts
|
||||
in_chroot
|
||||
verify_boot_assets
|
||||
need_host_fido2
|
||||
enroll_fido_tokens
|
||||
cleanup
|
||||
echo "✅ NVMe prepared."
|
||||
echo " Install in the Pi 5 and boot with no SD."
|
||||
echo " Expect LUKS to unlock automatically with a Solo key inserted;"
|
||||
echo " passphrase fallback remains. Hostname: ${STYX_HOSTNAME} User: ${STYX_USER}"
|
||||
echo " On first boot, reach it via: ssh -i ~/.ssh/id_ed25519_titan styx@titan-ag.local"
|
||||
}
|
||||
|
||||
main "$@"
|
||||
@@ -1,58 +0,0 @@
|
||||
import importlib.util
import pathlib


def load_module():
    path = pathlib.Path(__file__).resolve().parents[1] / "dashboards_render_atlas.py"
    spec = importlib.util.spec_from_file_location("dashboards_render_atlas", path)
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    return module


def test_table_panel_options_and_filterable():
    mod = load_module()
    panel = mod.table_panel(
        1,
        "test",
        "metric",
        {"h": 1, "w": 1, "x": 0, "y": 0},
        unit="percent",
        transformations=[{"id": "labelsToFields", "options": {}}],
        instant=True,
        options={"showColumnFilters": False},
        filterable=False,
        footer={"show": False, "fields": "", "calcs": []},
        format="table",
    )
    assert panel["fieldConfig"]["defaults"]["unit"] == "percent"
    assert panel["fieldConfig"]["defaults"]["custom"]["filterable"] is False
    assert panel["options"]["showHeader"] is True
    assert panel["targets"][0]["format"] == "table"


def test_node_filter_and_expr_helpers():
    mod = load_module()
    expr = mod.node_filter("titan-.*")
    assert "label_replace" in expr
    cpu_expr = mod.node_cpu_expr("titan-.*")
    mem_expr = mod.node_mem_expr("titan-.*")
    assert "node_cpu_seconds_total" in cpu_expr
    assert "node_memory_MemAvailable_bytes" in mem_expr


def test_render_configmap_writes(tmp_path):
    mod = load_module()
    mod.DASHBOARD_DIR = tmp_path / "dash"
    mod.ROOT = tmp_path
    uid = "atlas-test"
    info = {"configmap": tmp_path / "cm.yaml"}
    data = {"title": "Atlas Test"}
    mod.write_json(uid, data)
    mod.render_configmap(uid, info)
    json_path = mod.DASHBOARD_DIR / f"{uid}.json"
    assert json_path.exists()
    content = (tmp_path / "cm.yaml").read_text()
    assert "kind: ConfigMap" in content
    assert f"{uid}.json" in content
|
||||
@@ -1,181 +0,0 @@
|
||||
import importlib.util
import pathlib

import pytest


def load_sync_module(monkeypatch):
    # Minimal env required by module import
    env = {
        "KEYCLOAK_BASE_URL": "http://keycloak",
        "KEYCLOAK_REALM": "atlas",
        "KEYCLOAK_CLIENT_ID": "mailu-sync",
        "KEYCLOAK_CLIENT_SECRET": "secret",
        "MAILU_DOMAIN": "example.com",
        "MAILU_DB_HOST": "localhost",
        "MAILU_DB_PORT": "5432",
        "MAILU_DB_NAME": "mailu",
        "MAILU_DB_USER": "mailu",
        "MAILU_DB_PASSWORD": "pw",
    }
    for k, v in env.items():
        monkeypatch.setenv(k, v)
    module_path = pathlib.Path(__file__).resolve().parents[1] / "mailu_sync.py"
    spec = importlib.util.spec_from_file_location("mailu_sync_testmod", module_path)
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    return module


def test_random_password_length_and_charset(monkeypatch):
    sync = load_sync_module(monkeypatch)
    pw = sync.random_password()
    assert len(pw) == 24
    assert all(ch.isalnum() for ch in pw)


class _FakeResponse:
    def __init__(self, json_data=None, status=200):
        self._json_data = json_data or {}
        self.status_code = status

    def raise_for_status(self):
        if self.status_code >= 400:
            raise AssertionError(f"status {self.status_code}")

    def json(self):
        return self._json_data


class _FakeSession:
    def __init__(self, put_resp, get_resp):
        self.put_resp = put_resp
        self.get_resp = get_resp
        self.put_called = False
        self.get_called = False

    def post(self, *args, **kwargs):
        return _FakeResponse({"access_token": "dummy"})

    def put(self, *args, **kwargs):
        self.put_called = True
        return self.put_resp

    def get(self, *args, **kwargs):
        self.get_called = True
        return self.get_resp


def test_kc_update_attributes_succeeds(monkeypatch):
    sync = load_sync_module(monkeypatch)
    ok_resp = _FakeResponse({"attributes": {"mailu_app_password": ["abc"]}})
    sync.SESSION = _FakeSession(_FakeResponse({}), ok_resp)
    sync.kc_update_attributes("token", {"id": "u1", "username": "u1"}, {"mailu_app_password": "abc"})
    assert sync.SESSION.put_called and sync.SESSION.get_called


def test_kc_update_attributes_raises_without_attribute(monkeypatch):
    sync = load_sync_module(monkeypatch)
    missing_attr_resp = _FakeResponse({"attributes": {}}, status=200)
    sync.SESSION = _FakeSession(_FakeResponse({}), missing_attr_resp)
    with pytest.raises(Exception):
        sync.kc_update_attributes("token", {"id": "u1", "username": "u1"}, {"mailu_app_password": "abc"})


def test_kc_get_users_paginates(monkeypatch):
    sync = load_sync_module(monkeypatch)

    class _PagedSession:
        def __init__(self):
            self.calls = 0

        def post(self, *_, **__):
            return _FakeResponse({"access_token": "tok"})

        def get(self, *_, **__):
            self.calls += 1
            if self.calls == 1:
                return _FakeResponse([{"id": "u1"}, {"id": "u2"}])
            return _FakeResponse([]) # stop pagination

    sync.SESSION = _PagedSession()
    users = sync.kc_get_users("tok")
    assert [u["id"] for u in users] == ["u1", "u2"]
    assert sync.SESSION.calls == 2


def test_ensure_mailu_user_skips_foreign_domain(monkeypatch):
    sync = load_sync_module(monkeypatch)
    executed = []

    class _Cursor:
        def execute(self, sql, params):
            executed.append((sql, params))

    sync.ensure_mailu_user(_Cursor(), "user@other.com", "pw", "User")
    assert not executed


def test_ensure_mailu_user_upserts(monkeypatch):
    sync = load_sync_module(monkeypatch)
    captured = {}

    class _Cursor:
        def execute(self, sql, params):
            captured.update(params)

    sync.ensure_mailu_user(_Cursor(), "user@example.com", "pw", "User Example")
    assert captured["email"] == "user@example.com"
    assert captured["localpart"] == "user"
    # password should be hashed, not the raw string
    assert captured["password"] != "pw"


def test_main_generates_password_and_upserts(monkeypatch):
    sync = load_sync_module(monkeypatch)
    users = [
        {"id": "u1", "username": "user1", "email": "user1@example.com", "attributes": {}},
        {"id": "u2", "username": "user2", "email": "user2@example.com", "attributes": {"mailu_app_password": ["keepme"]}},
        {"id": "u3", "username": "user3", "email": "user3@other.com", "attributes": {}},
    ]
    updated = []

    class _Cursor:
        def __init__(self):
            self.executions = []

        def execute(self, sql, params):
            self.executions.append(params)

        def close(self):
            return None

    class _Conn:
        def __init__(self):
            self.autocommit = False
            self._cursor = _Cursor()

        def cursor(self, cursor_factory=None):
            return self._cursor

        def close(self):
            return None

    monkeypatch.setattr(sync, "get_kc_token", lambda: "tok")
    monkeypatch.setattr(sync, "kc_get_users", lambda token: users)
    monkeypatch.setattr(sync, "kc_update_attributes", lambda token, user, attrs: updated.append((user["id"], attrs["mailu_app_password"])))
    conns = []

    def _connect(**kwargs):
        conn = _Conn()
        conns.append(conn)
        return conn

    monkeypatch.setattr(sync.psycopg2, "connect", _connect)

    sync.main()

    # Should attempt two inserts (third user skipped due to domain mismatch)
    assert len(updated) == 1 # only one missing attr was backfilled
    assert conns and len(conns[0]._cursor.executions) == 2
|
||||
@@ -5,7 +5,7 @@ metadata:
|
||||
name: gitea-ingress
|
||||
namespace: gitea
|
||||
annotations:
|
||||
-cert-manager.io/cluster-issuer: letsencrypt
+cert-manager.io/cluster-issuer: "letsencrypt-prod"
|
||||
nginx.ingress.kubernetes.io/ssl-redirect: "true"
|
||||
spec:
|
||||
tls:
|
||||
|
||||
@ -1,49 +0,0 @@
|
||||
# services/gitops-ui/helmrelease.yaml
|
||||
apiVersion: helm.toolkit.fluxcd.io/v2
|
||||
kind: HelmRelease
|
||||
metadata:
|
||||
name: weave-gitops
|
||||
namespace: flux-system
|
||||
spec:
|
||||
interval: 30m
|
||||
chart:
|
||||
spec:
|
||||
chart: ./charts/gitops-server
|
||||
sourceRef:
|
||||
kind: GitRepository
|
||||
name: weave-gitops-upstream
|
||||
namespace: flux-system
|
||||
# track upstream tag; see source object for version pin
|
||||
install:
|
||||
remediation:
|
||||
retries: 3
|
||||
upgrade:
|
||||
remediation:
|
||||
retries: 3
|
||||
remediateLastFailure: true
|
||||
cleanupOnFail: true
|
||||
values:
|
||||
adminUser:
|
||||
create: true
|
||||
createClusterRole: true
|
||||
createSecret: true
|
||||
username: admin
|
||||
# bcrypt hash for temporary password "G1tOps!2025" (rotate after login)
|
||||
passwordHash: "$2y$12$wDEOzR1Gc2dbvNSJ3ZXNdOBVFEjC6YASIxnZmHIbO.W1m0fie/QVi"
|
||||
ingress:
|
||||
enabled: true
|
||||
className: traefik
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
||||
hosts:
|
||||
- host: cd.bstein.dev
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: gitops-ui-tls
|
||||
hosts:
|
||||
- cd.bstein.dev
|
||||
metrics:
|
||||
enabled: true
|
||||
@@ -1,7 +0,0 @@
# services/gitops-ui/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
resources:
  - source.yaml
  - helmrelease.yaml
@@ -1,11 +0,0 @@
# services/gitops-ui/source.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: weave-gitops-upstream
  namespace: flux-system
spec:
  interval: 1h
  url: https://github.com/weaveworks/weave-gitops.git
  ref:
    tag: v0.38.0
@@ -5,7 +5,7 @@ metadata:
   name: jitsi
   namespace: jitsi
   annotations:
-    cert-manager.io/cluster-issuer: letsencrypt
+    cert-manager.io/cluster-issuer: "letsencrypt-prod"
 spec:
   ingressClassName: traefik
   tls:
@ -1,27 +0,0 @@
|
||||
# services/keycloak

Keycloak is deployed via raw manifests and backed by the shared Postgres (`postgres-service.postgres.svc.cluster.local:5432`). Create these secrets before applying:

```bash
# DB creds (per-service DB/user in shared Postgres)
kubectl -n sso create secret generic keycloak-db \
  --from-literal=username=keycloak \
  --from-literal=password='<DB_PASSWORD>' \
  --from-literal=database=keycloak

# Admin console creds (maps to KC admin user)
kubectl -n sso create secret generic keycloak-admin \
  --from-literal=username=brad@bstein.dev \
  --from-literal=password='<ADMIN_PASSWORD>'
```

Apply:

```bash
kubectl apply -k services/keycloak
```

Notes
- Service: `keycloak.sso.svc:80` (Ingress `sso.bstein.dev`, TLS via cert-manager).
- Uses Postgres schema `public`; DB/user should be provisioned in the shared Postgres instance.
- Health endpoints on :9000 are wired for probes.
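
To spot-check the health endpoints from the last note by hand, something like the following Python sketch may help. It is an illustration only and not part of the manifests: the helper names `health_url` and `is_healthy` are hypothetical, and it assumes port 9000 of the Keycloak pod is reachable from where the script runs (the Service itself only exposes port 80, so a port-forward to the pod would typically be needed). It treats any 2xx/3xx answer as success, roughly the way a Kubernetes `httpGet` probe does.

```python
import urllib.error
import urllib.request


def health_url(host: str, port: int = 9000, check: str = "ready") -> str:
    """Build a health URL; `check` is "ready" or "live", matching the probe paths."""
    return f"http://{host}:{port}/health/{check}"


def is_healthy(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with a 2xx/3xx status, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        # Connection failures, timeouts, and HTTP 4xx/5xx responses all land here.
        return False
```

With a port-forward to the pod's port 9000 in place, `is_healthy(health_url("localhost"))` checks readiness and `is_healthy(health_url("localhost", check="live"))` checks liveness.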
@ -1,154 +0,0 @@
|
||||
# services/keycloak/deployment.yaml
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: keycloak
|
||||
namespace: sso
|
||||
labels:
|
||||
app: keycloak
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: keycloak
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: keycloak
|
||||
spec:
|
||||
affinity:
|
||||
nodeAffinity:
|
||||
requiredDuringSchedulingIgnoredDuringExecution:
|
||||
nodeSelectorTerms:
|
||||
- matchExpressions:
|
||||
- key: hardware
|
||||
operator: In
|
||||
values: ["rpi5","rpi4"]
|
||||
- key: node-role.kubernetes.io/worker
|
||||
operator: Exists
|
||||
- matchExpressions:
|
||||
- key: kubernetes.io/hostname
|
||||
operator: In
|
||||
values: ["titan-24"]
|
||||
preferredDuringSchedulingIgnoredDuringExecution:
|
||||
- weight: 90
|
||||
preference:
|
||||
matchExpressions:
|
||||
- key: hardware
|
||||
operator: In
|
||||
values: ["rpi5"]
|
||||
- weight: 70
|
||||
preference:
|
||||
matchExpressions:
|
||||
- key: hardware
|
||||
operator: In
|
||||
values: ["rpi4"]
|
||||
securityContext:
|
||||
runAsUser: 1000
|
||||
runAsGroup: 0
|
||||
fsGroup: 1000
|
||||
fsGroupChangePolicy: OnRootMismatch
|
||||
imagePullSecrets:
|
||||
- name: zot-regcred
|
||||
initContainers:
|
||||
- name: mailu-http-listener
|
||||
image: registry.bstein.dev/sso/mailu-http-listener:0.1.0
|
||||
imagePullPolicy: IfNotPresent
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
cp /plugin/mailu-http-listener-0.1.0.jar /providers/
|
||||
cp -r /plugin/src /providers/src
|
||||
volumeMounts:
|
||||
- name: providers
|
||||
mountPath: /providers
|
||||
containers:
|
||||
- name: keycloak
|
||||
image: quay.io/keycloak/keycloak:26.0.7
|
||||
imagePullPolicy: IfNotPresent
|
||||
args:
|
||||
- start
|
||||
env:
|
||||
- name: KC_DB
|
||||
value: postgres
|
||||
- name: KC_DB_URL_HOST
|
||||
value: postgres-service.postgres.svc.cluster.local
|
||||
- name: KC_DB_URL_DATABASE
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: keycloak-db
|
||||
key: database
|
||||
- name: KC_DB_USERNAME
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: keycloak-db
|
||||
key: username
|
||||
- name: KC_DB_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: keycloak-db
|
||||
key: password
|
||||
- name: KC_DB_SCHEMA
|
||||
value: public
|
||||
- name: KC_HOSTNAME
|
||||
value: sso.bstein.dev
|
||||
- name: KC_HOSTNAME_URL
|
||||
value: https://sso.bstein.dev
|
||||
- name: KC_PROXY
|
||||
value: edge
|
||||
- name: KC_PROXY_HEADERS
|
||||
value: xforwarded
|
||||
- name: KC_HTTP_ENABLED
|
||||
value: "true"
|
||||
- name: KC_HTTP_MANAGEMENT_PORT
|
||||
value: "9000"
|
||||
- name: KC_HTTP_MANAGEMENT_BIND_ADDRESS
|
||||
value: 0.0.0.0
|
||||
- name: KC_HEALTH_ENABLED
|
||||
value: "true"
|
||||
- name: KC_METRICS_ENABLED
|
||||
value: "true"
|
||||
- name: KEYCLOAK_ADMIN
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: keycloak-admin
|
||||
key: username
|
||||
- name: KEYCLOAK_ADMIN_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: keycloak-admin
|
||||
key: password
|
||||
- name: KC_EVENTS_LISTENERS
|
||||
value: jboss-logging,mailu-http
|
||||
- name: KC_SPI_EVENTS_LISTENER_MAILU-HTTP_ENDPOINT
|
||||
value: http://mailu-sync-listener.mailu-mailserver.svc.cluster.local:8080/events
|
||||
ports:
|
||||
- containerPort: 8080
|
||||
name: http
|
||||
- containerPort: 9000
|
||||
name: metrics
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health/ready
|
||||
port: 9000
|
||||
initialDelaySeconds: 15
|
||||
periodSeconds: 10
|
||||
failureThreshold: 6
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health/live
|
||||
port: 9000
|
||||
initialDelaySeconds: 60
|
||||
periodSeconds: 15
|
||||
failureThreshold: 6
|
||||
volumeMounts:
|
||||
- name: data
|
||||
mountPath: /opt/keycloak/data
|
||||
- name: providers
|
||||
mountPath: /opt/keycloak/providers
|
||||
volumes:
|
||||
- name: data
|
||||
persistentVolumeClaim:
|
||||
claimName: keycloak-data
|
||||
- name: providers
|
||||
emptyDir: {}
|
||||
@@ -1,24 +0,0 @@
# services/keycloak/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: keycloak
  namespace: sso
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  ingressClassName: traefik
  rules:
    - host: sso.bstein.dev
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: keycloak
                port:
                  number: 80
  tls:
    - hosts: [sso.bstein.dev]
      secretName: keycloak-tls
@@ -1,10 +0,0 @@
# services/keycloak/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: sso
resources:
  - namespace.yaml
  - pvc.yaml
  - deployment.yaml
  - service.yaml
  - ingress.yaml
@@ -1,5 +0,0 @@
# services/keycloak/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: sso
@@ -1,12 +0,0 @@
# services/keycloak/pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: keycloak-data
  namespace: sso
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: astreae
@@ -1,15 +0,0 @@
# services/keycloak/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: keycloak
  namespace: sso
  labels:
    app: keycloak
spec:
  selector:
    app: keycloak
  ports:
    - name: http
      port: 80
      targetPort: http
@@ -1,13 +0,0 @@
# services/mailu/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mailu-tls
  namespace: mailu-mailserver
spec:
  secretName: mailu-certificates
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  dnsNames:
    - mail.bstein.dev
@ -1,287 +0,0 @@
|
||||
# services/mailu/helmrelease.yaml
|
||||
apiVersion: helm.toolkit.fluxcd.io/v2
|
||||
kind: HelmRelease
|
||||
metadata:
|
||||
name: mailu
|
||||
namespace: mailu-mailserver
|
||||
spec:
|
||||
interval: 30m
|
||||
chart:
|
||||
spec:
|
||||
chart: mailu
|
||||
version: 2.1.2
|
||||
sourceRef:
|
||||
kind: HelmRepository
|
||||
name: mailu
|
||||
namespace: flux-system
|
||||
install:
|
||||
remediation: { retries: 3 }
|
||||
timeout: 10m
|
||||
upgrade:
|
||||
remediation:
|
||||
retries: 3
|
||||
remediateLastFailure: true
|
||||
cleanupOnFail: true
|
||||
timeout: 10m
|
||||
values:
|
||||
mailuVersion: "2024.06"
|
||||
domain: bstein.dev
|
||||
hostnames: [mail.bstein.dev]
|
||||
domains:
|
||||
- name: bstein.dev
|
||||
enabled: true
|
||||
dkim:
|
||||
enabled: true
|
||||
externalRelay:
|
||||
host: "[email-smtp.us-east-2.amazonaws.com]:587"
|
||||
existingSecret: mailu-ses-relay
|
||||
usernameKey: relay-username
|
||||
passwordKey: relay-password
|
||||
timezone: Etc/UTC
|
||||
subnet: 10.42.0.0/16
|
||||
existingSecret: mailu-secret
|
||||
tls:
|
||||
outboundLevel: encrypt
|
||||
externalDatabase:
|
||||
enabled: true
|
||||
type: postgresql
|
||||
host: postgres-service.postgres.svc.cluster.local
|
||||
port: 5432
|
||||
database: mailu
|
||||
username: mailu
|
||||
existingSecret: mailu-db-secret
|
||||
existingSecretUsernameKey: username
|
||||
existingSecretPasswordKey: password
|
||||
existingSecretDatabaseKey: database
|
||||
initialAccount:
|
||||
enabled: true
|
||||
username: test
|
||||
domain: bstein.dev
|
||||
existingSecret: mailu-initial-account-secret
|
||||
existingSecretPasswordKey: password
|
||||
persistence:
|
||||
accessModes: [ReadWriteMany]
|
||||
size: 100Gi
|
||||
storageClass: astreae
|
||||
single_pvc: true
|
||||
front:
|
||||
hostnames: [mail.bstein.dev]
|
||||
proxied: true
|
||||
hostPort:
|
||||
enabled: false
|
||||
https:
|
||||
enabled: false
|
||||
external: false
|
||||
forceHttps: false
|
||||
externalService:
|
||||
enabled: true
|
||||
type: LoadBalancer
|
||||
externalTrafficPolicy: Cluster
|
||||
ports:
|
||||
submission: true
|
||||
nodePorts:
|
||||
pop3: 30010
|
||||
pop3s: 30011
|
||||
imap: 30143
|
||||
imaps: 30993
|
||||
manageSieve: 30419
|
||||
smtp: 30025
|
||||
smtps: 30465
|
||||
submission: 30587
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
admin:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
podLivenessProbe:
|
||||
enabled: true
|
||||
initialDelaySeconds: 30
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 6
|
||||
successThreshold: 1
|
||||
podReadinessProbe:
|
||||
enabled: true
|
||||
initialDelaySeconds: 20
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 6
|
||||
successThreshold: 1
|
||||
extraEnvVars:
|
||||
- name: FLASK_DEBUG
|
||||
value: "1"
|
||||
- name: ACCESSLOG
|
||||
value: /dev/stdout
|
||||
- name: ERRORLOG
|
||||
value: /dev/stderr
|
||||
- name: WEBROOT_REDIRECT
|
||||
value: ""
|
||||
- name: FORWARDED_ALLOW_IPS
|
||||
value: 127.0.0.1,10.42.0.0/16
|
||||
- name: DNS_RESOLVERS
|
||||
value: 1.1.1.1,9.9.9.9
|
||||
extraVolumes:
|
||||
- name: unbound-config
|
||||
configMap:
|
||||
name: mailu-unbound
|
||||
- name: unbound-run
|
||||
emptyDir: {}
|
||||
extraVolumeMounts:
|
||||
- name: unbound-run
|
||||
mountPath: /var/lib/unbound
|
||||
extraContainers:
|
||||
- name: unbound
|
||||
image: docker.io/alpine:3.20
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
while :; do
|
||||
printf "nameserver 10.43.0.10\n" > /etc/resolv.conf
|
||||
if apk add --no-cache unbound bind-tools; then
|
||||
break
|
||||
fi
|
||||
echo "apk failed, retrying" >&2
|
||||
sleep 10
|
||||
done
|
||||
cat >/etc/resolv.conf <<'EOF'
|
||||
search mailu-mailserver.svc.cluster.local svc.cluster.local cluster.local
|
||||
nameserver 127.0.0.1
|
||||
EOF
|
||||
unbound-anchor -a /var/lib/unbound/root.key || true
|
||||
exec unbound -d -c /opt/unbound/etc/unbound/unbound.conf
|
||||
ports:
|
||||
- containerPort: 53
|
||||
protocol: UDP
|
||||
- containerPort: 53
|
||||
protocol: TCP
|
||||
volumeMounts:
|
||||
- name: unbound-config
|
||||
mountPath: /opt/unbound/etc/unbound
|
||||
- name: unbound-run
|
||||
mountPath: /var/lib/unbound
|
||||
dnsPolicy: None
|
||||
dnsConfig:
|
||||
nameservers:
|
||||
- 127.0.0.1
|
||||
searches:
|
||||
- mailu-mailserver.svc.cluster.local
|
||||
- svc.cluster.local
|
||||
- cluster.local
|
||||
clamav:
|
||||
image:
|
||||
repository: clamav/clamav-debian
|
||||
tag: "1.4"
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi5
|
||||
resources:
|
||||
requests:
|
||||
cpu: 200m
|
||||
memory: 1Gi
|
||||
limits:
|
||||
cpu: 500m
|
||||
memory: 3Gi
|
||||
livenessProbe:
|
||||
enabled: false
|
||||
initialDelaySeconds: 300
|
||||
periodSeconds: 30
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 6
|
||||
successThreshold: 1
|
||||
startupProbe:
|
||||
enabled: false
|
||||
initialDelaySeconds: 60
|
||||
periodSeconds: 30
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 20
|
||||
successThreshold: 1
|
||||
readinessProbe:
|
||||
enabled: false
|
||||
initialDelaySeconds: 300
|
||||
periodSeconds: 30
|
||||
timeoutSeconds: 5
|
||||
failureThreshold: 6
|
||||
successThreshold: 1
|
||||
dovecot:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
oletools:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
postfix:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
overrides:
|
||||
smtp_use_tls: "yes"
|
||||
smtp_tls_security_level: "encrypt"
|
||||
smtp_sasl_security_options: "noanonymous"
|
||||
redis:
|
||||
enabled: true
|
||||
architecture: standalone
|
||||
logLevel: DEBUG
|
||||
image:
|
||||
repository: bitnamilegacy/redis
|
||||
tag: 8.0.3-debian-12-r3
|
||||
master:
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
persistence:
|
||||
enabled: true
|
||||
accessModes: [ReadWriteMany]
|
||||
size: 8Gi
|
||||
storageClass: astreae
|
||||
rspamd:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
persistence:
|
||||
accessModes: [ReadWriteOnce]
|
||||
size: 8Gi
|
||||
storageClass: astreae
|
||||
tika:
|
||||
logLevel: DEBUG
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
global:
|
||||
logLevel: DEBUG
|
||||
storageClass: astreae
|
||||
webmail:
|
||||
enabled: false
|
||||
nodeSelector:
|
||||
hardware: rpi4
|
||||
ingress:
|
||||
enabled: false
|
||||
ingressClassName: traefik
|
||||
tls: true
|
||||
existingSecret: mailu-certificates
|
||||
annotations:
|
||||
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
||||
traefik.ingress.kubernetes.io/service.serversscheme: https
|
||||
traefik.ingress.kubernetes.io/service.serverstransport: mailu-transport@kubernetescrd
|
||||
extraRules:
|
||||
- host: mail.bstein.dev
|
||||
http:
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
backend:
|
||||
service:
|
||||
name: mailu-front
|
||||
port:
|
||||
number: 443
|
||||
service:
|
||||
ports:
|
||||
smtp:
|
||||
port: 25
|
||||
targetPort: 25
|
||||
smtps:
|
||||
port: 465
|
||||
targetPort: 465
|
||||
submission:
|
||||
port: 587
|
||||
targetPort: 587
|
||||
@@ -1,19 +0,0 @@
# services/mailu/ingressroute.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: mailu
  namespace: mailu-mailserver
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`mail.bstein.dev`)
      kind: Rule
      services:
        - name: mailu-front
          port: 443
          scheme: https
          serversTransport: mailu-transport
  tls:
    secretName: mailu-certificates
@@ -1,23 +0,0 @@
# services/mailu/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: mailu-mailserver
resources:
  - namespace.yaml
  - helmrelease.yaml
  - certificate.yaml
  - vip-controller.yaml
  - unbound-configmap.yaml
  - serverstransport.yaml
  - ingressroute.yaml
  - mailu-sync-job.yaml
  - mailu-sync-cronjob.yaml
  - mailu-sync-listener.yaml

configMapGenerator:
  - name: mailu-sync-script
    namespace: mailu-mailserver
    files:
      - sync.py=../../scripts/mailu_sync.py
    options:
      disableNameSuffixHash: true
@ -1,77 +0,0 @@
|
||||
# services/mailu/mailu-sync-cronjob.yaml
|
||||
apiVersion: batch/v1
|
||||
kind: CronJob
|
||||
metadata:
|
||||
name: mailu-sync-nightly
|
||||
namespace: mailu-mailserver
|
||||
spec:
|
||||
schedule: "30 4 * * *"
|
||||
concurrencyPolicy: Forbid
|
||||
jobTemplate:
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
restartPolicy: OnFailure
|
||||
containers:
|
||||
- name: mailu-sync
|
||||
image: python:3.11-alpine
|
||||
imagePullPolicy: IfNotPresent
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
|
||||
&& python /app/sync.py
|
||||
env:
|
||||
- name: KEYCLOAK_BASE_URL
|
||||
value: http://keycloak.sso.svc.cluster.local
|
||||
- name: KEYCLOAK_REALM
|
||||
value: atlas
|
||||
- name: MAILU_DOMAIN
|
||||
value: bstein.dev
|
||||
- name: MAILU_DEFAULT_QUOTA
|
||||
value: "20000000000"
|
||||
- name: MAILU_DB_HOST
|
||||
value: postgres-service.postgres.svc.cluster.local
|
||||
- name: MAILU_DB_PORT
|
||||
value: "5432"
|
||||
- name: MAILU_DB_NAME
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: database
|
||||
- name: MAILU_DB_USER
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: username
|
||||
- name: MAILU_DB_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: password
|
||||
- name: KEYCLOAK_CLIENT_ID
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-id
|
||||
- name: KEYCLOAK_CLIENT_SECRET
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-secret
|
||||
volumeMounts:
|
||||
- name: sync-script
|
||||
mountPath: /app/sync.py
|
||||
subPath: sync.py
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
volumes:
|
||||
- name: sync-script
|
||||
configMap:
|
||||
name: mailu-sync-script
|
||||
defaultMode: 0444
|
||||
@ -1,73 +0,0 @@
|
||||
# services/mailu/mailu-sync-job.yaml
|
||||
apiVersion: batch/v1
|
||||
kind: Job
|
||||
metadata:
|
||||
name: mailu-sync
|
||||
namespace: mailu-mailserver
|
||||
spec:
|
||||
template:
|
||||
spec:
|
||||
restartPolicy: OnFailure
|
||||
containers:
|
||||
- name: mailu-sync
|
||||
image: python:3.11-alpine
|
||||
imagePullPolicy: IfNotPresent
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
|
||||
&& python /app/sync.py
|
||||
env:
|
||||
- name: KEYCLOAK_BASE_URL
|
||||
value: http://keycloak.sso.svc.cluster.local
|
||||
- name: KEYCLOAK_REALM
|
||||
value: atlas
|
||||
- name: MAILU_DOMAIN
|
||||
value: bstein.dev
|
||||
- name: MAILU_DEFAULT_QUOTA
|
||||
value: "20000000000"
|
||||
- name: MAILU_DB_HOST
|
||||
value: postgres-service.postgres.svc.cluster.local
|
||||
- name: MAILU_DB_PORT
|
||||
value: "5432"
|
||||
- name: MAILU_DB_NAME
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: database
|
||||
- name: MAILU_DB_USER
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: username
|
||||
- name: MAILU_DB_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: password
|
||||
- name: KEYCLOAK_CLIENT_ID
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-id
|
||||
- name: KEYCLOAK_CLIENT_SECRET
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-secret
|
||||
volumeMounts:
|
||||
- name: sync-script
|
||||
mountPath: /app/sync.py
|
||||
subPath: sync.py
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
volumes:
|
||||
- name: sync-script
|
||||
configMap:
|
||||
name: mailu-sync-script
|
||||
defaultMode: 0444
|
||||
@ -1,154 +0,0 @@
|
||||
# services/mailu/mailu-sync-listener.yaml
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: mailu-sync-listener
|
||||
namespace: mailu-mailserver
|
||||
spec:
|
||||
selector:
|
||||
app: mailu-sync-listener
|
||||
ports:
|
||||
- name: http
|
||||
port: 8080
|
||||
targetPort: 8080
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: mailu-sync-listener
|
||||
namespace: mailu-mailserver
|
||||
labels:
|
||||
app: mailu-sync-listener
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app: mailu-sync-listener
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app: mailu-sync-listener
|
||||
spec:
|
||||
restartPolicy: Always
|
||||
containers:
|
||||
- name: listener
|
||||
image: python:3.11-alpine
|
||||
imagePullPolicy: IfNotPresent
|
||||
command: ["/bin/sh", "-c"]
|
||||
args:
|
||||
- |
|
||||
pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
|
||||
&& python /app/listener.py
|
||||
env:
|
||||
- name: KEYCLOAK_BASE_URL
|
||||
value: http://keycloak.sso.svc.cluster.local
|
||||
- name: KEYCLOAK_REALM
|
||||
value: atlas
|
||||
- name: MAILU_DOMAIN
|
||||
value: bstein.dev
|
||||
- name: MAILU_DEFAULT_QUOTA
|
||||
value: "20000000000"
|
||||
- name: MAILU_DB_HOST
|
||||
value: postgres-service.postgres.svc.cluster.local
|
||||
- name: MAILU_DB_PORT
|
||||
value: "5432"
|
||||
- name: MAILU_DB_NAME
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: database
|
||||
- name: MAILU_DB_USER
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: username
|
||||
- name: MAILU_DB_PASSWORD
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-db-secret
|
||||
key: password
|
||||
- name: KEYCLOAK_CLIENT_ID
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-id
|
||||
- name: KEYCLOAK_CLIENT_SECRET
|
||||
valueFrom:
|
||||
secretKeyRef:
|
||||
name: mailu-sync-credentials
|
||||
key: client-secret
|
||||
volumeMounts:
|
||||
- name: sync-script
|
||||
mountPath: /app/sync.py
|
||||
subPath: sync.py
|
||||
- name: listener-script
|
||||
mountPath: /app/listener.py
|
||||
subPath: listener.py
|
||||
resources:
|
||||
requests:
|
||||
cpu: 50m
|
||||
memory: 128Mi
|
||||
limits:
|
||||
cpu: 200m
|
||||
memory: 256Mi
|
||||
volumes:
|
||||
- name: sync-script
|
||||
configMap:
|
||||
name: mailu-sync-script
|
||||
defaultMode: 0444
|
||||
- name: listener-script
|
||||
configMap:
|
||||
name: mailu-sync-listener
|
||||
defaultMode: 0444
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: mailu-sync-listener
|
||||
namespace: mailu-mailserver
|
||||
data:
|
||||
listener.py: |
|
||||
import http.server
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import threading
|
||||
|
||||
from time import time
|
||||
|
||||
# Simple debounce to avoid hammering on bursts
|
||||
MIN_INTERVAL_SECONDS = 10
|
||||
last_run = 0.0
|
||||
lock = threading.Lock()
|
||||
|
||||
def trigger_sync():
|
||||
global last_run
|
||||
with lock:
|
||||
now = time()
|
||||
if now - last_run < MIN_INTERVAL_SECONDS:
|
||||
return
|
||||
last_run = now
|
||||
# Fire and forget; output to stdout
|
||||
subprocess.Popen(["python", "/app/sync.py"], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
|
||||
|
||||
class Handler(http.server.BaseHTTPRequestHandler):
|
||||
def do_POST(self):
|
||||
length = int(self.headers.get("Content-Length", 0))
|
||||
body = self.rfile.read(length) if length else b""
|
||||
try:
|
||||
json.loads(body or b"{}")
|
||||
except json.JSONDecodeError:
|
||||
self.send_response(400)
|
||||
self.end_headers()
|
||||
return
|
||||
trigger_sync()
|
||||
self.send_response(202)
|
||||
self.end_headers()
|
||||
|
||||
def log_message(self, fmt, *args):
|
||||
# Quiet logging
|
||||
return
|
||||
|
||||
if __name__ == "__main__":
|
||||
server = http.server.ThreadingHTTPServer(("", 8080), Handler)
|
||||
server.serve_forever()
|
||||
@@ -1,5 +0,0 @@
# services/mailu/namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: mailu-mailserver
@@ -1,10 +0,0 @@
# services/mailu/serverstransport.yaml
apiVersion: traefik.io/v1alpha1
kind: ServersTransport
metadata:
  name: mailu-transport
  namespace: mailu-mailserver
spec:
  # Force SNI to mail.bstein.dev and skip backend cert verification (backend cert is for the host, not the pod IP).
  serverName: mail.bstein.dev
  insecureSkipVerify: true
@@ -1,49 +0,0 @@
# services/mailu/unbound-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mailu-unbound
  namespace: mailu-mailserver
data:
  unbound.conf: |
    server:
      verbosity: 1
      interface: 0.0.0.0
      do-ip4: yes
      do-ip6: no
      do-udp: yes
      do-tcp: yes
      auto-trust-anchor-file: "/var/lib/unbound/root.key"
      prefetch: yes
      qname-minimisation: yes
      harden-dnssec-stripped: yes
      val-clean-additional: yes
      domain-insecure: "mailu-mailserver.svc.cluster.local."
      domain-insecure: "svc.cluster.local."
      domain-insecure: "cluster.local."
      cache-min-ttl: 120
      cache-max-ttl: 86400
      access-control: 0.0.0.0/0 allow

    forward-zone:
      name: "mailu-mailserver.svc.cluster.local."
      forward-addr: 10.43.0.10
      forward-no-cache: yes
      forward-first: yes

    forward-zone:
      name: "svc.cluster.local."
      forward-addr: 10.43.0.10
      forward-no-cache: yes
      forward-first: yes

    forward-zone:
      name: "cluster.local."
      forward-addr: 10.43.0.10
      forward-no-cache: yes
      forward-first: yes

    forward-zone:
      name: "."
      forward-addr: 9.9.9.9
      forward-addr: 1.1.1.1
@@ -1,71 +0,0 @@
# services/mailu/vip-controller.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: vip-controller
  namespace: mailu-mailserver
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: vip-controller-role
  namespace: mailu-mailserver
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "patch", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: vip-controller-binding
  namespace: mailu-mailserver
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: vip-controller-role
subjects:
  - kind: ServiceAccount
    name: vip-controller
    namespace: mailu-mailserver
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: vip-controller
  namespace: mailu-mailserver
spec:
  selector:
    matchLabels:
      app: vip-controller
  template:
    metadata:
      labels:
        app: vip-controller
    spec:
      serviceAccountName: vip-controller
      hostNetwork: true
      nodeSelector:
        mailu.bstein.dev/vip: "true"
      containers:
        - name: vip-controller
          image: lachlanevenson/k8s-kubectl:latest
          imagePullPolicy: IfNotPresent
          command:
            - /bin/sh
            - -c
          args:
            - |
              set -e
              while true; do
                if ip addr show end0 | grep -q 'inet 192\.168\.22\.9/32'; then
                  NODE=$(hostname)
                  echo "VIP found on node ${NODE}."
                  kubectl patch deployment mailu-front -n mailu-mailserver --type='merge' \
                    -p "{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"kubernetes.io/hostname\":\"${NODE}\"}}}}}"
                else
                  echo "No VIP on node ${HOSTNAME}."
                fi
                sleep 60
              done
@ -1,186 +0,0 @@
|
||||
{
|
||||
"uid": "atlas-gpu",
|
||||
"title": "Atlas GPU",
|
||||
"folderUid": "atlas-internal",
|
||||
"editable": true,
|
||||
"panels": [
    {
      "id": 1,
      "type": "piechart",
      "title": "Namespace GPU Share",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
          "refId": "A",
          "legendFormat": "{{namespace}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "color": {
            "mode": "palette-classic"
          }
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "list",
          "placement": "right"
        },
        "pieType": "pie",
        "displayLabels": [],
        "tooltip": {
          "mode": "single"
        },
        "colorScheme": "interpolateSpectral",
        "colorBy": "value",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        }
      }
    },
    {
      "id": 2,
      "type": "timeseries",
      "title": "GPU Util by Namespace",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 0
      },
      "targets": [
        {
          "expr": "sum(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}) by (namespace)",
          "refId": "A",
          "legendFormat": "{{namespace}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 3,
      "type": "timeseries",
      "title": "GPU Util by Node",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 8
      },
      "targets": [
        {
          "expr": "sum by (Hostname) (DCGM_FI_DEV_GPU_UTIL{pod!=\"\"})",
          "refId": "A",
          "legendFormat": "{{Hostname}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 4,
      "type": "table",
      "title": "Top Pods by GPU Util",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 8
      },
      "targets": [
        {
          "expr": "topk(10, sum(DCGM_FI_DEV_GPU_UTIL{pod!=\"\"}) by (namespace,pod,Hostname))",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "custom": {
            "filterable": true
          }
        },
        "overrides": []
      },
      "options": {
        "showHeader": true,
        "columnFilters": false
      },
      "transformations": [
        {
          "id": "labelsToFields",
          "options": {}
        }
      ]
    }
  ],
  "time": {
    "from": "now-12h",
    "to": "now"
  },
  "annotations": {
    "list": []
  },
  "schemaVersion": 39,
  "style": "dark",
  "tags": [
    "atlas",
    "gpu"
  ]
}
@ -1,668 +0,0 @@
{
  "uid": "atlas-network",
  "title": "Atlas Network",
  "folderUid": "atlas-internal",
  "editable": true,
  "panels": [
    {
      "id": 1,
      "type": "stat",
      "title": "Ingress Success Rate (5m)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "expr": "(sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[5m]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[5m])), 1)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "red",
                "value": null
              },
              {
                "color": "orange",
                "value": 0.995
              },
              {
                "color": "yellow",
                "value": 0.999
              },
              {
                "color": "green",
                "value": 0.9995
              }
            ]
          },
          "unit": "percentunit",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 2
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 2,
      "type": "stat",
      "title": "Error Budget Burn (1h)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 6,
        "y": 0
      },
      "targets": [
        {
          "expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[1h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[1h])), 1))) / 0.0010000000000000009",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1
              },
              {
                "color": "orange",
                "value": 2
              },
              {
                "color": "red",
                "value": 4
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 2
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 3,
      "type": "stat",
      "title": "Error Budget Burn (6h)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 12,
        "y": 0
      },
      "targets": [
        {
          "expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[6h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[6h])), 1))) / 0.0010000000000000009",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 1
              },
              {
                "color": "orange",
                "value": 2
              },
              {
                "color": "red",
                "value": 4
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 2
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 4,
      "type": "stat",
      "title": "Edge P99 Latency (ms)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 18,
        "y": 0
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 200
              },
              {
                "color": "orange",
                "value": 350
              },
              {
                "color": "red",
                "value": 500
              }
            ]
          },
          "unit": "ms",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 1
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 5,
      "type": "stat",
      "title": "Ingress Traffic",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 0,
        "y": 4
      },
      "targets": [
        {
          "expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "Bps",
          "custom": {
            "displayMode": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 6,
      "type": "stat",
      "title": "Egress Traffic",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 8,
        "y": 4
      },
      "targets": [
        {
          "expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "Bps",
          "custom": {
            "displayMode": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 7,
      "type": "stat",
      "title": "Intra-Cluster Traffic",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 16,
        "y": 4
      },
      "targets": [
        {
          "expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "Bps",
          "custom": {
            "displayMode": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 8,
      "type": "timeseries",
      "title": "Per-Node Throughput",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 8
      },
      "targets": [
        {
          "expr": "avg by (node) ((sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0) + sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "Bps"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 9,
      "type": "table",
      "title": "Top Namespaces",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 16
      },
      "targets": [
        {
          "expr": "topk(10, sum(rate(container_network_transmit_bytes_total{namespace!=\"\"}[5m]) + rate(container_network_receive_bytes_total{namespace!=\"\"}[5m])) by (namespace))",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "Bps",
          "custom": {
            "filterable": true
          }
        },
        "overrides": []
      },
      "options": {
        "showHeader": true,
        "columnFilters": false
      },
      "transformations": [
        {
          "id": "labelsToFields",
          "options": {}
        }
      ]
    },
    {
      "id": 10,
      "type": "table",
      "title": "Top Pods",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 16
      },
      "targets": [
        {
          "expr": "topk(10, sum(rate(container_network_transmit_bytes_total{pod!=\"\"}[5m]) + rate(container_network_receive_bytes_total{pod!=\"\"}[5m])) by (namespace,pod))",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "Bps",
          "custom": {
            "filterable": true
          }
        },
        "overrides": []
      },
      "options": {
        "showHeader": true,
        "columnFilters": false
      },
      "transformations": [
        {
          "id": "labelsToFields",
          "options": {}
        }
      ]
    },
    {
      "id": 11,
      "type": "timeseries",
      "title": "Traefik Routers (req/s)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 25
      },
      "targets": [
        {
          "expr": "topk(10, sum by (router) (rate(traefik_router_requests_total[5m])))",
          "refId": "A",
          "legendFormat": "{{router}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "req/s"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 12,
      "type": "timeseries",
      "title": "Traefik Entrypoints (req/s)",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 25
      },
      "targets": [
        {
          "expr": "sum by (entrypoint) (rate(traefik_entrypoint_requests_total[5m]))",
          "refId": "A",
          "legendFormat": "{{entrypoint}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "req/s"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    }
  ],
  "time": {
    "from": "now-12h",
    "to": "now"
  },
  "annotations": {
    "list": []
  },
  "schemaVersion": 39,
  "style": "dark",
  "tags": [
    "atlas",
    "network"
  ]
}
@ -1,602 +0,0 @@
{
  "uid": "atlas-nodes",
  "title": "Atlas Nodes",
  "folderUid": "atlas-internal",
  "editable": true,
  "panels": [
    {
      "id": 1,
      "type": "stat",
      "title": "Worker Nodes Ready",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto",
            "valueSuffix": "/18"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 2,
      "type": "stat",
      "title": "Control Plane Ready",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 8,
        "y": 0
      },
      "targets": [
        {
          "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-0a|titan-0b|titan-0c\"})",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto",
            "valueSuffix": "/3"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 3,
      "type": "stat",
      "title": "Control Plane Workloads",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 16,
        "y": 0
      },
      "targets": [
        {
          "expr": "sum(kube_pod_info{node=~\"titan-0a|titan-0b|titan-0c\",namespace!~\"kube-system|kube-public|kube-node-lease|longhorn-system|monitoring|flux-system\"})",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "rgba(115, 115, 115, 1)",
                "value": null
              },
              {
                "color": "green",
                "value": 1
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 9,
      "type": "stat",
      "title": "API Server 5xx rate",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 0,
        "y": 4
      },
      "targets": [
        {
          "expr": "sum(rate(apiserver_request_total{code=~\"5..\"}[5m]))",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 0.05
              },
              {
                "color": "orange",
                "value": 0.2
              },
              {
                "color": "red",
                "value": 0.5
              }
            ]
          },
          "unit": "req/s",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 3
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 10,
      "type": "stat",
      "title": "API Server P99 latency",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 8,
        "y": 4
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))) * 1000",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 250
              },
              {
                "color": "orange",
                "value": 400
              },
              {
                "color": "red",
                "value": 600
              }
            ]
          },
          "unit": "ms",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 1
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 11,
      "type": "stat",
      "title": "etcd P99 latency",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 8,
        "x": 16,
        "y": 4
      },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 50
              },
              {
                "color": "orange",
                "value": 100
              },
              {
                "color": "red",
                "value": 200
              }
            ]
          },
          "unit": "ms",
          "custom": {
            "displayMode": "auto"
          },
          "decimals": 1
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
    {
      "id": 4,
      "type": "timeseries",
      "title": "Node CPU",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 8
      },
      "targets": [
        {
          "expr": "avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": [
            "last"
          ]
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 5,
      "type": "timeseries",
      "title": "Node RAM",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 17
      },
      "targets": [
        {
          "expr": "avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right",
          "calcs": [
            "last"
          ]
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 6,
      "type": "timeseries",
      "title": "Control Plane (incl. titan-db) CPU",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 26
      },
      "targets": [
        {
          "expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 7,
      "type": "timeseries",
      "title": "Control Plane (incl. titan-db) RAM",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 26
      },
      "targets": [
        {
          "expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      }
    },
    {
      "id": 8,
      "type": "timeseries",
      "title": "Root Filesystem Usage",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 9,
        "w": 24,
        "x": 0,
        "y": 35
      },
      "targets": [
        {
          "expr": "avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
          "refId": "A",
          "legendFormat": "{{node}}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent"
        },
        "overrides": []
      },
      "options": {
        "legend": {
          "displayMode": "table",
          "placement": "right"
        },
        "tooltip": {
          "mode": "multi"
        }
      },
      "timeFrom": "30d"
    }
  ],
  "time": {
    "from": "now-12h",
    "to": "now"
  },
  "annotations": {
    "list": []
  },
  "schemaVersion": 39,
  "style": "dark",
  "tags": [
    "atlas",
    "nodes"
  ]
}
File diff suppressed because it is too large
@ -1,616 +0,0 @@
{
  "uid": "atlas-pods",
  "title": "Atlas Pods",
  "folderUid": "atlas-internal",
  "editable": true,
  "panels": [
    {
      "id": 1,
      "type": "stat",
      "title": "Problem Pods",
      "datasource": {
        "type": "prometheus",
        "uid": "atlas-vm"
      },
      "gridPos": {
        "h": 4,
        "w": 6,
        "x": 0,
        "y": 0
      },
      "targets": [
        {
          "expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 1
              }
            ]
          },
          "unit": "none",
          "custom": {
            "displayMode": "auto"
          }
        },
        "overrides": []
      },
      "options": {
        "colorMode": "value",
        "graphMode": "area",
        "justifyMode": "center",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "textMode": "value"
      }
    },
{
|
||||
"id": 2,
|
||||
"type": "stat",
|
||||
"title": "CrashLoop / ImagePull",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "none",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"type": "stat",
|
||||
"title": "Stuck Terminating (>10m)",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 6,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "none",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 4,
|
||||
"type": "stat",
|
||||
"title": "Control Plane Workloads",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_pod_info{node=~\"titan-0a|titan-0b|titan-0c\",namespace!~\"kube-system|kube-public|kube-node-lease|longhorn-system|monitoring|flux-system\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "none",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 5,
|
||||
"type": "table",
|
||||
"title": "Pods Not Running",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 4
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(time() - kube_pod_created{pod!=\"\"}) * on(namespace,pod) group_left(node) kube_pod_info * on(namespace,pod) group_left(phase) max by (namespace,pod,phase) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 6,
|
||||
"type": "table",
|
||||
"title": "CrashLoop / ImagePull",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(time() - kube_pod_created{pod!=\"\"}) * on(namespace,pod) group_left(node) kube_pod_info * on(namespace,pod,container) group_left(reason) max by (namespace,pod,container,reason) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 7,
|
||||
"type": "table",
|
||||
"title": "Terminating >10m",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 24
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(((time() - kube_pod_deletion_timestamp{pod!=\"\"}) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)) * on(namespace,pod) group_left(node) kube_pod_info)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
},
|
||||
{
|
||||
"id": "filterByValue",
|
||||
"options": {
|
||||
"match": "Value",
|
||||
"operator": "gt",
|
||||
"value": 600
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 8,
|
||||
"type": "piechart",
|
||||
"title": "Node Pod Share",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 34
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "right"
|
||||
},
|
||||
"pieType": "pie",
|
||||
"displayLabels": [],
|
||||
"tooltip": {
|
||||
"mode": "single"
|
||||
},
|
||||
"colorScheme": "interpolateSpectral",
|
||||
"colorBy": "value",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 9,
|
||||
"type": "bargauge",
|
||||
"title": "Top Nodes by Pod Count",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 34
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "none",
|
||||
"min": 0,
|
||||
"max": null,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 100
|
||||
}
|
||||
]
|
||||
},
|
||||
"decimals": 0
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
}
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "sortBy",
|
||||
"options": {
|
||||
"fields": [
|
||||
"Value"
|
||||
],
|
||||
"order": "desc"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "limit",
|
||||
"options": {
|
||||
"limit": 12
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 10,
|
||||
"type": "table",
|
||||
"title": "Namespace Plurality by Node v27",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 42
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace,node) group_left() ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)))))",
|
||||
"refId": "A",
|
||||
"instant": true,
|
||||
"format": "table"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"filterable": false
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false,
|
||||
"showColumnFilters": false,
|
||||
"footer": {
|
||||
"show": false,
|
||||
"fields": "",
|
||||
"calcs": []
|
||||
}
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
},
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {
|
||||
"excludeByName": {
|
||||
"Time": true
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "filterByValue",
|
||||
"options": {
|
||||
"match": "Value",
|
||||
"operator": "gt",
|
||||
"value": 0
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "sortBy",
|
||||
"options": {
|
||||
"fields": [
|
||||
"Value"
|
||||
],
|
||||
"order": "desc"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "groupBy",
|
||||
"options": {
|
||||
"fields": {
|
||||
"namespace": {
|
||||
"aggregations": [
|
||||
{
|
||||
"field": "Value",
|
||||
"operation": "max"
|
||||
},
|
||||
{
|
||||
"field": "node",
|
||||
"operation": "first"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"rowBy": [
|
||||
"namespace"
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"time": {
|
||||
"from": "now-12h",
|
||||
"to": "now"
|
||||
},
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"schemaVersion": 39,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"atlas",
|
||||
"pods"
|
||||
]
|
||||
}
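Note on the "Namespace Plurality by Node v27" panel above: its expression adds a small, unique per-node offset (0.001, 0.002, ...) to each namespace/node share before comparing against the per-namespace maximum, so that exactly one node wins when two nodes hold the same share. The sketch below shows one way such an `or`-chain of offsets could be built. It is an illustration only: the `node_tiebreaker` helper, the `step` parameter, and the shortened node list are assumptions, not code from this repository.

```python
# Hypothetical helper, not part of the repository shown in this diff.
# The node list is a shortened sample of the names used in the panel query.
NODES = ["titan-0a", "titan-0b", "titan-0c", "titan-db", "titan-04"]


def node_tiebreaker(nodes, step=0.001):
    """Build a PromQL `or`-chain giving every node a small, unique offset."""
    terms = []
    for index, node in enumerate(nodes, start=1):
        # `* 0` zeroes the info metric's value while keeping the `node` label;
        # adding index * step then leaves a distinct per-node epsilon.
        selector = 'kube_node_info{node="%s"}' % node
        terms.append("(sum by (node) (%s) * 0 + %s)" % (selector, index * step))
    return " or ".join(terms)


if __name__ == "__main__":
    print(node_tiebreaker(NODES))
```

The offsets only break ties if they stay smaller than any real difference between two shares, so `step` times the node count should remain well below the smallest percentage gap that matters.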
|
||||
@ -1,427 +0,0 @@
|
||||
{
|
||||
"uid": "atlas-storage",
|
||||
"title": "Atlas Storage",
|
||||
"folderUid": "atlas-internal",
|
||||
"editable": true,
|
||||
"panels": [
|
||||
{
|
||||
"id": 1,
|
||||
"type": "stat",
|
||||
"title": "Astreae Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 91.5
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"type": "stat",
|
||||
"title": "Asteria Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 91.5
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"type": "stat",
|
||||
"title": "Astreae Free",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "rgba(115, 115, 115, 1)",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "decbytes",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 4,
|
||||
"type": "stat",
|
||||
"title": "Asteria Free",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "rgba(115, 115, 115, 1)",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "decbytes",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 5,
|
||||
"type": "timeseries",
|
||||
"title": "Astreae Per-Node Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 5
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-1[2-9]|titan-2[24]\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "30d"
|
||||
},
|
||||
{
|
||||
"id": 6,
|
||||
"type": "timeseries",
|
||||
"title": "Asteria Per-Node Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 5
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-1[2-9]|titan-2[24]\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "30d"
|
||||
},
|
||||
{
|
||||
"id": 7,
|
||||
"type": "timeseries",
|
||||
"title": "Astreae Usage History",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "90d"
|
||||
},
|
||||
{
|
||||
"id": 8,
|
||||
"type": "timeseries",
|
||||
"title": "Asteria Usage History",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "90d"
|
||||
}
|
||||
],
|
||||
"time": {
|
||||
"from": "now-12h",
|
||||
"to": "now"
|
||||
},
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"schemaVersion": 39,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"atlas",
|
||||
"storage"
|
||||
]
|
||||
}
|
||||
@ -1,80 +0,0 @@
|
||||
# services/monitoring/dcgm-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dcgm-exporter
  namespace: monitoring
  labels:
    app: dcgm-exporter
spec:
  selector:
    matchLabels:
      app: dcgm-exporter
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 2
  template:
    metadata:
      labels:
        app: dcgm-exporter
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9400"
    spec:
      serviceAccountName: default
      runtimeClassName: nvidia
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                      - titan-20
                      - titan-21
                      - titan-22
                      - titan-24
      tolerations:
        - operator: Exists
      containers:
        - name: dcgm-exporter
          image: registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04
          imagePullPolicy: Always
          ports:
            - name: metrics
              containerPort: 9400
          env:
            - name: DCGM_EXPORTER_KUBERNETES
              value: "true"
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: pod-resources
              mountPath: /var/lib/kubelet/pod-resources
      imagePullSecrets:
        - name: zot-regcred
      volumes:
        - name: pod-resources
          hostPath:
            path: /var/lib/kubelet/pod-resources
            type: Directory
---
apiVersion: v1
kind: Service
metadata:
  name: dcgm-exporter
  namespace: monitoring
  labels:
    app: dcgm-exporter
spec:
  selector:
    app: dcgm-exporter
  ports:
    - name: metrics
      port: 9400
      targetPort: metrics
|
||||
@ -1,195 +0,0 @@
|
||||
# services/monitoring/grafana-dashboard-gpu.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-gpu
  labels:
    grafana_dashboard: "1"
data:
  atlas-gpu.json: |
|
||||
{
|
||||
"uid": "atlas-gpu",
|
||||
"title": "Atlas GPU",
|
||||
"folderUid": "atlas-internal",
|
||||
"editable": true,
|
||||
"panels": [
|
||||
{
|
||||
"id": 1,
|
||||
"type": "piechart",
|
||||
"title": "Namespace GPU Share",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 * ( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ) / clamp_min(sum( ( (sum by (namespace) (max_over_time(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}[$__range]))) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) ) and on(namespace) ( (topk(10, ( sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) ) + (sum(container_memory_working_set_bytes{namespace!=\"\",pod!=\"\",container!=\"\"}) by (namespace) / 1e9) + ((sum((kube_pod_container_resource_requests{namespace!=\"\",resource=\"nvidia.com/gpu\"} or kube_pod_container_resource_limits{namespace!=\"\",resource=\"nvidia.com/gpu\"})) by (namespace)) or on(namespace) (sum(rate(container_cpu_usage_seconds_total{namespace!=\"\",pod!=\"\",container!=\"\"}[5m])) by (namespace) * 0) * 100)) >= bool 0) ) ), 1)",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{namespace}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "right"
|
||||
},
|
||||
"pieType": "pie",
|
||||
"displayLabels": [],
|
||||
"tooltip": {
|
||||
"mode": "single"
|
||||
},
|
||||
"colorScheme": "interpolateSpectral",
|
||||
"colorBy": "value",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"type": "timeseries",
|
||||
"title": "GPU Util by Namespace",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(DCGM_FI_DEV_GPU_UTIL{namespace!=\"\",pod!=\"\"}) by (namespace)",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{namespace}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"type": "timeseries",
|
||||
"title": "GPU Util by Node",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 8
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum by (Hostname) (DCGM_FI_DEV_GPU_UTIL{pod!=\"\"})",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{Hostname}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
}
|
||||
},
|
||||
        {
          "id": 4,
          "type": "table",
          "title": "Top Pods by GPU Util",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 8,
            "w": 12,
            "x": 12,
            "y": 8
          },
          "targets": [
            {
              "expr": "topk(10, sum(DCGM_FI_DEV_GPU_UTIL{pod!=\"\"}) by (namespace,pod,Hostname))",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent",
              "custom": {
                "filterable": true
              }
            },
            "overrides": []
          },
          "options": {
            "showHeader": true,
            "columnFilters": false
          },
          "transformations": [
            {
              "id": "labelsToFields",
              "options": {}
            }
          ]
        }
      ],
      "time": {
        "from": "now-12h",
        "to": "now"
      },
      "annotations": {
        "list": []
      },
      "schemaVersion": 39,
      "style": "dark",
      "tags": [
        "atlas",
        "gpu"
      ]
    }
@@ -1,677 +0,0 @@
# services/monitoring/grafana-dashboard-network.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-network
  labels:
    grafana_dashboard: "1"
data:
  atlas-network.json: |
    {
      "uid": "atlas-network",
      "title": "Atlas Network",
      "folderUid": "atlas-internal",
      "editable": true,
      "panels": [
        {
          "id": 1,
          "type": "stat",
          "title": "Ingress Success Rate (5m)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 0,
            "y": 0
          },
          "targets": [
            {
              "expr": "(sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[5m]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[5m])), 1)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "red",
                    "value": null
                  },
                  {
                    "color": "orange",
                    "value": 0.995
                  },
                  {
                    "color": "yellow",
                    "value": 0.999
                  },
                  {
                    "color": "green",
                    "value": 0.9995
                  }
                ]
              },
              "unit": "percentunit",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 2
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 2,
          "type": "stat",
          "title": "Error Budget Burn (1h)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 6,
            "y": 0
          },
          "targets": [
            {
              "expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[1h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[1h])), 1))) / 0.0010000000000000009",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 1
                  },
                  {
                    "color": "orange",
                    "value": 2
                  },
                  {
                    "color": "red",
                    "value": 4
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 2
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 3,
          "type": "stat",
          "title": "Error Budget Burn (6h)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 12,
            "y": 0
          },
          "targets": [
            {
              "expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[6h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[6h])), 1))) / 0.0010000000000000009",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 1
                  },
                  {
                    "color": "orange",
                    "value": 2
                  },
                  {
                    "color": "red",
                    "value": 4
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 2
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 4,
          "type": "stat",
          "title": "Edge P99 Latency (ms)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 18,
            "y": 0
          },
          "targets": [
            {
              "expr": "histogram_quantile(0.99, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 200
                  },
                  {
                    "color": "orange",
                    "value": 350
                  },
                  {
                    "color": "red",
                    "value": 500
                  }
                ]
              },
              "unit": "ms",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 1
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 5,
          "type": "stat",
          "title": "Ingress Traffic",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 0,
            "y": 4
          },
          "targets": [
            {
              "expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "Bps",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 6,
          "type": "stat",
          "title": "Egress Traffic",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 8,
            "y": 4
          },
          "targets": [
            {
              "expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "Bps",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 7,
          "type": "stat",
          "title": "Intra-Cluster Traffic",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 16,
            "y": 4
          },
          "targets": [
            {
              "expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "Bps",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 8,
          "type": "timeseries",
          "title": "Per-Node Throughput",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 8,
            "w": 24,
            "x": 0,
            "y": 8
          },
          "targets": [
            {
              "expr": "avg by (node) ((sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0) + sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "Bps"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 9,
          "type": "table",
          "title": "Top Namespaces",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 0,
            "y": 16
          },
          "targets": [
            {
              "expr": "topk(10, sum(rate(container_network_transmit_bytes_total{namespace!=\"\"}[5m]) + rate(container_network_receive_bytes_total{namespace!=\"\"}[5m])) by (namespace))",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "Bps",
              "custom": {
                "filterable": true
              }
            },
            "overrides": []
          },
          "options": {
            "showHeader": true,
            "columnFilters": false
          },
          "transformations": [
            {
              "id": "labelsToFields",
              "options": {}
            }
          ]
        },
        {
          "id": 10,
          "type": "table",
          "title": "Top Pods",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 12,
            "y": 16
          },
          "targets": [
            {
              "expr": "topk(10, sum(rate(container_network_transmit_bytes_total{pod!=\"\"}[5m]) + rate(container_network_receive_bytes_total{pod!=\"\"}[5m])) by (namespace,pod))",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "Bps",
              "custom": {
                "filterable": true
              }
            },
            "overrides": []
          },
          "options": {
            "showHeader": true,
            "columnFilters": false
          },
          "transformations": [
            {
              "id": "labelsToFields",
              "options": {}
            }
          ]
        },
        {
          "id": 11,
          "type": "timeseries",
          "title": "Traefik Routers (req/s)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 0,
            "y": 25
          },
          "targets": [
            {
              "expr": "topk(10, sum by (router) (rate(traefik_router_requests_total[5m])))",
              "refId": "A",
              "legendFormat": "{{router}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "req/s"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 12,
          "type": "timeseries",
          "title": "Traefik Entrypoints (req/s)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 12,
            "y": 25
          },
          "targets": [
            {
              "expr": "sum by (entrypoint) (rate(traefik_entrypoint_requests_total[5m]))",
              "refId": "A",
              "legendFormat": "{{entrypoint}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "req/s"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        }
      ],
      "time": {
        "from": "now-12h",
        "to": "now"
      },
      "annotations": {
        "list": []
      },
      "schemaVersion": 39,
      "style": "dark",
      "tags": [
        "atlas",
        "network"
      ]
    }
@@ -1,611 +0,0 @@
# services/monitoring/grafana-dashboard-nodes.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-nodes
  labels:
    grafana_dashboard: "1"
data:
  atlas-nodes.json: |
    {
      "uid": "atlas-nodes",
      "title": "Atlas Nodes",
      "folderUid": "atlas-internal",
      "editable": true,
      "panels": [
        {
          "id": 1,
          "type": "stat",
          "title": "Worker Nodes Ready",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 0,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto",
                "valueSuffix": "/18"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 2,
          "type": "stat",
          "title": "Control Plane Ready",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 8,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-0a|titan-0b|titan-0c\"})",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto",
                "valueSuffix": "/3"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 3,
          "type": "stat",
          "title": "Control Plane Workloads",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 16,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(kube_pod_info{node=~\"titan-0a|titan-0b|titan-0c\",namespace!~\"kube-system|kube-public|kube-node-lease|longhorn-system|monitoring|flux-system\"})",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "rgba(115, 115, 115, 1)",
                    "value": null
                  },
                  {
                    "color": "green",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 9,
          "type": "stat",
          "title": "API Server 5xx rate",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 0,
            "y": 4
          },
          "targets": [
            {
              "expr": "sum(rate(apiserver_request_total{code=~\"5..\"}[5m]))",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 0.05
                  },
                  {
                    "color": "orange",
                    "value": 0.2
                  },
                  {
                    "color": "red",
                    "value": 0.5
                  }
                ]
              },
              "unit": "req/s",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 3
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 10,
          "type": "stat",
          "title": "API Server P99 latency",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 8,
            "y": 4
          },
          "targets": [
            {
              "expr": "histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))) * 1000",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 250
                  },
                  {
                    "color": "orange",
                    "value": 400
                  },
                  {
                    "color": "red",
                    "value": 600
                  }
                ]
              },
              "unit": "ms",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 1
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 11,
          "type": "stat",
          "title": "etcd P99 latency",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 8,
            "x": 16,
            "y": 4
          },
          "targets": [
            {
              "expr": "histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "yellow",
                    "value": 50
                  },
                  {
                    "color": "orange",
                    "value": 100
                  },
                  {
                    "color": "red",
                    "value": 200
                  }
                ]
              },
              "unit": "ms",
              "custom": {
                "displayMode": "auto"
              },
              "decimals": 1
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 4,
          "type": "timeseries",
          "title": "Node CPU",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 24,
            "x": 0,
            "y": 8
          },
          "targets": [
            {
              "expr": "avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right",
              "calcs": [
                "last"
              ]
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 5,
          "type": "timeseries",
          "title": "Node RAM",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 24,
            "x": 0,
            "y": 17
          },
          "targets": [
            {
              "expr": "avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right",
              "calcs": [
                "last"
              ]
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 6,
          "type": "timeseries",
          "title": "Control Plane (incl. titan-db) CPU",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 0,
            "y": 26
          },
          "targets": [
            {
              "expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 7,
          "type": "timeseries",
          "title": "Control Plane (incl. titan-db) RAM",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 12,
            "x": 12,
            "y": 26
          },
          "targets": [
            {
              "expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          }
        },
        {
          "id": 8,
          "type": "timeseries",
          "title": "Root Filesystem Usage",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 9,
            "w": 24,
            "x": 0,
            "y": 35
          },
          "targets": [
            {
              "expr": "avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))",
              "refId": "A",
              "legendFormat": "{{node}}"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "unit": "percent"
            },
            "overrides": []
          },
          "options": {
            "legend": {
              "displayMode": "table",
              "placement": "right"
            },
            "tooltip": {
              "mode": "multi"
            }
          },
          "timeFrom": "30d"
        }
      ],
      "time": {
        "from": "now-12h",
        "to": "now"
      },
      "annotations": {
        "list": []
      },
      "schemaVersion": 39,
      "style": "dark",
      "tags": [
        "atlas",
        "nodes"
      ]
    }
File diff suppressed because it is too large
@@ -1,625 +0,0 @@
# services/monitoring/grafana-dashboard-pods.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-dashboard-pods
  labels:
    grafana_dashboard: "1"
data:
  atlas-pods.json: |
    {
      "uid": "atlas-pods",
      "title": "Atlas Pods",
      "folderUid": "atlas-internal",
      "editable": true,
      "panels": [
        {
          "id": 1,
          "type": "stat",
          "title": "Problem Pods",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 0,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 2,
          "type": "stat",
          "title": "CrashLoop / ImagePull",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 6,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
        {
          "id": 3,
          "type": "stat",
          "title": "Stuck Terminating (>10m)",
          "datasource": {
            "type": "prometheus",
            "uid": "atlas-vm"
          },
          "gridPos": {
            "h": 4,
            "w": 6,
            "x": 12,
            "y": 0
          },
          "targets": [
            {
              "expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
              "refId": "A"
            }
          ],
          "fieldConfig": {
            "defaults": {
              "color": {
                "mode": "thresholds"
              },
              "mappings": [],
              "thresholds": {
                "mode": "absolute",
                "steps": [
                  {
                    "color": "green",
                    "value": null
                  },
                  {
                    "color": "red",
                    "value": 1
                  }
                ]
              },
              "unit": "none",
              "custom": {
                "displayMode": "auto"
              }
            },
            "overrides": []
          },
          "options": {
            "colorMode": "value",
            "graphMode": "area",
            "justifyMode": "center",
            "reduceOptions": {
              "calcs": [
                "lastNotNull"
              ],
              "fields": "",
              "values": false
            },
            "textMode": "value"
          }
        },
{
|
||||
"id": 4,
|
||||
"type": "stat",
|
||||
"title": "Control Plane Workloads",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 4,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(kube_pod_info{node=~\"titan-0a|titan-0b|titan-0c\",namespace!~\"kube-system|kube-public|kube-node-lease|longhorn-system|monitoring|flux-system\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "none",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 5,
|
||||
"type": "table",
|
||||
"title": "Pods Not Running",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 4
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(time() - kube_pod_created{pod!=\"\"}) * on(namespace,pod) group_left(node) kube_pod_info * on(namespace,pod) group_left(phase) max by (namespace,pod,phase) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 6,
|
||||
"type": "table",
|
||||
"title": "CrashLoop / ImagePull",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(time() - kube_pod_created{pod!=\"\"}) * on(namespace,pod) group_left(node) kube_pod_info * on(namespace,pod,container) group_left(reason) max by (namespace,pod,container,reason) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 7,
|
||||
"type": "table",
|
||||
"title": "Terminating >10m",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 10,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 24
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(((time() - kube_pod_deletion_timestamp{pod!=\"\"}) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)) * on(namespace,pod) group_left(node) kube_pod_info)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "s",
|
||||
"custom": {
|
||||
"filterable": true
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
},
|
||||
{
|
||||
"id": "filterByValue",
|
||||
"options": {
|
||||
"match": "Value",
|
||||
"operator": "gt",
|
||||
"value": 600
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 8,
|
||||
"type": "piechart",
|
||||
"title": "Node Pod Share",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 34
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{namespace}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"color": {
|
||||
"mode": "palette-classic"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "list",
|
||||
"placement": "right"
|
||||
},
|
||||
"pieType": "pie",
|
||||
"displayLabels": [],
|
||||
"tooltip": {
|
||||
"mode": "single"
|
||||
},
|
||||
"colorScheme": "interpolateSpectral",
|
||||
"colorBy": "value",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 9,
|
||||
"type": "bargauge",
|
||||
"title": "Top Nodes by Pod Count",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 34
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}",
|
||||
"instant": true
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "none",
|
||||
"min": 0,
|
||||
"max": null,
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 100
|
||||
}
|
||||
]
|
||||
},
|
||||
"decimals": 0
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"displayMode": "gradient",
|
||||
"orientation": "horizontal",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
}
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "sortBy",
|
||||
"options": {
|
||||
"fields": [
|
||||
"Value"
|
||||
],
|
||||
"order": "desc"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "limit",
|
||||
"options": {
|
||||
"limit": 12
|
||||
}
|
||||
}
|
||||
]
|
||||
},
|
||||
{
|
||||
"id": 10,
|
||||
"type": "table",
|
||||
"title": "Namespace Plurality by Node v27",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 8,
|
||||
"w": 24,
|
||||
"x": 0,
|
||||
"y": 42
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace,node) group_left() ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) 
(kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)))))",
|
||||
"refId": "A",
|
||||
"instant": true,
|
||||
"format": "table"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"filterable": false
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"showHeader": true,
|
||||
"columnFilters": false,
|
||||
"showColumnFilters": false,
|
||||
"footer": {
|
||||
"show": false,
|
||||
"fields": "",
|
||||
"calcs": []
|
||||
}
|
||||
},
|
||||
"transformations": [
|
||||
{
|
||||
"id": "labelsToFields",
|
||||
"options": {}
|
||||
},
|
||||
{
|
||||
"id": "organize",
|
||||
"options": {
|
||||
"excludeByName": {
|
||||
"Time": true
|
||||
}
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "filterByValue",
|
||||
"options": {
|
||||
"match": "Value",
|
||||
"operator": "gt",
|
||||
"value": 0
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "sortBy",
|
||||
"options": {
|
||||
"fields": [
|
||||
"Value"
|
||||
],
|
||||
"order": "desc"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": "groupBy",
|
||||
"options": {
|
||||
"fields": {
|
||||
"namespace": {
|
||||
"aggregations": [
|
||||
{
|
||||
"field": "Value",
|
||||
"operation": "max"
|
||||
},
|
||||
{
|
||||
"field": "node",
|
||||
"operation": "first"
|
||||
}
|
||||
]
|
||||
}
|
||||
},
|
||||
"rowBy": [
|
||||
"namespace"
|
||||
]
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
],
|
||||
"time": {
|
||||
"from": "now-12h",
|
||||
"to": "now"
|
||||
},
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"schemaVersion": 39,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"atlas",
|
||||
"pods"
|
||||
]
|
||||
}
|
||||
@ -1,436 +0,0 @@
|
||||
# services/monitoring/grafana-dashboard-storage.yaml
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: grafana-dashboard-storage
|
||||
labels:
|
||||
grafana_dashboard: "1"
|
||||
data:
|
||||
atlas-storage.json: |
|
||||
{
|
||||
"uid": "atlas-storage",
|
||||
"title": "Atlas Storage",
|
||||
"folderUid": "atlas-internal",
|
||||
"editable": true,
|
||||
"panels": [
|
||||
{
|
||||
"id": 1,
|
||||
"type": "stat",
|
||||
"title": "Astreae Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 0,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 91.5
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 2,
|
||||
"type": "stat",
|
||||
"title": "Asteria Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 6,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "green",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "yellow",
|
||||
"value": 50
|
||||
},
|
||||
{
|
||||
"color": "orange",
|
||||
"value": 75
|
||||
},
|
||||
{
|
||||
"color": "red",
|
||||
"value": 91.5
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "percent",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 3,
|
||||
"type": "stat",
|
||||
"title": "Astreae Free",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 12,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "rgba(115, 115, 115, 1)",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "decbytes",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 4,
|
||||
"type": "stat",
|
||||
"title": "Asteria Free",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 5,
|
||||
"w": 6,
|
||||
"x": 18,
|
||||
"y": 0
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"})",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"color": {
|
||||
"mode": "thresholds"
|
||||
},
|
||||
"mappings": [],
|
||||
"thresholds": {
|
||||
"mode": "absolute",
|
||||
"steps": [
|
||||
{
|
||||
"color": "rgba(115, 115, 115, 1)",
|
||||
"value": null
|
||||
},
|
||||
{
|
||||
"color": "green",
|
||||
"value": 1
|
||||
}
|
||||
]
|
||||
},
|
||||
"unit": "decbytes",
|
||||
"custom": {
|
||||
"displayMode": "auto"
|
||||
}
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"colorMode": "value",
|
||||
"graphMode": "area",
|
||||
"justifyMode": "center",
|
||||
"reduceOptions": {
|
||||
"calcs": [
|
||||
"lastNotNull"
|
||||
],
|
||||
"fields": "",
|
||||
"values": false
|
||||
},
|
||||
"textMode": "value"
|
||||
}
|
||||
},
|
||||
{
|
||||
"id": 5,
|
||||
"type": "timeseries",
|
||||
"title": "Astreae Per-Node Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 5
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-1[2-9]|titan-2[24]\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "30d"
|
||||
},
|
||||
{
|
||||
"id": 6,
|
||||
"type": "timeseries",
|
||||
"title": "Asteria Per-Node Usage",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 5
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "(avg by (node) ((avg by (instance) ((1 - (node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"} / node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"})) * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-1[2-9]|titan-2[24]\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||
"refId": "A",
|
||||
"legendFormat": "{{node}}"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "right"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "30d"
|
||||
},
|
||||
{
|
||||
"id": 7,
|
||||
"type": "timeseries",
|
||||
"title": "Astreae Usage History",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 0,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/astreae\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "90d"
|
||||
},
|
||||
{
|
||||
"id": 8,
|
||||
"type": "timeseries",
|
||||
"title": "Asteria Usage History",
|
||||
"datasource": {
|
||||
"type": "prometheus",
|
||||
"uid": "atlas-vm"
|
||||
},
|
||||
"gridPos": {
|
||||
"h": 9,
|
||||
"w": 12,
|
||||
"x": 12,
|
||||
"y": 14
|
||||
},
|
||||
"targets": [
|
||||
{
|
||||
"expr": "100 - (sum(node_filesystem_avail_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) / sum(node_filesystem_size_bytes{mountpoint=\"/mnt/asteria\",fstype!~\"tmpfs|overlay\"}) * 100)",
|
||||
"refId": "A"
|
||||
}
|
||||
],
|
||||
"fieldConfig": {
|
||||
"defaults": {
|
||||
"unit": "percent"
|
||||
},
|
||||
"overrides": []
|
||||
},
|
||||
"options": {
|
||||
"legend": {
|
||||
"displayMode": "table",
|
||||
"placement": "bottom"
|
||||
},
|
||||
"tooltip": {
|
||||
"mode": "multi"
|
||||
}
|
||||
},
|
||||
"timeFrom": "90d"
|
||||
}
|
||||
],
|
||||
"time": {
|
||||
"from": "now-12h",
|
||||
"to": "now"
|
||||
},
|
||||
"annotations": {
|
||||
"list": []
|
||||
},
|
||||
"schemaVersion": 39,
|
||||
"style": "dark",
|
||||
"tags": [
|
||||
"atlas",
|
||||
"storage"
|
||||
]
|
||||
}
|
||||
@ -1,35 +0,0 @@
|
||||
# services/monitoring/grafana-folders.yaml
|
||||
apiVersion: v1
|
||||
kind: ConfigMap
|
||||
metadata:
|
||||
name: grafana-folders
|
||||
labels:
|
||||
app.kubernetes.io/name: grafana
|
||||
app.kubernetes.io/component: folders
|
||||
data:
|
||||
folders.yaml: |
|
||||
apiVersion: 1
|
||||
folders:
|
||||
- uid: overview
|
||||
title: Overview
|
||||
permissions:
|
||||
- role: Viewer
|
||||
permission: View
|
||||
- role: Editor
|
||||
permission: Edit
|
||||
- role: Admin
|
||||
permission: Admin
|
||||
- uid: atlas-internal
|
||||
title: Atlas Internal
|
||||
permissions:
|
||||
- role: Editor
|
||||
permission: View
|
||||
- role: Admin
|
||||
permission: Admin
|
||||
- uid: oceanus-internal
|
||||
title: Oceanus Internal
|
||||
permissions:
|
||||
- role: Editor
|
||||
permission: View
|
||||
- role: Admin
|
||||
permission: Admin
|
||||
@ -65,13 +65,14 @@ spec:
|
||||
namespace: flux-system
|
||||
values:
|
||||
server:
|
||||
# keep 1 year; supports "d", "y"
|
||||
# keep ~3 months; change as you like (supports "d", "y")
|
||||
extraArgs:
|
||||
retentionPeriod: "1y" # VM flag -retentionPeriod=1y.
|
||||
retentionPeriod: "90d" # VM flag -retentionPeriod=90d.
|
||||
|
||||
persistentVolume:
|
||||
enabled: true
|
||||
size: 250Gi
|
||||
size: 100Gi # adjust; uses default StorageClass (Longhorn)
|
||||
# storageClassName: "" # set if you want a specific class
|
||||
|
||||
# Enable built-in Kubernetes scraping
|
||||
scrape:
|
||||
@ -186,15 +187,6 @@ spec:
|
||||
- targets: ["longhorn-backend.longhorn-system.svc:9500"]
|
||||
metrics_path: /metrics
|
||||
|
||||
# --- titan-db node_exporter (external control-plane DB host) ---
|
||||
- job_name: "titan-db"
|
||||
static_configs:
|
||||
- targets: ["192.168.22.10:9100"]
|
||||
relabel_configs:
|
||||
- source_labels: [__address__]
|
||||
target_label: instance
|
||||
replacement: titan-db
|
||||
|
||||
# --- cert-manager (pods expose on 9402) ---
|
||||
- job_name: "cert-manager"
|
||||
kubernetes_sd_configs: [{ role: pod }]
|
||||
@ -218,195 +210,3 @@ spec:
|
||||
- action: keep
|
||||
source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app_kubernetes_io_part_of]
|
||||
regex: flux-system;flux
|
||||
|
||||
---
|
||||
|
||||
apiVersion: helm.toolkit.fluxcd.io/v2
|
||||
kind: HelmRelease
|
||||
metadata:
|
||||
name: grafana
|
||||
namespace: monitoring
|
||||
spec:
|
||||
interval: 15m
|
||||
chart:
|
||||
spec:
|
||||
chart: grafana
|
||||
version: "~8.5.0"
|
||||
sourceRef:
|
||||
kind: HelmRepository
|
||||
name: grafana
|
||||
namespace: flux-system
|
||||
values:
|
||||
admin:
|
||||
existingSecret: grafana-admin
|
||||
userKey: admin-user
|
||||
passwordKey: admin-password
|
||||
persistence:
|
||||
enabled: true
|
||||
size: 20Gi
|
||||
storageClassName: astreae
|
||||
service:
|
||||
type: ClusterIP
|
||||
env:
|
||||
GF_AUTH_ANONYMOUS_ENABLED: "false"
|
||||
GF_SECURITY_ALLOW_EMBEDDING: "true"
|
||||
GF_AUTH_GENERIC_OAUTH_ENABLED: "true"
|
||||
GF_AUTH_GENERIC_OAUTH_NAME: "Keycloak"
|
||||
GF_AUTH_GENERIC_OAUTH_ALLOW_SIGN_UP: "true"
|
||||
GF_AUTH_GENERIC_OAUTH_SCOPES: "openid profile email groups"
|
||||
GF_AUTH_GENERIC_OAUTH_AUTH_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/auth"
|
||||
GF_AUTH_GENERIC_OAUTH_TOKEN_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/token"
|
||||
GF_AUTH_GENERIC_OAUTH_API_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/userinfo"
|
||||
GF_AUTH_GENERIC_OAUTH_ROLE_ATTRIBUTE_PATH: "contains(groups, 'admin') && 'Admin' || 'Viewer'"
|
||||
GF_AUTH_GENERIC_OAUTH_TLS_SKIP_VERIFY_INSECURE: "false"
|
||||
GF_AUTH_SIGNOUT_REDIRECT_URL: "https://sso.bstein.dev/realms/atlas/protocol/openid-connect/logout?redirect_uri=https://metrics.bstein.dev/"
|
||||
envValueFrom:
|
||||
GF_AUTH_GENERIC_OAUTH_CLIENT_ID:
|
||||
secretKeyRef:
|
||||
name: grafana-oidc
|
||||
key: client_id
|
||||
GF_AUTH_GENERIC_OAUTH_CLIENT_SECRET:
|
||||
secretKeyRef:
|
||||
name: grafana-oidc
|
||||
key: client_secret
|
||||
grafana.ini:
|
||||
server:
|
||||
domain: metrics.bstein.dev
|
||||
root_url: https://metrics.bstein.dev/
|
||||
dashboards:
|
||||
default_home_dashboard_path: /var/lib/grafana/dashboards/overview/atlas-overview.json
|
||||
auth.anonymous:
|
||||
hide_version: true
|
||||
users:
|
||||
default_theme: dark
|
||||
ingress:
|
||||
enabled: true
|
||||
ingressClassName: traefik
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt
|
||||
hosts:
|
||||
- metrics.bstein.dev
|
||||
path: /
|
||||
tls:
|
||||
- secretName: grafana-metrics-tls
|
||||
hosts:
|
||||
- metrics.bstein.dev
|
||||
datasources:
|
||||
datasources.yaml:
|
||||
apiVersion: 1
|
||||
datasources:
|
||||
- name: VictoriaMetrics
|
||||
type: prometheus
|
||||
access: proxy
|
||||
url: http://victoria-metrics-single-server:8428
|
||||
isDefault: true
|
||||
jsonData:
|
||||
timeInterval: "15s"
|
||||
uid: atlas-vm
|
||||
dashboardProviders:
|
||||
dashboardproviders.yaml:
|
||||
apiVersion: 1
|
||||
providers:
|
||||
- name: overview
|
||||
orgId: 1
|
||||
folder: Overview
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: false
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/overview
|
||||
- name: pods
|
||||
orgId: 1
|
||||
folder: Atlas Internal
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/pods
|
||||
- name: nodes
|
||||
orgId: 1
|
||||
folder: Atlas Internal
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/nodes
|
||||
- name: storage
|
||||
orgId: 1
|
||||
folder: Atlas Internal
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/storage
|
||||
- name: gpu
|
||||
orgId: 1
|
||||
folder: Atlas Internal
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/gpu
|
||||
- name: network
|
||||
orgId: 1
|
||||
folder: Atlas Internal
|
||||
type: file
|
||||
disableDeletion: false
|
||||
editable: true
|
||||
options:
|
||||
path: /var/lib/grafana/dashboards/network
|
||||
dashboardsConfigMaps:
|
||||
overview: grafana-dashboard-overview
|
||||
pods: grafana-dashboard-pods
|
||||
nodes: grafana-dashboard-nodes
|
||||
storage: grafana-dashboard-storage
|
||||
gpu: grafana-dashboard-gpu
|
||||
network: grafana-dashboard-network
|
||||
extraConfigmapMounts:
|
||||
- name: grafana-folders
|
||||
mountPath: /etc/grafana/provisioning/folders
|
||||
configMap: grafana-folders
|
||||
readOnly: true
|
||||
|
||||
---
|
||||
|
||||
apiVersion: helm.toolkit.fluxcd.io/v2
|
||||
kind: HelmRelease
|
||||
metadata:
|
||||
name: alertmanager
|
||||
namespace: monitoring
|
||||
spec:
|
||||
interval: 15m
|
||||
chart:
|
||||
spec:
|
||||
chart: alertmanager
|
||||
version: "~1.9.0"
|
||||
sourceRef:
|
||||
kind: HelmRepository
|
||||
name: prometheus
|
||||
namespace: flux-system
|
||||
values:
|
||||
ingress:
|
||||
enabled: true
|
||||
ingressClassName: traefik
|
||||
annotations:
|
||||
cert-manager.io/cluster-issuer: letsencrypt
|
||||
hosts:
|
||||
- host: alerts.bstein.dev
|
||||
paths:
|
||||
- path: /
|
||||
pathType: Prefix
|
||||
tls:
|
||||
- secretName: alerts-bstein-dev-tls
|
||||
hosts:
|
||||
- alerts.bstein.dev
|
||||
config:
|
||||
global:
|
||||
resolve_timeout: 5m
|
||||
route:
|
||||
receiver: default
|
||||
group_wait: 30s
|
||||
group_interval: 5m
|
||||
repeat_interval: 2h
|
||||
receivers:
|
||||
- name: default
|
||||
|
||||
@ -5,12 +5,4 @@ namespace: monitoring
|
||||
resources:
|
||||
- namespace.yaml
|
||||
- rbac.yaml
|
||||
- grafana-dashboard-overview.yaml
|
||||
- grafana-dashboard-pods.yaml
|
||||
- grafana-dashboard-nodes.yaml
|
||||
- grafana-dashboard-storage.yaml
|
||||
- grafana-dashboard-network.yaml
|
||||
- grafana-dashboard-gpu.yaml
|
||||
- dcgm-exporter.yaml
|
||||
- grafana-folders.yaml
|
||||
- helmrelease.yaml
|
||||
|
||||
@@ -1,48 +0,0 @@
-# services/nextcloud/configmap.yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: nextcloud-config
-  namespace: nextcloud
-data:
-  extra.config.php: |
-    <?php
-    $CONFIG = array (
-      'trusted_domains' =>
-      array (
-        0 => 'cloud.bstein.dev',
-      ),
-      'overwritehost' => 'cloud.bstein.dev',
-      'overwriteprotocol' => 'https',
-      'overwrite.cli.url' => 'https://cloud.bstein.dev',
-      'default_phone_region' => 'US',
-      'mail_smtpmode' => 'smtp',
-      'mail_sendmailmode' => 'smtp',
-      'mail_smtphost' => 'mail.bstein.dev',
-      'mail_smtpport' => '587',
-      'mail_smtpsecure' => 'tls',
-      'mail_smtpauth' => true,
-      'mail_smtpauthtype' => 'LOGIN',
-      'mail_domain' => 'bstein.dev',
-      'mail_from_address' => 'no-reply',
-      'oidc_login_provider_url' => 'https://sso.bstein.dev/realms/atlas',
-      'oidc_login_client_id' => getenv('OIDC_CLIENT_ID'),
-      'oidc_login_client_secret' => getenv('OIDC_CLIENT_SECRET'),
-      'oidc_login_auto_redirect' => false,
-      'oidc_login_end_session_redirect' => true,
-      'oidc_login_button_text' => 'Login with Keycloak',
-      'oidc_login_hide_password_form' => false,
-      'oidc_login_attributes' =>
-      array (
-        'id' => 'preferred_username',
-        'mail' => 'email',
-        'name' => 'name',
-      ),
-      'oidc_login_scope' => 'openid profile email',
-      'oidc_login_unique_id' => 'preferred_username',
-      'oidc_login_use_pkce' => true,
-      'oidc_login_disable_registration' => false,
-      'oidc_login_create_groups' => false,
-      # External storage for user data should be configured to Asteria via the External Storage app (admin UI),
-      # keeping the astreae PVC for app internals only.
-    );
@@ -1,32 +0,0 @@
-# services/nextcloud/cronjob.yaml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: nextcloud-cron
-  namespace: nextcloud
-spec:
-  schedule: "*/5 * * * *"
-  concurrencyPolicy: Forbid
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          securityContext:
-            runAsUser: 33
-            runAsGroup: 33
-            fsGroup: 33
-          restartPolicy: OnFailure
-          containers:
-            - name: nextcloud-cron
-              image: nextcloud:29-apache
-              imagePullPolicy: IfNotPresent
-              command: ["/bin/sh", "-c"]
-              args:
-                - "cd /var/www/html && php -f cron.php"
-              volumeMounts:
-                - name: nextcloud-data
-                  mountPath: /var/www/html
-          volumes:
-            - name: nextcloud-data
-              persistentVolumeClaim:
-                claimName: nextcloud-data
@@ -1,143 +0,0 @@
-# services/nextcloud/deployment.yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: nextcloud
-  namespace: nextcloud
-  labels:
-    app: nextcloud
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app: nextcloud
-  template:
-    metadata:
-      labels:
-        app: nextcloud
-    spec:
-      nodeSelector:
-        hardware: rpi5
-      securityContext:
-        fsGroup: 33
-        runAsUser: 33
-        runAsGroup: 33
-      initContainers:
-        - name: fix-perms
-          image: alpine:3.20
-          command: ["/bin/sh", "-c"]
-          args:
-            - |
-              chown -R 33:33 /var/www/html/config || true
-              chown -R 33:33 /var/www/html/data || true
-          securityContext:
-            runAsUser: 0
-            runAsGroup: 0
-          volumeMounts:
-            - name: nextcloud-data
-              mountPath: /var/www/html
-            - name: nextcloud-config
-              mountPath: /var/www/html/config/extra.config.php
-              subPath: extra.config.php
-      containers:
-        - name: nextcloud
-          image: nextcloud:29-apache
-          imagePullPolicy: IfNotPresent
-          env:
-            # DB (external secret required: nextcloud-db with keys username,password,database)
-            - name: POSTGRES_HOST
-              value: postgres-service.postgres.svc.cluster.local
-            - name: POSTGRES_DB
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-db
-                  key: database
-            - name: POSTGRES_USER
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-db
-                  key: db-username
-            - name: POSTGRES_PASSWORD
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-db
-                  key: db-password
-            # Admin bootstrap (external secret: nextcloud-admin with keys admin-user, admin-password)
-            - name: NEXTCLOUD_ADMIN_USER
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-admin
-                  key: admin-user
-            - name: NEXTCLOUD_ADMIN_PASSWORD
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-admin
-                  key: admin-password
-            - name: NEXTCLOUD_TRUSTED_DOMAINS
-              value: cloud.bstein.dev
-            - name: OVERWRITEHOST
-              value: cloud.bstein.dev
-            - name: OVERWRITEPROTOCOL
-              value: https
-            - name: OVERWRITECLIURL
-              value: https://cloud.bstein.dev
-            # SMTP (external secret: nextcloud-smtp with keys username, password)
-            - name: SMTP_HOST
-              value: mail.bstein.dev
-            - name: SMTP_PORT
-              value: "587"
-            - name: SMTP_SECURE
-              value: tls
-            - name: SMTP_NAME
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-smtp
-                  key: smtp-username
-            - name: SMTP_PASSWORD
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-smtp
-                  key: smtp-password
-            - name: MAIL_FROM_ADDRESS
-              value: no-reply
-            - name: MAIL_DOMAIN
-              value: bstein.dev
-            # OIDC (external secret: nextcloud-oidc with keys client-id, client-secret)
-            - name: OIDC_CLIENT_ID
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-oidc
-                  key: client-id
-            - name: OIDC_CLIENT_SECRET
-              valueFrom:
-                secretKeyRef:
-                  name: nextcloud-oidc
-                  key: client-secret
-            - name: NEXTCLOUD_UPDATE
-              value: "1"
-            - name: APP_INSTALL
-              value: "mail,oidc_login,external"
-          ports:
-            - containerPort: 80
-              name: http
-          volumeMounts:
-            - name: nextcloud-data
-              mountPath: /var/www/html
-            - name: nextcloud-config
-              mountPath: /var/www/html/config/extra.config.php
-              subPath: extra.config.php
-          resources:
-            requests:
-              cpu: 250m
-              memory: 1Gi
-            limits:
-              cpu: 1
-              memory: 3Gi
-      volumes:
-        - name: nextcloud-data
-          persistentVolumeClaim:
-            claimName: nextcloud-data
-        - name: nextcloud-config
-          configMap:
-            name: nextcloud-config
-            defaultMode: 0444
@@ -1,25 +0,0 @@
-# services/nextcloud/ingress.yaml
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: nextcloud
-  namespace: nextcloud
-  annotations:
-    cert-manager.io/cluster-issuer: letsencrypt-prod
-    traefik.ingress.kubernetes.io/router.entrypoints: websecure
-spec:
-  tls:
-    - hosts:
-        - cloud.bstein.dev
-      secretName: nextcloud-tls
-  rules:
-    - host: cloud.bstein.dev
-      http:
-        paths:
-          - path: /
-            pathType: Prefix
-            backend:
-              service:
-                name: nextcloud
-                port:
-                  number: 80
@@ -1,25 +0,0 @@
-# services/nextcloud/kustomization.yaml
-apiVersion: kustomize.config.k8s.io/v1beta1
-kind: Kustomization
-namespace: nextcloud
-resources:
-  - namespace.yaml
-  - configmap.yaml
-  - pvc.yaml
-  - deployment.yaml
-  - service.yaml
-  - ingress.yaml
-  - cronjob.yaml
-  - mail-sync-cronjob.yaml
-  - maintenance-cronjob.yaml
-configMapGenerator:
-  - name: nextcloud-maintenance-script
-    files:
-      - maintenance.sh=../../scripts/nextcloud-maintenance.sh
-    options:
-      disableNameSuffixHash: true
-  - name: nextcloud-mail-sync-script
-    files:
-      - sync.sh=../../scripts/nextcloud-mail-sync.sh
-    options:
-      disableNameSuffixHash: true
@@ -1,58 +0,0 @@
-# services/nextcloud/mail-sync-cronjob.yaml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: nextcloud-mail-sync
-  namespace: nextcloud
-spec:
-  schedule: "0 5 * * *"
-  concurrencyPolicy: Forbid
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          restartPolicy: OnFailure
-          securityContext:
-            runAsUser: 0
-            runAsGroup: 0
-          containers:
-            - name: mail-sync
-              image: nextcloud:29-apache
-              imagePullPolicy: IfNotPresent
-              command: ["/bin/bash", "/sync/sync.sh"]
-              env:
-                - name: KC_BASE
-                  value: https://sso.bstein.dev
-                - name: KC_REALM
-                  value: atlas
-                - name: KC_ADMIN_USER
-                  valueFrom:
-                    secretKeyRef:
-                      name: nextcloud-keycloak-admin
-                      key: username
-                - name: KC_ADMIN_PASS
-                  valueFrom:
-                    secretKeyRef:
-                      name: nextcloud-keycloak-admin
-                      key: password
-              volumeMounts:
-                - name: nextcloud-data
-                  mountPath: /var/www/html
-                - name: sync-script
-                  mountPath: /sync/sync.sh
-                  subPath: sync.sh
-              resources:
-                requests:
-                  cpu: 100m
-                  memory: 256Mi
-                limits:
-                  cpu: 500m
-                  memory: 512Mi
-          volumes:
-            - name: nextcloud-data
-              persistentVolumeClaim:
-                claimName: nextcloud-data
-            - name: sync-script
-              configMap:
-                name: nextcloud-mail-sync-script
-                defaultMode: 0755
@@ -1,56 +0,0 @@
-# services/nextcloud/maintenance-cronjob.yaml
-apiVersion: batch/v1
-kind: CronJob
-metadata:
-  name: nextcloud-maintenance
-  namespace: nextcloud
-spec:
-  schedule: "30 4 * * *"
-  concurrencyPolicy: Forbid
-  jobTemplate:
-    spec:
-      template:
-        spec:
-          restartPolicy: OnFailure
-          securityContext:
-            runAsUser: 0
-            runAsGroup: 0
-          containers:
-            - name: maintenance
-              image: nextcloud:29-apache
-              imagePullPolicy: IfNotPresent
-              command: ["/bin/bash", "/maintenance/maintenance.sh"]
-              env:
-                - name: NC_URL
-                  value: https://cloud.bstein.dev
-                - name: ADMIN_USER
-                  valueFrom:
-                    secretKeyRef:
-                      name: nextcloud-admin
-                      key: admin-user
-                - name: ADMIN_PASS
-                  valueFrom:
-                    secretKeyRef:
-                      name: nextcloud-admin
-                      key: admin-password
-              volumeMounts:
-                - name: nextcloud-data
-                  mountPath: /var/www/html
-                - name: maintenance-script
-                  mountPath: /maintenance/maintenance.sh
-                  subPath: maintenance.sh
-              resources:
-                requests:
-                  cpu: 100m
-                  memory: 256Mi
-                limits:
-                  cpu: 500m
-                  memory: 512Mi
-          volumes:
-            - name: nextcloud-data
-              persistentVolumeClaim:
-                claimName: nextcloud-data
-            - name: maintenance-script
-              configMap:
-                name: nextcloud-maintenance-script
-                defaultMode: 0755
@@ -1,5 +0,0 @@
-# services/nextcloud/namespace.yaml
-apiVersion: v1
-kind: Namespace
-metadata:
-  name: nextcloud
@@ -1,13 +0,0 @@
-# services/nextcloud/pvc.yaml
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: nextcloud-data
-  namespace: nextcloud
-spec:
-  accessModes:
-    - ReadWriteMany
-  resources:
-    requests:
-      storage: 200Gi
-  storageClassName: astreae
@@ -1,13 +0,0 @@
-# services/nextcloud/service.yaml
-apiVersion: v1
-kind: Service
-metadata:
-  name: nextcloud
-  namespace: nextcloud
-spec:
-  selector:
-    app: nextcloud
-  ports:
-    - name: http
-      port: 80
-      targetPort: http
@@ -1,82 +0,0 @@
-# services/oauth2-proxy/deployment.yaml
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: oauth2-proxy
-  namespace: sso
-  labels:
-    app: oauth2-proxy
-spec:
-  replicas: 2
-  selector:
-    matchLabels:
-      app: oauth2-proxy
-  template:
-    metadata:
-      labels:
-        app: oauth2-proxy
-    spec:
-      nodeSelector:
-        node-role.kubernetes.io/worker: "true"
-      affinity:
-        nodeAffinity:
-          preferredDuringSchedulingIgnoredDuringExecution:
-            - weight: 90
-              preference:
-                matchExpressions:
-                  - key: hardware
-                    operator: In
-                    values: ["rpi5","rpi4"]
-      containers:
-        - name: oauth2-proxy
-          image: quay.io/oauth2-proxy/oauth2-proxy:v7.6.0
-          imagePullPolicy: IfNotPresent
-          args:
-            - --provider=oidc
-            - --redirect-url=https://auth.bstein.dev/oauth2/callback
-            - --oidc-issuer-url=https://sso.bstein.dev/realms/atlas
-            - --scope=openid profile email groups
-            - --email-domain=*
-            - --set-xauthrequest=true
-            - --pass-access-token=true
-            - --set-authorization-header=true
-            - --cookie-secure=true
-            - --cookie-samesite=lax
-            - --cookie-refresh=20m
-            - --cookie-expire=168h
-            - --upstream=static://200
-            - --http-address=0.0.0.0:4180
-            - --skip-provider-button=true
-            - --skip-jwt-bearer-tokens=true
-            - --oidc-groups-claim=groups
-          env:
-            - name: OAUTH2_PROXY_CLIENT_ID
-              valueFrom:
-                secretKeyRef:
-                  name: oauth2-proxy-oidc
-                  key: client_id
-            - name: OAUTH2_PROXY_CLIENT_SECRET
-              valueFrom:
-                secretKeyRef:
-                  name: oauth2-proxy-oidc
-                  key: client_secret
-            - name: OAUTH2_PROXY_COOKIE_SECRET
-              valueFrom:
-                secretKeyRef:
-                  name: oauth2-proxy-oidc
-                  key: cookie_secret
-          ports:
-            - containerPort: 4180
-              name: http
-          readinessProbe:
-            httpGet:
-              path: /ping
-              port: 4180
-            initialDelaySeconds: 5
-            periodSeconds: 10
-          livenessProbe:
-            httpGet:
-              path: /ping
-              port: 4180
-            initialDelaySeconds: 20
-            periodSeconds: 20
@@ -1,25 +0,0 @@
-# services/oauth2-proxy/ingress.yaml
-apiVersion: networking.k8s.io/v1
-kind: Ingress
-metadata:
-  name: oauth2-proxy
-  namespace: sso
-  annotations:
-    cert-manager.io/cluster-issuer: letsencrypt
-    traefik.ingress.kubernetes.io/router.middlewares: sso-oauth2-proxy-errors@kubernetescrd
-spec:
-  ingressClassName: traefik
-  rules:
-    - host: auth.bstein.dev
-      http:
-        paths:
-          - path: /
-            pathType: Prefix
-            backend:
-              service:
-                name: oauth2-proxy
-                port:
-                  number: 80
-  tls:
-    - hosts: [auth.bstein.dev]
-      secretName: auth-tls
@@ -1,10 +0,0 @@
-# services/oauth2-proxy/kustomization.yaml
-apiVersion: kustomize.config.k8s.io/v1beta1
-kind: Kustomization
-namespace: sso
-resources:
-  - deployment.yaml
-  - service.yaml
-  - ingress.yaml
-  - middleware.yaml
-  - middleware-errors.yaml
@@ -1,15 +0,0 @@
-# services/oauth2-proxy/middleware-errors.yaml
-apiVersion: traefik.io/v1alpha1
-kind: Middleware
-metadata:
-  name: oauth2-proxy-errors
-  namespace: sso
-spec:
-  errors:
-    status:
-      - "401"
-      - "403"
-    service:
-      name: oauth2-proxy
-      port: 80
-    query: /oauth2/start?rd={url}
@@ -1,15 +0,0 @@
-# services/oauth2-proxy/middleware.yaml
-apiVersion: traefik.io/v1alpha1
-kind: Middleware
-metadata:
-  name: oauth2-proxy-forward-auth
-  namespace: sso
-spec:
-  forwardAuth:
-    address: http://oauth2-proxy.sso.svc.cluster.local:4180/oauth2/auth
-    trustForwardHeader: true
-    authResponseHeaders:
-      - Authorization
-      - X-Auth-Request-Email
-      - X-Auth-Request-User
-      - X-Auth-Request-Groups
@@ -1,15 +0,0 @@
-# services/oauth2-proxy/service.yaml
-apiVersion: v1
-kind: Service
-metadata:
-  name: oauth2-proxy
-  namespace: sso
-  labels:
-    app: oauth2-proxy
-spec:
-  selector:
-    app: oauth2-proxy
-  ports:
-    - name: http
-      port: 80
-      targetPort: 4180
@@ -8,7 +8,7 @@ metadata:
     kubernetes.io/ingress.class: traefik
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
     traefik.ingress.kubernetes.io/router.tls: "true"
-    cert-manager.io/cluster-issuer: letsencrypt
+    cert-manager.io/cluster-issuer: letsencrypt-prod
 spec:
   tls:
     - hosts: [ "pegasus.bstein.dev" ]
@@ -8,7 +8,7 @@ spec:
   secretName: vault-server-tls
   issuerRef:
     kind: ClusterIssuer
-    name: letsencrypt
+    name: letsencrypt-prod
   commonName: secret.bstein.dev
   dnsNames:
     - secret.bstein.dev
@@ -7,6 +7,7 @@ metadata:
   annotations:
     kubernetes.io/ingress.class: traefik
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
+    traefik.ingress.kubernetes.io/router.middlewares: vault-vault-basicauth@kubernetescrd
     traefik.ingress.kubernetes.io/service.serversscheme: https
     traefik.ingress.kubernetes.io/service.serversTransport: vault-vault-to-https@kubernetescrd
 spec:
@@ -7,4 +7,5 @@ resources:
   - helmrelease.yaml
   - certificate.yaml
   - ingress.yaml
+  - middleware.yaml
   - serverstransport.yaml
9
services/vault/middleware.yaml
Normal file
@@ -0,0 +1,9 @@
+# services/vault/middleware.yaml
+apiVersion: traefik.io/v1alpha1
+kind: Middleware
+metadata:
+  name: vault-basicauth
+  namespace: vault
+spec:
+  basicAuth:
+    secret: vault-basic-auth
@@ -5,7 +5,7 @@ metadata:
   name: zot
   namespace: zot
   annotations:
-    cert-manager.io/cluster-issuer: letsencrypt
+    cert-manager.io/cluster-issuer: letsencrypt-prod
     traefik.ingress.kubernetes.io/router.entrypoints: websecure
     traefik.ingress.kubernetes.io/router.tls: "true"
     traefik.ingress.kubernetes.io/router.middlewares: zot-zot-resp-headers@kubernetescrd