feature/mailu #5
.gitignore (6 lines changed, vendored)
@@ -1 +1,5 @@
-AGENTS.md
+# Ignore markdown by default, but keep top-level docs
+*.md
+!README.md
+!AGENTS.md
+!**/NOTES.md
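The new `.gitignore` rules rely on two gitignore behaviours. The last matching pattern wins, and a leading `!` re-includes a path that an earlier pattern excluded. The sketch below models only the subset of gitignore syntax used in these patterns: basename globs with no slash, plus a `**/` prefix. It is not a full gitignore implementation, and `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# The rules added to .gitignore in this change, in file order (comment line omitted).
RULES = ["*.md", "!README.md", "!AGENTS.md", "!**/NOTES.md"]


def is_ignored(path: str, rules=RULES) -> bool:
    """Approximate git's verdict for a file path, using only the syntax above.

    A pattern with no slash, or one prefixed with "**/", is matched against the
    file's basename at any depth. The last matching rule wins; "!" negates.
    """
    name = PurePosixPath(path).name
    ignored = False
    for rule in rules:
        negate = rule.startswith("!")
        pattern = rule[1:] if negate else rule
        if pattern.startswith("**/"):
            pattern = pattern[3:]
        if "/" in pattern:
            raise ValueError(f"pattern not supported by this sketch: {rule}")
        if fnmatchcase(name, pattern):
            ignored = not negate
    return ignored


for p in ["docs/runbook.md", "README.md", "services/mailu/NOTES.md", "services/gitea/README.md"]:
    print(p, "ignored" if is_ignored(p) else "tracked")
```

`!README.md` contains no slash, so in this model (and in git) it matches README files at any depth, not only the top-level one.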
AGENTS.md (81 lines changed, Normal file)
@@ -0,0 +1,81 @@
+
+
+Repository Guidelines
+
+> Local-only note: apply changes through Flux-tracked manifests, not by manual kubectl edits in-cluster—manual tweaks will be reverted by Flux.
+
+## Project Structure & Module Organization
+- `infrastructure/`: cluster-scoped building blocks (core, flux-system, traefik, longhorn). Add new platform features by mirroring this layout.
+- `services/`: workload manifests per app (`services/gitea/`, etc.) with `kustomization.yaml` plus one file per kind; keep diffs small and focused.
+- `dockerfiles/` hosts bespoke images, while `scripts/` stores operational Fish/Bash helpers—extend these directories instead of relying on ad-hoc commands.
+
+## Build, Test, and Development Commands
+- `kustomize build services/<app>` (or `kubectl kustomize ...`) renders the manifests Flux will apply; Flux-side settings such as `targetNamespace` are applied on top.
+- `kubectl apply --server-side --dry-run=server -k services/<app>` validates against the API server's schema without persisting any changes.
+- `flux reconcile kustomization <name> --namespace flux-system --with-source` pulls the latest Git state after merges or hotfixes.
+- `fish scripts/flux_hammer.fish --help` explains the recovery tool; read it before running against production workloads.
+
+## Coding Style & Naming Conventions
+- YAML uses two-space indents; retain the leading path comment (e.g. `# services/gitea/deployment.yaml`) to speed code review.
+- Keep resource names lowercase kebab-case, align labels/selectors, and mirror namespaces with directory names.
+- List resources in `kustomization.yaml` from namespace/config, through storage, then workloads and networking for predictable diffs.
+- Scripts start with `#!/usr/bin/env fish` or bash, stay executable, and follow snake_case names such as `flux_hammer.fish`.
+
+## Testing Guidelines
+- Run `kustomize build` and the dry-run apply for every service you touch; capture failures before opening a PR.
+- `flux diff kustomization <name> --path services/<app>` previews reconciliations—link notable output when behavior shifts.
+- Docker edits: `docker build -f dockerfiles/Dockerfile.monerod .` (swap the file you changed) to verify image builds.
+
+## Commit & Pull Request Guidelines
+- Keep commit subjects short, present-tense, and optionally scoped (`gpu(titan-24): add RuntimeClass`); squash fixups before review.
+- Describe linked issues, affected services, and required operator steps (e.g. `flux reconcile kustomization services-gitea`) in the PR body.
+- Focus each PR on one kustomization or service and update `infrastructure/flux-system` when Flux must track new folders.
+- Record the validation you ran (dry-runs, diffs, builds) and add screenshots only when ingress or UI behavior changes.
+
+## Security & Configuration Tips
+- Never commit credentials; use Vault workflows (`services/vault/`) or SOPS-encrypted manifests wired through `infrastructure/flux-system`.
+- Node selectors and tolerations gate workloads to hardware like `hardware: rpi4`; confirm labels before scaling or renaming nodes.
+- Pin external images by digest or rely on Flux image automation to follow approved tags and avoid drift.
+
+## Dashboard roadmap / context (2025-12-02)
+- Atlas dashboards are generated via `scripts/dashboards_render_atlas.py --build`, which writes JSON under `services/monitoring/dashboards/` and ConfigMaps under `services/monitoring/`. Keep the Grafana manifests in sync by regenerating after edits.
+- Atlas Overview panels are paired with internal dashboards (pods, nodes, storage, network, GPU). A new `atlas-gpu` internal dashboard holds the detailed GPU metrics that feed the overview share pie.
+- Old Grafana folders (`Atlas Storage`, `Atlas SRE`, `Atlas Public`, `Atlas Nodes`) should be removed in Grafana UI when convenient; only `Atlas Overview` and `Atlas Internal` should remain provisioned.
+- Future work: add a separate generator (e.g., `dashboards_render_oceanus.py`) for SUI/oceanus validation dashboards, mirroring the atlas pattern of internal dashboards feeding a public overview.
+
+## Monitoring state (2025-12-03)
+- dcgm-exporter DaemonSet pulls `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04` with nvidia runtime/imagePullSecret; titan-24 exports metrics, titan-22 remains NotReady.
+- Atlas Overview is the Grafana home (1h range, 1m refresh), Overview folder UID `overview`, internal folder `atlas-internal` (oceanus-internal stub).
+- Panels standardized via generator; hottest row compressed, worker/control rows taller, root disk row taller and top12 bar gauge with labels. GPU share pie uses 1h avg_over_time to persist idle activity.
+- Internal dashboards are provisioned without Viewer role; if anonymous still sees them, restart Grafana and tighten auth if needed.
+- GPU share panel updated (feature/sso) to use `max_over_time(…[$__range])`, so longer ranges (e.g., 12h) keep recent activity visible. Flux tracking `feature/sso`.
+
+## Upcoming priorities (SSO/storage/mail)
+- Establish SSO (Keycloak or similar) and federate Grafana, Gitea, Zot, Nextcloud, Pegasus/Jellyfin; keep Vaultwarden separate until safe.
+- Add Nextcloud (limit to rpi5 workers) with office suite; integrate with SSO; plan storage class and ingress.
+- Plan mail: mostly self-hosted, relay through trusted provider for outbound; integrate with services (Nextcloud, Vaultwarden, etc.) for notifications and account flows.
+
+## SSO plan sketch (2025-12-03)
+- IdP: use Keycloak (preferred) in a new `sso` namespace, Bitnami or codecentric chart with Postgres backing store (single PVC), ingress `sso.bstein.dev`, admin user bound to brad@bstein.dev; stick with local DB initially (no external IdP).
+- Auth flow goals: Grafana (OIDC), Gitea (OAuth2/Keycloak), Zot (via Traefik forward-auth/oauth2-proxy), Jellyfin/Pegasus via Jellyfin OAuth/OpenID plugin (map existing usernames; run migration to pre-create users in Keycloak with same usernames/emails and temporary passwords), Pegasus keeps using Jellyfin tokens.
+- Steps to implement:
+  1) Add service folder `services/keycloak/` (namespace, PVC, HelmRelease, ingress, secret for admin creds). Verify with kustomize + Flux reconcile.
+  2) Seed realm `atlas` with users (import CSV/realm). Create client for Grafana (public/implicit), Gitea (confidential), and a “jellyfin” client for the OAuth plugin; set email for brad@bstein.dev as admin.
+  3) Reconfigure Grafana to OIDC (disable anonymous to internal folders, leave Overview public via folder permissions). Reconfigure Gitea to OIDC (app.ini).
+  4) Add Traefik forward-auth (oauth2-proxy) in front of Zot and any other services needing headers-based auth.
+  5) Deploy Jellyfin OpenID plugin; map Keycloak users to existing Jellyfin usernames; communicate password reset path.
+- Migration caution: do not delete existing local creds until SSO validated; keep Pegasus working via Jellyfin tokens during transition.
+
+## Postgres centralization (2025-12-03)
+- Prefer a shared in-cluster Postgres deployment with per-service databases to reduce resource sprawl on Pi nodes. Use it for services that can easily point at an external DB.
+- Candidates to migrate to shared Postgres: Keycloak (realm DB), Gitea (git DB), Nextcloud (app DB), possibly Grafana (if persistence needed beyond current provisioner), Jitsi prosody/JVB state (if external DB supported). Keep tightly-coupled or lightweight embedded DBs as-is when migration is painful or not supported.
+
+## SSO integration snapshot (2025-12-08)
+- Current blockers: Zot still prompts for basic auth/double-login; Vault still wants the token UI after Keycloak (previously 502/404 when vault-0 sealed). Forward-auth middleware on Zot Ingress likely still causing the 401/Found hop; Vault OIDC mount not completing UI flow unless unsealed and preferred login is set.
+- Flux-only changes required: remove zot forward-auth middleware from Ingress (let oauth2-proxy handle redirect), ensure Vault OIDC mount is preferred UI login and bound to admin group; keep all edits in repo so Flux enforces them.
+- Secrets present (per user): `zot-oidc-client` (client_secret only), `oauth2-proxy-zot-oidc`, `oauth2-proxy-vault-oidc`, `vault-oidc-admin-token`. Zot needs its regcred in the zot namespace if image pulls fail.
+- Cluster validation blocked here: `kubectl get nodes` fails (403/permission) and DNS to `*.bstein.dev` fails in this session, so no live curl verification could be run. Re-test on a host with cluster/DNS access after Flux applies fixes.
+
+## Docs hygiene
+- Do not add per-service `README.md` files; use `NOTES.md` if documentation is needed inside service folders. Keep only the top-level repo README.
+- Keep comments succinct and in a human voice—no AI-sounding notes. Use `NOTES.md` for scratch notes instead of sprinkling reminders into code or extra READMEs.
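Step 3 of the SSO plan in AGENTS.md moves Grafana to OIDC. Most of that work is pointing Grafana's `[auth.generic_oauth]` settings at the Keycloak realm endpoints. The sketch below derives those settings as `GF_*` environment variables, using the `atlas` realm and `sso.bstein.dev` host named in the plan.

It makes a few assumptions:
- Keycloak serves realms under `/realms/<realm>`. That is the default for the Quarkus-based releases; older WildFly-based releases used `/auth/realms/<realm>`.
- The client ID `grafana` and the helper `grafana_oidc_env` are hypothetical.
- The client secret is left out on purpose, so it can come from a Kubernetes Secret instead of the repo.

```python
def grafana_oidc_env(host: str, realm: str, client_id: str) -> dict[str, str]:
    """Build Grafana generic OAuth settings for a Keycloak realm (sketch).

    Grafana maps config keys to environment variables as GF_<SECTION>_<KEY>,
    upper-casing the section and replacing its dots with underscores.
    """
    # Keycloak exposes its OIDC endpoints under the realm's protocol path.
    oidc = f"https://{host}/realms/{realm}/protocol/openid-connect"
    return {
        "GF_AUTH_GENERIC_OAUTH_ENABLED": "true",
        "GF_AUTH_GENERIC_OAUTH_NAME": "Keycloak",
        "GF_AUTH_GENERIC_OAUTH_CLIENT_ID": client_id,
        "GF_AUTH_GENERIC_OAUTH_SCOPES": "openid profile email",
        "GF_AUTH_GENERIC_OAUTH_AUTH_URL": f"{oidc}/auth",
        "GF_AUTH_GENERIC_OAUTH_TOKEN_URL": f"{oidc}/token",
        "GF_AUTH_GENERIC_OAUTH_API_URL": f"{oidc}/userinfo",
    }


for key, value in grafana_oidc_env("sso.bstein.dev", "atlas", "grafana").items():
    print(f"{key}={value}")
```

Generating the values from one realm URL keeps the three endpoints consistent if the hostname or realm changes later.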
NOTES.md (3 lines changed, Normal file)
@@ -0,0 +1,3 @@
+# Rotation reminders (temporary secrets set by automation)
+
+- Weave GitOps UI (`cd.bstein.dev`) admin: `admin` / `G1tOps!2025` — rotate immediately after first login.
README.md (3 lines changed, Normal file)
@@ -0,0 +1,3 @@
+# titan-iac
+
+Flux-managed Kubernetes cluster for bstein.dev services.
@@ -15,3 +15,4 @@ resources:
 - sui-metrics/kustomization.yaml
 - keycloak/kustomization.yaml
 - oauth2-proxy/kustomization.yaml
+- mailu/kustomization.yaml
@@ -0,0 +1,18 @@
+# clusters/atlas/flux-system/applications/mailu/kustomization.yaml
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  name: mailu
+  namespace: flux-system
+spec:
+  interval: 10m
+  sourceRef:
+    kind: GitRepository
+    name: flux-system
+    namespace: flux-system
+  path: ./services/mailu
+  targetNamespace: mailu-mailserver
+  prune: true
+  wait: true
+  dependsOn:
+    - name: helm
@@ -8,7 +8,7 @@ metadata:
 spec:
   interval: 1m0s
   ref:
-    branch: feature/sso
+    branch: feature/mailu
   secretRef:
     name: flux-system-gitea
   url: ssh://git@scm.bstein.dev:2242/bstein/titan-iac.git
@@ -0,0 +1,20 @@
+# clusters/atlas/flux-system/platform/gitops-ui/kustomization.yaml
+apiVersion: kustomize.toolkit.fluxcd.io/v1
+kind: Kustomization
+metadata:
+  name: gitops-ui
+  namespace: flux-system
+spec:
+  interval: 10m
+  timeout: 10m
+  path: ./services/gitops-ui
+  prune: true
+  sourceRef:
+    kind: GitRepository
+    name: flux-system
+    namespace: flux-system
+  targetNamespace: flux-system
+  dependsOn:
+    - name: helm
+    - name: traefik
+  wait: true
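Both Flux Kustomizations added in this change (`mailu` and `gitops-ui`) declare `dependsOn`. kustomize-controller does not apply a Kustomization until every entry in that list is Ready. The resulting rollout order can be sketched as a topological sort with the standard-library `graphlib` module (Python 3.9+).

The graph contains only the edges visible in this diff. `helm` and `traefik` are assumed to have no dependencies of their own, and `reconcile_waves` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# dependsOn edges from the Kustomizations added in this change.
# "helm" and "traefik" are assumed to have no dependencies here.
DEPENDS_ON = {
    "mailu": ["helm"],
    "gitops-ui": ["helm", "traefik"],
    "helm": [],
    "traefik": [],
}


def reconcile_waves(graph):
    """Group Kustomizations into waves whose dependencies are already Ready."""
    sorter = TopologicalSorter(graph)  # maps node -> its predecessors
    sorter.prepare()  # raises graphlib.CycleError if dependsOn forms a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(reconcile_waves(DEPENDS_ON))  # [['helm', 'traefik'], ['gitops-ui', 'mailu']]
```

Running `prepare()` up front also catches a circular `dependsOn`. In the cluster, such a cycle would leave the affected Kustomizations waiting on each other indefinitely.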
@@ -5,5 +5,6 @@ resources:
 - core/kustomization.yaml
 - helm/kustomization.yaml
 - traefik/kustomization.yaml
+- gitops-ui/kustomization.yaml
 - monitoring/kustomization.yaml
 - longhorn-ui/kustomization.yaml
@@ -1,5 +0,0 @@
-# Oceanus Cluster Scaffold
-
-This directory prepares the Flux and Kustomize layout for a future Oceanus-managed cluster.
-Populate `flux-system/` with `gotk-components.yaml` and related manifests after running `flux bootstrap`.
-Define node-specific resources under `infrastructure/modules/profiles/oceanus-validator/` and reference workloads in `applications/` as they come online.
@@ -2,15 +2,14 @@
 
 | Hostname | Role / Function | Managed By | Notes |
 |------------|--------------------------------|---------------------|-------|
+| titan-db | HA control plane database | Ansible | PostgreSQL / etcd backing services |
 | titan-0a | Kubernetes control-plane | Flux (atlas cluster)| HA leader, tainted for control only |
 | titan-0b | Kubernetes control-plane | Flux (atlas cluster)| Standby control node |
 | titan-0c | Kubernetes control-plane | Flux (atlas cluster)| Standby control node |
 | titan-04-19| Raspberry Pi workers | Flux (atlas cluster)| Workload nodes, labelled per hardware |
+| titan-20&21| NVIDIA Jetson workers | Flux (atlas cluster)| Workload nodes, labelled per hardware |
 | titan-22 | GPU mini-PC (Jellyfin) | Flux + Ansible | NVIDIA runtime managed via `modules/profiles/atlas-ha` |
+| titan-23 | Dedicated SUI validator Oceanus| Manual + Ansible | Baremetal validator workloads, exposes metrics to atlas |
 | titan-24 | Tethys hybrid node | Flux + Ansible | Runs SUI metrics via K8s, validator via Ansible |
-| titan-db | HA control plane database | Ansible | PostgreSQL / etcd backing services |
-| titan-jh | Jumphost & bastion | Ansible | Entry point / future KVM services |
-| oceanus | Dedicated SUI validator host | Ansible / Flux prep | Baremetal validator workloads, exposes metrics to atlas; Kustomize scaffold under `clusters/oceanus/` |
+| titan-jh | Jumphost & bastion & lesavka | Ansible | Entry point / future KVM services / custom kvm - lesavaka |
 | styx | Air-gapped workstation | Manual / Scripts | Remains isolated, scripts tracked in `hosts/styx` |
-
-Use the `clusters/` directory for cluster-scoped state and the `hosts/` directory for baremetal orchestration.
@@ -5,3 +5,4 @@ resources:
 - ../modules/base
 - ../modules/profiles/atlas-ha
 - ../sources/cert-manager/letsencrypt.yaml
+- ../sources/cert-manager/letsencrypt-prod.yaml
infrastructure/sources/cert-manager/letsencrypt-prod.yaml (14 lines changed, Normal file)
@@ -0,0 +1,14 @@
+apiVersion: cert-manager.io/v1
+kind: ClusterIssuer
+metadata:
+  name: letsencrypt-prod
+spec:
+  acme:
+    email: brad.stein@gmail.com
+    server: https://acme-v02.api.letsencrypt.org/directory
+    privateKeySecretRef:
+      name: letsencrypt-prod-account-key
+    solvers:
+      - http01:
+          ingress:
+            class: traefik
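With the `http01` solver, cert-manager answers each ACME challenge by serving a key authorization at `http://<domain>/.well-known/acme-challenge/<token>`, routed through a temporary solver Ingress of class `traefik`. RFC 8555 §8.1 defines that value as three parts joined together:
- the challenge token,
- a dot,
- the base64url RFC 7638 thumbprint of the ACME account key (here, the key kept in `letsencrypt-prod-account-key`).

The sketch below computes it for an EC JWK. The token and key coordinates are made-up placeholders, not real credentials.

```python
import base64
import hashlib
import json


def b64url(data: bytes) -> str:
    """Base64url without padding, as used throughout ACME/JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """RFC 7638 thumbprint: SHA-256 over the required members, sorted, no whitespace."""
    required = {"EC": ("crv", "kty", "x", "y"), "RSA": ("e", "kty", "n")}[jwk["kty"]]
    canonical = json.dumps({k: jwk[k] for k in required}, separators=(",", ":"), sort_keys=True)
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, account_jwk: dict) -> str:
    """RFC 8555 section 8.1: token "." base64url(thumbprint(account key))."""
    return f"{token}.{jwk_thumbprint(account_jwk)}"


# Placeholder values for illustration only (not a real key or token).
example_jwk = {"kty": "EC", "crv": "P-256", "x": "placeholder-x", "y": "placeholder-y"}
token = "example-token"
print(f"/.well-known/acme-challenge/{token} ->", key_authorization(token, example_jwk))
```

Because the thumbprint covers only the required members, extra JWK fields such as `kid` do not change it.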
infrastructure/sources/helm/kustomization.yaml (10 lines changed, Normal file)
@@ -0,0 +1,10 @@
+# infrastructure/sources/helm/kustomization.yaml
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+- grafana.yaml
+- hashicorp.yaml
+- jetstack.yaml
+- mailu.yaml
+- prometheus.yaml
+- victoria-metrics.yaml
infrastructure/sources/helm/mailu.yaml (9 lines changed, Normal file)
@@ -0,0 +1,9 @@
+# infrastructure/sources/helm/mailu.yaml
+apiVersion: source.toolkit.fluxcd.io/v1
+kind: HelmRepository
+metadata:
+  name: mailu
+  namespace: flux-system
+spec:
+  interval: 1h
+  url: https://mailu.github.io/helm-charts
@@ -36,11 +36,12 @@ PUBLIC_FOLDER = "overview"
 PRIVATE_FOLDER = "atlas-internal"
 
 PERCENT_THRESHOLDS = {
-    "mode": "percentage",
+    "mode": "absolute",
     "steps": [
         {"color": "green", "value": None},
-        {"color": "yellow", "value": 70},
-        {"color": "red", "value": 85},
+        {"color": "yellow", "value": 50},
+        {"color": "orange", "value": 75},
+        {"color": "red", "value": 91.5},
     ],
 }
 
@@ -81,7 +82,7 @@ CONTROL_SUFFIX = f"/{CONTROL_TOTAL}"
 WORKER_SUFFIX = f"/{WORKER_TOTAL}"
 CP_ALLOWED_NS = "kube-system|kube-public|kube-node-lease|longhorn-system|monitoring|flux-system"
 LONGHORN_NODE_REGEX = "titan-1[2-9]|titan-2[24]"
-GAUGE_WIDTHS = [5, 5, 5, 5, 4]
+GAUGE_WIDTHS = [4, 3, 3, 4, 3, 3, 4]
 CONTROL_WORKLOADS_EXPR = (
     f'sum(kube_pod_info{{node=~"{CONTROL_REGEX}",namespace!~"{CP_ALLOWED_NS}"}}) or on() vector(0)'
 )
@@ -187,17 +188,64 @@ def namespace_gpu_share_expr():
     return namespace_share_expr(NAMESPACE_GPU_RAW)
 
 
-PROBLEM_PODS_EXPR = 'sum(max by (namespace,pod) (kube_pod_status_phase{phase!~"Running|Succeeded"}))'
+PROBLEM_PODS_EXPR = (
+    'sum(max by (namespace,pod) (kube_pod_status_phase{phase!~"Running|Succeeded"})) '
+    "or on() vector(0)"
+)
 CRASHLOOP_EXPR = (
     'sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason'
-    '{reason=~"CrashLoopBackOff|ImagePullBackOff"}))'
+    '{reason=~"CrashLoopBackOff|ImagePullBackOff"})) '
+    "or on() vector(0)"
 )
 STUCK_TERMINATING_EXPR = (
     'sum(max by (namespace,pod) ('
     '((time() - kube_pod_deletion_timestamp{pod!=""}) > bool 600)'
     ' and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=""} > bool 0)'
-    '))'
+    ')) '
+    "or on() vector(0)"
 )
+UPTIME_WINDOW = "30d"
+TRAEFIK_READY_EXPR = (
+    "("
+    'sum(kube_deployment_status_replicas_available{namespace=~"traefik|kube-system",deployment="traefik"})'
+    " / clamp_min("
+    'sum(kube_deployment_spec_replicas{namespace=~"traefik|kube-system",deployment="traefik"}), 1)'
+    ")"
+)
+CONTROL_READY_FRACTION_EXPR = (
+    f"(sum(kube_node_status_condition{{condition=\"Ready\",status=\"true\",node=~\"{CONTROL_REGEX}\"}})"
+    f" / {CONTROL_TOTAL})"
+)
+UPTIME_AVAIL_EXPR = (
+    f"min(({CONTROL_READY_FRACTION_EXPR}), ({TRAEFIK_READY_EXPR}))"
+)
+
+# Tie-breaker to deterministically pick one node per namespace when shares tie.
+NODE_TIEBREAKER = " + ".join(
+    f"({node_filter(node)}) * 1e-6 * {idx}"
+    for idx, node in enumerate(CONTROL_ALL + WORKER_NODES, start=1)
+)
+UPTIME_AVG_EXPR = f"avg_over_time(({UPTIME_AVAIL_EXPR})[{UPTIME_WINDOW}:5m])"
+UPTIME_PERCENT_EXPR = UPTIME_AVG_EXPR
+UPTIME_NINES_EXPR = f"-log10(1 - clamp_max({UPTIME_AVG_EXPR}, 0.999999999))"
+UPTIME_THRESHOLDS = {
+    "mode": "absolute",
+    "steps": [
+        {"color": "red", "value": None},
+        {"color": "orange", "value": 2},
+        {"color": "yellow", "value": 3},
+        {"color": "green", "value": 3.5},
+    ],
+}
+UPTIME_PERCENT_THRESHOLDS = {
+    "mode": "absolute",
+    "steps": [
+        {"color": "red", "value": None},
+        {"color": "orange", "value": 0.999},
+        {"color": "yellow", "value": 0.9999},
+        {"color": "green", "value": 0.99999},
+    ],
+}
 PROBLEM_TABLE_EXPR = (
     "(time() - kube_pod_created{pod!=\"\"}) "
     "* on(namespace,pod) group_left(node) kube_pod_info "
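The new uptime panels turn availability into "nines" with `-log10(1 - A)`. Here `A` is the minimum of the control-plane Ready fraction and the Traefik available/desired replica ratio, averaged over `UPTIME_WINDOW`. `A` is clamped at 0.999999999, so a perfect window reads as about 9 nines instead of infinity.

The plain-Python sketch below mirrors that arithmetic outside PromQL, which shows where the `UPTIME_THRESHOLDS` colour steps (2, 3, 3.5) fall. The function names are illustrative only.

```python
import math

CLAMP = 0.999999999  # same ceiling as clamp_max() in UPTIME_NINES_EXPR


def availability(control_ready: int, control_total: int,
                 traefik_available: int, traefik_desired: int) -> float:
    """min(control-plane Ready fraction, Traefik available/desired), like UPTIME_AVAIL_EXPR."""
    control = control_ready / control_total
    traefik = traefik_available / max(traefik_desired, 1)  # mirrors clamp_min(..., 1)
    return min(control, traefik)


def nines(avg_availability: float) -> float:
    """-log10(1 - A) with A clamped, like UPTIME_NINES_EXPR."""
    return -math.log10(1 - min(avg_availability, CLAMP))


for a in (0.99, 0.999, 0.9995, 1.0):
    print(f"{a:.4%} -> {nines(a):.2f} nines")
```

The log scale means each extra nine needs a tenfold drop in downtime. The 3.5 step for green therefore sits between 99.9% and 99.99%.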
@@ -291,6 +339,34 @@ NET_INTERNAL_EXPR = (
     '+ rate(container_network_transmit_bytes_total{namespace!="traefik",pod!=""}[5m]))'
     ' or on() vector(0)'
 )
+APISERVER_5XX_RATE = 'sum(rate(apiserver_request_total{code=~"5.."}[5m]))'
+APISERVER_P99_LATENCY_MS = (
+    "histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))) * 1000"
+)
+ETCD_P99_LATENCY_MS = (
+    "histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000"
+)
+TRAEFIK_TOTAL_5M = "sum(rate(traefik_entrypoint_requests_total[5m]))"
+TRAEFIK_SUCCESS_5M = 'sum(rate(traefik_entrypoint_requests_total{code!~"5.."}[5m]))'
+TRAEFIK_SLI_5M = f"({TRAEFIK_SUCCESS_5M}) / clamp_min({TRAEFIK_TOTAL_5M}, 1)"
+TRAEFIK_P99_LATENCY_MS = (
+    "histogram_quantile(0.99, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000"
+)
+TRAEFIK_P95_LATENCY_MS = (
+    "histogram_quantile(0.95, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000"
+)
+SLO_AVAILABILITY = 0.999
+
+
+def traefik_sli(window):
+    total = f'sum(rate(traefik_entrypoint_requests_total[{window}]))'
+    success = f'sum(rate(traefik_entrypoint_requests_total{{code!~"5.."}}[{window}]))'
+    return f"({success}) / clamp_min({total}, 1)"
+
+
+def traefik_burn(window):
+    sli = traefik_sli(window)
+    return f"(1 - ({sli})) / {1 - SLO_AVAILABILITY}"
 
 # ---------------------------------------------------------------------------
 # Panel factories
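`traefik_burn()` expresses the Traefik error rate as a multiple of the error budget that `SLO_AVAILABILITY` allows: burn = (1 - SLI) / (1 - SLO). The SLI is non-5xx requests divided by total requests, with the denominator clamped to at least 1. A burn of 1 spends the budget exactly over the SLO period, and larger values spend it proportionally faster. The sketch below reproduces that arithmetic on plain request counts; the counts are made up.

```python
SLO_AVAILABILITY = 0.999  # same target as the generator


def sli(success: float, total: float) -> float:
    """Fraction of non-5xx requests; max(total, 1) mirrors clamp_min(total, 1)."""
    return success / max(total, 1)


def burn_rate(success: float, total: float, slo: float = SLO_AVAILABILITY) -> float:
    """(1 - SLI) / (1 - SLO), matching the expression built by traefik_burn()."""
    return (1 - sli(success, total)) / (1 - slo)


# Hypothetical 5m window: 10_000 requests, 20 of them 5xx.
print(round(burn_rate(9_980, 10_000), 2))
# An idle window: no requests at all.
print(burn_rate(0, 0))
```

In this sketch, an idle window (0 of 0 requests) gives an SLI of 0 and a burn of about 1000. The clamp keeps the denominator at 1 while the numerator stays 0. The PromQL version behaves the same way whenever the rate series exist but are all zero.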
@@ -304,6 +380,7 @@ def stat_panel(
     grid,
     *,
     unit="none",
+    decimals=None,
     thresholds=None,
     text_mode="value",
     legend=None,
@@ -313,7 +390,7 @@ def stat_panel(
 ):
     """Return a Grafana stat panel definition."""
     defaults = {
-        "color": {"mode": "palette-classic"},
+        "color": {"mode": "thresholds"},
         "mappings": [],
         "thresholds": thresholds
         or {
@@ -328,6 +405,8 @@ def stat_panel(
     }
     if value_suffix:
         defaults["custom"]["valueSuffix"] = value_suffix
+    if decimals is not None:
+        defaults["decimals"] = decimals
     panel = {
         "id": panel_id,
         "type": "stat",
@@ -446,17 +525,32 @@ def table_panel(
     *,
     unit="none",
     transformations=None,
+    instant=False,
+    options=None,
+    filterable=True,
+    footer=None,
+    format=None,
 ):
     """Return a Grafana table panel definition."""
+    # Optional PromQL subquery helpers in expr: share(), etc.
+    panel_options = {"showHeader": True, "columnFilters": False}
+    if options:
+        panel_options.update(options)
+    if footer is not None:
+        panel_options["footer"] = footer
+    field_defaults = {"unit": unit, "custom": {"filterable": filterable}}
+    target = {"expr": expr, "refId": "A", **({"instant": True} if instant else {})}
+    if format:
+        target["format"] = format
     panel = {
         "id": panel_id,
         "type": "table",
         "title": title,
         "datasource": PROM_DS,
         "gridPos": grid,
-        "targets": [{"expr": expr, "refId": "A"}],
-        "fieldConfig": {"defaults": {"unit": unit}, "overrides": []},
-        "options": {"showHeader": True},
+        "targets": [target],
+        "fieldConfig": {"defaults": field_defaults, "overrides": []},
+        "options": panel_options,
     }
     if transformations:
         panel["transformations"] = transformations
@@ -482,7 +576,7 @@ def pie_panel(panel_id, title, expr, grid):
         "options": {
             "legend": {"displayMode": "list", "placement": "right"},
             "pieType": "pie",
-            "displayLabels": ["percent"],
+            "displayLabels": [],
             "tooltip": {"mode": "single"},
             "colorScheme": "interpolateSpectral",
             "colorBy": "value",
@@ -491,7 +585,19 @@ def pie_panel(panel_id, title, expr, grid):
     }
 
 
-def bargauge_panel(panel_id, title, expr, grid, *, unit="none", links=None):
+def bargauge_panel(
+    panel_id,
+    title,
+    expr,
+    grid,
+    *,
+    unit="none",
+    links=None,
+    limit=None,
+    thresholds=None,
+    decimals=None,
+    instant=False,
+):
     """Return a bar gauge panel with label-aware reduction."""
     panel = {
         "id": panel_id,
@@ -499,13 +605,16 @@ def bargauge_panel(panel_id, title, expr, grid, *, unit="none", links=None):
         "title": title,
         "datasource": PROM_DS,
         "gridPos": grid,
-        "targets": [{"expr": expr, "refId": "A", "legendFormat": "{{node}}"}],
+        "targets": [
+            {"expr": expr, "refId": "A", "legendFormat": "{{node}}", **({"instant": True} if instant else {})}
+        ],
         "fieldConfig": {
             "defaults": {
                 "unit": unit,
                 "min": 0,
                 "max": 100 if unit == "percent" else None,
-                "thresholds": {
+                "thresholds": thresholds
+                or {
                     "mode": "absolute",
                     "steps": [
                         {"color": "green", "value": None},
@@ -527,8 +636,19 @@ def bargauge_panel(panel_id, title, expr, grid, *, unit="none", links=None):
             },
         },
     }
+    if decimals is not None:
+        panel["fieldConfig"]["defaults"]["decimals"] = decimals
     if links:
         panel["links"] = links
+    # Keep bars ordered by value descending for readability.
+    panel["transformations"] = [
+        {
+            "id": "sortBy",
+            "options": {"fields": ["Value"], "order": "desc"},
+        }
+    ]
+    if limit:
+        panel["transformations"].append({"id": "limit", "options": {"limit": limit}})
     return panel
 
 
@@ -555,81 +675,37 @@ def link_to(uid):
 def build_overview():
     panels = []
 
+    count_thresholds = {
+        "mode": "absolute",
+        "steps": [
+            {"color": "green", "value": None},
+            {"color": "yellow", "value": 1},
+            {"color": "orange", "value": 2},
+            {"color": "red", "value": 3},
+        ],
+    }
+
     row1_stats = [
-        (
-            1,
-            "Workers Ready",
-            f'sum(kube_node_status_condition{{condition="Ready",status="true",node=~"{WORKER_REGEX}"}})',
-            WORKER_SUFFIX,
-            WORKER_TOTAL,
-            None,
-        ),
-        (
-            2,
-            "Control Plane Ready",
-            f'sum(kube_node_status_condition{{condition="Ready",status="true",node=~"{CONTROL_REGEX}"}})',
-            CONTROL_SUFFIX,
-            CONTROL_TOTAL,
-            None,
-        ),
-        (
-            3,
-            "Control Plane Workloads",
-            CONTROL_WORKLOADS_EXPR,
-            None,
-            4,
-            link_to("atlas-pods"),
-        ),
-        (
-            4,
-            "Problem Pods",
-            PROBLEM_PODS_EXPR,
-            None,
-            1,
-            link_to("atlas-pods"),
-        ),
-        (
-            5,
-            "Stuck Terminating",
-            STUCK_TERMINATING_EXPR,
-            None,
-            1,
-            link_to("atlas-pods"),
-        ),
-    ]
-
-    def gauge_grid(idx):
-        width = GAUGE_WIDTHS[idx] if idx < len(GAUGE_WIDTHS) else 4
-        x = sum(GAUGE_WIDTHS[:idx])
-        return width, x
-
-    for idx, (panel_id, title, expr, suffix, ok_value, links) in enumerate(row1_stats):
-        thresholds = None
-        min_value = 0
-        max_value = ok_value or 5
-        if panel_id == 1:
-            max_value = WORKER_TOTAL
-            thresholds = {
-                "mode": "absolute",
-                "steps": [
-                    {"color": "red", "value": None},
-                    {"color": "orange", "value": WORKER_TOTAL - 2},
-                    {"color": "yellow", "value": WORKER_TOTAL - 1},
-                    {"color": "green", "value": WORKER_TOTAL},
-                ],
-            }
-        elif panel_id == 2:
-            max_value = CONTROL_TOTAL
-            thresholds = {
+        {
+            "id": 2,
+            "title": "Control Plane Ready",
+            "expr": f'sum(kube_node_status_condition{{condition="Ready",status="true",node=~"{CONTROL_REGEX}"}})',
+            "kind": "gauge",
+            "max_value": CONTROL_TOTAL,
+            "thresholds": {
                 "mode": "absolute",
                 "steps": [
                     {"color": "red", "value": None},
                     {"color": "green", "value": CONTROL_TOTAL},
                 ],
-            }
-        elif panel_id in (3, 4, 5):
-            max_value = 4
-            thresholds = {
+            },
+        },
+        {
+            "id": 3,
+            "title": "Control Plane Workloads",
+            "expr": CONTROL_WORKLOADS_EXPR,
+            "kind": "stat",
+            "thresholds": {
                 "mode": "absolute",
                 "steps": [
                     {"color": "green", "value": None},
@ -637,40 +713,122 @@ def build_overview():
|
|||||||
{"color": "orange", "value": 2},
|
{"color": "orange", "value": 2},
|
||||||
{"color": "red", "value": 3},
|
{"color": "red", "value": 3},
|
||||||
],
|
],
|
||||||
}
|
},
|
||||||
else:
|
"links": link_to("atlas-pods"),
|
||||||
thresholds = {
|
},
|
||||||
|
{
|
||||||
|
"id": 5,
|
||||||
|
"title": "Stuck Terminating",
|
||||||
|
"expr": STUCK_TERMINATING_EXPR,
|
||||||
|
"kind": "stat",
|
||||||
|
"thresholds": {
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{"color": "green", "value": None},
|
{"color": "green", "value": None},
|
||||||
{"color": "red", "value": max_value},
|
{"color": "yellow", "value": 1},
|
||||||
|
{"color": "orange", "value": 2},
|
||||||
|
{"color": "red", "value": 3},
|
||||||
],
|
],
|
||||||
}
|
},
|
||||||
|
"links": link_to("atlas-pods"),
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 27,
|
||||||
|
"title": "Atlas Availability (30d)",
|
||||||
|
"expr": UPTIME_PERCENT_EXPR,
|
||||||
|
"kind": "stat",
|
||||||
|
"thresholds": UPTIME_PERCENT_THRESHOLDS,
|
||||||
|
"unit": "percentunit",
|
||||||
|
"decimals": 3,
|
||||||
|
"text_mode": "value",
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 4,
|
||||||
|
"title": "Problem Pods",
|
||||||
|
"expr": PROBLEM_PODS_EXPR,
|
||||||
|
"kind": "stat",
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 1},
|
||||||
|
{"color": "orange", "value": 2},
|
||||||
|
{"color": "red", "value": 3},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
"links": link_to("atlas-pods"),
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 6,
|
||||||
|
"title": "CrashLoop / ImagePull",
|
||||||
|
"expr": CRASHLOOP_EXPR,
|
||||||
|
"kind": "stat",
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 1},
|
||||||
|
{"color": "orange", "value": 2},
|
||||||
|
{"color": "red", "value": 3},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
"links": link_to("atlas-pods"),
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 1,
|
||||||
|
"title": "Workers Ready",
|
||||||
|
"expr": f'sum(kube_node_status_condition{{condition="Ready",status="true",node=~"{WORKER_REGEX}"}})',
|
||||||
|
"kind": "gauge",
|
||||||
|
"max_value": WORKER_TOTAL,
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "red", "value": None},
|
||||||
|
{"color": "orange", "value": WORKER_TOTAL - 2},
|
||||||
|
{"color": "yellow", "value": WORKER_TOTAL - 1},
|
||||||
|
{"color": "green", "value": WORKER_TOTAL},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
]
|
||||||
|
|
||||||
|
def gauge_grid(idx):
|
||||||
|
width = GAUGE_WIDTHS[idx] if idx < len(GAUGE_WIDTHS) else 4
|
||||||
|
x = sum(GAUGE_WIDTHS[:idx])
|
||||||
|
return width, x
|
||||||
|
|
||||||
|
for idx, item in enumerate(row1_stats):
|
||||||
|
panel_id = item["id"]
|
||||||
width, x = gauge_grid(idx)
|
width, x = gauge_grid(idx)
|
||||||
if panel_id in (3, 4, 5):
|
grid = {"h": 5, "w": width, "x": x, "y": 0}
|
||||||
|
kind = item.get("kind", "gauge")
|
||||||
|
if kind == "stat":
|
||||||
panels.append(
|
panels.append(
|
||||||
stat_panel(
|
stat_panel(
|
||||||
panel_id,
|
panel_id,
|
||||||
title,
|
item["title"],
|
||||||
expr,
|
item["expr"],
|
||||||
{"h": 5, "w": width, "x": x, "y": 0},
|
grid,
|
||||||
thresholds=thresholds,
|
thresholds=item.get("thresholds"),
|
||||||
legend=None,
|
legend=None,
|
||||||
links=links,
|
links=item.get("links"),
|
||||||
text_mode="value",
|
text_mode=item.get("text_mode", "value"),
|
||||||
)
|
value_suffix=item.get("value_suffix"),
|
||||||
)
|
unit=item.get("unit", "none"),
|
||||||
|
decimals=item.get("decimals"),
|
||||||
|
)
|
||||||
|
)
|
||||||
else:
|
else:
|
||||||
panels.append(
|
panels.append(
|
||||||
gauge_panel(
|
gauge_panel(
|
||||||
panel_id,
|
panel_id,
|
||||||
title,
|
item["title"],
|
||||||
expr,
|
item["expr"],
|
||||||
{"h": 5, "w": width, "x": x, "y": 0},
|
grid,
|
||||||
min_value=min_value,
|
min_value=0,
|
||||||
max_value=max_value,
|
max_value=item.get("max_value", 5),
|
||||||
thresholds=thresholds,
|
thresholds=item.get("thresholds"),
|
||||||
links=links,
|
links=item.get("links"),
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
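The refactor replaces positional tuples with dicts, so each entry can omit fields and fall back to defaults through `item.get(...)`. One detail is worth checking. `gauge_grid` computes `x = sum(GAUGE_WIDTHS[:idx])`, and that value stops growing once `idx` passes the end of `GAUGE_WIDTHS`, so any extra panels would stack at the same `x`. The sketch below is a hypothetical variant that keeps advancing by the default width, paired with a simplified dispatch loop. The plain dicts stand in for `stat_panel`/`gauge_panel`, whose bodies are not shown here, and the `GAUGE_WIDTHS` values are placeholders.

```python
GAUGE_WIDTHS = [4, 4, 4, 4, 4, 4]  # placeholder widths; the real list is defined elsewhere


def gauge_grid(idx, widths=GAUGE_WIDTHS, default_width=4):
    """Return (width, x) for the idx-th panel in the top row."""
    width = widths[idx] if idx < len(widths) else default_width
    # Past the end of `widths`, keep advancing by the default width instead of stacking.
    x = sum(widths[:idx]) + default_width * max(0, idx - len(widths))
    return width, x


def build_row(items):
    """Simplified stand-in for the row1_stats loop: dicts instead of real panel builders."""
    panels = []
    for idx, item in enumerate(items):
        width, x = gauge_grid(idx)
        kind = item.get("kind", "gauge")
        panel = {
            "id": item["id"],
            "title": item["title"],
            "type": kind,
            "gridPos": {"h": 5, "w": width, "x": x, "y": 0},
        }
        if kind == "gauge":
            panel["max"] = item.get("max_value", 5)
        panels.append(panel)
    return panels
```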
@ -774,7 +932,7 @@ def build_overview():
|
|||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
16,
|
16,
|
||||||
"Control plane CPU",
|
"Control plane CPU",
|
||||||
node_cpu_expr(CONTROL_REGEX),
|
node_cpu_expr(CONTROL_ALL_REGEX),
|
||||||
{"h": 10, "w": 12, "x": 0, "y": 44},
|
{"h": 10, "w": 12, "x": 0, "y": 44},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
@ -786,7 +944,7 @@ def build_overview():
|
|||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
17,
|
17,
|
||||||
"Control plane RAM",
|
"Control plane RAM",
|
||||||
node_mem_expr(CONTROL_REGEX),
|
node_mem_expr(CONTROL_ALL_REGEX),
|
||||||
{"h": 10, "w": 12, "x": 12, "y": 44},
|
{"h": 10, "w": 12, "x": 12, "y": 44},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
@ -795,6 +953,36 @@ def build_overview():
|
|||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
|
||||||
|
panels.append(
|
||||||
|
pie_panel(
|
||||||
|
28,
|
||||||
|
"Node Pod Share",
|
||||||
|
'(sum(kube_pod_info{pod!="" , node!=""}) by (node) / clamp_min(sum(kube_pod_info{pod!="" , node!=""}), 1)) * 100',
|
||||||
|
{"h": 10, "w": 12, "x": 0, "y": 54},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
bargauge_panel(
|
||||||
|
29,
|
||||||
|
"Top Nodes by Pod Count",
|
||||||
|
'topk(12, sum(kube_pod_info{pod!="" , node!=""}) by (node))',
|
||||||
|
{"h": 10, "w": 12, "x": 12, "y": 54},
|
||||||
|
unit="none",
|
||||||
|
limit=12,
|
||||||
|
decimals=0,
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 50},
|
||||||
|
{"color": "orange", "value": 75},
|
||||||
|
{"color": "red", "value": 100},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
instant=True,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
panels.append(
|
panels.append(
|
||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
18,
|
18,
|
||||||
@ -840,7 +1028,7 @@ def build_overview():
|
|||||||
21,
|
21,
|
||||||
"Root Filesystem Usage",
|
"Root Filesystem Usage",
|
||||||
root_usage_expr(),
|
root_usage_expr(),
|
||||||
{"h": 16, "w": 12, "x": 0, "y": 54},
|
{"h": 16, "w": 12, "x": 0, "y": 64},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_calcs=["last"],
|
legend_calcs=["last"],
|
||||||
@ -855,8 +1043,9 @@ def build_overview():
|
|||||||
22,
|
22,
|
||||||
"Nodes Closest to Full Root Disks",
|
"Nodes Closest to Full Root Disks",
|
||||||
f"topk(12, {root_usage_expr()})",
|
f"topk(12, {root_usage_expr()})",
|
||||||
{"h": 16, "w": 12, "x": 12, "y": 54},
|
{"h": 16, "w": 12, "x": 12, "y": 64},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
|
thresholds=PERCENT_THRESHOLDS,
|
||||||
links=link_to("atlas-storage"),
|
links=link_to("atlas-storage"),
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
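Adding the pod-share pie and bar gauge at `y: 54` (height 10) is why the root-filesystem panels move from `y: 54` to `y: 64`. A quick way to catch this kind of collision is to check every pair of `gridPos` rectangles. The sketch below assumes each generated panel dict carries Grafana's `gridPos` key with `x`, `y`, `w`, and `h`.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two Grafana gridPos rectangles share at least one grid cell."""
    return (
        a["x"] < b["x"] + b["w"]
        and b["x"] < a["x"] + a["w"]
        and a["y"] < b["y"] + b["h"]
        and b["y"] < a["y"] + a["h"]
    )


def find_overlaps(panels):
    """Return (id, id) pairs for every pair of panels whose grid positions collide."""
    return [
        (p["id"], q["id"])
        for p, q in combinations(panels, 2)
        if overlaps(p["gridPos"], q["gridPos"])
    ]


pie = {"id": 28, "gridPos": {"h": 10, "w": 12, "x": 0, "y": 54}}
bars = {"id": 29, "gridPos": {"h": 10, "w": 12, "x": 12, "y": 54}}
root_old = {"id": 21, "gridPos": {"h": 16, "w": 12, "x": 0, "y": 54}}
root_new = {"id": 21, "gridPos": {"h": 16, "w": 12, "x": 0, "y": 64}}
print(find_overlaps([pie, bars, root_old]))  # [(28, 21)]
print(find_overlaps([pie, bars, root_new]))  # []
```

Assuming the dashboard dict returned by `build_overview()` keeps its panels under `"panels"`, running `find_overlaps` over that list in a test would flag a forgotten offset at generation time.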
@ -874,13 +1063,7 @@ def build_overview():
|
|||||||
"templating": {"list": []},
|
"templating": {"list": []},
|
||||||
"time": {"from": "now-1h", "to": "now"},
|
"time": {"from": "now-1h", "to": "now"},
|
||||||
"refresh": "1m",
|
"refresh": "1m",
|
||||||
"links": [
|
"links": [],
|
||||||
{"title": "Atlas Pods", "type": "dashboard", "dashboardUid": "atlas-pods", "keepTime": False},
|
|
||||||
{"title": "Atlas Nodes", "type": "dashboard", "dashboardUid": "atlas-nodes", "keepTime": False},
|
|
||||||
{"title": "Atlas Storage", "type": "dashboard", "dashboardUid": "atlas-storage", "keepTime": False},
|
|
||||||
{"title": "Atlas Network", "type": "dashboard", "dashboardUid": "atlas-network", "keepTime": False},
|
|
||||||
{"title": "Atlas GPU", "type": "dashboard", "dashboardUid": "atlas-gpu", "keepTime": False},
|
|
||||||
],
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
||||||
@ -980,6 +1163,91 @@ def build_pods_dashboard():
|
|||||||
],
|
],
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
panels.append(
|
||||||
|
pie_panel(
|
||||||
|
8,
|
||||||
|
"Node Pod Share",
|
||||||
|
'(sum(kube_pod_info{pod!="" , node!=""}) by (node) / clamp_min(sum(kube_pod_info{pod!="" , node!=""}), 1)) * 100',
|
||||||
|
{"h": 8, "w": 12, "x": 12, "y": 34},
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
bargauge_panel(
|
||||||
|
9,
|
||||||
|
"Top Nodes by Pod Count",
|
||||||
|
'topk(12, sum(kube_pod_info{pod!="" , node!=""}) by (node))',
|
||||||
|
{"h": 8, "w": 12, "x": 0, "y": 34},
|
||||||
|
unit="none",
|
||||||
|
limit=12,
|
||||||
|
decimals=0,
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 50},
|
||||||
|
{"color": "orange", "value": 75},
|
||||||
|
{"color": "red", "value": 100},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
instant=True,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
share_expr = (
|
||||||
|
'(sum by (namespace,node) (kube_pod_info{pod!="" , node!=""}) '
|
||||||
|
'/ on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=""}), 1) * 100)'
|
||||||
|
)
|
||||||
|
rank_terms = [
|
||||||
|
f"(sum by (node) (kube_node_info{{node=\"{node}\"}}) * 0 + {idx * 1e-3})"
|
||||||
|
for idx, node in enumerate(CONTROL_ALL + WORKER_NODES, start=1)
|
||||||
|
]
|
||||||
|
rank_expr = " or ".join(rank_terms)
|
||||||
|
score_expr = f"{share_expr} + on(node) group_left() ({rank_expr})"
|
||||||
|
mask_expr = (
|
||||||
|
f"{score_expr} == bool on(namespace) group_left() "
|
||||||
|
f"(max by (namespace) ({score_expr}))"
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
table_panel(
|
||||||
|
10,
|
||||||
|
"Namespace Plurality by Node v27",
|
||||||
|
(
|
||||||
|
f"{share_expr} * on(namespace,node) group_left() "
|
||||||
|
f"({mask_expr})"
|
||||||
|
),
|
||||||
|
{"h": 8, "w": 24, "x": 0, "y": 42},
|
||||||
|
unit="percent",
|
||||||
|
transformations=[
|
||||||
|
{"id": "labelsToFields", "options": {}},
|
||||||
|
{"id": "organize", "options": {"excludeByName": {"Time": True}}},
|
||||||
|
{"id": "filterByValue", "options": {"match": "Value", "operator": "gt", "value": 0}},
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {"fields": ["Value"], "order": "desc"},
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "groupBy",
|
||||||
|
"options": {
|
||||||
|
"fields": {
|
||||||
|
"namespace": {
|
||||||
|
"aggregations": [
|
||||||
|
{"field": "Value", "operation": "max"},
|
||||||
|
{"field": "node", "operation": "first"},
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"rowBy": ["namespace"],
|
||||||
|
},
|
||||||
|
},
|
||||||
|
],
|
||||||
|
instant=True,
|
||||||
|
options={"showColumnFilters": False},
|
||||||
|
filterable=False,
|
||||||
|
footer={"show": False, "fields": "", "calcs": []},
|
||||||
|
format="table",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
return {
|
return {
|
||||||
"uid": "atlas-pods",
|
"uid": "atlas-pods",
|
||||||
"title": "Atlas Pods",
|
"title": "Atlas Pods",
|
||||||
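The `Namespace Plurality by Node` query picks, for each namespace, the node hosting the largest share of its pods. Ties are broken by adding a tiny per-node offset (`idx * 1e-3`, following the order of `CONTROL_ALL + WORKER_NODES`) before comparing against the per-namespace `max`, so the `== bool` mask marks exactly one node. The same logic in plain Python, with made-up pod counts, looks roughly like this:

```python
def plurality_by_namespace(pod_counts, node_order):
    """Pick the node holding the largest share of each namespace's pods.

    pod_counts maps (namespace, node) -> pod count. Ties go to the node that
    appears later in node_order, because it receives the larger offset.
    """
    totals = {}
    for (ns, _node), count in pod_counts.items():
        totals[ns] = totals.get(ns, 0) + count

    # Same tie-break as rank_terms: node i (1-based) gets i * 1e-3 added to its share.
    rank = {node: idx * 1e-3 for idx, node in enumerate(node_order, start=1)}

    best = {}
    for (ns, node), count in pod_counts.items():
        share = count / max(totals[ns], 1) * 100  # mirrors clamp_min(..., 1)
        score = share + rank[node]
        if ns not in best or score > best[ns][2]:
            best[ns] = (node, share, score)
    return {ns: (node, share) for ns, (node, share, _score) in best.items()}


counts = {("mail", "a"): 2, ("mail", "b"): 2, ("media", "a"): 3, ("media", "b"): 1}
print(plurality_by_namespace(counts, ["a", "b"]))
# {'mail': ('b', 50.0), 'media': ('a', 75.0)}
```

The offset only needs to stay below the smallest real gap between two shares. One pod is worth `100 / N` percent in a namespace of `N` pods, while a few dozen nodes add at most a few hundredths of a percent, so the tie-break cannot override a real difference unless a namespace runs thousands of pods.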
@ -1022,12 +1290,69 @@ def build_nodes_dashboard():
|
|||||||
{"h": 4, "w": 8, "x": 16, "y": 0},
|
{"h": 4, "w": 8, "x": 16, "y": 0},
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
9,
|
||||||
|
"API Server 5xx rate",
|
||||||
|
APISERVER_5XX_RATE,
|
||||||
|
{"h": 4, "w": 8, "x": 0, "y": 4},
|
||||||
|
unit="req/s",
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 0.05},
|
||||||
|
{"color": "orange", "value": 0.2},
|
||||||
|
{"color": "red", "value": 0.5},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=3,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
10,
|
||||||
|
"API Server P99 latency",
|
||||||
|
APISERVER_P99_LATENCY_MS,
|
||||||
|
{"h": 4, "w": 8, "x": 8, "y": 4},
|
||||||
|
unit="ms",
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 250},
|
||||||
|
{"color": "orange", "value": 400},
|
||||||
|
{"color": "red", "value": 600},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=1,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
11,
|
||||||
|
"etcd P99 latency",
|
||||||
|
ETCD_P99_LATENCY_MS,
|
||||||
|
{"h": 4, "w": 8, "x": 16, "y": 4},
|
||||||
|
unit="ms",
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 50},
|
||||||
|
{"color": "orange", "value": 100},
|
||||||
|
{"color": "red", "value": 200},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=1,
|
||||||
|
)
|
||||||
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
4,
|
4,
|
||||||
"Node CPU",
|
"Node CPU",
|
||||||
node_cpu_expr(),
|
node_cpu_expr(),
|
||||||
{"h": 9, "w": 24, "x": 0, "y": 4},
|
{"h": 9, "w": 24, "x": 0, "y": 8},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_calcs=["last"],
|
legend_calcs=["last"],
|
||||||
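The new API server and etcd stats, like the rest of the dashboards, use `absolute` threshold steps. The first step (with `value: None`) is the base color, and, as far as I know, Grafana switches to each later step once the reading reaches that step's `value`. Below is a minimal sketch of that lookup, assuming the steps are listed in ascending order as they are in this file:

```python
def threshold_color(value, steps):
    """Resolve a Grafana-style absolute threshold list to a color.

    Assumes steps[0] is the base step (value None) and later steps ascend.
    """
    color = steps[0]["color"]
    for step in steps[1:]:
        if value >= step["value"]:
            color = step["color"]
    return color


APISERVER_P99_STEPS = [
    {"color": "green", "value": None},
    {"color": "yellow", "value": 250},
    {"color": "orange", "value": 400},
    {"color": "red", "value": 600},
]
for ms in (120, 250, 599, 600):
    print(ms, threshold_color(ms, APISERVER_P99_STEPS))
```

Under this rule, the worker gauge (red base, orange at `WORKER_TOTAL - 2`, yellow at `WORKER_TOTAL - 1`, green at `WORKER_TOTAL`) only turns green when every worker is Ready.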
@ -1040,7 +1365,7 @@ def build_nodes_dashboard():
|
|||||||
5,
|
5,
|
||||||
"Node RAM",
|
"Node RAM",
|
||||||
node_mem_expr(),
|
node_mem_expr(),
|
||||||
{"h": 9, "w": 24, "x": 0, "y": 13},
|
{"h": 9, "w": 24, "x": 0, "y": 17},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_calcs=["last"],
|
legend_calcs=["last"],
|
||||||
@ -1053,7 +1378,7 @@ def build_nodes_dashboard():
|
|||||||
6,
|
6,
|
||||||
"Control Plane (incl. titan-db) CPU",
|
"Control Plane (incl. titan-db) CPU",
|
||||||
node_cpu_expr(CONTROL_ALL_REGEX),
|
node_cpu_expr(CONTROL_ALL_REGEX),
|
||||||
{"h": 9, "w": 12, "x": 0, "y": 22},
|
{"h": 9, "w": 12, "x": 0, "y": 26},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_display="table",
|
legend_display="table",
|
||||||
@ -1065,7 +1390,7 @@ def build_nodes_dashboard():
|
|||||||
7,
|
7,
|
||||||
"Control Plane (incl. titan-db) RAM",
|
"Control Plane (incl. titan-db) RAM",
|
||||||
node_mem_expr(CONTROL_ALL_REGEX),
|
node_mem_expr(CONTROL_ALL_REGEX),
|
||||||
{"h": 9, "w": 12, "x": 12, "y": 22},
|
{"h": 9, "w": 12, "x": 12, "y": 26},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_display="table",
|
legend_display="table",
|
||||||
@ -1077,7 +1402,7 @@ def build_nodes_dashboard():
|
|||||||
8,
|
8,
|
||||||
"Root Filesystem Usage",
|
"Root Filesystem Usage",
|
||||||
root_usage_expr(),
|
root_usage_expr(),
|
||||||
{"h": 9, "w": 24, "x": 0, "y": 31},
|
{"h": 9, "w": 24, "x": 0, "y": 35},
|
||||||
unit="percent",
|
unit="percent",
|
||||||
legend="{{node}}",
|
legend="{{node}}",
|
||||||
legend_display="table",
|
legend_display="table",
|
||||||
@ -1204,43 +1529,107 @@ def build_network_dashboard():
|
|||||||
panels.append(
|
panels.append(
|
||||||
stat_panel(
|
stat_panel(
|
||||||
1,
|
1,
|
||||||
"Ingress Traffic",
|
"Ingress Success Rate (5m)",
|
||||||
NET_INGRESS_EXPR,
|
TRAEFIK_SLI_5M,
|
||||||
{"h": 4, "w": 8, "x": 0, "y": 0},
|
{"h": 4, "w": 6, "x": 0, "y": 0},
|
||||||
unit="Bps",
|
unit="percentunit",
|
||||||
|
decimals=2,
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "red", "value": None},
|
||||||
|
{"color": "orange", "value": 0.995},
|
||||||
|
{"color": "yellow", "value": 0.999},
|
||||||
|
{"color": "green", "value": 0.9995},
|
||||||
|
],
|
||||||
|
},
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
stat_panel(
|
stat_panel(
|
||||||
2,
|
2,
|
||||||
"Egress Traffic",
|
"Error Budget Burn (1h)",
|
||||||
NET_EGRESS_EXPR,
|
traefik_burn("1h"),
|
||||||
{"h": 4, "w": 8, "x": 8, "y": 0},
|
{"h": 4, "w": 6, "x": 6, "y": 0},
|
||||||
unit="Bps",
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 1},
|
||||||
|
{"color": "orange", "value": 2},
|
||||||
|
{"color": "red", "value": 4},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=2,
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
stat_panel(
|
stat_panel(
|
||||||
3,
|
3,
|
||||||
"Intra-Cluster Traffic",
|
"Error Budget Burn (6h)",
|
||||||
NET_INTERNAL_EXPR,
|
traefik_burn("6h"),
|
||||||
{"h": 4, "w": 8, "x": 16, "y": 0},
|
{"h": 4, "w": 6, "x": 12, "y": 0},
|
||||||
unit="Bps",
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 1},
|
||||||
|
{"color": "orange", "value": 2},
|
||||||
|
{"color": "red", "value": 4},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=2,
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
stat_panel(
|
stat_panel(
|
||||||
4,
|
4,
|
||||||
"Top Router req/s",
|
"Edge P99 Latency (ms)",
|
||||||
f"topk(1, {TRAEFIK_ROUTER_EXPR})",
|
TRAEFIK_P99_LATENCY_MS,
|
||||||
|
{"h": 4, "w": 6, "x": 18, "y": 0},
|
||||||
|
unit="ms",
|
||||||
|
thresholds={
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{"color": "green", "value": None},
|
||||||
|
{"color": "yellow", "value": 200},
|
||||||
|
{"color": "orange", "value": 350},
|
||||||
|
{"color": "red", "value": 500},
|
||||||
|
],
|
||||||
|
},
|
||||||
|
decimals=1,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
5,
|
||||||
|
"Ingress Traffic",
|
||||||
|
NET_INGRESS_EXPR,
|
||||||
{"h": 4, "w": 8, "x": 0, "y": 4},
|
{"h": 4, "w": 8, "x": 0, "y": 4},
|
||||||
unit="req/s",
|
unit="Bps",
|
||||||
legend="{{router}}",
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
6,
|
||||||
|
"Egress Traffic",
|
||||||
|
NET_EGRESS_EXPR,
|
||||||
|
{"h": 4, "w": 8, "x": 8, "y": 4},
|
||||||
|
unit="Bps",
|
||||||
|
)
|
||||||
|
)
|
||||||
|
panels.append(
|
||||||
|
stat_panel(
|
||||||
|
7,
|
||||||
|
"Intra-Cluster Traffic",
|
||||||
|
NET_INTERNAL_EXPR,
|
||||||
|
{"h": 4, "w": 8, "x": 16, "y": 4},
|
||||||
|
unit="Bps",
|
||||||
)
|
)
|
||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
5,
|
8,
|
||||||
"Per-Node Throughput",
|
"Per-Node Throughput",
|
||||||
f'avg by (node) (({NET_NODE_TX_PHYS} + {NET_NODE_RX_PHYS}) * on(instance) group_left(node) {NODE_INFO})',
|
f'avg by (node) (({NET_NODE_TX_PHYS} + {NET_NODE_RX_PHYS}) * on(instance) group_left(node) {NODE_INFO})',
|
||||||
{"h": 8, "w": 24, "x": 0, "y": 8},
|
{"h": 8, "w": 24, "x": 0, "y": 8},
|
||||||
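The network dashboard now leads with an SLI (`TRAEFIK_SLI_5M`, a success ratio shown as `percentunit`) and two burn-rate stats from `traefik_burn`. The definition of `traefik_burn` and the SLO target are not part of this hunk. The sketch below therefore assumes the common formulation, burn rate = observed error ratio / allowed error ratio, with a hypothetical 99.9% target:

```python
def burn_rate(success_ratio, slo_target=0.999):
    """Error-budget burn rate for one window.

    1.0 means errors arrive exactly as fast as the SLO allows; the
    slo_target default of 0.999 is an assumption, not the dashboard's value.
    """
    allowed_error = 1.0 - slo_target
    if allowed_error <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (1.0 - success_ratio) / allowed_error


for sli in (1.0, 0.999, 0.998, 0.996):
    print(sli, round(burn_rate(sli), 2))
```

Against the 1/2/4 thresholds, a sustained burn of 4 would exhaust a 30-day budget in about 7.5 days.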
@ -1252,7 +1641,7 @@ def build_network_dashboard():
|
|||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
table_panel(
|
table_panel(
|
||||||
6,
|
9,
|
||||||
"Top Namespaces",
|
"Top Namespaces",
|
||||||
'topk(10, sum(rate(container_network_transmit_bytes_total{namespace!=""}[5m]) '
|
'topk(10, sum(rate(container_network_transmit_bytes_total{namespace!=""}[5m]) '
|
||||||
'+ rate(container_network_receive_bytes_total{namespace!=""}[5m])) by (namespace))',
|
'+ rate(container_network_receive_bytes_total{namespace!=""}[5m])) by (namespace))',
|
||||||
@ -1263,7 +1652,7 @@ def build_network_dashboard():
|
|||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
table_panel(
|
table_panel(
|
||||||
7,
|
10,
|
||||||
"Top Pods",
|
"Top Pods",
|
||||||
'topk(10, sum(rate(container_network_transmit_bytes_total{pod!=""}[5m]) '
|
'topk(10, sum(rate(container_network_transmit_bytes_total{pod!=""}[5m]) '
|
||||||
'+ rate(container_network_receive_bytes_total{pod!=""}[5m])) by (namespace,pod))',
|
'+ rate(container_network_receive_bytes_total{pod!=""}[5m])) by (namespace,pod))',
|
||||||
@ -1274,7 +1663,7 @@ def build_network_dashboard():
|
|||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
8,
|
11,
|
||||||
"Traefik Routers (req/s)",
|
"Traefik Routers (req/s)",
|
||||||
f"topk(10, {TRAEFIK_ROUTER_EXPR})",
|
f"topk(10, {TRAEFIK_ROUTER_EXPR})",
|
||||||
{"h": 9, "w": 12, "x": 0, "y": 25},
|
{"h": 9, "w": 12, "x": 0, "y": 25},
|
||||||
@ -1286,7 +1675,7 @@ def build_network_dashboard():
|
|||||||
)
|
)
|
||||||
panels.append(
|
panels.append(
|
||||||
timeseries_panel(
|
timeseries_panel(
|
||||||
9,
|
12,
|
||||||
"Traefik Entrypoints (req/s)",
|
"Traefik Entrypoints (req/s)",
|
||||||
'sum by (entrypoint) (rate(traefik_entrypoint_requests_total[5m]))',
|
'sum by (entrypoint) (rate(traefik_entrypoint_requests_total[5m]))',
|
||||||
{"h": 9, "w": 12, "x": 12, "y": 25},
|
{"h": 9, "w": 12, "x": 12, "y": 25},
|
||||||
|
|||||||
204
scripts/mailu_sync.py
Normal file
@ -0,0 +1,204 @@
|
|||||||
|
#!/usr/bin/env python3
|
||||||
|
"""
|
||||||
|
Sync Keycloak users to Mailu mailboxes.
|
||||||
|
- Generates/stores a mailu_app_password attribute in Keycloak (admin-only)
|
||||||
|
- Upserts the mailbox in Mailu Postgres using that password
|
||||||
|
"""
|
||||||
|
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import json
|
||||||
|
import time
|
||||||
|
import secrets
|
||||||
|
import string
|
||||||
|
import datetime
|
||||||
|
import requests
|
||||||
|
import psycopg2
|
||||||
|
from psycopg2.extras import RealDictCursor
|
||||||
|
from passlib.hash import bcrypt_sha256
|
||||||
|
|
||||||
|
|
||||||
|
KC_BASE = os.environ["KEYCLOAK_BASE_URL"].rstrip("/")
|
||||||
|
KC_REALM = os.environ["KEYCLOAK_REALM"]
|
||||||
|
KC_CLIENT_ID = os.environ["KEYCLOAK_CLIENT_ID"]
|
||||||
|
KC_CLIENT_SECRET = os.environ["KEYCLOAK_CLIENT_SECRET"]
|
||||||
|
|
||||||
|
MAILU_DOMAIN = os.environ["MAILU_DOMAIN"]
|
||||||
|
MAILU_DEFAULT_QUOTA = int(os.environ.get("MAILU_DEFAULT_QUOTA", "20000000000"))
|
||||||
|
|
||||||
|
DB_CONFIG = {
|
||||||
|
"host": os.environ["MAILU_DB_HOST"],
|
||||||
|
"port": int(os.environ.get("MAILU_DB_PORT", "5432")),
|
||||||
|
"dbname": os.environ["MAILU_DB_NAME"],
|
||||||
|
"user": os.environ["MAILU_DB_USER"],
|
||||||
|
"password": os.environ["MAILU_DB_PASSWORD"],
|
||||||
|
}
|
||||||
|
|
||||||
|
SESSION = requests.Session()
|
||||||
|
|
||||||
|
|
||||||
|
def log(msg):
|
||||||
|
sys.stdout.write(f"{msg}\n")
|
||||||
|
sys.stdout.flush()
|
||||||
|
|
||||||
|
|
||||||
|
def get_kc_token():
|
||||||
|
resp = SESSION.post(
|
||||||
|
f"{KC_BASE}/realms/{KC_REALM}/protocol/openid-connect/token",
|
||||||
|
data={
|
||||||
|
"grant_type": "client_credentials",
|
||||||
|
"client_id": KC_CLIENT_ID,
|
||||||
|
"client_secret": KC_CLIENT_SECRET,
|
||||||
|
},
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
return resp.json()["access_token"]
|
||||||
|
|
||||||
|
|
||||||
|
def kc_get_users(token):
|
||||||
|
users = []
|
||||||
|
first = 0
|
||||||
|
max_results = 200
|
||||||
|
headers = {"Authorization": f"Bearer {token}"}
|
||||||
|
while True:
|
||||||
|
resp = SESSION.get(
|
||||||
|
f"{KC_BASE}/admin/realms/{KC_REALM}/users",
|
||||||
|
params={"first": first, "max": max_results, "enabled": "true"},
|
||||||
|
headers=headers,
|
||||||
|
timeout=20,
|
||||||
|
)
|
||||||
|
resp.raise_for_status()
|
||||||
|
batch = resp.json()
|
||||||
|
users.extend(batch)
|
||||||
|
if len(batch) < max_results:
|
||||||
|
break
|
||||||
|
first += max_results
|
||||||
|
return users
|
||||||
|
|
||||||
|
|
||||||
|
def kc_update_attributes(token, user, attributes):
|
||||||
|
headers = {
|
||||||
|
"Authorization": f"Bearer {token}",
|
||||||
|
"Content-Type": "application/json",
|
||||||
|
}
|
||||||
|
payload = {
|
||||||
|
"firstName": user.get("firstName"),
|
||||||
|
"lastName": user.get("lastName"),
|
||||||
|
"email": user.get("email"),
|
||||||
|
"enabled": user.get("enabled", True),
|
||||||
|
"username": user["username"],
|
||||||
|
"emailVerified": user.get("emailVerified", False),
|
||||||
|
"attributes": attributes,
|
||||||
|
}
|
||||||
|
user_url = f"{KC_BASE}/admin/realms/{KC_REALM}/users/{user['id']}"
|
||||||
|
resp = SESSION.put(user_url, headers=headers, json=payload, timeout=20)
|
||||||
|
resp.raise_for_status()
|
||||||
|
verify = SESSION.get(
|
||||||
|
user_url,
|
||||||
|
headers={"Authorization": f"Bearer {token}"},
|
||||||
|
params={"briefRepresentation": "false"},
|
||||||
|
timeout=15,
|
||||||
|
)
|
||||||
|
verify.raise_for_status()
|
||||||
|
attrs = verify.json().get("attributes") or {}
|
||||||
|
if not attrs.get("mailu_app_password"):
|
||||||
|
raise Exception(f"attribute not persisted for {user.get('email') or user['username']}")
|
||||||
|
|
||||||
|
|
||||||
|
def random_password():
|
||||||
|
alphabet = string.ascii_letters + string.digits
|
||||||
|
return "".join(secrets.choice(alphabet) for _ in range(24))
|
||||||
|
|
||||||
|
|
||||||
|
def ensure_mailu_user(cursor, email, password, display_name):
|
||||||
|
localpart, domain = email.split("@", 1)
|
||||||
|
if domain.lower() != MAILU_DOMAIN.lower():
|
||||||
|
return
|
||||||
|
hashed = bcrypt_sha256.hash(password)
|
||||||
|
now = datetime.datetime.utcnow()
|
||||||
|
cursor.execute(
|
||||||
|
"""
|
||||||
|
INSERT INTO "user" (
|
||||||
|
email, localpart, domain_name, password,
|
||||||
|
quota_bytes, quota_bytes_used,
|
||||||
|
global_admin, enabled, enable_imap, enable_pop, allow_spoofing,
|
||||||
|
forward_enabled, forward_destination, forward_keep,
|
||||||
|
reply_enabled, reply_subject, reply_body, reply_startdate, reply_enddate,
|
||||||
|
displayed_name, spam_enabled, spam_mark_as_read, spam_threshold,
|
||||||
|
change_pw_next_login, created_at, updated_at, comment
|
||||||
|
)
|
||||||
|
VALUES (
|
||||||
|
%(email)s, %(localpart)s, %(domain)s, %(password)s,
|
||||||
|
%(quota)s, 0,
|
||||||
|
false, true, true, true, false,
|
||||||
|
false, '', true,
|
||||||
|
false, NULL, NULL, DATE '1900-01-01', DATE '2999-12-31',
|
||||||
|
%(display)s, true, true, 80,
|
||||||
|
false, CURRENT_DATE, %(now)s, ''
|
||||||
|
)
|
||||||
|
ON CONFLICT (email) DO UPDATE
|
||||||
|
SET password = EXCLUDED.password,
|
||||||
|
enabled = true,
|
||||||
|
updated_at = EXCLUDED.updated_at
|
||||||
|
""",
|
||||||
|
{
|
||||||
|
"email": email,
|
||||||
|
"localpart": localpart,
|
||||||
|
"domain": domain,
|
||||||
|
"password": hashed,
|
||||||
|
"quota": MAILU_DEFAULT_QUOTA,
|
||||||
|
"display": display_name or localpart,
|
||||||
|
"now": now,
|
||||||
|
},
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def main():
|
||||||
|
token = get_kc_token()
|
||||||
|
users = kc_get_users(token)
|
||||||
|
if not users:
|
||||||
|
log("No users found; exiting.")
|
||||||
|
return
|
||||||
|
|
||||||
|
conn = psycopg2.connect(**DB_CONFIG)
|
||||||
|
conn.autocommit = True
|
||||||
|
cursor = conn.cursor(cursor_factory=RealDictCursor)
|
||||||
|
|
||||||
|
for user in users:
|
||||||
|
attrs = user.get("attributes", {}) or {}
|
||||||
|
app_pw_value = attrs.get("mailu_app_password")
|
||||||
|
if isinstance(app_pw_value, list):
|
||||||
|
app_pw = app_pw_value[0] if app_pw_value else None
|
||||||
|
elif isinstance(app_pw_value, str):
|
||||||
|
app_pw = app_pw_value
|
||||||
|
else:
|
||||||
|
app_pw = None
|
||||||
|
|
||||||
|
email = user.get("email")
|
||||||
|
if not email:
|
||||||
|
email = f"{user['username']}@{MAILU_DOMAIN}"
|
||||||
|
|
||||||
|
if not app_pw:
|
||||||
|
app_pw = random_password()
|
||||||
|
attrs["mailu_app_password"] = app_pw
|
||||||
|
kc_update_attributes(token, user, attrs)
|
||||||
|
log(f"Set mailu_app_password for {email}")
|
||||||
|
|
||||||
|
display_name = " ".join(
|
||||||
|
part for part in [user.get("firstName"), user.get("lastName")] if part
|
||||||
|
).strip()
|
||||||
|
|
||||||
|
ensure_mailu_user(cursor, email, app_pw, display_name)
|
||||||
|
log(f"Synced mailbox for {email}")
|
||||||
|
|
||||||
|
cursor.close()
|
||||||
|
conn.close()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
try:
|
||||||
|
main()
|
||||||
|
except Exception as exc:
|
||||||
|
log(f"ERROR: {exc}")
|
||||||
|
sys.exit(1)
|
||||||
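`kc_get_users` pages through `/admin/realms/{realm}/users` with `first`/`max` until a short page comes back. `main()` accepts `mailu_app_password` either as a list (Keycloak's usual attribute shape) or as a bare string. Both pieces can be factored into small, testable helpers. In the sketch below, `fetch_page` is a hypothetical stand-in for the HTTP call.

```python
from typing import Callable, Iterator, List, Optional


def paginate(fetch_page: Callable[[int, int], List[dict]], page_size: int = 200) -> Iterator[dict]:
    """Yield items from an offset/limit API until a page shorter than page_size arrives."""
    first = 0
    while True:
        batch = fetch_page(first, page_size)
        yield from batch
        if len(batch) < page_size:
            return
        first += page_size


def first_attr(attributes: Optional[dict], key: str) -> Optional[str]:
    """Return the first value of a Keycloak-style attribute (list or bare string)."""
    value = (attributes or {}).get(key)
    if isinstance(value, list):
        return value[0] if value else None
    return value if isinstance(value, str) else None


# Fake backend with 450 users; records the offsets requested.
calls = []
users = [{"username": f"u{i}"} for i in range(450)]


def fake_fetch(first, size):
    calls.append(first)
    return users[first:first + size]


print(len(list(paginate(fake_fetch))), calls)  # 450 [0, 200, 400]
```

When the user count is an exact multiple of the page size, the loop makes one extra request that returns an empty page, which is harmless.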
49
scripts/nextcloud-mail-sync.sh
Executable file
@ -0,0 +1,49 @@
#!/bin/bash
set -euo pipefail

KC_BASE="${KC_BASE:?}"
KC_REALM="${KC_REALM:?}"
KC_ADMIN_USER="${KC_ADMIN_USER:?}"
KC_ADMIN_PASS="${KC_ADMIN_PASS:?}"

if ! command -v jq >/dev/null 2>&1; then
  apt-get update && apt-get install -y jq curl >/dev/null
fi

account_exists() {
  # Skip if the account email is already present in the mail app.
  runuser -u www-data -- php occ mail:account:list 2>/dev/null | grep -Fq " ${1}" || \
    runuser -u www-data -- php occ mail:account:list 2>/dev/null | grep -Fq "${1} "
}

token=$(
  curl -s -d "grant_type=password" \
    -d "client_id=admin-cli" \
    -d "username=${KC_ADMIN_USER}" \
    -d "password=${KC_ADMIN_PASS}" \
    "${KC_BASE}/realms/master/protocol/openid-connect/token" | jq -r '.access_token'
)

if [[ -z "${token}" || "${token}" == "null" ]]; then
  echo "Failed to obtain admin token"
  exit 1
fi

users=$(curl -s -H "Authorization: Bearer ${token}" \
  "${KC_BASE}/admin/realms/${KC_REALM}/users?max=2000")

echo "${users}" | jq -c '.[]' | while read -r user; do
  username=$(echo "${user}" | jq -r '.username')
  email=$(echo "${user}" | jq -r '.email // empty')
  app_pw=$(echo "${user}" | jq -r '.attributes.mailu_app_password[0] // empty')
  [[ -z "${email}" || -z "${app_pw}" ]] && continue
  if account_exists "${email}"; then
    echo "Skipping ${email}, already exists"
    continue
  fi
  echo "Syncing ${email}"
  runuser -u www-data -- php occ mail:account:create \
    "${username}" "${username}" "${email}" \
    mail.bstein.dev 993 ssl "${email}" "${app_pw}" \
    mail.bstein.dev 587 tls "${email}" "${app_pw}" login || true
done
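The sync loop above keeps a user only when both `.email` and the first `mailu_app_password` attribute are non-empty, and it treats an account as existing when the address appears in `occ mail:account:list` output with a space on one side. A minimal Python sketch of those two checks, useful for reasoning about edge cases without a Nextcloud container; `syncable_accounts` and `account_exists` here are hypothetical helpers that only mirror the jq and `grep -F` logic, and the listing strings used with them are assumed formats, not real `occ` output.

```python
from typing import Iterable, Iterator, Tuple


def syncable_accounts(users: Iterable[dict]) -> Iterator[Tuple[str, str, str]]:
    """Yield (username, email, app_password) for users the shell loop would sync.

    Mirrors `.email // empty` and `.attributes.mailu_app_password[0] // empty`,
    then skips any user missing either value.
    """
    for user in users:
        email = user.get("email") or ""
        passwords = (user.get("attributes") or {}).get("mailu_app_password") or [""]
        app_pw = passwords[0] or ""
        if not email or not app_pw:
            continue
        yield user["username"], email, app_pw


def account_exists(listing: str, email: str) -> bool:
    """Same test as the two `grep -Fq` calls: email preceded or followed by a space."""
    return f" {email}" in listing or f"{email} " in listing
```

The last test assertion shows a consequence of the substring match: an address that merely starts with an existing one also counts as present, so a stricter check would compare whole columns of the listing.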
65
scripts/nextcloud-maintenance.sh
Executable file
@ -0,0 +1,65 @@
#!/bin/bash
set -euo pipefail

NC_URL="${NC_URL:-https://cloud.bstein.dev}"
ADMIN_USER="${ADMIN_USER:?}"
ADMIN_PASS="${ADMIN_PASS:?}"

export DEBIAN_FRONTEND=noninteractive
apt-get update -qq
apt-get install -y -qq curl jq >/dev/null

run_occ() {
  runuser -u www-data -- php occ "$@"
}

log() { echo "[$(date -Is)] $*"; }

log "Applying Atlas theming"
run_occ theming:config name "Atlas Cloud"
run_occ theming:config slogan "Unified access to Atlas services"
run_occ theming:config url "https://cloud.bstein.dev"
run_occ theming:config color "#0f172a"
run_occ theming:config disable-user-theming yes

log "Setting default quota to 200 GB"
run_occ config:app:set files default_quota --value "200 GB"

API_BASE="${NC_URL}/ocs/v2.php/apps/external/api/v1"
AUTH=(-u "${ADMIN_USER}:${ADMIN_PASS}" -H "OCS-APIRequest: true")

log "Removing existing external links"
existing=$(curl -sf "${AUTH[@]}" "${API_BASE}?format=json" | jq -r '.ocs.data[].id // empty')
for id in ${existing}; do
  curl -sf "${AUTH[@]}" -X DELETE "${API_BASE}/sites/${id}?format=json" >/dev/null || true
done

SITES=(
  "Vaultwarden|https://vault.bstein.dev"
  "Jellyfin|https://stream.bstein.dev"
  "Gitea|https://scm.bstein.dev"
  "Jenkins|https://ci.bstein.dev"
  "Zot|https://registry.bstein.dev"
  "Vault|https://secret.bstein.dev"
  "Jitsi|https://meet.bstein.dev"
  "Grafana|https://metrics.bstein.dev"
  "Chat LLM|https://chat.ai.bstein.dev"
  "Vision|https://draw.ai.bstein.dev"
  "STT/TTS|https://talk.ai.bstein.dev"
)

log "Seeding external links"
for entry in "${SITES[@]}"; do
  IFS="|" read -r name url <<<"${entry}"
  curl -sf "${AUTH[@]}" -X POST "${API_BASE}/sites?format=json" \
    -d "name=${name}" \
    -d "url=${url}" \
    -d "lang=" \
    -d "type=link" \
    -d "device=" \
    -d "icon=" \
    -d "groups[]=" \
    -d "redirect=1" >/dev/null
done

log "Maintenance run completed"
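Each `SITES` entry is split on the first `|` (with `read -r`, the last variable receives the remainder of the line) and posted as a fixed set of form fields. The sketch below reproduces that request body in Python, assuming curl's documented behavior of joining repeated `-d` values with `&` without URL-encoding them; `site_fields` and `form_body` are illustrative names, not part of the script.

```python
from typing import List, Tuple

# Two of the SITES entries from the script, copied verbatim.
SITES = [
    "Vaultwarden|https://vault.bstein.dev",
    "Chat LLM|https://chat.ai.bstein.dev",
]


def site_fields(entry: str) -> List[Tuple[str, str]]:
    """Split like `IFS="|" read -r name url` and list the -d fields in order."""
    # read -r hands everything after the first separator to the last variable.
    name, url = entry.split("|", 1)
    return [
        ("name", name), ("url", url), ("lang", ""), ("type", "link"),
        ("device", ""), ("icon", ""), ("groups[]", ""), ("redirect", "1"),
    ]


def form_body(entry: str) -> str:
    """curl joins repeated -d values with '&' and sends them unencoded."""
    return "&".join(f"{key}={value}" for key, value in site_fields(entry))
```

Because `-d` sends values verbatim, a name containing `&` or `=` would corrupt the body; `curl --data-urlencode` is the safer option if such names ever appear.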
58
scripts/tests/test_dashboards_render_atlas.py
Normal file
@ -0,0 +1,58 @@
import importlib.util
import pathlib


def load_module():
    path = pathlib.Path(__file__).resolve().parents[1] / "dashboards_render_atlas.py"
    spec = importlib.util.spec_from_file_location("dashboards_render_atlas", path)
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    return module


def test_table_panel_options_and_filterable():
    mod = load_module()
    panel = mod.table_panel(
        1,
        "test",
        "metric",
        {"h": 1, "w": 1, "x": 0, "y": 0},
        unit="percent",
        transformations=[{"id": "labelsToFields", "options": {}}],
        instant=True,
        options={"showColumnFilters": False},
        filterable=False,
        footer={"show": False, "fields": "", "calcs": []},
        format="table",
    )
    assert panel["fieldConfig"]["defaults"]["unit"] == "percent"
    assert panel["fieldConfig"]["defaults"]["custom"]["filterable"] is False
    assert panel["options"]["showHeader"] is True
    assert panel["targets"][0]["format"] == "table"


def test_node_filter_and_expr_helpers():
    mod = load_module()
    expr = mod.node_filter("titan-.*")
    assert "label_replace" in expr
    cpu_expr = mod.node_cpu_expr("titan-.*")
    mem_expr = mod.node_mem_expr("titan-.*")
    assert "node_cpu_seconds_total" in cpu_expr
    assert "node_memory_MemAvailable_bytes" in mem_expr


def test_render_configmap_writes(tmp_path):
    mod = load_module()
    mod.DASHBOARD_DIR = tmp_path / "dash"
    mod.ROOT = tmp_path
    uid = "atlas-test"
    info = {"configmap": tmp_path / "cm.yaml"}
    data = {"title": "Atlas Test"}
    mod.write_json(uid, data)
    mod.render_configmap(uid, info)
    json_path = mod.DASHBOARD_DIR / f"{uid}.json"
    assert json_path.exists()
    content = (tmp_path / "cm.yaml").read_text()
    assert "kind: ConfigMap" in content
    assert f"{uid}.json" in content
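Both test modules load the script under test by file path with `importlib.util`. The same pattern factored into a reusable helper, as a sketch (the name `load_module_from_path` is hypothetical): it checks for a missing spec or loader before calling `module_from_spec`, and because it never registers the module in `sys.modules`, each call returns a fresh copy, so attribute patches such as `mod.DASHBOARD_DIR = ...` cannot leak between tests.

```python
import importlib.util
import pathlib
from types import ModuleType


def load_module_from_path(name: str, path: pathlib.Path) -> ModuleType:
    """Execute the file at `path` as a new, unregistered module named `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs top-level code, e.g. env lookups
    return module
```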
181
scripts/tests/test_mailu_sync.py
Normal file
@ -0,0 +1,181 @@
import importlib.util
import pathlib

import pytest


def load_sync_module(monkeypatch):
    # Minimal env required by module import
    env = {
        "KEYCLOAK_BASE_URL": "http://keycloak",
        "KEYCLOAK_REALM": "atlas",
        "KEYCLOAK_CLIENT_ID": "mailu-sync",
        "KEYCLOAK_CLIENT_SECRET": "secret",
        "MAILU_DOMAIN": "example.com",
        "MAILU_DB_HOST": "localhost",
        "MAILU_DB_PORT": "5432",
        "MAILU_DB_NAME": "mailu",
        "MAILU_DB_USER": "mailu",
        "MAILU_DB_PASSWORD": "pw",
    }
    for k, v in env.items():
        monkeypatch.setenv(k, v)
    module_path = pathlib.Path(__file__).resolve().parents[1] / "mailu_sync.py"
    spec = importlib.util.spec_from_file_location("mailu_sync_testmod", module_path)
    module = importlib.util.module_from_spec(spec)
    assert spec.loader is not None
    spec.loader.exec_module(module)
    return module


def test_random_password_length_and_charset(monkeypatch):
    sync = load_sync_module(monkeypatch)
    pw = sync.random_password()
    assert len(pw) == 24
    assert all(ch.isalnum() for ch in pw)


class _FakeResponse:
    def __init__(self, json_data=None, status=200):
        # Keep falsy payloads such as [] as-is so paginated fakes still return a list.
        self._json_data = {} if json_data is None else json_data
        self.status_code = status

    def raise_for_status(self):
        if self.status_code >= 400:
            raise AssertionError(f"status {self.status_code}")

    def json(self):
        return self._json_data


class _FakeSession:
    def __init__(self, put_resp, get_resp):
        self.put_resp = put_resp
        self.get_resp = get_resp
        self.put_called = False
        self.get_called = False

    def post(self, *args, **kwargs):
        return _FakeResponse({"access_token": "dummy"})

    def put(self, *args, **kwargs):
        self.put_called = True
        return self.put_resp

    def get(self, *args, **kwargs):
        self.get_called = True
        return self.get_resp


def test_kc_update_attributes_succeeds(monkeypatch):
    sync = load_sync_module(monkeypatch)
    ok_resp = _FakeResponse({"attributes": {"mailu_app_password": ["abc"]}})
    sync.SESSION = _FakeSession(_FakeResponse({}), ok_resp)
    sync.kc_update_attributes("token", {"id": "u1", "username": "u1"}, {"mailu_app_password": "abc"})
    assert sync.SESSION.put_called and sync.SESSION.get_called


def test_kc_update_attributes_raises_without_attribute(monkeypatch):
    sync = load_sync_module(monkeypatch)
    missing_attr_resp = _FakeResponse({"attributes": {}}, status=200)
    sync.SESSION = _FakeSession(_FakeResponse({}), missing_attr_resp)
    with pytest.raises(Exception):
        sync.kc_update_attributes("token", {"id": "u1", "username": "u1"}, {"mailu_app_password": "abc"})


def test_kc_get_users_paginates(monkeypatch):
    sync = load_sync_module(monkeypatch)

    class _PagedSession:
        def __init__(self):
            self.calls = 0

        def post(self, *_, **__):
            return _FakeResponse({"access_token": "tok"})

        def get(self, *_, **__):
            self.calls += 1
            if self.calls == 1:
                return _FakeResponse([{"id": "u1"}, {"id": "u2"}])
            return _FakeResponse([])  # stop pagination

    sync.SESSION = _PagedSession()
    users = sync.kc_get_users("tok")
    assert [u["id"] for u in users] == ["u1", "u2"]
    assert sync.SESSION.calls == 2


def test_ensure_mailu_user_skips_foreign_domain(monkeypatch):
    sync = load_sync_module(monkeypatch)
    executed = []

    class _Cursor:
        def execute(self, sql, params):
            executed.append((sql, params))

    sync.ensure_mailu_user(_Cursor(), "user@other.com", "pw", "User")
    assert not executed


def test_ensure_mailu_user_upserts(monkeypatch):
    sync = load_sync_module(monkeypatch)
    captured = {}

    class _Cursor:
        def execute(self, sql, params):
            captured.update(params)

    sync.ensure_mailu_user(_Cursor(), "user@example.com", "pw", "User Example")
    assert captured["email"] == "user@example.com"
    assert captured["localpart"] == "user"
    # password should be hashed, not the raw string
    assert captured["password"] != "pw"


def test_main_generates_password_and_upserts(monkeypatch):
    sync = load_sync_module(monkeypatch)
    users = [
        {"id": "u1", "username": "user1", "email": "user1@example.com", "attributes": {}},
        {"id": "u2", "username": "user2", "email": "user2@example.com", "attributes": {"mailu_app_password": ["keepme"]}},
        {"id": "u3", "username": "user3", "email": "user3@other.com", "attributes": {}},
    ]
    updated = []

    class _Cursor:
        def __init__(self):
            self.executions = []

        def execute(self, sql, params):
            self.executions.append(params)

        def close(self):
            return None

    class _Conn:
        def __init__(self):
            self.autocommit = False
            self._cursor = _Cursor()

        def cursor(self, cursor_factory=None):
            return self._cursor

        def close(self):
            return None

    monkeypatch.setattr(sync, "get_kc_token", lambda: "tok")
    monkeypatch.setattr(sync, "kc_get_users", lambda token: users)
    monkeypatch.setattr(sync, "kc_update_attributes", lambda token, user, attrs: updated.append((user["id"], attrs["mailu_app_password"])))
    conns = []

    def _connect(**kwargs):
        conn = _Conn()
        conns.append(conn)
        return conn

    monkeypatch.setattr(sync.psycopg2, "connect", _connect)

    sync.main()

    # Should attempt two inserts (third user skipped due to domain mismatch)
    assert len(updated) == 1  # only one missing attr was backfilled
    assert conns and len(conns[0]._cursor.executions) == 2
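`scripts/mailu_sync.py` itself is not in this section, so the helpers below are a hypothetical sketch of behavior these tests pin down, not the script's actual code: `random_password` returns 24 alphanumeric characters, and user listing keeps requesting pages until one comes back empty (the paginated fake expects exactly two calls for a single non-empty page). Keycloak's admin `GET /admin/realms/{realm}/users` endpoint accepts `first` and `max` query parameters; here the illustrative callback `fetch_page(first, max)` stands in for that HTTP call.

```python
import secrets
import string
from typing import Callable, List

ALPHABET = string.ascii_letters + string.digits


def random_password(length: int = 24) -> str:
    """Alphanumeric password drawn from a CSPRNG (the `secrets` module)."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fetch_all_users(fetch_page: Callable[[int, int], List[dict]], page_size: int = 100) -> List[dict]:
    """Collect users page by page, stopping at the first empty page."""
    users: List[dict] = []
    first = 0
    while True:
        batch = fetch_page(first, page_size)
        if not batch:
            break
        users.extend(batch)
        first += len(batch)  # next offset for the `first` query parameter
    return users
```

Stopping as soon as a page is shorter than `page_size` would save one request, but stopping on an empty page is what `test_kc_get_users_paginates` asserts.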
@ -5,7 +5,7 @@ metadata:
   name: gitea-ingress
   namespace: gitea
   annotations:
-    cert-manager.io/cluster-issuer: "letsencrypt-prod"
+    cert-manager.io/cluster-issuer: letsencrypt
     nginx.ingress.kubernetes.io/ssl-redirect: "true"
 spec:
   tls:
49
services/gitops-ui/helmrelease.yaml
Normal file
@ -0,0 +1,49 @@
# services/gitops-ui/helmrelease.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: weave-gitops
  namespace: flux-system
spec:
  interval: 30m
  chart:
    spec:
      chart: ./charts/gitops-server
      sourceRef:
        kind: GitRepository
        name: weave-gitops-upstream
        namespace: flux-system
      # track upstream tag; see source object for version pin
  install:
    remediation:
      retries: 3
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true
    cleanupOnFail: true
  values:
    adminUser:
      create: true
      createClusterRole: true
      createSecret: true
      username: admin
      # bcrypt hash for temporary password "G1tOps!2025" (rotate after login)
      passwordHash: "$2y$12$wDEOzR1Gc2dbvNSJ3ZXNdOBVFEjC6YASIxnZmHIbO.W1m0fie/QVi"
    ingress:
      enabled: true
      className: traefik
      annotations:
        cert-manager.io/cluster-issuer: letsencrypt-prod
        traefik.ingress.kubernetes.io/router.entrypoints: websecure
      hosts:
        - host: cd.bstein.dev
          paths:
            - path: /
              pathType: Prefix
      tls:
        - secretName: gitops-ui-tls
          hosts:
            - cd.bstein.dev
    metrics:
      enabled: true
7
services/gitops-ui/kustomization.yaml
Normal file
@ -0,0 +1,7 @@
# services/gitops-ui/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: flux-system
resources:
  - source.yaml
  - helmrelease.yaml
11
services/gitops-ui/source.yaml
Normal file
@ -0,0 +1,11 @@
# services/gitops-ui/source.yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: weave-gitops-upstream
  namespace: flux-system
spec:
  interval: 1h
  url: https://github.com/weaveworks/weave-gitops.git
  ref:
    tag: v0.38.0
@ -5,7 +5,7 @@ metadata:
   name: jitsi
   namespace: jitsi
   annotations:
-    cert-manager.io/cluster-issuer: "letsencrypt-prod"
+    cert-manager.io/cluster-issuer: letsencrypt
 spec:
   ingressClassName: traefik
   tls:
@ -48,6 +48,20 @@ spec:
         runAsGroup: 0
         fsGroup: 1000
         fsGroupChangePolicy: OnRootMismatch
+      imagePullSecrets:
+        - name: zot-regcred
+      initContainers:
+        - name: mailu-http-listener
+          image: registry.bstein.dev/sso/mailu-http-listener:0.1.0
+          imagePullPolicy: IfNotPresent
+          command: ["/bin/sh", "-c"]
+          args:
+            - |
+              cp /plugin/mailu-http-listener-0.1.0.jar /providers/
+              cp -r /plugin/src /providers/src
+          volumeMounts:
+            - name: providers
+              mountPath: /providers
       containers:
         - name: keycloak
           image: quay.io/keycloak/keycloak:26.0.7
@ -104,6 +118,10 @@ spec:
                 secretKeyRef:
                   name: keycloak-admin
                   key: password
+            - name: KC_EVENTS_LISTENERS
+              value: jboss-logging,mailu-http
+            - name: KC_SPI_EVENTS_LISTENER_MAILU-HTTP_ENDPOINT
+              value: http://mailu-sync-listener.mailu-mailserver.svc.cluster.local:8080/events
           ports:
             - containerPort: 8080
               name: http
@ -126,7 +144,11 @@ spec:
           volumeMounts:
             - name: data
               mountPath: /opt/keycloak/data
+            - name: providers
+              mountPath: /opt/keycloak/providers
       volumes:
         - name: data
           persistentVolumeClaim:
             claimName: keycloak-data
+        - name: providers
+          emptyDir: {}
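The Keycloak changes above register a custom `mailu-http` event listener and point it at `http://mailu-sync-listener.mailu-mailserver.svc.cluster.local:8080/events`, but the listener's own code is not part of this diff. Purely as a hedged illustration, here is a minimal standard-library receiver that assumes events arrive as JSON `POST` bodies on `/events`; `EventHandler`, `EVENTS`, and `start_server` are hypothetical names, and the payload shape the plugin actually sends is not shown here, so the sketch treats it as opaque JSON.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from queue import Queue

# Received events; a real listener would hand these to the Mailu sync logic.
EVENTS: Queue = Queue()


class EventHandler(BaseHTTPRequestHandler):
    """Accept JSON POSTs on /events (path taken from the Keycloak env var above)."""

    def do_POST(self) -> None:
        if self.path != "/events":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", "0"))
        body = self.rfile.read(length) if length else b""
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            self.send_error(400, "invalid JSON")
            return
        EVENTS.put(payload)  # queue before replying, so callers see it once acked
        self.send_response(202)
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        # Silence the default per-request stderr logging.
        pass


def start_server(host: str = "0.0.0.0", port: int = 8080) -> ThreadingHTTPServer:
    """Serve in a daemon thread; call .shutdown() and .server_close() to stop."""
    server = ThreadingHTTPServer((host, port), EventHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Replying `202 Accepted` only after the event is queued keeps the Keycloak side simple: it can treat any 2xx as delivered and leave retries or batching to the consumer of the queue.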
13
services/mailu/certificate.yaml
Normal file
@ -0,0 +1,13 @@
# services/mailu/certificate.yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: mailu-tls
  namespace: mailu-mailserver
spec:
  secretName: mailu-certificates
  issuerRef:
    kind: ClusterIssuer
    name: letsencrypt-prod
  dnsNames:
    - mail.bstein.dev
287
services/mailu/helmrelease.yaml
Normal file
@ -0,0 +1,287 @@
# services/mailu/helmrelease.yaml
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: mailu
  namespace: mailu-mailserver
spec:
  interval: 30m
  chart:
    spec:
      chart: mailu
      version: 2.1.2
      sourceRef:
        kind: HelmRepository
        name: mailu
        namespace: flux-system
  install:
    remediation: { retries: 3 }
    timeout: 10m
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: true
    cleanupOnFail: true
    timeout: 10m
  values:
    mailuVersion: "2024.06"
    domain: bstein.dev
    hostnames: [mail.bstein.dev]
    domains:
      - name: bstein.dev
        enabled: true
        dkim:
          enabled: true
    externalRelay:
      host: "[email-smtp.us-east-2.amazonaws.com]:587"
      existingSecret: mailu-ses-relay
      usernameKey: relay-username
      passwordKey: relay-password
    timezone: Etc/UTC
    subnet: 10.42.0.0/16
    existingSecret: mailu-secret
    tls:
      outboundLevel: encrypt
    externalDatabase:
      enabled: true
      type: postgresql
      host: postgres-service.postgres.svc.cluster.local
      port: 5432
      database: mailu
      username: mailu
      existingSecret: mailu-db-secret
      existingSecretUsernameKey: username
      existingSecretPasswordKey: password
      existingSecretDatabaseKey: database
    initialAccount:
      enabled: true
      username: test
      domain: bstein.dev
      existingSecret: mailu-initial-account-secret
      existingSecretPasswordKey: password
    persistence:
      accessModes: [ReadWriteMany]
      size: 100Gi
      storageClass: astreae
      single_pvc: true
    front:
      hostnames: [mail.bstein.dev]
      proxied: true
      hostPort:
        enabled: false
      https:
        enabled: false
        external: false
        forceHttps: false
      externalService:
        enabled: true
        type: LoadBalancer
        externalTrafficPolicy: Cluster
        ports:
          submission: true
        nodePorts:
          pop3: 30010
          pop3s: 30011
          imap: 30143
          imaps: 30993
          manageSieve: 30419
          smtp: 30025
          smtps: 30465
          submission: 30587
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
    admin:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
      podLivenessProbe:
        enabled: true
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 6
        successThreshold: 1
      podReadinessProbe:
        enabled: true
        initialDelaySeconds: 20
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 6
        successThreshold: 1
      extraEnvVars:
        - name: FLASK_DEBUG
          value: "1"
        - name: ACCESSLOG
          value: /dev/stdout
        - name: ERRORLOG
          value: /dev/stderr
        - name: WEBROOT_REDIRECT
          value: ""
        - name: FORWARDED_ALLOW_IPS
          value: 127.0.0.1,10.42.0.0/16
        - name: DNS_RESOLVERS
          value: 1.1.1.1,9.9.9.9
      extraVolumes:
        - name: unbound-config
          configMap:
            name: mailu-unbound
        - name: unbound-run
          emptyDir: {}
      extraVolumeMounts:
        - name: unbound-run
          mountPath: /var/lib/unbound
      extraContainers:
        - name: unbound
          image: docker.io/alpine:3.20
          command: ["/bin/sh", "-c"]
          args:
            - |
              while :; do
                printf "nameserver 10.43.0.10\n" > /etc/resolv.conf
                if apk add --no-cache unbound bind-tools; then
                  break
                fi
                echo "apk failed, retrying" >&2
                sleep 10
              done
              cat >/etc/resolv.conf <<'EOF'
              search mailu-mailserver.svc.cluster.local svc.cluster.local cluster.local
              nameserver 127.0.0.1
              EOF
              unbound-anchor -a /var/lib/unbound/root.key || true
              exec unbound -d -c /opt/unbound/etc/unbound/unbound.conf
          ports:
            - containerPort: 53
              protocol: UDP
            - containerPort: 53
              protocol: TCP
          volumeMounts:
            - name: unbound-config
              mountPath: /opt/unbound/etc/unbound
            - name: unbound-run
              mountPath: /var/lib/unbound
      dnsPolicy: None
      dnsConfig:
        nameservers:
          - 127.0.0.1
        searches:
          - mailu-mailserver.svc.cluster.local
          - svc.cluster.local
          - cluster.local
    clamav:
      image:
        repository: clamav/clamav-debian
        tag: "1.4"
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi5
      resources:
        requests:
          cpu: 200m
          memory: 1Gi
        limits:
          cpu: 500m
          memory: 3Gi
      livenessProbe:
        enabled: false
        initialDelaySeconds: 300
        periodSeconds: 30
        timeoutSeconds: 5
        failureThreshold: 6
        successThreshold: 1
      startupProbe:
        enabled: false
        initialDelaySeconds: 60
        periodSeconds: 30
        timeoutSeconds: 5
        failureThreshold: 20
        successThreshold: 1
      readinessProbe:
        enabled: false
        initialDelaySeconds: 300
        periodSeconds: 30
        timeoutSeconds: 5
        failureThreshold: 6
        successThreshold: 1
    dovecot:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
    oletools:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
    postfix:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
      overrides:
        smtp_use_tls: "yes"
        smtp_tls_security_level: "encrypt"
        smtp_sasl_security_options: "noanonymous"
    redis:
      enabled: true
      architecture: standalone
      logLevel: DEBUG
      image:
        repository: bitnamilegacy/redis
        tag: 8.0.3-debian-12-r3
      master:
        nodeSelector:
          hardware: rpi4
        persistence:
          enabled: true
          accessModes: [ReadWriteMany]
          size: 8Gi
          storageClass: astreae
    rspamd:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
      persistence:
        accessModes: [ReadWriteOnce]
        size: 8Gi
        storageClass: astreae
    tika:
      logLevel: DEBUG
      nodeSelector:
        hardware: rpi4
    global:
      logLevel: DEBUG
      storageClass: astreae
    webmail:
      enabled: false
      nodeSelector:
        hardware: rpi4
    ingress:
      enabled: false
      ingressClassName: traefik
      tls: true
      existingSecret: mailu-certificates
      annotations:
        traefik.ingress.kubernetes.io/router.entrypoints: websecure
        traefik.ingress.kubernetes.io/service.serversscheme: https
        traefik.ingress.kubernetes.io/service.serverstransport: mailu-transport@kubernetescrd
      extraRules:
        - host: mail.bstein.dev
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: mailu-front
                    port:
                      number: 443
    service:
      ports:
        smtp:
          port: 25
          targetPort: 25
        smtps:
          port: 465
          targetPort: 465
        submission:
          port: 587
          targetPort: 587
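The front `externalService` pins eight NodePorts by hand. A small sketch for checking such a map before committing: every port should fall inside the Kubernetes default NodePort range (30000-32767, set by the API server's `--service-node-port-range` flag; a cluster may configure a different range) and no two services may share a port. `check_node_ports` is a hypothetical helper, and whether the Mailu chart honors every key under `nodePorts` is not verified here.

```python
from typing import Dict, List

# Kubernetes default for kube-apiserver --service-node-port-range (inclusive).
NODE_PORT_MIN, NODE_PORT_MAX = 30000, 32767

# Values copied from front.externalService.nodePorts above.
MAILU_FRONT_NODE_PORTS = {
    "pop3": 30010,
    "pop3s": 30011,
    "imap": 30143,
    "imaps": 30993,
    "manageSieve": 30419,
    "smtp": 30025,
    "smtps": 30465,
    "submission": 30587,
}


def check_node_ports(node_ports: Dict[str, int]) -> List[str]:
    """Return a list of problems; an empty list means the map looks valid."""
    problems: List[str] = []
    owners: Dict[int, str] = {}
    for name, port in node_ports.items():
        if not NODE_PORT_MIN <= port <= NODE_PORT_MAX:
            problems.append(f"{name}: {port} outside {NODE_PORT_MIN}-{NODE_PORT_MAX}")
        if port in owners:
            problems.append(f"{name}: {port} duplicates {owners[port]}")
        else:
            owners[port] = name
    return problems
```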
19
services/mailu/ingressroute.yaml
Normal file
@ -0,0 +1,19 @@
# services/mailu/ingressroute.yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: mailu
  namespace: mailu-mailserver
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`mail.bstein.dev`)
      kind: Rule
      services:
        - name: mailu-front
          port: 443
          scheme: https
          serversTransport: mailu-transport
  tls:
    secretName: mailu-certificates
23
services/mailu/kustomization.yaml
Normal file
@ -0,0 +1,23 @@
# services/mailu/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: mailu-mailserver
resources:
  - namespace.yaml
  - helmrelease.yaml
  - certificate.yaml
  - vip-controller.yaml
  - unbound-configmap.yaml
  - serverstransport.yaml
  - ingressroute.yaml
  - mailu-sync-job.yaml
  - mailu-sync-cronjob.yaml
  - mailu-sync-listener.yaml

configMapGenerator:
  - name: mailu-sync-script
    namespace: mailu-mailserver
    files:
      - sync.py=../../scripts/mailu_sync.py
    options:
      disableNameSuffixHash: true
77
services/mailu/mailu-sync-cronjob.yaml
Normal file
@ -0,0 +1,77 @@
# services/mailu/mailu-sync-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mailu-sync-nightly
  namespace: mailu-mailserver
spec:
  schedule: "30 4 * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mailu-sync
              image: python:3.11-alpine
              imagePullPolicy: IfNotPresent
              command: ["/bin/sh", "-c"]
              args:
                - |
                  pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
                    && python /app/sync.py
              env:
                - name: KEYCLOAK_BASE_URL
                  value: http://keycloak.sso.svc.cluster.local
                - name: KEYCLOAK_REALM
                  value: atlas
                - name: MAILU_DOMAIN
                  value: bstein.dev
                - name: MAILU_DEFAULT_QUOTA
                  value: "20000000000"
                - name: MAILU_DB_HOST
                  value: postgres-service.postgres.svc.cluster.local
                - name: MAILU_DB_PORT
                  value: "5432"
                - name: MAILU_DB_NAME
                  valueFrom:
                    secretKeyRef:
                      name: mailu-db-secret
                      key: database
                - name: MAILU_DB_USER
                  valueFrom:
                    secretKeyRef:
                      name: mailu-db-secret
                      key: username
                - name: MAILU_DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: mailu-db-secret
                      key: password
                - name: KEYCLOAK_CLIENT_ID
                  valueFrom:
                    secretKeyRef:
                      name: mailu-sync-credentials
                      key: client-id
                - name: KEYCLOAK_CLIENT_SECRET
                  valueFrom:
                    secretKeyRef:
                      name: mailu-sync-credentials
                      key: client-secret
              volumeMounts:
                - name: sync-script
                  mountPath: /app/sync.py
                  subPath: sync.py
              resources:
                requests:
                  cpu: 50m
                  memory: 128Mi
                limits:
                  cpu: 200m
                  memory: 256Mi
          volumes:
            - name: sync-script
              configMap:
                name: mailu-sync-script
                defaultMode: 0444
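The schedule `"30 4 * * *"` fires once a day at 04:30. CronJob schedules are evaluated in the kube-controller-manager's local time zone unless `spec.timeZone` is set, which this manifest does not do. The sketch below computes the next run for such a fixed daily schedule, assuming UTC inputs; it is deliberately not a general cron parser, and `next_daily_run` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 4, minute: int = 30) -> datetime:
    """Next firing of a 'minute hour * * *' schedule strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed (or is exactly now): use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```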
73
services/mailu/mailu-sync-job.yaml
Normal file
@ -0,0 +1,73 @@
# services/mailu/mailu-sync-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mailu-sync
  namespace: mailu-mailserver
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: mailu-sync
          image: python:3.11-alpine
          imagePullPolicy: IfNotPresent
          command: ["/bin/sh", "-c"]
          args:
            - |
              pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
                && python /app/sync.py
          env:
            - name: KEYCLOAK_BASE_URL
              value: http://keycloak.sso.svc.cluster.local
            - name: KEYCLOAK_REALM
              value: atlas
            - name: MAILU_DOMAIN
              value: bstein.dev
            - name: MAILU_DEFAULT_QUOTA
              value: "20000000000"
            - name: MAILU_DB_HOST
              value: postgres-service.postgres.svc.cluster.local
            - name: MAILU_DB_PORT
              value: "5432"
            - name: MAILU_DB_NAME
              valueFrom:
                secretKeyRef:
                  name: mailu-db-secret
                  key: database
            - name: MAILU_DB_USER
              valueFrom:
                secretKeyRef:
                  name: mailu-db-secret
                  key: username
            - name: MAILU_DB_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mailu-db-secret
                  key: password
            - name: KEYCLOAK_CLIENT_ID
              valueFrom:
                secretKeyRef:
                  name: mailu-sync-credentials
                  key: client-id
            - name: KEYCLOAK_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: mailu-sync-credentials
                  key: client-secret
          volumeMounts:
            - name: sync-script
              mountPath: /app/sync.py
              subPath: sync.py
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 200m
              memory: 256Mi
      volumes:
        - name: sync-script
          configMap:
            name: mailu-sync-script
            defaultMode: 0444
154
services/mailu/mailu-sync-listener.yaml
Normal file
@ -0,0 +1,154 @@
|
|||||||
|
# services/mailu/mailu-sync-listener.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Service
|
||||||
|
metadata:
|
||||||
|
name: mailu-sync-listener
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
app: mailu-sync-listener
|
||||||
|
ports:
|
||||||
|
- name: http
|
||||||
|
port: 8080
|
||||||
|
targetPort: 8080
|
||||||
|
---
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: Deployment
|
||||||
|
metadata:
|
||||||
|
name: mailu-sync-listener
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
labels:
|
||||||
|
app: mailu-sync-listener
|
||||||
|
spec:
|
||||||
|
replicas: 1
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: mailu-sync-listener
|
||||||
|
template:
|
||||||
|
metadata:
|
||||||
|
labels:
|
||||||
|
app: mailu-sync-listener
|
||||||
|
spec:
|
||||||
|
restartPolicy: Always
|
||||||
|
containers:
|
||||||
|
- name: listener
|
||||||
|
image: python:3.11-alpine
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
command: ["/bin/sh", "-c"]
|
||||||
|
args:
|
||||||
|
- |
|
||||||
|
pip install --no-cache-dir requests psycopg2-binary passlib >/tmp/pip.log \
|
||||||
|
&& python /app/listener.py
|
||||||
|
env:
|
||||||
|
- name: KEYCLOAK_BASE_URL
|
||||||
|
value: http://keycloak.sso.svc.cluster.local
|
||||||
|
- name: KEYCLOAK_REALM
|
||||||
|
value: atlas
|
||||||
|
- name: MAILU_DOMAIN
|
||||||
|
value: bstein.dev
|
||||||
|
- name: MAILU_DEFAULT_QUOTA
|
||||||
|
value: "20000000000"
|
||||||
|
- name: MAILU_DB_HOST
|
||||||
|
value: postgres-service.postgres.svc.cluster.local
|
||||||
|
- name: MAILU_DB_PORT
|
||||||
|
value: "5432"
|
||||||
|
- name: MAILU_DB_NAME
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: mailu-db-secret
|
||||||
|
key: database
|
||||||
|
- name: MAILU_DB_USER
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: mailu-db-secret
|
||||||
|
key: username
|
||||||
|
- name: MAILU_DB_PASSWORD
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: mailu-db-secret
|
||||||
|
key: password
|
||||||
|
- name: KEYCLOAK_CLIENT_ID
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: mailu-sync-credentials
|
||||||
|
key: client-id
|
||||||
|
- name: KEYCLOAK_CLIENT_SECRET
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: mailu-sync-credentials
|
||||||
|
key: client-secret
|
||||||
|
volumeMounts:
|
||||||
|
- name: sync-script
|
||||||
|
mountPath: /app/sync.py
|
||||||
|
subPath: sync.py
|
||||||
|
- name: listener-script
|
||||||
|
mountPath: /app/listener.py
|
||||||
|
subPath: listener.py
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 50m
|
||||||
|
memory: 128Mi
|
||||||
|
limits:
|
||||||
|
cpu: 200m
|
||||||
|
memory: 256Mi
|
||||||
|
volumes:
|
||||||
|
- name: sync-script
|
||||||
|
configMap:
|
||||||
|
name: mailu-sync-script
|
||||||
|
defaultMode: 0444
|
||||||
|
- name: listener-script
|
||||||
|
configMap:
|
||||||
|
name: mailu-sync-listener
|
||||||
|
defaultMode: 0444
|
||||||
|
---
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ConfigMap
|
||||||
|
metadata:
|
||||||
|
name: mailu-sync-listener
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
data:
|
||||||
|
listener.py: |
|
||||||
|
import http.server
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import subprocess
|
||||||
|
import threading
|
||||||
|
|
||||||
|
from time import time
|
||||||
|
|
||||||
|
# Simple debounce to avoid hammering on bursts
|
||||||
|
MIN_INTERVAL_SECONDS = 10
|
||||||
|
last_run = 0.0
|
||||||
|
lock = threading.Lock()
|
||||||
|
|
||||||
|
def trigger_sync():
|
||||||
|
global last_run
|
||||||
|
with lock:
|
||||||
|
now = time()
|
||||||
|
if now - last_run < MIN_INTERVAL_SECONDS:
|
||||||
|
return
|
||||||
|
last_run = now
|
||||||
|
# Fire and forget; the child inherits stdout/stderr, so sync output lands in the pod log
|
||||||
|
subprocess.Popen(["python", "/app/sync.py"])
|
||||||
|
|
||||||
|
class Handler(http.server.BaseHTTPRequestHandler):
|
||||||
|
def do_POST(self):
|
||||||
|
length = int(self.headers.get("Content-Length", 0))
|
||||||
|
body = self.rfile.read(length) if length else b""
|
||||||
|
try:
|
||||||
|
json.loads(body or b"{}")
|
||||||
|
except json.JSONDecodeError:
|
||||||
|
self.send_response(400)
|
||||||
|
self.end_headers()
|
||||||
|
return
|
||||||
|
trigger_sync()
|
||||||
|
self.send_response(202)
|
||||||
|
self.end_headers()
|
||||||
|
|
||||||
|
def log_message(self, fmt, *args):
|
||||||
|
# Quiet logging
|
||||||
|
return
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
server = http.server.ThreadingHTTPServer(("", 8080), Handler)
|
||||||
|
server.serve_forever()
|
||||||
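The listener's `trigger_sync()` drops any POST that arrives within `MIN_INTERVAL_SECONDS` of the last accepted one instead of queueing it. As a minimal sketch, the same gate can be isolated in a small class with an injectable clock so the rule is testable without HTTP or subprocesses. The `Debouncer` name and the `clock` parameter are hypothetical and not part of the manifest.

```python
import threading
from typing import Callable


class Debouncer:
    """Hypothetical stand-in for the gate inside the listener's trigger_sync().

    should_run() returns True (and records the time) when a sync may start,
    and False for calls inside the minimum interval. Suppressed calls are
    dropped, not deferred, matching the listener's behaviour.
    """

    def __init__(self, min_interval: float, clock: Callable[[], float]):
        self.min_interval = min_interval
        self.clock = clock
        self.last_run = 0.0
        self.lock = threading.Lock()  # the HTTP server is threaded, so guard state

    def should_run(self) -> bool:
        with self.lock:
            now = self.clock()
            if now - self.last_run < self.min_interval:
                return False
            self.last_run = now
            return True


if __name__ == "__main__":
    ticks = iter([100.0, 105.0, 111.0])
    gate = Debouncer(10, clock=lambda: next(ticks))
    print([gate.should_run() for _ in range(3)])  # [True, False, True]
```

Because suppressed calls are discarded, a Keycloak event that lands inside the window is only reflected once a later trigger or the `mailu-sync` Job runs.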
5
services/mailu/namespace.yaml
Normal file
@ -0,0 +1,5 @@
|
|||||||
|
# services/mailu/namespace.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Namespace
|
||||||
|
metadata:
|
||||||
|
name: mailu-mailserver
|
||||||
10
services/mailu/serverstransport.yaml
Normal file
@ -0,0 +1,10 @@
|
|||||||
|
# services/mailu/serverstransport.yaml
|
||||||
|
apiVersion: traefik.io/v1alpha1
|
||||||
|
kind: ServersTransport
|
||||||
|
metadata:
|
||||||
|
name: mailu-transport
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
spec:
|
||||||
|
# Force SNI to mail.bstein.dev and skip backend cert verification (backend cert is for the host, not the pod IP).
|
||||||
|
serverName: mail.bstein.dev
|
||||||
|
insecureSkipVerify: true
|
||||||
49
services/mailu/unbound-configmap.yaml
Normal file
@ -0,0 +1,49 @@
|
|||||||
|
# services/mailu/unbound-configmap.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ConfigMap
|
||||||
|
metadata:
|
||||||
|
name: mailu-unbound
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
data:
|
||||||
|
unbound.conf: |
|
||||||
|
server:
|
||||||
|
verbosity: 1
|
||||||
|
interface: 0.0.0.0
|
||||||
|
do-ip4: yes
|
||||||
|
do-ip6: no
|
||||||
|
do-udp: yes
|
||||||
|
do-tcp: yes
|
||||||
|
auto-trust-anchor-file: "/var/lib/unbound/root.key"
|
||||||
|
prefetch: yes
|
||||||
|
qname-minimisation: yes
|
||||||
|
harden-dnssec-stripped: yes
|
||||||
|
val-clean-additional: yes
|
||||||
|
domain-insecure: "mailu-mailserver.svc.cluster.local."
|
||||||
|
domain-insecure: "svc.cluster.local."
|
||||||
|
domain-insecure: "cluster.local."
|
||||||
|
cache-min-ttl: 120
|
||||||
|
cache-max-ttl: 86400
|
||||||
|
access-control: 0.0.0.0/0 allow
|
||||||
|
|
||||||
|
forward-zone:
|
||||||
|
name: "mailu-mailserver.svc.cluster.local."
|
||||||
|
forward-addr: 10.43.0.10
|
||||||
|
forward-no-cache: yes
|
||||||
|
forward-first: yes
|
||||||
|
|
||||||
|
forward-zone:
|
||||||
|
name: "svc.cluster.local."
|
||||||
|
forward-addr: 10.43.0.10
|
||||||
|
forward-no-cache: yes
|
||||||
|
forward-first: yes
|
||||||
|
|
||||||
|
forward-zone:
|
||||||
|
name: "cluster.local."
|
||||||
|
forward-addr: 10.43.0.10
|
||||||
|
forward-no-cache: yes
|
||||||
|
forward-first: yes
|
||||||
|
|
||||||
|
forward-zone:
|
||||||
|
name: "."
|
||||||
|
forward-addr: 9.9.9.9
|
||||||
|
forward-addr: 1.1.1.1
|
||||||
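With several `forward-zone` blocks, Unbound forwards a query using the most specific zone that encloses the query name, so `*.svc.cluster.local` lookups go to `10.43.0.10` (which looks like the cluster DNS Service IP; that is an assumption) while everything else goes to Quad9 and Cloudflare. The sketch below models only that longest-suffix selection. It ignores caching, `forward-first` fallback to recursion, DNSSEC and every other Unbound option, and `select_forward_zone` is a hypothetical helper.

```python
# Zones and upstreams copied from the unbound.conf above.
ZONES = {
    "mailu-mailserver.svc.cluster.local.": ["10.43.0.10"],
    "svc.cluster.local.": ["10.43.0.10"],
    "cluster.local.": ["10.43.0.10"],
    ".": ["9.9.9.9", "1.1.1.1"],
}


def select_forward_zone(qname: str, zones: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Return the most specific zone enclosing qname, plus its upstreams.

    Zones only match on label boundaries ("notcluster.local." is not inside
    "cluster.local."). Among matching zones the longest one wins, which is
    the closest enclosing zone because all matches are suffixes of qname.
    """
    name = qname.lower().rstrip(".") + "."
    best = None
    for zone in zones:
        z = zone.lower()
        matches = z == "." or name == z or name.endswith("." + z)
        if matches and (best is None or len(z) > len(best)):
            best = zone
    if best is None:
        raise LookupError(f"no forward-zone covers {qname}")
    return best, zones[best]


if __name__ == "__main__":
    print(select_forward_zone("keycloak.sso.svc.cluster.local", ZONES))
    print(select_forward_zone("gmail.com", ZONES))
```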
71
services/mailu/vip-controller.yaml
Normal file
@ -0,0 +1,71 @@
|
|||||||
|
# services/mailu/vip-controller.yaml
|
||||||
|
---
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ServiceAccount
|
||||||
|
metadata:
|
||||||
|
name: vip-controller
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
---
|
||||||
|
apiVersion: rbac.authorization.k8s.io/v1
|
||||||
|
kind: Role
|
||||||
|
metadata:
|
||||||
|
name: vip-controller-role
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
rules:
|
||||||
|
- apiGroups: ["apps"]
|
||||||
|
resources: ["deployments"]
|
||||||
|
verbs: ["get", "list", "patch", "update"]
|
||||||
|
---
|
||||||
|
apiVersion: rbac.authorization.k8s.io/v1
|
||||||
|
kind: RoleBinding
|
||||||
|
metadata:
|
||||||
|
name: vip-controller-binding
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
roleRef:
|
||||||
|
apiGroup: rbac.authorization.k8s.io
|
||||||
|
kind: Role
|
||||||
|
name: vip-controller-role
|
||||||
|
subjects:
|
||||||
|
- kind: ServiceAccount
|
||||||
|
name: vip-controller
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
---
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: DaemonSet
|
||||||
|
metadata:
|
||||||
|
name: vip-controller
|
||||||
|
namespace: mailu-mailserver
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: vip-controller
|
||||||
|
template:
|
||||||
|
metadata:
|
||||||
|
labels:
|
||||||
|
app: vip-controller
|
||||||
|
spec:
|
||||||
|
serviceAccountName: vip-controller
|
||||||
|
hostNetwork: true
|
||||||
|
nodeSelector:
|
||||||
|
mailu.bstein.dev/vip: "true"
|
||||||
|
containers:
|
||||||
|
- name: vip-controller
|
||||||
|
image: lachlanevenson/k8s-kubectl:latest
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
command:
|
||||||
|
- /bin/sh
|
||||||
|
- -c
|
||||||
|
args:
|
||||||
|
- |
|
||||||
|
set -e
|
||||||
|
while true; do
|
||||||
|
if ip addr show end0 | grep -q 'inet 192\.168\.22\.9/32'; then
|
||||||
|
NODE=$(hostname)
|
||||||
|
echo "VIP found on node ${NODE}."
|
||||||
|
kubectl patch deployment mailu-front -n mailu-mailserver --type='merge' \
|
||||||
|
-p "{\"spec\":{\"template\":{\"spec\":{\"nodeSelector\":{\"kubernetes.io/hostname\":\"${NODE}\"}}}}}"
|
||||||
|
else
|
||||||
|
echo "No VIP on node ${HOSTNAME}."
|
||||||
|
fi
|
||||||
|
sleep 60
|
||||||
|
done
|
||||||
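The DaemonSet's loop does two things: it checks whether the `192.168.22.9/32` VIP is bound to `end0` on its node, and if so it sends a JSON merge patch pinning `mailu-front` to that node. A minimal sketch of both steps in Python, building the patch with `json.dumps` instead of hand-escaped shell quoting; `has_vip`, `front_pin_patch` and the node name `node-a` are hypothetical, and the sample `ip addr` output is illustrative only.

```python
import json


def has_vip(ip_addr_output: str, cidr: str = "192.168.22.9/32") -> bool:
    """Return True if `ip addr show` output lists `cidr` on an "inet" line.

    Same check as the script's grep, but comparing whole whitespace-separated
    fields rather than matching a regular expression.
    """
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet" and fields[1] == cidr:
            return True
    return False


def front_pin_patch(node: str) -> str:
    """Build the merge patch that pins mailu-front to `node`."""
    patch = {"spec": {"template": {"spec": {"nodeSelector": {"kubernetes.io/hostname": node}}}}}
    # Compact separators give the same string the shell script assembles by hand.
    return json.dumps(patch, separators=(",", ":"))


if __name__ == "__main__":
    sample = (
        "2: end0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500\n"
        "    inet 192.168.22.21/24 brd 192.168.22.255 scope global end0\n"
        "    inet 192.168.22.9/32 scope global end0\n"
    )
    if has_vip(sample):
        print(front_pin_patch("node-a"))
```

A merge patch only sets the keys it names, and re-sending an unchanged `nodeSelector` leaves the pod template unchanged, so the once-a-minute patch should not roll `mailu-front` unless the VIP actually moves.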
@ -1,28 +0,0 @@
|
|||||||
# services/monitoring
|
|
||||||
|
|
||||||
## Grafana admin secret
|
|
||||||
|
|
||||||
The Grafana Helm release expects a pre-existing secret named `grafana-admin`
|
|
||||||
in the `monitoring` namespace. Create or rotate it with:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
kubectl create secret generic grafana-admin \
|
|
||||||
--namespace monitoring \
|
|
||||||
--from-literal=admin-user=admin \
|
|
||||||
--from-literal=admin-password='REPLACE_ME'
|
|
||||||
```
|
|
||||||
|
|
||||||
Update the password whenever you rotate credentials.
|
|
||||||
|
|
||||||
## DCGM exporter image
|
|
||||||
|
|
||||||
The NVIDIA GPU metrics DaemonSet expects `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`, mirrored from `docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`. Refresh it in Zot when bumping versions:
|
|
||||||
|
|
||||||
```bash
|
|
||||||
skopeo copy \
|
|
||||||
--all \
|
|
||||||
docker://docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04 \
|
|
||||||
docker://registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04
|
|
||||||
```
|
|
||||||
|
|
||||||
When finished mirroring from the control-plane, you can remove temporary tooling with `sudo apt-get purge -y skopeo && sudo apt-get autoremove -y` and clear `~/.config/containers/auth.json`.
|
|
||||||
@ -40,9 +40,7 @@
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -153,12 +151,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "percent"
|
"unit": "percent",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
|
|||||||
@ -7,46 +7,55 @@
|
|||||||
{
|
{
|
||||||
"id": 1,
|
"id": 1,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Ingress Traffic",
|
"title": "Ingress Success Rate (5m)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 6,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
"expr": "(sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[5m]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[5m])), 1)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "rgba(115, 115, 115, 1)",
|
"color": "red",
|
||||||
"value": null
|
"value": null
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 0.995
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 0.999
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
"value": 1
|
"value": 0.9995
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "Bps",
|
"unit": "percentunit",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
},
|
||||||
|
"decimals": 2
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
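The new "Ingress Success Rate (5m)" stat divides the non-5xx request rate by `clamp_min(total, 1)`. The clamp avoids division by zero, but it also means that whenever total edge traffic is below 1 req/s the ratio reads low even with zero errors. The sketch below reproduces that arithmetic and the panel's threshold steps in plain Python; `success_ratio`, `threshold_color` and the `floor` parameter are illustrative names, and the step values are copied from the panel JSON.

```python
# Threshold steps from the panel: base colour first, then (value, colour) pairs.
STEPS = [(None, "red"), (0.995, "orange"), (0.999, "yellow"), (0.9995, "green")]


def success_ratio(ok_rate: float, total_rate: float, floor: float = 1.0) -> float:
    """Mirror of sum(ok) / clamp_min(sum(total), floor) from the panel query."""
    return ok_rate / max(total_rate, floor)


def threshold_color(value: float, steps=STEPS) -> str:
    """Grafana-style absolute thresholds: the highest step the value reaches wins."""
    color = steps[0][1]
    for limit, step_color in steps[1:]:
        if value >= limit:
            color = step_color
    return color


if __name__ == "__main__":
    # A quiet edge serving 0.5 req/s with no errors at all:
    print(success_ratio(0.5, 0.5), threshold_color(success_ratio(0.5, 0.5)))
    # The same traffic with a tiny floor instead of 1:
    print(success_ratio(0.5, 0.5, floor=1e-9))
```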
@ -67,46 +76,55 @@
|
|||||||
{
|
{
|
||||||
"id": 2,
|
"id": 2,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Egress Traffic",
|
"title": "Error Budget Burn (1h)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 6,
|
||||||
"x": 8,
|
"x": 6,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
"expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[1h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[1h])), 1))) / 0.001",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "rgba(115, 115, 115, 1)",
|
"color": "green",
|
||||||
"value": null
|
"value": null
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "yellow",
|
||||||
"value": 1
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 4
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "Bps",
|
"unit": "none",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
},
|
||||||
|
"decimals": 2
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
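The two "Error Budget Burn" stats divide the observed error ratio by the error budget of what appears to be a 99.9% availability target (the divisor is `1 - 0.999`). A burn rate of 1 means the budget is being spent exactly as fast as the target allows; the panels turn yellow at 1, orange at 2 and red at 4. A minimal sketch of that arithmetic, with the SLO as an assumed parameter and `burn_rate` a hypothetical helper:

```python
import math


def burn_rate(success_ratio: float, slo: float = 0.999) -> float:
    """Error ratio divided by the error budget (1 - SLO).

    Note that 1.0 - 0.999 is not exactly 0.001 in binary floating point,
    so comparisons on the result should use a tolerance.
    """
    error_budget = 1.0 - slo
    return (1.0 - success_ratio) / error_budget


if __name__ == "__main__":
    # 99.8% success against a 99.9% target burns the budget twice as fast.
    print(round(burn_rate(0.998), 6))  # 2.0
```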
@ -127,7 +145,145 @@
|
|||||||
{
|
{
|
||||||
"id": 3,
|
"id": 3,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Intra-Cluster Traffic",
|
"title": "Error Budget Burn (6h)",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 6,
|
||||||
|
"x": 12,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[6h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[6h])), 1))) / 0.001",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 4
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "none",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 2
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 4,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Edge P99 Latency (ms)",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 6,
|
||||||
|
"x": 18,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "histogram_quantile(0.99, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 200
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 350
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 500
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "ms",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 1
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 5,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Ingress Traffic",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
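The "Edge P99 Latency (ms)" panel above feeds per-`le` bucket rates into `histogram_quantile(0.99, ...)` and scales seconds to milliseconds. PromQL only sees cumulative bucket counts, so it finds the bucket holding the 99th-percentile rank and interpolates linearly inside it; the reported value is therefore only as precise as the bucket layout Traefik exports. A simplified model, assuming positive bucket bounds and a trailing `+Inf` bucket, and skipping PromQL's handling of NaN, non-monotonic buckets and q outside [0, 1]:

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Simplified model of PromQL histogram_quantile over cumulative buckets.

    buckets: (upper_bound, cumulative_count) pairs sorted by upper_bound and
    ending with (math.inf, total). The first bucket's lower bound is taken
    as 0; values that fall in the +Inf bucket report the highest finite bound.
    """
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("need a trailing +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                return prev_bound
            if count == prev_count:
                return bound
            # Linear interpolation between the bucket's lower and upper bound.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    b = [(0.1, 50.0), (0.5, 90.0), (1.0, 100.0), (math.inf, 100.0)]
    print(histogram_quantile(0.99, b) * 1000)  # P99 in ms, roughly 950
```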
@ -135,19 +291,19 @@
|
|||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 8,
|
||||||
"x": 16,
|
"x": 0,
|
||||||
"y": 0
|
"y": 4
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
|
"expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -185,9 +341,9 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 4,
|
"id": 6,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Top Router req/s",
|
"title": "Egress Traffic",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
@ -195,20 +351,19 @@
|
|||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 8,
|
||||||
"x": 0,
|
"x": 8,
|
||||||
"y": 4
|
"y": 4
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "topk(1, sum by (router) (rate(traefik_router_requests_total[5m])))",
|
"expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
||||||
"refId": "A",
|
"refId": "A"
|
||||||
"legendFormat": "{{router}}"
|
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -224,7 +379,7 @@
|
|||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "req/s",
|
"unit": "Bps",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
}
|
||||||
@ -246,7 +401,67 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 5,
|
"id": 7,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Intra-Cluster Traffic",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 16,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "rgba(115, 115, 115, 1)",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": 1
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 8,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Per-Node Throughput",
|
"title": "Per-Node Throughput",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -283,7 +498,7 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 6,
|
"id": 9,
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"title": "Top Namespaces",
|
"title": "Top Namespaces",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -304,12 +519,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "Bps"
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -319,7 +538,7 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 7,
|
"id": 10,
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"title": "Top Pods",
|
"title": "Top Pods",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -340,12 +559,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "Bps"
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -355,7 +578,7 @@
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 8,
|
"id": 11,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Traefik Routers (req/s)",
|
"title": "Traefik Routers (req/s)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -392,7 +615,7 @@
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 9,
|
"id": 12,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Traefik Entrypoints (req/s)",
|
"title": "Traefik Entrypoints (req/s)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
|||||||
@ -27,7 +27,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -88,7 +88,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -149,7 +149,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -186,6 +186,213 @@
|
|||||||
"textMode": "value"
|
"textMode": "value"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": 9,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "API Server 5xx rate",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 0,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(apiserver_request_total{code=~\"5..\"}[5m]))",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 0.05
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 0.2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 0.5
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "req/s",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 3
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 10,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "API Server P99 latency",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 8,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))) * 1000",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
+ "color": {
+ "mode": "thresholds"
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green",
+ "value": null
+ },
+ {
+ "color": "yellow",
+ "value": 250
+ },
+ {
+ "color": "orange",
+ "value": 400
+ },
+ {
+ "color": "red",
+ "value": 600
+ }
+ ]
+ },
+ "unit": "ms",
+ "custom": {
+ "displayMode": "auto"
+ },
+ "decimals": 1
+ },
+ "overrides": []
+ },
+ "options": {
+ "colorMode": "value",
+ "graphMode": "area",
+ "justifyMode": "center",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "textMode": "value"
+ }
+ },
+ {
+ "id": 11,
+ "type": "stat",
+ "title": "etcd P99 latency",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 4,
+ "w": 8,
+ "x": 16,
+ "y": 4
+ },
+ "targets": [
+ {
+ "expr": "histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000",
+ "refId": "A"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green",
+ "value": null
+ },
+ {
+ "color": "yellow",
+ "value": 50
+ },
+ {
+ "color": "orange",
+ "value": 100
+ },
+ {
+ "color": "red",
+ "value": 200
+ }
+ ]
+ },
+ "unit": "ms",
+ "custom": {
+ "displayMode": "auto"
+ },
+ "decimals": 1
+ },
+ "overrides": []
+ },
+ "options": {
+ "colorMode": "value",
+ "graphMode": "area",
+ "justifyMode": "center",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "textMode": "value"
+ }
+ },
||||||
{
|
{
|
||||||
"id": 4,
|
"id": 4,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
@ -198,7 +405,7 @@
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 4
|
"y": 8
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -238,7 +445,7 @@
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 13
|
"y": 17
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -278,7 +485,7 @@
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 22
|
"y": 26
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -315,7 +522,7 @@
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 12,
|
"x": 12,
|
||||||
"y": 22
|
"y": 26
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -352,7 +559,7 @@
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 31
|
"y": 35
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
|
|||||||
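The etcd P99 latency panel computes `histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000`. It sums bucket rates across instances for each `le` bound, interpolates the 99th percentile, and scales the result from seconds to milliseconds to match the panel's `ms` unit and its 50/100/200 ms thresholds. The sketch below is a simplified Python model of the linear interpolation Prometheus documents for `histogram_quantile`. The bucket counts are made-up illustration data, and edge cases such as negative bucket bounds are ignored.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    `buckets` must be sorted by upper bound and end with math.inf, like the
    `le` series of a Prometheus histogram after `sum by (le)`.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("last bucket must be le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # Rank lands in the +Inf bucket: fall back to the highest finite bound.
                return prev_bound
            # Linear interpolation inside the bucket that contains the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count


# Hypothetical cumulative counts per bucket (bounds in seconds), not real cluster data.
etcd_buckets = [(0.025, 50), (0.05, 90), (0.1, 99), (0.2, 100), (math.inf, 100)]
p99_ms = histogram_quantile(0.99, etcd_buckets) * 1000
print(f"etcd P99 = {p99_ms:.1f} ms")
```

Summing by `le` before taking the quantile matters: averaging per-instance quantiles afterwards would not give a valid cluster-wide percentile.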
@ -7,67 +7,6 @@
|
|||||||
"list": []
|
"list": []
|
||||||
},
|
},
|
||||||
"panels": [
|
"panels": [
|
||||||
- {
- "id": 1,
- "type": "gauge",
- "title": "Workers Ready",
- "datasource": {
- "type": "prometheus",
- "uid": "atlas-vm"
- },
- "gridPos": {
- "h": 5,
- "w": 5,
- "x": 0,
- "y": 0
- },
- "targets": [
- {
- "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
- "refId": "A"
- }
- ],
- "fieldConfig": {
- "defaults": {
- "min": 0,
- "max": 18,
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "red",
- "value": null
- },
- {
- "color": "orange",
- "value": 16
- },
- {
- "color": "yellow",
- "value": 17
- },
- {
- "color": "green",
- "value": 18
- }
- ]
- }
- },
- "overrides": []
- },
- "options": {
- "reduceOptions": {
- "calcs": [
- "lastNotNull"
- ],
- "fields": "",
- "values": false
- },
- "orientation": "auto",
- "showThresholdMarkers": false,
- "showThresholdLabels": false
- }
- },
||||||
{
|
{
|
||||||
"id": 2,
|
"id": 2,
|
||||||
"type": "gauge",
|
"type": "gauge",
|
||||||
@ -78,8 +17,8 @@
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 5,
|
"w": 4,
|
||||||
"x": 5,
|
"x": 0,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
@ -131,8 +70,8 @@
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 5,
|
"w": 3,
|
||||||
"x": 10,
|
"x": 4,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
@ -144,82 +83,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
- },
- "mappings": [],
- "thresholds": {
- "mode": "absolute",
- "steps": [
- {
- "color": "green",
- "value": null
- },
- {
- "color": "yellow",
- "value": 1
- },
- {
- "color": "orange",
- "value": 2
- },
- {
- "color": "red",
- "value": 3
- }
- ]
- },
- "unit": "none",
- "custom": {
- "displayMode": "auto"
- }
- },
- "overrides": []
- },
- "options": {
- "colorMode": "value",
- "graphMode": "area",
- "justifyMode": "center",
- "reduceOptions": {
- "calcs": [
- "lastNotNull"
- ],
- "fields": "",
- "values": false
- },
- "textMode": "value"
- },
- "links": [
- {
- "title": "Open atlas-pods dashboard",
- "url": "/d/atlas-pods",
- "targetBlank": true
- }
- ]
- },
- {
- "id": 4,
- "type": "stat",
- "title": "Problem Pods",
- "datasource": {
- "type": "prometheus",
- "uid": "atlas-vm"
- },
- "gridPos": {
- "h": 5,
- "w": 5,
- "x": 15,
- "y": 0
- },
- "targets": [
- {
- "expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"}))",
- "refId": "A"
- }
- ],
- "fieldConfig": {
- "defaults": {
- "color": {
- "mode": "palette-classic"
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -281,20 +145,20 @@
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 4,
|
"w": 3,
|
||||||
"x": 20,
|
"x": 7,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)))",
|
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -346,6 +210,286 @@
|
|||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
+ {
+ "id": 27,
+ "type": "stat",
+ "title": "Atlas Availability (30d)",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 5,
+ "w": 4,
+ "x": 10,
+ "y": 0
+ },
+ "targets": [
+ {
+ "expr": "avg_over_time((min(((sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-0a|titan-0b|titan-0c\"}) / 3)), ((sum(kube_deployment_status_replicas_available{namespace=~\"traefik|kube-system\",deployment=\"traefik\"}) / clamp_min(sum(kube_deployment_spec_replicas{namespace=~\"traefik|kube-system\",deployment=\"traefik\"}), 1)))))[30d:5m])",
+ "refId": "A"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "red",
+ "value": null
+ },
+ {
+ "color": "orange",
+ "value": 0.999
+ },
+ {
+ "color": "yellow",
+ "value": 0.9999
+ },
+ {
+ "color": "green",
+ "value": 0.99999
+ }
+ ]
+ },
+ "unit": "percentunit",
+ "custom": {
+ "displayMode": "auto"
+ },
+ "decimals": 3
+ },
+ "overrides": []
+ },
+ "options": {
+ "colorMode": "value",
+ "graphMode": "area",
+ "justifyMode": "center",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "textMode": "value"
+ }
+ },
+ {
+ "id": 4,
+ "type": "stat",
+ "title": "Problem Pods",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 5,
+ "w": 3,
+ "x": 14,
+ "y": 0
+ },
+ "targets": [
+ {
+ "expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
+ "refId": "A"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green",
+ "value": null
+ },
+ {
+ "color": "yellow",
+ "value": 1
+ },
+ {
+ "color": "orange",
+ "value": 2
+ },
+ {
+ "color": "red",
+ "value": 3
+ }
+ ]
+ },
+ "unit": "none",
+ "custom": {
+ "displayMode": "auto"
+ }
+ },
+ "overrides": []
+ },
+ "options": {
+ "colorMode": "value",
+ "graphMode": "area",
+ "justifyMode": "center",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "textMode": "value"
+ },
+ "links": [
+ {
+ "title": "Open atlas-pods dashboard",
+ "url": "/d/atlas-pods",
+ "targetBlank": true
+ }
+ ]
+ },
+ {
+ "id": 6,
+ "type": "stat",
+ "title": "CrashLoop / ImagePull",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 5,
+ "w": 3,
+ "x": 17,
+ "y": 0
+ },
+ "targets": [
+ {
+ "expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
+ "refId": "A"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "color": {
+ "mode": "thresholds"
+ },
+ "mappings": [],
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green",
+ "value": null
+ },
+ {
+ "color": "yellow",
+ "value": 1
+ },
+ {
+ "color": "orange",
+ "value": 2
+ },
+ {
+ "color": "red",
+ "value": 3
+ }
+ ]
+ },
+ "unit": "none",
+ "custom": {
+ "displayMode": "auto"
+ }
+ },
+ "overrides": []
+ },
+ "options": {
+ "colorMode": "value",
+ "graphMode": "area",
+ "justifyMode": "center",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "textMode": "value"
+ },
+ "links": [
+ {
+ "title": "Open atlas-pods dashboard",
+ "url": "/d/atlas-pods",
+ "targetBlank": true
+ }
+ ]
+ },
+ {
+ "id": 1,
+ "type": "gauge",
+ "title": "Workers Ready",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 5,
+ "w": 4,
+ "x": 20,
+ "y": 0
+ },
+ "targets": [
+ {
+ "expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
+ "refId": "A"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "min": 0,
+ "max": 18,
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "red",
+ "value": null
+ },
+ {
+ "color": "orange",
+ "value": 16
+ },
+ {
+ "color": "yellow",
+ "value": 17
+ },
+ {
+ "color": "green",
+ "value": 18
+ }
+ ]
+ }
+ },
+ "overrides": []
+ },
+ "options": {
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ },
+ "orientation": "auto",
+ "showThresholdMarkers": false,
+ "showThresholdLabels": false
+ }
+ },
||||||
{
|
{
|
||||||
"id": 7,
|
"id": 7,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
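The changes above reflow the overview's top row. On the new side, the seven panels sit at x = 0, 4, 7, 10, 14, 17 and 20, with widths 4, 3, 3, 4, 3, 3 and 4. Grafana lays panels out on a 24-column grid, so a quick way to review such a reflow is to check that the panels in one row neither overlap nor leave gaps. The helper below is a hypothetical review aid, not part of this repository. The names given to the two panels whose titles are not visible in this diff are placeholders.

```python
GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid


def check_row(panels, width=GRID_COLUMNS):
    """Report overlaps, gaps, and overflow for panels that share one row."""
    problems = []
    cursor = 0  # first free column to the right of the panels seen so far
    for panel in sorted(panels, key=lambda p: p["x"]):
        if panel["x"] < cursor:
            problems.append(f"{panel['name']} overlaps the previous panel at x={panel['x']}")
        elif panel["x"] > cursor:
            problems.append(f"gap of {panel['x'] - cursor} columns before {panel['name']}")
        cursor = max(cursor, panel["x"] + panel["w"])
    if cursor > width:
        problems.append(f"row is {cursor} columns wide, grid is {width}")
    elif cursor < width:
        problems.append(f"{width - cursor} columns left empty at the end of the row")
    return problems


# x/w values copied from the new side of the diff (all of these panels have y=0, h=5).
new_top_row = [
    {"name": "panel id 2", "x": 0, "w": 4},
    {"name": "panel at x=4 (title not shown)", "x": 4, "w": 3},
    {"name": "terminating >10m panel (title not shown)", "x": 7, "w": 3},
    {"name": "Atlas Availability (30d)", "x": 10, "w": 4},
    {"name": "Problem Pods", "x": 14, "w": 3},
    {"name": "CrashLoop / ImagePull", "x": 17, "w": 3},
    {"name": "Workers Ready", "x": 20, "w": 4},
]
print(check_row(new_top_row) or "top row tiles the grid exactly")
```

Panels on other rows (different `y`) would need to be grouped and checked separately.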
@ -371,11 +515,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -383,11 +527,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -444,11 +592,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -456,11 +604,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -517,7 +669,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -586,7 +738,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -653,11 +805,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -665,11 +817,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -724,11 +880,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -736,11 +892,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -795,7 +955,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -862,7 +1022,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -942,9 +1102,7 @@
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -995,9 +1153,7 @@
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -1048,9 +1204,7 @@
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -1175,7 +1329,7 @@
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
"expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||||
"refId": "A",
|
"refId": "A",
|
||||||
"legendFormat": "{{node}}"
|
"legendFormat": "{{node}}"
|
||||||
}
|
}
|
||||||
@ -1212,7 +1366,7 @@
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
"expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||||
"refId": "A",
|
"refId": "A",
|
||||||
"legendFormat": "{{node}}"
|
"legendFormat": "{{node}}"
|
||||||
}
|
}
|
||||||
@ -1233,6 +1387,138 @@
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
+ {
+ "id": 28,
+ "type": "piechart",
+ "title": "Node Pod Share",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 10,
+ "w": 12,
+ "x": 0,
+ "y": 54
+ },
+ "targets": [
+ {
+ "expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
+ "refId": "A",
+ "legendFormat": "{{node}}"
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "unit": "percent",
+ "color": {
+ "mode": "palette-classic"
+ }
+ },
+ "overrides": []
+ },
+ "options": {
+ "legend": {
+ "displayMode": "list",
+ "placement": "right"
+ },
+ "pieType": "pie",
+ "displayLabels": [],
+ "tooltip": {
+ "mode": "single"
+ },
+ "colorScheme": "interpolateSpectral",
+ "colorBy": "value",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ }
+ }
+ },
+ {
+ "id": 29,
+ "type": "bargauge",
+ "title": "Top Nodes by Pod Count",
+ "datasource": {
+ "type": "prometheus",
+ "uid": "atlas-vm"
+ },
+ "gridPos": {
+ "h": 10,
+ "w": 12,
+ "x": 12,
+ "y": 54
+ },
+ "targets": [
+ {
+ "expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
+ "refId": "A",
+ "legendFormat": "{{node}}",
+ "instant": true
+ }
+ ],
+ "fieldConfig": {
+ "defaults": {
+ "unit": "none",
+ "min": 0,
+ "max": null,
+ "thresholds": {
+ "mode": "absolute",
+ "steps": [
+ {
+ "color": "green",
+ "value": null
+ },
+ {
+ "color": "yellow",
+ "value": 50
+ },
+ {
+ "color": "orange",
+ "value": 75
+ },
+ {
+ "color": "red",
+ "value": 100
+ }
+ ]
+ },
+ "decimals": 0
+ },
+ "overrides": []
+ },
+ "options": {
+ "displayMode": "gradient",
+ "orientation": "horizontal",
+ "reduceOptions": {
+ "calcs": [
+ "lastNotNull"
+ ],
+ "fields": "",
+ "values": false
+ }
+ },
+ "transformations": [
+ {
+ "id": "sortBy",
+ "options": {
+ "fields": [
+ "Value"
+ ],
+ "order": "desc"
+ }
+ },
+ {
+ "id": "limit",
+ "options": {
+ "limit": 12
+ }
+ }
+ ]
+ },
||||||
{
|
{
|
||||||
"id": 18,
|
"id": 18,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
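The Node Pod Share and Top Nodes by Pod Count panels both start from `sum(kube_pod_info{pod!="", node!=""}) by (node)`. Because that aggregation is `by (node)`, each resulting series carries only a `node` label. The share query divides each count by the cluster-wide total, with `clamp_min(..., 1)` guarding against a zero denominator. The bar gauge keeps the 12 largest counts via `topk(12, ...)`. The Python below models that arithmetic on a hypothetical list of pod-to-node assignments. It illustrates the intent of the queries and is not a PromQL evaluator.

```python
from collections import Counter


def pods_per_node(pod_nodes):
    """Count pods per node, skipping pods with no node (like node!="")."""
    return Counter(node for node in pod_nodes if node)


def node_pod_share(pod_nodes):
    """Percent of pods on each node: count / clamp_min(total, 1) * 100."""
    counts = pods_per_node(pod_nodes)
    total = max(sum(counts.values()), 1)  # clamp_min(..., 1)
    return {node: n * 100 / total for node, n in counts.items()}


def top_nodes(pod_nodes, k=12):
    """The k nodes with the most pods, largest first (like topk(k, ...))."""
    return pods_per_node(pod_nodes).most_common(k)


# Hypothetical scheduling snapshot: "" stands for a pod not yet bound to a node.
pods = ["titan-04"] * 6 + ["titan-05"] * 3 + ["titan-06"] + [""]
print(node_pod_share(pods))
print(top_nodes(pods, k=2))
```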
@ -1377,7 +1663,7 @@
|
|||||||
"h": 16,
|
"h": 16,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 54
|
"y": 64
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -1425,7 +1711,7 @@
|
|||||||
"h": 16,
|
"h": 16,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 12,
|
"x": 12,
|
||||||
"y": 54
|
"y": 64
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -1452,11 +1738,11 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "orange",
|
"color": "orange",
|
||||||
"value": 70
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
@ -1480,6 +1766,17 @@
|
|||||||
"url": "/d/atlas-storage",
|
"url": "/d/atlas-storage",
|
||||||
"targetBlank": true
|
"targetBlank": true
|
||||||
}
|
}
|
||||||
+ ],
+ "transformations": [
+ {
+ "id": "sortBy",
+ "options": {
+ "fields": [
+ "Value"
+ ],
+ "order": "desc"
+ }
+ }
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
@ -1497,36 +1794,5 @@
|
|||||||
"to": "now"
|
"to": "now"
|
||||||
},
|
},
|
||||||
"refresh": "1m",
|
"refresh": "1m",
|
||||||
"links": [
|
"links": []
|
||||||
- {
- "title": "Atlas Pods",
- "type": "dashboard",
- "dashboardUid": "atlas-pods",
- "keepTime": false
- },
- {
- "title": "Atlas Nodes",
- "type": "dashboard",
- "dashboardUid": "atlas-nodes",
- "keepTime": false
- },
- {
- "title": "Atlas Storage",
- "type": "dashboard",
- "dashboardUid": "atlas-storage",
- "keepTime": false
- },
- {
- "title": "Atlas Network",
- "type": "dashboard",
- "dashboardUid": "atlas-network",
- "keepTime": false
- },
- {
- "title": "Atlas GPU",
- "type": "dashboard",
- "dashboardUid": "atlas-gpu",
- "keepTime": false
- }
- ]
||||||
}
|
}
|
||||||
|
|||||||
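Several stat panels in the overview dashboard switch their color mode from `palette-classic` to `thresholds`, so their `steps` arrays now decide the color shown. In absolute mode, as we understand Grafana's behavior, the panel takes the color of the last step whose `value` is less than or equal to the reading; the base step's `null` acts as negative infinity. The sketch below models that rule for the new Atlas Availability (30d) steps. It is an illustration, not Grafana's source.

```python
import math


def threshold_color(value, steps):
    """Return the color of the last step whose value is <= the reading.

    `steps` must be sorted ascending; a step value of None (JSON null) is the
    base step and is treated as -infinity.
    """
    color = steps[0]["color"]
    for step in steps:
        bound = -math.inf if step["value"] is None else step["value"]
        if value >= bound:
            color = step["color"]
        else:
            break
    return color


# Steps copied from the "Atlas Availability (30d)" panel (unit: percentunit).
AVAILABILITY_STEPS = [
    {"color": "red", "value": None},
    {"color": "orange", "value": 0.999},
    {"color": "yellow", "value": 0.9999},
    {"color": "green", "value": 0.99999},
]

for availability in (0.998, 0.9995, 0.99995, 1.0):
    print(f"{availability:.3%} -> {threshold_color(availability, AVAILABILITY_STEPS)}")
```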
@ -20,14 +20,14 @@
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"}))",
|
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
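Three queries in these dashboards now end in `or on() vector(0)`: Problem Pods, CrashLoop / ImagePull, and the count of pods terminating for more than 600 s. When no pod matches, `sum(...)` returns an empty vector and the stat panel would show "No data". The `or` operator keeps every left-hand sample and adds a right-hand sample only when no left-hand sample matches it on the listed labels. With `on()` the match key is empty, so `vector(0)` appears only when the left side is empty. The Python below is a small model of that set semantics. Representing samples as dictionaries keyed by label sets is an assumption made for illustration.

```python
def or_on(left, right, on=()):
    """Model PromQL `left or on(<labels>) right` for two instant vectors.

    A vector is a dict mapping a frozenset of (label, value) pairs to a sample
    value. Every left sample is kept; a right sample is added only if no left
    sample has the same values for the `on` labels.
    """
    def match_key(labels):
        return tuple(sorted((name, value) for name, value in labels if name in on))

    left_keys = {match_key(labels) for labels in left}
    result = dict(left)
    for labels, value in right.items():
        if match_key(labels) not in left_keys:
            result[labels] = value
    return result


VECTOR_ZERO = {frozenset(): 0.0}  # vector(0): one sample with no labels

# sum(...) over no matching pods is an empty vector, so the fallback applies.
print(or_on({}, VECTOR_ZERO))
# With problem pods present, the real count wins and vector(0) is dropped.
print(or_on({frozenset(): 3.0}, VECTOR_ZERO))
```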
@ -80,14 +80,14 @@
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"}))",
|
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -140,14 +140,14 @@
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)))",
|
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -207,7 +207,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -266,12 +266,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -302,12 +306,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -338,12 +346,16 @@
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -359,6 +371,233 @@
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 8,
|
||||||
|
"type": "piechart",
|
||||||
|
"title": "Node Pod Share",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 12,
|
||||||
|
"x": 12,
|
||||||
|
"y": 34
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "percent",
|
||||||
|
"color": {
|
||||||
|
"mode": "palette-classic"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"legend": {
|
||||||
|
"displayMode": "list",
|
||||||
|
"placement": "right"
|
||||||
|
},
|
||||||
|
"pieType": "pie",
|
||||||
|
"displayLabels": [],
|
||||||
|
"tooltip": {
|
||||||
|
"mode": "single"
|
||||||
|
},
|
||||||
|
"colorScheme": "interpolateSpectral",
|
||||||
|
"colorBy": "value",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 9,
|
||||||
|
"type": "bargauge",
|
||||||
|
"title": "Top Nodes by Pod Count",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 12,
|
||||||
|
"x": 0,
|
||||||
|
"y": 34
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}",
|
||||||
|
"instant": true
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "none",
|
||||||
|
"min": 0,
|
||||||
|
"max": null,
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 100
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"decimals": 0
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"displayMode": "gradient",
|
||||||
|
"orientation": "horizontal",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "limit",
|
||||||
|
"options": {
|
||||||
|
"limit": 12
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 10,
|
||||||
|
"type": "table",
|
||||||
|
"title": "Namespace Plurality by Node v27",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 24,
|
||||||
|
"x": 0,
|
||||||
|
"y": 42
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace,node) group_left() ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)))))",
|
||||||
|
"refId": "A",
|
||||||
|
"instant": true,
|
||||||
|
"format": "table"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "percent",
|
||||||
|
"custom": {
|
||||||
|
"filterable": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"showHeader": true,
|
||||||
|
"columnFilters": false,
|
||||||
|
"showColumnFilters": false,
|
||||||
|
"footer": {
|
||||||
|
"show": false,
|
||||||
|
"fields": "",
|
||||||
|
"calcs": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "labelsToFields",
|
||||||
|
"options": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "organize",
|
||||||
|
"options": {
|
||||||
|
"excludeByName": {
|
||||||
|
"Time": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "filterByValue",
|
||||||
|
"options": {
|
||||||
|
"match": "Value",
|
||||||
|
"operator": "gt",
|
||||||
|
"value": 0
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "groupBy",
|
||||||
|
"options": {
|
||||||
|
"fields": {
|
||||||
|
"namespace": {
|
||||||
|
"aggregations": [
|
||||||
|
{
|
||||||
|
"field": "Value",
|
||||||
|
"operation": "max"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"field": "node",
|
||||||
|
"operation": "first"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"rowBy": [
|
||||||
|
"namespace"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"time": {
|
"time": {
|
||||||
|
|||||||
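The "Namespace Plurality by Node" query keeps, for each namespace, only the node that hosts the largest share of that namespace's pods. Each (namespace, node) share gets a small fixed per-node offset (0.001 for titan-0a up to 0.022 for titan-24), the offset-adjusted share is compared with `== bool` against its per-namespace `max`, and the raw share is multiplied by the resulting 1 or 0. The Python below is a hypothetical re-implementation of that selection for reasoning about the query offline; `namespace_plurality` and its dict inputs are illustrative assumptions, not code from this repository.

```python
from collections import defaultdict


def namespace_plurality(pod_counts, node_offsets):
    """Hypothetical helper: pick the plurality node for each namespace.

    pod_counts:   {(namespace, node): pod_count}; node may be "" for
                  unscheduled pods, which count toward the namespace total only.
    node_offsets: {node: tie_break_offset}, like the 0.001..0.022 constants
                  in the panel query.
    Returns {namespace: (node, share_percent)}.
    """
    # Per-namespace totals (the query's sum by (namespace) with pod!="").
    totals = defaultdict(int)
    for (ns, _node), count in pod_counts.items():
        totals[ns] += count

    best = {}  # namespace -> (score, node, share)
    for (ns, node), count in pod_counts.items():
        # Nodes without an offset have no match for `+ on(node)` and are dropped.
        if node not in node_offsets:
            continue
        share = count / max(totals[ns], 1) * 100  # clamp_min(total, 1)
        score = share + node_offsets[node]        # offset breaks exact ties
        if ns not in best or score > best[ns][0]:
            best[ns] = (score, node, share)

    # The panel shows the raw share (not the offset score) for the winner.
    return {ns: (node, share) for ns, (_score, node, share) in best.items()}
```

One consequence the sketch makes visible: a node missing from the hard-coded offset list has no match for `+ on(node) group_left()`, so its series are dropped and it can never win a namespace, even if it hosts most of that namespace's pods.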
@ -27,11 +27,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -39,11 +39,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -91,11 +95,11 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -103,11 +107,15 @@
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -155,7 +163,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -215,7 +223,7 @@
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
|
|||||||
@ -49,9 +49,7 @@ data:
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -162,12 +160,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "percent"
|
"unit": "percent",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
|
|||||||
@ -16,46 +16,55 @@ data:
|
|||||||
{
|
{
|
||||||
"id": 1,
|
"id": 1,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Ingress Traffic",
|
"title": "Ingress Success Rate (5m)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 6,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
"expr": "(sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[5m]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[5m])), 1)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "rgba(115, 115, 115, 1)",
|
"color": "red",
|
||||||
"value": null
|
"value": null
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 0.995
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 0.999
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
"value": 1
|
"value": 0.9995
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "Bps",
|
"unit": "percentunit",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
},
|
||||||
|
"decimals": 2
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
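The reworked "Ingress Success Rate (5m)" stat divides non-5xx Traefik entrypoint requests by all requests, so it yields a ratio between 0 and 1 (hence `percentunit`), and colours it through absolute threshold steps: red as the base, orange from 0.995, yellow from 0.999, green from 0.9995. Assuming Grafana's usual step semantics (the `null` step is the base colour, and each later step applies once the value is greater than or equal to it), the colour lookup can be sketched in Python as follows; `threshold_color` is a hypothetical helper, not a Grafana API.

```python
# Absolute steps from the "Ingress Success Rate (5m)" panel; None is the base step.
SUCCESS_RATE_STEPS = [
    (None, "red"),
    (0.995, "orange"),
    (0.999, "yellow"),
    (0.9995, "green"),
]


def threshold_color(value, steps):
    """Hypothetical helper: return the colour of the highest step reached.

    Assumes steps are sorted ascending after the base step, and that a step
    applies once value >= its threshold (Grafana's usual behaviour).
    """
    color = steps[0][1]  # the base step applies when no other step is reached
    for step_value, step_color in steps[1:]:
        if value >= step_value:
            color = step_color
        else:
            break
    return color
```

For example, a 5-minute success ratio of 0.9993 (99.93%) would show yellow: past the 0.999 step but short of 0.9995.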
@ -76,46 +85,55 @@ data:
|
|||||||
{
|
{
|
||||||
"id": 2,
|
"id": 2,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Egress Traffic",
|
"title": "Error Budget Burn (1h)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 6,
|
||||||
"x": 8,
|
"x": 6,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
"expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[1h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[1h])), 1))) / 0.0010000000000000009",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "absolute",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "rgba(115, 115, 115, 1)",
|
"color": "green",
|
||||||
"value": null
|
"value": null
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "yellow",
|
||||||
"value": 1
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 4
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "Bps",
|
"unit": "none",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
},
|
||||||
|
"decimals": 2
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
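The "Error Budget Burn" stats divide the observed error ratio over the window (1h or 6h) by the error budget. The divisor 0.0010000000000000009 is exactly what IEEE-754 double arithmetic gives for `1 - 0.999`, which suggests the dashboard generator derived the budget from a 99.9% SLO target (an inference; the generator is not part of this diff). The Python sketch below reproduces that arithmetic; `burn_rate` and `error_budget` are hypothetical helpers mirroring the query, not existing code.

```python
# SLO target implied by the 0.0010000000000000009 divisor (an assumption).
SLO_TARGET = 0.999


def error_budget(slo_target):
    """Fraction of requests allowed to fail; 1 - 0.999 is not exactly 0.001 in floats."""
    return 1 - slo_target


def burn_rate(good, total, slo_target=SLO_TARGET):
    """Hypothetical mirror of the burn panels' PromQL.

    good:  non-5xx request rate over the window
    total: all requests over the window
    """
    success_ratio = good / max(total, 1)  # clamp_min(total, 1)
    return (1 - success_ratio) / error_budget(slo_target)
```

A burn rate of 1 spends the budget exactly as fast as the SLO allows, which is where these panels turn yellow; 2 and 4 turn them orange and red. Because of `clamp_min(..., 1)`, a window whose request counters exist but saw no traffic yields a success ratio of 0, so the burn rate reads about 1000 rather than 0 in that case.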
@ -136,7 +154,145 @@ data:
|
|||||||
{
|
{
|
||||||
"id": 3,
|
"id": 3,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Intra-Cluster Traffic",
|
"title": "Error Budget Burn (6h)",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 6,
|
||||||
|
"x": 12,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(1 - ((sum(rate(traefik_entrypoint_requests_total{code!~\"5..\"}[6h]))) / clamp_min(sum(rate(traefik_entrypoint_requests_total[6h])), 1))) / 0.0010000000000000009",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 4
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "none",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 2
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 4,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Edge P99 Latency (ms)",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 6,
|
||||||
|
"x": 18,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "histogram_quantile(0.99, sum by (le) (rate(traefik_entrypoint_request_duration_seconds_bucket[5m]))) * 1000",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 200
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 350
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 500
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "ms",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 1
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 5,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Ingress Traffic",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
@ -144,19 +300,19 @@ data:
|
|||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 8,
|
||||||
"x": 16,
|
"x": 0,
|
||||||
"y": 0
|
"y": 4
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
|
"expr": "sum(rate(node_network_receive_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -194,9 +350,9 @@ data:
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 4,
|
"id": 6,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
"title": "Top Router req/s",
|
"title": "Egress Traffic",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
"type": "prometheus",
|
"type": "prometheus",
|
||||||
"uid": "atlas-vm"
|
"uid": "atlas-vm"
|
||||||
@ -204,20 +360,19 @@ data:
|
|||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 4,
|
"h": 4,
|
||||||
"w": 8,
|
"w": 8,
|
||||||
"x": 0,
|
"x": 8,
|
||||||
"y": 4
|
"y": 4
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "topk(1, sum by (router) (rate(traefik_router_requests_total[5m])))",
|
"expr": "sum(rate(node_network_transmit_bytes_total{device!~\"lo|cni.*|veth.*|flannel.*|docker.*|virbr.*|vxlan.*|wg.*\"}[5m])) or on() vector(0)",
|
||||||
"refId": "A",
|
"refId": "A"
|
||||||
"legendFormat": "{{router}}"
|
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -233,7 +388,7 @@ data:
|
|||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
"unit": "req/s",
|
"unit": "Bps",
|
||||||
"custom": {
|
"custom": {
|
||||||
"displayMode": "auto"
|
"displayMode": "auto"
|
||||||
}
|
}
|
||||||
@ -255,7 +410,67 @@ data:
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 5,
|
"id": 7,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Intra-Cluster Traffic",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 16,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(container_network_receive_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m]) + rate(container_network_transmit_bytes_total{namespace!=\"traefik\",pod!=\"\"}[5m])) or on() vector(0)",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "rgba(115, 115, 115, 1)",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": 1
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 8,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Per-Node Throughput",
|
"title": "Per-Node Throughput",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -292,7 +507,7 @@ data:
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 6,
|
"id": 9,
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"title": "Top Namespaces",
|
"title": "Top Namespaces",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -313,12 +528,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "Bps"
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -328,7 +547,7 @@ data:
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 7,
|
"id": 10,
|
||||||
"type": "table",
|
"type": "table",
|
||||||
"title": "Top Pods",
|
"title": "Top Pods",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -349,12 +568,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "Bps"
|
"unit": "Bps",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -364,7 +587,7 @@ data:
|
|||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 8,
|
"id": 11,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Traefik Routers (req/s)",
|
"title": "Traefik Routers (req/s)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
@ -401,7 +624,7 @@ data:
|
|||||||
}
|
}
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"id": 9,
|
"id": 12,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
"title": "Traefik Entrypoints (req/s)",
|
"title": "Traefik Entrypoints (req/s)",
|
||||||
"datasource": {
|
"datasource": {
|
||||||
|
|||||||
@ -36,7 +36,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -97,7 +97,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -158,7 +158,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -195,6 +195,213 @@ data:
|
|||||||
"textMode": "value"
|
"textMode": "value"
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": 9,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "API Server 5xx rate",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 0,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(rate(apiserver_request_total{code=~\"5..\"}[5m]))",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 0.05
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 0.2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 0.5
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "req/s",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 3
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 10,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "API Server P99 latency",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 8,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "histogram_quantile(0.99, sum by (le) (rate(apiserver_request_duration_seconds_bucket[5m]))) * 1000",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 250
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 400
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 600
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "ms",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 1
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 11,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "etcd P99 latency",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 4,
|
||||||
|
"w": 8,
|
||||||
|
"x": 16,
|
||||||
|
"y": 4
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "histogram_quantile(0.99, sum by (le) (rate(etcd_request_duration_seconds_bucket[5m]))) * 1000",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 100
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 200
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "ms",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 1
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": 4,
|
"id": 4,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
@ -207,7 +414,7 @@ data:
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 4
|
"y": 8
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -247,7 +454,7 @@ data:
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 13
|
"y": 17
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -287,7 +494,7 @@ data:
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 22
|
"y": 26
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -324,7 +531,7 @@ data:
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 12,
|
"x": 12,
|
||||||
"y": 22
|
"y": 26
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -361,7 +568,7 @@ data:
|
|||||||
"h": 9,
|
"h": 9,
|
||||||
"w": 24,
|
"w": 24,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 31
|
"y": 35
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
|
|||||||
@ -16,67 +16,6 @@ data:
|
|||||||
"list": []
|
"list": []
|
||||||
},
|
},
|
||||||
"panels": [
|
"panels": [
|
||||||
{
|
|
||||||
"id": 1,
|
|
||||||
"type": "gauge",
|
|
||||||
"title": "Workers Ready",
|
|
||||||
"datasource": {
|
|
||||||
"type": "prometheus",
|
|
||||||
"uid": "atlas-vm"
|
|
||||||
},
|
|
||||||
"gridPos": {
|
|
||||||
"h": 5,
|
|
||||||
"w": 5,
|
|
||||||
"x": 0,
|
|
||||||
"y": 0
|
|
||||||
},
|
|
||||||
"targets": [
|
|
||||||
{
|
|
||||||
"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
|
|
||||||
"refId": "A"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"fieldConfig": {
|
|
||||||
"defaults": {
|
|
||||||
"min": 0,
|
|
||||||
"max": 18,
|
|
||||||
"thresholds": {
|
|
||||||
"mode": "absolute",
|
|
||||||
"steps": [
|
|
||||||
{
|
|
||||||
"color": "red",
|
|
||||||
"value": null
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "orange",
|
|
||||||
"value": 16
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "yellow",
|
|
||||||
"value": 17
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "green",
|
|
||||||
"value": 18
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"overrides": []
|
|
||||||
},
|
|
||||||
"options": {
|
|
||||||
"reduceOptions": {
|
|
||||||
"calcs": [
|
|
||||||
"lastNotNull"
|
|
||||||
],
|
|
||||||
"fields": "",
|
|
||||||
"values": false
|
|
||||||
},
|
|
||||||
"orientation": "auto",
|
|
||||||
"showThresholdMarkers": false,
|
|
||||||
"showThresholdLabels": false
|
|
||||||
}
|
|
||||||
},
|
|
||||||
{
|
{
|
||||||
"id": 2,
|
"id": 2,
|
||||||
"type": "gauge",
|
"type": "gauge",
|
||||||
@ -87,8 +26,8 @@ data:
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 5,
|
"w": 4,
|
||||||
"x": 5,
|
"x": 0,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
@ -140,8 +79,8 @@ data:
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 5,
|
"w": 3,
|
||||||
"x": 10,
|
"x": 4,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
@ -153,82 +92,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
|
||||||
"mappings": [],
|
|
||||||
"thresholds": {
|
|
||||||
"mode": "absolute",
|
|
||||||
"steps": [
|
|
||||||
{
|
|
||||||
"color": "green",
|
|
||||||
"value": null
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "yellow",
|
|
||||||
"value": 1
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "orange",
|
|
||||||
"value": 2
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"color": "red",
|
|
||||||
"value": 3
|
|
||||||
}
|
|
||||||
]
|
|
||||||
},
|
|
||||||
"unit": "none",
|
|
||||||
"custom": {
|
|
||||||
"displayMode": "auto"
|
|
||||||
}
|
|
||||||
},
|
|
||||||
"overrides": []
|
|
||||||
},
|
|
||||||
"options": {
|
|
||||||
"colorMode": "value",
|
|
||||||
"graphMode": "area",
|
|
||||||
"justifyMode": "center",
|
|
||||||
"reduceOptions": {
|
|
||||||
"calcs": [
|
|
||||||
"lastNotNull"
|
|
||||||
],
|
|
||||||
"fields": "",
|
|
||||||
"values": false
|
|
||||||
},
|
|
||||||
"textMode": "value"
|
|
||||||
},
|
|
||||||
"links": [
|
|
||||||
{
|
|
||||||
"title": "Open atlas-pods dashboard",
|
|
||||||
"url": "/d/atlas-pods",
|
|
||||||
"targetBlank": true
|
|
||||||
}
|
|
||||||
]
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"id": 4,
|
|
||||||
"type": "stat",
|
|
||||||
"title": "Problem Pods",
|
|
||||||
"datasource": {
|
|
||||||
"type": "prometheus",
|
|
||||||
"uid": "atlas-vm"
|
|
||||||
},
|
|
||||||
"gridPos": {
|
|
||||||
"h": 5,
|
|
||||||
"w": 5,
|
|
||||||
"x": 15,
|
|
||||||
"y": 0
|
|
||||||
},
|
|
||||||
"targets": [
|
|
||||||
{
|
|
||||||
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"}))",
|
|
||||||
"refId": "A"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"fieldConfig": {
|
|
||||||
"defaults": {
|
|
||||||
"color": {
|
|
||||||
"mode": "palette-classic"
|
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -290,20 +154,20 @@ data:
|
|||||||
},
|
},
|
||||||
"gridPos": {
|
"gridPos": {
|
||||||
"h": 5,
|
"h": 5,
|
||||||
"w": 4,
|
"w": 3,
|
||||||
"x": 20,
|
"x": 7,
|
||||||
"y": 0
|
"y": 0
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)))",
|
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -355,6 +219,286 @@ data:
|
|||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": 27,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Atlas Availability (30d)",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 5,
|
||||||
|
"w": 4,
|
||||||
|
"x": 10,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "avg_over_time((min(((sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-0a|titan-0b|titan-0c\"}) / 3)), ((sum(kube_deployment_status_replicas_available{namespace=~\"traefik|kube-system\",deployment=\"traefik\"}) / clamp_min(sum(kube_deployment_spec_replicas{namespace=~\"traefik|kube-system\",deployment=\"traefik\"}), 1)))))[30d:5m])",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 0.999
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 0.9999
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": 0.99999
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "percentunit",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
},
|
||||||
|
"decimals": 3
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 4,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "Problem Pods",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 5,
|
||||||
|
"w": 3,
|
||||||
|
"x": 14,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 3
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "none",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
},
|
||||||
|
"links": [
|
||||||
|
{
|
||||||
|
"title": "Open atlas-pods dashboard",
|
||||||
|
"url": "/d/atlas-pods",
|
||||||
|
"targetBlank": true
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 6,
|
||||||
|
"type": "stat",
|
||||||
|
"title": "CrashLoop / ImagePull",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 5,
|
||||||
|
"w": 3,
|
||||||
|
"x": 17,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"color": {
|
||||||
|
"mode": "thresholds"
|
||||||
|
},
|
||||||
|
"mappings": [],
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 1
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 2
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 3
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"unit": "none",
|
||||||
|
"custom": {
|
||||||
|
"displayMode": "auto"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"colorMode": "value",
|
||||||
|
"graphMode": "area",
|
||||||
|
"justifyMode": "center",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"textMode": "value"
|
||||||
|
},
|
||||||
|
"links": [
|
||||||
|
{
|
||||||
|
"title": "Open atlas-pods dashboard",
|
||||||
|
"url": "/d/atlas-pods",
|
||||||
|
"targetBlank": true
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 1,
|
||||||
|
"type": "gauge",
|
||||||
|
"title": "Workers Ready",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 5,
|
||||||
|
"w": 4,
|
||||||
|
"x": 20,
|
||||||
|
"y": 0
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "sum(kube_node_status_condition{condition=\"Ready\",status=\"true\",node=~\"titan-04|titan-05|titan-06|titan-07|titan-08|titan-09|titan-10|titan-11|titan-12|titan-13|titan-14|titan-15|titan-16|titan-17|titan-18|titan-19|titan-22|titan-24\"})",
|
||||||
|
"refId": "A"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"min": 0,
|
||||||
|
"max": 18,
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 16
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 17
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": 18
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
},
|
||||||
|
"orientation": "auto",
|
||||||
|
"showThresholdMarkers": false,
|
||||||
|
"showThresholdLabels": false
|
||||||
|
}
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": 7,
|
"id": 7,
|
||||||
"type": "stat",
|
"type": "stat",
|
||||||
@ -380,11 +524,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -392,11 +536,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -453,11 +601,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -465,11 +613,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -526,7 +678,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -595,7 +747,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -662,11 +814,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -674,11 +826,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -733,11 +889,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -745,11 +901,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -804,7 +964,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -871,7 +1031,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -951,9 +1111,7 @@ data:
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -1004,9 +1162,7 @@ data:
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -1057,9 +1213,7 @@ data:
|
|||||||
"placement": "right"
|
"placement": "right"
|
||||||
},
|
},
|
||||||
"pieType": "pie",
|
"pieType": "pie",
|
||||||
"displayLabels": [
|
"displayLabels": [],
|
||||||
"percent"
|
|
||||||
],
|
|
||||||
"tooltip": {
|
"tooltip": {
|
||||||
"mode": "single"
|
"mode": "single"
|
||||||
},
|
},
|
||||||
@ -1184,7 +1338,7 @@ data:
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
"expr": "(avg by (node) (((1 - avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m]))) * 100) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||||
"refId": "A",
|
"refId": "A",
|
||||||
"legendFormat": "{{node}}"
|
"legendFormat": "{{node}}"
|
||||||
}
|
}
|
||||||
@ -1221,7 +1375,7 @@ data:
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
"expr": "(avg by (node) ((avg by (instance) ((node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100)) * on(instance) group_left(node) label_replace(node_uname_info{nodename!=\"\"}, \"node\", \"$1\", \"nodename\", \"(.*)\"))) * on(node) group_left() label_replace(node_uname_info{nodename=~\"titan-0a|titan-0b|titan-0c|titan-db\"}, \"node\", \"$1\", \"nodename\", \"(.*)\")",
|
||||||
"refId": "A",
|
"refId": "A",
|
||||||
"legendFormat": "{{node}}"
|
"legendFormat": "{{node}}"
|
||||||
}
|
}
|
||||||
@ -1242,6 +1396,138 @@ data:
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
},
|
},
|
||||||
|
{
|
||||||
|
"id": 28,
|
||||||
|
"type": "piechart",
|
||||||
|
"title": "Node Pod Share",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 10,
|
||||||
|
"w": 12,
|
||||||
|
"x": 0,
|
||||||
|
"y": 54
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "percent",
|
||||||
|
"color": {
|
||||||
|
"mode": "palette-classic"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"legend": {
|
||||||
|
"displayMode": "list",
|
||||||
|
"placement": "right"
|
||||||
|
},
|
||||||
|
"pieType": "pie",
|
||||||
|
"displayLabels": [],
|
||||||
|
"tooltip": {
|
||||||
|
"mode": "single"
|
||||||
|
},
|
||||||
|
"colorScheme": "interpolateSpectral",
|
||||||
|
"colorBy": "value",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 29,
|
||||||
|
"type": "bargauge",
|
||||||
|
"title": "Top Nodes by Pod Count",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 10,
|
||||||
|
"w": 12,
|
||||||
|
"x": 12,
|
||||||
|
"y": 54
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}",
|
||||||
|
"instant": true
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "none",
|
||||||
|
"min": 0,
|
||||||
|
"max": null,
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 100
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"decimals": 0
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"displayMode": "gradient",
|
||||||
|
"orientation": "horizontal",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "limit",
|
||||||
|
"options": {
|
||||||
|
"limit": 12
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"id": 18,
|
"id": 18,
|
||||||
"type": "timeseries",
|
"type": "timeseries",
|
||||||
@ -1386,7 +1672,7 @@ data:
|
|||||||
"h": 16,
|
"h": 16,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 0,
|
"x": 0,
|
||||||
"y": 54
|
"y": 64
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -1434,7 +1720,7 @@ data:
|
|||||||
"h": 16,
|
"h": 16,
|
||||||
"w": 12,
|
"w": 12,
|
||||||
"x": 12,
|
"x": 12,
|
||||||
"y": 54
|
"y": 64
|
||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
@ -1461,11 +1747,11 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "orange",
|
"color": "orange",
|
||||||
"value": 70
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
@ -1489,6 +1775,17 @@ data:
|
|||||||
"url": "/d/atlas-storage",
|
"url": "/d/atlas-storage",
|
||||||
"targetBlank": true
|
"targetBlank": true
|
||||||
}
|
}
|
||||||
|
],
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
@ -1506,36 +1803,5 @@ data:
|
|||||||
"to": "now"
|
"to": "now"
|
||||||
},
|
},
|
||||||
"refresh": "1m",
|
"refresh": "1m",
|
||||||
"links": [
|
"links": []
|
||||||
{
|
|
||||||
"title": "Atlas Pods",
|
|
||||||
"type": "dashboard",
|
|
||||||
"dashboardUid": "atlas-pods",
|
|
||||||
"keepTime": false
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"title": "Atlas Nodes",
|
|
||||||
"type": "dashboard",
|
|
||||||
"dashboardUid": "atlas-nodes",
|
|
||||||
"keepTime": false
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"title": "Atlas Storage",
|
|
||||||
"type": "dashboard",
|
|
||||||
"dashboardUid": "atlas-storage",
|
|
||||||
"keepTime": false
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"title": "Atlas Network",
|
|
||||||
"type": "dashboard",
|
|
||||||
"dashboardUid": "atlas-network",
|
|
||||||
"keepTime": false
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"title": "Atlas GPU",
|
|
||||||
"type": "dashboard",
|
|
||||||
"dashboardUid": "atlas-gpu",
|
|
||||||
"keepTime": false
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
}
|
||||||
|
|||||||
@ -29,14 +29,14 @@ data:
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"}))",
|
"expr": "sum(max by (namespace,pod) (kube_pod_status_phase{phase!~\"Running|Succeeded\"})) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -89,14 +89,14 @@ data:
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"}))",
|
"expr": "sum(max by (namespace,pod) (kube_pod_container_status_waiting_reason{reason=~\"CrashLoopBackOff|ImagePullBackOff\"})) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -149,14 +149,14 @@ data:
|
|||||||
},
|
},
|
||||||
"targets": [
|
"targets": [
|
||||||
{
|
{
|
||||||
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0)))",
|
"expr": "sum(max by (namespace,pod) (((time() - kube_pod_deletion_timestamp{pod!=\"\"}) > bool 600) and on(namespace,pod) (kube_pod_deletion_timestamp{pod!=\"\"} > bool 0))) or on() vector(0)",
|
||||||
"refId": "A"
|
"refId": "A"
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -216,7 +216,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -275,12 +275,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -311,12 +315,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -347,12 +355,16 @@ data:
|
|||||||
],
|
],
|
||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"unit": "s"
|
"unit": "s",
|
||||||
|
"custom": {
|
||||||
|
"filterable": true
|
||||||
|
}
|
||||||
},
|
},
|
||||||
"overrides": []
|
"overrides": []
|
||||||
},
|
},
|
||||||
"options": {
|
"options": {
|
||||||
"showHeader": true
|
"showHeader": true,
|
||||||
|
"columnFilters": false
|
||||||
},
|
},
|
||||||
"transformations": [
|
"transformations": [
|
||||||
{
|
{
|
||||||
@ -368,6 +380,233 @@ data:
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 8,
|
||||||
|
"type": "piechart",
|
||||||
|
"title": "Node Pod Share",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 12,
|
||||||
|
"x": 12,
|
||||||
|
"y": 34
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node) / clamp_min(sum(kube_pod_info{pod!=\"\" , node!=\"\"}), 1)) * 100",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "percent",
|
||||||
|
"color": {
|
||||||
|
"mode": "palette-classic"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"legend": {
|
||||||
|
"displayMode": "list",
|
||||||
|
"placement": "right"
|
||||||
|
},
|
||||||
|
"pieType": "pie",
|
||||||
|
"displayLabels": [],
|
||||||
|
"tooltip": {
|
||||||
|
"mode": "single"
|
||||||
|
},
|
||||||
|
"colorScheme": "interpolateSpectral",
|
||||||
|
"colorBy": "value",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 9,
|
||||||
|
"type": "bargauge",
|
||||||
|
"title": "Top Nodes by Pod Count",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 12,
|
||||||
|
"x": 0,
|
||||||
|
"y": 34
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "topk(12, sum(kube_pod_info{pod!=\"\" , node!=\"\"}) by (node))",
|
||||||
|
"refId": "A",
|
||||||
|
"legendFormat": "{{node}}",
|
||||||
|
"instant": true
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "none",
|
||||||
|
"min": 0,
|
||||||
|
"max": null,
|
||||||
|
"thresholds": {
|
||||||
|
"mode": "absolute",
|
||||||
|
"steps": [
|
||||||
|
{
|
||||||
|
"color": "green",
|
||||||
|
"value": null
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "yellow",
|
||||||
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "red",
|
||||||
|
"value": 100
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"decimals": 0
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"displayMode": "gradient",
|
||||||
|
"orientation": "horizontal",
|
||||||
|
"reduceOptions": {
|
||||||
|
"calcs": [
|
||||||
|
"lastNotNull"
|
||||||
|
],
|
||||||
|
"fields": "",
|
||||||
|
"values": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "limit",
|
||||||
|
"options": {
|
||||||
|
"limit": 12
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": 10,
|
||||||
|
"type": "table",
|
||||||
|
"title": "Namespace Plurality by Node v27",
|
||||||
|
"datasource": {
|
||||||
|
"type": "prometheus",
|
||||||
|
"uid": "atlas-vm"
|
||||||
|
},
|
||||||
|
"gridPos": {
|
||||||
|
"h": 8,
|
||||||
|
"w": 24,
|
||||||
|
"x": 0,
|
||||||
|
"y": 42
|
||||||
|
},
|
||||||
|
"targets": [
|
||||||
|
{
|
||||||
|
"expr": "(sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace,node) group_left() ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\" , node!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) + on(node) group_left() ((sum by (node) (kube_node_info{node=\"titan-0a\"}) * 0 + 0.001) or (sum by (node) (kube_node_info{node=\"titan-0b\"}) * 0 + 0.002) or (sum by (node) (kube_node_info{node=\"titan-0c\"}) * 0 + 0.003) or (sum by (node) (kube_node_info{node=\"titan-db\"}) * 0 + 0.004) or (sum by (node) (kube_node_info{node=\"titan-04\"}) * 0 + 0.005) or (sum by (node) (kube_node_info{node=\"titan-05\"}) * 0 + 0.006) or (sum by (node) (kube_node_info{node=\"titan-06\"}) * 0 + 0.007) or (sum by (node) (kube_node_info{node=\"titan-07\"}) * 0 + 0.008) or (sum by (node) (kube_node_info{node=\"titan-08\"}) * 0 + 0.009000000000000001) or (sum by (node) (kube_node_info{node=\"titan-09\"}) * 0 + 0.01) or (sum by (node) (kube_node_info{node=\"titan-10\"}) * 0 + 0.011) or (sum by (node) (kube_node_info{node=\"titan-11\"}) * 0 + 0.012) or (sum by (node) (kube_node_info{node=\"titan-12\"}) * 0 + 0.013000000000000001) or (sum by (node) (kube_node_info{node=\"titan-13\"}) * 0 + 0.014) or (sum by (node) (kube_node_info{node=\"titan-14\"}) * 0 + 0.015) or (sum by (node) (kube_node_info{node=\"titan-15\"}) * 0 + 0.016) or (sum by (node) (kube_node_info{node=\"titan-16\"}) * 0 + 0.017) or (sum by (node) (kube_node_info{node=\"titan-17\"}) * 0 + 0.018000000000000002) or (sum by (node) (kube_node_info{node=\"titan-18\"}) * 0 + 0.019) or (sum by (node) (kube_node_info{node=\"titan-19\"}) * 0 + 0.02) or (sum by (node) (kube_node_info{node=\"titan-22\"}) * 0 + 0.021) or (sum by (node) (kube_node_info{node=\"titan-24\"}) * 0 + 0.022)))))",
|
||||||
|
"refId": "A",
|
||||||
|
"instant": true,
|
||||||
|
"format": "table"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"fieldConfig": {
|
||||||
|
"defaults": {
|
||||||
|
"unit": "percent",
|
||||||
|
"custom": {
|
||||||
|
"filterable": false
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"overrides": []
|
||||||
|
},
|
||||||
|
"options": {
|
||||||
|
"showHeader": true,
|
||||||
|
"columnFilters": false,
|
||||||
|
"showColumnFilters": false,
|
||||||
|
"footer": {
|
||||||
|
"show": false,
|
||||||
|
"fields": "",
|
||||||
|
"calcs": []
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"transformations": [
|
||||||
|
{
|
||||||
|
"id": "labelsToFields",
|
||||||
|
"options": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "organize",
|
||||||
|
"options": {
|
||||||
|
"excludeByName": {
|
||||||
|
"Time": true
|
||||||
|
}
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "filterByValue",
|
||||||
|
"options": {
|
||||||
|
"match": "Value",
|
||||||
|
"operator": "gt",
|
||||||
|
"value": 0
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "sortBy",
|
||||||
|
"options": {
|
||||||
|
"fields": [
|
||||||
|
"Value"
|
||||||
|
],
|
||||||
|
"order": "desc"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"id": "groupBy",
|
||||||
|
"options": {
|
||||||
|
"fields": {
|
||||||
|
"namespace": {
|
||||||
|
"aggregations": [
|
||||||
|
{
|
||||||
|
"field": "Value",
|
||||||
|
"operation": "max"
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"field": "node",
|
||||||
|
"operation": "first"
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"rowBy": [
|
||||||
|
"namespace"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
}
|
||||||
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
"time": {
|
"time": {
|
||||||
|
|||||||
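The "Namespace Plurality by Node v27" query reports, for each namespace, the share of its pods on the node that hosts the most of them. Each node gets a small fixed offset (0.001 to 0.022) through `kube_node_info`. Because of this, `== bool` against `max by (namespace)` should match a single node even when two nodes tie. The `filterByValue` and `groupBy` transformations then drop the zero rows.

The Python sketch below is a rough model of that logic, not code from this repository. `namespace_plurality`, its input shape (pod counts already summed per namespace and node) and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def namespace_plurality(pod_counts, node_order):
    """Rough model of the dashboard's plurality query (illustrative only).

    pod_counts: {(namespace, node): pod_count}
    node_order: nodes in offset order; a later node gets a larger offset and
    therefore wins ties, mirroring the 0.001 ... 0.022 constants in the query.
    """
    # Per-namespace pod totals, clamped to at least 1 like clamp_min(..., 1).
    totals = defaultdict(int)
    for (namespace, _node), count in pod_counts.items():
        totals[namespace] += count

    offsets = {node: 0.001 * (i + 1) for i, node in enumerate(node_order)}

    best = {}
    for (namespace, node), count in pod_counts.items():
        if node not in offsets:
            # No offset series for this node, so the "+ on(node)" match drops it.
            continue
        share = count / max(totals[namespace], 1) * 100
        ranked = share + offsets[node]
        if namespace not in best or ranked > best[namespace][2]:
            best[namespace] = (node, share, ranked)

    # Report the winning node and its unadjusted share, as the panel does.
    return {ns: (node, share) for ns, (node, share, _ranked) in best.items()}


counts = {
    ("gitea", "titan-04"): 2,
    ("gitea", "titan-05"): 2,
    ("mail", "titan-db"): 1,
    ("mail", "titan-04"): 3,
}
print(namespace_plurality(counts, ["titan-db", "titan-04", "titan-05"]))
# {'gitea': ('titan-05', 50.0), 'mail': ('titan-04', 75.0)}
```

The PromQL denominator uses `kube_pod_info{pod!=""}` without the `node!=""` filter, so unscheduled pods still count toward a namespace's total. The sketch assumes every pod is scheduled.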
@ -36,11 +36,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -48,11 +48,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -100,11 +104,11 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
"mode": "percentage",
|
"mode": "absolute",
|
||||||
"steps": [
|
"steps": [
|
||||||
{
|
{
|
||||||
"color": "green",
|
"color": "green",
|
||||||
@ -112,11 +116,15 @@ data:
|
|||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "yellow",
|
"color": "yellow",
|
||||||
"value": 70
|
"value": 50
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"color": "orange",
|
||||||
|
"value": 75
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"color": "red",
|
"color": "red",
|
||||||
"value": 85
|
"value": 91.5
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
@ -164,7 +172,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
@ -224,7 +232,7 @@ data:
|
|||||||
"fieldConfig": {
|
"fieldConfig": {
|
||||||
"defaults": {
|
"defaults": {
|
||||||
"color": {
|
"color": {
|
||||||
"mode": "palette-classic"
|
"mode": "thresholds"
|
||||||
},
|
},
|
||||||
"mappings": [],
|
"mappings": [],
|
||||||
"thresholds": {
|
"thresholds": {
|
||||||
|
|||||||
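These hunks make two changes to the stat panels:

- Coloring switches from `palette-classic` to `thresholds`.
- Threshold mode switches from `percentage` to `absolute`, with steps at 50, 75 and 91.5.

A simplified Python model of how a value maps to a step color follows. It assumes Grafana's usual rule: the highest step whose value is at or below the field value wins, and the base `null` step acts as negative infinity. For percentage mode, it assumes step values are read as a percentage of the field's min..max range. `threshold_color` and its parameters are illustrative, not a Grafana API.

```python
# Steps from the updated panels; the first (base) step has "value": null.
STEPS = [(None, "green"), (50, "yellow"), (75, "orange"), (91.5, "red")]


def threshold_color(value, steps=STEPS, mode="absolute", field_min=0.0, field_max=100.0):
    """Pick the color of the highest step whose value is <= value (simplified)."""
    if mode == "percentage":
        # Percentage steps are relative to the field's min..max range.
        value = (value - field_min) / (field_max - field_min) * 100
    color = steps[0][1]  # base step: matches everything
    for limit, step_color in steps[1:]:
        if value >= limit:
            color = step_color
    return color


for v in (10, 50, 80, 95):
    print(v, threshold_color(v))
```

In absolute mode, a reading of 60 stays yellow whatever min/max the panel ends up with. In percentage mode, the same reading could land on a different step if the field's range is not 0..100.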
@ -65,13 +65,13 @@ spec:
|
|||||||
namespace: flux-system
|
namespace: flux-system
|
||||||
values:
|
values:
|
||||||
server:
|
server:
|
||||||
# keep ~3 months; change as you like (supports "d", "y")
|
# keep 1 year; supports "d", "y"
|
||||||
extraArgs:
|
extraArgs:
|
||||||
retentionPeriod: "90d" # VM flag -retentionPeriod=90d.
|
retentionPeriod: "1y" # VM flag -retentionPeriod=1y.
|
||||||
|
|
||||||
persistentVolume:
|
persistentVolume:
|
||||||
enabled: true
|
enabled: true
|
||||||
size: 100Gi
|
size: 250Gi
|
||||||
|
|
||||||
# Enable built-in Kubernetes scraping
|
# Enable built-in Kubernetes scraping
|
||||||
scrape:
|
scrape:
|
||||||
@ -186,6 +186,15 @@ spec:
|
|||||||
- targets: ["longhorn-backend.longhorn-system.svc:9500"]
|
- targets: ["longhorn-backend.longhorn-system.svc:9500"]
|
||||||
metrics_path: /metrics
|
metrics_path: /metrics
|
||||||
|
|
||||||
|
# --- titan-db node_exporter (external control-plane DB host) ---
|
||||||
|
- job_name: "titan-db"
|
||||||
|
static_configs:
|
||||||
|
- targets: ["192.168.22.10:9100"]
|
||||||
|
relabel_configs:
|
||||||
|
- source_labels: [__address__]
|
||||||
|
target_label: instance
|
||||||
|
replacement: titan-db
|
||||||
|
|
||||||
# --- cert-manager (pods expose on 9402) ---
|
# --- cert-manager (pods expose on 9402) ---
|
||||||
- job_name: "cert-manager"
|
- job_name: "cert-manager"
|
||||||
kubernetes_sd_configs: [{ role: pod }]
|
kubernetes_sd_configs: [{ role: pod }]
|
||||||
@ -209,16 +218,6 @@ spec:
|
|||||||
- action: keep
|
- action: keep
|
||||||
source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app_kubernetes_io_part_of]
|
source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_pod_label_app_kubernetes_io_part_of]
|
||||||
regex: flux-system;flux
|
regex: flux-system;flux
|
||||||
- job_name: "titan-db"
|
|
||||||
static_configs:
|
|
||||||
- targets: ["titan-db:9100"]
|
|
||||||
relabel_configs:
|
|
||||||
- source_labels: [__address__]
|
|
||||||
target_label: instance
|
|
||||||
metric_relabel_configs:
|
|
||||||
- source_labels: [instance]
|
|
||||||
target_label: node
|
|
||||||
replacement: titan-db
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|||||||
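The new `titan-db` job scrapes node_exporter by IP. It uses a `replace` relabel rule to set `instance` to a fixed name instead of `192.168.22.10:9100`.

Below is a rough Python model of Prometheus-style `replace` relabeling. It assumes the documented defaults (`separator: ;`, a fully anchored `regex: (.*)`, `replacement: $1`). `relabel_replace` is a hypothetical helper, not vmagent's implementation, and it ignores edge cases such as empty results.

```python
import re


def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Simplified model of a Prometheus 'replace' relabel rule."""
    # Join the source label values, then match the whole string (anchored).
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left unchanged
    # Translate $1 / ${1} references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


target = {"__address__": "192.168.22.10:9100", "job": "titan-db"}
print(relabel_replace(target, ["__address__"], "instance", replacement="titan-db"))
```

The removed job also copied a `node` label onto titan-db metrics via `metric_relabel_configs`, and the new job does not. Any panel that filters node_exporter series on `node="titan-db"` may need a second look.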
48
services/nextcloud/configmap.yaml
Normal file
@ -0,0 +1,48 @@
|
|||||||
|
# services/nextcloud/configmap.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: ConfigMap
|
||||||
|
metadata:
|
||||||
|
name: nextcloud-config
|
||||||
|
namespace: nextcloud
|
||||||
|
data:
|
||||||
|
extra.config.php: |
|
||||||
|
<?php
|
||||||
|
$CONFIG = array (
|
||||||
|
'trusted_domains' =>
|
||||||
|
array (
|
||||||
|
0 => 'cloud.bstein.dev',
|
||||||
|
),
|
||||||
|
'overwritehost' => 'cloud.bstein.dev',
|
||||||
|
'overwriteprotocol' => 'https',
|
||||||
|
'overwrite.cli.url' => 'https://cloud.bstein.dev',
|
||||||
|
'default_phone_region' => 'US',
|
||||||
|
'mail_smtpmode' => 'smtp',
|
||||||
|
'mail_sendmailmode' => 'smtp',
|
||||||
|
'mail_smtphost' => 'mail.bstein.dev',
|
||||||
|
'mail_smtpport' => '587',
|
||||||
|
'mail_smtpsecure' => 'tls',
|
||||||
|
'mail_smtpauth' => true,
|
||||||
|
'mail_smtpauthtype' => 'LOGIN',
|
||||||
|
'mail_domain' => 'bstein.dev',
|
||||||
|
'mail_from_address' => 'no-reply',
|
||||||
|
'oidc_login_provider_url' => 'https://sso.bstein.dev/realms/atlas',
|
||||||
|
'oidc_login_client_id' => getenv('OIDC_CLIENT_ID'),
|
||||||
|
'oidc_login_client_secret' => getenv('OIDC_CLIENT_SECRET'),
|
||||||
|
'oidc_login_auto_redirect' => false,
|
||||||
|
'oidc_login_end_session_redirect' => true,
|
||||||
|
'oidc_login_button_text' => 'Login with Keycloak',
|
||||||
|
'oidc_login_hide_password_form' => false,
|
||||||
|
'oidc_login_attributes' =>
|
||||||
|
array (
|
||||||
|
'id' => 'preferred_username',
|
||||||
|
'mail' => 'email',
|
||||||
|
'name' => 'name',
|
||||||
|
),
|
||||||
|
'oidc_login_scope' => 'openid profile email',
|
||||||
|
'oidc_login_unique_id' => 'preferred_username',
|
||||||
|
'oidc_login_use_pkce' => true,
|
||||||
|
'oidc_login_disable_registration' => false,
|
||||||
|
'oidc_login_create_groups' => false,
|
||||||
|
# External storage for user data should be configured to Asteria via the External Storage app (admin UI),
|
||||||
|
# keeping the astreae PVC for app internals only.
|
||||||
|
);
|
||||||
32
services/nextcloud/cronjob.yaml
Normal file
@ -0,0 +1,32 @@
|
|||||||
|
# services/nextcloud/cronjob.yaml
|
||||||
|
apiVersion: batch/v1
|
||||||
|
kind: CronJob
|
||||||
|
metadata:
|
||||||
|
name: nextcloud-cron
|
||||||
|
namespace: nextcloud
|
||||||
|
spec:
|
||||||
|
schedule: "*/5 * * * *"
|
||||||
|
concurrencyPolicy: Forbid
|
||||||
|
jobTemplate:
|
||||||
|
spec:
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
securityContext:
|
||||||
|
runAsUser: 33
|
||||||
|
runAsGroup: 33
|
||||||
|
fsGroup: 33
|
||||||
|
restartPolicy: OnFailure
|
||||||
|
containers:
|
||||||
|
- name: nextcloud-cron
|
||||||
|
image: nextcloud:29-apache
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
command: ["/bin/sh", "-c"]
|
||||||
|
args:
|
||||||
|
- "cd /var/www/html && php -f cron.php"
|
||||||
|
volumeMounts:
|
||||||
|
- name: nextcloud-data
|
||||||
|
mountPath: /var/www/html
|
||||||
|
volumes:
|
||||||
|
- name: nextcloud-data
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: nextcloud-data
|
||||||
143
services/nextcloud/deployment.yaml
Normal file
@ -0,0 +1,143 @@
|
|||||||
|
# services/nextcloud/deployment.yaml
|
||||||
|
apiVersion: apps/v1
|
||||||
|
kind: Deployment
|
||||||
|
metadata:
|
||||||
|
name: nextcloud
|
||||||
|
namespace: nextcloud
|
||||||
|
labels:
|
||||||
|
app: nextcloud
|
||||||
|
spec:
|
||||||
|
replicas: 1
|
||||||
|
selector:
|
||||||
|
matchLabels:
|
||||||
|
app: nextcloud
|
||||||
|
template:
|
||||||
|
metadata:
|
||||||
|
labels:
|
||||||
|
app: nextcloud
|
||||||
|
spec:
|
||||||
|
nodeSelector:
|
||||||
|
hardware: rpi5
|
||||||
|
securityContext:
|
||||||
|
fsGroup: 33
|
||||||
|
runAsUser: 33
|
||||||
|
runAsGroup: 33
|
||||||
|
initContainers:
|
||||||
|
- name: fix-perms
|
||||||
|
image: alpine:3.20
|
||||||
|
command: ["/bin/sh", "-c"]
|
||||||
|
args:
|
||||||
|
- |
|
||||||
|
chown -R 33:33 /var/www/html/config || true
|
||||||
|
chown -R 33:33 /var/www/html/data || true
|
||||||
|
securityContext:
|
||||||
|
runAsUser: 0
|
||||||
|
runAsGroup: 0
|
||||||
|
volumeMounts:
|
||||||
|
- name: nextcloud-data
|
||||||
|
mountPath: /var/www/html
|
||||||
|
- name: nextcloud-config
|
||||||
|
mountPath: /var/www/html/config/extra.config.php
|
||||||
|
subPath: extra.config.php
|
||||||
|
containers:
|
||||||
|
- name: nextcloud
|
||||||
|
image: nextcloud:29-apache
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
env:
|
||||||
|
# DB (external secret required: nextcloud-db with keys database, db-username, db-password)
|
||||||
|
- name: POSTGRES_HOST
|
||||||
|
value: postgres-service.postgres.svc.cluster.local
|
||||||
|
- name: POSTGRES_DB
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-db
|
||||||
|
key: database
|
||||||
|
- name: POSTGRES_USER
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-db
|
||||||
|
key: db-username
|
||||||
|
- name: POSTGRES_PASSWORD
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-db
|
||||||
|
key: db-password
|
||||||
|
# Admin bootstrap (external secret: nextcloud-admin with keys admin-user, admin-password)
|
||||||
|
- name: NEXTCLOUD_ADMIN_USER
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-admin
|
||||||
|
key: admin-user
|
||||||
|
- name: NEXTCLOUD_ADMIN_PASSWORD
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-admin
|
||||||
|
key: admin-password
|
||||||
|
- name: NEXTCLOUD_TRUSTED_DOMAINS
|
||||||
|
value: cloud.bstein.dev
|
||||||
|
- name: OVERWRITEHOST
|
||||||
|
value: cloud.bstein.dev
|
||||||
|
- name: OVERWRITEPROTOCOL
|
||||||
|
value: https
|
||||||
|
- name: OVERWRITECLIURL
|
||||||
|
value: https://cloud.bstein.dev
|
||||||
|
# SMTP (external secret: nextcloud-smtp with keys smtp-username, smtp-password)
|
||||||
|
- name: SMTP_HOST
|
||||||
|
value: mail.bstein.dev
|
||||||
|
- name: SMTP_PORT
|
||||||
|
value: "587"
|
||||||
|
- name: SMTP_SECURE
|
||||||
|
value: tls
|
||||||
|
- name: SMTP_NAME
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-smtp
|
||||||
|
key: smtp-username
|
||||||
|
- name: SMTP_PASSWORD
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-smtp
|
||||||
|
key: smtp-password
|
||||||
|
- name: MAIL_FROM_ADDRESS
|
||||||
|
value: no-reply
|
||||||
|
- name: MAIL_DOMAIN
|
||||||
|
value: bstein.dev
|
||||||
|
# OIDC (external secret: nextcloud-oidc with keys client-id, client-secret)
|
||||||
|
- name: OIDC_CLIENT_ID
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-oidc
|
||||||
|
key: client-id
|
||||||
|
- name: OIDC_CLIENT_SECRET
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-oidc
|
||||||
|
key: client-secret
|
||||||
|
- name: NEXTCLOUD_UPDATE
|
||||||
|
value: "1"
|
||||||
|
- name: APP_INSTALL
|
||||||
|
value: "mail,oidc_login,external"
|
||||||
|
ports:
|
||||||
|
- containerPort: 80
|
||||||
|
name: http
|
||||||
|
volumeMounts:
|
||||||
|
- name: nextcloud-data
|
||||||
|
mountPath: /var/www/html
|
||||||
|
- name: nextcloud-config
|
||||||
|
mountPath: /var/www/html/config/extra.config.php
|
||||||
|
subPath: extra.config.php
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 250m
|
||||||
|
memory: 1Gi
|
||||||
|
limits:
|
||||||
|
cpu: 1
|
||||||
|
memory: 3Gi
|
||||||
|
volumes:
|
||||||
|
- name: nextcloud-data
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: nextcloud-data
|
||||||
|
- name: nextcloud-config
|
||||||
|
configMap:
|
||||||
|
name: nextcloud-config
|
||||||
|
defaultMode: 0444
|
||||||
25
services/nextcloud/ingress.yaml
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
# services/nextcloud/ingress.yaml
|
||||||
|
apiVersion: networking.k8s.io/v1
|
||||||
|
kind: Ingress
|
||||||
|
metadata:
|
||||||
|
name: nextcloud
|
||||||
|
namespace: nextcloud
|
||||||
|
annotations:
|
||||||
|
cert-manager.io/cluster-issuer: letsencrypt-prod
|
||||||
|
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
||||||
|
spec:
|
||||||
|
tls:
|
||||||
|
- hosts:
|
||||||
|
- cloud.bstein.dev
|
||||||
|
secretName: nextcloud-tls
|
||||||
|
rules:
|
||||||
|
- host: cloud.bstein.dev
|
||||||
|
http:
|
||||||
|
paths:
|
||||||
|
- path: /
|
||||||
|
pathType: Prefix
|
||||||
|
backend:
|
||||||
|
service:
|
||||||
|
name: nextcloud
|
||||||
|
port:
|
||||||
|
number: 80
|
||||||
25
services/nextcloud/kustomization.yaml
Normal file
@ -0,0 +1,25 @@
|
|||||||
|
# services/nextcloud/kustomization.yaml
|
||||||
|
apiVersion: kustomize.config.k8s.io/v1beta1
|
||||||
|
kind: Kustomization
|
||||||
|
namespace: nextcloud
|
||||||
|
resources:
|
||||||
|
- namespace.yaml
|
||||||
|
- configmap.yaml
|
||||||
|
- pvc.yaml
|
||||||
|
- deployment.yaml
|
||||||
|
- service.yaml
|
||||||
|
- ingress.yaml
|
||||||
|
- cronjob.yaml
|
||||||
|
- mail-sync-cronjob.yaml
|
||||||
|
- maintenance-cronjob.yaml
|
||||||
|
configMapGenerator:
|
||||||
|
- name: nextcloud-maintenance-script
|
||||||
|
files:
|
||||||
|
- maintenance.sh=../../scripts/nextcloud-maintenance.sh
|
||||||
|
options:
|
||||||
|
disableNameSuffixHash: true
|
||||||
|
- name: nextcloud-mail-sync-script
|
||||||
|
files:
|
||||||
|
- sync.sh=../../scripts/nextcloud-mail-sync.sh
|
||||||
|
options:
|
||||||
|
disableNameSuffixHash: true
|
||||||
58
services/nextcloud/mail-sync-cronjob.yaml
Normal file
@ -0,0 +1,58 @@
|
|||||||
|
# services/nextcloud/mail-sync-cronjob.yaml
|
||||||
|
apiVersion: batch/v1
|
||||||
|
kind: CronJob
|
||||||
|
metadata:
|
||||||
|
name: nextcloud-mail-sync
|
||||||
|
namespace: nextcloud
|
||||||
|
spec:
|
||||||
|
schedule: "0 5 * * *"
|
||||||
|
concurrencyPolicy: Forbid
|
||||||
|
jobTemplate:
|
||||||
|
spec:
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
restartPolicy: OnFailure
|
||||||
|
securityContext:
|
||||||
|
runAsUser: 0
|
||||||
|
runAsGroup: 0
|
||||||
|
containers:
|
||||||
|
- name: mail-sync
|
||||||
|
image: nextcloud:29-apache
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
command: ["/bin/bash", "/sync/sync.sh"]
|
||||||
|
env:
|
||||||
|
- name: KC_BASE
|
||||||
|
value: https://sso.bstein.dev
|
||||||
|
- name: KC_REALM
|
||||||
|
value: atlas
|
||||||
|
- name: KC_ADMIN_USER
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-keycloak-admin
|
||||||
|
key: username
|
||||||
|
- name: KC_ADMIN_PASS
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-keycloak-admin
|
||||||
|
key: password
|
||||||
|
volumeMounts:
|
||||||
|
- name: nextcloud-data
|
||||||
|
mountPath: /var/www/html
|
||||||
|
- name: sync-script
|
||||||
|
mountPath: /sync/sync.sh
|
||||||
|
subPath: sync.sh
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 100m
|
||||||
|
memory: 256Mi
|
||||||
|
limits:
|
||||||
|
cpu: 500m
|
||||||
|
memory: 512Mi
|
||||||
|
volumes:
|
||||||
|
- name: nextcloud-data
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: nextcloud-data
|
||||||
|
- name: sync-script
|
||||||
|
configMap:
|
||||||
|
name: nextcloud-mail-sync-script
|
||||||
|
defaultMode: 0755
|
||||||
56
services/nextcloud/maintenance-cronjob.yaml
Normal file
@ -0,0 +1,56 @@
|
|||||||
|
# services/nextcloud/maintenance-cronjob.yaml
|
||||||
|
apiVersion: batch/v1
|
||||||
|
kind: CronJob
|
||||||
|
metadata:
|
||||||
|
name: nextcloud-maintenance
|
||||||
|
namespace: nextcloud
|
||||||
|
spec:
|
||||||
|
schedule: "30 4 * * *"
|
||||||
|
concurrencyPolicy: Forbid
|
||||||
|
jobTemplate:
|
||||||
|
spec:
|
||||||
|
template:
|
||||||
|
spec:
|
||||||
|
restartPolicy: OnFailure
|
||||||
|
securityContext:
|
||||||
|
runAsUser: 0
|
||||||
|
runAsGroup: 0
|
||||||
|
containers:
|
||||||
|
- name: maintenance
|
||||||
|
image: nextcloud:29-apache
|
||||||
|
imagePullPolicy: IfNotPresent
|
||||||
|
command: ["/bin/bash", "/maintenance/maintenance.sh"]
|
||||||
|
env:
|
||||||
|
- name: NC_URL
|
||||||
|
value: https://cloud.bstein.dev
|
||||||
|
- name: ADMIN_USER
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-admin
|
||||||
|
key: admin-user
|
||||||
|
- name: ADMIN_PASS
|
||||||
|
valueFrom:
|
||||||
|
secretKeyRef:
|
||||||
|
name: nextcloud-admin
|
||||||
|
key: admin-password
|
||||||
|
volumeMounts:
|
||||||
|
- name: nextcloud-data
|
||||||
|
mountPath: /var/www/html
|
||||||
|
- name: maintenance-script
|
||||||
|
mountPath: /maintenance/maintenance.sh
|
||||||
|
subPath: maintenance.sh
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
cpu: 100m
|
||||||
|
memory: 256Mi
|
||||||
|
limits:
|
||||||
|
cpu: 500m
|
||||||
|
memory: 512Mi
|
||||||
|
volumes:
|
||||||
|
- name: nextcloud-data
|
||||||
|
persistentVolumeClaim:
|
||||||
|
claimName: nextcloud-data
|
||||||
|
- name: maintenance-script
|
||||||
|
configMap:
|
||||||
|
name: nextcloud-maintenance-script
|
||||||
|
defaultMode: 0755
|
||||||
5
services/nextcloud/namespace.yaml
Normal file
@ -0,0 +1,5 @@
|
|||||||
|
# services/nextcloud/namespace.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Namespace
|
||||||
|
metadata:
|
||||||
|
name: nextcloud
|
||||||
13
services/nextcloud/pvc.yaml
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
# services/nextcloud/pvc.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: PersistentVolumeClaim
|
||||||
|
metadata:
|
||||||
|
name: nextcloud-data
|
||||||
|
namespace: nextcloud
|
||||||
|
spec:
|
||||||
|
accessModes:
|
||||||
|
- ReadWriteMany
|
||||||
|
resources:
|
||||||
|
requests:
|
||||||
|
storage: 200Gi
|
||||||
|
storageClassName: astreae
|
||||||
13
services/nextcloud/service.yaml
Normal file
@ -0,0 +1,13 @@
|
|||||||
|
# services/nextcloud/service.yaml
|
||||||
|
apiVersion: v1
|
||||||
|
kind: Service
|
||||||
|
metadata:
|
||||||
|
name: nextcloud
|
||||||
|
namespace: nextcloud
|
||||||
|
spec:
|
||||||
|
selector:
|
||||||
|
app: nextcloud
|
||||||
|
ports:
|
||||||
|
- name: http
|
||||||
|
port: 80
|
||||||
|
targetPort: http
|
||||||
@ -8,7 +8,7 @@ metadata:
|
|||||||
kubernetes.io/ingress.class: traefik
|
kubernetes.io/ingress.class: traefik
|
||||||
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
||||||
traefik.ingress.kubernetes.io/router.tls: "true"
|
traefik.ingress.kubernetes.io/router.tls: "true"
|
||||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
cert-manager.io/cluster-issuer: letsencrypt
|
||||||
spec:
|
spec:
|
||||||
tls:
|
tls:
|
||||||
- hosts: [ "pegasus.bstein.dev" ]
|
- hosts: [ "pegasus.bstein.dev" ]
|
||||||
|
|||||||
@ -8,7 +8,7 @@ spec:
|
|||||||
secretName: vault-server-tls
|
secretName: vault-server-tls
|
||||||
issuerRef:
|
issuerRef:
|
||||||
kind: ClusterIssuer
|
kind: ClusterIssuer
|
||||||
name: letsencrypt-prod
|
name: letsencrypt
|
||||||
commonName: secret.bstein.dev
|
commonName: secret.bstein.dev
|
||||||
dnsNames:
|
dnsNames:
|
||||||
- secret.bstein.dev
|
- secret.bstein.dev
|
||||||
|
|||||||
@ -5,7 +5,7 @@ metadata:
|
|||||||
name: zot
|
name: zot
|
||||||
namespace: zot
|
namespace: zot
|
||||||
annotations:
|
annotations:
|
||||||
cert-manager.io/cluster-issuer: letsencrypt-prod
|
cert-manager.io/cluster-issuer: letsencrypt
|
||||||
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
traefik.ingress.kubernetes.io/router.entrypoints: websecure
|
||||||
traefik.ingress.kubernetes.io/router.tls: "true"
|
traefik.ingress.kubernetes.io/router.tls: "true"
|
||||||
traefik.ingress.kubernetes.io/router.middlewares: zot-zot-resp-headers@kubernetescrd
|
traefik.ingress.kubernetes.io/router.middlewares: zot-zot-resp-headers@kubernetescrd
|
||||||
|
|||||||