atlas pods: stabilize plurality query to avoid 422
parent cf9dacd4ea
commit 68d4f43903
@@ -1178,13 +1178,10 @@ def build_pods_dashboard():
 10,
 "Namespace Plurality by Node",
 (
-"("
+"max by (namespace,node) ("
 " {share}"
-")"
 " * on(namespace) group_left(node) ("
-" ({share})"
-" == bool on(namespace) group_left() ("
-" max by (namespace) ({share})"
+" {share} == bool on(namespace) group_left() (max by (namespace) ({share}))"
 " )"
 ")"
 ).format(
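For readers following the hunk above, here is a minimal sketch of how the new template might expand into the rendered query. The arguments to `.format(` are not shown in the diff, so the `share=` keyword, the `SHARE_EXPR` value (copied from the rendered `expr` in the dashboard JSON hunks), and the function name `build_plurality_expr` are all assumptions for illustration, and exact whitespace may differ from the real generator.

```python
# Hypothetical reconstruction: SHARE_EXPR and build_plurality_expr are
# illustrative names, not taken from the repository.
SHARE_EXPR = (
    '(sum by (namespace,node) (kube_pod_info{pod!=""})'
    " / on(namespace) group_left()"
    ' clamp_min(sum by (namespace) (kube_pod_info{pod!=""}), 1) * 100)'
)


def build_plurality_expr(share: str = SHARE_EXPR) -> str:
    """Expand the post-commit template with the per-node share expression."""
    template = (
        "max by (namespace,node) ("
        " {share}"
        " * on(namespace) group_left(node) ("
        " {share} == bool on(namespace) group_left() (max by (namespace) ({share}))"
        " )"
        ")"
    )
    # Braces inside `share` (e.g. {pod!=""}) are safe here because they are
    # part of the substituted value, not of the format template itself.
    return template.format(share=share)


if __name__ == "__main__":
    print(build_plurality_expr())
```

The outer `max by (namespace,node)` aggregation reduces the result to at most one series per namespace and node pair, which appears to be the stabilization the commit title refers to.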
@@ -1,28 +0,0 @@
-# services/monitoring
-
-## Grafana admin secret
-
-The Grafana Helm release expects a pre-existing secret named `grafana-admin`
-in the `monitoring` namespace. Create or rotate it with:
-
-```bash
-kubectl create secret generic grafana-admin \
---namespace monitoring \
---from-literal=admin-user=admin \
---from-literal=admin-password='REPLACE_ME'
-```
-
-Update the password whenever you rotate credentials.
-
-## DCGM exporter image
-
-The NVIDIA GPU metrics DaemonSet expects `registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`, mirrored from `docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04`. Refresh it in Zot when bumping versions:
-
-```bash
-skopeo copy \
---all \
-docker://docker.io/nvidia/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04 \
-docker://registry.bstein.dev/monitoring/dcgm-exporter:4.4.2-4.7.0-ubuntu22.04
-```
-
-When finished mirroring from the control-plane, you can remove temporary tooling with `sudo apt-get purge -y skopeo && sudo apt-get autoremove -y` and clear `~/.config/containers/auth.json`.
@@ -508,7 +508,7 @@
 },
 "targets": [
 {
-"expr": "( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100))* on(namespace) group_left(node) ( ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100)) == bool on(namespace) group_left() ( max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100)) ))",
+"expr": "max by (namespace,node) ( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace) group_left(node) ( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100))) ))",
 "refId": "A",
 "instant": true
 }
@@ -517,7 +517,7 @@ data:
 },
 "targets": [
 {
-"expr": "( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100))* on(namespace) group_left(node) ( ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100)) == bool on(namespace) group_left() ( max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100)) ))",
+"expr": "max by (namespace,node) ( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) * on(namespace) group_left(node) ( (sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100) == bool on(namespace) group_left() (max by (namespace) ((sum by (namespace,node) (kube_pod_info{pod!=\"\"}) / on(namespace) group_left() clamp_min(sum by (namespace) (kube_pod_info{pod!=\"\"}), 1) * 100))) ))",
 "refId": "A",
 "instant": true
 }