Observability: Grafana + VictoriaMetrics (how to query safely)

Where it is configured

services/monitoring/helmrelease.yaml (Helm values for Grafana, Alertmanager, and VictoriaMetrics)
services/monitoring/grafana-dashboard-*.yaml (dashboards and their PromQL)

Using metrics as a “tool” for Atlas assistants

The safest pattern is to map a small set of intents to fixed PromQL queries, run only those queries, and then summarize the results. The assistant never composes PromQL from free-form input.

Examples (intents)

“Is the cluster healthy?” → node readiness + pod restart rate
“Why is Element Call failing?” → LiveKit/coturn pod restarts + synapse errors + ingress 5xx
“Is Jenkins slow?” → pod CPU/memory + HTTP latency metrics (if exported)
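
The intent mapping above can be sketched as a read-only allowlist. This is a minimal illustration, not the Atlas implementation: the intent keys, the `jenkins` namespace, and the choice of metrics (which assume kube-state-metrics and cAdvisor are scraped) are all assumptions. The real queries should be taken from services/monitoring/grafana-dashboard-*.yaml.

```python
from types import MappingProxyType

# Hypothetical intent -> fixed PromQL allowlist.
# Metric names assume kube-state-metrics and cAdvisor are scraped;
# the "jenkins" namespace is also an assumption.
INTENT_QUERIES = MappingProxyType({
    "cluster_health": (
        # Number of nodes currently reporting Ready.
        'sum(kube_node_status_condition{condition="Ready",status="true"})',
        # Container restarts across the cluster over the last hour.
        'sum(increase(kube_pod_container_status_restarts_total[1h]))',
    ),
    "jenkins_slow": (
        # CPU cores used by Jenkins pods, averaged over 5 minutes.
        'sum(rate(container_cpu_usage_seconds_total{namespace="jenkins"}[5m]))',
        # Working-set memory of Jenkins pods, in bytes.
        'sum(container_memory_working_set_bytes{namespace="jenkins"})',
    ),
})


def queries_for(intent: str) -> tuple[str, ...]:
    """Return the fixed queries for a known intent.

    Unknown intents are rejected, so PromQL is never built from user text.
    """
    try:
        return INTENT_QUERIES[intent]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None
```

Because the mapping is read-only and the lookup rejects anything outside it, adding a new intent is a reviewed code change and not something the assistant can do at runtime.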

Why dashboards are not the KB

Dashboards are great references, but the assistant should query VictoriaMetrics directly for live answers and keep the KB focused on wiring, runbooks, and stable conventions.
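
As a rough sketch of what "query VictoriaMetrics directly" could look like, the snippet below runs one instant query against the Prometheus-compatible `/api/v1/query` endpoint and flattens the response into short lines for summarizing. The base URL is a hypothetical placeholder, and the function names are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical in-cluster address; substitute the real VictoriaMetrics service URL.
VM_BASE_URL = "http://victoria-metrics.monitoring.svc:8428"


def instant_query(promql: str, base_url: str = VM_BASE_URL, timeout: float = 5.0) -> dict:
    """Run one instant query via the Prometheus-compatible /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize(payload: dict) -> list[str]:
    """Flatten an instant-vector response into 'labels = value' lines."""
    if payload.get("status") != "success":
        return [f"query failed: {payload.get('error', 'unknown error')}"]
    lines = []
    for series in payload["data"]["result"]:
        labels = ",".join(f"{k}={v}" for k, v in sorted(series["metric"].items()))
        _timestamp, value = series["value"]  # instant vectors carry [ts, "value"]
        lines.append(f"{labels or '(no labels)'} = {value}")
    return lines or ["no data"]
```

Keeping the HTTP call and the summarizing step separate means the summarizer can be exercised with canned responses, without a live VictoriaMetrics instance.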
