
title: Observability: Grafana + VictoriaMetrics (how to query safely)
tags: atlas, monitoring, grafana, victoriametrics
owners: brad
entrypoints: metrics.bstein.dev, alerts.bstein.dev
source_paths: services/monitoring

Observability: Grafana + VictoriaMetrics (how to query safely)

Where it is configured

  • services/monitoring/helmrelease.yaml (Grafana + Alertmanager + VM values)
  • services/monitoring/grafana-dashboard-*.yaml (dashboards and their PromQL)

Using metrics as a “tool” for Atlas assistants

The safest pattern is to map a small set of intents to fixed PromQL queries, run only those queries, and then summarize the results.

Examples (intents)

  • “Is the cluster healthy?” → node readiness + pod restart rate
  • “Why is Element Call failing?” → LiveKit/coturn pod restarts + Synapse errors + ingress 5xx
  • “Is Jenkins slow?” → pod CPU/memory + HTTP latency metrics (if exported)
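
The intent → fixed-query pattern above could be sketched roughly as follows. This is a minimal illustration, not the actual Atlas implementation: the base URL, the intent name `cluster_health`, and the helper names are hypothetical, and the metric names assume kube-state-metrics is being scraped. The only external interface it relies on is the Prometheus-compatible `/api/v1/query` endpoint that VictoriaMetrics exposes.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical base URL; substitute the real VictoriaMetrics query endpoint.
VM_BASE_URL = "http://victoriametrics.example.internal:8428"

# Fixed allowlist: intent name -> named PromQL queries.
# Metric names are assumptions (kube-state-metrics defaults), not taken from the repo.
INTENT_QUERIES = {
    "cluster_health": {
        "nodes_not_ready": 'sum(kube_node_status_condition{condition="Ready",status!="true"})',
        "pod_restarts_15m": "sum(increase(kube_pod_container_status_restarts_total[15m]))",
    },
}


def queries_for(intent):
    """Return the fixed queries for a known intent; reject anything else."""
    try:
        return INTENT_QUERIES[intent]
    except KeyError:
        raise ValueError(f"unknown intent: {intent!r}") from None


def build_query_url(promql, base_url=VM_BASE_URL):
    """Build an instant-query URL for the Prometheus-compatible query API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_intent(intent, base_url=VM_BASE_URL, timeout=10.0):
    """Run every fixed query for an intent and return the decoded JSON responses."""
    results = {}
    for name, promql in queries_for(intent).items():
        url = build_query_url(promql, base_url)
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            results[name] = json.load(resp)
    return results
```

Because the assistant only ever passes an intent name, it never composes PromQL itself; adding a new capability means adding a reviewed entry to the allowlist.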

Why dashboards are not the KB

Dashboards are great references, but the assistant should query VictoriaMetrics directly for live answers and keep the KB focused on wiring, runbooks, and stable conventions.
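
For the "live answers" side, a small helper can turn a raw query response into a short text summary. The sketch below assumes the standard Prometheus instant-query JSON shape (`status`, `data.resultType`, `data.result`), which VictoriaMetrics also returns; the function name, the default `pod` label, and the `limit` parameter are hypothetical choices for illustration.

```python
def summarize_instant_vector(response, label="pod", limit=5):
    """Summarize an instant-vector response as 'name=value' pairs, largest first."""
    if response.get("status") != "success":
        return "query failed: " + str(response.get("error", "unknown error"))
    data = response.get("data", {})
    if data.get("resultType") != "vector":
        return "unexpected result type: " + str(data.get("resultType"))
    series = data.get("result", [])
    if not series:
        return "no matching series"
    pairs = []
    for item in series:
        # Aggregated queries may carry no labels at all; fall back to a placeholder.
        name = item.get("metric", {}).get(label, "<all>")
        _timestamp, raw_value = item["value"]  # sample values arrive as strings
        pairs.append((name, float(raw_value)))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return ", ".join(f"{name}={value:g}" for name, value in pairs[:limit])
```

Keeping the summary step separate from the query step means the text handed back to a user depends only on the returned numbers, not on any dashboard layout.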