70 Commits

SHA1 Message Date
e0b124ca4e monitoring: switch power telemetry to ananke metrics 2026-04-08 23:33:17 -03:00
96bc93670b monitoring(power): rename hecate UPS peers to Pyrphoros and Statera 2026-04-04 05:54:16 -03:00
82e1b87b8f monitoring(overview): refine ups-climate row and climate/fan stat display 2026-04-04 04:40:22 -03:00
1b682cc60f monitoring(grafana): restart to pick up latest overview layout 2026-04-04 04:35:26 -03:00
d5fc6c89c4 monitoring(grafana): bump restart revision for overview dashboard reload 2026-04-04 01:34:36 -03:00
7ef4c895ba monitoring(grafana): bump restart revision to reload provisioned dashboards 2026-04-03 20:54:12 -03:00
fd71c6644b monitoring(power): wire generated power dashboard and split per-UPS panels 2026-04-03 17:49:09 -03:00
bc9bf0310a monitoring: add power dashboard and reorder atlas overview rows 2026-04-03 14:55:16 -03:00
3cf426e23a monitoring: roll grafana to apply latest alert rules 2026-03-30 18:41:26 -03:00
0aeb08d375 monitoring: fix noisy grafana email alerts and reload rules 2026-03-30 18:33:02 -03:00
a49fa6dd33 monitoring: restart grafana for alerting reload 2026-01-27 23:29:46 -03:00
32884e0b7e monitoring: fix grafana smtp from address 2026-01-27 22:28:37 -03:00
7b43e8654f monitoring: send grafana alerts via postmark 2026-01-27 22:00:19 -03:00
993702afee monitoring: alert on VM outage 2026-01-23 11:51:28 -03:00
fc87432fdf monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
5fe70b1471 grafana: allow email-based oauth user lookup 2026-01-21 11:45:11 -03:00
14d75ccf7a monitoring: label cronjob metrics and move grafana to arm64 2026-01-18 12:20:45 -03:00
60dee25f08 monitoring: add atlas testing dashboard folder 2026-01-18 12:07:45 -03:00
8b86c5dd67 monitoring: avoid titan-22 for core pods 2026-01-18 11:43:28 -03:00
4bc57cf445 monitoring: restore grafana persistence 2026-01-18 11:37:01 -03:00
8fb73e023c monitoring: disable grafana persistence to recover 2026-01-18 09:55:28 -03:00
b0698887a4 monitoring: add testing dashboard and switch postmark apikey 2026-01-18 09:21:33 -03:00
af86a610d9 fix ingress tls routing 2026-01-16 01:40:50 -03:00
98ca8f6b1a smtp: use mail.bstein.dev for app relays 2026-01-15 04:04:50 -03:00
e6ce9b0d88 smtp: point services at mailu relay 2026-01-15 03:58:03 -03:00
5fc530b6de vault: fix hyphenated key templates 2026-01-14 22:37:18 -03:00
dd0b4e28e7 vault: inject comms and grafana secrets 2026-01-14 22:29:27 -03:00
c9483b2d80 vault: sync harbor pulls 2026-01-14 10:07:31 -03:00
ff29339a19 chore: refresh knowledge catalog headers 2026-01-14 01:08:05 -03:00
c2aef63e95 monitoring: allow grafana upgrade remediation 2026-01-13 21:18:42 -03:00
4daa5f0e50 monitoring: align victoria-metrics PVC size 2026-01-13 21:15:10 -03:00
6ac61e7b44 monitoring: wire grafana smtp sync and alerting provisioning 2026-01-11 00:29:20 -03:00
a8da8731d0 logging: remove loki and backfill to opensearch 2026-01-09 18:08:39 -03:00
b33be4a7c2 logging: add loki and fluent-bit 2026-01-08 22:31:45 -03:00
29e8cb5857 monitoring: add titan-jh control plane node 2026-01-06 09:50:40 -03:00
7e4b0e1eb0 monitoring: add Postmark mail dashboard 2026-01-05 21:55:59 -03:00
44d5263d83 monitoring: dual-provision overview orgs 2026-01-01 18:20:40 -03:00
3eabdef431 monitoring: recreate grafana rollouts 2026-01-01 18:00:07 -03:00
ee7489ae4f monitoring: split overview org 2026-01-01 17:54:01 -03:00
f4434c860e grafana,jitsi: enable pkce and tcp fallback 2025-12-24 18:15:25 -03:00
b2904dba30 grafana: allow public overview via oidc 2025-12-24 17:43:07 -03:00
573cde6cad monitoring: longer data history 2025-12-14 14:47:20 -03:00
5905c0f243 monitoring: drop duplicate titan-db scrape job 2025-12-12 21:48:03 -03:00
df9c0c1ae0 monitoring: scrape titan-db node_exporter 2025-12-12 21:38:10 -03:00
7525289a0c auth: wire oauth2-proxy and enable grafana oidc 2025-12-07 02:01:21 -03:00
1963fadec1 monitoring: polish dashboards and folders 2025-12-02 14:41:39 -03:00
d23e2fe78c monitoring: regen dashboards with gpu details 2025-12-02 13:16:00 -03:00
a34e58d319 monitoring: fix hottest stats and titan-db scrape 2025-11-17 19:38:40 -03:00
665dfa2e52 monitoring: rebuild atlas dashboards 2025-11-17 16:27:38 -03:00
5858a80c72 monitoring: restructure grafana dashboards 2025-11-17 14:22:46 -03:00