135 Commits

SHA1 Message Date
c3555d59f7 monitoring: fix GPU share attribution 2026-01-28 19:08:53 -03:00
a255c60aed monitoring: fix gpu idle label 2026-01-27 21:46:58 -03:00
b4f5fbeb2b monitoring: unify gpu namespace usage 2026-01-27 21:43:37 -03:00
577e2a158d monitoring: keep idle label in gpu share 2026-01-27 18:44:58 -03:00
86cd5194ea monitoring: fix gpu idle share 2026-01-27 17:51:13 -03:00
0fbbbf39e9 monitoring: fix jetson gpu metrics 2026-01-27 16:19:54 -03:00
995050f544 monitoring: unify jetson gpu metrics 2026-01-26 22:26:24 -03:00
f7d4425740 ariadne: reduce comms noise, fix gpu labels 2026-01-26 20:54:33 -03:00
8b8766b0f0 monitoring: add postgres metrics and update overview 2026-01-22 18:23:26 -03:00
307d1bf7a6 ops: restore portal/ariadne and add postgres panels 2026-01-22 15:23:23 -03:00
e0308b89fd monitoring: enforce sorted job lists 2026-01-21 15:12:53 -03:00
9db260e482 monitoring: tighten jobs/overview ordering 2026-01-21 15:01:02 -03:00
2fd87aea45 monitoring: refine jobs/overview panels 2026-01-21 14:31:11 -03:00
fc87432fdf monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
aaeb933625 monitoring: refresh testing dashboard 2026-01-21 11:29:48 -03:00
c804ec040c glue: centralize sync tasks in ariadne 2026-01-21 02:57:40 -03:00
587a0af1d7 maintenance: wire ariadne db and dashboards 2026-01-20 23:03:39 -03:00
11a06e7683 feat: add Ariadne service and glue scheduling 2026-01-19 16:58:02 -03:00
847c98a7db monitoring: fix glue dashboard queries 2026-01-18 12:26:04 -03:00
b0698887a4 monitoring: add testing dashboard and switch postmark apikey 2026-01-18 09:21:33 -03:00
2b9a8eb8eb monitoring: add glue row and fix mail dns 2026-01-18 08:12:06 -03:00
84710b99e8 monitoring: add glue dashboard and tag cronjobs 2026-01-18 02:50:07 -03:00
e897858d97 monitoring: move grafana smtp to vault 2026-01-14 06:41:34 -03:00
fddf58346d monitoring: treat cert-manager as infrastructure 2026-01-12 00:26:46 -03:00
98d405bc42 monitoring: regenerate dashboards with expanded infra namespaces 2026-01-11 23:55:43 -03:00
879ff7c16b monitoring: fix infra scopes and add jetson metrics 2026-01-11 23:46:24 -03:00
f500e81606 monitoring: maintenance panels, extra alerts, update overview 2026-01-11 02:28:39 -03:00
25907da229 monitoring: remove titan-16 and add titan-20/21 to worker dashboards 2026-01-11 02:20:47 -03:00
4a01632f6b monitoring: add alert rules and include titan-20/21 in dashboards 2026-01-11 02:02:47 -03:00
7d2d6ad6e4 nextcloud/monitoring: fix perms and mail panels 2026-01-06 14:38:10 -03:00
7225e28712 mailu: harden relay + fix postmark exporter 2026-01-06 14:00:14 -03:00
29e8cb5857 monitoring: add titan-jh control plane node 2026-01-06 09:50:40 -03:00
c58583fd74 monitoring: refine mail overview panels 2026-01-06 02:34:52 -03:00
aa58115318 monitoring: refine mail stats and add send-limit usage 2026-01-06 02:06:20 -03:00
7e4b0e1eb0 monitoring: add Postmark mail dashboard 2026-01-05 21:55:59 -03:00
05a888aeb6 monitoring(dashboards): tune namespace share metrics 2026-01-05 13:30:51 -03:00
ceea2539bc monitoring: per-panel namespace share filters 2026-01-01 14:44:33 -03:00
bcc1ceef6d monitoring: ensure gpu idle share renders 2026-01-01 14:21:43 -03:00
91de1c1d8d gpu: enable time-slicing and refresh dashboards 2026-01-01 14:16:08 -03:00
1b57ea7adb Increase Atlas availability stat to 4 decimals 2025-12-19 15:18:14 -03:00
2ab38d6205 Reduce Atlas availability query density 2025-12-19 14:56:29 -03:00
2f6988189b Expand Atlas availability window to 1y 2025-12-19 13:46:34 -03:00
c85961e1fe Regenerate dashboards after availability thresholds tweak 2025-12-15 22:14:26 -03:00
917178a392 Group namespace plurality rows to one per namespace 2025-12-13 22:17:47 -03:00
88ec7d5690 Fix namespace plurality mask and bump v26 2025-12-13 20:53:11 -03:00
81105b0b7e Use OR-joined node ranks for plurality tie-break 2025-12-13 19:04:22 -03:00
28b1056324 Deduplicate namespace plurality rows with ranked tie-break 2025-12-13 18:39:31 -03:00
9b45775575 Restore namespace plurality panel data 2025-12-13 18:25:03 -03:00
2baa537ec7 Use table format for namespace plurality panel 2025-12-13 18:23:19 -03:00
8af4a689eb Simplify namespace plurality table rendering 2025-12-13 18:07:56 -03:00