83 Commits

SHA1 Message Date
8b35ab0292 monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
1fb3d179ef monitoring: add testing dashboard and switch postmark apikey 2026-01-18 09:21:33 -03:00
d7812623cd monitoring: add glue row and fix mail dns 2026-01-18 08:12:06 -03:00
343d41ecc7 monitoring: add glue dashboard and tag cronjobs 2026-01-18 02:50:07 -03:00
13df82e07a monitoring: treat cert-manager as infrastructure 2026-01-12 00:26:46 -03:00
fb2c7b22d5 monitoring: regenerate dashboards with expanded infra namespaces 2026-01-11 23:55:43 -03:00
fcc0a49369 monitoring: fix infra scopes and add jetson metrics 2026-01-11 23:46:24 -03:00
54358df569 monitoring: maintenance panels, extra alerts, update overview 2026-01-11 02:28:39 -03:00
33b89c7dc2 monitoring: remove titan-16 and add titan-20/21 to worker dashboards 2026-01-11 02:20:47 -03:00
734a537a28 monitoring: add alert rules and include titan-20/21 in dashboards 2026-01-11 02:02:47 -03:00
c693e695b4 mailu: harden relay + fix postmark exporter 2026-01-06 14:00:14 -03:00
a14726350c monitoring: add titan-jh control plane node 2026-01-06 09:50:40 -03:00
5fcff4fc8a monitoring: refine mail overview panels 2026-01-06 02:34:52 -03:00
d5d2fc66b9 monitoring: refine mail stats and add send-limit usage 2026-01-06 02:06:20 -03:00
9be25e16fe monitoring: add Postmark mail dashboard 2026-01-05 21:55:59 -03:00
28a5d53c98 monitoring(dashboards): tune namespace share metrics 2026-01-05 13:30:51 -03:00
5093f77c0a monitoring: per-panel namespace share filters 2026-01-01 14:44:33 -03:00
f18f1df1ce monitoring: ensure gpu idle share renders 2026-01-01 14:21:43 -03:00
6a76fc0fa3 gpu: enable time-slicing and refresh dashboards 2026-01-01 14:16:08 -03:00
a2b34c5712 Increase Atlas availability stat to 4 decimals 2025-12-19 15:18:14 -03:00
89f95157d8 Reduce Atlas availability query density 2025-12-19 14:56:29 -03:00
8be89cbd53 Expand Atlas availability window to 1y 2025-12-19 13:46:34 -03:00
0f49849761 Regenerate dashboards after availability thresholds tweak 2025-12-15 22:14:26 -03:00
6f8a70fd58 atlas overview: include titan-db in control plane panels 2025-12-12 21:55:53 -03:00
1166069640 atlas dashboards: align percent thresholds and disk bars 2025-12-12 21:13:31 -03:00
e56bed284e atlas overview: refine alert thresholds and availability colors 2025-12-12 20:50:41 -03:00
24376594ff atlas dashboards: use threshold colors for stats 2025-12-12 20:44:20 -03:00
5277c98385 atlas dashboards: fix pod share display and zero/red stat thresholds 2025-12-12 20:40:32 -03:00
056b7b7770 atlas dashboards: show pod counts (not %) and make zero-friendly stats 2025-12-12 20:30:00 -03:00
b770575b42 atlas dashboards: show pod counts with top12 bars 2025-12-12 20:20:13 -03:00
9e76277c22 atlas dashboards: drop empty nodes and enforce top12 pod bars 2025-12-12 19:09:51 -03:00
93b3c6d2ec atlas dashboards: cap pod count bars at top12 2025-12-12 18:56:13 -03:00
596bf46863 atlas dashboards: sort pod counts and add pod row to overview 2025-12-12 18:51:43 -03:00
ec59d25ad8 atlas dashboards: fix overview links and add pods-by-node pie 2025-12-12 18:32:45 -03:00
0a0966db78 atlas overview: fix availability scaling 2025-12-12 16:36:47 -03:00
87fbba0d3e atlas overview: show availability percent with 3 decimals 2025-12-12 16:15:37 -03:00
b200dba5b9 atlas overview: show availability percent and keep uptime centered 2025-12-12 16:11:28 -03:00
697ce3c18f atlas overview: center uptime and reorder top row 2025-12-12 15:56:33 -03:00
8e39c6a28b atlas overview: add uptime and crashloop panels 2025-12-12 15:23:51 -03:00
0db149605d monitoring: show GPU share over dashboard range 2025-12-02 20:28:35 -03:00
6eba26b359 monitoring: show top12 root disks 2025-12-02 15:21:02 -03:00
ace383bedd monitoring: expand worker/control/root rows 2025-12-02 15:15:21 -03:00
b93636ecb9 monitoring: shrink hottest node row height 2025-12-02 15:12:16 -03:00
5df94a7937 monitoring: fix gpu share query and root bar labels 2025-12-02 14:56:36 -03:00
a3dc9391ee monitoring: polish dashboards and folders 2025-12-02 14:41:39 -03:00
eed67b3db0 monitoring: regen dashboards with gpu details 2025-12-02 13:16:00 -03:00
e4f93e85d2 monitoring: control-plane stat and namespace share tweaks 2025-11-18 17:09:13 -03:00
f06be37f44 monitoring: refine network metrics and control-plane allowance 2025-11-18 16:18:52 -03:00
c7b7bc7a6d monitoring: adjust overview spacing and net panels 2025-11-18 15:55:24 -03:00
ff056551c7 monitoring: refresh overview dashboards 2025-11-18 14:08:33 -03:00