| Commit | Message | Date |
|---|---|---|
| b0698887a4 | monitoring: add testing dashboard and switch postmark apikey | 2026-01-18 09:21:33 -03:00 |
| 2b9a8eb8eb | monitoring: add glue row and fix mail dns | 2026-01-18 08:12:06 -03:00 |
| 84710b99e8 | monitoring: add glue dashboard and tag cronjobs | 2026-01-18 02:50:07 -03:00 |
| fddf58346d | monitoring: treat cert-manager as infrastructure | 2026-01-12 00:26:46 -03:00 |
| 98d405bc42 | monitoring: regenerate dashboards with expanded infra namespaces | 2026-01-11 23:55:43 -03:00 |
| 879ff7c16b | monitoring: fix infra scopes and add jetson metrics | 2026-01-11 23:46:24 -03:00 |
| f500e81606 | monitoring: maintenance panels, extra alerts, update overview | 2026-01-11 02:28:39 -03:00 |
| 25907da229 | monitoring: remove titan-16 and add titan-20/21 to worker dashboards | 2026-01-11 02:20:47 -03:00 |
| 4a01632f6b | monitoring: add alert rules and include titan-20/21 in dashboards | 2026-01-11 02:02:47 -03:00 |
| 7225e28712 | mailu: harden relay + fix postmark exporter | 2026-01-06 14:00:14 -03:00 |
| 29e8cb5857 | monitoring: add titan-jh control plane node | 2026-01-06 09:50:40 -03:00 |
| c58583fd74 | monitoring: refine mail overview panels | 2026-01-06 02:34:52 -03:00 |
| aa58115318 | monitoring: refine mail stats and add send-limit usage | 2026-01-06 02:06:20 -03:00 |
| 7e4b0e1eb0 | monitoring: add Postmark mail dashboard | 2026-01-05 21:55:59 -03:00 |
| 05a888aeb6 | monitoring(dashboards): tune namespace share metrics | 2026-01-05 13:30:51 -03:00 |
| ceea2539bc | monitoring: per-panel namespace share filters | 2026-01-01 14:44:33 -03:00 |
| bcc1ceef6d | monitoring: ensure gpu idle share renders | 2026-01-01 14:21:43 -03:00 |
| 91de1c1d8d | gpu: enable time-slicing and refresh dashboards | 2026-01-01 14:16:08 -03:00 |
| 1b57ea7adb | Increase Atlas availability stat to 4 decimals | 2025-12-19 15:18:14 -03:00 |
| 2ab38d6205 | Reduce Atlas availability query density | 2025-12-19 14:56:29 -03:00 |
| 2f6988189b | Expand Atlas availability window to 1y | 2025-12-19 13:46:34 -03:00 |
| c85961e1fe | Regenerate dashboards after availability thresholds tweak | 2025-12-15 22:14:26 -03:00 |
| c9c13372a8 | atlas overview: include titan-db in control plane panels | 2025-12-12 21:55:53 -03:00 |
| f884ce8146 | atlas dashboards: align percent thresholds and disk bars | 2025-12-12 21:13:31 -03:00 |
| 755a6926ab | atlas overview: refine alert thresholds and availability colors | 2025-12-12 20:50:41 -03:00 |
| 73deee09af | atlas dashboards: use threshold colors for stats | 2025-12-12 20:44:20 -03:00 |
| 2e18a4e1c5 | atlas dashboards: fix pod share display and zero/red stat thresholds | 2025-12-12 20:40:32 -03:00 |
| da8ed7a3b0 | atlas dashboards: show pod counts (not %) and make zero-friendly stats | 2025-12-12 20:30:00 -03:00 |
| ca1b2351c0 | atlas dashboards: show pod counts with top12 bars | 2025-12-12 20:20:13 -03:00 |
| 0a520e1d4b | atlas dashboards: drop empty nodes and enforce top12 pod bars | 2025-12-12 19:09:51 -03:00 |
| 1fefca3b3e | atlas dashboards: cap pod count bars at top12 | 2025-12-12 18:56:13 -03:00 |
| 8ed23c673c | atlas dashboards: sort pod counts and add pod row to overview | 2025-12-12 18:51:43 -03:00 |
| c093f98522 | atlas dashboards: fix overview links and add pods-by-node pie | 2025-12-12 18:32:45 -03:00 |
| 1a38bffdf3 | atlas overview: fix availability scaling | 2025-12-12 16:36:47 -03:00 |
| 92a7688a2f | atlas overview: show availability percent with 3 decimals | 2025-12-12 16:15:37 -03:00 |
| 72d4fd60d2 | atlas overview: show availability percent and keep uptime centered | 2025-12-12 16:11:28 -03:00 |
| 9320d809f4 | atlas overview: center uptime and reorder top row | 2025-12-12 15:56:33 -03:00 |
| 27f4e60f30 | atlas overview: add uptime and crashloop panels | 2025-12-12 15:23:51 -03:00 |
| 2906e3e5d9 | monitoring: show GPU share over dashboard range | 2025-12-02 20:28:35 -03:00 |
| 42b3ac0139 | monitoring: show top12 root disks | 2025-12-02 15:21:02 -03:00 |
| e53ca4dd91 | monitoring: expand worker/control/root rows | 2025-12-02 15:15:21 -03:00 |
| 134e39d9a4 | monitoring: shrink hottest node row height | 2025-12-02 15:12:16 -03:00 |
| 12fd5229dc | monitoring: fix gpu share query and root bar labels | 2025-12-02 14:56:36 -03:00 |
| 1963fadec1 | monitoring: polish dashboards and folders | 2025-12-02 14:41:39 -03:00 |
| d23e2fe78c | monitoring: regen dashboards with gpu details | 2025-12-02 13:16:00 -03:00 |
| f7f124ad71 | monitoring: control-plane stat and namespace share tweaks | 2025-11-18 17:09:13 -03:00 |
| d062c10675 | monitoring: refine network metrics and control-plane allowance | 2025-11-18 16:18:52 -03:00 |
| 97b7b479bc | monitoring: adjust overview spacing and net panels | 2025-11-18 15:55:24 -03:00 |
| e4f0eeca99 | monitoring: refresh overview dashboards | 2025-11-18 14:08:33 -03:00 |
| 00e9c90746 | monitoring: rework gpu share + gauges | 2025-11-18 12:11:47 -03:00 |