Joined on 2025-03-24
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-20 16:11:35 +00:00
d99bb06eeb monitoring: reenable dcgm exporter
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 23:01:28 +00:00
75f6a59316 traefik: use responding timeouts only
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 22:43:28 +00:00
630f1f2a81 traefik: extend upload timeouts
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 20:09:24 +00:00
e4f93e85d2 monitoring: control-plane stat and namespace share tweaks
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 19:19:03 +00:00
f06be37f44 monitoring: refine network metrics and control-plane allowance
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 18:55:36 +00:00
c7b7bc7a6d monitoring: adjust overview spacing and net panels
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 18:11:08 +00:00
7b2a69cfe3 monitoring: disable dcgm exporter
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 18:04:34 +00:00
909cb4ff26 flux: disable wait for monitoring
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 17:33:35 +00:00
5a2575d54e flux: scope monitoring health checks
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 17:21:07 +00:00
46410c9a9d monitoring: fix dcgm image
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 17:08:48 +00:00
ff056551c7 monitoring: refresh overview dashboards
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 15:12:02 +00:00
8e6c0a3cfe monitoring: rework gpu share + gauges
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 14:42:36 +00:00
497164a1ad monitoring: clean namespace gpu share and layout
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 14:30:49 +00:00
fab5552039 monitoring: resolve pie errors and network data
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 14:12:17 +00:00
7009a4f9ff monitoring: fix namespace gpu share and network stats
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 13:47:35 +00:00
d7e4bcd533 monitoring: add gpu node fallback
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 04:01:27 +00:00
ec76563a86 monitoring: source gpu pie from limits and node nets
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 03:32:04 +00:00
5144bbe1f2 monitoring: fix gpu pie data and network panels
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 03:20:02 +00:00
ac62387e07 monitoring: stabilize namespace pies and labels
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-11-18 03:11:55 +00:00
2ba642d49f monitoring: add gpu pie and tidy net panels