Joined on 2025-03-24
bstein merged pull request bstein/titan-iac#3 2025-12-02 20:52:36 +00:00
feature/atlas-monitoring
bstein created pull request bstein/titan-iac#3 2025-12-02 20:52:17 +00:00
feature/atlas-monitoring
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 20:50:51 +00:00
bstein pushed to feature/sso at bstein/titan-iac 2025-12-02 20:50:26 +00:00
bstein created branch feature/sso in bstein/titan-iac 2025-12-02 20:50:26 +00:00
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 20:47:03 +00:00
7b4a189fe4 keycloak: add raw manifests backed by shared postgres
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 20:36:48 +00:00
e80505a773 notes: add postgres centralization guidance
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 20:15:04 +00:00
762aa7bb0f notes: add sso plan sketch
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 20:01:47 +00:00
839fb94836 notes: update monitoring and next steps
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 18:21:11 +00:00
6eba26b359 monitoring: show top12 root disks
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 18:15:30 +00:00
ace383bedd monitoring: expand worker/control/root rows
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 18:12:27 +00:00
b93636ecb9 monitoring: shrink hottest node row height
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 17:56:53 +00:00
5df94a7937 monitoring: fix gpu share query and root bar labels
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 17:41:50 +00:00
a3dc9391ee monitoring: polish dashboards and folders
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 16:16:11 +00:00
eed67b3db0 monitoring: regen dashboards with gpu details
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 15:36:55 +00:00
f1d0970aa0 monitoring: mirror dcgm-exporter as multi-arch
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 15:25:39 +00:00
e26ef44d1a monitoring: run dcgm-exporter with nvidia runtime
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 15:19:26 +00:00
a18c3e6f67 monitoring: always pull dcgm-exporter tag
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 15:07:37 +00:00
ee923df567 monitoring: add registry pull secret for dcgm-exporter
bstein pushed to feature/atlas-monitoring at bstein/titan-iac 2025-12-02 15:00:15 +00:00
d87a1dbc47 monitoring: allow dcgm rollout with unavailable node