| Commit | Message | Date |
|---|---|---|
| de727eee07 | keycloak: restrict to worker rpis with titan-24 fallback | 2025-12-06 01:44:23 -03:00 |
| 2122ce3e31 | keycloak: require rpi nodes with titan-24 fallback | 2025-12-06 01:40:24 -03:00 |
| f2d496c6c0 | keycloak: prefer rpi nodes, avoid titan-24 | 2025-12-06 01:36:33 -03:00 |
| 127d09755e | keycloak: honor xforwarded headers and hostname url | 2025-12-06 01:23:07 -03:00 |
| 9f5e61ebed | keycloak: enable health/metrics management port | 2025-12-06 00:51:47 -03:00 |
| b1b39c4dcd | keycloak: set fsGroup for data volume | 2025-12-06 00:49:17 -03:00 |
| 65d8986279 | keycloak: remove optimized flag for first start | 2025-12-06 00:43:24 -03:00 |
| b9202b6829 | chore: drop AGENTS.md from repo | 2025-12-06 00:43:17 -03:00 |
| 1e8de60198 | notes: capture GPU share change and flux branch | 2025-12-03 12:28:45 -03:00 |
| 2906e3e5d9 | monitoring: show GPU share over dashboard range | 2025-12-02 20:28:35 -03:00 |
| 7210c0784d | flux: add keycloak kustomization | 2025-12-02 18:10:20 -03:00 |
| 46b6d471eb | flux: track feature/sso | 2025-12-02 18:00:49 -03:00 |
| 7e46ffc075 | keycloak: add raw manifests backed by shared postgres | 2025-12-02 17:58:19 -03:00 |
| d8f466e53e | Merge pull request 'feature/atlas-monitoring' (#3) from feature/atlas-monitoring into main; Reviewed-on: #3 | 2025-12-02 20:52:35 +00:00 |
| ffdb4ed010 | notes: add postgres centralization guidance | 2025-12-02 17:36:37 -03:00 |
| 5af23034de | notes: add sso plan sketch | 2025-12-02 17:14:45 -03:00 |
| 72a83a1af9 | notes: update monitoring and next steps | 2025-12-02 17:01:32 -03:00 |
| 42b3ac0139 | monitoring: show top12 root disks | 2025-12-02 15:21:02 -03:00 |
| e53ca4dd91 | monitoring: expand worker/control/root rows | 2025-12-02 15:15:21 -03:00 |
| 134e39d9a4 | monitoring: shrink hottest node row height | 2025-12-02 15:12:16 -03:00 |
| 12fd5229dc | monitoring: fix gpu share query and root bar labels | 2025-12-02 14:56:36 -03:00 |
| 1963fadec1 | monitoring: polish dashboards and folders | 2025-12-02 14:41:39 -03:00 |
| d23e2fe78c | monitoring: regen dashboards with gpu details | 2025-12-02 13:16:00 -03:00 |
| e7d521f203 | monitoring: mirror dcgm-exporter as multi-arch | 2025-12-02 12:36:24 -03:00 |
| 54e4a1ed93 | monitoring: run dcgm-exporter with nvidia runtime | 2025-12-02 12:25:30 -03:00 |
| 9895695b36 | monitoring: always pull dcgm-exporter tag | 2025-12-02 12:19:16 -03:00 |
| 2fc73097ba | monitoring: add registry pull secret for dcgm-exporter | 2025-12-02 12:07:11 -03:00 |
| 7b1cc7061a | monitoring: allow dcgm rollout with unavailable node | 2025-12-02 11:59:55 -03:00 |
| f44370c41f | monitoring: use mirrored dcgm-exporter tag | 2025-12-02 11:54:53 -03:00 |
| 3fbaa54f4f | monitoring: reenable dcgm exporter | 2025-11-20 13:11:13 -03:00 |
| ea60425d42 | traefik: use responding timeouts only | 2025-11-18 20:01:16 -03:00 |
| a8cb8c0287 | traefik: extend upload timeouts | 2025-11-18 19:43:19 -03:00 |
| f7f124ad71 | monitoring: control-plane stat and namespace share tweaks | 2025-11-18 17:09:13 -03:00 |
| d062c10675 | monitoring: refine network metrics and control-plane allowance | 2025-11-18 16:18:52 -03:00 |
| 97b7b479bc | monitoring: adjust overview spacing and net panels | 2025-11-18 15:55:24 -03:00 |
| 0b44f2d1d4 | monitoring: disable dcgm exporter | 2025-11-18 15:10:58 -03:00 |
| bcda1b396d | flux: disable wait for monitoring | 2025-11-18 15:04:18 -03:00 |
| a15ee26ae2 | flux: scope monitoring health checks | 2025-11-18 14:33:24 -03:00 |
| 1970b820e7 | monitoring: fix dcgm image | 2025-11-18 14:19:23 -03:00 |
| e4f0eeca99 | monitoring: refresh overview dashboards | 2025-11-18 14:08:33 -03:00 |
| 00e9c90746 | monitoring: rework gpu share + gauges | 2025-11-18 12:11:47 -03:00 |
| b1d84d646a | monitoring: clean namespace gpu share and layout | 2025-11-18 11:42:24 -03:00 |
| 7e4b2f8ba2 | monitoring: resolve pie errors and network data | 2025-11-18 11:30:33 -03:00 |
| a028fde4f7 | monitoring: fix namespace gpu share and network stats | 2025-11-18 11:12:03 -03:00 |
| 703e1d4e3c | monitoring: add gpu node fallback | 2025-11-18 10:47:24 -03:00 |
| 16f8b5f30b | monitoring: source gpu pie from limits and node nets | 2025-11-18 01:01:10 -03:00 |
| ebfeb78e87 | monitoring: fix gpu pie data and network panels | 2025-11-18 00:31:51 -03:00 |
| d5e1003de8 | monitoring: stabilize namespace pies and labels | 2025-11-18 00:19:45 -03:00 |
| a411694bda | monitoring: add gpu pie and tidy net panels | 2025-11-18 00:11:39 -03:00 |
| 1df06f18f6 | Revert GPU pie chart additions | 2025-11-17 23:42:55 -03:00 |