220 Commits

SHA1 Message Date
25ee698021 oauth2-proxy: ensure error middleware on auth ingress 2025-12-07 12:03:14 -03:00
4a089876ba auth: use internal oauth2-proxy svc for forward-auth 2025-12-07 11:25:29 -03:00
20bb776625 auth: add 401 redirect middleware to oauth2-proxy 2025-12-07 11:14:25 -03:00
5e59f20bc3 auth: point forward-auth to external auth host 2025-12-07 11:09:09 -03:00
dbede55ad4 oauth2-proxy: temporarily drop group restriction 2025-12-07 10:42:13 -03:00
27e5c9391c auth: add namespace-local forward-auth middlewares 2025-12-07 10:25:44 -03:00
8d5e6c267c auth: wire oauth2-proxy and enable grafana oidc 2025-12-07 02:01:21 -03:00
a55502fe27 add oauth2-proxy for SSO forward-auth 2025-12-06 14:42:24 -03:00
598bdfc727 keycloak: restrict to worker rpis with titan-24 fallback 2025-12-06 01:44:23 -03:00
88c7a1c2aa keycloak: require rpi nodes with titan-24 fallback 2025-12-06 01:40:24 -03:00
f4da27271e keycloak: prefer rpi nodes, avoid titan-24 2025-12-06 01:36:33 -03:00
141c05b08f keycloak: honor xforwarded headers and hostname url 2025-12-06 01:23:07 -03:00
f0a8f6d35e keycloak: enable health/metrics management port 2025-12-06 00:51:47 -03:00
1b01052eda keycloak: set fsGroup for data volume 2025-12-06 00:49:17 -03:00
1d346edd28 keycloak: remove optimized flag for first start 2025-12-06 00:43:24 -03:00
0db149605d monitoring: show GPU share over dashboard range 2025-12-02 20:28:35 -03:00
2db550afdd keycloak: add raw manifests backed by shared postgres 2025-12-02 17:58:19 -03:00
6eba26b359 monitoring: show top12 root disks 2025-12-02 15:21:02 -03:00
ace383bedd monitoring: expand worker/control/root rows 2025-12-02 15:15:21 -03:00
b93636ecb9 monitoring: shrink hottest node row height 2025-12-02 15:12:16 -03:00
5df94a7937 monitoring: fix gpu share query and root bar labels 2025-12-02 14:56:36 -03:00
a3dc9391ee monitoring: polish dashboards and folders 2025-12-02 14:41:39 -03:00
eed67b3db0 monitoring: regen dashboards with gpu details 2025-12-02 13:16:00 -03:00
f1d0970aa0 monitoring: mirror dcgm-exporter as multi-arch 2025-12-02 12:36:24 -03:00
e26ef44d1a monitoring: run dcgm-exporter with nvidia runtime 2025-12-02 12:25:30 -03:00
a18c3e6f67 monitoring: always pull dcgm-exporter tag 2025-12-02 12:19:16 -03:00
ee923df567 monitoring: add registry pull secret for dcgm-exporter 2025-12-02 12:07:11 -03:00
d87a1dbc47 monitoring: allow dcgm rollout with unavailable node 2025-12-02 11:59:55 -03:00
5b89b0533e monitoring: use mirrored dcgm-exporter tag 2025-12-02 11:54:53 -03:00
d99bb06eeb monitoring: reenable dcgm exporter 2025-11-20 13:11:13 -03:00
e4f93e85d2 monitoring: control-plane stat and namespace share tweaks 2025-11-18 17:09:13 -03:00
f06be37f44 monitoring: refine network metrics and control-plane allowance 2025-11-18 16:18:52 -03:00
c7b7bc7a6d monitoring: adjust overview spacing and net panels 2025-11-18 15:55:24 -03:00
7b2a69cfe3 monitoring: disable dcgm exporter 2025-11-18 15:10:58 -03:00
46410c9a9d monitoring: fix dcgm image 2025-11-18 14:19:23 -03:00
ff056551c7 monitoring: refresh overview dashboards 2025-11-18 14:08:33 -03:00
8e6c0a3cfe monitoring: rework gpu share + gauges 2025-11-18 12:11:47 -03:00
497164a1ad monitoring: clean namespace gpu share and layout 2025-11-18 11:42:24 -03:00
fab5552039 monitoring: resolve pie errors and network data 2025-11-18 11:30:33 -03:00
7009a4f9ff monitoring: fix namespace gpu share and network stats 2025-11-18 11:12:03 -03:00
d7e4bcd533 monitoring: add gpu node fallback 2025-11-18 10:47:24 -03:00
ec76563a86 monitoring: source gpu pie from limits and node nets 2025-11-18 01:01:10 -03:00
5144bbe1f2 monitoring: fix gpu pie data and network panels 2025-11-18 00:31:51 -03:00
ac62387e07 monitoring: stabilize namespace pies and labels 2025-11-18 00:19:45 -03:00
2ba642d49f monitoring: add gpu pie and tidy net panels 2025-11-18 00:11:39 -03:00
beb3243839 Revert GPU pie chart additions 2025-11-17 23:42:55 -03:00
aef3176c1c monitoring: fix hottest stats and gpu share 2025-11-17 23:40:22 -03:00
f4dd1de43f monitoring: reorder namespace pies and add gpu data 2025-11-17 23:18:53 -03:00
0708522b28 monitoring: add namespace gpu share 2025-11-17 23:12:16 -03:00
c53c518301 monitoring: express namespace share as cluster percent 2025-11-17 22:58:57 -03:00