55fa2cbce4
zot: restore main branch config
2025-12-11 17:26:15 -03:00
d5a526c5fa
zot: revert to unauthenticated registry
2025-12-11 17:22:16 -03:00
efd258fc71
vault: drop traefik basicauth
2025-12-11 17:09:05 -03:00
3852ebc0f1
zot,vault: remove oauth2-proxy sso
2025-12-11 17:04:19 -03:00
88db462f8f
longhorn/vault: gate via oauth2-proxy
2025-12-07 19:44:02 -03:00
e44def25f8
auth: remove error middleware to allow redirect
2025-12-07 13:19:45 -03:00
7ae8bf9705
oauth2-proxy: drop groups scope to avoid invalid_scope
2025-12-07 13:09:29 -03:00
088fed6720
auth: forward-auth via external auth host (svc traffic flaky)
2025-12-07 13:03:29 -03:00
84e4dc0616
oauth2-proxy: schedule on worker rpis
2025-12-07 12:49:38 -03:00
96a8d271a9
oauth2-proxy: ensure error middleware on auth ingress
2025-12-07 12:03:14 -03:00
84aa870cda
auth: use internal oauth2-proxy svc for forward-auth
2025-12-07 11:25:29 -03:00
876ec19543
auth: add 401 redirect middleware to oauth2-proxy
2025-12-07 11:14:25 -03:00
ec1d33f1ca
auth: point forward-auth to external auth host
2025-12-07 11:09:09 -03:00
1de9d94138
oauth2-proxy: temporarily drop group restriction
2025-12-07 10:42:13 -03:00
571bf759a2
auth: add namespace-local forward-auth middlewares
2025-12-07 10:25:44 -03:00
7525289a0c
auth: wire oauth2-proxy and enable grafana oidc
2025-12-07 02:01:21 -03:00
c7b73555c4
add oauth2-proxy for SSO forward-auth
2025-12-06 14:42:24 -03:00
de727eee07
keycloak: restrict to worker rpis with titan-24 fallback
2025-12-06 01:44:23 -03:00
2122ce3e31
keycloak: require rpi nodes with titan-24 fallback
2025-12-06 01:40:24 -03:00
f2d496c6c0
keycloak: prefer rpi nodes, avoid titan-24
2025-12-06 01:36:33 -03:00
127d09755e
keycloak: honor xforwarded headers and hostname url
2025-12-06 01:23:07 -03:00
9f5e61ebed
keycloak: enable health/metrics management port
2025-12-06 00:51:47 -03:00
b1b39c4dcd
keycloak: set fsGroup for data volume
2025-12-06 00:49:17 -03:00
65d8986279
keycloak: remove optimized flag for first start
2025-12-06 00:43:24 -03:00
b9202b6829
chore: drop AGENTS.md from repo
2025-12-06 00:43:17 -03:00
1e8de60198
notes: capture GPU share change and flux branch
2025-12-03 12:28:45 -03:00
2906e3e5d9
monitoring: show GPU share over dashboard range
2025-12-02 20:28:35 -03:00
7210c0784d
flux: add keycloak kustomization
2025-12-02 18:10:20 -03:00
46b6d471eb
flux: track feature/sso
2025-12-02 18:00:49 -03:00
7e46ffc075
keycloak: add raw manifests backed by shared postgres
2025-12-02 17:58:19 -03:00
d8f466e53e
Merge pull request 'feature/atlas-monitoring' (#3) from feature/atlas-monitoring into main
Reviewed-on: #3
2025-12-02 20:52:35 +00:00
ffdb4ed010
notes: add postgres centralization guidance
2025-12-02 17:36:37 -03:00
5af23034de
notes: add sso plan sketch
2025-12-02 17:14:45 -03:00
72a83a1af9
notes: update monitoring and next steps
2025-12-02 17:01:32 -03:00
42b3ac0139
monitoring: show top12 root disks
2025-12-02 15:21:02 -03:00
e53ca4dd91
monitoring: expand worker/control/root rows
2025-12-02 15:15:21 -03:00
134e39d9a4
monitoring: shrink hottest node row height
2025-12-02 15:12:16 -03:00
12fd5229dc
monitoring: fix gpu share query and root bar labels
2025-12-02 14:56:36 -03:00
1963fadec1
monitoring: polish dashboards and folders
2025-12-02 14:41:39 -03:00
d23e2fe78c
monitoring: regen dashboards with gpu details
2025-12-02 13:16:00 -03:00
e7d521f203
monitoring: mirror dcgm-exporter as multi-arch
2025-12-02 12:36:24 -03:00
54e4a1ed93
monitoring: run dcgm-exporter with nvidia runtime
2025-12-02 12:25:30 -03:00
9895695b36
monitoring: always pull dcgm-exporter tag
2025-12-02 12:19:16 -03:00
2fc73097ba
monitoring: add registry pull secret for dcgm-exporter
2025-12-02 12:07:11 -03:00
7b1cc7061a
monitoring: allow dcgm rollout with unavailable node
2025-12-02 11:59:55 -03:00
f44370c41f
monitoring: use mirrored dcgm-exporter tag
2025-12-02 11:54:53 -03:00
3fbaa54f4f
monitoring: reenable dcgm exporter
2025-11-20 13:11:13 -03:00
ea60425d42
traefik: use responding timeouts only
2025-11-18 20:01:16 -03:00
a8cb8c0287
traefik: extend upload timeouts
2025-11-18 19:43:19 -03:00
f7f124ad71
monitoring: control-plane stat and namespace share tweaks
2025-11-18 17:09:13 -03:00
d062c10675
monitoring: refine network metrics and control-plane allowance
2025-11-18 16:18:52 -03:00
97b7b479bc
monitoring: adjust overview spacing and net panels
2025-11-18 15:55:24 -03:00
0b44f2d1d4
monitoring: disable dcgm exporter
2025-11-18 15:10:58 -03:00
bcda1b396d
flux: disable wait for monitoring
2025-11-18 15:04:18 -03:00
a15ee26ae2
flux: scope monitoring health checks
2025-11-18 14:33:24 -03:00
1970b820e7
monitoring: fix dcgm image
2025-11-18 14:19:23 -03:00
e4f0eeca99
monitoring: refresh overview dashboards
2025-11-18 14:08:33 -03:00
00e9c90746
monitoring: rework gpu share + gauges
2025-11-18 12:11:47 -03:00
b1d84d646a
monitoring: clean namespace gpu share and layout
2025-11-18 11:42:24 -03:00
7e4b2f8ba2
monitoring: resolve pie errors and network data
2025-11-18 11:30:33 -03:00
a028fde4f7
monitoring: fix namespace gpu share and network stats
2025-11-18 11:12:03 -03:00
703e1d4e3c
monitoring: add gpu node fallback
2025-11-18 10:47:24 -03:00
16f8b5f30b
monitoring: source gpu pie from limits and node nets
2025-11-18 01:01:10 -03:00
ebfeb78e87
monitoring: fix gpu pie data and network panels
2025-11-18 00:31:51 -03:00
d5e1003de8
monitoring: stabilize namespace pies and labels
2025-11-18 00:19:45 -03:00
a411694bda
monitoring: add gpu pie and tidy net panels
2025-11-18 00:11:39 -03:00
1df06f18f6
Revert GPU pie chart additions
2025-11-17 23:42:55 -03:00
9bd7effdee
monitoring: fix hottest stats and gpu share
2025-11-17 23:40:22 -03:00
991d6defc4
monitoring: reorder namespace pies and add gpu data
2025-11-17 23:18:53 -03:00
43b9265cdf
monitoring: add namespace gpu share
2025-11-17 23:12:16 -03:00
9233ba60fc
monitoring: express namespace share as cluster percent
2025-11-17 22:58:57 -03:00
ccca363fb4
monitoring: fix pie colors & thresholds
2025-11-17 22:39:50 -03:00
f22c19bc5d
monitoring: color namespace pies
2025-11-17 22:36:50 -03:00
0e9b293e95
monitoring: fix namespace share percentages
2025-11-17 22:19:01 -03:00
5a2cafb5db
monitoring: normalize namespace share
2025-11-17 22:06:06 -03:00
5ce1493b3b
monitoring: unify namespace share panels
2025-11-17 21:57:40 -03:00
c85c6b1bc3
monitoring: worker/control-plane splits
2025-11-17 21:48:12 -03:00
64059a08f5
monitoring: restore top1 hottest stats
2025-11-17 21:20:19 -03:00
2073ffe944
monitoring: fix net/io legend labels
2025-11-17 20:19:20 -03:00
a99e1ba227
monitoring: attach nodes to net/io stats
2025-11-17 20:14:11 -03:00
8d42f501e5
monitoring: tidy hottest node labels
2025-11-17 20:04:50 -03:00
7358f9e618
monitoring: show hottest node labels
2025-11-17 20:00:40 -03:00
831d1fe707
monitoring: fix hottest node labels
2025-11-17 19:56:57 -03:00
8c263b36b9
monitoring: show hottest node names
2025-11-17 19:53:39 -03:00
bf31272339
monitoring: reorder overview stats
2025-11-17 19:49:50 -03:00
a34e58d319
monitoring: fix hottest stats and titan-db scrape
2025-11-17 19:38:40 -03:00
6a60e4284a
monitoring: tighten overview stats
2025-11-17 19:24:03 -03:00
0f7d0b7bac
monitoring: polish dashboards
2025-11-17 18:55:11 -03:00
665dfa2e52
monitoring: rebuild atlas dashboards
2025-11-17 16:27:38 -03:00
5858a80c72
monitoring: restructure grafana dashboards
2025-11-17 14:22:46 -03:00
d844e068ec
monitoring: enrich dashboards
2025-11-16 12:58:08 -03:00
77c3e260a3
monitoring: refresh grafana dashboards
2025-11-15 21:03:11 -03:00
2e6b9a47c8
dashboards: improve public view and fix color
2025-11-15 11:59:48 -03:00
48f9c6d715
grafana: set datasource uid
2025-11-15 11:35:27 -03:00
da82ebd469
grafana: use atlas metrics hostname
2025-11-15 11:18:40 -03:00
37b93de3e7
victoria-metrics: revert storageclass change
2025-11-15 11:16:37 -03:00
89c0fbfd44
monitoring: fix domain
2025-11-14 19:13:40 -03:00
cb402d0bb9
monitoring: fix ingress and env formats
2025-11-14 08:51:09 -03:00
597556d1c0
grafana: use string host format
2025-11-14 08:37:46 -03:00
f886e2b873
grafana: fix dashboard provider list
2025-11-14 08:33:53 -03:00
94f0cd939d
monitoring: fix grafana values
2025-11-14 08:29:59 -03:00
bc757265cf
monitoring: add grafana and alertmanager
2025-11-14 00:02:59 -03:00
4d3a4cd2b4
flux-system: track main branch
2025-11-12 01:06:26 -03:00
ac7863802a
monitoring: disable wait on node-exporter
2025-11-09 14:03:14 -03:00
afb926439f
core: disable wait to unblock reconciliation
2025-11-09 13:46:56 -03:00
ebf5a8aef9
core: remove gpu health gate
2025-11-09 13:37:59 -03:00
dca749cc04
gpu: drop runtimeClass from minipc plugin
2025-11-09 13:28:40 -03:00
65b3e3fbb8
monitoring: disable kube-state annotations
2025-11-09 13:20:50 -03:00
45ad2a2b06
monitoring: clean helm values
2025-11-09 13:16:21 -03:00
396acb818a
monitoring: disable chart prometheusScrape
2025-11-09 13:11:40 -03:00
aae55a14f8
monitoring: annotate kube-state svc manually
2025-11-09 13:07:39 -03:00
8ac040a7d8
monitoring: drop duplicate annotations
2025-11-09 13:03:40 -03:00
79a17412af
monitoring: reference prometheus repo
2025-11-09 12:59:03 -03:00
1bdc0efdac
core: point flux to infrastructure path
2025-11-09 12:49:54 -03:00
8b6ddcd44d
platform: fix relative paths
2025-11-09 12:39:32 -03:00
ffbfee1ebd
platform: include cert-manager clusterissuer
2025-11-09 12:38:20 -03:00
85aa07c0cc
chore: fix vmagent relabel indentation
2025-11-09 12:33:11 -03:00
e2e2916139
fix: flux automation and monitoring config
2025-11-09 12:31:38 -03:00
077654fa2d
refactor: restructure atlas flux layout
2025-11-09 11:48:45 -03:00
3c229baece
pegasus on
2025-10-09 23:26:20 -05:00
48995cc6ed
Merge pull request 'minor tweaks' (#2) from fea/titan24-gpu into main
Reviewed-on: #2
2025-10-10 02:23:01 +00:00
c94959a687
minor tweaks
2025-10-09 21:21:54 -05:00
d992be1061
Merge pull request 'gpu(titan-24): add RuntimeClass + NVIDIA device-plugin DS; enable containerd nvidia runtime' (#1) from fea/titan24-gpu into main
Reviewed-on: #1
2025-10-09 23:29:26 +00:00
79d71f471f
gpu(titan-24): add RuntimeClass + NVIDIA device-plugin DS; enable containerd nvidia runtime
2025-10-09 18:28:20 -05:00
8f724e02be
pegasus chill
2025-10-08 04:26:26 -05:00
d2ffd738ef
storageclass update
2025-10-08 03:13:12 -05:00
16b2c15eda
asteria corrections
2025-10-08 00:50:42 -05:00
761fdd29b2
jellyfin restart
2025-10-07 23:28:40 -05:00
4567b1685c
monitoring add, jellyfin/pegasus update, and traefik tweaks
2025-10-07 23:26:27 -05:00
2182e98c05
jellyfin pvc size increase
2025-10-04 09:00:41 -05:00
503a95a8e8
fixed jellyfin pv issue
2025-10-04 08:50:56 -05:00
9dfe6bb700
jellyfin and pegasus in same group
2025-09-18 10:12:08 -05:00
358da0ea00
jellyfin and pegasus in same group
2025-09-18 09:55:00 -05:00
3b50199e1d
jellyfin and pegasus in same group
2025-09-18 09:38:46 -05:00
5b97966395
jellyfin and pegasus in same group
2025-09-18 08:52:58 -05:00
9a34ee3d2e
pegasus 1.2.32
2025-09-18 02:33:37 -05:00
53d3079bce
gavilon to gavilan
2025-09-17 19:12:03 -05:00
259451e273
added gavilon to account for pegasus
2025-09-17 18:29:33 -05:00
518d7bb160
pegasus 1.2.31
2025-09-17 18:08:49 -05:00
632949c29c
pegasus 1.2.31
2025-09-17 09:38:49 -05:00
6a77f7749f
pegasus 1.2.30
2025-09-17 09:09:24 -05:00
16997fba10
pegasus 1.2.29
2025-09-17 09:00:52 -05:00
3637a99bfb
pegasus 1.2.28
2025-09-17 08:52:11 -05:00
7e2baa343c
pegasus 1.2.27
2025-09-17 08:21:51 -05:00
02bde10852
pegasus 1.2.26
2025-09-17 07:57:36 -05:00
e224215406
pegasus 1.2.25
2025-09-17 07:46:48 -05:00
03d43d097b
pegasus 1.2.24
2025-09-17 07:24:10 -05:00
ca62df5508
pegasus 1.2.22
2025-09-17 01:33:11 -05:00
2f68bc664a
pegasus 1.2.22
2025-09-17 01:02:33 -05:00
3878d39579
pegasus 1.2.21
2025-09-17 00:08:18 -05:00
19ae80e5e0
pegasus 1.2.20
2025-09-16 23:10:58 -05:00
46f02ee826
pegasus 1.2.17
2025-09-16 22:45:15 -05:00
e34744d144
pegasus 1.2.17
2025-09-16 20:08:50 -05:00
fdbd8ef048
pegasus 1.2.17
2025-09-16 18:02:55 -05:00
535c3de0bf
pegasus 1.2.16
2025-09-16 17:18:42 -05:00
2be629a998
pegasus 1.2.15
2025-09-16 16:56:49 -05:00
0b5aed217d
pegasus 1.2.14
2025-09-16 09:53:26 -05:00
eb6aeae2d2
pegasus 1.2.13
2025-09-16 09:12:41 -05:00
3276e4f196
pegasus 1.2.12
2025-09-16 08:54:32 -05:00
e31bf05cc1
pegasus 1.2.11
2025-09-16 08:29:47 -05:00
e0169b5bba
pegasus 1.2.10
2025-09-16 07:19:54 -05:00
ba140fb638
pegasus 1.2.9
2025-09-16 05:33:36 -05:00
10b34c353b
pegasus 1.2.8
2025-09-16 04:09:10 -05:00
26e15f7651
pegasus 1.2.7 - json fix
2025-09-16 03:35:12 -05:00
22683b0dc4
pegasus 1.2.6 - json fix
2025-09-16 03:05:50 -05:00
7468e62023
mapping to list
2025-09-16 02:36:43 -05:00
0d492eb622
pegasus updates 1.2.5
2025-09-16 01:55:36 -05:00
c8a91ebe4f
pegasus updates 1.2.4
2025-09-16 01:01:23 -05:00
ee3b0f3f25
pegasus updates
2025-09-16 00:06:26 -05:00
ab02f4537e
pegasus updates
2025-09-15 22:52:58 -05:00
f51c06efac
pegasus updates
2025-09-15 22:40:00 -05:00
773637273d
pegasus updates
2025-09-15 19:55:20 -05:00
8b1c083fe0
pegasus: pin image digest + command + probes + tls
2025-09-15 13:00:39 -05:00
128fad192c
pegasus flux'd
2025-09-15 12:32:52 -05:00
eac7aaa91b
pegasus flux'd
2025-09-15 12:28:56 -05:00
28903add8f
pegasus fix
2025-09-15 12:09:24 -05:00
eea64c7eb1
pegasus on
2025-09-15 02:45:22 -05:00
c7a184eace
zot fix
2025-09-15 02:15:27 -05:00
ba233fd909
zot fix
2025-09-15 01:03:32 -05:00
04cd5b0c62
zot middleware add
2025-09-09 11:27:42 -05:00
ec744e45bf
zot middleware add
2025-09-09 01:43:13 -05:00
b16eda5894
zot simplification
2025-09-09 01:16:33 -05:00
1ba463001a
zot simplification
2025-09-09 00:22:24 -05:00
2304c41ba8
zot configmap update
2025-09-08 23:08:32 -05:00
7ca10afce7
zot version pin
2025-09-08 22:52:41 -05:00
ead0c486a5
zot troubleshooting
2025-09-08 22:25:41 -05:00
1de7fcc287
zot middleware fix
2025-09-08 21:58:50 -05:00
7efc4a4dfb
jitsi corrections
2025-09-07 14:31:53 -05:00
19bfa0878c
pegasus corrections
2025-09-07 13:34:06 -05:00
fab2d944ff
jitsi setup
2025-09-07 13:20:49 -05:00