249 Commits

SHA1 Message Date
0b44f2d1d4 monitoring: disable dcgm exporter 2025-11-18 15:10:58 -03:00
bcda1b396d flux: disable wait for monitoring 2025-11-18 15:04:18 -03:00
a15ee26ae2 flux: scope monitoring health checks 2025-11-18 14:33:24 -03:00
1970b820e7 monitoring: fix dcgm image 2025-11-18 14:19:23 -03:00
e4f0eeca99 monitoring: refresh overview dashboards 2025-11-18 14:08:33 -03:00
00e9c90746 monitoring: rework gpu share + gauges 2025-11-18 12:11:47 -03:00
b1d84d646a monitoring: clean namespace gpu share and layout 2025-11-18 11:42:24 -03:00
7e4b2f8ba2 monitoring: resolve pie errors and network data 2025-11-18 11:30:33 -03:00
a028fde4f7 monitoring: fix namespace gpu share and network stats 2025-11-18 11:12:03 -03:00
703e1d4e3c monitoring: add gpu node fallback 2025-11-18 10:47:24 -03:00
16f8b5f30b monitoring: source gpu pie from limits and node nets 2025-11-18 01:01:10 -03:00
ebfeb78e87 monitoring: fix gpu pie data and network panels 2025-11-18 00:31:51 -03:00
d5e1003de8 monitoring: stabilize namespace pies and labels 2025-11-18 00:19:45 -03:00
a411694bda monitoring: add gpu pie and tidy net panels 2025-11-18 00:11:39 -03:00
1df06f18f6 Revert GPU pie chart additions 2025-11-17 23:42:55 -03:00
9bd7effdee monitoring: fix hottest stats and gpu share 2025-11-17 23:40:22 -03:00
991d6defc4 monitoring: reorder namespace pies and add gpu data 2025-11-17 23:18:53 -03:00
43b9265cdf monitoring: add namespace gpu share 2025-11-17 23:12:16 -03:00
9233ba60fc monitoring: express namespace share as cluster percent 2025-11-17 22:58:57 -03:00
ccca363fb4 monitoring: fix pie colors & thresholds 2025-11-17 22:39:50 -03:00
f22c19bc5d monitoring: color namespace pies 2025-11-17 22:36:50 -03:00
0e9b293e95 monitoring: fix namespace share percentages 2025-11-17 22:19:01 -03:00
5a2cafb5db monitoring: normalize namespace share 2025-11-17 22:06:06 -03:00
5ce1493b3b monitoring: unify namespace share panels 2025-11-17 21:57:40 -03:00
c85c6b1bc3 monitoring: worker/control-plane splits 2025-11-17 21:48:12 -03:00
64059a08f5 monitoring: restore top1 hottest stats 2025-11-17 21:20:19 -03:00
2073ffe944 monitoring: fix net/io legend labels 2025-11-17 20:19:20 -03:00
a99e1ba227 monitoring: attach nodes to net/io stats 2025-11-17 20:14:11 -03:00
8d42f501e5 monitoring: tidy hottest node labels 2025-11-17 20:04:50 -03:00
7358f9e618 monitoring: show hottest node labels 2025-11-17 20:00:40 -03:00
831d1fe707 monitoring: fix hottest node labels 2025-11-17 19:56:57 -03:00
8c263b36b9 monitoring: show hottest node names 2025-11-17 19:53:39 -03:00
bf31272339 monitoring: reorder overview stats 2025-11-17 19:49:50 -03:00
a34e58d319 monitoring: fix hottest stats and titan-db scrape 2025-11-17 19:38:40 -03:00
6a60e4284a monitoring: tighten overview stats 2025-11-17 19:24:03 -03:00
0f7d0b7bac monitoring: polish dashboards 2025-11-17 18:55:11 -03:00
665dfa2e52 monitoring: rebuild atlas dashboards 2025-11-17 16:27:38 -03:00
5858a80c72 monitoring: restructure grafana dashboards 2025-11-17 14:22:46 -03:00
d844e068ec monitoring: enrich dashboards 2025-11-16 12:58:08 -03:00
77c3e260a3 monitoring: refresh grafana dashboards 2025-11-15 21:03:11 -03:00
2e6b9a47c8 dashboards: improve public view and fix color 2025-11-15 11:59:48 -03:00
48f9c6d715 grafana: set datasource uid 2025-11-15 11:35:27 -03:00
da82ebd469 grafana: use atlas metrics hostname 2025-11-15 11:18:40 -03:00
37b93de3e7 victoria-metrics: revert storageclass change 2025-11-15 11:16:37 -03:00
89c0fbfd44 monitoring: fix domain 2025-11-14 19:13:40 -03:00
cb402d0bb9 monitoring: fix ingress and env formats 2025-11-14 08:51:09 -03:00
597556d1c0 grafana: use string host format 2025-11-14 08:37:46 -03:00
f886e2b873 grafana: fix dashboard provider list 2025-11-14 08:33:53 -03:00
94f0cd939d monitoring: fix grafana values 2025-11-14 08:29:59 -03:00
bc757265cf monitoring: add grafana and alertmanager 2025-11-14 00:02:59 -03:00