233 Commits

SHA1 Message Date
577e2a158d monitoring: keep idle label in gpu share 2026-01-27 18:44:58 -03:00
86cd5194ea monitoring: fix gpu idle share 2026-01-27 17:51:13 -03:00
c0073b08cc monitoring: fix tegrastats regexes 2026-01-27 16:44:00 -03:00
0a64708b3d monitoring: expose jetson scrape line length 2026-01-27 16:38:09 -03:00
aacfc8f28c monitoring: read tegrastats per scrape 2026-01-27 16:34:31 -03:00
3b20290561 monitoring: read jetson stats on demand 2026-01-27 16:27:45 -03:00
eb809524b5 monitoring: refresh jetson stats on scrape 2026-01-27 16:23:23 -03:00
0fbbbf39e9 monitoring: fix jetson gpu metrics 2026-01-27 16:19:54 -03:00
d5478e272e monitoring: restart jetson exporter 2026-01-26 22:51:41 -03:00
5393585f3e monitoring: fix jetson metrics newlines 2026-01-26 22:50:33 -03:00
995050f544 monitoring: unify jetson gpu metrics 2026-01-26 22:26:24 -03:00
6a5c9fb0e6 monitoring: map dcgm to shared gpu resources 2026-01-26 20:58:06 -03:00
f7d4425740 ariadne: reduce comms noise, fix gpu labels 2026-01-26 20:54:33 -03:00
993702afee monitoring: alert on VM outage 2026-01-23 11:51:28 -03:00
8b8766b0f0 monitoring: add postgres metrics and update overview 2026-01-22 18:23:26 -03:00
307d1bf7a6 ops: restore portal/ariadne and add postgres panels 2026-01-22 15:23:23 -03:00
e0308b89fd monitoring: enforce sorted job lists 2026-01-21 15:12:53 -03:00
9db260e482 monitoring: tighten jobs/overview ordering 2026-01-21 15:01:02 -03:00
2fd87aea45 monitoring: refine jobs/overview panels 2026-01-21 14:31:11 -03:00
fc87432fdf monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
4699ffbf2c monitoring: reschedule grafana user dedupe 2026-01-21 12:31:54 -03:00
190caf1729 monitoring: harden grafana user dedupe 2026-01-21 12:30:08 -03:00
d89d441486 monitoring: fix grafana user dedupe job 2026-01-21 12:25:53 -03:00
e8859e605a monitoring: prepopulate vault for dedupe job 2026-01-21 12:18:57 -03:00
10704a22d6 monitoring: wire vault sa for dedupe job 2026-01-21 12:16:26 -03:00
2f37a47869 monitoring: use python dedupe job 2026-01-21 12:15:03 -03:00
af789c0d0b monitoring: dedupe grafana user via api 2026-01-21 12:11:28 -03:00
d963001104 monitoring: add grafana user dedupe job 2026-01-21 12:08:23 -03:00
5fe70b1471 grafana: allow email-based oauth user lookup 2026-01-21 11:45:11 -03:00
aaeb933625 monitoring: refresh testing dashboard 2026-01-21 11:29:48 -03:00
c804ec040c glue: centralize sync tasks in ariadne 2026-01-21 02:57:40 -03:00
587a0af1d7 maintenance: wire ariadne db and dashboards 2026-01-20 23:03:39 -03:00
d07415e623 core: fix postmark DNS and time sync 2026-01-19 23:45:31 -03:00
f3620aa2a4 chore: centralize harbor pull credentials 2026-01-19 19:02:14 -03:00
11a06e7683 feat: add Ariadne service and glue scheduling 2026-01-19 16:58:02 -03:00
847c98a7db monitoring: fix glue dashboard queries 2026-01-18 12:26:04 -03:00
14d75ccf7a monitoring: label cronjob metrics and move grafana to arm64 2026-01-18 12:20:45 -03:00
60dee25f08 monitoring: add atlas testing dashboard folder 2026-01-18 12:07:45 -03:00
fbf4fe8c4f monitoring: keep postmark exporter off titan-22 2026-01-18 11:52:36 -03:00
8b86c5dd67 monitoring: avoid titan-22 for core pods 2026-01-18 11:43:28 -03:00
4bc57cf445 monitoring: restore grafana persistence 2026-01-18 11:37:01 -03:00
8fb73e023c monitoring: disable grafana persistence to recover 2026-01-18 09:55:28 -03:00
b0698887a4 monitoring: add testing dashboard and switch postmark apikey 2026-01-18 09:21:33 -03:00
2b9a8eb8eb monitoring: add glue row and fix mail dns 2026-01-18 08:12:06 -03:00
84710b99e8 monitoring: add glue dashboard and tag cronjobs 2026-01-18 02:50:07 -03:00
66679c428f jobs: bump names after affinity update 2026-01-17 01:52:16 -03:00
7cf0344d59 jobs: prefer arm64 workers 2026-01-17 01:47:53 -03:00
af86a610d9 fix ingress tls routing 2026-01-16 01:40:50 -03:00
98ca8f6b1a smtp: use mail.bstein.dev for app relays 2026-01-15 04:04:50 -03:00
e6ce9b0d88 smtp: point services at mailu relay 2026-01-15 03:58:03 -03:00