223 Commits

SHA1 Message Date
1e0e73a28f monitoring: combine Ariadne and Metis tests 2026-03-31 13:54:04 -03:00
f19eaf3b6b move atlasbot to ai namespace 2026-02-02 09:46:50 -03:00
c3555d59f7 monitoring: fix GPU share attribution 2026-01-28 19:08:53 -03:00
a255c60aed monitoring: fix gpu idle label 2026-01-27 21:46:58 -03:00
b4f5fbeb2b monitoring: unify gpu namespace usage 2026-01-27 21:43:37 -03:00
577e2a158d monitoring: keep idle label in gpu share 2026-01-27 18:44:58 -03:00
86cd5194ea monitoring: fix gpu idle share 2026-01-27 17:51:13 -03:00
0fbbbf39e9 monitoring: fix jetson gpu metrics 2026-01-27 16:19:54 -03:00
995050f544 monitoring: unify jetson gpu metrics 2026-01-26 22:26:24 -03:00
f7d4425740 ariadne: reduce comms noise, fix gpu labels 2026-01-26 20:54:33 -03:00
4f9479c7d5 atlasbot: add metrics kb and long timeout 2026-01-26 14:08:11 -03:00
b5e8192731 atlasbot: answer jetson nodes from knowledge 2026-01-26 12:06:48 -03:00
87db5b2bd2 comms: sync atlas knowledge and use ariadne state 2026-01-26 03:32:17 -03:00
8b8766b0f0 monitoring: add postgres metrics and update overview 2026-01-22 18:23:26 -03:00
307d1bf7a6 ops: restore portal/ariadne and add postgres panels 2026-01-22 15:23:23 -03:00
e0308b89fd monitoring: enforce sorted job lists 2026-01-21 15:12:53 -03:00
9db260e482 monitoring: tighten jobs/overview ordering 2026-01-21 15:01:02 -03:00
2fd87aea45 monitoring: refine jobs/overview panels 2026-01-21 14:31:11 -03:00
fc87432fdf monitoring: refresh jobs dashboards 2026-01-21 13:37:36 -03:00
aaeb933625 monitoring: refresh testing dashboard 2026-01-21 11:29:48 -03:00
c804ec040c glue: centralize sync tasks in ariadne 2026-01-21 02:57:40 -03:00
587a0af1d7 maintenance: wire ariadne db and dashboards 2026-01-20 23:03:39 -03:00
11a06e7683 feat: add Ariadne service and glue scheduling 2026-01-19 16:58:02 -03:00
847c98a7db monitoring: fix glue dashboard queries 2026-01-18 12:26:04 -03:00
b0698887a4 monitoring: add testing dashboard and switch postmark apikey 2026-01-18 09:21:33 -03:00
2b9a8eb8eb monitoring: add glue row and fix mail dns 2026-01-18 08:12:06 -03:00
84710b99e8 monitoring: add glue dashboard and tag cronjobs 2026-01-18 02:50:07 -03:00
4aae99356f mailu: backfill mailu_enabled for legacy users 2026-01-18 02:03:13 -03:00
2302cfb607 mailu: preserve keycloak profile fields 2026-01-18 01:08:31 -03:00
df1ec16429 mailu: gate sync to approved users 2026-01-18 00:47:38 -03:00
e897858d97 monitoring: move grafana smtp to vault 2026-01-14 06:41:34 -03:00
ff29339a19 chore: refresh knowledge catalog headers 2026-01-14 01:08:05 -03:00
0d81dcd7fd ops: prepare vault-consumption branch 2026-01-13 19:01:07 -03:00
e576daf98b iac: localize configmap scripts 2026-01-13 12:07:03 -03:00
6fa2203561 iac: externalize ConfigMap scripts 2026-01-13 10:00:19 -03:00
fddf58346d monitoring: treat cert-manager as infrastructure 2026-01-12 00:26:46 -03:00
4c07bd7553 monitoring: classify logging/postgres/maintenance as infra 2026-01-11 23:52:40 -03:00
879ff7c16b monitoring: fix infra scopes and add jetson metrics 2026-01-11 23:46:24 -03:00
f500e81606 monitoring: maintenance panels, extra alerts, update overview 2026-01-11 02:28:39 -03:00
25907da229 monitoring: remove titan-16 and add titan-20/21 to worker dashboards 2026-01-11 02:20:47 -03:00
4a01632f6b monitoring: add alert rules and include titan-20/21 in dashboards 2026-01-11 02:02:47 -03:00
c887aaeecf logging: add trace analytics ingestion 2026-01-10 00:13:59 -03:00
9c2f2631ce logging: seed OpenSearch observability 2026-01-09 23:58:12 -03:00
ea6d1e0baa logging: expand OpenSearch dashboards 2026-01-09 22:55:39 -03:00
cd1c5232cc logging: add OpenSearch dashboards generator 2026-01-09 22:20:36 -03:00
fc5d0aa682 comms: consolidate stack manifests 2026-01-08 01:55:58 -03:00
4daff40692 atlasbot: add KB + read-only tools 2026-01-06 14:46:36 -03:00
92691c415e nextcloud: ensure data dir and perms 2026-01-06 14:43:18 -03:00
7d2d6ad6e4 nextcloud/monitoring: fix perms and mail panels 2026-01-06 14:38:10 -03:00
a285f78626 nextcloud: restore app files for maintenance job 2026-01-06 14:22:26 -03:00