1846 Commits

SHA1 Message Date
094d202803 monitoring: remove combined UPS draw series from history panels 2026-04-19 14:50:24 -03:00
411bc6b90d monitoring: elevate Atlas Testing dashboard and no-data fallbacks 2026-04-18 14:50:59 -03:00
629df65c7b monitoring(soteria): tune PVC backup age thresholds for nightly cadence 2026-04-14 02:14:43 -03:00
e5a824e4e1 typhon: register app and add v2-safe ble/control runtime toggles 2026-04-13 22:02:57 -03:00
6815a67c1f maintenance(soteria): roll out 0.1.0-35 2026-04-13 16:51:46 -03:00
deefdb53ad maintenance(soteria): roll out 0.1.0-34 2026-04-13 14:23:24 -03:00
4e4c310cd4 maintenance(soteria): roll out 0.1.0-33 2026-04-13 13:58:44 -03:00
df79cad1c3 maintenance(soteria): grant pod logs and roll out 0.1.0-32 2026-04-13 12:51:38 -03:00
b3d8b13f39 maintenance(soteria): roll pvc-node pin fix and pod-read rbac 2026-04-13 03:32:25 -03:00
a23b6a4b93 maintenance(soteria): move restic vault path to shared scope 2026-04-13 03:01:29 -03:00
38abbd9fe1 maintenance(vault): roll sync pod after soteria secret mapping 2026-04-13 02:55:42 -03:00
ac12a9bfed maintenance(soteria): source restic credentials from vault 2026-04-13 02:54:05 -03:00
8a371e1267 monitoring(alerts): make soteria backup health rule driver-agnostic 2026-04-13 02:38:53 -03:00
f25186ef7e maintenance(soteria): switch to encrypted restic backups 2026-04-13 02:14:39 -03:00
a01dc0813a maintenance(soteria): enable b2 usage scan config and alert 2026-04-12 19:47:58 -03:00
609cfcb696 monitoring: force horizontal stat layout for power/climate panels 2026-04-12 19:04:35 -03:00
75a992b829 maintenance(soteria): tighten oauth2 ingress and drill validation 2026-04-12 14:58:25 -03:00
a87a5f7bff monitoring: fix typhon low-threshold alert semantics 2026-04-12 14:56:34 -03:00
a1c8a99866 monitoring(alerts): watch soteria authz denial spikes 2026-04-12 12:19:42 -03:00
7b3dfa335b maintenance(soteria): harden ingress path and add backup alerts 2026-04-12 12:12:43 -03:00
e1bba18b52 maintenance: set explicit jenkins cleanup schedule 2026-04-12 11:36:50 -03:00
52882f1bb5 maintenance(soteria): add serviceaccount and rbac manifests 2026-04-12 11:36:33 -03:00
5128741c53 maintenance: default jenkins cleanup to dry-run 2026-04-12 11:28:48 -03:00
96f923ae4c maintenance(soteria): add protected UI, OIDC bootstrap, and backup health panel wiring 2026-04-12 11:16:29 -03:00
95bc3953d1 maintenance: wire jenkins cleanup permissions 2026-04-12 11:00:50 -03:00
f4e921bb33 scheduling: keep app workloads off control-plane 2026-04-12 04:26:52 -03:00
616c6308b1 maintenance: remove pi-usb-scratch guard rollout 2026-04-12 01:02:41 -03:00
d9b30d6c5b maintenance(pi-usb-scratch): skip k3s runtime rsync during cutover 2026-04-11 12:11:15 -03:00
7c337ad5a1 maintenance(pi-usb-scratch): disable rollout jitter for initial cutover 2026-04-11 12:00:30 -03:00
3823b68ee2 maintenance(pi-usb-scratch): fix false mount conflict detection 2026-04-11 11:57:50 -03:00
40de2b59a5 maintenance: enforce Astraios + tmpfs /tmp on worker Pis 2026-04-11 11:54:43 -03:00
5483c04bb3 maintenance: add worker pi usb scratch rollout 2026-04-11 01:03:42 -03:00
64b4f14018 ariadne: remove remaining cronjobs and migrate schedule ownership 2026-04-10 22:40:58 -03:00
166020ca1d ariadne: migrate glue cronjobs to schedules 2026-04-10 21:22:35 -03:00
60446ee830 testing(ci): centralize quality gate contract 2026-04-10 17:06:53 -03:00
c38b6c5e27 ci: publish titan-iac tests and seed ananke/lesavka jobs 2026-04-10 16:38:55 -03:00
9419c4b26b dashboards: unify suite pass-rate metrics on platform counters 2026-04-10 15:35:20 -03:00
5f4641553c monitoring: replace failure table with 24h suite pass snapshot 2026-04-09 20:16:44 -03:00
530f440679 monitoring: add suite probe metrics and align fan labels 2026-04-09 20:10:52 -03:00
5e3aadc640 monitoring: set overview platform test panel to 7d 2026-04-09 20:05:10 -03:00
12b85f4597 monitoring: add platform quality push gateway for test metrics 2026-04-09 19:30:16 -03:00
ad1cbd6f85 monitoring: make test panel point-based and failure-by-suite 2026-04-09 19:27:48 -03:00
5cf9a16d97 monitoring: align overview panels with jobs and point-based suite rates 2026-04-09 16:35:14 -03:00
f8c1243dfd monitoring: add generic suite metric slots for platform tests 2026-04-09 16:16:35 -03:00
7b0e9acbb1 monitoring: make suite pass rate 30d rolling for sparse tests 2026-04-09 16:14:26 -03:00
0273727cb4 monitoring: make platform test success one line per suite 2026-04-09 15:21:59 -03:00
09fa3e716c monitoring/atlas: merge top rows and fix platform test pass-rate panel 2026-04-09 14:56:43 -03:00
293cd83999 monitoring/atlas: resize test/ops rows and source overview tests from atlas-jobs 2026-04-09 13:39:55 -03:00
764bfe189e monitoring/recovery: harden ananke checks and OIDC-gated service validation 2026-04-09 01:44:26 -03:00
e0b124ca4e monitoring: switch power telemetry to ananke metrics 2026-04-08 23:33:17 -03:00