79 Commits

Author SHA1 Message Date
codex 83d987f43a recovery: avoid repeated manual cordon alerts 2026-06-19 15:59:00 -03:00
codex 85c0741b3e recovery: expire automatic node cordons 2026-06-19 15:43:44 -03:00
codex b3076a23a9 recovery: skip runtime-wedged workers during startup 2026-06-19 04:31:36 -03:00
codex e22a9150e9 recovery: quarantine container runtime wedge nodes 2026-06-19 04:25:39 -03:00
codex 707458cfc5 recovery: force clear safe deleting pods 2026-06-19 04:15:59 -03:00
codex 9031e09f4e recovery: exempt veles longhorn host from cryptsetup guard 2026-06-19 04:03:37 -03:00
codex 54f0b29bce recovery: require flux bootstrap health 2026-06-18 23:42:29 -03:00
codex c8ccc970e6 recovery: tolerate transient startup soak checks 2026-06-18 23:33:11 -03:00
codex 3e337043d5 recovery: preflight encrypted longhorn hosts 2026-06-18 23:18:31 -03:00
codex c415516376 recovery: force clear safe stale pods 2026-06-18 23:05:02 -03:00
codex 83723d0358 recovery: clean failed stale controller pods 2026-06-18 22:55:44 -03:00
codex 7f3a9c1428 recovery: skip ignored workers during startup 2026-06-18 22:49:05 -03:00
codex 566765696b recovery: recycle stale unknown controller pods 2026-06-18 22:46:02 -03:00
codex 4151254ba1 recovery: avoid encrypted volume nodes missing cryptsetup 2026-06-18 22:37:54 -03:00
codex 93d98e1397 recovery: repair encrypted volume mount prerequisites 2026-06-18 22:34:59 -03:00
codex 904f6b1a62 recovery: keep flux held before safe resume 2026-06-18 22:08:14 -03:00
codex 3b5cacdc34 test: split kubelet proxy autoheal coverage 2026-05-17 04:49:42 -03:00
codex e3afc9ea7b test: cover kubelet proxy autoheal 2026-05-17 04:40:17 -03:00
codex 0b4b05233e autoheal: repair broken kubelet proxies 2026-05-17 04:24:00 -03:00
codex 087728d481 monitoring: export gitops state from ananke 2026-05-15 19:36:58 -03:00
codex d105e43e49 recovery(ananke): auto-heal sealed vault and dead-node drift 2026-05-05 13:24:25 -03:00
codex d225291c5d recovery(ananke): quarantine scheduling storm workloads 2026-05-05 12:09:58 -03:00
codex b7f7486350 recovery(ananke): trigger earlier on battery outages 2026-05-05 10:53:00 -03:00
codex 1f656de5df startup(ananke): scope emergency recovery to core services 2026-05-05 05:17:59 -03:00
codex a3e24b9b15 startup(ananke): unseal vault before startup gates 2026-04-27 07:12:21 -03:00
codex bc65124db0 test(ananke): make startup cooldown coverage deterministic 2026-04-21 17:33:26 -03:00
632a1e2824 test: migrate execx and metrics into testing module 2026-04-10 17:04:23 -03:00
9732272d17 service: harden daemon coverage for host quality gate 2026-04-09 05:24:46 -03:00
b229f47af8 testing: make quality gate root-safe and deterministic 2026-04-09 04:56:41 -03:00
a493670dbd test: make quality gate deterministic under host sudo installs 2026-04-09 04:28:58 -03:00
fba6c2c940 metrics: emit default quality-gate counters when file missing 2026-04-09 02:08:55 -03:00
af2d94a53b startup: require real keycloak admin session before robotuser checks 2026-04-09 02:05:30 -03:00
c2c79e5821 ananke: refactor orchestrator, enforce quality gates, and harden startup checks 2026-04-09 01:40:02 -03:00
baead1426e build: reconcile split modules and restore clean checkout integrity 2026-04-08 23:52:29 -03:00
95fefba244 startup: enforce external service behavior checks 2026-04-08 23:42:09 -03:00
2268e8915a startup: adapt flux wait window to kustomization timeouts 2026-04-08 01:13:06 -03:00
14a9d67088 startup: auto-heal ingress-backed workloads when checks fail 2026-04-08 01:01:44 -03:00
0f48773572 startup: run convergence before post-start probes to avoid early deadlock 2026-04-08 00:47:06 -03:00
c7d7407008 startup: add strict preflight, ssh auth gate, ingress checks, and startup report 2026-04-07 22:40:15 -03:00
1f54cd3d46 shutdown: default to cluster-only and require explicit poweroff 2026-04-07 20:58:41 -03:00
22c581b24d startup: accept longhorn checklist responses 200 or 302 2026-04-07 14:31:16 -03:00
00a2528908 startup: auto-heal stuck vault-init and broaden external checks 2026-04-07 14:22:00 -03:00
78faf9a123 startup: make checklist body matching whitespace-tolerant 2026-04-07 13:57:54 -03:00
c605a083ee rename runtime surfaces from hecate to ananke 2026-04-07 13:14:23 -03:00
c8c3304797 startup: unblock on harbor during recovery and add controlled-cycle drill 2026-04-05 20:25:14 -03:00
11a2f66e41 startup: order vault before harbor and fail-safe flux resume 2026-04-05 16:47:47 -03:00
73b1c2063b use hecate intent API for peer guard checks 2026-04-05 13:29:42 -03:00
ad4361322d test startup intent guard helpers 2026-04-05 13:23:36 -03:00
1935c5eb3f harden startup guards and etcd restore validation 2026-04-05 13:18:34 -03:00
437a6b62cd startup: add off-site break-glass unseal fallback 2026-04-05 11:30:54 -03:00