65 Commits

Author SHA1 Message Date
codex
85c0741b3e recovery: expire automatic node cordons 2026-06-19 15:43:44 -03:00
codex
b3076a23a9 recovery: skip runtime-wedged workers during startup 2026-06-19 04:31:36 -03:00
codex
e22a9150e9 recovery: quarantine container runtime wedge nodes 2026-06-19 04:25:39 -03:00
codex
707458cfc5 recovery: force clear safe deleting pods 2026-06-19 04:15:59 -03:00
codex
9031e09f4e recovery: exempt veles longhorn host from cryptsetup guard 2026-06-19 04:03:37 -03:00
codex
54f0b29bce recovery: require flux bootstrap health 2026-06-18 23:42:29 -03:00
codex
c8ccc970e6 recovery: tolerate transient startup soak checks 2026-06-18 23:33:11 -03:00
codex
3e337043d5 recovery: preflight encrypted longhorn hosts 2026-06-18 23:18:31 -03:00
codex
c415516376 recovery: force clear safe stale pods 2026-06-18 23:05:02 -03:00
codex
83723d0358 recovery: clean failed stale controller pods 2026-06-18 22:55:44 -03:00
codex
7f3a9c1428 recovery: skip ignored workers during startup 2026-06-18 22:49:05 -03:00
codex
566765696b recovery: recycle stale unknown controller pods 2026-06-18 22:46:02 -03:00
codex
4151254ba1 recovery: avoid encrypted volume nodes missing cryptsetup 2026-06-18 22:37:54 -03:00
codex
93d98e1397 recovery: repair encrypted volume mount prerequisites 2026-06-18 22:34:59 -03:00
codex
904f6b1a62 recovery: keep flux held before safe resume 2026-06-18 22:08:14 -03:00
codex
3b5cacdc34 test: split kubelet proxy autoheal coverage 2026-05-17 04:49:42 -03:00
codex
e3afc9ea7b test: cover kubelet proxy autoheal 2026-05-17 04:40:17 -03:00
codex
0b4b05233e autoheal: repair broken kubelet proxies 2026-05-17 04:24:00 -03:00
codex
d105e43e49 recovery(ananke): auto-heal sealed vault and dead-node drift 2026-05-05 13:24:25 -03:00
codex
d225291c5d recovery(ananke): quarantine scheduling storm workloads 2026-05-05 12:09:58 -03:00
codex
1f656de5df startup(ananke): scope emergency recovery to core services 2026-05-05 05:17:59 -03:00
codex
a3e24b9b15 startup(ananke): unseal vault before startup gates 2026-04-27 07:12:21 -03:00
af2d94a53b startup: require real keycloak admin session before robotuser checks 2026-04-09 02:05:30 -03:00
c2c79e5821 ananke: refactor orchestrator, enforce quality gates, and harden startup checks 2026-04-09 01:40:02 -03:00
baead1426e build: reconcile split modules and restore clean checkout integrity 2026-04-08 23:52:29 -03:00
95fefba244 startup: enforce external service behavior checks 2026-04-08 23:42:09 -03:00
2268e8915a startup: adapt flux wait window to kustomization timeouts 2026-04-08 01:13:06 -03:00
14a9d67088 startup: auto-heal ingress-backed workloads when checks fail 2026-04-08 01:01:44 -03:00
0f48773572 startup: run convergence before post-start probes to avoid early deadlock 2026-04-08 00:47:06 -03:00
c7d7407008 startup: add strict preflight, ssh auth gate, ingress checks, and startup report 2026-04-07 22:40:15 -03:00
1f54cd3d46 shutdown: default to cluster-only and require explicit poweroff 2026-04-07 20:58:41 -03:00
00a2528908 startup: auto-heal stuck vault-init and broaden external checks 2026-04-07 14:22:00 -03:00
78faf9a123 startup: make checklist body matching whitespace-tolerant 2026-04-07 13:57:54 -03:00
c605a083ee rename runtime surfaces from hecate to ananke 2026-04-07 13:14:23 -03:00
c8c3304797 startup: unblock on harbor during recovery and add controlled-cycle drill 2026-04-05 20:25:14 -03:00
11a2f66e41 startup: order vault before harbor and fail-safe flux resume 2026-04-05 16:47:47 -03:00
73b1c2063b use hecate intent API for peer guard checks 2026-04-05 13:29:42 -03:00
ad4361322d test startup intent guard helpers 2026-04-05 13:23:36 -03:00
1935c5eb3f harden startup guards and etcd restore validation 2026-04-05 13:18:34 -03:00
437a6b62cd startup: add off-site break-glass unseal fallback 2026-04-05 11:30:54 -03:00
d2526edf0e hecate: harden startup ssh access checks and k3s command paths 2026-04-05 10:03:15 -03:00
72d33bc2ce hecate: harden startup with storage gates and fallback cache 2026-04-05 01:55:56 -03:00
a05973bf2b hecate: exclude dry-run history from shutdown budget estimation 2026-04-05 00:21:05 -03:00
f020f77d2b hecate: harden emergency thresholds and bootstrap branch handling 2026-04-05 00:15:09 -03:00
b5f27a79e0 hecate: retry ssh with known_hosts repair on silent 255 2026-04-04 22:40:39 -03:00
75ad091898 hecate: auto-clear stale startup intent on lock 2026-04-04 22:37:00 -03:00
ba76e81ec2 hecate: harden startup recovery and ssh/state self-heal 2026-04-04 22:24:56 -03:00
773e0234b8 hecate: make etcd-restore dry-run non-fatal 2026-04-04 20:57:54 -03:00
522df2f6e8 hecate: handle external datastore in auto-recovery 2026-04-04 20:56:16 -03:00
5d8bfd5de6 hecate: harden outage recovery startup and etcd restore 2026-04-04 20:50:58 -03:00