diff --git a/README.md b/README.md
index c8841ff..88c8cb7 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,21 @@ Recovery cordons are given short 1hr leases. If Ananke cordons a node to repair
 
 The following are notes for future Brad.
 
+## Bring-up dependencies
+
+Ananke should be one of the first things working. It does not need Harbor, Gitea, Longhorn, Grafana, or the apps to be healthy before it starts; those are often the mess it is there to sort out.
+
+It does need:
+
+- an Ananke host that came up on its own: usually `titan-db`, with the `tethys`/`titan-24` peer path as the backup
+- `/etc/ananke/ananke.yaml`, the Ananke SSH key, and enough host config to reach nodes on the Atlas SSH port
+- Kubernetes API access once the control plane is answering; before that it can only do host-side checks
+- Flux CRDs/controllers and the `titan-iac` source once the API is up, because most startup gates are Flux-shaped
+- basic node hygiene that Ananke cannot fake forever: SSH, sudo for managed repairs, sane clocks, and Longhorn host packages like `cryptsetup`, `open-iscsi`, `dmsetup`, and `nfs-common`
+- NUT/UPS access if this is making real shutdown decisions instead of just doing startup recovery
+
+If this is a total bring-up, start Ananke after the host boots and before waiting on applications. If Ananke is not running, Atlas is missing the thing that knows the order of operations.
+
 ## Daily commands
 
 ```bash