docs: note ananke bring-up dependencies
parent 09409660a2
commit 9a741def21

README.md (15 lines changed)
@@ -20,6 +20,21 @@ Recovery cordons are given short 1hr leases. If Ananke cordons a node to repair
The following are notes for future Brad.
## Bring-up dependencies
Ananke should be one of the first things working. It does not need Harbor, Gitea, Longhorn, Grafana, or the apps to be healthy before it starts; those are often the mess it is there to sort out.
It does need:
- an Ananke host that came up on its own: usually `titan-db`, with the `tethys`/`titan-24` peer path as the backup
- `/etc/ananke/ananke.yaml`, the Ananke SSH key, and enough host config to reach nodes on the Atlas SSH port
- Kubernetes API access once the control plane is answering; before that it can only do host-side checks
- Flux CRDs/controllers and the `titan-iac` source once the API is up, because most startup gates are Flux-shaped
- basic node hygiene that Ananke cannot fake forever: SSH, sudo for managed repairs, sane clocks, and Longhorn host packages like `cryptsetup`, `open-iscsi`, `dmsetup`, and `nfs-common`
- NUT/UPS access if this is making real shutdown decisions instead of just doing startup recovery
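
The host-side items in the list above can be checked before the Kubernetes API is answering. This is a minimal sketch, not something Ananke ships: `missing_files` and `missing_packages` are hypothetical helper names, and the package check assumes a dpkg-based host (the package names above look like Debian ones).

```bash
# Hypothetical preflight helpers; not part of Ananke itself.

# Print each path from the arguments that is missing or unreadable.
missing_files() {
  local path
  for path in "$@"; do
    [ -r "$path" ] || printf '%s\n' "$path"
  done
}

# Print each package from the arguments that is not fully installed.
# Assumes dpkg; if dpkg-query is absent, every package is reported.
missing_packages() {
  local pkg status
  for pkg in "$@"; do
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
    [ "$status" = "install ok installed" ] || printf '%s\n' "$pkg"
  done
}

# Example calls (empty output means nothing was found missing):
#   missing_files /etc/ananke/ananke.yaml
#   missing_packages cryptsetup open-iscsi dmsetup nfs-common
```

The SSH key path is not written down here, so pass it to `missing_files` yourself. Clocks, sudo, and NUT/UPS access still need their own checks.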
If this is a total bring-up, start Ananke after the host boots and before waiting on applications. If Ananke is not running, Atlas is missing the thing that knows the order of operations.
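
The ordering above (host first, then the Kubernetes API, then Flux, then applications) can be sketched as a retry gate. This is only an illustration under assumptions: `wait_for` is a hypothetical helper, the attempt counts and delays are made up, and Ananke's real startup gates live in Ananke, not in this snippet.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example ordering (numbers are placeholders):
#   wait_for 30 10 kubectl get --raw /readyz   # control plane answering
#   wait_for 30 10 flux check                  # Flux CRDs/controllers present
```

Only once both gates pass does it make sense to start waiting on applications.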
## Daily commands
```bash