# Titan rpi4 Longhorn Recovery
This flow is for `titan-13`, `titan-15`, `titan-17`, and `titan-19`.
## Why this works
- The replacement card is burned from a plain Armbian rpi4 image.
- Metis injects the original node identity, k3s config, SSH key, and Longhorn disk UUIDs.
- The image also carries a static NetworkManager profile for the node IP plus local `k3s` and `open-iscsi` payloads sourced from a healthy rpi4 Longhorn node.
- An Armbian first-boot hook finishes the host bootstrap automatically:
  - enables SSH on port `2277`
  - mounts `/mnt/astreae` and `/mnt/asteria`
  - ensures the iSCSI initiator identity exists
  - starts `open-iscsi`
  - starts `k3s-agent`
- For this Armbian flow, the important recovery files live on the root partition; the NoCloud files on the boot partition are optional.
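
The hook's results can be spot-checked from a shell on the recovered node. The following is a minimal sketch, not part of Metis: the function name `check_titan_host` and the `INITIATOR_FILE` override are hypothetical, the initiator identity is assumed to live at the conventional `/etc/iscsi/initiatorname.iscsi`, and the unit names are taken from the list above (on some distributions the iSCSI daemon unit is `iscsid` instead).

```bash
# Hypothetical helper, not part of Metis: spot-check the first-boot hook's results.
check_titan_host() {
  local rc=0 mount_point unit
  # Assumed location of the initiator identity; override INITIATOR_FILE if yours differs.
  local initiator_file="${INITIATOR_FILE:-/etc/iscsi/initiatorname.iscsi}"

  # Both mount points named above should be active mounts.
  for mount_point in /mnt/astreae /mnt/asteria; do
    if findmnt "$mount_point" >/dev/null 2>&1; then
      echo "ok: $mount_point is mounted"
    else
      echo "FAIL: $mount_point is not mounted"; rc=1
    fi
  done

  # The identity file must exist and be non-empty.
  if [ -s "$initiator_file" ]; then
    echo "ok: iSCSI initiator identity present"
  else
    echo "FAIL: $initiator_file is missing or empty"; rc=1
  fi

  # Unit names as listed above; adjust if your image names them differently.
  for unit in open-iscsi k3s-agent; do
    if systemctl is-active --quiet "$unit"; then
      echo "ok: $unit is active"
    else
      echo "FAIL: $unit is not active"; rc=1
    fi
  done
  return "$rc"
}
```

Run `check_titan_host` after logging in over SSH on port `2277`; a non-zero exit status means at least one check failed.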
## Before burning
For a same-name replacement, remove the old node object first so k3s can re-register the node cleanly.
```bash
kubectl delete node titan-13
kubectl delete node titan-19
```
Then export the live cluster join token and the location of the base Armbian image:
```bash
export METIS_K3S_TOKEN="$(ssh titan-0a 'sudo cat /var/lib/rancher/k3s/server/node-token')"
export METIS_IMAGE_RPI4_ARMBIAN_LONGHORN="file://${HOME}/Downloads/Armbian_25.8.1_Rpi4b_noble_current_6.12.41.img"
```
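
Because `export VAR="$(...)"` hides the exit status of the command substitution, a failed `ssh` or `sudo` leaves `METIS_K3S_TOKEN` silently empty. A small preflight can catch that before any card is written. This is a sketch only; `check_metis_env` is a hypothetical helper, not a Metis command, and it assumes the image variable uses a `file://` URL as in the example above.

```bash
# Hypothetical preflight, not part of Metis: verify both exports before burning.
check_metis_env() {
  local image_path
  if [ -z "${METIS_K3S_TOKEN:-}" ]; then
    echo "METIS_K3S_TOKEN is empty; the ssh or sudo step probably failed" >&2
    return 1
  fi
  # Strip the file:// scheme to get a local path (assumes a file:// URL).
  image_path="${METIS_IMAGE_RPI4_ARMBIAN_LONGHORN:-}"
  image_path="${image_path#file://}"
  if [ ! -r "$image_path" ]; then
    echo "base image not readable: $image_path" >&2
    return 1
  fi
  echo "token set, image found at $image_path"
}
```

Calling `check_metis_env` in the same shell as the exports returns non-zero if either value is unusable.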
## Burn commands
Inspect the merged config first:
```bash
go run ./cmd/metis config --inventory inventory.titan-rpi4.yaml --node titan-13
go run ./cmd/metis config --inventory inventory.titan-rpi4.yaml --node titan-19
```
If you want ready-to-flash artifacts before inserting SD cards, build them first:
```bash
go run ./cmd/metis image \
  --inventory inventory.titan-rpi4.yaml \
  --node titan-13 \
  --cache "${HOME}/.cache/metis" \
  --output artifacts/titan-13.img
go run ./cmd/metis image \
  --inventory inventory.titan-rpi4.yaml \
  --node titan-19 \
  --cache "${HOME}/.cache/metis" \
  --output artifacts/titan-19.img
```
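
When several nodes need rebuilding, the same invocation can be wrapped in a loop. The sketch below reuses only the flags shown above; the wrapper name `build_titan_images` is hypothetical and not part of Metis.

```bash
# Hypothetical wrapper: runs the same `metis image` command once per node name.
build_titan_images() {
  local node
  for node in "$@"; do
    go run ./cmd/metis image \
      --inventory inventory.titan-rpi4.yaml \
      --node "$node" \
      --cache "${HOME}/.cache/metis" \
      --output "artifacts/${node}.img" || return 1  # stop at the first failed build
  done
}
```

For example, `build_titan_images titan-13 titan-19` produces the same two artifacts as the commands above.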
Burn the cards, replacing `/dev/sdX` and `/dev/sdY` with the actual SD card devices:
```bash
sudo -E go run ./cmd/metis burn \
  --inventory inventory.titan-rpi4.yaml \
  --node titan-13 \
  --device /dev/sdX \
  --cache "${HOME}/.cache/metis" \
  --auto-mount \
  --yes
sudo -E go run ./cmd/metis burn \
  --inventory inventory.titan-rpi4.yaml \
  --node titan-19 \
  --device /dev/sdY \
  --cache "${HOME}/.cache/metis" \
  --auto-mount \
  --yes
```
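
Since the burn commands run under `sudo` with `--yes` (which presumably skips any confirmation), it is worth confirming the target device first. The following is a sketch of one possible guard using `lsblk`; `is_removable_disk` is a hypothetical helper, not a Metis feature. Some card readers report their media as non-removable, so treat a failure as a cue to inspect `lsblk` output yourself rather than as proof the device is wrong.

```bash
# Hypothetical guard, not part of Metis: accept only whole, removable disks.
is_removable_disk() {
  local dev="$1" info dev_type removable
  # -d: skip partitions, -n: no header; TYPE is "disk" for whole devices, RM is 1 if removable.
  info="$(lsblk -dn -o TYPE,RM "$dev" 2>/dev/null)" || return 1
  read -r dev_type removable <<<"$info"
  [ "$dev_type" = "disk" ] && [ "$removable" = "1" ]
}
```

A possible use is `is_removable_disk /dev/sdX || echo "refusing to burn /dev/sdX"` before each burn command.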
## After boot
Because the hardware stays the same, the Pi should keep the same MAC address and reclaim the same DHCP reservation.
Validate that the nodes rejoined the cluster and that Longhorn sees them:
```bash
kubectl get nodes | grep -E 'titan-13|titan-19'
kubectl -n longhorn-system get nodes.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io -o wide | grep -E 'titan-13|titan-19'
```
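
The node object was deleted before burning, so it does not exist again until `k3s-agent` re-registers, and `kubectl wait` fails immediately on a missing resource. A polling wrapper can bridge that gap. This is a sketch under assumptions: `wait_for_node` and `POLL_SECONDS` are hypothetical names, and the retry count and 5-minute timeout are arbitrary.

```bash
# Hypothetical helper: poll until the node re-registers, then wait for Ready.
wait_for_node() {
  local node="$1" tries="${2:-60}" i
  for ((i = 0; i < tries; i++)); do
    # The node object only reappears once k3s-agent has re-registered.
    if kubectl get node "$node" >/dev/null 2>&1; then
      kubectl wait --for=condition=Ready "node/$node" --timeout=5m
      return  # propagates the exit status of kubectl wait
    fi
    sleep "${POLL_SECONDS:-10}"
  done
  echo "$node never registered" >&2
  return 1
}
```

For example, `wait_for_node titan-13 && wait_for_node titan-19` blocks until both nodes report Ready or one of them times out.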