# Titan rpi4 Remote Replacement

This is the low-touch replacement flow for `titan-13` and `titan-19` when the person onsite can only:

1. insert an SD card into the flashing machine
2. swap the card into the Pi
3. power-cycle the Pi

The remote operator does everything else.

## What the image does by itself

After the stale Kubernetes node object is deleted and the replacement image is flashed, the booted Pi is expected to do the rest automatically:

- bring up SSH on port `2277`
- set the node hostname
- bring up the node's static `192.168.22.x` address on `end0`
- mount `/mnt/astreae` and `/mnt/asteria`
- start `open-iscsi`
- start `k3s-agent`
- rejoin the cluster with the baked-in node token and server URL

## Version clarification

As of **March 31, 2026**, the live cluster reports:

- control plane: `k3s v1.33.3+k3s1`
- healthy rpi4 Longhorn workers (`titan-15`, `titan-17`): `k3s v1.31.5+k3s1`

The `6.6.63` and `6.12.41` numbers are Linux kernel versions, not Kubernetes versions. Kubernetes' official version skew policy allows a `kubelet` to be up to three minor versions older than the `kube-apiserver`, so `1.31` workers against a `1.33` control plane are supported today:

- https://kubernetes.io/releases/version-skew-policy/

The replacement images intentionally keep the rpi4 worker `k3s` version aligned with the healthy HDD-backed rpi4 workers, to avoid introducing a Kubernetes minor-version change during node recovery.

## Remote flashing flow

Run these commands from the machine that has the `metis` repo and your SSH access.

### 1. Build the image and delete the stale node object

```bash
cd ~/Development/metis
./scripts/prepare_titan_rpi4_replacement.sh titan-13 titan-22
./scripts/prepare_titan_rpi4_replacement.sh titan-19 titan-22
```

This does all of the following:

- fetches the current cluster node token from `titan-0a`
- deletes the stale Kubernetes `Node` object
- builds the replacement image under `artifacts/`
- copies it to `titan-22:/tmp/metis-images/`

### 2. Ask the onsite helper to insert the SD card into `titan-22`

When the card is inserted, identify the target device:

```bash
./scripts/remote_sd_candidates.sh titan-22
```

### 3. Flash the card remotely

```bash
./scripts/remote_flash_titan_image.sh titan-22 titan-13 /dev/sdX
./scripts/remote_flash_titan_image.sh titan-22 titan-19 /dev/sdY
```

`titan-22` will prompt for its `sudo` password during the flash.

### 4. Ask the onsite helper to swap the card and power-cycle the Pi

That should be the end of the onsite work.

### 5. Validate remotely

```bash
kubectl get nodes -w
kubectl -n longhorn-system get nodes.longhorn.io
kubectl -n longhorn-system get replicas.longhorn.io -o wide | grep 'titan-13\|titan-19'
ssh titan-13
ssh titan-19
```

## USB boot

Raspberry Pi 4 supports USB mass storage boot via its EEPROM bootloader:

- https://www.raspberrypi.com/documentation/computers/raspberry-pi.html#usb-mass-storage-boot

That means the same general recovery-image approach can be used on a USB device instead of an SD card. For this cluster, the safer rollout is:

1. first recover `titan-13` and `titan-19` to known-good SD cards
2. pilot USB boot on one non-critical rpi4
3. only then migrate the Longhorn HDD-backed rpi4s

USB boot is attractive for wear reduction, but it adds EEPROM boot-order, adapter, and power-delivery variables. The emergency replacement process above should stay SD-based until the USB path has been tested on your actual hardware.
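When piloting USB boot, the first thing worth verifying on the candidate Pi is the EEPROM `BOOT_ORDER` (visible on the Pi via `rpi-eeprom-config`, which prints the current bootloader configuration). The hex nibbles are read right-to-left and are easy to misread, so as a small sketch, the helper below decodes a `BOOT_ORDER` value into the sequence of boot modes the bootloader will try. The function name and the `0xf41` example are illustrative, not part of the scripts above; the mode table follows the official Raspberry Pi bootloader documentation.

```bash
#!/usr/bin/env bash
# Decode a Raspberry Pi 4 EEPROM BOOT_ORDER value into the ordered list of
# boot modes the bootloader attempts. Nibbles are tried right-to-left.
decode_boot_order() {
  local hex="${1#0x}" out="" i c
  for (( i=${#hex}-1; i>=0; i-- )); do
    c="${hex:i:1}"
    case "$c" in
      1) out+="SD-CARD " ;;   # primary SD/eMMC
      2) out+="NETWORK " ;;   # network (PXE) boot
      3) out+="RPIBOOT " ;;   # rpiboot over the USB device port
      4) out+="USB-MSD " ;;   # USB mass storage device
      6) out+="NVME " ;;      # NVMe (CM4)
      e) out+="STOP " ;;      # stop and display error
      f) out+="RESTART " ;;   # loop back to the first nibble
      *) out+="UNKNOWN($c) " ;;
    esac
  done
  printf '%s\n' "${out% }"
}

# 0xf41 is the factory default: try SD first, then USB, then loop forever.
decode_boot_order 0xf41   # → SD-CARD USB-MSD RESTART
```

On the Pi itself, `rpi-eeprom-config | grep BOOT_ORDER` shows the value this would decode; for the USB pilot you want `4` to appear before (i.e. to the right of) any mode you don't intend to use.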