

HEVC upstream implementation checklist

This is the working checklist for moving Lesavka upstream media from MJPEG-only transport toward HEVC/H.265 video transport while preserving the already-calibrated MJPEG server-to-RCT path.

Goals

  • Keep the existing MJPEG ingress and MJPEG UVC output calibration valid.
  • Add first-class HEVC ingress on the server, decoded to the existing MJPEG UVC output path.
  • Calibrate server-to-RCT delays per ingress profile and UVC mode.
  • Send client-origin synthetic A/V as bundled audio plus HEVC video through the same handoff used by real capture.
  • Measure final RCT sync, freshness, and smoothness before adding deeper introspection.

Safety constraints

  • Do not reboot Theia unless SSH/service recovery cannot be achieved any other way.
  • If Theia must be rebooted, wait for it to come back online and resume from the last artifact-backed checkpoint.
  • Keep server-to-RCT measurement tooling intact except for additive HEVC/profile support.
  • Do not overwrite the MJPEG static delay profile with HEVC values.
  • Bump package versions before any push that contains code changes.

Access and automation

  • The Theia helper (/usr/local/sbin/lesavka-dev-install) supports the deploy, restart, status, hevc-prereqs, and reconfigure MODE [hevc|mjpeg] subcommands.
  • Passwordless sudo works for /usr/local/sbin/lesavka-dev-install.
  • reconfigure 1920x1080@30 hevc was verified with all services active.
  • reconfigure 1280x720@20 hevc rebuilt descriptors; Tethys saw 1280x720@20 MJPEG.
  • Patch local lesavka-core.sh so slow udevadm control --reload becomes a warning instead of aborting reconfigure.
  • Deploy the hardened lesavka-core.sh to Theia once SSH completes banner exchange again.

Server-to-RCT HEVC calibration

  • 1920x1080@30 HEVC ingress was measured ready in /tmp/lesavka-server-rc-mode-matrix-20260507-033941.
  • 1920x1080@30 candidate: video 127952us, audio 0us, p95 max 5.3ms, freshness budget max 252.3ms.
  • Initial failures for 720p modes were traced to capture budget, not media failure: coded events started around 45s into a 52s capture.
  • A longer HEVC calibration budget proved sufficient: CAPTURE_SECONDS=90, PROBE_TIMEOUT_SECONDS=90.
  • 1280x720@20 proof run ready in /tmp/lesavka-server-rc-mode-matrix-20260507-050819: video 143741us, audio 0us, p95 9.2ms, freshness 257.8ms, smoothness clean.
  • 1280x720@30 repeated evidence in /tmp/lesavka-server-rc-mode-matrix-20260507-051510: candidates around 129603us, 135090us, and 142615us, all preferred with 15/16 coded pairs.
  • 1280x720@20 and 1280x720@30 completed a post-recovery 3-run HEVC static matrix in /tmp/lesavka-hevc-continuation-20260509-004310/server-rct-hevc-720p-static/lesavka-server-rc-mode-matrix-20260509-005114.
  • 1280x720@20 selected video 173852us, audio 0us; p95 max 14.1ms, median abs max 11.2ms, freshness budget max 309.0ms.
  • 1280x720@30 selected video 145695us, audio 0us; p95 max 27.8ms, median abs max 18.5ms, freshness budget max 309.4ms.
  • Re-run 1920x1080@20 only after a fresh explicit go/no-go decision; this mode is quarantined because the last attempt preceded a Theia outage.
  • Run final all-safe-mode HEVC sanity matrix with LESAVKA_SERVER_RC_TUNE_DELAYS=0; avoid 1920x1080@20 until the quarantine is lifted.
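The capture-budget failure above is simple arithmetic: a 52s capture whose first coded event arrives around 45s leaves only about 7s of usable signal. The sketch below makes that check explicit. It is hypothetical: the helper names are illustrative, and using a 20s probe window (borrowed from PROBE_DURATION_SECONDS=20 in the resume command) for the earlier runs is an assumption, not the matrix script's actual rule.

```rust
/// Seconds of capture left after the first coded event appears.
/// Hypothetical helper; the matrix script may compute its budget differently.
fn remaining_after_first_event(capture_secs: u32, first_event_secs: u32) -> u32 {
    capture_secs.saturating_sub(first_event_secs)
}

/// Assumed rule: the capture must still cover a full probe window after the
/// first coded event shows up.
fn capture_budget_ok(capture_secs: u32, first_event_secs: u32, probe_duration_secs: u32) -> bool {
    remaining_after_first_event(capture_secs, first_event_secs) >= probe_duration_secs
}

fn main() {
    // 720p failure: coded events began ~45s into a 52s capture.
    let old = capture_budget_ok(52, 45, 20);
    // Same start offset with the longer CAPTURE_SECONDS=90 budget.
    let new = capture_budget_ok(90, 45, 20);
    println!("52s capture ok: {old}, 90s capture ok: {new}");
}
```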

Local artifact consolidation while Theia is offline

These notes are from local artifact review on 2026-05-07. They are useful for choosing the next post-reboot run, but only completed matrix summaries should be treated as final static calibration.

| Artifact | Mode | Result |
|---|---|---|
| /tmp/lesavka-server-rc-mode-matrix-20260507-033941 | 1920x1080@30 | Completed 3/3 static run, ready at video 127952us, audio 0us; p95 max 5.3ms, median abs max 4.1ms, freshness budget max 252.3ms. |
| /tmp/lesavka-server-rc-mode-matrix-20260507-050819 | 1280x720@20 | Single-run proof, ready only because min_runs=1; video 143741us, audio 0us; p95 9.2ms, median abs 6.4ms, freshness 257.8ms. |
| /tmp/lesavka-server-rc-mode-matrix-20260507-051510 | 1280x720@20 | Interrupted before matrix summary. Per-probe reports show preferred confirmations at 190716us, 160769us, and 153680us; the first confirmation still had median -31.0ms, so rerun this mode before locking it. |
| /tmp/lesavka-server-rc-mode-matrix-20260507-051510 | 1280x720@30 | Interrupted before matrix summary. Per-probe reports show preferred confirmations at 129603us, 135090us, and 142615us, all with 15/16 paired signatures and p95 between 3.0ms and 11.4ms; 135090us remains a sensible candidate pending a completed static summary. |
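The "ready only because min_runs=1" caveat is a readiness gate. The sketch below is hypothetical: it assumes a mode is static-ready only when it has at least min_runs completed runs and every run is strict-eligible, in the spirit of LESAVKA_SERVER_RC_STATIC_MIN_RUNS. The ProbeRun struct is illustrative, and marking the -31.0ms-median 720p20 confirmation as not strict-eligible is an assumption made for the example.

```rust
/// Hypothetical per-run summary; the real matrix report fields may differ.
struct ProbeRun {
    video_delay_us: u64,
    strict_eligible: bool,
}

/// Assumed gate: enough completed runs, and every run is strict-eligible.
fn static_ready(runs: &[ProbeRun], min_runs: usize) -> bool {
    runs.len() >= min_runs && runs.iter().all(|r| r.strict_eligible)
}

/// Spread between the largest and smallest candidate delay, if any runs exist.
fn spread_us(runs: &[ProbeRun]) -> Option<u64> {
    let max = runs.iter().map(|r| r.video_delay_us).max()?;
    let min = runs.iter().map(|r| r.video_delay_us).min()?;
    Some(max - min)
}

fn main() {
    let single = [ProbeRun { video_delay_us: 143_741, strict_eligible: true }];
    println!("050819 ready with min_runs=1: {}", static_ready(&single, 1));
    println!("050819 ready with min_runs=3: {}", static_ready(&single, 3));

    // Assumption: the first confirmation (median -31.0ms) is not strict-eligible.
    let p20 = [
        ProbeRun { video_delay_us: 190_716, strict_eligible: false },
        ProbeRun { video_delay_us: 160_769, strict_eligible: true },
        ProbeRun { video_delay_us: 153_680, strict_eligible: true },
    ];
    println!(
        "051510 720p20 ready: {}, spread: {:?}us",
        static_ready(&p20, 3),
        spread_us(&p20)
    );
}
```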

Post-reboot priority order:

  1. Deploy the local lesavka-core.sh udev timeout hardening.
  2. Treat the completed 2026-05-09 720p static matrix as the source of truth for 720p HEVC values: 1280x720@20=173852, 1280x720@30=145695.
  3. Keep 1920x1080@30=127952 from the completed 3-run static matrix.
  4. Only after an explicit go/no-go, re-run 1920x1080@20 HEVC with explicit mode selection and tighter watchdogs.
  5. Run final safe-mode HEVC sanity with tuning disabled for validated modes; avoid 1920x1080@20 until the quarantine is lifted.
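The priority order keeps HEVC values in their own map so they never overwrite the MJPEG static profile. The sketch below assumes a hypothetical "MODE=MICROS,MODE=MICROS" string format (the real installer and server map formats are not specified here). The HEVC entries are the values listed above. The MJPEG map is left empty because its calibrated values are not in this document.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug)]
enum Profile {
    Mjpeg,
    Hevc,
}

/// Parses a hypothetical "MODE=MICROS,MODE=MICROS" delay-map string.
fn parse_delay_map(spec: &str) -> Result<HashMap<String, u64>, String> {
    let mut map = HashMap::new();
    for entry in spec.split(',').map(str::trim).filter(|e| !e.is_empty()) {
        let (mode, us) = entry
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {entry:?}"))?;
        let us: u64 = us
            .trim()
            .parse()
            .map_err(|e| format!("bad delay in {entry:?}: {e}"))?;
        map.insert(mode.trim().to_string(), us);
    }
    Ok(map)
}

/// One map per ingress profile; lookups never cross profiles.
struct DelayMaps {
    mjpeg: HashMap<String, u64>,
    hevc: HashMap<String, u64>,
}

impl DelayMaps {
    /// An HEVC miss returns None instead of falling back to MJPEG values.
    fn video_delay_us(&self, profile: Profile, mode: &str) -> Option<u64> {
        let map = match profile {
            Profile::Mjpeg => &self.mjpeg,
            Profile::Hevc => &self.hevc,
        };
        map.get(mode).copied()
    }
}

fn main() {
    let hevc = parse_delay_map("1920x1080@30=127952,1280x720@20=173852,1280x720@30=145695")
        .expect("valid map");
    // MJPEG values intentionally omitted; they are not listed in this plan.
    let maps = DelayMaps { mjpeg: HashMap::new(), hevc };
    println!("{:?}", maps.video_delay_us(Profile::Hevc, "1280x720@30"));
    println!("{:?}", maps.video_delay_us(Profile::Mjpeg, "1280x720@30"));
    // 1920x1080@20 is quarantined, so it has no baked HEVC value.
    println!("{:?}", maps.video_delay_us(Profile::Hevc, "1920x1080@20"));
}
```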

Code work

  • Add server HEVC decode path to MJPEG UVC output.
  • Add client HEVC capture/encode selection.
  • Add synthetic HEVC probe frame encoding.
  • Add mjpeg-cfr RCT capture mode to avoid MJPEG timestamp compression artifacts.
  • Make LESAVKA_CORE_ONESHOT=1 request a descriptor rebuild in lesavka-core.sh.
  • Harden lesavka-core.sh so slow udev reload/trigger commands do not abort gadget rebuild.
  • Add HEVC profile defaults to run_server_to_rc_mode_matrix.sh: longer capture/probe timeout and separate delay map.
  • Stamp LESAVKA_CALIBRATION_PROFILE, LESAVKA_UPLINK_CAMERA_CODEC, and profile-specific delay maps during matrix runtime reconfigure.
  • Add profile-specific factory calibration maps in server calibration without changing MJPEG defaults.
  • Add installer env support for separate MJPEG and HEVC delay maps.
  • Add tests for profile selection, install defaults, and matrix HEVC defaults.
  • Bump local package versions to 0.21.1 for the initial HEVC profile/defaults work.
  • Bump local package versions to 0.21.3 for the 720p HEVC static calibration defaults.
  • Bump local package versions to 0.21.4 for HEVC decoded-MJPEG spool pacing.
  • Bump local package versions to 0.21.5 for synthetic HEVC probe GOP parity with live camera uplink.
  • Bump local package versions to 0.21.6 for encoded probe packets using encoder output PTS.
  • Bump local package versions to 0.21.7 for optional UVC spool metadata fetches and stricter local HEVC bundle identity proof.
  • Verify local HEVC/profile contracts with:
      cargo test -q -p lesavka_server calibration::tests
      cargo test -q --test client_server_rc_matrix_script_contract --test server_install_script_contract
      cargo test -q -p lesavka_server hevc_probe_frame_encoder_builds_when_x265_is_available
      cargo test -q --test server_upstream_media_v2_handoff_contract --test client_rct_transport_probe_contract
  • Verify broad local baseline after the HEVC/profile work: cargo test --workspace --all-targets.

Client-to-server-to-RCT transport

  • Confirm at code/contract level that client synthetic media uses the same bundled RPC shape as real capture handoff.
  • Confirm at code/contract level that the client sends audio and negotiated HEVC video as bundled media units.
  • Confirm at code/contract level that the server receives each bundle and queues audio/video into the post-transport UAC/UVC handoff path.
  • Add unattended start-delay support to the client-to-RCT transport probe.
  • Add local HEVC bundle audit plus freshness-biased jitter stress so outgoing synthetic media proves 16/16 coded events and event codes 1..16 before hardware is involved.
  • Add client-to-RCT summary layer attribution for client-local bundle age versus post-send-to-RCT freshness.
  • Add optional UVC spool-boundary metadata fetch/summarization so failed blind runs can distinguish server decode/spool loss from final RCT capture loss.
  • Run blind client-to-RCT HEVC transport probe.
  • Evaluate coded-pair completeness, sync p95/median, freshness budget, smoothness, and transport lag.
  • Add deeper ingress/queue/decode instrumentation only if the blind final RCT result fails.
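The local bundle audit boils down to a completeness check: event codes 1..16 each appear exactly once, and each coded video event has audio nearby in the same bundle. The sketch below is hypothetical. The Bundle struct, the millisecond PTS fields, and the max_av_gap_ms tolerance are illustrative assumptions, not the audit script's real schema or threshold.

```rust
/// Hypothetical bundled media unit from a client transport timeline.
struct Bundle {
    event_code: Option<u8>,    // coded video event in this bundle, if any
    video_pts_ms: i64,
    audio_pts_ms: Option<i64>, // audio carried in the same bundle
}

/// Checks that codes 1..=expected each appear once with nearby audio.
fn audit(bundles: &[Bundle], expected: u8, max_av_gap_ms: i64) -> Result<(), String> {
    let mut seen = vec![false; expected as usize + 1];
    for b in bundles {
        let Some(code) = b.event_code else { continue };
        if code == 0 || code > expected {
            return Err(format!("unexpected event code {code}"));
        }
        if seen[code as usize] {
            return Err(format!("duplicate event code {code}"));
        }
        seen[code as usize] = true;
        match b.audio_pts_ms {
            Some(a) if (a - b.video_pts_ms).abs() <= max_av_gap_ms => {}
            _ => return Err(format!("event {code} has no nearby audio")),
        }
    }
    let missing: Vec<u8> = (1..=expected).filter(|c| !seen[*c as usize]).collect();
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing event codes {missing:?}"))
    }
}

fn main() {
    // Illustrative train: one coded event every 500ms, audio 3ms later.
    let make = |c: u8| Bundle {
        event_code: Some(c),
        video_pts_ms: i64::from(c) * 500,
        audio_pts_ms: Some(i64::from(c) * 500 + 3),
    };
    let full: Vec<Bundle> = (1..=16).map(make).collect();
    println!("{:?}", audit(&full, 16, 20));
    let missing_seven: Vec<Bundle> = (1..=16).filter(|c| *c != 7).map(make).collect();
    println!("{:?}", audit(&missing_seven, 16, 20));
}
```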

Deferred downstream/input latency follow-up

Revisit this section once upstream media is fully optimized and the blind HEVC client-to-server-to-RCT route is consistently healthy. The goal is to minimize the loop from local input to visible downstream evidence:

local input -> server HID write -> RCT response -> capture-card H.264 -> client display

Current downstream facts:

  • Real downstream eye video is H.264 byte-stream pass-through on the server: capture-card v4l2src emits video/x-h264, the server parses it, and the client decodes it.
  • The normal downstream path does not decode/re-encode on the server.
  • Testsrc-only downstream still uses x264enc so the hardware-free contract can prove H.264 packet shape, Annex-B framing, IDR recovery, and timing.
  • HID/input has deterministic freshness/routing/recovery contracts, but no measured end-to-end HID latency probe yet.

Follow-up optimization plan:

  1. Add a T0-T5 downstream/input latency probe: T0 local synthetic input generated, T1 server RPC receive, T2 server HID write, T3 visible RCT response, T4 downstream capture-card frame observed by the server, T5 client display handoff.
  2. Reuse existing probe patterns instead of inventing new infrastructure: client timelines from lesavka-sync-probe, server timing sidecars from UVC metadata work, final capture analysis from sync/freshness tooling, and local performance/input gates for deterministic checks.
  3. Tune downstream buffering after measurement: eye queue depth, appsink depth, client appsrc queue depth, leaky/drop policy, and first-frame/stall watchdogs.
  4. Check capture-card H.264 controls for low-latency settings: GOP/keyframe cadence, bitrate, buffering, and whether the card exposes any hardware latency knobs through V4L2 controls.
  5. Prefer the fastest reliable client H.264 decoder available on the local host; keep sync=false display sinks and verify queue depths are freshness-biased.
  6. Add a focused HID latency measurement, not just reliability tests, so mouse and keyboard can be optimized by observed numbers instead of feel.
  7. Only consider downstream HEVC or alternate transport after the H.264 pass-through path is measured; H.264 pass-through is already cheap on Theia, so the likely wins are buffering, decoder choice, and measurement-guided recovery.
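The T0-T5 probe in step 1 reduces to per-hop deltas between six timestamps. The sketch below assumes all stamps are already expressed in microseconds on one shared, offset-corrected clock. Client and server clocks differ in practice, so that alignment is an assumption here. The stage labels paraphrase step 1, and the stamps in main are made up because no measurements exist yet.

```rust
/// Labels for the five hops between T0..T5 (paraphrased from the plan).
const STAGES: [&str; 5] = [
    "T0 local input -> T1 server RPC receive",
    "T1 RPC receive -> T2 HID write",
    "T2 HID write -> T3 visible RCT response",
    "T3 RCT response -> T4 capture-card frame at server",
    "T4 capture-card frame -> T5 client display handoff",
];

/// Per-hop latency in microseconds; errors if any stamp runs backwards,
/// which would point at clock skew or a mislabeled stamp.
fn stage_deltas_us(t: [u64; 6]) -> Result<[u64; 5], String> {
    let mut out = [0u64; 5];
    for (i, pair) in t.windows(2).enumerate() {
        out[i] = pair[1]
            .checked_sub(pair[0])
            .ok_or_else(|| format!("T{} precedes T{}", i + 1, i))?;
    }
    Ok(out)
}

/// Total input-to-display latency, if T5 is not earlier than T0.
fn end_to_end_us(t: [u64; 6]) -> Option<u64> {
    t[5].checked_sub(t[0])
}

fn main() {
    // Made-up stamps for illustration only.
    let t = [0, 1_200, 1_500, 34_000, 52_000, 61_000];
    match stage_deltas_us(t) {
        Ok(deltas) => {
            for (label, us) in STAGES.iter().zip(deltas) {
                println!("{label}: {us}us");
            }
        }
        Err(e) => eprintln!("bad timeline: {e}"),
    }
    if let Some(total) = end_to_end_us(t) {
        println!("end-to-end: {total}us");
    }
}
```

Reporting per-hop deltas rather than only the end-to-end number is what lets steps 3 to 6 target the hop that actually dominates.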

Optional UVC spool-boundary metadata

The client-to-RCT probe remains non-mutating by default. If a server has already been configured to append UVC spool metadata, the probe can fetch that JSONL and write a local summary beside the final RCT capture artifacts:

env LESAVKA_CLIENT_RCT_UVC_FRAME_META_LOG_REMOTE=/tmp/lesavka-uvc-frame-meta.jsonl \
    ./scripts/manual/run_client_to_rct_transport_probe.sh

This does not enable server metadata by itself. It only copies and summarizes a pre-existing server artifact. The useful comparison is:

  • client-transport-timeline.json: what the client generated and bundled.
  • uvc-frame-meta-summary.txt: what reached the server's MJPEG UVC spool.
  • report.txt and client-rct-transport-summary.txt: what the RCT finally observed.

If the spool summary has 16/16 synthetic events and the RCT report does not, the loss is after server decode/spool. If both are incomplete, the next debugging target is client encoding, bundled transport, or server ingest/decode.
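That decision rule can be written as a small attribution function. The sketch below is hypothetical. The enum names are illustrative. Spool metadata is modeled as optional because the probe only summarizes it when the server was already configured to write it. The "spool incomplete but RCT complete" case is not covered by the rule above, so it is flagged for manual review instead of interpreted.

```rust
#[derive(Debug, PartialEq)]
enum LossLocation {
    NoLoss,
    AfterServerSpool,  // spool complete, RCT incomplete
    BeforeSpool,       // client encode, bundled transport, or server ingest/decode
    NeedsReview,       // spool incomplete but RCT complete: not covered by the rule
    Unknown,           // RCT incomplete and no spool metadata to compare against
}

fn attribute_loss(spool_events: Option<u32>, rct_events: u32, expected: u32) -> LossLocation {
    let rct_ok = rct_events >= expected;
    match (spool_events.map(|s| s >= expected), rct_ok) {
        (Some(true), true) | (None, true) => LossLocation::NoLoss,
        (Some(true), false) => LossLocation::AfterServerSpool,
        (Some(false), false) => LossLocation::BeforeSpool,
        (Some(false), true) => LossLocation::NeedsReview,
        (None, false) => LossLocation::Unknown,
    }
}

fn main() {
    println!("{:?}", attribute_loss(Some(16), 12, 16));
    println!("{:?}", attribute_loss(Some(10), 9, 16));
    println!("{:?}", attribute_loss(None, 9, 16));
}
```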

Remote safety posture

  • The 2026-05-08 moonshot run is recorded in /tmp/lesavka-hevc-moonshot-latest/report.log.
  • Local validation is green after the all-mode bundle audit hardening: hygiene_gate.sh, quality_gate.sh, and a release build all passed.
  • No reboot or destructive recovery should be attempted by automation.
  • The last hardware run survived the 720p HEVC calibration repeats, then failed during 1920x1080@20 signal conditioning with zero paired events before the host became unreachable.
  • Treat 1920x1080@20 HEVC as quarantined until Theia is back and a low-risk service-only status pass completes. Do not resume that mode as the first remote action.
  • Next remote step once Theia recovers: run status-only checks, deploy the latest binaries if needed, prove a known-safe mode such as 1280x720@30, and only then decide whether to reattempt 1920x1080@20 with tighter watchdogs.

2026-05-08 partial HEVC calibration evidence

These are not final static values, but they are useful breadcrumbs from the interrupted /tmp/lesavka-hevc-moonshot-* matrix:

| Mode | Evidence | Current interpretation |
|---|---|---|
| 1280x720@30 | Three usable tuned confirmations: 150731us, 140349us, and 135090us; p95 stayed in the preferred band with 15 paired coded events per run. | Existing 135090us remains acceptable; completed evidence centers closer to 136171-140349us, so do not retune blindly without a final all-mode sanity run. |
| 1280x720@20 | Two strict-eligible tuned confirmations around 184870us and 180995us; the third run (159923us) had median/p95 too high for static selection. | Needs one more stable completed matrix before baking; likely higher than the older 153680us seed, but not locked. |
| 1920x1080@20 | Signal conditioning timed out and produced 0 paired events; host became unreachable afterward. | Blocked/risky. Re-enter with status-only checks and a low-risk mode before attempting this again. |

2026-05-09 post-recovery 720p HEVC static decision

| Mode | Selected video delay | Evidence | Interpretation |
|---|---|---|---|
| 1280x720@20 | 173852us | 3/3 static-ready runs, p95 max 14.1ms, median abs max 11.2ms, freshness budget max 309.0ms; follow-up no-tune sanity still passed sync/freshness/smoothness but had one median sample at 21.8ms. | Bake this as the 720p20 HEVC default, but keep an eye on median spread in later end-to-end runs. |
| 1280x720@30 | 145695us | 3/3 static-ready runs, p95 max 27.8ms, median abs max 18.5ms, freshness budget max 309.4ms; follow-up no-tune sanity passed with p95 13.7ms. | Bake this as the 720p30 HEVC default. |

Resume commands

Run these from /home/brad/Development/lesavka after ssh theia 'date -Is' succeeds:

./scripts/manual/run_local_hevc_bundle_audit.sh
./scripts/manual/run_local_hevc_encoder_preflight.sh

The local audit runs without a password and writes a preflight manifest proving that the client synthetic source produces one HEVC+PCM bundle train with 16/16 coded video events, event codes 1..16, and nearby audio before any server/RCT hardware is involved. It also runs a deterministic jitter stress case that drops stale events as complete A/V bundles while preserving the analyzer's 13-pair evidence floor.
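The jitter stress case drops stale events as whole A/V bundles, so no surviving video event is left without its audio, and the result still has to clear the 13-pair floor. The sketch below is hypothetical. The AvBundle struct, the age_ms field, and the 250ms staleness cutoff are illustrative assumptions. Only the whole-bundle drop and the 13-pair floor come from the text above.

```rust
/// Hypothetical bundle: one coded video event plus its audio, moving as a unit.
#[derive(Clone, Debug)]
struct AvBundle {
    event_code: u8,
    age_ms: u32,
}

/// Freshness-biased drop: stale bundles are discarded whole, so audio and
/// video for the same event are always kept or dropped together.
fn drop_stale(bundles: &[AvBundle], max_age_ms: u32) -> Vec<AvBundle> {
    bundles.iter().filter(|b| b.age_ms <= max_age_ms).cloned().collect()
}

/// True when enough paired events survive for the analyzer to trust the run.
fn meets_evidence_floor(kept: &[AvBundle], floor: usize) -> bool {
    kept.len() >= floor
}

fn main() {
    // Synthetic train: codes 5, 10 and 15 arrive late (made-up ages).
    let train: Vec<AvBundle> = (1..=16u8)
        .map(|c| AvBundle { event_code: c, age_ms: if c % 5 == 0 { 400 } else { 40 } })
        .collect();
    let kept = drop_stale(&train, 250);
    let codes: Vec<u8> = kept.iter().map(|b| b.event_code).collect();
    println!("kept {} pairs: {:?}", kept.len(), codes);
    println!("meets 13-pair floor: {}", meets_evidence_floor(&kept, 13));
}
```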

The encoder preflight is also local-only. It runs the supported 1280x720@20, 1280x720@30, 1920x1080@20, and 1920x1080@30 modes through the available GStreamer HEVC encoder and records whether each mode can produce Annex-B HEVC faster than realtime before remote transport is involved.
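"Faster than realtime" can be expressed as a realtime factor: seconds of media encoded divided by wall-clock seconds spent encoding, where a value above 1.0 means the encoder keeps up. The sketch below is hypothetical. It parses the WIDTHxHEIGHT@FPS mode strings used throughout this plan and computes that factor. The frame counts and timings in main are made up, not preflight results.

```rust
use std::time::Duration;

/// Parses a mode string such as "1280x720@30" into (width, height, fps).
fn parse_mode(mode: &str) -> Option<(u32, u32, u32)> {
    let (size, fps) = mode.split_once('@')?;
    let (w, h) = size.split_once('x')?;
    Some((
        w.parse::<u32>().ok()?,
        h.parse::<u32>().ok()?,
        fps.parse::<u32>().ok()?,
    ))
}

/// Media seconds encoded per wall-clock second; > 1.0 is faster than realtime.
fn realtime_factor(frames: u32, fps: u32, elapsed: Duration) -> Option<f64> {
    if fps == 0 || elapsed.is_zero() {
        return None;
    }
    let media_secs = f64::from(frames) / f64::from(fps);
    Some(media_secs / elapsed.as_secs_f64())
}

fn main() {
    // Made-up encode timings for illustration only.
    let samples: [(&str, u32, Duration); 2] = [
        ("1280x720@30", 300, Duration::from_secs(4)),
        ("1920x1080@30", 300, Duration::from_secs(12)),
    ];
    for (mode, frames, elapsed) in samples {
        let Some((_, _, fps)) = parse_mode(mode) else {
            eprintln!("unparseable mode {mode}");
            continue;
        };
        match realtime_factor(frames, fps, elapsed) {
            Some(f) => println!("{mode}: {f:.2}x realtime, faster than realtime: {}", f > 1.0),
            None => println!("{mode}: no timing"),
        }
    }
}
```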

Use the re-entry helper for status-only checks first:

./scripts/manual/run_hevc_remote_reentry_check.sh

Or run the post-reboot sequence helper when we are ready for the full unattended runway. It performs the local HEVC preflights, waits for Theia, syncs, builds, deploys, and reconfigures through the passwordless lesavka-dev-install helper, and starts the pending HEVC static calibration matrix:

./scripts/manual/run_hevc_post_reboot_sequence.sh

Once the status check is green, run the re-entry helper (run_hevc_remote_reentry_check.sh) again with the full no-password loop enabled:

env LESAVKA_HEVC_REENTRY_SYNC=1 \
    LESAVKA_HEVC_REENTRY_BUILD=1 \
    LESAVKA_HEVC_REENTRY_DEPLOY=1 \
    LESAVKA_HEVC_REENTRY_RECONFIGURE=1 \
    LESAVKA_HEVC_REENTRY_WAIT_SECONDS=900 \
    LESAVKA_HEVC_REENTRY_MODE='1280x720@30' \
    ./scripts/manual/run_hevc_remote_reentry_check.sh

Equivalent manual commands are:

rsync -az --exclude target --exclude .git ./ theia:/home/theia/Development/lesavka-codex/
ssh theia 'cd /home/theia/Development/lesavka-codex && cargo build --release --bin lesavka-server --bin lesavka-uvc && sudo -n /usr/local/sbin/lesavka-dev-install deploy'
ssh theia 'sudo -n /usr/local/sbin/lesavka-dev-install reconfigure 1280x720@30 hevc'

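Then resume the hardware matrix. The command below targets 1920x1080@20, which is quarantined; run it only after an explicit go/no-go decision lifts the quarantine: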

env REMOTE_PULSE_CAPTURE_TOOL=gst \
    REMOTE_PULSE_VIDEO_MODE=mjpeg-cfr \
    LESAVKA_SERVER_RC_PROFILE=hevc \
    LESAVKA_SERVER_RC_MODES='1920x1080@20' \
    LESAVKA_SERVER_RC_REPEAT_COUNT=3 \
    LESAVKA_SERVER_RC_STATIC_MIN_RUNS=3 \
    LESAVKA_SERVER_RC_VERBOSE_PROBES=0 \
    LESAVKA_SERVER_RC_RECONFIGURE=1 \
    LESAVKA_SERVER_RC_RECONFIGURE_COMMAND='ssh theia sudo -n /usr/local/sbin/lesavka-dev-install reconfigure "$LESAVKA_MODE" hevc' \
    CAPTURE_SECONDS=90 \
    PROBE_TIMEOUT_SECONDS=160 \
    PROBE_DURATION_SECONDS=20 \
    PROBE_WARMUP_SECONDS=4 \
    ./scripts/manual/run_server_to_rc_mode_matrix.sh