lesavka/AGENTS.md

21 KiB

Lesavka Agent Notes

A/V Sync Probe And Lip-Sync Validation Checklist

Context: Google Meet testing on 2026-04-30 showed audio roughly 8 seconds behind video even though internal client/server telemetry reported fresh uplink packets. Treat this as a product correctness failure, not a calibration issue. Do not resume blind lip-sync tuning until the probe can explain where delay appears.

Operating Principles

  • Avoid hard-resetting USB, UVC, UAC, display managers, or remote hosts unless the user explicitly approves it.
  • Prefer observation and reversible user-space probes before changing media pipelines.
  • Treat Tethys-only SSH/device inspection as a development luxury, not a production dependency.
  • Do not claim lip sync is fixed from internal telemetry alone; require end-to-end device-level evidence.
  • Keep this checklist updated as work lands.

Phase 1: Build The Probe

  • Create this tracked checklist in AGENTS.md.
  • Inventory existing client/src/sync_probe/ code and decide what can be reused.
    • Reuse the existing synthetic beacon in client/src/sync_probe/.
    • Reuse the existing Tethys capture harness in scripts/manual/run_upstream_av_sync.sh.
    • Reuse and extend lesavka-sync-analyze; current gap is structured evidence output, not capture generation.
  • Define the phase-1 output contract:
    • report.json
    • report.txt
    • per-event rows with event id, video time, audio time, skew, and confidence
    • pass/fail verdict using preferred/acceptable/catastrophic thresholds
  • Add a deterministic local sync beacon source:
    • video flash pattern with event identity or cadence
    • simultaneous audio click/beep
    • stable event schedule suitable for automated detection
  • Add a Tethys-side capture probe:
    • capture Lesavka UVC video device
    • capture Lesavka UAC microphone device
    • record enough raw evidence for debugging when detection fails
    • detect video flashes
    • detect audio clicks
    • pair events and compute skew
  • Add a runner that can launch or instruct the Tethys probe safely over SSH without rebooting or restarting the desktop.
  • Store probe artifacts under /tmp/lesavka-sync-probe-* by default.
  • Keep the probe usable without Google Meet first; Google Meet validation is a later application-level check.

Phase 2: Use Probe To Root-Cause Desync

  • Run probe through direct Lesavka UVC/UAC devices on Tethys.
    • First live run reached the devices but exposed analyzer/tooling gaps instead of a valid skew report.
    • Fixed the manual probe tunnel to preserve HTTPS/mTLS through SSH (LESAVKA_SERVER_SCHEME=https, LESAVKA_TLS_DOMAIN=lesavka-server).
    • Fixed analyzer handling for MJPEG captures whose FFprobe metadata over-reports frames versus decodable video frames.
  • Compare client-generated event times against Tethys-observed times.
    • The preserved Tethys capture had 323 decodable frames with constant brightness, so no video flash reached UVC.
    • Server logs show the probe entered a stale upstream session and dropped audio as ~326 seconds late.
  • Identify whether delay appears before server planning, at server UAC sink, at UVC helper, inside Tethys device capture, or inside browser/WebRTC.
    • Current root cause is server planning/session lifecycle, before UVC/UAC sink output.
    • A previous one-sided microphone session started at 2026-04-30T22:59:52Z; the new probe at 2026-05-01T00:57:08Z inherited its stale playout epoch.
  • Add diagnostics for whichever stage is hiding delay.
    • Existing server lifecycle/planning logs were enough to isolate this run; next gate should preserve these as structured artifacts.
  • Do not tune calibration offsets until gross backlog is ruled out.
    • No calibration offsets were changed during the stale-session investigation.
    • Current evidence points at lifecycle/session planning, not an offset problem.

Phase 3: Fix Lesavka With Evidence

  • If stale upstream lifecycle is confirmed, reset shared A/V timing anchors when a new stream replaces an existing owner.
    • Added a lifecycle guard so normal camera/microphone stream replacement clears stale shared timing anchors before re-pairing.
    • Kept soft microphone recovery intentionally separate so it supersedes the mic owner without disturbing an active healthy camera/shared clock.
    • Added regression coverage for stale timing-anchor replacement and soft microphone recovery preservation.
  • If UAC sink backlog is confirmed, make UAC output freshness-bounded.
  • If audio progress is marked too early, move/augment progress telemetry to reflect actual sink emission readiness.
  • If UVC and UAC are using incompatible freshness semantics, unify them behind one live-media policy.
  • If browser/WebRTC adds delay after devices are already synced, document the application boundary and add browser-specific mitigation or guidance.

Phase 4: Gate And Release Criteria

  • Add deterministic unit/integration tests for probe analysis logic.
  • Add a hardware-in-the-loop/manual gate artifact schema for real Tethys probe runs.
  • Update scripts/ci/media_reliability_gate.sh to report probe evidence when present.
    • Gate now reads LESAVKA_SYNC_PROBE_REPORT_JSON, LESAVKA_SYNC_PROBE_REPORT_DIR, or target/media-reliability-gate/sync-probe/report.json.
    • Gate emits sync-probe verdict/check metrics, skew metrics, event counts, and a verdict info metric.
  • Require a fresh probe report before declaring lip sync fixed.
    • Gate now supports LESAVKA_REQUIRE_SYNC_PROBE=1, which fails media reliability when a valid passing probe report is absent.
    • Product/release judgment still requires a new live Theia/Tethys probe after the lifecycle fix is installed.
  • Suggested thresholds:
    • preferred: p95 skew <= 35 ms
    • acceptable: p95 skew <= 80 ms
    • gross failure: sustained skew > 250 ms
    • catastrophic failure: any sustained skew near or above 1000 ms

Open Questions

  • Decide whether the phase-1 beacon should run as a separate binary, a hidden client mode, or both.
  • Decide whether Tethys probe should be Rust-only, shell plus GStreamer, or a hybrid.
  • Confirm whether sudo/Vault access is available for installing missing probe dependencies on Theia/Tethys.
    • Non-sudo server journal inspection worked; noninteractive sudo over SSH still needs an explicit TTY/password path.

Validation Evidence

  • cargo test -p lesavka_server upstream_media_runtime::tests::lifecycle
  • cargo test -p lesavka_client sync_probe::analyze
  • cargo test -p lesavka_testing upstream_sync_script_tunnels_auto_server_addr_through_ssh
  • bash -n scripts/ci/media_reliability_gate.sh
  • cargo test -p lesavka_testing media_reliability_gate_reports_direct_sync_probe_evidence
  • LESAVKA_REQUIRE_SYNC_PROBE=1 ./scripts/ci/media_reliability_gate.sh
    • Used a synthetic passing report at target/media-reliability-gate/sync-probe/report.json to verify gate parsing/enforcement.
    • This validates CI glue only; a real Theia/Tethys probe is still required for product judgment.

Real Upstream Lip-Sync Fix Checklist

Context: the mirrored browser probe finally reproduced the real failure class on 2026-05-01: activity_start_delta_ms=+9591.1. This means the end-to-end browser-visible path can still start video far ahead of audio. The fix target is not silence in the logs; it is a freshness-first A/V uplink whose startup can heal briefly but cannot drift into seconds of skew.

Acceptance Criteria

  • Mirrored browser probe passes with activity_start_delta_ms <= 1000.
  • Steady-state preferred sync: median skew within 35 ms.
  • Steady-state acceptable sync: p95 absolute skew within 80 ms.
  • Any sustained or startup A/V split near 1000 ms remains a hard failure.
  • No stale audio backlog is ever drained into UAC to catch up.
  • No stale video backlog is ever drained into UVC to catch up.
  • Google Meet manual testing agrees with the mirrored probe instead of revealing hidden seconds-scale skew.

Phase 0: Keep The Probe Honest

  • Split raw activity-start fields from filtered/coded paired-pulse fields in probe reports.
  • Print explicit raw first-video and first-audio timestamps in report.txt.
  • Root-cause the 0.16.17 raw_first_video_activity_s=0.000 artifact as the mirrored probe counting its own bright pre-start positioning card.
  • Make the mirrored stimulus pre-start screen dark/dim so only real flash pulses can be detected as video activity.
  • Add analyzer coverage proving dim pre-start positioning frames are ignored.
  • Replace generic light/dark mirrored flashes with color-coded event IDs.
  • Make mirrored audio pulses unique by the same event ID via pulse width plus tone frequency.
  • Teach the analyzer to decode mirrored video event IDs from color, not grayscale brightness.
  • Tighten real-camera color matching after 0.16.18 accepted washed-out brown/gray remnants as red/yellow events.
  • Preserve raw activity-start timing before cadence cleanup in coded reports.
  • Merge short audio envelope dropouts inside one coded pulse so a single tone burst cannot become two fake events.
  • Add diagnostic coded-pair correlation so stable large skew reports as measured failure instead of not enough pairs.
  • Make coded mirrored verdicts/calibration use matched coded pulses as authority; raw activity-start deltas are reported separately unless they agree with the coded pairs.
  • Print unpaired video/audio onsets in the human report so missed coded pulses are visible during probe triage.
  • Keep the mirrored browser probe as the release/blocking upstream A/V gate.
  • Keep the old raw-device probe as a lower-level diagnostic only.

Phase 1: Stop One-Sided Startup Drift

  • Default upstream planning must require both camera and microphone before live playout.
  • One-sided playout may only happen through an explicit compatibility override.
  • While pairing is overdue, keep replacing the waiting-side anchor with fresh packets instead of preserving stale startup anchors.
  • While awaiting the peer stream, keep only fresh pending camera packets.
  • While awaiting the peer stream, keep only fresh pending microphone packets.
  • Send the latest camera packet from the client uplink queue instead of draining old-but-not-yet-stale video backlog.
  • Add tests proving the pairing window no longer expires into one-sided playout by default.
  • Add tests proving the explicit one-sided override still works for intentional single-stream scenarios.

Phase 2: Bound UAC Freshness

  • Configure UAC appsrc as non-blocking and bounded.
  • Log and drop UAC appsrc push failures instead of treating enqueue as guaranteed playback.
  • Raise calibration offset limits enough to cover the measured MJPEG/UVC path delta without rejecting probe corrections.
  • Update the MJPEG/UVC factory audio baseline from the old -45ms/+720ms values to +1260ms as the mirrored probe exposes the fresh UAC-vs-UVC path delta.
  • Migrate untouched legacy -45ms factory/env calibration files on load so old installs actually receive the new baseline.
  • Make the video/audio-master wait offset-aware so a positive audio playout delay does not freeze UVC video while UAC sleeps before emission.
  • Flush/stop UAC cleanly on session close, replacement, and recovery.
  • Add tests or contract coverage for bounded UAC settings where practical.

Phase 3: Add Real Timing Evidence

  • Add server timing counters for first camera packet, first mic packet, first UVC write, and first UAC push per session.
  • Add dropped-stale audio/video counters to diagnostics.
  • Add a concise health explanation when startup pairing exceeds the healing window.
  • Surface Starting, Healing, Flowing, Lagging, Dropping, and Stale states in chips/diagnostics from real path evidence.

Phase 4: Recovery And Mid-Session Changes

  • Make device changes trigger soft-pause, stream replacement, queue flush, and re-pairing.
  • Keep recovery soft-first; reserve hard UVC/UAC gadget rebuilds for explicit guarded recoveries.
  • Add cooldown/state guards so recovery buttons cannot wedge Theia.
  • Ensure disconnect closes all client/server media tasks for the session.

Phase 5: Verification Loop

  • Run focused upstream runtime tests.
  • Run server/client media contract tests.
  • Run cargo check for touched packages.
  • Bump version for the fix release.
  • Run the mirrored browser probe on installed client/server.
    • 0.16.17 still failed: reported activity_start_delta_ms=+6735.0, but raw_first_video_activity_s=0.000 exposed a probe false-positive from the pre-start screen. Paired pulses still showed real steady-state skew (p95=411.8 ms, median=-99.0 ms), so the product remains unfixed.
    • 0.16.18 captured real colored/audio-coded events but the analyzer still bailed with need at least 3 matching coded pulse pairs; saw 1. Replaying that artifact after analyzer hardening now reports gross_failure: 16/16 coded pairs, p95 775.7 ms, activity start -766.4 ms, and drift -2.8 ms; the failure is stable audio-ahead/video-late skew, not random detector noise.
    • 0.16.19 changes the shipped MJPEG/UVC audio playout baseline to +720ms; the next mirrored browser probe should move the measured median from about -766ms toward roughly -46ms before fine calibration.
    • 0.16.19 mirrored browser probe did not move the measured skew: p95 885.7 ms, median -788.4 ms, activity start -659.1 ms, drift -81.2 ms. SSH inspection showed Theia was on commit c348597, but /etc/lesavka/server.env still contained LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=-45000; the new +720ms baseline was not actually installed. Patch the installer to migrate leaked legacy ambient -45000 to +720000 unless LESAVKA_INSTALL_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US explicitly asks for the legacy value.
    • 0.16.20 installed the +720ms offset (/etc/lesavka/server.env had LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=720000), but the mirrored browser capture contained no recognizable color pulses. Theia server logs showed repeated upstream video frame dropped because the audio master never caught up inside the pairing window; UVC was effectively starved by the positive audio delay instead of flowing delayed-but-fresh frames.
    • 0.16.21 makes that wait offset-aware and adds a regression test proving a configured positive audio delay does not freeze UVC video while UAC sleeps before playout.
    • Replaying the 0.16.21 artifact after 0.16.22 analyzer hardening changes the verdict from false catastrophic_failure to gross_failure: p95 273.8 ms, median -188.4 ms, 7 paired coded pulses. The raw activity-start delta (-3620.7 ms) is still printed, but it is ignored for verdict/calibration because it disagrees with coded pairs by 3432.3 ms; unpaired video/audio onsets are printed for triage.
    • 0.16.22 live mirrored run still failed with p95 433.7 ms, median -359.4 ms, and 5 paired coded pulses. Client telemetry showed camera uplink latest_age_ms repeatedly around 300-350 ms, matching the measured skew; patch 0.16.23 to make video queues latest-only instead of draining stale-but-under-budget backlog.
    • 0.16.23 local validation passed for fresh-queue behavior, uplink/probe freshness contracts, sync analyzer tests, client/server binary checks, and whitespace checks.
    • 0.16.23 live mirrored run improved to p95 215.2 ms, median +142.2 ms, 13 paired coded pulses, and raw activity alignment within 6.6 ms of coded pairs. Patch 0.16.24 makes the probe print local client and remote server versions before capture so every run records what was actually tested.
    • 0.16.24 live mirrored run improved again to p95 168.4 ms, median -19.1 ms, 11 paired coded pulses, but still failed because individual paired pulses bounced between about -168 ms and +45 ms. Client logs showed the microphone uplink queue still accumulating depth 16; patch 0.16.25 makes microphone uplink queues latest-only too so stale audio PTS cannot continue acting as the server timing master under backpressure.
    • 0.16.25 removed the client mic backlog but exposed a stable hardware/browser path delta: p95 557.3 ms, median -540.5 ms, drift +9.0 ms, and fresh mic delivery ages around 2-10 ms. Patch 0.16.26 raises the MJPEG/UVC factory audio delay to +1260 ms and expands the calibration clamp so this stable offset can actually be corrected instead of rejected.
  • Re-run the mirrored browser probe after the pre-start false-positive fix.
  • Run Google Meet manual validation.

0.17.0 Tyrannical Upstream Playout Checklist

Context: 0.16.x proved that queue tweaks and static calibration cannot guarantee lip sync. 0.17.0 changes the upstream contract: the server planner is authoritative, audio is the master, video follows by timestamp or freezes, and freshness wins over smooth-but-wrong playback.

Hard Product Invariants

  • No normal live upstream playout may be more than 1000 ms behind the freshest known client capture frontier.

  • Video may not advance outside the audio playhead sync budget.

  • Audio should be continuous when possible, but stale audio must be dropped/skipped rather than drained.

  • Missing or late video should freeze/stutter instead of pulling audio backward.

  • Startup may be ugly, but must either enter fresh synced live mode or declare failure within 60 seconds.

  • Healing may be visible, but must prevent persistent seconds-scale skew.

  • Calibration may fine-tune sub-frame offsets only; it must not be required to rescue seconds-scale desync.

  • Bump Lesavka to 0.17.0 because this is a media-contract change, not a patch tune.

  • Add planner policy config: max live lag, max skew, startup timeout, target playout delay, and healing cooldown.

  • Reset 0.17 defaults so shipped audio/video offsets do not intentionally exceed the freshness budget.

  • Track latest camera/audio input timestamps in the server planner.

  • Track actual planned/emitted audio and video playheads.

  • Enforce audio as master: stale audio is dropped/skipped; it does not drain backlog.

  • Enforce video follower behavior: frames ahead of audio wait; frames too far behind audio are dropped so the previous frame freezes.

  • Re-anchor continuously when the live playhead falls outside the freshness budget, not only once at startup.

  • Keep startup paired-only by default and fail visibly after the startup timeout.

  • Add planner phases and counters to diagnostics/logs: acquiring, syncing, live, healing, failed; stale drops, skew drops, freshness heals, video freezes.

  • Keep UVC/UAC sinks as dumb consumers of planner-approved packets.

  • Update tests to prove stale media cannot be emitted and video cannot outrun/lag audio beyond policy.

  • Update manual/probe diagnostics so 0.17 reports the planner state being tested.

Validation Targets Before Human Test

  • Unit tests for startup pairing, stale audio drop, stale video drop, reanchor, startup timeout, audio-master/video-follower rules.
  • Contract tests for installer defaults and version reporting.
  • cargo check -p lesavka_client -p lesavka_server --bins.
  • Focused lesavka_testing media/runtime contracts.
  • Only after all of the above, run the mirrored browser probe.

Progress Log

  • 2026-05-01: Added 0.17 planner defaults (350ms target playout, 1000ms max live lag, 60000ms startup timeout, 80000us pair slack), reset MJPEG audio factory offset to 0, and migrated old -45ms, +720ms, and +1260ms untouched baselines.
  • 2026-05-01: Server planner now tracks latest input frontier, presented audio/video playheads, phase, stale drops, skew drops, reanchors, startup timeouts, and freezes.
  • 2026-05-01: Runtime tests green for stale audio drop, stale video drop, audio-master/video-follower freeze, repeated reanchor, paired startup timeout, and planner snapshot basics: cargo test -p lesavka_server upstream_media_runtime::tests -- --nocapture.
  • 2026-05-01: Added GetUpstreamSync RPC, lesavka-relayctl upstream-sync, launcher diagnostics text, and mirrored-probe before/after planner snapshots so 0.17 probe runs report the exact planner state under test.
  • 2026-05-01: Validation green: cargo test -p lesavka_server --lib --bins, cargo test -p lesavka_testing, cargo test -p lesavka_client --bins --lib, and targeted installer/RPC/layout contracts.
  • 2026-05-01: First installed 0.17.0 mirrored browser probe on client/server commit 3920e0a failed honestly: planner reported fresh live state (live_lag_ms=10, skew_ms=+20.7) but browser-observed paired pulses showed audio late by median +349.1ms, p95 429.1ms, with 6 video freezes/skew drops. Replayed artifact after analyzer hardening now reports gross_failure instead of false raw-start catastrophic_failure.
  • 2026-05-01: Patch follow-up models the observed MJPEG/UVC browser egress delta by defaulting video playout offset to +350ms and preserving the 1s freshness ceiling. Raw activity-start evidence is now ignored for verdict/calibration when it disagrees with paired pulses that are already failing directly. Existing early-0.17 audio=0/video=0 factory/env calibration files migrate to the new video=+350ms default on load.