Lesavka Agent Notes

A/V Sync Probe And Lip-Sync Validation Checklist

Context: Google Meet testing on 2026-04-30 showed audio roughly 8 seconds behind video even though internal client/server telemetry reported fresh uplink packets. Treat this as a product correctness failure, not a calibration issue. Do not resume blind lip-sync tuning until the probe can explain where delay appears.

Operating Principles

  • Avoid hard-resetting USB, UVC, UAC, display managers, or remote hosts unless the user explicitly approves it.
  • Prefer observation and reversible user-space probes before changing media pipelines.
  • Treat Tethys-only SSH/device inspection as a development luxury, not a production dependency.
  • Do not claim lip sync is fixed from internal telemetry alone; require end-to-end device-level evidence.
  • Keep this checklist updated as work lands.

Phase 1: Build The Probe

  • Create this tracked checklist in AGENTS.md.
  • Inventory existing client/src/sync_probe/ code and decide what can be reused.
    • Reuse the existing synthetic beacon in client/src/sync_probe/.
    • Reuse the existing Tethys capture harness in scripts/manual/run_upstream_av_sync.sh.
    • Reuse and extend lesavka-sync-analyze; current gap is structured evidence output, not capture generation.
  • Define the phase-1 output contract:
    • report.json
    • report.txt
    • per-event rows with event id, video time, audio time, skew, and confidence
    • pass/fail verdict using preferred/acceptable/catastrophic thresholds
  • Add a deterministic local sync beacon source:
    • video flash pattern with event identity or cadence
    • simultaneous audio click/beep
    • stable event schedule suitable for automated detection
  • Add a Tethys-side capture probe:
    • capture Lesavka UVC video device
    • capture Lesavka UAC microphone device
    • record enough raw evidence for debugging when detection fails
    • detect video flashes
    • detect audio clicks
    • pair events and compute skew
  • Add a runner that can launch or instruct the Tethys probe safely over SSH without rebooting or restarting the desktop.
  • Store probe artifacts under /tmp/lesavka-sync-probe-* by default.
  • Keep the probe usable without Google Meet first; Google Meet validation is a later application-level check.
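
The "pair events and compute skew" step can be sketched as follows. This is a minimal illustration, not the lesavka-sync-analyze implementation: the `Onset` and `EventRow` types, the `pair_events` name, and the `max_gap_s` parameter are hypothetical. Skew is taken as audio time minus video time, so a positive value means audio arrived after video. The confidence column from the output contract is omitted for brevity.

```rust
/// A detected pulse onset: event id plus capture-relative time in seconds.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Onset {
    pub event_id: u32,
    pub time_s: f64,
}

/// One per-event report row (hypothetical field names).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct EventRow {
    pub event_id: u32,
    pub video_time_s: f64,
    pub audio_time_s: f64,
    /// Audio time minus video time; positive when audio arrives after video.
    pub skew_ms: f64,
}

/// Pair each video onset with the nearest unused audio onset that carries the
/// same event id, rejecting candidates further apart than `max_gap_s`.
pub fn pair_events(video: &[Onset], audio: &[Onset], max_gap_s: f64) -> Vec<EventRow> {
    let mut rows = Vec::new();
    let mut used = vec![false; audio.len()];
    for v in video {
        let mut best: Option<(usize, f64)> = None;
        for (i, a) in audio.iter().enumerate() {
            if used[i] || a.event_id != v.event_id {
                continue;
            }
            let gap = (a.time_s - v.time_s).abs();
            if gap <= max_gap_s && best.map_or(true, |(_, g)| gap < g) {
                best = Some((i, gap));
            }
        }
        if let Some((i, _)) = best {
            used[i] = true;
            rows.push(EventRow {
                event_id: v.event_id,
                video_time_s: v.time_s,
                audio_time_s: audio[i].time_s,
                skew_ms: (audio[i].time_s - v.time_s) * 1000.0,
            });
        }
    }
    rows
}
```

Video onsets with no matching audio onset simply produce no row, so the number of rows relative to the number of scheduled events doubles as a rough detection-health signal.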

Phase 2: Use Probe To Root-Cause Desync

  • Run probe through direct Lesavka UVC/UAC devices on Tethys.
    • First live run reached the devices but exposed analyzer/tooling gaps instead of a valid skew report.
    • Fixed the manual probe tunnel to preserve HTTPS/mTLS through SSH (LESAVKA_SERVER_SCHEME=https, LESAVKA_TLS_DOMAIN=lesavka-server).
    • Fixed analyzer handling for MJPEG captures whose ffprobe metadata reports more frames than can actually be decoded.
  • Compare client-generated event times against Tethys-observed times.
    • The preserved Tethys capture had 323 decodable frames with constant brightness, so no video flash reached UVC.
    • Server logs show the probe entered a stale upstream session and dropped audio for being ~326 seconds late.
  • Identify whether delay appears before server planning, at server UAC sink, at UVC helper, inside Tethys device capture, or inside browser/WebRTC.
    • Current root cause is server planning/session lifecycle, before UVC/UAC sink output.
    • A previous one-sided microphone session started at 2026-04-30T22:59:52Z; the new probe at 2026-05-01T00:57:08Z inherited its stale playout epoch.
  • Add diagnostics for whichever stage is hiding delay.
    • Existing server lifecycle/planning logs were enough to isolate this run; next gate should preserve these as structured artifacts.
  • Do not tune calibration offsets until gross backlog is ruled out.
    • No calibration offsets were changed during the stale-session investigation.
    • Current evidence points at lifecycle/session planning, not an offset problem.

Phase 3: Fix Lesavka With Evidence

  • If stale upstream lifecycle is confirmed, reset shared A/V timing anchors when a new stream replaces an existing owner.
    • Added a lifecycle guard so normal camera/microphone stream replacement clears stale shared timing anchors before re-pairing.
    • Kept soft microphone recovery intentionally separate so it supersedes the mic owner without disturbing an active healthy camera/shared clock.
    • Added regression coverage for stale timing-anchor replacement and soft microphone recovery preservation.
  • If UAC sink backlog is confirmed, make UAC output freshness-bounded.
  • If audio progress is marked too early, move/augment progress telemetry to reflect actual sink emission readiness.
  • If UVC and UAC are using incompatible freshness semantics, unify them behind one live-media policy.
  • If browser/WebRTC adds delay after devices are already synced, document the application boundary and add browser-specific mitigation or guidance.
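
The lifecycle guard described in the first item of this phase could look roughly like the sketch below. The `SharedTiming` struct, its fields, and the `Replacement` enum are hypothetical names, not the server's actual types. The sketch assumes that a normal replacement clears every shared anchor, while soft microphone recovery clears only microphone-owned state.

```rust
/// Why a stream owner is being replaced (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Replacement {
    /// A new camera or microphone stream takes over from an existing owner.
    Normal,
    /// Soft microphone recovery: supersede the mic owner only.
    SoftMicRecovery,
}

/// Shared A/V timing anchors for one upstream session (hypothetical layout).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SharedTiming {
    pub playout_epoch_us: Option<u64>,
    pub camera_anchor_us: Option<u64>,
    pub mic_anchor_us: Option<u64>,
}

impl SharedTiming {
    /// Apply the lifecycle guard before the new stream is re-paired.
    pub fn on_replace(&mut self, kind: Replacement) {
        match kind {
            // Drop every anchor so the new stream cannot inherit a stale epoch.
            Replacement::Normal => *self = SharedTiming::default(),
            // Leave the healthy camera anchor and shared clock untouched.
            Replacement::SoftMicRecovery => self.mic_anchor_us = None,
        }
    }
}
```

Keeping the two cases in one `match` makes the asymmetry explicit, which is the property the regression coverage above is meant to protect.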

Phase 4: Gate And Release Criteria

  • Add deterministic unit/integration tests for probe analysis logic.
  • Add a hardware-in-the-loop/manual gate artifact schema for real Tethys probe runs.
  • Update scripts/ci/media_reliability_gate.sh to report probe evidence when present.
    • Gate now reads LESAVKA_SYNC_PROBE_REPORT_JSON, LESAVKA_SYNC_PROBE_REPORT_DIR, or target/media-reliability-gate/sync-probe/report.json.
    • Gate emits sync-probe verdict/check metrics, skew metrics, event counts, and a verdict info metric.
  • Require a fresh probe report before declaring lip sync fixed.
    • Gate now supports LESAVKA_REQUIRE_SYNC_PROBE=1, which fails media reliability when a valid passing probe report is absent.
    • Product/release judgment still requires a new live Theia/Tethys probe after the lifecycle fix is installed.
  • Suggested thresholds:
    • preferred: p95 skew <= 35 ms
    • acceptable: p95 skew <= 80 ms
    • gross failure: sustained skew > 250 ms
    • catastrophic failure: any sustained skew near or above 1000 ms
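
The suggested thresholds can be turned into a verdict function along these lines. This is a sketch under stated assumptions, not the analyzer's code: "sustained" is approximated by the p95 of absolute skew, "near or above 1000 ms" is simplified to `>= 1000`, and the band between 80 ms and 250 ms, which the list does not name, is labeled `Fail`. The `Verdict`, `p95_abs_ms`, and `classify` names are hypothetical.

```rust
/// Verdict bands; `Fail` covers the unnamed 80..=250 ms band (an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Preferred,
    Acceptable,
    Fail,
    GrossFailure,
    CatastrophicFailure,
}

/// Nearest-rank p95 of absolute skew in milliseconds; `None` for empty input.
pub fn p95_abs_ms(skews_ms: &[f64]) -> Option<f64> {
    if skews_ms.is_empty() {
        return None;
    }
    let mut abs: Vec<f64> = skews_ms.iter().map(|s| s.abs()).collect();
    abs.sort_by(|a, b| a.total_cmp(b));
    // Integer ceiling of 0.95 * n, which is at least 1 for non-empty input.
    let rank = (abs.len() * 95 + 99) / 100;
    Some(abs[rank - 1])
}

/// Map a p95 absolute skew onto the threshold bands listed above.
pub fn classify(p95_ms: f64) -> Verdict {
    if p95_ms <= 35.0 {
        Verdict::Preferred
    } else if p95_ms <= 80.0 {
        Verdict::Acceptable
    } else if p95_ms <= 250.0 {
        Verdict::Fail
    } else if p95_ms < 1000.0 {
        Verdict::GrossFailure
    } else {
        Verdict::CatastrophicFailure
    }
}
```

Under these assumptions, the p95 values of 775.7 ms and 273.8 ms recorded later in these notes both classify as gross failures, matching the verdicts reported there.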

Open Questions

  • Decide whether the phase-1 beacon should run as a separate binary, a hidden client mode, or both.
  • Decide whether Tethys probe should be Rust-only, shell plus GStreamer, or a hybrid.
  • Confirm whether sudo/Vault access is available for installing missing probe dependencies on Theia/Tethys.
    • Non-sudo server journal inspection worked; noninteractive sudo over SSH still needs an explicit TTY/password path.

Validation Evidence

  • cargo test -p lesavka_server upstream_media_runtime::tests::lifecycle
  • cargo test -p lesavka_client sync_probe::analyze
  • cargo test -p lesavka_testing upstream_sync_script_tunnels_auto_server_addr_through_ssh
  • bash -n scripts/ci/media_reliability_gate.sh
  • cargo test -p lesavka_testing media_reliability_gate_reports_direct_sync_probe_evidence
  • LESAVKA_REQUIRE_SYNC_PROBE=1 ./scripts/ci/media_reliability_gate.sh
    • Used a synthetic passing report at target/media-reliability-gate/sync-probe/report.json to verify gate parsing/enforcement.
    • This validates CI glue only; a real Theia/Tethys probe is still required for product judgment.

Real Upstream Lip-Sync Fix Checklist

Context: the mirrored browser probe finally reproduced the real failure class on 2026-05-01: activity_start_delta_ms=+9591.1. This means the end-to-end browser-visible path can still start video far ahead of audio. The fix target is not silence in the logs; it is a freshness-first A/V uplink whose startup can heal briefly but cannot drift into seconds of skew.

Acceptance Criteria

  • Mirrored browser probe passes with activity_start_delta_ms <= 1000.
  • Steady-state preferred sync: median skew within 35 ms.
  • Steady-state acceptable sync: p95 absolute skew within 80 ms.
  • Any sustained or startup A/V split near 1000 ms remains a hard failure.
  • No stale audio backlog is ever drained into UAC to catch up.
  • No stale video backlog is ever drained into UVC to catch up.
  • Google Meet manual testing agrees with the mirrored probe instead of revealing hidden seconds-scale skew.

Phase 0: Keep The Probe Honest

  • Split raw activity-start fields from filtered/coded paired-pulse fields in probe reports.
  • Print explicit raw first-video and first-audio timestamps in report.txt.
  • Root-cause the 0.16.17 raw_first_video_activity_s=0.000 artifact as the mirrored probe counting its own bright pre-start positioning card.
  • Make the mirrored stimulus pre-start screen dark/dim so only real flash pulses can be detected as video activity.
  • Add analyzer coverage proving dim pre-start positioning frames are ignored.
  • Replace generic light/dark mirrored flashes with color-coded event IDs.
  • Make mirrored audio pulses unique by the same event ID via pulse width plus tone frequency.
  • Teach the analyzer to decode mirrored video event IDs from color, not grayscale brightness.
  • Tighten real-camera color matching after 0.16.18 accepted washed-out brown/gray remnants as red/yellow events.
  • Preserve raw activity-start timing before cadence cleanup in coded reports.
  • Merge short audio envelope dropouts inside one coded pulse so a single tone burst cannot become two fake events.
  • Add diagnostic coded-pair correlation so that a stable large skew is reported as a measured failure instead of "not enough pairs".
  • Make coded mirrored verdicts/calibration use matched coded pulses as authority; raw activity-start deltas are reported separately unless they agree with the coded pairs.
  • Print unpaired video/audio onsets in the human report so missed coded pulses are visible during probe triage.
  • Keep the mirrored browser probe as the release/blocking upstream A/V gate.
  • Keep the old raw-device probe as a lower-level diagnostic only.
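
Merging short audio envelope dropouts amounts to an interval merge with a gap tolerance. The sketch below assumes the envelope detector already yields sorted `(start_s, end_s)` active intervals; `merge_dropouts` and `max_gap_s` are hypothetical names, and the tolerance value would have to be tuned against real captures.

```rust
/// Merge active intervals whose separating gap is at most `max_gap_s`.
/// `intervals` holds (start_s, end_s) pairs sorted by start time.
pub fn merge_dropouts(intervals: &[(f64, f64)], max_gap_s: f64) -> Vec<(f64, f64)> {
    let mut merged: Vec<(f64, f64)> = Vec::new();
    for &(start, end) in intervals {
        if let Some(last) = merged.last_mut() {
            if start - last.1 <= max_gap_s {
                // Same tone burst: extend the previous interval over the dropout.
                last.1 = last.1.max(end);
                continue;
            }
        }
        // Gap is too long (or this is the first interval): start a new pulse.
        merged.push((start, end));
    }
    merged
}
```

With a 10 ms tolerance, a burst split by a 5 ms dropout collapses back into one pulse, while pulses a second apart stay separate.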

Phase 1: Stop One-Sided Startup Drift

  • Default upstream planning must require both camera and microphone before live playout.
  • One-sided playout may only happen through an explicit compatibility override.
  • While pairing is overdue, keep replacing the waiting-side anchor with fresh packets instead of preserving stale startup anchors.
  • While awaiting the peer stream, keep only fresh pending camera packets.
  • While awaiting the peer stream, keep only fresh pending microphone packets.
  • Add tests proving the pairing window no longer expires into one-sided playout by default.
  • Add tests proving the explicit one-sided override still works for intentional single-stream scenarios.
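
The pairing rules in this phase can be sketched as a small gate that keeps only fresh pending packets on each side and refuses live playout until both sides are present. Everything here is illustrative: `PairingGate`, `Side`, `Packet`, and the `max_age_us` parameter are hypothetical, and the real planner tracks far more state.

```rust
use std::collections::VecDeque;

/// Which upstream stream a packet belongs to (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Side {
    Camera,
    Mic,
}

/// Minimal packet stand-in: only the arrival time matters for freshness.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Packet {
    pub arrival_us: u64,
}

#[derive(Debug, Default)]
pub struct PairingGate {
    camera: VecDeque<Packet>,
    mic: VecDeque<Packet>,
    /// Explicit compatibility override for intentional single-stream use.
    pub allow_one_sided: bool,
}

impl PairingGate {
    /// Drop packets older than `max_age_us` from the front of a queue.
    fn prune(queue: &mut VecDeque<Packet>, now_us: u64, max_age_us: u64) {
        while let Some(front) = queue.front().copied() {
            if now_us.saturating_sub(front.arrival_us) <= max_age_us {
                break;
            }
            queue.pop_front();
        }
    }

    /// Queue a packet, discarding stale pending packets on both sides first.
    pub fn push(&mut self, side: Side, packet: Packet, now_us: u64, max_age_us: u64) {
        Self::prune(&mut self.camera, now_us, max_age_us);
        Self::prune(&mut self.mic, now_us, max_age_us);
        match side {
            Side::Camera => self.camera.push_back(packet),
            Side::Mic => self.mic.push_back(packet),
        }
    }

    /// Number of pending (camera, mic) packets.
    pub fn pending(&self) -> (usize, usize) {
        (self.camera.len(), self.mic.len())
    }

    /// Live playout needs both sides unless the override is set.
    pub fn ready(&self) -> bool {
        let has_camera = !self.camera.is_empty();
        let has_mic = !self.mic.is_empty();
        (has_camera && has_mic) || (self.allow_one_sided && (has_camera || has_mic))
    }
}
```

Because pruning runs on every push, the waiting side's oldest packet is always recent, so the anchor derived from it cannot go stale while the peer is late.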

Phase 2: Bound UAC Freshness

  • Configure UAC appsrc as non-blocking and bounded.
  • Log and drop UAC appsrc push failures instead of treating enqueue as guaranteed playback.
  • Raise calibration offset limits to cover one-second healing without rejecting measured probe corrections.
  • Update the MJPEG/UVC factory audio baseline from -45 ms to +720 ms based on the first trustworthy mirrored browser probe artifact.
  • Migrate untouched legacy -45 ms factory/env calibration files on load so old installs actually receive the new baseline.
  • Make the video/audio-master wait offset-aware so a positive audio playout delay does not freeze UVC video while UAC sleeps before emission.
  • Flush/stop UAC cleanly on session close, replacement, and recovery.
  • Add tests or contract coverage for bounded UAC settings where practical.
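
The legacy-baseline migration rule can be expressed as a small pure function. This is a sketch only: `migrate_offset_us` and its parameters are hypothetical, and it treats any stored value equal to -45000 µs as "untouched legacy", whereas a real implementation may need a provenance marker to distinguish a deliberate -45000 from a leftover default.

```rust
/// Legacy factory audio playout offset, in microseconds.
pub const LEGACY_OFFSET_US: i64 = -45_000;
/// New MJPEG/UVC factory baseline, in microseconds.
pub const BASELINE_OFFSET_US: i64 = 720_000;

/// Decide which audio playout offset to use.
///
/// `stored` is the value found in an existing calibration or env file, if any.
/// `explicit` is an operator-supplied override; it always wins, even when it
/// asks for the legacy value.
pub fn migrate_offset_us(stored: Option<i64>, explicit: Option<i64>) -> i64 {
    if let Some(value) = explicit {
        return value;
    }
    match stored {
        // Missing or untouched legacy value: move to the new baseline.
        None | Some(LEGACY_OFFSET_US) => BASELINE_OFFSET_US,
        // Anything else is assumed to be a deliberate calibration.
        Some(custom) => custom,
    }
}
```

Keeping the decision free of file I/O makes it easy to cover with unit tests for both the on-load migration and the installer path.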

Phase 3: Add Real Timing Evidence

  • Add server timing counters for first camera packet, first mic packet, first UVC write, and first UAC push per session.
  • Add dropped-stale audio/video counters to diagnostics.
  • Add a concise health explanation when startup pairing exceeds the healing window.
  • Surface Starting, Healing, Flowing, Lagging, Dropping, and Stale states in chips/diagnostics from real path evidence.
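
The per-session first-event counters could be kept in a struct like the one below. The `FirstEvents` type and its field names are hypothetical. The sketch only shows the record-once semantics and one derived value, the startup split between the first UVC write and the first UAC push.

```rust
/// First-occurrence timestamps for one session, in microseconds (hypothetical).
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct FirstEvents {
    pub camera_packet_us: Option<u64>,
    pub mic_packet_us: Option<u64>,
    pub uvc_write_us: Option<u64>,
    pub uac_push_us: Option<u64>,
}

impl FirstEvents {
    /// Record a timestamp only if the slot is still empty.
    pub fn mark(slot: &mut Option<u64>, now_us: u64) {
        if slot.is_none() {
            *slot = Some(now_us);
        }
    }

    /// Signed startup split at the sinks in milliseconds: positive means the
    /// first UAC push happened after the first UVC write.
    pub fn sink_start_delta_ms(&self) -> Option<f64> {
        let uvc = self.uvc_write_us?;
        let uac = self.uac_push_us?;
        Some((uac as f64 - uvc as f64) / 1000.0)
    }
}
```

A value derived this way can be compared with the probe's activity-start delta to see whether a startup split already exists at the server sinks or appears further downstream.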

Phase 4: Recovery And Mid-Session Changes

  • Make device changes trigger soft-pause, stream replacement, queue flush, and re-pairing.
  • Keep recovery soft-first; reserve hard UVC/UAC gadget rebuilds for explicit guarded recoveries.
  • Add cooldown/state guards so recovery buttons cannot wedge Theia.
  • Ensure disconnect closes all client/server media tasks for the session.

Phase 5: Verification Loop

  • Run focused upstream runtime tests.
  • Run server/client media contract tests.
  • Run cargo check for touched packages.
  • Bump version for the fix release.
  • Run the mirrored browser probe on installed client/server.
    • 0.16.17 still failed: reported activity_start_delta_ms=+6735.0, but raw_first_video_activity_s=0.000 exposed a probe false-positive from the pre-start screen. Paired pulses still showed real steady-state skew (p95=411.8 ms, median=-99.0 ms), so the product remains unfixed.
    • 0.16.18 captured real colored/audio-coded events, but the analyzer still bailed with "need at least 3 matching coded pulse pairs; saw 1". Replaying that artifact after analyzer hardening now reports gross_failure: 16/16 coded pairs, p95 775.7 ms, activity start -766.4 ms, and drift -2.8 ms; the failure is stable audio-ahead/video-late skew, not random detector noise.
    • 0.16.19 changes the shipped MJPEG/UVC audio playout baseline to +720 ms; the next mirrored browser probe should move the measured median from about -766 ms toward roughly -46 ms before fine calibration.
    • The 0.16.19 mirrored browser probe did not move the measured skew: p95 885.7 ms, median -788.4 ms, activity start -659.1 ms, drift -81.2 ms. SSH inspection showed Theia was on commit c348597, but /etc/lesavka/server.env still contained LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=-45000, so the new +720 ms baseline was never actually installed. Patch the installer to migrate a leaked legacy ambient -45000 to +720000 unless LESAVKA_INSTALL_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US explicitly asks for the legacy value.
    • 0.16.20 installed the +720 ms offset (/etc/lesavka/server.env had LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=720000), but the mirrored browser capture contained no recognizable color pulses. Theia server logs showed repeated upstream video frame dropped because the audio master never caught up inside the pairing window; UVC was effectively starved by the positive audio delay instead of flowing delayed-but-fresh frames.
    • 0.16.21 makes that wait offset-aware and adds a regression test proving a configured positive audio delay does not freeze UVC video while UAC sleeps before playout.
    • Replaying the 0.16.21 artifact after 0.16.22 analyzer hardening changes the verdict from a false catastrophic_failure to gross_failure: p95 273.8 ms, median -188.4 ms, 7 paired coded pulses. The raw activity-start delta (-3620.7 ms) is still printed, but it is ignored for verdict/calibration because it disagrees with the coded pairs by 3432.3 ms; unpaired video/audio onsets are printed for triage.
  • Re-run the mirrored browser probe after the pre-start false-positive fix.
  • Run Google Meet manual validation.
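
Two pieces of arithmetic from the probe history above are worth making explicit. First, adding a positive audio playout delay shifts the skew (audio minus video) upward, which is why a +720 ms baseline was expected to move a median of about -766 ms to roughly -46 ms. Second, the 3432.3 ms disagreement reported for the 0.16.21 artifact is consistent with the absolute difference between the raw activity-start delta (-3620.7 ms) and the coded-pair median (-188.4 ms). The helper names and the `tolerance_ms` parameter below are hypothetical; the notes do not state which tolerance the analyzer uses.

```rust
/// Expected median skew after adding `audio_delay_ms` of audio playout delay.
pub fn expected_median_ms(measured_median_ms: f64, audio_delay_ms: f64) -> f64 {
    measured_median_ms + audio_delay_ms
}

/// Absolute disagreement between the raw activity-start delta and the
/// coded-pair median, in milliseconds.
pub fn raw_disagreement_ms(raw_activity_delta_ms: f64, coded_median_ms: f64) -> f64 {
    (raw_activity_delta_ms - coded_median_ms).abs()
}

/// Whether the raw activity-start delta may inform the verdict or calibration.
/// `tolerance_ms` is an assumed knob, not a documented analyzer setting.
pub fn raw_start_is_trusted(
    raw_activity_delta_ms: f64,
    coded_median_ms: f64,
    tolerance_ms: f64,
) -> bool {
    raw_disagreement_ms(raw_activity_delta_ms, coded_median_ms) <= tolerance_ms
}
```

For any tolerance in the tens or hundreds of milliseconds, the 0.16.21 raw delta is rejected and the coded pairs remain the authority, as the replayed report describes.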