Lesavka Agent Notes
A/V Sync Probe And Lip-Sync Validation Checklist
Context: Google Meet testing on 2026-04-30 showed audio roughly 8 seconds behind video even though internal client/server telemetry reported fresh uplink packets. Treat this as a product correctness failure, not a calibration issue. Do not resume blind lip-sync tuning until the probe can explain where delay appears.
Operating Principles
- Avoid hard-resetting USB, UVC, UAC, display managers, or remote hosts unless the user explicitly approves it.
- Prefer observation and reversible user-space probes before changing media pipelines.
- Treat Tethys-only SSH/device inspection as a development luxury, not a production dependency.
- Do not claim lip sync is fixed from internal telemetry alone; require end-to-end device-level evidence.
- Keep this checklist updated as work lands.
Phase 1: Build The Probe
- Create this tracked checklist in AGENTS.md.
- Inventory existing client/src/sync_probe/ code and decide what can be reused.
- Reuse the existing synthetic beacon in client/src/sync_probe/.
- Reuse the existing Tethys capture harness in scripts/manual/run_upstream_av_sync.sh.
- Reuse and extend lesavka-sync-analyze; current gap is structured evidence output, not capture generation.
- Define the phase-1 output contract:
- report.json
- report.txt
- per-event rows with event id, video time, audio time, skew, and confidence
- pass/fail verdict using preferred/acceptable/catastrophic thresholds
- Add a deterministic local sync beacon source:
- video flash pattern with event identity or cadence
- simultaneous audio click/beep
- stable event schedule suitable for automated detection
- Add a Tethys-side capture probe:
- capture Lesavka UVC video device
- capture Lesavka UAC microphone device
- record enough raw evidence for debugging when detection fails
- detect video flashes
- detect audio clicks
- pair events and compute skew
- Add a runner that can launch or instruct the Tethys probe safely over SSH without rebooting or restarting the desktop.
- Store probe artifacts under /tmp/lesavka-sync-probe-* by default.
- Keep the probe usable without Google Meet first; Google Meet validation is a later application-level check.
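
The "pair events and compute skew" step above can be sketched as follows. This is a minimal illustration, not the lesavka-sync-analyze implementation: the `PairedEvent` struct, the `pair_events` function, and the `max_gap_ms` parameter are hypothetical names, and timestamps are assumed to be milliseconds on a single capture clock.

```rust
/// One paired beacon event. Field names are illustrative and are NOT the
/// real report.json schema.
#[derive(Debug, Clone, PartialEq)]
pub struct PairedEvent {
    pub event_id: usize,
    pub video_ms: f64,
    pub audio_ms: f64,
    /// Positive skew means the audio click was observed after the video flash.
    pub skew_ms: f64,
}

/// Pair each detected video flash with the nearest unused audio click that
/// lies within `max_gap_ms` (a hypothetical tuning parameter). Flashes with
/// no click in range are left unpaired.
pub fn pair_events(video_ms: &[f64], audio_ms: &[f64], max_gap_ms: f64) -> Vec<PairedEvent> {
    let mut pairs = Vec::new();
    let mut used = vec![false; audio_ms.len()];
    for (event_id, &v) in video_ms.iter().enumerate() {
        // Track the closest unused click as (index, gap).
        let mut best: Option<(usize, f64)> = None;
        for (i, &a) in audio_ms.iter().enumerate() {
            if used[i] {
                continue;
            }
            let gap = (a - v).abs();
            if gap <= max_gap_ms && best.map_or(true, |(_, g)| gap < g) {
                best = Some((i, gap));
            }
        }
        if let Some((i, _)) = best {
            used[i] = true;
            pairs.push(PairedEvent {
                event_id,
                video_ms: v,
                audio_ms: audio_ms[i],
                skew_ms: audio_ms[i] - v,
            });
        }
    }
    pairs
}
```

Nearest-neighbor pairing becomes ambiguous once the true skew exceeds half the beacon period, which is why the beacon requirement above asks for event identity or a distinguishable cadence: a multi-second delay like the one seen in Meet cannot be measured by proximity alone.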
Phase 2: Use Probe To Root-Cause Desync
- Run probe through direct Lesavka UVC/UAC devices on Tethys.
- First live run reached the devices but exposed analyzer/tooling gaps instead of a valid skew report.
- Fixed the manual probe tunnel to preserve HTTPS/mTLS through SSH (LESAVKA_SERVER_SCHEME=https, LESAVKA_TLS_DOMAIN=lesavka-server).
- Fixed analyzer handling for MJPEG captures whose ffprobe metadata reports more frames than can actually be decoded.
- Compare client-generated event times against Tethys-observed times.
- The preserved Tethys capture had 323 decodable frames with constant brightness, so no video flash reached UVC.
- Server logs show the probe entered a stale upstream session and dropped audio as ~326 seconds late.
- Identify whether delay appears before server planning, at server UAC sink, at UVC helper, inside Tethys device capture, or inside browser/WebRTC.
- Current root cause is server planning/session lifecycle, before UVC/UAC sink output.
- A previous one-sided microphone session started at 2026-04-30T22:59:52Z; the new probe at 2026-05-01T00:57:08Z inherited its stale playout epoch.
- Add diagnostics for whichever stage is hiding delay.
- Existing server lifecycle/planning logs were enough to isolate this run; next gate should preserve these as structured artifacts.
- Do not tune calibration offsets until gross backlog is ruled out.
- No calibration offsets were changed during the stale-session investigation.
- Current evidence points at lifecycle/session planning, not an offset problem.
Phase 3: Fix Lesavka With Evidence
- If stale upstream lifecycle is confirmed, reset shared A/V timing anchors when a new stream replaces an existing owner.
- Added a lifecycle guard so normal camera/microphone stream replacement clears stale shared timing anchors before re-pairing.
- Kept soft microphone recovery intentionally separate so it supersedes the mic owner without disturbing an active healthy camera/shared clock.
- Added regression coverage for stale timing-anchor replacement and soft microphone recovery preservation.
- If UAC sink backlog is confirmed, make UAC output freshness-bounded.
- If audio progress is marked too early, move/augment progress telemetry to reflect actual sink emission readiness.
- If UVC and UAC are using incompatible freshness semantics, unify them behind one live-media policy.
- If browser/WebRTC adds delay after devices are already synced, document the application boundary and add browser-specific mitigation or guidance.
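
The distinction between normal stream replacement and soft microphone recovery can be illustrated with a small state sketch. Everything here is an assumption made for illustration: `SharedTiming`, `Stream`, `replace_owner`, and `soft_recover_mic` are hypothetical names, session ids are plain integers, and the real server types are not shown in these notes.

```rust
/// Hypothetical shared timing state for one upstream A/V pairing.
#[derive(Debug, Default)]
pub struct SharedTiming {
    /// Playout epoch shared by camera and microphone, in microseconds.
    pub playout_epoch_us: Option<u64>,
    pub camera_owner: Option<u64>,
    pub mic_owner: Option<u64>,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stream {
    Camera,
    Microphone,
}

impl SharedTiming {
    /// Normal replacement: a new session takes over a stream. If it displaces
    /// an existing owner, the shared anchor is cleared so the next pairing
    /// starts from a fresh epoch instead of inheriting a stale one.
    pub fn replace_owner(&mut self, stream: Stream, session: u64) {
        let slot = match stream {
            Stream::Camera => &mut self.camera_owner,
            Stream::Microphone => &mut self.mic_owner,
        };
        let displaced = slot.replace(session).is_some();
        if displaced {
            self.playout_epoch_us = None;
        }
    }

    /// Soft microphone recovery: supersede only the mic owner and leave the
    /// shared epoch alone, so a healthy camera keeps its clock.
    pub fn soft_recover_mic(&mut self, session: u64) {
        self.mic_owner = Some(session);
    }
}
```

Keeping the two paths as separate methods makes the intent testable: a regression test can assert that one path clears the epoch and the other preserves it.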
Phase 4: Gate And Release Criteria
- Add deterministic unit/integration tests for probe analysis logic.
- Add a hardware-in-the-loop/manual gate artifact schema for real Tethys probe runs.
- Update scripts/ci/media_reliability_gate.sh to report probe evidence when present.
- Gate now reads LESAVKA_SYNC_PROBE_REPORT_JSON, LESAVKA_SYNC_PROBE_REPORT_DIR, or target/media-reliability-gate/sync-probe/report.json.
- Gate emits sync-probe verdict/check metrics, skew metrics, event counts, and a verdict info metric.
- Require a fresh probe report before declaring lip sync fixed.
- Gate now supports LESAVKA_REQUIRE_SYNC_PROBE=1, which fails media reliability when a valid passing probe report is absent.
- Product/release judgment still requires a new live Theia/Tethys probe after the lifecycle fix is installed.
- Suggested thresholds:
- preferred: p95 skew <= 35 ms
- acceptable: p95 skew <= 80 ms
- gross failure: sustained skew > 250 ms
- catastrophic failure: any sustained skew near or above 1000 ms
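
One way to turn the suggested thresholds into a verdict is sketched below. It is an assumption-laden illustration rather than the analyzer's actual logic: the `Verdict` enum and both function names are hypothetical, "sustained skew" is approximated by the p95 of absolute skew, "near or above 1000 ms" is simplified to >= 1000 ms, and the unnamed band between 80 ms and 250 ms is labeled `Unacceptable`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Preferred,
    Acceptable,
    Unacceptable,
    GrossFailure,
    Catastrophic,
}

/// Nearest-rank 95th percentile of absolute skew in milliseconds.
/// Returns None when there are no paired events.
pub fn p95_abs_skew_ms(skews_ms: &[f64]) -> Option<f64> {
    if skews_ms.is_empty() {
        return None;
    }
    let mut abs: Vec<f64> = skews_ms.iter().map(|s| s.abs()).collect();
    abs.sort_by(|a, b| a.total_cmp(b));
    // Integer ceil(0.95 * n) avoids floating-point rounding in the rank.
    let rank = (95 * abs.len() + 99) / 100;
    Some(abs[rank - 1])
}

/// Map a p95 skew onto the threshold bands from the checklist.
pub fn classify(p95_ms: f64) -> Verdict {
    if p95_ms <= 35.0 {
        Verdict::Preferred
    } else if p95_ms <= 80.0 {
        Verdict::Acceptable
    } else if p95_ms <= 250.0 {
        Verdict::Unacceptable
    } else if p95_ms < 1000.0 {
        Verdict::GrossFailure
    } else {
        Verdict::Catastrophic
    }
}
```

Under these assumptions the roughly 8-second delay observed in Meet (about 8000 ms) would classify as `Catastrophic`, while a run with a single outlier among twenty events could still pass on p95. Whether one large outlier should fail the gate on its own is a policy choice this sketch does not settle.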
Open Questions
- Decide whether the phase-1 beacon should run as a separate binary, a hidden client mode, or both.
- Decide whether Tethys probe should be Rust-only, shell plus GStreamer, or a hybrid.
- Confirm whether sudo/Vault access is available for installing missing probe dependencies on Theia/Tethys.
- Non-sudo server journal inspection worked; noninteractive sudo over SSH still needs an explicit TTY/password path.
Validation Evidence
- cargo test -p lesavka_server upstream_media_runtime::tests::lifecycle
- cargo test -p lesavka_client sync_probe::analyze
- cargo test -p lesavka_testing upstream_sync_script_tunnels_auto_server_addr_through_ssh
- bash -n scripts/ci/media_reliability_gate.sh
- cargo test -p lesavka_testing media_reliability_gate_reports_direct_sync_probe_evidence
- LESAVKA_REQUIRE_SYNC_PROBE=1 ./scripts/ci/media_reliability_gate.sh
- Used a synthetic passing report at target/media-reliability-gate/sync-probe/report.json to verify gate parsing/enforcement.
- This validates CI glue only; a real Theia/Tethys probe is still required for product judgment.