## A/V Sync Probe And Lip-Sync Validation Checklist
Context: Google Meet testing on 2026-04-30 showed audio roughly 8 seconds behind video even though internal client/server telemetry reported fresh uplink packets. Treat this as a product correctness failure, not a calibration issue. Do not resume blind lip-sync tuning until the probe can explain where delay appears.
### Operating Principles
- Avoid hard-resetting USB, UVC, UAC, display managers, or remote hosts unless the user explicitly approves it.
- Prefer observation and reversible user-space probes before changing media pipelines.
- Treat Tethys-only SSH/device inspection as a development luxury, not a production dependency.
- Do not claim lip sync is fixed from internal telemetry alone; require end-to-end device-level evidence.
- Keep this checklist updated as work lands.
### Phase 1: Build The Probe
- [x] Create this tracked checklist in `AGENTS.md`.
- [x] Inventory existing `client/src/sync_probe/` code and decide what can be reused.
- The preserved Tethys capture had 323 decodable frames with constant brightness, so no video flash reached UVC.
- Server logs show the probe entered a stale upstream session and dropped audio as ~326 seconds late.
- [x] Identify whether delay appears before server planning, at server UAC sink, at UVC helper, inside Tethys device capture, or inside browser/WebRTC.
- Current root cause is server planning/session lifecycle, before UVC/UAC sink output.
- A previous one-sided microphone session started at 2026-04-30T22:59:52Z; the new probe at 2026-05-01T00:57:08Z inherited its stale playout epoch.
- [x] Add diagnostics for whichever stage is hiding delay.
- Existing server lifecycle/planning logs were enough to isolate this run; next gate should preserve these as structured artifacts.
- [x] Do not tune calibration offsets until gross backlog is ruled out.
- No calibration offsets were changed during the stale-session investigation.
- Current evidence points at lifecycle/session planning, not an offset problem.
### Phase 3: Fix Lesavka With Evidence
- [x] If stale upstream lifecycle is confirmed, reset shared A/V timing anchors when a new stream replaces an existing owner.
- Added a lifecycle guard so normal camera/microphone stream replacement clears stale shared timing anchors before re-pairing.
- Kept soft microphone recovery intentionally separate so it supersedes the mic owner without disturbing an active healthy camera/shared clock.
- Added regression coverage for stale timing-anchor replacement and soft microphone recovery preservation.
- [ ] If UAC sink backlog is confirmed, make UAC output freshness-bounded.
- [ ] If audio progress is marked too early, move/augment progress telemetry to reflect actual sink emission readiness.
- [ ] If UVC and UAC are using incompatible freshness semantics, unify them behind one live-media policy.
- [ ] If browser/WebRTC adds delay after devices are already synced, document the application boundary and add browser-specific mitigation or guidance.
### Phase 4: Gate And Release Criteria
- [x] Add deterministic unit/integration tests for probe analysis logic.
- [x] Add a hardware-in-the-loop/manual gate artifact schema for real Tethys probe runs.
- [x] Update `scripts/ci/media_reliability_gate.sh` to report probe evidence when present.
- Gate now reads `LESAVKA_SYNC_PROBE_REPORT_JSON`, `LESAVKA_SYNC_PROBE_REPORT_DIR`, or `target/media-reliability-gate/sync-probe/report.json`.
- Gate emits sync-probe verdict/check metrics, skew metrics, event counts, and a verdict info metric.
- [x] Require a fresh probe report before declaring lip sync fixed.
- Gate now supports `LESAVKA_REQUIRE_SYNC_PROBE=1`, which fails media reliability when a valid passing probe report is absent.
- Product/release judgment still requires a new live Theia/Tethys probe after the lifecycle fix is installed.
- [ ] Suggested thresholds:
- [x] preferred: p95 skew <= 35 ms
- [x] acceptable: p95 skew <= 80 ms
- [x] gross failure: sustained skew > 250 ms
- [x] catastrophic failure: any sustained skew near or above 1000 ms
### Open Questions
- [x] Decide whether the phase-1 beacon should run as a separate binary, a hidden client mode, or both.
- [x] Decide whether Tethys probe should be Rust-only, shell plus GStreamer, or a hybrid.
- [ ] Confirm whether sudo/Vault access is available for installing missing probe dependencies on Theia/Tethys.
- Non-sudo server journal inspection worked; noninteractive sudo over SSH still needs an explicit TTY/password path.
### Validation Evidence
- [x]`cargo test -p lesavka_server upstream_media_runtime::tests::lifecycle`
- [x]`cargo test -p lesavka_client sync_probe::analyze`
- [x]`cargo test -p lesavka_testing upstream_sync_script_tunnels_auto_server_addr_through_ssh`
Context: the mirrored browser probe finally reproduced the real failure class on 2026-05-01:
`activity_start_delta_ms=+9591.1`. This means the end-to-end browser-visible path can still start video far ahead of audio. The fix target is not silence in the logs; it is a freshness-first A/V uplink whose startup can heal briefly but cannot drift into seconds of skew.
- [x] Make coded mirrored verdicts/calibration use matched coded pulses as authority; raw activity-start deltas are reported separately unless they agree with the coded pairs.
- [x] Print unpaired video/audio onsets in the human report so missed coded pulses are visible during probe triage.
- [x] Raise calibration offset limits enough to cover the measured MJPEG/UVC path delta without rejecting probe corrections.
- [x] Update the MJPEG/UVC factory audio baseline from the old `-45ms`/`+720ms` values to `+1260ms` as the mirrored probe exposes the fresh UAC-vs-UVC path delta.
- [x] Run the mirrored browser probe on installed client/server.
- 0.16.17 still failed: reported `activity_start_delta_ms=+6735.0`, but `raw_first_video_activity_s=0.000` exposed a probe false-positive from the pre-start screen. Paired pulses still showed real steady-state skew (`p95=411.8 ms`, `median=-99.0 ms`), so the product remains unfixed.
- 0.16.18 captured real colored/audio-coded events but the analyzer still bailed with `need at least 3 matching coded pulse pairs; saw 1`. Replaying that artifact after analyzer hardening now reports `gross_failure`: 16/16 coded pairs, p95 `775.7 ms`, activity start `-766.4 ms`, and drift `-2.8 ms`; the failure is stable audio-ahead/video-late skew, not random detector noise.
- 0.16.19 changes the shipped MJPEG/UVC audio playout baseline to `+720ms`; the next mirrored browser probe should move the measured median from about `-766ms` toward roughly `-46ms` before fine calibration.
- 0.16.19 mirrored browser probe did not move the measured skew: p95 `885.7 ms`, median `-788.4 ms`, activity start `-659.1 ms`, drift `-81.2 ms`. SSH inspection showed Theia was on commit `c348597`, but `/etc/lesavka/server.env` still contained `LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=-45000`; the new `+720ms` baseline was not actually installed. Patch the installer to migrate leaked legacy ambient `-45000` to `+720000` unless `LESAVKA_INSTALL_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US` explicitly asks for the legacy value.
- 0.16.20 installed the `+720ms` offset (`/etc/lesavka/server.env` had `LESAVKA_UPSTREAM_AUDIO_PLAYOUT_OFFSET_US=720000`), but the mirrored browser capture contained no recognizable color pulses. Theia server logs showed repeated `upstream video frame dropped because the audio master never caught up inside the pairing window`; UVC was effectively starved by the positive audio delay instead of flowing delayed-but-fresh frames.
- 0.16.21 makes that wait offset-aware and adds a regression test proving a configured positive audio delay does not freeze UVC video while UAC sleeps before playout.
- Replaying the 0.16.21 artifact after 0.16.22 analyzer hardening changes the verdict from false `catastrophic_failure` to `gross_failure`: p95 `273.8 ms`, median `-188.4 ms`, 7 paired coded pulses. The raw activity-start delta (`-3620.7 ms`) is still printed, but it is ignored for verdict/calibration because it disagrees with coded pairs by `3432.3 ms`; unpaired video/audio onsets are printed for triage.
- 0.16.22 live mirrored run still failed with p95 `433.7 ms`, median `-359.4 ms`, and 5 paired coded pulses. Client telemetry showed camera uplink `latest_age_ms` repeatedly around `300-350 ms`, matching the measured skew; patch 0.16.23 to make video queues latest-only instead of draining stale-but-under-budget backlog.
- 0.16.23 local validation passed for fresh-queue behavior, uplink/probe freshness contracts, sync analyzer tests, client/server binary checks, and whitespace checks.
- 0.16.23 live mirrored run improved to p95 `215.2 ms`, median `+142.2 ms`, 13 paired coded pulses, and raw activity alignment within `6.6 ms` of coded pairs. Patch 0.16.24 makes the probe print local client and remote server versions before capture so every run records what was actually tested.
- 0.16.24 live mirrored run improved again to p95 `168.4 ms`, median `-19.1 ms`, 11 paired coded pulses, but still failed because individual paired pulses bounced between about `-168 ms` and `+45 ms`. Client logs showed the microphone uplink queue still accumulating depth `16`; patch 0.16.25 makes microphone uplink queues latest-only too so stale audio PTS cannot continue acting as the server timing master under backpressure.
- 0.16.25 removed the client mic backlog but exposed a stable hardware/browser path delta: p95 `557.3 ms`, median `-540.5 ms`, drift `+9.0 ms`, and fresh mic delivery ages around `2-10 ms`. Patch 0.16.26 raises the MJPEG/UVC factory audio delay to `+1260 ms` and expands the calibration clamp so this stable offset can actually be corrected instead of rejected.
Context: 0.16.x proved that queue tweaks and static calibration cannot guarantee lip sync. 0.17.0 changes the upstream contract: the server planner is authoritative, audio is the master, video follows by timestamp or freezes, and freshness wins over smooth-but-wrong playback.
### Hard Product Invariants
- [x] No normal live upstream playout may be more than 1000 ms behind the freshest known client capture frontier.
- [x] Video may not advance outside the audio playhead sync budget.
- [x] Audio should be continuous when possible, but stale audio must be dropped/skipped rather than drained.
- [x] Missing or late video should freeze/stutter instead of pulling audio backward.
- [x] Startup may be ugly, but must either enter fresh synced live mode or declare failure within 60 seconds.
- [x] Healing may be visible, but must prevent persistent seconds-scale skew.
- [x] Calibration may fine-tune sub-frame offsets only; it must not be required to rescue seconds-scale desync.
- [x] Bump Lesavka to 0.17.0 because this is a media-contract change, not a patch tune.
- 2026-05-01: Added 0.17 planner defaults (`350ms` target playout, `1000ms` max live lag, `60000ms` startup timeout, `80000us` pair slack), reset MJPEG audio factory offset to `0`, and migrated old `-45ms`, `+720ms`, and `+1260ms` untouched baselines.
- 2026-05-01: Server planner now tracks latest input frontier, presented audio/video playheads, phase, stale drops, skew drops, reanchors, startup timeouts, and freezes.
- 2026-05-01: Runtime tests green for stale audio drop, stale video drop, audio-master/video-follower freeze, repeated reanchor, paired startup timeout, and planner snapshot basics: `cargo test -p lesavka_server upstream_media_runtime::tests -- --nocapture`.
- 2026-05-01: Added `GetUpstreamSync` RPC, `lesavka-relayctl upstream-sync`, launcher diagnostics text, and mirrored-probe before/after planner snapshots so 0.17 probe runs report the exact planner state under test.
- 2026-05-01: Validation green: `cargo test -p lesavka_server --lib --bins`, `cargo test -p lesavka_testing`, `cargo test -p lesavka_client --bins --lib`, and targeted installer/RPC/layout contracts.
- 2026-05-01: First installed 0.17.0 mirrored browser probe on client/server commit `3920e0a` failed honestly: planner reported fresh live state (`live_lag_ms=10`, `skew_ms=+20.7`) but browser-observed paired pulses showed audio late by median `+349.1ms`, p95 `429.1ms`, with 6 video freezes/skew drops. Replayed artifact after analyzer hardening now reports `gross_failure` instead of false raw-start `catastrophic_failure`.
- 2026-05-01: Patch follow-up models the observed MJPEG/UVC browser egress delta by defaulting video playout offset to `+350ms` and preserving the 1s freshness ceiling. Raw activity-start evidence is now ignored for verdict/calibration when it disagrees with paired pulses that are already failing directly. Existing early-0.17 `audio=0/video=0` factory/env calibration files migrate to the new `video=+350ms` default on load.
- 2026-05-01: Release identity cleanup: bumped the patched build to clean semver `0.17.1`; probe attribution now prints `client_version`/`server_version` separately from `client_revision`/`server_revision` and refuses old `client_full_version` output.
- 2026-05-01: 0.17.1 mirrored probe failed with video about `1.18-1.31s` behind audio and 761 planner video freezes. Root cause candidate: the client rebaser forced independent camera/mic pipelines onto one first-packet capture base, so a later-starting camera path was timestamped too early and looked permanently behind audio. Patch 0.17.2 anchors each stream to the shared monotonic clock at its own first packet time.
- 2026-05-02: 0.17.2 mirrored probe and Google Meet test showed major improvement but persistent sub-second late video. Root cause follow-up: the temporary `+350ms` factory MJPEG video playout offset matched the observed browser skew and also made the server skew guard freeze video against its own offset. Patch 0.17.3 restores factory video offset to `0ms`, migrates untouched `+350ms` install/calibration defaults back to `0ms`, and makes the skew guard offset-aware for intentional site calibration.
- 2026-05-02: 0.17.3 Google Meet manual test improved to roughly sub-second/near-quarter-second lip sync, but the mirrored analyzer could not pair pulses and the user still heard choppy background audio. Client logs showed Pulse microphone packets arriving unevenly with ages around `90-240ms`; patch 0.17.4 lowers Pulse mic `buffer-time`/`latency-time`, bounds the mic queue/appsink, and keeps mirrored-probe after-run planner diagnostics even when analysis fails.
- 2026-05-02: 0.17.4 mirrored run was salvageable after an SCP banner timeout, but analysis still failed with no close pulse pairs. The client log still showed `180-240ms` microphone delivery ages, pointing at server playout sleeps backpressuring the gRPC microphone stream. Patch 0.17.5 drains inbound microphone packets while waiting for scheduled UAC playout and retries browser-capture SCP fetches.
- 2026-05-02: 0.17.5 mirrored run still failed with insufficient paired evidence, and the client log still showed recurring `180-240ms` microphone packet age while camera age stayed near zero. Patch 0.17.6 splits oversized mic samples into `20ms` timestamped packets and keeps a short fresh server-side audio window instead of collapsing every pending burst to one newest chunk, aiming to preserve lip sync without making background audio choppy.
- 2026-05-02: 0.17.6 Bumblebee mirrored run proved Bumblebee mic packets are already `10ms`, but camera source timestamps were being rebased up to roughly `1.8s` into the future while mic packets sat around `180-240ms` old. Patch 0.17.7 adds a source lead cap (`80ms` default) to both direct and duration-paced client timestamp rebasing so bursty camera buffers cannot make the server wait for fake future video while fresh audio keeps moving.
- 2026-05-02: The launcher UI was still writing live control files with only camera/mic/speaker booleans, so media device combo changes were honestly only staged for the next child launch. Patch 0.17.7 extends the live media control file with base64-encoded camera source, camera profile, microphone source, and speaker sink choices; the relay child now rebuilds the affected camera, mic, or speaker pipeline when those selections change.
Context: 0.17.7 with the Bumblebee mic and BRIO camera removed the seconds-scale failure and left a stable browser-visible output skew: paired pulses were audio-late by roughly `+95ms` to `+183ms` (`median=+110.8ms`, `mean=+132.6ms`, `p95=+183.1ms`). Per user direction, 0.17.8 is only about establishing sync. Freshness and smoothness tuning are explicitly deferred until the mirrored probe is inside the sync band.
- [x] Do not change freshness ceilings, reanchor thresholds, queue policy, UAC smoothness, or startup healing behavior in this version.
- [x] Set the MJPEG/UVC factory video playout baseline to `+130ms` to counter the measured browser output audio-late bias.
- [x] Migrate only untouched old `0ms` and `+350ms` video defaults to the new `+130ms` baseline.
Context: 0.17.8 installed cleanly on both ends (`314c55b`) but the mirrored probe failed with insufficient data: only 2 paired events, 1187 video freezes, and planner phase `healing`. The server was using the newest planned audio packet as the video-drop reference, so future audio planning could make current video look falsely behind before that audio was actually handed to UAC.
- [x] Keep 0.17.9 scoped to sync enforcement only; no freshness ceilings, queue policy, or smoothness changes.
- [x] Make video freeze/drop decisions compare against audio actually presented to UAC, not merely planned audio.
- [x] Make `wait_for_audio_master` wake on `mark_audio_presented` so video waits for real audio progress.
Context: 0.17.9 installed cleanly on both ends (`fbf274d`) and improved the mirrored probe to `median=+19.8ms`, `mean=-42.0ms`, and planner phase `live`, but it still failed with `p95=254.1ms`, only 6 paired pulses, `drift=341.9ms`, and 591 video freezes. The Theia server log showed repeated `upstream video frame dropped because the audio master never caught up inside the pairing window`, so the video follower was still giving up at the nominal video due time instead of spending a bounded sync grace to let audio catch up.
- [x] Keep 0.17.10 scoped to establishing sync; defer freshness and smoothness tuning until paired skew is stable.
- [x] Add `LESAVKA_UPSTREAM_AUDIO_MASTER_WAIT_GRACE_MS` with a `350ms` default so video can wait past nominal due time for UAC audio progress.
- [x] Stop dropping video solely because it woke late after a successful audio-master wait.
- [x] Preserve the global `1000ms` live-lag ceiling and existing stale-input planner rules.
- [x] Update installer defaults and operational docs for the sync grace.
- [x] Add/adjust tests proving video can wait through sync grace and still times out after grace expires.
Context: 0.17.10 installed cleanly on both ends (`4bb0f4a`) and produced a high-confidence coded-pulse failure instead of probe ambiguity. Browser-visible audio on Tethys arrived about `+891ms` to `+971ms` after the matching video (`median=+962.1ms`, `mean=+946.7ms`, `p95=+971.5ms`), while the server planner reported internal skew near zero (`planner_skew_ms=-56.9`). The missing model is UAC/browser output egress latency: Lesavka was treating `appsrc.push_buffer`/UAC enqueue as audio presentation, but the browser consumes that audio about one second later.
- [x] Keep 0.17.11 scoped to establishing sync; do not tune freshness ceilings or smoothness policy.
- [x] Raise the MJPEG/UVC factory video playout baseline from `+130ms` to `+1090ms` to align video with browser-visible UAC audio.
- [x] Allow intentional A/V playout offsets to exceed the generic future-wait freshness guard so the planner does not immediately reanchor away the sync compensation.
- [x] Widen calibration offset bounds so the measured browser egress baseline is representable instead of silently clamped.
- [x] Migrate untouched `0ms`, `+130ms`, and `+350ms` MJPEG/UVC video baselines to the new browser-visible baseline.
Context: 0.17.11 installed cleanly on both ends (`092c03a`) and fixed the constant browser-visible offset: raw activity start was `-17.0ms`, median coded skew was `+13.1ms`, and no server planner freeze/drop counters moved. The run still failed because only 5 coded pairs were usable and the last two had weak confidence (`0.58`, `0.33`) while the client log showed repeated microphone stale/superseded drops. The remaining sync blocker is audio-marker continuity, not another constant offset.
- [x] Keep 0.17.12 scoped to sync establishment; do not change video freshness or server freshness ceilings.
- [x] Stop using latest-only policy for microphone uplink packets; preserve oldest fresh audio chunks so speech/pulses stay continuous.
- [x] Keep microphone queue age bounded at `400ms` so continuity does not become unbounded backlog.
- [x] Apply the same bounded audio-continuity policy to the synthetic sync-probe audio queue.
- [x] Update contract tests to encode the new audio continuity policy.
- [x] Run focused client queue/probe contracts and package checks.
Context: 0.17.12 installed cleanly on both ends (`2b26fde`) and moved the paired-pulse median into the near-sync zone (`median=+71.6ms`, first/last `+59.8ms/-60.1ms`), but the run still failed with only 4 usable coded pairs and `p95=206.8ms`. Static guesses have reached diminishing returns. 0.17.13 adds a measured calibration loop so the mirrored probe can become the authority for site-specific browser-visible output compensation when, and only when, the analyzer has enough stable evidence.
- [x] Keep 0.17.13 scoped to probe-driven sync calibration tooling; do not change freshness ceilings, queue policy, UAC smoothness, or startup healing behavior.
- [x] Expose relay CLI calibration state and safe calibration actions for scripts.
- [x] Make the mirrored probe locate the analyzer `report.json` after a run and print the calibration decision.
- [x] Apply only when analyzer `calibration.ready=true`; otherwise refuse and print the analyzer reason.
- [x] Default probe-driven correction to video offset adjustment, because the measured residual is browser-visible UAC egress delay relative to UVC video.
- [x] Keep saving the measured offset as the site default opt-in via `LESAVKA_SYNC_SAVE_CALIBRATION=1`.
- [x] Update contract tests for relay CLI and manual probe script behavior.
- [x] Run focused relay/manual-script tests and package checks.
Follow-up candidate: after 0.17.13 proves safe measured apply/refuse behavior, add a segmented live-calibration probe. The current browser probe uploads one WebM after recording ends, so it can only do measure/apply/rerun. A true same-session loop should run a longer stimulus, capture/analyze separate Tethys browser windows, apply calibration only between windows, and use the next window as the confirmation segment so before/after evidence is not mixed.
## 0.17.14 Segmented Live Calibration Probe Checklist
Context: 0.17.13 adds safe measured calibration apply/refuse plumbing, but it is still a single-window measure-then-rerun workflow. The next probe should keep the same Lesavka client/server session alive across multiple browser-capture windows so we can measure, apply, and re-measure without reinstalling or restarting the media path. This is the bridge from probe truth to blind/server-side calibration targets.
- [x] Keep 0.17.14 scoped to probe tooling and observability; do not change media planner policy.
- [x] Add optional multi-segment mirrored probe mode via `LESAVKA_SYNC_CALIBRATION_SEGMENTS`.
- [x] Keep one local stimulus browser and one headless Lesavka sender alive across all segments.
- [x] Run a fresh Tethys browser recording/analyzer pass per segment so before/after calibration evidence is not mixed in one WebM.
- [x] Allow calibration apply between segments using the 0.17.13 ready/refuse gate.
- [x] Capture planner and calibration snapshots before and after each segment for metric correlation.
- [x] Preserve single-segment default behavior for normal manual probes.
- [x] Update manual probe contract tests for segmented live calibration mode.
- [x] Run focused script/CLI checks and package checks.
## 0.17.15 Adaptive Probe Metrics and Blind Target Checklist
Context: 0.17.14 can keep one Lesavka session alive across multiple measured segments, but we still need the probe to teach Lesavka what "good" looks like from server-only telemetry. 0.17.15 turns segmented runs into an adaptive calibration dataset: every segment gets probe truth, planner state, and calibration state joined into artifacts that can drive blind calibration/healing targets when Tethys/browser probe access is not available.
- [x] Keep 0.17.15 scoped to probe intelligence and metrics correlation; do not change media playout policy.
- [x] Add adaptive calibration ergonomics for longer near-continuous runs without changing the default one-segment probe.
- [x] Write per-run segment metrics as CSV and JSONL, joining analyzer verdicts with planner/calibration before/after snapshots.
- [x] Emit a blind-target candidate JSON from segments whose probe verdict passes, including server-visible planner lag/skew ranges.
- [x] Record when no segment is probe-good enough so blind-target generation refuses instead of inventing targets.
- [x] Keep calibration mutation gated by the existing ready/refuse logic and `LESAVKA_SYNC_APPLY_CALIBRATION=1`.
- [x] Update manual probe contract tests for the adaptive artifacts and controls.
- [x] Run focused script checks and package checks.
Context: 0.17.15 proved the adaptive/live-edit loop is structurally useful, but each segment restarted the Tethys browser/getUserMedia receiver. That contaminates calibration segments with receiver startup noise and keeps usable coded pairs too low (`3-5` pairs instead of the `8+` needed for safe calibration). 0.17.16 is scoped to making the probe evidence continuous and attributable before any more media playout changes.
- [x] Keep 0.17.16 scoped to probe/tooling reliability; do not change server media playout policy, freshness ceilings, queue policy, or UAC smoothness.
- [x] Make adaptive mirrored runs keep one Tethys browser/getUserMedia session alive after the first segment.
- [x] Preserve single-segment/manual probe behavior by default.
- [x] Add an explicit `BROWSER_CONSUMER_REUSE_SESSION` control to the browser probe runner.
- [x] Verify browser uploads by start token so the fetched WebM belongs to the segment just triggered.
- [x] Track browser upload counts/tokens in `browser_consumer_probe.py` status JSON for post-run debugging.
- [x] Wire adaptive mirrored mode to reuse the existing Tethys receiver for segments 2+.
Context: 0.17.16 successfully installed and token-verified the first browser capture, but the adaptive run still aborted at segment 1 because `lesavka-sync-analyze` could not form coded pairs (`saw 0`) even though raw activity was nearly synced (`-32.2ms`). That prevented segments 2-4 from exercising the same long-lived Tethys browser session. 0.17.17 keeps adaptive data gathering alive across analyzer-only failures and preserves the failure evidence for summaries.
- [x] Keep 0.17.17 scoped to probe/tooling reliability; do not change media playout policy.
- [x] Add `BROWSER_ANALYSIS_REQUIRED` so browser captures can keep artifacts even when analyzer exits nonzero.
- [x] In adaptive mirrored mode, treat analyzer-only failures as nonfatal segment evidence.
- [x] Preserve analyzer failure reason, raw activity delta, and log path as structured segment artifacts.
- [x] Continue to abort on browser startup, recording, upload, or capture fetch failures.
- [x] Include analyzer failure artifacts in segment CSV/JSONL summaries.
- [x] Keep calibration apply impossible without a real analyzer `report.json` and `calibration.ready=true`.
- [x] Update manual probe contract tests for analyzer-failure continuation.
- [x] Run shell syntax checks, focused contract tests, and package checks.
## 0.17.24 Probe Truthfulness And Localization Checklist
Context: the 0.17.23 run proved adaptive calibration is now live-editing the server,
but confirmation still failed. Segment 3 passed and triggered a provisional calibration
nudge, while the confirmation segment failed with a near-centered median but high p95/drift.
This means the fastest high-quality path is localization tooling, not another static offset
guess.
- [x] Treat the latest failure as timing instability/outlier drift until the probe proves otherwise.
- [x] Fix analyzer-failure raw activity delta parsing so bounded raw-delta calibration can use the evidence it prints.
- [x] Stop marking `blind-targets.json` ready from calibration-only passes when confirmation segments exist and fail.
- [x] Emit combined `segment-events.csv` and `segment-events.jsonl` artifacts so each run exposes per-pulse skew and confidence across segments.
- [ ] Use the next run to decide whether bad p95 is caused by low-confidence analyzer pairings, camera/mic capture instability, or server planner/output jitter.
- [ ] Add stage-local timing evidence for stimulus schedule, client capture onsets, server output timing, and browser/device capture if the event table still cannot isolate the source.
- [ ] Only save calibration defaults after a confirmation segment passes.
- [x] Instrument UVC/UAC/HDMI sink handoff timing before waiting for another run.
## 0.17.26 Blind Timing Window And Sink Handoff Checklist
Context: the next probe should not be required to discover that the server is
blind between "packet arrived" and "packet handed to UAC/UVC/HDMI". Close
measurement gaps before tuning any new healing controller.
- [x] Retain rolling client capture/send skew windows inside the server.
- [x] Retain rolling server receive skew and client queue age windows.
- [x] Record audio/video sink handoff instants and schedule lateness at the server boundary.
- [x] Expose sink handoff skew, sink lateness, and rolling p95 timing metrics through `GetUpstreamSync`.
- [x] Include rolling blind metrics in mirrored-probe CSV/JSONL summaries and blind targets.
- [x] Add planner tests for rolling timing windows and sink handoff timing.
- [ ] Use the next mirrored run only for correlation/tuning: decide whether the controller should adjust playout delay, offset, or drop/freeze policy from these blind metrics.
Context: if client/server-only timing is stable enough, Lesavka should make
small runtime corrections without waiting for the external probe. The probe
remains the truth judge and root-cause localizer, not the production dependency.
- [x] Add a server-owned blind healer loop enabled by default with `LESAVKA_UPSTREAM_BLIND_HEAL=0` escape hatch.
- [x] Gate blind healing on rolling sample counts plus client-send, server-receive, queue-age, sink-late, and sink-handoff p95 limits.
- [x] Apply bounded transient offset nudges from sink handoff skew without saving them as site defaults.
- [x] Expose sample counts in `upstream-sync` and mirrored probe artifacts so failed runs can separate "insufficient evidence" from real timing failure.
- [x] Emit `root-cause-summary.json` from mirrored probe runs to classify failing layers instead of eyeballing raw metrics.
- [x] Add unit tests for apply/refuse/target behavior in the blind healer.
- [ ] Next run should identify the failing layer if confirmation still fails: client capture/uplink, network/server receive, server planner, server sink handoff, or external USB/browser/probe boundary.
Context: the first preferred confirmation pass showed the probe-calibrate-confirm
loop can work, but also revealed two blind-healing blockers: sink handoff samples
stayed empty, and client timing skew included a false cross-pipeline PTS offset.
- [x] Pair server sink handoff samples by planned due time, not raw local PTS, so offset-compensated streams still produce handoff evidence.
- [x] Normalize client sidecar capture/send windows onto the shared capture clock using queue delivery age instead of raw per-pipeline packet PTS.
- [x] Add tests proving sink handoff survives large offset-compensated local PTS gaps.
- [x] Add tests proving audio/video timing metadata no longer copies packet PTS domains into blind sidecar fields.
- [ ] Next mirrored run should show non-zero `planner_sink_handoff_window_samples` and much smaller client send/capture p95 skew before trusting blind healing.
Context: the 0.17.29 mirrored run confirmed the client-side scheduling leak is fixed, but the probe
then applied large opposite calibration nudges from analyzer failures with zero or one coded pair.
Raw activity deltas are useful diagnostic breadcrumbs; they are not safe steering evidence when coded
pairing collapses.
- [x] Treat the 0.17.29 run as proof that client sidecar timing is now trustworthy enough to move the investigation downstream.
- [x] Default raw analyzer-failure calibration to off instead of inheriting provisional calibration.
- [x] Add `LESAVKA_SYNC_RAW_FAILURE_MIN_PAIRS` so even explicit raw-failure calibration refuses weak coded evidence.
- [x] Print the raw-failure pair floor in calibration decisions and segment artifacts.
- [x] Prefer server-side receive/sink blockers over probe-pairing blockers when root-cause evidence is available.
- [x] Update manual probe contract coverage for the safer defaults and refusal reason.
- [ ] Re-run the probe-calibrate-confirm flow; analyzer failures should diagnose but not mutate calibration unless raw fallback is explicitly enabled and has enough coded support.
- [ ] If client send/capture p95 stays low and server receive p95 stays high, localize the transport/server-receive timing layer next.
- [x] Tighten the blind-heal sink handoff p95 gate from 250ms to 120ms before applying runtime nudges.
- [x] Align mirrored-run root-cause summaries with the stricter sink handoff stability threshold.
- [x] Add regression coverage for default-disabled blind healing and noisy sink-handoff refusal.
- [ ] Re-run the normal probe-calibrate-confirm flow; `calibration_source` should remain non-blind unless the server was explicitly started with blind healing.
- [ ] If the probe still produces only one or two visual events while blind metrics stay stable, move the next fix to stimulus/browser/probe detection instead of transport timing.
Context: manual Tethys testing showed the desktop was awake and HDMI was on, but the right-eye feed stayed black. Server logs showed `eye=r` reached `PLAYING` and then hit a V4L2/GStreamer `Invalid argument (22)` poll error before any frame was pushed, while the left eye streamed normally.
- [x] Confirm Tethys was not asleep: X11 reported HDMI connected at 1920x1080 and `Monitor is On`.
- [x] Remove stale Tethys browser-probe processes after mirrored probe runs so manual Google Meet testing does not compete with old recorder sessions.
- [x] Propagate late eye-capture GStreamer bus errors into the gRPC video stream so the launcher reports a preview stream error instead of a silent black window.
- [x] Add a first-frame watchdog for eye capture streams so opened-but-empty sources surface as explicit diagnostics.
- [ ] Re-run a manual two-eye session and confirm right-eye failures now appear in the session log with the concrete source error.
- [ ] If `eye-r` still reports `poll error ... EINVAL`, recover/reseat the right HDMI capture path or add a dedicated eye-capture soft recovery path separate from UVC/UAC.
Context: a manual Google Meet test on 0.17.34/0.17.35 was much worse than the earlier baseline:
remote audio sounded like choppy chunks/clicks and the video was visibly choppy. Live Theia
configuration showed the installer-generated UVC gadget was advertising `640x480 @ 20fps`, the
client camera pipeline was using that server profile as the outgoing packet profile even when the UI
selected `720p@30`, and the UAC speech path used a very tight 20ms/5ms sink buffer/latency with
downstream appsrc dropping.
- [x] Treat probe/analyzer measurements as suspect until the copied Tethys capture is visually and audibly stable.
- [x] Make UI-selected camera quality king: the launcher camera profile now drives the outgoing uplink profile by default instead of being downscaled to stale server caps.
- [x] Keep an explicit `LESAVKA_CAM_LOCK_TO_SERVER_PROFILE=1` lab escape hatch for debugging the server UVC gadget contract.
- [x] Restore generated UVC gadget fallback defaults to `1280x720 @ 30fps` for sessions without an explicit UI/session profile.
- [x] Align runtime UVC fallback defaults with the generated 30fps gadget profile.
- [x] Raise UAC speech sink buffering and appsrc limits so speech favors intelligibility over bare-minimum latency under jitter.
- [x] Stop default downstream appsrc leaking on the UAC speech path; shredded chunks are worse than modest added latency for calls.
- [ ] Reinstall/restart Theia services so `/etc/lesavka/uvc.env` is refreshed from `640x480 @ 20fps` to `1280x720 @ 30fps`.
- [ ] Re-run manual Google Meet before trusting mirrored probe calibration; verify speech is intelligible and video cadence is stable by eye.