
Pitch

Meeting Recording — rough solution sketch, rabbit holes, and no-gos.

The Pitch

Status: Accepted

Author: Ryan Nel


Problem

See Problem Statement: no live capture flow — meetings enter only via file upload or manual text entry. Appetite: Large.


Solution Sketch

Extend the existing WebSocket connection to carry binary audio frames upstream. Core receives audio and fans out in parallel: stream to GCS for durable storage, and stream to ML for live transcription. ML routes segments and insights back to Core via a persistent HTTP stream — Core broadcasts them to the client via WebSocket and persists them asynchronously.
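
The parallel fan-out described above could look roughly like the following. This is a minimal sketch, not Core's actual implementation: `fan_out`, `Sink`, and the in-memory sinks are hypothetical names standing in for the GCS writer and the ML stream, and it assumes an asyncio-based service.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable

# Hypothetical sink type: any coroutine that accepts one binary audio frame.
Sink = Callable[[bytes], Awaitable[None]]


async def fan_out(frame: bytes, sinks: list[Sink]) -> list[BaseException | None]:
    """Send one frame to every sink concurrently.

    Returns one entry per sink: None on success, or the exception it raised.
    A failure in one sink does not cancel the others.
    """
    results = await asyncio.gather(
        *(sink(frame) for sink in sinks), return_exceptions=True
    )
    return [r if isinstance(r, BaseException) else None for r in results]


def make_recording_sink(store: list[bytes]) -> Sink:
    """Stand-in for a GCS or ML writer: appends frames to an in-memory list."""

    async def sink(frame: bytes) -> None:
        store.append(frame)

    return sink


def demo_fan_out() -> tuple[list[bytes], list[bytes]]:
    """Push two frames through both sinks and return what each one received."""
    storage_frames: list[bytes] = []
    ml_frames: list[bytes] = []
    sinks = [make_recording_sink(storage_frames), make_recording_sink(ml_frames)]

    async def run() -> None:
        for frame in (b"\x1a\x45", b"\xdf\xa3"):
            await fan_out(frame, sinks)

    asyncio.run(run())
    return storage_frames, ml_frames
```

Collecting exceptions per sink, rather than letting the first one propagate, keeps the storage path independent of the transcription path.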

When recording stops, Core publishes a MeetingSessionTerminated event. ML drains its AssemblyAI buffer and sends final segments. Core then triggers a post-meeting reprocessing job via Pub/Sub — the same pipeline used for file uploads, with skip_tasks: true to preserve tasks captured live.
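
As an illustration of the reprocessing trigger, the job message might be serialized as below. Only `skip_tasks` comes from this pitch; the `ReprocessJob` type, the `meeting_id` and `audio_uri` fields, and the JSON encoding are assumptions made for the sketch.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ReprocessJob:
    """Hypothetical payload for the post-meeting reprocessing job."""

    meeting_id: str          # assumed identifier for the recorded meeting
    audio_uri: str           # assumed location of the stored audio
    skip_tasks: bool = True  # preserve tasks that were captured live


def encode_job(job: ReprocessJob) -> bytes:
    """Serialize the job to UTF-8 JSON bytes, the form a message body would take."""
    return json.dumps(asdict(job), sort_keys=True).encode("utf-8")
```

Defaulting `skip_tasks` to true on this type means the live-capture path cannot forget the flag, while the file-upload path would build its own message without it.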

This bet does not change the data architecture pattern. It extends Optimistic Mutation with Echo-Suppressed Streaming to cover:

  • A new upstream path (audio frames)
  • A new downstream path (transcript segments and ML insights in real time)

Rabbit Holes

Audio encoding. The client captures audio in the browser. The ML service expects a specific format (PCM16 or WebM). Encoding decisions affect latency. We keep this simple: the client sends raw WebM chunks; Core forwards them as-is. No transcoding at Core.

ML degradation. If AssemblyAI is unavailable mid-recording, we cannot fail the session — the user is speaking. The bet requires graceful degradation: continue storing audio to GCS, show a warning, and recover via post-processing when services come back.
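
One way to picture the degradation rule is a per-frame handler that always stores audio first and treats a transcription failure as a state change, not a session failure. This is a sketch under assumptions: `LiveSession` and its callables are hypothetical, and `ConnectionError` stands in for whatever error the ML path actually raises.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class LiveSession:
    """Hypothetical per-session handler; the three callables are injected."""

    store_audio: Callable[[bytes], None]   # durable storage (e.g. the GCS writer)
    transcribe: Callable[[bytes], None]    # live transcription path
    warn_client: Callable[[str], None]     # pushes a warning to the UI
    degraded: bool = False
    needs_reprocessing: bool = False

    def handle_frame(self, frame: bytes) -> None:
        # Storage happens first, so audio is kept even if transcription fails.
        self.store_audio(frame)
        try:
            self.transcribe(frame)
        except ConnectionError:
            # Mark the gap so post-processing can fill it in later.
            self.needs_reprocessing = True
            if not self.degraded:
                # Warn only on the transition into the degraded state.
                self.degraded = True
                self.warn_client(
                    "Live transcription is unavailable; audio is still being saved."
                )
        else:
            # Transcription succeeded again; leave the degraded state.
            self.degraded = False
```

The `needs_reprocessing` flag stays set after recovery, since the frames missed during the outage still have to be transcribed from the stored audio.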

Speaker identification confidence threshold. Matching a voice embedding against known profiles requires a threshold. Too low: false matches. Too high: no matches. The threshold must be configurable without a deployment. Start with 0.85 and expose it as a server-side config value.
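
To make the threshold trade-off concrete, here is a small sketch of matching against known profiles. It assumes cosine similarity as the comparison metric, which the pitch does not specify; `match_speaker` and the profile dictionary are hypothetical, and in practice the threshold would be read from server-side config at call time.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(
    embedding: list[float],
    profiles: dict[str, list[float]],
    threshold: float = 0.85,  # starting value from the pitch; meant to be configurable
) -> str | None:
    """Return the best-matching profile id, or None if no score reaches the threshold."""
    if not profiles:
        return None
    best_id = max(profiles, key=lambda pid: cosine_similarity(embedding, profiles[pid]))
    best_score = cosine_similarity(embedding, profiles[best_id])
    return best_id if best_score >= threshold else None
```

Passing the threshold as an argument keeps the matching logic free of deployment-time constants, so a config change takes effect on the next call.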

Session state. One active session per user. Core must enforce this: a second concurrent recording session from the same user is an error, not a queuing scenario.
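
The one-session rule can be sketched as a registry that rejects a second start instead of queuing it. `SessionRegistry` and `SessionAlreadyActive` are hypothetical names, and this in-memory version assumes a single Core process; with multiple instances the same check would need a shared store.

```python
import threading


class SessionAlreadyActive(Exception):
    """Raised when a user tries to start a second concurrent recording."""


class SessionRegistry:
    """Tracks at most one active recording session per user (in-memory sketch)."""

    def __init__(self) -> None:
        self._active: dict[str, str] = {}  # user_id -> session_id
        self._lock = threading.Lock()

    def start(self, user_id: str, session_id: str) -> None:
        # Check and insert under one lock so two concurrent starts cannot both pass.
        with self._lock:
            if user_id in self._active:
                raise SessionAlreadyActive(
                    f"user {user_id} already has session {self._active[user_id]}"
                )
            self._active[user_id] = session_id

    def stop(self, user_id: str) -> None:
        # Stopping a user with no active session is a no-op.
        with self._lock:
            self._active.pop(user_id, None)
```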


No-Gos

  • No calendar integration — auto-starting from calendar events is a separate bet
  • No multi-user collaborative recording
  • No video capture — audio only
  • No meeting bot integration (Zoom/Teams/Meet)
  • No custom vocabulary or domain-specific tuning; use default AssemblyAI settings

Output

Pitch is accepted. Move to Design to map the user journey and define what the UI needs before the API is designed.
