Pitch
Meeting Recording — rough solution sketch, rabbit holes, and no-gos.
The Pitch
Status: Accepted
Author: Ryan Nel
Problem
See Problem Statement: there is no live capture flow, so meetings enter the system only via file upload or manual text entry. Appetite: Large.
Solution Sketch
Extend the existing WebSocket connection to carry binary audio frames upstream. Core receives the audio and fans it out in parallel: one stream to GCS for durable storage, one stream to ML for live transcription. ML returns transcript segments and insights to Core over a persistent HTTP stream. Core broadcasts them to the client over the WebSocket and persists them asynchronously.
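The fan-out described above could look roughly like the sketch below. It is illustrative only: the `Sink` type, `fan_out`, `store_to_gcs`, and `send_to_ml` are hypothetical names, and the two sinks are in-memory stand-ins for the real GCS writer and ML stream.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

# Hypothetical sink signature: accepts one binary audio frame.
Sink = Callable[[bytes], Awaitable[None]]


async def fan_out(frames: Iterable[bytes], sinks: list[Sink]) -> int:
    """Forward each frame to every sink concurrently; return the frame count."""
    count = 0
    for frame in frames:
        # Both sinks receive the same untouched bytes (no transcoding at Core).
        await asyncio.gather(*(sink(frame) for sink in sinks))
        count += 1
    return count


# In-memory stand-ins for the GCS writer and the ML stream (assumptions).
stored: list[bytes] = []
transcribed: list[bytes] = []


async def store_to_gcs(frame: bytes) -> None:
    stored.append(frame)


async def send_to_ml(frame: bytes) -> None:
    transcribed.append(frame)


handled = asyncio.run(fan_out([b"a", b"b", b"c"], [store_to_gcs, send_to_ml]))
```

Awaiting each frame on both sinks keeps ordering simple. A real implementation would likely decouple the sinks with queues so a slow one cannot stall the other.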
When recording stops, Core publishes a MeetingSessionTerminated event. ML drains its AssemblyAI buffer and sends final segments. Core then triggers a post-meeting reprocessing job via Pub/Sub — the same pipeline used for file uploads, with skip_tasks: true to preserve tasks captured live.
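A minimal sketch of that stop sequence, assuming hypothetical topic names and payload shapes. Only the `MeetingSessionTerminated` event name and the `skip_tasks: true` flag come from this pitch. `publish` and `drain_ml` are injected stand-ins for the Pub/Sub client and for waiting on ML's final segments.

```python
def stop_recording(session_id: str, publish, drain_ml) -> list[str]:
    """Run the stop sequence in order and return the names of the steps taken."""
    steps = []

    # 1. Announce that the session has ended.
    publish(
        "meeting-session-terminated",  # hypothetical topic name
        {"event": "MeetingSessionTerminated", "session_id": session_id},
    )
    steps.append("terminated_event")

    # 2. Wait for ML to drain its buffer and hand back the final segments.
    final_segments = drain_ml(session_id)
    steps.append(f"drained:{len(final_segments)}")

    # 3. Trigger reprocessing, keeping the tasks that were captured live.
    publish(
        "meeting-reprocess",  # hypothetical topic name
        {"session_id": session_id, "skip_tasks": True},
    )
    steps.append("reprocess_job")
    return steps


# In-memory stand-ins so the sketch runs without any external service.
published = []


def publish(topic, payload):
    published.append((topic, payload))


def drain_ml(session_id):
    return ["final segment"]


steps = stop_recording("s-1", publish, drain_ml)
```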
This bet does not change the data architecture pattern. It extends Optimistic Mutation with Echo-Suppressed Streaming to cover:
- A new upstream path (audio frames)
- A new downstream path (transcript segments and ML insights in real time)
Rabbit Holes
Audio encoding. The client captures audio in the browser. The ML service expects a specific format (PCM16 or WebM). Encoding decisions affect latency. We keep this simple: the client sends raw WebM chunks; Core forwards them as-is. No transcoding at Core.
ML degradation. If AssemblyAI is unavailable mid-recording, we cannot fail the session — the user is speaking. The bet requires graceful degradation: continue storing audio to GCS, show a warning, and recover via post-processing when services come back.
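One way to read this requirement: storage must not depend on transcription succeeding. The sketch below is an assumption about shape, not a design. `RecordingSession`, its `degraded` flag, and the use of `ConnectionError` as the failure signal are all hypothetical.

```python
import asyncio


class RecordingSession:
    """Hypothetical per-session frame handler inside Core."""

    def __init__(self, store, transcribe):
        self.store = store
        self.transcribe = transcribe
        self.degraded = False  # would drive the warning shown to the user

    async def handle_frame(self, frame: bytes) -> None:
        # Durable storage comes first and does not depend on transcription.
        await self.store(frame)
        try:
            await self.transcribe(frame)
        except ConnectionError:
            # Keep recording; post-meeting reprocessing fills the gap later.
            self.degraded = True


kept, sent = [], []


async def store(frame):
    kept.append(frame)


async def flaky_transcribe(frame):
    # Simulate the transcription service dropping out for one frame.
    if frame == b"f2":
        raise ConnectionError("transcription unavailable")
    sent.append(frame)


async def demo():
    session = RecordingSession(store, flaky_transcribe)
    for frame in (b"f1", b"f2", b"f3"):
        await session.handle_frame(frame)
    return session


session = asyncio.run(demo())
```

Every frame reaches storage even though one never reached transcription, which is what makes recovery by reprocessing possible.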
Speaker identification confidence threshold. Matching a voice embedding against known profiles requires a threshold. Too low: false matches. Too high: no matches. The threshold must be configurable without a deployment. Start with 0.85 and expose it as a server-side config value.
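To make the threshold behaviour concrete, here is a small sketch. It assumes cosine similarity as the comparison metric, which this pitch does not specify. The `config` dictionary stands in for whatever server-side config store is used and is read on every match, so a change takes effect without a deployment.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding, profiles, get_threshold):
    """Return the best-matching profile name, or None if none clears the threshold."""
    best_name, best_score = None, get_threshold()
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical config store and toy 2-dimensional profiles for illustration.
config = {"speaker_match_threshold": 0.85}
profiles = {"ana": [1.0, 0.0], "ben": [0.0, 1.0]}


def get_threshold():
    return config["speaker_match_threshold"]


strict = match_speaker([0.8, 0.6], profiles, get_threshold)   # similarity about 0.8
config["speaker_match_threshold"] = 0.75                      # changed at runtime
relaxed = match_speaker([0.8, 0.6], profiles, get_threshold)
```

The same embedding is rejected at 0.85 and matched to "ana" at 0.75, which shows how much the false-match rate depends on this one value.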
Session state. One active session per user. Core must enforce this: two concurrent recording sessions from the same user are an error, not a queuing scenario.
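A sketch of that rule, assuming a hypothetical `SessionRegistry` that keeps state in memory. If Core runs as more than one instance, the same check would need a shared store, which is outside what this pitch specifies.

```python
class SessionAlreadyActive(Exception):
    """Raised when a user tries to start a second concurrent recording."""


class SessionRegistry:
    """Hypothetical in-memory registry: at most one active session per user."""

    def __init__(self):
        self._active = {}  # user_id -> session_id

    def start(self, user_id: str, session_id: str) -> None:
        if user_id in self._active:
            # Reject outright; the second request is never queued.
            raise SessionAlreadyActive(
                f"user {user_id} already has session {self._active[user_id]}"
            )
        self._active[user_id] = session_id

    def stop(self, user_id: str) -> None:
        self._active.pop(user_id, None)


registry = SessionRegistry()
registry.start("user-1", "session-a")

try:
    registry.start("user-1", "session-b")
    second_start_rejected = False
except SessionAlreadyActive:
    second_start_rejected = True

# Once the first session stops, the same user can record again.
registry.stop("user-1")
registry.start("user-1", "session-c")
```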
No-Gos
- No calendar integration — auto-starting from calendar events is a separate bet
- No multi-user collaborative recording
- No video capture — audio only
- No meeting bot integration (Zoom/Teams/Meet)
- No custom vocabulary or domain-specific tuning; use default AssemblyAI settings
Output
Pitch is accepted. Move to Design to map the user journey and define what the UI needs before the API is designed.