The Pitch

Status: Accepted

Author: Ryan Nel

Problem

See Problem Statement. There is no live capture flow: meetings enter the system only through file upload or manual text entry. Appetite: Large.

Solution Sketch

Extend the existing WebSocket connection to carry binary audio frames upstream. Core receives the audio and fans it out in parallel: one stream to GCS for durable storage and one to ML for live transcription. ML sends transcript segments and insights back to Core over a persistent HTTP stream. Core broadcasts them to the client over the WebSocket and persists them asynchronously.
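A minimal sketch of the parallel fan-out in Core, assuming a Python asyncio service. `fan_out`, `Sink`, and the sink callables are hypothetical names, not part of the existing codebase. Each sink gets its own queue, so a slow ML connection does not hold up GCS writes.

```python
import asyncio
from typing import AsyncIterable, Awaitable, Callable, Dict

# Hypothetical sink signature: an async callable that consumes one audio chunk.
Sink = Callable[[bytes], Awaitable[None]]


async def _drain(queue: asyncio.Queue, sink: Sink) -> None:
    """Feed chunks from one queue into one sink until the None sentinel arrives."""
    while True:
        chunk = await queue.get()
        if chunk is None:  # end of recording
            return
        await sink(chunk)


async def fan_out(frames: AsyncIterable[bytes], sinks: Dict[str, Sink]) -> None:
    """Copy every upstream audio frame to each sink, each through its own queue."""
    queues = {name: asyncio.Queue() for name in sinks}
    workers = [asyncio.create_task(_drain(queues[name], sink))
               for name, sink in sinks.items()]
    async for frame in frames:  # binary WebSocket messages, in arrival order
        for q in queues.values():
            q.put_nowait(frame)
    for q in queues.values():
        q.put_nowait(None)  # tell every worker the stream has ended
    await asyncio.gather(*workers)
```

In the real service the two sinks would wrap the GCS upload stream and the ML connection. The queues are unbounded here for brevity; a production version would likely need a bound or a backpressure policy.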

When recording stops, Core publishes a MeetingSessionTerminated event. ML drains its AssemblyAI buffer and sends the final segments. Core then triggers a post-meeting reprocessing job via Pub/Sub. This is the same pipeline used for file uploads, run with skip_tasks: true so that tasks captured live are preserved.
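A sketch of the stop sequence, assuming Core can observe when ML has finished sending final segments (modeled here as an `asyncio.Event`). The topic names, payload fields, the `TerminationFlow` class, and the drain timeout are all hypothetical; `publish` appends to a list in place of a real Pub/Sub client.

```python
import asyncio
import json
from dataclasses import dataclass, field

# Hypothetical topic names; the pitch only says the job goes "via Pub/Sub".
SESSION_EVENTS_TOPIC = "meeting-session-events"
REPROCESS_TOPIC = "meeting-reprocess-jobs"


@dataclass
class TerminationFlow:
    """Hypothetical orchestration of what Core does when recording stops."""
    # Stand-in for a Pub/Sub publisher: records (topic, encoded payload) pairs.
    published: list = field(default_factory=list)

    def publish(self, topic: str, payload: dict) -> None:
        self.published.append((topic, json.dumps(payload).encode("utf-8")))

    async def stop(self, meeting_id: str, audio_uri: str,
                   ml_drained: asyncio.Event, drain_timeout_s: float = 30.0) -> None:
        # 1. Announce the end of the session so ML can drain its AssemblyAI buffer.
        self.publish(SESSION_EVENTS_TOPIC,
                     {"type": "MeetingSessionTerminated", "meeting_id": meeting_id})
        # 2. Wait for ML's final segments. The bounded wait is an assumption:
        #    if ML never confirms, reprocessing still rebuilds from stored audio.
        try:
            await asyncio.wait_for(ml_drained.wait(), timeout=drain_timeout_s)
        except asyncio.TimeoutError:
            pass
        # 3. Reuse the file-upload pipeline; skip_tasks keeps the tasks captured live.
        self.publish(REPROCESS_TOPIC,
                     {"meeting_id": meeting_id, "audio_uri": audio_uri, "skip_tasks": True})
```

The ordering matters: the reprocessing job is published only after the drain (or its timeout), so it does not race the last live segments.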

This bet does not change the data architecture pattern. It extends Optimistic Mutation with Echo-Suppressed Streaming to cover:

A new upstream path (audio frames)
A new downstream path (transcript segments and ML insights in real time)

Rabbit Holes

Audio encoding. The client captures audio in the browser, and the ML service expects a specific format (PCM16 or WebM). Encoding decisions affect latency, because every transcoding step adds processing time to each chunk. We keep this simple: the client sends WebM chunks exactly as the browser produces them, and Core forwards them unchanged. Core does no transcoding.
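A sketch of the no-transcoding passthrough, assuming the browser records with MediaRecorder, whose first WebM chunk begins with the EBML header ID `0x1A45DFA3`. `WebMPassthrough` is a hypothetical helper: it sanity-checks only the start of the stream and returns every chunk byte-for-byte.

```python
# EBML header element ID that opens every WebM (Matroska) stream.
EBML_MAGIC = b"\x1a\x45\xdf\xa3"


class WebMPassthrough:
    """Hypothetical per-session guard: check the stream start, never modify bytes."""

    def __init__(self) -> None:
        self._seen_first = False

    def accept(self, chunk: bytes) -> bytes:
        if not self._seen_first:
            # Assumes the first chunk carries the EBML header, as MediaRecorder output does.
            if not chunk.startswith(EBML_MAGIC):
                raise ValueError("first audio chunk does not start with a WebM/EBML header")
            self._seen_first = True
        # Later chunks are continuation data and are not inspected.
        return chunk  # no decoding or transcoding; the same bytes go to GCS and ML
```

Rejecting a bad first chunk early is cheaper than letting ML or post-processing discover it. Checking later chunks would mean parsing the container, which goes against keeping Core a pure passthrough.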

ML degradation. If AssemblyAI becomes unavailable mid-recording, we cannot fail the session, because the user is still speaking. The bet requires graceful degradation: keep storing audio to GCS, show the user a warning, and recover the transcript through post-processing once the services are back.
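A sketch of the degradation rule, assuming transport failures to ML surface as `OSError` (which covers `ConnectionError`) or `asyncio.TimeoutError`; the exact exceptions depend on the HTTP client actually used. `DegradableSession` and its callables are hypothetical. GCS writes stay mandatory while ML is best-effort: after the first failure the session stops sending to ML, warns the user once, and leaves the transcript to post-processing.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical sink signature: an async callable that consumes one audio chunk.
Sink = Callable[[bytes], Awaitable[None]]


class DegradableSession:
    """Hypothetical per-recording state: GCS is required, ML is best-effort."""

    def __init__(self, store: Sink, transcribe: Sink, warn: Callable[[str], None]) -> None:
        self.store = store
        self.transcribe = transcribe
        self.warn = warn
        self.ml_degraded = False  # True means post-processing must rebuild the transcript

    async def on_audio(self, chunk: bytes) -> None:
        # Write to GCS and send to ML concurrently; only a GCS failure propagates.
        await asyncio.gather(self.store(chunk), self._transcribe_best_effort(chunk))

    async def _transcribe_best_effort(self, chunk: bytes) -> None:
        if self.ml_degraded:
            return  # stop sending to ML for the rest of this session
        try:
            await self.transcribe(chunk)
        except (OSError, asyncio.TimeoutError) as exc:
            self.ml_degraded = True
            self.warn(f"Live transcription paused ({exc}); audio is still being saved.")
```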

Speaker identification confidence threshold. Matching a voice embedding against known profiles requires a threshold. Too low: false matches. Too high: no matches. The threshold must be configurable without a deployment. Start with 0.85 and expose it as a server-side config value.
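A sketch of threshold-based speaker matching, assuming embeddings are compared with cosine similarity (the pitch does not name a metric). `match_speaker` and `get_threshold` are hypothetical; the threshold is read on every call, so changing the server-side config value takes effect without a deployment.

```python
import math
from typing import Callable, Mapping, Optional, Sequence

DEFAULT_THRESHOLD = 0.85  # starting value from the pitch


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def match_speaker(embedding: Sequence[float],
                  profiles: Mapping[str, Sequence[float]],
                  get_threshold: Callable[[], float]) -> Optional[str]:
    """Return the best-scoring profile id, or None if it does not clear the threshold."""
    threshold = get_threshold()  # read per call so config changes apply immediately
    best_id, best_score = None, -1.0
    for profile_id, ref in profiles.items():
        score = cosine(embedding, ref)
        if score > best_score:
            best_id, best_score = profile_id, score
    return best_id if best_score >= threshold else None
```

For example, with `config = {"speaker_match_threshold": 0.85}` and `get_threshold=lambda: config["speaker_match_threshold"]`, updating the dict changes matching behavior on the next call.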

Session state. One active session per user. Core must enforce this: a second concurrent recording session from the same user is an error, not a queuing scenario.
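An in-process sketch of the one-session-per-user rule. `SessionRegistry` and `SessionAlreadyActive` are hypothetical names. The check and the insert happen under one lock, so two simultaneous start requests cannot both succeed.

```python
import threading


class SessionAlreadyActive(Exception):
    """Raised when a user tries to start a second concurrent recording."""


class SessionRegistry:
    """Hypothetical in-process registry: at most one active recording per user."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: dict = {}  # user_id -> session_id

    def start(self, user_id: str, session_id: str) -> None:
        with self._lock:  # check-and-set must be atomic
            current = self._active.get(user_id)
            if current is not None:
                raise SessionAlreadyActive(f"user {user_id} is already recording in {current}")
            self._active[user_id] = session_id

    def stop(self, user_id: str, session_id: str) -> None:
        with self._lock:
            # Only the session that owns the slot may release it.
            if self._active.get(user_id) == session_id:
                del self._active[user_id]
```

If Core runs as more than one instance, the same rule would need an atomic set-if-absent in shared storage rather than a process-local dict.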

No-Gos

No calendar integration — auto-starting from calendar events is a separate bet
No multi-user collaborative recording
No video capture — audio only
No meeting bot integration (Zoom/Teams/Meet)
No custom vocabulary or domain-specific tuning; use default AssemblyAI settings

Output

The pitch is accepted. Next, move to Design to map the user journey and define what the UI needs before the API is designed.
