Meeting Recording — system context, per-flow sequence diagrams, and boundary inventory.

Data Flow

For each step in the User Flow, this page draws what calls what: which service initiates, which responds, what data crosses each boundary. Read each arrow two ways: it is a contract boundary (what shape the data takes) and a sequencing constraint (downstream cannot build until the upstream contract is published).

System Context

[Diagram: system context]

Flow 1: Start Recording

[Diagram: Start Recording sequence]

Flow 2: Live Audio → Transcription (Lowest Latency Path)

Audio flows from the browser microphone through Core and ML to AssemblyAI. Transcript segments return on the streaming HTTP response of the same connection that carries audio from Core to ML. The stream is therefore bidirectional: audio chunks flow upstream, and segments and insights flow downstream.

Core streams segments directly to the client over WebSocket for minimum latency and persists them to the database asynchronously.
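
The forward-then-persist behaviour described above can be sketched as follows. This is a minimal illustration, not Core's actual implementation: `relay_segments`, `send_to_client`, and `persist` are hypothetical names, and the in-memory queue stands in for whatever background persistence mechanism Core really uses.

```python
import asyncio


async def relay_segments(segments, send_to_client, persist):
    """Forward each segment to the client first, then persist it off the hot path.

    `segments` is an async iterator of segments; `send_to_client` and `persist`
    are hypothetical async callables (WebSocket send and database write).
    """
    queue: asyncio.Queue = asyncio.Queue()

    async def writer():
        # Background consumer: drains the queue and writes to the database.
        while True:
            segment = await queue.get()
            if segment is None:  # sentinel: the stream has ended
                break
            await persist(segment)

    writer_task = asyncio.create_task(writer())
    async for segment in segments:
        await send_to_client(segment)  # latency-critical path
        queue.put_nowait(segment)      # enqueueing never waits on the database
    queue.put_nowait(None)
    await writer_task                  # flush remaining writes before returning
```

In this sketch a slow database write delays only the background writer, never the WebSocket send.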

Flow 3a: Live Talking Points (Fast — Per Finalised Segment)

Talking points update on every finalised transcript segment. ML streams them back through the same HTTP stream as transcript segments. Core forwards them to the client via WebSocket and persists them to the database asynchronously, the same dual-write pattern used for transcript segments.
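
Because transcript segments and talking points share one HTTP stream, the receiver has to tell events apart. The sketch below assumes the stream is newline-delimited JSON (as the Boundary Inventory states) and that each event carries a `type` field; the field name and the event type names used here are hypothetical, since this page does not list them.

```python
import json
from typing import Callable, Dict, Iterable, Iterator


def iter_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded event per non-empty line of an NDJSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        yield json.loads(line)


def route(event: dict, handlers: Dict[str, Callable[[dict], None]]) -> bool:
    """Dispatch an event by its (assumed) `type` field; return False if unhandled."""
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True
```

A caller would register one handler per event type and feed each decoded event to `route`, so unknown types can be logged rather than crash the stream.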

Flow 3b: Live Task Extraction (Slow — Every ~60s)

Task extraction runs on a slower cadence, roughly every 60 seconds. ML buffers segments and periodically checks them for action items. Extracted tasks stream back through the same HTTP stream and follow the same dual-write pattern.
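
One way to picture the buffering is a small accumulator that releases a batch once the interval has elapsed. This is a sketch under assumptions: the class name, the 60-second default, and the choice to check the clock on each incoming segment (rather than on a timer) are illustrative, not taken from ML's code.

```python
from typing import List, Optional


class TaskExtractionBuffer:
    """Collects segments and releases them as a batch every `interval_s` seconds."""

    def __init__(self, interval_s: float = 60.0):
        self.interval_s = interval_s
        self._segments: List[str] = []
        self._last_flush: Optional[float] = None

    def add(self, segment: str, now: float) -> Optional[List[str]]:
        """Buffer a segment; return the pending batch if the interval has elapsed."""
        if self._last_flush is None:
            self._last_flush = now  # the first segment starts the clock
        self._segments.append(segment)
        if now - self._last_flush >= self.interval_s:
            batch, self._segments = self._segments, []
            self._last_flush = now
            return batch  # caller runs task extraction on this batch
        return None
```

Passing `now` explicitly keeps the sketch deterministic; a real caller might supply `time.monotonic()`.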

Flow 3c: Live Speaker Identification (Per Segment)

Speaker identification is built into the live transcription flow. For every segment, ML extracts a voice embedding and stores it on Core. It then attempts to match the embedding against enrolled voice profiles.

When a user later assigns an AssemblyAI speaker label (e.g. "Speaker A") to a known Person, the system uses all segments carrying that speaker label to enrich that person's voice profile, which improves future matching.
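
The matching and enrichment steps can be illustrated with plain vector arithmetic. This page does not say how embeddings are compared or how profiles are updated, so the cosine-similarity comparison, the 0.75 threshold, and the simple averaging below are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(embedding: Sequence[float],
                  profiles: Dict[str, Sequence[float]],
                  threshold: float = 0.75) -> Optional[str]:
    """Return the best-matching person id, or None if nothing reaches the threshold."""
    best_id, best_score = None, threshold
    for person_id, profile in profiles.items():
        score = cosine(embedding, profile)
        if score >= best_score:
            best_id, best_score = person_id, score
    return best_id


def enrich_profile(profile: Sequence[float],
                   embeddings: List[Sequence[float]]) -> List[float]:
    """Fold labelled segment embeddings into a profile by unweighted averaging."""
    vectors = [profile, *embeddings]
    return [sum(col) / len(vectors) for col in zip(*vectors)]
```

Returning `None` below the threshold matches the idea of a segment that keeps its generic speaker label until a user assigns a Person.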

Flow 7: Post-Meeting Processing (Automatic, via Pub/Sub)

Post-meeting processing does the following:

Batch-transcribes the full audio from GCS (higher accuracy)
Replaces transcript segments with the improved results
Generates headline, summary, topics, and finalises talking points (is_final: true)
Extracts tasks when skip_tasks: false (file upload flow only)
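
The effect of the `skip_tasks` flag on the steps above can be sketched as a small planner. Only `skip_tasks` comes from this page; the function name and the step identifiers are hypothetical labels for the four steps listed.

```python
from typing import List


def plan_post_meeting_steps(skip_tasks: bool) -> List[str]:
    """Return the ordered post-meeting steps for a job (step names are illustrative)."""
    steps = [
        "batch_transcribe",   # full audio from GCS
        "replace_segments",   # overwrite live segments with improved results
        "synthesise",         # headline, summary, topics, final talking points
    ]
    if not skip_tasks:
        steps.append("extract_tasks")  # runs only when skip_tasks is false
    return steps
```

For example, `plan_post_meeting_steps(skip_tasks=True)` omits task extraction, while `skip_tasks=False` appends it as the last step.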

Flow 8: Audio Playback (Signed URL Direct to GCS)

Core generates a short-lived signed URL. The client streams audio directly from Cloud Storage using that URL, with standard HTTP range requests for seeking.
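
Seeking with range requests amounts to asking for the file from a given byte offset. Browser media elements typically issue these requests on their own, so the sketch below is only an illustration of the mechanism; the helper names are hypothetical, and the offset estimate assumes a constant-bitrate file, which may not hold for the stored audio.

```python
from typing import Dict, Optional


def range_header(start: int, end: Optional[int] = None) -> Dict[str, str]:
    """Build an HTTP Range header for bytes `start` through `end` (inclusive)."""
    if start < 0 or (end is not None and end < start):
        raise ValueError("invalid byte range")
    value = f"bytes={start}-" if end is None else f"bytes={start}-{end}"
    return {"Range": value}


def seek_offset(position_s: float, duration_s: float, total_bytes: int) -> int:
    """Estimate the byte offset for a seek position, assuming constant bitrate."""
    fraction = min(max(position_s / duration_s, 0.0), 1.0)
    return int(fraction * total_bytes)
```

A client seeking to the midpoint of a file would send the header from `range_header(seek_offset(...))` with its GET to the signed URL and expect a `206 Partial Content` response.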

Boundary Inventory

This table lists every boundary shown in the diagrams above. Each becomes a contract on the Contracts page.

Boundary	From → To	Protocol	Data shape
Meeting CRUD	App → Core	REST	`POST/PATCH /meetings`
Recording commands	App → Core	WebSocket	`StartRecordingCommand`, `StopRecordingCommand`
Audio streaming	App → Core → ML	WebSocket (binary) → HTTP stream	Raw audio chunks
Live insights	ML → Core → App	HTTP stream → WebSocket	NDJSON events (5 types)
Speaker labels	App → Core	REST	`POST /meetings/{id}/speaker-labels`
Signed URL	App → Core → GCS	REST → GCS signed URL	`GET /meetings/{id}/audio-url`
Post-meeting trigger	Core → ML	Pub/Sub	`TranscriptionJob`, `MeetingSessionTerminated`
Synthesis write-back	ML → Core	REST	`PUT /synthesis`, `PATCH /meetings`, `PUT /segments`
