Transcription
Transcript processing lifecycle and segments — CRUD, live streaming events, batch processing, ML write-back, and Pub/Sub trigger.
A transcription tracks the processing lifecycle for a meeting's audio. Each meeting has at most one transcription. Transcript segments are the individual speaker-attributed text fragments produced during live recording and refined during post-meeting batch processing. For shared semantics, see Infrastructure.
Resource Shapes
Transcription
{
"id": "transcription-uuid",
"meeting_id": "meeting-uuid",
"status": "transcribing",
"status_message": "Batch transcription in progress",
"progress_percent": 45,
"is_degraded": false,
"created_at": "2026-05-01T09:00:00Z",
"updated_at": "2026-05-01T10:01:00Z"
}
Valid statuses: pending, transcribing, synthesizing, completed, failed.
- pending: created, but processing has not started (e.g., waiting for audio upload or the first byte of live audio).
- transcribing: batch transcription and diarisation are in progress.
- synthesizing: the transcript is complete; the headline, summary, topics, and talking points are being generated.
- completed: all artefacts are final.
- failed: processing failed; status_message carries the reason.
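As a minimal sketch, a client or service could model the five statuses as an enum and reject unknown wire values. The names `TranscriptionStatus`, `parse_status`, and `is_terminal` are hypothetical, and treating `completed` and `failed` as terminal is an assumption (the text above does not say whether a failed transcription can be retried).

```python
from enum import Enum


class TranscriptionStatus(str, Enum):
    """The five documented status values."""
    PENDING = "pending"
    TRANSCRIBING = "transcribing"
    SYNTHESIZING = "synthesizing"
    COMPLETED = "completed"
    FAILED = "failed"


# Assumption: no further processing is expected after these statuses.
TERMINAL_STATUSES = {TranscriptionStatus.COMPLETED, TranscriptionStatus.FAILED}


def parse_status(raw: str) -> TranscriptionStatus:
    """Convert a wire value to the enum, rejecting unknown statuses."""
    try:
        return TranscriptionStatus(raw)
    except ValueError:
        raise ValueError(f"unknown transcription status: {raw!r}") from None


def is_terminal(status: TranscriptionStatus) -> bool:
    """True when a progress indicator could stop polling (assumed semantics)."""
    return status in TERMINAL_STATUSES
```

A progress indicator might call `is_terminal(parse_status(body["status"]))` to decide when to stop refreshing.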
Transcript Segment
{
"id": "segment-uuid",
"source_sequence": 1842,
"revision": 2,
"speaker_label": "speaker_1",
"person_id": "person-uuid",
"text": "Let's follow up tomorrow.",
"start_ms": 183900,
"end_ms": 185100,
"confidence": 0.94,
"is_final": true,
"feature_vector": [0.12, -0.34]
}
source_sequence is assigned by ML as a monotonic counter per transcription session. It is independent of the audio chunk sequence number: the relationship between audio chunks and transcript segments is not 1:1 (one chunk may produce zero or multiple segments). Deduplication uses (transcription_id, source_sequence, revision).
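The deduplication rule can be illustrated with a small in-memory sketch. `SegmentStore` is a hypothetical stand-in for durable storage, not Core's actual persistence layer; only the composite key comes from the text above.

```python
from typing import Dict, Tuple

# (transcription_id, source_sequence, revision)
DedupKey = Tuple[str, int, int]


class SegmentStore:
    """Hypothetical in-memory stand-in for durable segment storage."""

    def __init__(self) -> None:
        self._rows: Dict[DedupKey, dict] = {}

    def append(self, transcription_id: str, segment: dict) -> bool:
        """Store the segment; return False if this exact write was seen before."""
        key = (transcription_id, segment["source_sequence"], segment["revision"])
        if key in self._rows:
            return False  # duplicate delivery: same sequence and revision
        self._rows[key] = segment
        return True
```

Because the revision is part of the key, a retried write is dropped while a later revision of the same `source_sequence` is still accepted.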
REST API
GET /meetings/{id}/transcriptions
Lists transcriptions for a meeting (currently always 0 or 1).
| Auth | bearerAuth |
| Response | 200 TranscriptionList |
GET /transcriptions/{id}
Returns transcription metadata and processing status.
| Auth | bearerAuth |
| Response | 200 Transcription |
| Errors | 404 transcription not found |
GET /transcriptions/{id}/segments
Returns transcript segments with cursor-based pagination. Supports time-range filtering for audio-synced views and ML context recovery.
| Auth | bearerAuth or service auth |
| Response | 200 TranscriptSegmentList |
| Query params | cursor, limit (default 100, max 500), after_ms, before_ms, is_final |
The after_ms and before_ms parameters filter by segment start_ms, enabling ML to fetch recent segments for LLM context recovery after a pod restart.
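A sketch of how a caller might assemble the query string for this endpoint. The function name `segments_url`, the base URL, and the `true`/`false` encoding of `is_final` are assumptions; the parameter names and the limit bounds come from the table above.

```python
from typing import Optional
from urllib.parse import urlencode

DEFAULT_LIMIT = 100  # documented default
MAX_LIMIT = 500      # documented maximum


def segments_url(
    base_url: str,
    transcription_id: str,
    *,
    cursor: Optional[str] = None,
    limit: int = DEFAULT_LIMIT,
    after_ms: Optional[int] = None,
    before_ms: Optional[int] = None,
    is_final: Optional[bool] = None,
) -> str:
    """Build the GET /transcriptions/{id}/segments URL with optional filters."""
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    params = {"limit": limit}
    if cursor is not None:
        params["cursor"] = cursor
    if after_ms is not None:
        params["after_ms"] = after_ms
    if before_ms is not None:
        params["before_ms"] = before_ms
    if is_final is not None:
        # Assumption: booleans are sent as lowercase "true" / "false".
        params["is_final"] = "true" if is_final else "false"
    return f"{base_url}/transcriptions/{transcription_id}/segments?{urlencode(params)}"
```

For context recovery after a restart, ML could request only final segments that start after a given offset, for example `segments_url(base, tid, after_ms=120000, is_final=True)`.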
POST /transcriptions/{id}/segments — ML Write-Back
Appends live transcript segments during an active session. Used for low-latency durable writes.
| Auth | service auth |
| Idempotency | De-duplicates by (transcription_id, source_sequence, revision) — source_sequence is ML-assigned (monotonic per session), not the audio chunk sequence number |
| Response | 204 No Content |
| Side effects | Broadcasts TranscriptSegmentEvent for live clients and EntityChangedEvent { entity: "transcript_segment" } for cache revalidation |
{
"segments": [
{
"id": "segment-uuid",
"source_sequence": 1842,
"revision": 1,
"speaker_label": "speaker_1",
"person_id": null,
"text": "Let's follow up tomorrow.",
"start_ms": 183900,
"end_ms": 185100,
"confidence": 0.94,
"is_final": true
}
]
}
PUT /transcriptions/{id}/segments — ML Write-Back
Atomically replaces all transcript segments after batch transcription completes. This is the post-meeting quality pass — a new transcript version.
| Auth | service auth |
| Idempotency | Required |
| Response | 204 No Content |
| Errors | 404 transcription not found; 409 live session still active |
| Side effects | Broadcasts TranscriptRevisedEvent (not EntityChangedEvent) — clients must reload the full segment list |
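The replace semantics can be sketched as follows. This is an illustration under stated assumptions, not Core's implementation: the `Transcript` dataclass, the `live_session_active` flag, and the rule that each replacement increments `version` by one are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


class LiveSessionActive(Exception):
    """Corresponds to the documented 409 response."""


@dataclass
class Transcript:
    segments: List[dict] = field(default_factory=list)
    version: int = 1
    live_session_active: bool = False


def replace_segments(transcript: Transcript, new_segments: List[dict]) -> dict:
    """Swap the whole segment list and return the revised-event payload fields."""
    if transcript.live_session_active:
        raise LiveSessionActive("live session still active")
    # A single assignment, so readers never observe a mix of old and new segments.
    transcript.segments = list(new_segments)
    transcript.version += 1  # assumption: one version bump per replacement
    return {"segment_count": len(transcript.segments), "version": transcript.version}
```

The returned dictionary mirrors the `segment_count` and `version` fields that a revision broadcast would carry.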
PATCH /transcriptions/{id}/status — ML Write-Back
Updates processing state for the Meeting Summary progress indicator.
| Auth | service auth |
| Response | 204 No Content |
| Side effects | Inserts transcription_status_history row; broadcasts EntityChangedEvent { entity: "transcription", action: "updated" } |
{
"status": "synthesizing",
"message": "Generating summary and talking points",
"progress_percent": 75
}
Real-Time Events
Core → Browser
TranscriptSegmentEvent
Carries a full segment for immediate rendering. Interim segments are replaced in-place by later events with the same id and a higher revision.
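The in-place replacement rule can be sketched as a small client-side reducer. The function name `apply_segment_event` is hypothetical, and ignoring events whose revision is equal to or lower than the stored one is an assumption about how a client would treat stale or duplicate deliveries.

```python
from typing import Dict


def apply_segment_event(segments: Dict[str, dict], event: dict) -> bool:
    """Upsert the event's segment by id; return True if the view changed."""
    incoming = event["data"]["segment"]
    current = segments.get(incoming["id"])
    if current is not None and incoming["revision"] <= current["revision"]:
        return False  # stale or duplicate event, keep what is rendered
    segments[incoming["id"]] = incoming  # new segment, or a higher revision
    return True
```

Keying the view by segment `id` means an interim segment and its later revision occupy the same slot, so the rendered text updates without reordering.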
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-core/ws",
"type": "com.wordloop.transcript.segment.v1",
"time": "2026-05-01T09:03:05Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"transcription_id": "transcription-uuid",
"segment": {
"id": "segment-uuid",
"revision": 2,
"source_sequence": 1842,
"speaker_label": "speaker_1",
"person_id": null,
"text": "Let's follow up tomorrow.",
"start_ms": 183900,
"end_ms": 185100,
"confidence": 0.94,
"is_final": true
}
}
}
TranscriptRevisedEvent — New
Signals that the entire transcript has been replaced by a post-meeting quality pass. Clients must reload the full segment list via GET /transcriptions/{id}/segments. This replaces the ambiguous EntityChangedEvent { entity: "transcript_segment" } for the bulk replacement case.
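A sketch of the client reaction described above: on this event type, discard the local cache and reload. `on_event` and the `reload_segments` callable (standing in for a paginated GET /transcriptions/{id}/segments fetch) are hypothetical names.

```python
from typing import Callable, Dict, List

REVISED_EVENT = "com.wordloop.transcript.revised.v1"


def on_event(
    event: dict,
    cache: Dict[str, dict],
    reload_segments: Callable[[str], List[dict]],
) -> str:
    """Drop cached segments and reload them when the transcript is revised."""
    if event["type"] != REVISED_EVENT:
        return "ignored"
    fresh = reload_segments(event["data"]["transcription_id"])
    cache.clear()  # old segment ids are not assumed to survive the quality pass
    cache.update({segment["id"]: segment for segment in fresh})
    return "reloaded"
```

Clearing the cache instead of merging reflects the "reload the full segment list" instruction; merging by id would risk keeping segments that the new version no longer contains.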
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-core/ws",
"type": "com.wordloop.transcript.revised.v1",
"time": "2026-05-01T10:05:00Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"transcription_id": "transcription-uuid",
"segment_count": 812,
"version": 2
}
}
ML Integration
ML → Core
WebSocket: TranscriptSegmentProducedEvent
Emits an interim or final transcript segment. Core immediately fans this out to the app via WebSocket and persists it through Core REST/domain services.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-ml/ws",
"type": "com.wordloop.ml.transcript.segment.v1",
"time": "2026-05-01T09:03:05Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"transcription_id": "transcription-uuid",
"segment": {
"id": "segment-uuid",
"source_sequence": 1842,
"revision": 1,
"speaker_label": "speaker_1",
"person_id": null,
"text": "Let's follow up tomorrow.",
"start_ms": 183900,
"end_ms": 185100,
"confidence": 0.94,
"is_final": true
}
}
}
WebSocket: SegmentFeaturesProducedEvent
Sends feature vectors for speaker matching and later voice-profile enrichment. Core persists vectors but does not broadcast them to the browser. For how these feed the speaker identification pipeline, see Person & Speaker Identity.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-ml/ws",
"type": "com.wordloop.ml.segment_features.v1",
"time": "2026-05-01T09:03:06Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"segment_id": "segment-uuid",
"speaker_label": "speaker_1",
"embedding_model": "ecapa-tdnn-v1",
"embedding": [0.12, -0.34]
}
}
ML Batch Processing
Batch processing handles post-meeting transcription and synthesis. Pub/Sub is the normal trigger; REST provides a deterministic control surface for Core and tests.
POST /transcription-jobs/{id}/run
Starts or resumes a post-meeting transcription job.
| Auth | service auth |
| Idempotency | Required |
| Response | 202 Accepted with job status |
| Errors | 404 job unknown; 409 job already running with a different audio version |
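The response rules in the table can be sketched as a decision function. The in-memory `jobs` mapping, the `running` flag, and the choice to return 202 without restarting when the same audio version is already running are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def run_job(jobs: Dict[str, dict], job_id: str, request: dict) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, job) following the documented 404 / 409 / 202 cases."""
    job = jobs.get(job_id)
    if job is None:
        return 404, None  # job unknown
    if job["running"] and job["audio_version"] != request["audio_version"]:
        return 409, None  # already running with a different audio version
    if not job["running"]:
        # Start (or resume) the job for the requested audio version.
        job["running"] = True
        job["audio_version"] = request["audio_version"]
    # Assumption: a repeat request for the same audio version is a no-op.
    return 202, job
```

Under these assumptions, Core or a test can retry the call safely: repeating it with the same `audio_version` yields 202 again, and only a conflicting version is rejected.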
{
"meeting_id": "meeting-uuid",
"transcription_id": "transcription-uuid",
"storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
"audio_version": 2,
"task_extraction_policy": "skip",
"speaker_profile_policy": "enrich_after_completion"
}
GET /transcription-jobs/{id}
Returns ML job progress for diagnostics. Core remains the user-facing source of truth for transcription status.
| Auth | service auth |
| Response | 200 MLTranscriptionJobStatus |
{
"id": "transcription-uuid",
"meeting_id": "meeting-uuid",
"status": "transcribing",
"progress_percent": 45,
"current_stage": "batch_transcription",
"started_at": "2026-05-01T10:01:00Z",
"completed_at": null
}
Pub/Sub
transcription-jobs
Dispatches batch transcription and synthesis work to ML after an audio upload completes or after a live recording's audio has been composed into audio.webm. This is the single actionable trigger for post-meeting processing.
| Producer | Core |
| Consumer | ML post-meeting worker |
| CloudEvents type | com.wordloop.transcription.requested.v1 |
| Ordering key | meeting_id |
| Idempotency | transcription_id plus audio_version |
| Dead-letter | transcription-jobs-dlq |
{
"transcription_id": "transcription-uuid",
"meeting_id": "meeting-uuid",
"user_id": "user-uuid",
"storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
"audio_version": 2,
"source_type": "live",
"task_extraction_policy": "skip",
"speaker_profile_policy": "enrich_after_completion"
}
Valid source_type values: upload, live.
Valid task_extraction_policy values: extract, skip, replace_system. Live recordings use skip because tasks captured during the live session are preserved.
Valid speaker_profile_policy values: enrich_after_completion, skip. Controls whether ML updates voice profiles with session embeddings.
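A consumer-side sketch of the message checks described above: validate the three enumerated fields, then skip redeliveries using the documented idempotency key. The helper names and the in-memory `seen` set are hypothetical; a real worker would need durable deduplication state.

```python
from typing import List, Set, Tuple

VALID_VALUES = {
    "source_type": {"upload", "live"},
    "task_extraction_policy": {"extract", "skip", "replace_system"},
    "speaker_profile_policy": {"enrich_after_completion", "skip"},
}


def validate(message: dict) -> List[str]:
    """Return a list of problems; an empty list means the enums are valid."""
    errors = []
    for field_name, allowed in VALID_VALUES.items():
        value = message.get(field_name)
        if value not in allowed:
            errors.append(f"{field_name}: unexpected value {value!r}")
    return errors


def idempotency_key(message: dict) -> Tuple[str, int]:
    """transcription_id plus audio_version, as documented for this topic."""
    return (message["transcription_id"], message["audio_version"])


def should_process(message: dict, seen: Set[Tuple[str, int]]) -> bool:
    """False for a redelivery of a request that was already accepted."""
    key = idempotency_key(message)
    if key in seen:
        return False
    seen.add(key)
    return True
```

A new `audio_version` for the same transcription produces a different key, so a re-uploaded or recomposed audio file is processed again while plain redeliveries are dropped.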
Consumer Outcomes
| Event | Consumer outcome |
|---|---|
| transcription.requested | ML downloads audio, runs batch transcription/synthesis, writes results to Core REST, and updates the transcription status at each transition. |
| meeting.session.terminated | ML drains AssemblyAI, flushes final live segments via Core REST, and closes its ML WebSocket connection. |