
Transcription

Transcript processing lifecycle and segments — CRUD, live streaming events, batch processing, ML write-back, and Pub/Sub trigger.

A transcription tracks the processing lifecycle for a meeting's audio. Each meeting has at most one transcription. Transcript segments are the individual speaker-attributed text fragments produced during live recording and refined during post-meeting batch processing. For shared semantics, see Infrastructure.

Resource Shapes

Transcription

{
  "id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "status": "transcribing",
  "status_message": "Batch transcription in progress",
  "progress_percent": 45,
  "is_degraded": false,
  "created_at": "2026-05-01T09:00:00Z",
  "updated_at": "2026-05-01T10:01:00Z"
}

Valid statuses: pending, transcribing, synthesizing, completed, failed.

  • pending — created but processing has not started (e.g., waiting for audio upload or first byte of live audio).
  • transcribing — batch transcription and diarisation are in progress.
  • synthesizing — transcript is complete; headline, summary, topics, and talking points are being generated.
  • completed — all artefacts are final.
  • failed — processing failed; status_message carries the reason.
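A consumer of these statuses (for example, the Meeting Summary progress indicator) can model them as a closed union. The TypeScript sketch below is illustrative only. `isTranscriptionStatus`, `isEndState`, and `canAdvance` are hypothetical helpers. The forward-only ordering enforced by `canAdvance` is an assumption drawn from the order of the list above; it is not a documented transition table.

```typescript
// The five documented transcription statuses.
type TranscriptionStatus =
  | "pending"
  | "transcribing"
  | "synthesizing"
  | "completed"
  | "failed";

const ALL_STATUSES: readonly TranscriptionStatus[] = [
  "pending",
  "transcribing",
  "synthesizing",
  "completed",
  "failed",
];

// Narrows an untrusted string (e.g. from a JSON payload) to a known status.
function isTranscriptionStatus(value: string): value is TranscriptionStatus {
  return ALL_STATUSES.indexOf(value as TranscriptionStatus) !== -1;
}

// completed and failed both end a processing run.
function isEndState(status: TranscriptionStatus): boolean {
  return status === "completed" || status === "failed";
}

// ASSUMPTION: within one run, non-failure statuses only move forward in the
// listed order, and any unfinished run may fail. Not a documented rule.
const FORWARD: readonly TranscriptionStatus[] = [
  "pending",
  "transcribing",
  "synthesizing",
  "completed",
];

function canAdvance(from: TranscriptionStatus, to: TranscriptionStatus): boolean {
  if (isEndState(from)) return false;
  if (to === "failed") return true;
  return FORWARD.indexOf(to) > FORWARD.indexOf(from);
}
```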

Transcript Segment

{
  "id": "segment-uuid",
  "source_sequence": 1842,
  "revision": 2,
  "speaker_label": "speaker_1",
  "person_id": "person-uuid",
  "text": "Let's follow up tomorrow.",
  "start_ms": 183900,
  "end_ms": 185100,
  "confidence": 0.94,
  "is_final": true,
  "feature_vector": [0.12, -0.34]
}

source_sequence is assigned by ML as a monotonic counter per transcription session. It is independent of the audio chunk sequence number — the relationship between audio chunks and transcript segments is not 1:1 (one chunk may produce zero or multiple segments). Deduplication uses (transcription_id, source_sequence, revision).
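The deduplication rule can be sketched as follows. This is an in-memory TypeScript illustration, not Core's implementation. `SegmentWrite`, `dedupKey`, and `SegmentDeduper` are hypothetical names, and a real service would enforce the same key with a durable uniqueness constraint. The key point is that the audio chunk sequence number never appears in the key. Also note that a new `revision` of the same `source_sequence` is treated as a distinct write.

```typescript
// Minimal shape of an incoming segment write (fields from the resource above).
interface SegmentWrite {
  id: string;
  source_sequence: number;
  revision: number;
  text: string;
}

// Dedup key: (transcription_id, source_sequence, revision).
// The audio chunk sequence number is deliberately absent because
// chunks and segments are not 1:1.
function dedupKey(transcriptionId: string, seg: SegmentWrite): string {
  return `${transcriptionId}:${seg.source_sequence}:${seg.revision}`;
}

// HYPOTHETICAL in-memory stand-in for a durable uniqueness constraint.
class SegmentDeduper {
  private readonly seen = new Set<string>();

  // Returns only segments not seen before, preserving input order.
  accept(transcriptionId: string, batch: SegmentWrite[]): SegmentWrite[] {
    const fresh: SegmentWrite[] = [];
    for (const seg of batch) {
      const key = dedupKey(transcriptionId, seg);
      if (this.seen.has(key)) continue; // retried write: drop silently
      this.seen.add(key);
      fresh.push(seg);
    }
    return fresh;
  }
}
```

With this behaviour, a retried POST is harmless: the duplicate keys are filtered out, and the endpoint can still answer 204.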

REST API

GET /meetings/{id}/transcriptions

Lists transcriptions for a meeting (currently always 0 or 1).

Auth: bearerAuth
Response: 200 TranscriptionList

GET /transcriptions/{id}

Returns transcription metadata and processing status.

Auth: bearerAuth
Response: 200 Transcription
Errors: 404 transcription not found

GET /transcriptions/{id}/segments

Returns transcript segments with cursor-based pagination. Supports time-range filtering for audio-synced views and ML context recovery.

Auth: bearerAuth or service auth
Response: 200 TranscriptSegmentList
Query params: cursor, limit (default 100, max 500), after_ms, before_ms, is_final

The after_ms and before_ms parameters filter by segment start_ms, enabling ML to fetch recent segments for LLM context recovery after a pod restart.
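The TypeScript sketch below shows how a caller might assemble the query. `segmentsPath` is a hypothetical helper. The limit is clamped client-side to mirror the documented default and maximum; the contract does not say whether the server clamps or rejects out-of-range values, so the lower bound of 1 is also an assumption.

```typescript
// Query parameters documented for GET /transcriptions/{id}/segments.
interface SegmentQuery {
  cursor?: string;
  limit?: number;
  after_ms?: number;
  before_ms?: number;
  is_final?: boolean;
}

const DEFAULT_LIMIT = 100;
const MAX_LIMIT = 500;

// Builds the request path; undefined parameters are omitted.
function segmentsPath(transcriptionId: string, q: SegmentQuery = {}): string {
  const params: string[] = [];
  const add = (key: string, value: string | number | boolean | undefined): void => {
    if (value !== undefined) params.push(`${key}=${encodeURIComponent(String(value))}`);
  };
  // ASSUMPTION: clamp to [1, MAX_LIMIT] rather than let the server reject.
  const limit = Math.min(Math.max(q.limit ?? DEFAULT_LIMIT, 1), MAX_LIMIT);
  add("cursor", q.cursor);
  add("limit", limit);
  add("after_ms", q.after_ms);
  add("before_ms", q.before_ms);
  add("is_final", q.is_final);
  return `/transcriptions/${encodeURIComponent(transcriptionId)}/segments?${params.join("&")}`;
}
```

For context recovery after a pod restart, ML could request final segments starting after some recent offset, for example `segmentsPath(id, { after_ms: lastKnownMs, is_final: true })`. How far back to look is a tuning choice that this contract leaves open.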

POST /transcriptions/{id}/segments — ML Write-Back

Appends live transcript segments during an active session. Used for low-latency durable writes.

Auth: service auth
Idempotency: De-duplicates by (transcription_id, source_sequence, revision). source_sequence is ML-assigned (monotonic per session), not the audio chunk sequence number.
Response: 204 No Content
Side effects: Broadcasts TranscriptSegmentEvent for live clients and EntityChangedEvent { entity: "transcript_segment" } for cache revalidation

{
  "segments": [
    {
      "id": "segment-uuid",
      "source_sequence": 1842,
      "revision": 1,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  ]
}

PUT /transcriptions/{id}/segments — ML Write-Back

Atomically replaces all transcript segments after batch transcription completes. This is the post-meeting quality pass — a new transcript version.

Auth: service auth
Idempotency: Required
Response: 204 No Content
Errors: 404 transcription not found; 409 live session still active
Side effects: Broadcasts TranscriptRevisedEvent (not EntityChangedEvent); clients must reload the full segment list
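In Core, this replacement would normally run inside a single database transaction. The in-memory TypeScript sketch below only illustrates the contract's decision order: 404 first, then 409, then an all-at-once swap that produces the TranscriptRevisedEvent payload. `TranscriptState`, `replaceSegments`, and the one-step version bump are assumptions.

```typescript
// Minimal segment shape for this sketch.
interface Segment {
  id: string;
  text: string;
  start_ms: number;
  end_ms: number;
}

// HYPOTHETICAL per-transcription state; Core would keep this in its database.
interface TranscriptState {
  meetingId: string;
  liveSessionActive: boolean;
  version: number;
  segments: readonly Segment[];
}

// TranscriptRevisedEvent.data, per the event shape in this section.
interface TranscriptRevised {
  meeting_id: string;
  transcription_id: string;
  segment_count: number;
  version: number;
}

type ReplaceResult =
  | { status: 204; revised: TranscriptRevised }
  | { status: 404 }
  | { status: 409 };

function replaceSegments(
  store: Map<string, TranscriptState>,
  transcriptionId: string,
  next: readonly Segment[],
): ReplaceResult {
  const current = store.get(transcriptionId);
  if (current === undefined) return { status: 404 }; // transcription not found
  if (current.liveSessionActive) return { status: 409 }; // live session still active

  // Build the complete new state, then publish it with a single set(), so a
  // reader sees either the old list or the new one, never a mix.
  // ASSUMPTION: each replacement bumps version by one.
  const replaced: TranscriptState = {
    ...current,
    version: current.version + 1,
    segments: next.slice(),
  };
  store.set(transcriptionId, replaced);

  return {
    status: 204,
    revised: {
      meeting_id: replaced.meetingId,
      transcription_id: transcriptionId,
      segment_count: replaced.segments.length,
      version: replaced.version,
    },
  };
}
```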

PATCH /transcriptions/{id}/status — ML Write-Back

Updates processing state for the Meeting Summary progress indicator.

Auth: service auth
Response: 204 No Content
Side effects: Inserts a transcription_status_history row; broadcasts EntityChangedEvent { entity: "transcription", action: "updated" }

{
  "status": "synthesizing",
  "message": "Generating summary and talking points",
  "progress_percent": 75
}
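The two documented side effects can be sketched as a single handler. This TypeScript version is hypothetical: the row shapes are assumptions. It also assumes that the PATCH field `message` is surfaced as the resource's `status_message`, that an omitted `message` clears it, and that `progress_percent` is a 0 to 100 percentage. None of these points is spelled out by the contract.

```typescript
type TranscriptionStatus = "pending" | "transcribing" | "synthesizing" | "completed" | "failed";

// PATCH body as documented.
interface StatusPatch {
  status: TranscriptionStatus;
  message?: string;
  progress_percent?: number;
}

// HYPOTHETICAL persisted shapes for this sketch.
interface TranscriptionRow {
  id: string;
  status: TranscriptionStatus;
  status_message: string | null;
  progress_percent: number | null;
  updated_at: string;
}

interface StatusHistoryRow {
  transcription_id: string;
  status: TranscriptionStatus;
  message: string | null;
  progress_percent: number | null;
  recorded_at: string;
}

function applyStatusPatch(
  row: TranscriptionRow,
  patch: StatusPatch,
  history: StatusHistoryRow[],
  now: Date,
): { entity: "transcription"; action: "updated" } {
  const pct = patch.progress_percent;
  // ASSUMPTION: reject out-of-range percentages before mutating anything.
  if (pct !== undefined && (pct < 0 || pct > 100)) {
    throw new RangeError("progress_percent must be between 0 and 100");
  }
  const at = now.toISOString();
  const message = patch.message ?? null; // ASSUMPTION: omitted message clears it
  row.status = patch.status;
  row.status_message = message;
  row.progress_percent = pct ?? null;
  row.updated_at = at;
  // Side effect 1: append a transcription_status_history row.
  history.push({
    transcription_id: row.id,
    status: patch.status,
    message,
    progress_percent: pct ?? null,
    recorded_at: at,
  });
  // Side effect 2: EntityChangedEvent payload to broadcast (envelope omitted).
  return { entity: "transcription", action: "updated" };
}
```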

Real-Time Events

Core → Browser

TranscriptSegmentEvent

Carries a full segment for immediate rendering. Interim segments are replaced in-place by later events with the same id and a higher revision.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment": {
      "id": "segment-uuid",
      "revision": 2,
      "source_sequence": 1842,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  }
}
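On the browser side, the in-place replacement rule amounts to a map keyed by segment id that accepts a segment only when its revision is higher than the one already held. The TypeScript sketch below uses hypothetical names (`LiveSegment`, `applySegmentEvent`, `renderOrder`). Ordering the rendered output by `start_ms` is an assumption.

```typescript
// The fields of TranscriptSegmentEvent.data.segment this sketch needs.
interface LiveSegment {
  id: string;
  revision: number;
  text: string;
  start_ms: number;
  is_final: boolean;
}

// Applies one incoming segment to the client view, keyed by segment id.
// Returns true if the view changed. Equal or lower revisions are treated
// as duplicates or out-of-order deliveries and dropped.
function applySegmentEvent(view: Map<string, LiveSegment>, seg: LiveSegment): boolean {
  const existing = view.get(seg.id);
  if (existing !== undefined && existing.revision >= seg.revision) return false;
  view.set(seg.id, seg);
  return true;
}

// ASSUMPTION: segments render in start_ms order.
function renderOrder(view: Map<string, LiveSegment>): LiveSegment[] {
  return Array.from(view.values()).sort((a, b) => a.start_ms - b.start_ms);
}
```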

TranscriptRevisedEvent

Signals that the entire transcript has been replaced by a post-meeting quality pass. Clients must reload the full segment list via GET /transcriptions/{id}/segments. This replaces the ambiguous EntityChangedEvent { entity: "transcript_segment" } for the bulk replacement case.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.transcript.revised.v1",
  "time": "2026-05-01T10:05:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment_count": 812,
    "version": 2
  }
}
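A client receiving both event types on the same socket might dispatch on the CloudEvents `type`. In this TypeScript sketch, `routeTranscriptEvent` and `ClientAction` are hypothetical. One further assumption: a revised event whose `version` is not newer than the version already loaded is treated as a redelivery and ignored.

```typescript
interface IncomingEvent {
  type: string;
  data: unknown;
}

interface SegmentEventData {
  transcription_id: string;
  segment: { id: string; revision: number };
}

interface RevisedEventData {
  transcription_id: string;
  segment_count: number;
  version: number;
}

type ClientAction =
  | { kind: "upsert_segment"; segmentId: string; revision: number }
  | { kind: "reload_all"; transcriptionId: string; version: number }
  | { kind: "ignore" };

function routeTranscriptEvent(ev: IncomingEvent, loadedVersion: number): ClientAction {
  switch (ev.type) {
    case "com.wordloop.transcript.segment.v1": {
      const d = ev.data as SegmentEventData;
      return { kind: "upsert_segment", segmentId: d.segment.id, revision: d.segment.revision };
    }
    case "com.wordloop.transcript.revised.v1": {
      const d = ev.data as RevisedEventData;
      // ASSUMPTION: a version no newer than the loaded one is a redelivery.
      if (d.version <= loadedVersion) return { kind: "ignore" };
      // Refetch everything via GET /transcriptions/{id}/segments.
      return { kind: "reload_all", transcriptionId: d.transcription_id, version: d.version };
    }
    default:
      return { kind: "ignore" }; // other event types are handled elsewhere
  }
}
```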

ML Integration

ML → Core

WebSocket: TranscriptSegmentProducedEvent

Emits an interim or final transcript segment. Core immediately fans this out to the app via WebSocket and persists it through Core REST/domain services.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment": {
      "id": "segment-uuid",
      "source_sequence": 1842,
      "revision": 1,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  }
}
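Comparing this event with TranscriptSegmentEvent above, the browser-facing event differs only in its envelope (`source` and `type`), while `data` carries the same fields. The re-enveloping step can be sketched in TypeScript as below. `toBrowserSegmentEvent` is hypothetical. Minting a fresh CloudEvents `id` and propagating `traceparent` are assumptions. Persistence through Core's REST/domain services would happen separately and is not shown.

```typescript
interface SegmentCloudEvent {
  specversion: string;
  id: string;
  source: string;
  type: string;
  time: string;
  traceparent: string;
  data: {
    meeting_id: string;
    transcription_id: string;
    segment: Record<string, unknown>;
  };
}

const ML_SEGMENT_TYPE = "com.wordloop.ml.transcript.segment.v1";
const BROWSER_SEGMENT_TYPE = "com.wordloop.transcript.segment.v1";

// Re-envelopes an ML segment event for browser fan-out without mutating it.
function toBrowserSegmentEvent(ml: SegmentCloudEvent, newEventId: string): SegmentCloudEvent {
  if (ml.type !== ML_SEGMENT_TYPE) {
    throw new Error(`expected ${ML_SEGMENT_TYPE}, got ${ml.type}`);
  }
  return {
    ...ml, // specversion, time, traceparent (ASSUMPTION) and data carried over
    id: newEventId, // ASSUMPTION: Core mints its own event id
    source: "wordloop-core/ws",
    type: BROWSER_SEGMENT_TYPE,
  };
}
```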

WebSocket: SegmentFeaturesProducedEvent

Sends feature vectors for speaker matching and later voice-profile enrichment. Core persists vectors but does not broadcast them to the browser. For how these feed the speaker identification pipeline, see Person & Speaker Identity.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.segment_features.v1",
  "time": "2026-05-01T09:03:06Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "segment_id": "segment-uuid",
    "speaker_label": "speaker_1",
    "embedding_model": "ecapa-tdnn-v1",
    "embedding": [0.12, -0.34]
  }
}
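The difference in handling between the two ML events can be captured as a small routing table. The TypeScript sketch below is illustrative. `ML_EVENT_ROUTES` and `routeMlEvent` are hypothetical names, and rejecting unknown types outright is an assumption about how Core treats unexpected input.

```typescript
interface Route {
  persist: boolean;
  broadcastToBrowser: boolean;
}

// Routing implied by the two ML event descriptions above: segments are
// persisted and fanned out; feature vectors are persisted only.
const ML_EVENT_ROUTES: Record<string, Route> = {
  "com.wordloop.ml.transcript.segment.v1": { persist: true, broadcastToBrowser: true },
  "com.wordloop.ml.segment_features.v1": { persist: true, broadcastToBrowser: false },
};

// ASSUMPTION: unknown event types are rejected rather than dropped silently.
function routeMlEvent(type: string): Route {
  const route = ML_EVENT_ROUTES[type];
  if (route === undefined) {
    throw new Error(`no route for ML event type ${type}`);
  }
  return route;
}
```

Keeping the routes in a table means that adding a new ML event type requires an explicit decision about browser exposure.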

ML Batch Processing

Batch processing handles post-meeting transcription and synthesis. Pub/Sub is the normal trigger; REST provides a deterministic control surface for Core and tests.

POST /transcription-jobs/{id}/run

Starts or resumes a post-meeting transcription job.

Auth: service auth
Idempotency: Required
Response: 202 Accepted with job status
Errors: 404 job unknown; 409 job already running with a different audio version

{
  "meeting_id": "meeting-uuid",
  "transcription_id": "transcription-uuid",
  "storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
  "audio_version": 2,
  "task_extraction_policy": "skip",
  "speaker_profile_policy": "enrich_after_completion"
}
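The documented responses suggest the following decision order. The TypeScript sketch is hypothetical: `JobRecord` and `decideRun` are illustrative names. Treating a repeat request for the same `audio_version` while the job is running as a harmless 202 is an assumption that is consistent with the endpoint requiring idempotency, but it is not stated in the contract.

```typescript
// HYPOTHETICAL view of an ML job record.
interface JobRecord {
  audioVersion: number;
  running: boolean;
}

type RunDecision =
  | { status: 404 }
  | { status: 409 }
  | { status: 202; action: "start_or_resume" | "already_running" };

function decideRun(job: JobRecord | undefined, requestedAudioVersion: number): RunDecision {
  if (job === undefined) return { status: 404 }; // job unknown
  if (job.running) {
    // ASSUMPTION: same audio version while running is an idempotent repeat.
    return job.audioVersion === requestedAudioVersion
      ? { status: 202, action: "already_running" }
      : { status: 409 }; // already running with a different audio version
  }
  return { status: 202, action: "start_or_resume" };
}
```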

GET /transcription-jobs/{id}

Returns ML job progress for diagnostics. Core remains the user-facing source of truth for transcription status.

Auth: service auth
Response: 200 MLTranscriptionJobStatus

{
  "id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "status": "transcribing",
  "progress_percent": 45,
  "current_stage": "batch_transcription",
  "started_at": "2026-05-01T10:01:00Z",
  "completed_at": null
}

Pub/Sub

transcription-jobs

Dispatches batch transcription and synthesis work to ML once an audio upload completes or a live recording's audio has been composed into audio.webm. This is the single actionable trigger for post-meeting processing.

Producer: Core
Consumer: ML post-meeting worker
CloudEvents type: com.wordloop.transcription.requested.v1
Ordering key: meeting_id
Idempotency: transcription_id plus audio_version
Dead-letter: transcription-jobs-dlq

{
  "transcription_id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "user_id": "user-uuid",
  "storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
  "audio_version": 2,
  "source_type": "live",
  "task_extraction_policy": "skip",
  "speaker_profile_policy": "enrich_after_completion"
}

Valid source_type values: upload, live.

Valid task_extraction_policy values: extract, skip, replace_system. Live recordings use skip because tasks captured during the live session are preserved.

Valid speaker_profile_policy values: enrich_after_completion, skip. Controls whether ML updates voice profiles with session embeddings.
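The payload and its idempotency key can be typed as shown in the TypeScript sketch below. `ProcessedJobs` is a hypothetical in-memory guard; a real worker would use durable storage. It marks a key as done only after successful processing, so failed attempts are redelivered and can eventually reach transcription-jobs-dlq. Because the ordering key is meeting_id, messages for one meeting arrive in publish order; the sketch does not need to handle ordering itself. Defaulting uploads to `extract` is an assumption for illustration only.

```typescript
type SourceType = "upload" | "live";
type TaskExtractionPolicy = "extract" | "skip" | "replace_system";
type SpeakerProfilePolicy = "enrich_after_completion" | "skip";

// Payload of com.wordloop.transcription.requested.v1, as documented.
interface TranscriptionRequested {
  transcription_id: string;
  meeting_id: string;
  user_id: string;
  storage_path: string;
  audio_version: number;
  source_type: SourceType;
  task_extraction_policy: TaskExtractionPolicy;
  speaker_profile_policy: SpeakerProfilePolicy;
}

// Idempotency is keyed on transcription_id plus audio_version.
function idempotencyKey(msg: TranscriptionRequested): string {
  return `${msg.transcription_id}:${msg.audio_version}`;
}

// HYPOTHETICAL consumer-side guard. Check before processing; mark only
// after success so failures are retried by Pub/Sub.
class ProcessedJobs {
  private readonly done = new Set<string>();

  isDone(msg: TranscriptionRequested): boolean {
    return this.done.has(idempotencyKey(msg));
  }

  markDone(msg: TranscriptionRequested): void {
    this.done.add(idempotencyKey(msg));
  }
}

// Live recordings use "skip" so tasks captured live are preserved.
// ASSUMPTION: uploads default to "extract" (illustrative only).
function defaultTaskPolicy(source: SourceType): TaskExtractionPolicy {
  return source === "live" ? "skip" : "extract";
}
```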

Consumer Outcomes

  • transcription.requested: ML downloads audio, runs batch transcription/synthesis, writes results to Core REST, and updates status transitions.
  • meeting.session.terminated: ML drains AssemblyAI, flushes final live segments via Core REST, and closes its ML WebSocket connection.
