Infrastructure

Cross-cutting API concerns — authentication, connections, CloudEvents, error format, Pub/Sub, and failure semantics.

Shared semantics that apply across all entity contracts. Each entity page (Meeting, Recording, Audio, Transcription, Synthesis, Task, Person) documents its own endpoints and events but relies on the conventions defined here.


Core REST Semantics

| Concern | Contract |
| --- | --- |
| Auth | User-facing calls require bearerAuth (Clerk JWT). ML write-back uses service-to-service auth (signed JWT or mTLS). |
| User scoping | All resources are implicitly scoped to the authenticated user's sub claim. Queries return only that user's data; mutations on another user's resource return 403. For user-facing calls, user_id never appears in request or response bodies; it is derived from the token. Service-auth calls include user_id in the request body when acting on behalf of a user. |
| Trace context | All requests accept traceparent and tracestate headers. Core propagates trace context into WebSocket and Pub/Sub envelopes. |
| Idempotency | All POST requests require Idempotency-Key: <uuid>. Retried requests return the original result with the same status code. |
| Echo suppression | User mutations accept Client-Session-Id: <uuid>. WebSocket echoes caused by that client carry sourceClientId so the origin tab can discard them. |
| Errors | All errors return application/problem+json with RFC 9457 fields: type, title, status, detail, instance, and optional field-level errors[]. |
| Pagination | Cursor-based. Requests accept cursor and limit (default 20, max 100). Responses include next_cursor. |
| Location header | All 201 Created responses include a Location header pointing to the new resource. |
| Rate limits | User-facing responses include RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset. |
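
As a rough illustration of the header conventions above, the sketch below assembles the headers a client might send on a user-facing POST. The helper name `buildMutationHeaders` and the `MutationContext` shape are hypothetical; only the header names come from the contract.

```typescript
// Hypothetical helper: assembles headers for a user-facing POST to Core.
// Only the header names are taken from the contract; the rest is illustrative.
interface MutationContext {
  jwt: string;              // Clerk JWT used for bearerAuth
  idempotencyKey: string;   // UUID, reused unchanged on every retry of the same request
  clientSessionId?: string; // UUID identifying this tab, enables echo suppression
  traceparent?: string;     // W3C trace context, if the caller has an active trace
  tracestate?: string;
}

function buildMutationHeaders(ctx: MutationContext): Record<string, string> {
  const headers: Record<string, string> = {
    "Authorization": `Bearer ${ctx.jwt}`,
    "Content-Type": "application/json",
    "Idempotency-Key": ctx.idempotencyKey,
  };
  // Optional headers are only sent when the caller has a value for them.
  if (ctx.clientSessionId) headers["Client-Session-Id"] = ctx.clientSessionId;
  if (ctx.traceparent) headers["traceparent"] = ctx.traceparent;
  if (ctx.tracestate) headers["tracestate"] = ctx.tracestate;
  return headers;
}
```

A retry of the same logical request would call the helper again with the same `idempotencyKey`, so that Core can return the original result.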

ML REST Semantics

| Concern | Contract |
| --- | --- |
| Auth | Service-to-service auth only. Core is the normal caller. Browser credentials are never accepted. |
| Trace context | traceparent and tracestate are accepted on every request and copied into downstream provider calls (AssemblyAI, OpenAI). |
| Idempotency | Creating or draining sessions requires Idempotency-Key: <uuid>. |
| Errors | application/problem+json with RFC 9457 fields. Validation errors include field-level errors[]. |
| Timeouts | Session create: 20 seconds. Drain: 30 seconds before returning 202 Accepted. Voice operations: 30 seconds. |
| PII/audio handling | Raw audio is not persisted by ML unless explicitly part of a voice-profile enrichment operation. Live audio durability belongs to Core/GCS. |
| Location header | All 201 Created responses include a Location header. |
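
Both Core and ML report errors as application/problem+json, so a caller can share one parser. The following is a minimal sketch, not the project's actual client code. The function name `parseProblem` is hypothetical, and the shape of each `errors[]` entry (`field`, `message`) is an assumption, since this page does not define it.

```typescript
// Assumed shape of a field-level error; this page does not specify it.
interface FieldError { field: string; message: string; }

// RFC 9457 problem document with the optional errors[] extension.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  errors?: FieldError[];
}

function parseProblem(contentType: string, body: string): ProblemDetails | null {
  // Only treat the body as a problem document when the media type matches.
  if (!contentType.toLowerCase().startsWith("application/problem+json")) return null;
  let raw: unknown;
  try {
    raw = JSON.parse(body);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const p = raw as Record<string, unknown>;
  const title = p["title"];
  const status = p["status"];
  // This sketch requires title and status; RFC 9457 itself makes them optional.
  if (typeof title !== "string" || typeof status !== "number") return null;
  const type = p["type"];
  const out: ProblemDetails = {
    type: typeof type === "string" ? type : "about:blank", // RFC 9457 default
    title,
    status,
  };
  const detail = p["detail"];
  if (typeof detail === "string") out.detail = detail;
  const instance = p["instance"];
  if (typeof instance === "string") out.instance = instance;
  const errors = p["errors"];
  if (Array.isArray(errors)) out.errors = errors as FieldError[];
  return out;
}
```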

Browser WebSocket Connection

Core owns the only browser-facing WebSocket.

| Concern | Contract |
| --- | --- |
| Endpoint | GET /ws upgrade |
| Auth | token=<jwt> query parameter, or Authorization: Bearer <jwt> when the edge supports forwarding headers |
| Client identity | client_session_id=<uuid> query parameter; copied into sourceClientId on echoes caused by that client |
| Replay cursor | last_event_id=<uuid> optional. Core replays durable entity-change events after this cursor when the replay buffer has not expired. |
| Replay buffer | Core retains the last 5 minutes of durable events per user. If the client reconnects after the buffer has expired, it must do a full state re-fetch via REST. Core signals this by sending a ReplayExpiredEvent instead of replaying events. |
| Message size | JSON text frames: 64 KiB. Binary audio frames: 1 MiB. |
| Keepalive | Native WebSocket ping every 30 seconds; two missed pongs terminate the connection. |
| Load balancing | Live recording requires affinity to one Core pod for the life of the socket. The ideal edge is Layer 4 or an equivalent connection-stable route. This is a known constraint; see the backplane problem statement. |
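
To make the query parameters concrete, here is a small sketch of how a browser client might build the upgrade URL, including the optional replay cursor on reconnect. The function name `buildBrowserWsUrl`, the parameter object, and the host used in the usage note are placeholders; only the `/ws` path and the three query parameter names come from the table above.

```typescript
interface WsConnectParams {
  baseUrl: string;         // WebSocket origin of Core (placeholder value in the usage note)
  jwt: string;             // sent as token=<jwt>
  clientSessionId: string; // sent as client_session_id=<uuid>
  lastEventId?: string;    // sent as last_event_id=<uuid>; omitted on first connect
}

function buildBrowserWsUrl(p: WsConnectParams): string {
  const query: string[] = [
    `token=${encodeURIComponent(p.jwt)}`,
    `client_session_id=${encodeURIComponent(p.clientSessionId)}`,
  ];
  // The replay cursor is only meaningful on reconnect.
  if (p.lastEventId) {
    query.push(`last_event_id=${encodeURIComponent(p.lastEventId)}`);
  }
  return `${p.baseUrl}/ws?${query.join("&")}`;
}
```

For example, `buildBrowserWsUrl({ baseUrl: "wss://core.example.test", jwt, clientSessionId, lastEventId })` would be passed to the browser's `WebSocket` constructor on reconnect, with `lastEventId` set to the `id` of the last event the tab applied.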

Real-Time Pattern

Entity mutations follow the Optimistic Mutation + Echo-Suppressed Streaming pattern, and audio uses bidirectional recording streaming. REST remains the source of truth for writes; WebSocket events keep every open tab in sync and deliver low-latency recording artefacts.

CloudEvents Envelope

Every text frame is a CloudEvents v1.0 structured JSON event.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.entity.changed.v1",
  "time": "2026-05-01T09:00:00Z",
  "traceparent": "00-...",
  "tracestate": "vendor=value",
  "sourceClientId": "client-session-uuid",
  "data": {}
}

sourceClientId is present only when the event was caused by a specific UI session. The origin client discards matching echoes; other tabs and devices apply the event.
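
The echo rule can be sketched as a small filter on the receiving side. This is an illustrative sketch only: `parseEnvelope` and `shouldApply` are hypothetical names, and the validation is limited to the attributes CloudEvents v1.0 marks as required (specversion, id, source, type).

```typescript
interface CloudEventEnvelope {
  specversion: string;
  id: string;
  source: string;
  type: string;
  time?: string;
  traceparent?: string;
  tracestate?: string;
  sourceClientId?: string; // present only when a specific UI session caused the event
  data?: unknown;
}

// Parses a text frame and checks the required CloudEvents attributes.
function parseEnvelope(frame: string): CloudEventEnvelope | null {
  let raw: unknown;
  try {
    raw = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;
  const e = raw as Record<string, unknown>;
  if (e["specversion"] !== "1.0") return null;
  for (const key of ["id", "source", "type"]) {
    if (typeof e[key] !== "string") return null;
  }
  return raw as CloudEventEnvelope;
}

// True when this tab should apply the event; false when it is an echo of
// a mutation that this same tab already applied optimistically.
function shouldApply(event: CloudEventEnvelope, myClientSessionId: string): boolean {
  return event.sourceClientId !== myClientSessionId;
}
```

Events without `sourceClientId` (for example, changes caused by background processing) always pass the filter.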


ML WebSocket Connection

Core opens one WebSocket per ML live session after POST /meetings/{id}/live-session returns websocket_url. The browser never connects to ML directly — Core bridges the ML WebSocket to the browser WebSocket.

| Concern | Contract |
| --- | --- |
| Endpoint | GET /meetings/{meeting_id}/live-session/stream upgrade |
| Caller | Core only |
| Auth | Service bearer token or mTLS identity. Browser credentials are never accepted. |
| Trace context | Initial handshake includes traceparent; every CloudEvent also carries traceparent. |
| Replay cursor | Core may reconnect with last_ml_event_id and last_audio_sequence. ML de-duplicates audio by sequence and resumes output after the cursor when possible. |
| Keepalive | Native WebSocket ping every 30 seconds. Either side may close with code 1012 for service restart. |

Text Frame Envelope

All text frames are CloudEvents v1.0 structured JSON.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {}
}

Cache Invalidation

EntityChangedEvent

Generic cache-invalidation signal for single-entity mutations. For bulk operations (transcript replacement, synthesis update), entity pages define specific event types.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.entity.changed.v1",
  "time": "2026-05-01T09:05:00Z",
  "traceparent": "00-...",
  "sourceClientId": "client-session-uuid",
  "data": {
    "entity": "meeting",
    "action": "updated",
    "id": "meeting-uuid",
    "version": 42
  }
}

Valid entities: meeting, person, task, note, transcription, transcript_segment, talking_point, synthesis, speaker_state.

Valid actions: created, updated, deleted.
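
A client might validate the `data` payload against the entity and action lists above before touching its cache. The sketch below does that, and also shows one possible use of `version`: skipping events that are not newer than what the tab has already seen. That second part rests on an assumption this page does not state, namely that `version` increases with every mutation of the same entity. All names in the block are hypothetical.

```typescript
const ENTITIES = [
  "meeting", "person", "task", "note", "transcription",
  "transcript_segment", "talking_point", "synthesis", "speaker_state",
] as const;
const ACTIONS = ["created", "updated", "deleted"] as const;

type Entity = typeof ENTITIES[number];
type Action = typeof ACTIONS[number];

interface EntityChange { entity: Entity; action: Action; id: string; version: number; }

function parseEntityChange(data: unknown): EntityChange | null {
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;
  const entity = d["entity"];
  const action = d["action"];
  const id = d["id"];
  const version = d["version"];
  if (typeof entity !== "string" || (ENTITIES as readonly string[]).indexOf(entity) === -1) return null;
  if (typeof action !== "string" || (ACTIONS as readonly string[]).indexOf(action) === -1) return null;
  if (typeof id !== "string" || typeof version !== "number") return null;
  return { entity: entity as Entity, action: action as Action, id, version };
}

// Assumption: version is monotonically increasing per (entity, id).
class VersionTracker {
  private seen = new Map<string, number>();

  // Returns true and records the version when the change is newer than
  // anything seen so far for that entity; returns false otherwise.
  isNewer(change: EntityChange): boolean {
    const key = `${change.entity}:${change.id}`;
    const prev = this.seen.get(key);
    if (prev !== undefined && change.version <= prev) return false;
    this.seen.set(key, change.version);
    return true;
  }
}
```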

ReplayExpiredEvent (new)

Sent when the client reconnects with a last_event_id that is older than the replay buffer (5 minutes). The client must do a full state re-fetch via REST.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.replay.expired.v1",
  "time": "2026-05-01T09:35:00Z",
  "traceparent": "00-...",
  "data": {
    "last_event_id": "stale-event-uuid",
    "buffer_ttl_seconds": 300,
    "message": "Replay buffer expired. Full state re-fetch required."
  }
}
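
On the client, the two Core event types shown so far lead to different actions: an entity change is applied to the cache, while a replay-expired signal triggers a full REST re-fetch. A minimal routing sketch, with the `ClientAction` type and `routeCoreEvent` name invented for illustration:

```typescript
type ClientAction =
  | { kind: "apply_event" }
  | { kind: "full_refetch"; staleEventId: string | null }
  | { kind: "ignore" };

function routeCoreEvent(type: string, data: unknown): ClientAction {
  switch (type) {
    case "com.wordloop.replay.expired.v1": {
      // The cursor is informational here; the re-fetch happens regardless.
      const d = (typeof data === "object" && data !== null ? data : {}) as Record<string, unknown>;
      const stale = d["last_event_id"];
      return { kind: "full_refetch", staleEventId: typeof stale === "string" ? stale : null };
    }
    case "com.wordloop.entity.changed.v1":
      return { kind: "apply_event" };
    default:
      // Event types this client does not know about are skipped.
      return { kind: "ignore" };
  }
}
```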

Browser Reconnection Rules

| Scenario | Contract |
| --- | --- |
| Browser loses socket (< 5 min) | App reconnects with last_event_id. Core replays buffered events. If a recording is active, app also sends ResumeRecordingCommand. |
| Browser loses socket (> 5 min) | Core sends ReplayExpiredEvent. App does a full REST re-fetch. If a recording is active, app sends ResumeRecordingCommand for gap recovery. |
| Audio frames duplicated after reconnect | Core de-duplicates by (meeting_id, sequence) and checksum. |
| ML stream drops but Core socket remains | Core emits RecordingErrorEvent { code: "ml_unavailable", severity: "degraded" }; audio still writes to GCS. On recovery, emits RecordingErrorEvent { code: "ml_recovered" }. |
| Core drains for deploy | Core sends RecordingErrorEvent { code: "backpressure" } or closes after ping timeout; OPFS gap repair restores missing chunks on reconnect. |
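
The audio de-duplication rule can be illustrated with an in-memory sketch keyed on (meeting_id, sequence). How Core treats a frame whose key matches but whose checksum differs is not specified on this page; the sketch reports it as a separate "conflict" outcome, which is an assumption. The class name `AudioFrameDeduper` is hypothetical, and a real implementation would need bounded or durable storage.

```typescript
type DedupResult = "accept" | "duplicate" | "conflict";

class AudioFrameDeduper {
  // Maps "meeting_id:sequence" to the checksum of the first frame seen.
  private seen = new Map<string, string>();

  check(meetingId: string, sequence: number, checksum: string): DedupResult {
    const key = `${meetingId}:${sequence}`;
    const previous = this.seen.get(key);
    if (previous === undefined) {
      this.seen.set(key, checksum);
      return "accept";
    }
    // Same key and same bytes: a harmless resend after reconnect.
    // Same key but different bytes: flagged for the caller to decide (assumed behaviour).
    return previous === checksum ? "duplicate" : "conflict";
  }
}
```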

Pub/Sub Semantics

Pub/Sub is for durable asynchronous work — not the live path. Live audio and ML outputs use WebSockets. Pub/Sub coordinates post-meeting processing, session termination/drain, and retryable background jobs. Individual topics are documented on their entity pages (Transcription, Recording).

All Pub/Sub payloads are CloudEvents v1.0 JSON.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/pubsub",
  "type": "com.wordloop.transcription.requested.v1",
  "time": "2026-05-01T10:00:00Z",
  "traceparent": "00-...",
  "tracestate": "vendor=value",
  "data": {}
}

| Concern | Contract |
| --- | --- |
| Delivery | At least once. Consumers must de-duplicate by CloudEvents id and business idempotency keys. |
| Ordering key | meeting_id for all topics. Ensures events for the same meeting are processed in order within a single subscriber. |
| Publishing | Core publishes through a transactional outbox: the event is written to an outbox table within the same database transaction as the state change, then delivered by a background relay. This guarantees at-least-once delivery without two-phase commit. |
| Traceability | traceparent is required whenever the originating HTTP/WebSocket request carried one. |
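
Because delivery is at least once, a consumer has to tolerate seeing the same CloudEvents id twice. The sketch below shows the id-based half of that rule with an in-memory set; it does not cover business idempotency keys, and a production consumer would need a durable store shared across instances. `IdempotentConsumer` and `PubSubEvent` are illustrative names, not part of the contract.

```typescript
interface PubSubEvent { id: string; type: string; data: unknown; }

class IdempotentConsumer {
  private processed = new Set<string>();
  private handler: (event: PubSubEvent) => void;

  constructor(handler: (event: PubSubEvent) => void) {
    this.handler = handler;
  }

  // Returns true when the handler ran, false when the event was a duplicate.
  // In both cases the caller can ack the message.
  receive(event: PubSubEvent): boolean {
    if (this.processed.has(event.id)) return false;
    // If the handler throws, the id is not recorded, so a redelivery retries the work.
    this.handler(event);
    this.processed.add(event.id);
    return true;
  }
}
```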

Dead-Letter and Retry Configuration

| Setting | Value | Rationale |
| --- | --- | --- |
| Max delivery attempts | 10 | Covers transient failures without infinite retry. |
| Initial backoff | 1 second | Fast retry for network blips. |
| Max backoff | 600 seconds (10 min) | Caps exponential growth. |
| Backoff multiplier | 2 | Standard exponential. |
| Dead-letter topic suffix | -dlq (e.g., transcription-jobs-dlq) | One DLQ per source topic. |
| DLQ retention | 14 days | Enough time for manual investigation and replay. |
| Ack deadline | 600 seconds | Long enough for batch transcription jobs. |

When a message exhausts its retry budget, Pub/Sub forwards it to the dead-letter topic. The DLQ subscription has no automatic consumers — an operator (or future automated triage) reviews and replays failed messages.
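
To see what the retry settings imply in practice, the following sketch computes the delay before each redelivery under a simplified model: pure exponential growth with a cap and no jitter. The managed service may space retries differently, so the numbers are indicative only. `backoffSchedule` is a hypothetical helper.

```typescript
// Returns the delay (in seconds) before each retry. With N delivery attempts
// there are N - 1 retries, hence N - 1 delays.
function backoffSchedule(
  maxAttempts: number,
  initialSeconds: number,
  multiplier: number,
  maxSeconds: number,
): number[] {
  const delays: number[] = [];
  let delay = initialSeconds;
  for (let retry = 1; retry < maxAttempts; retry++) {
    delays.push(Math.min(delay, maxSeconds));
    delay *= multiplier;
  }
  return delays;
}

// Values from the table above.
const delays = backoffSchedule(10, 1, 2, 600);
// delays is [1, 2, 4, 8, 16, 32, 64, 128, 256], which sums to 511 seconds.
```

Under this model a message reaches the dead-letter topic after roughly eight and a half minutes of backoff plus processing time, and the 600-second cap is never reached within 10 attempts; it would only apply if the attempt limit were raised.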


ML Stream Health

StreamWarningEvent

Reports recoverable ML-side degradation that doesn't rise to backpressure. For audio-specific backpressure events, see Audio.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.stream.warning.v1",
  "time": "2026-05-01T09:06:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "code": "insight_warning",
    "message": "Talking points are delayed; transcription continues."
  }
}

ML Failure Semantics

| Failure | Contract |
| --- | --- |
| ML WebSocket disconnects | Core reconnects with last_audio_sequence and last_ml_event_id. ML de-duplicates audio and resumes output when possible. Core sends StreamStartEvent with current speaker states and voice profiles on every reconnect. |
| ML cannot reconnect | Core continues browser audio capture and GCS chunk storage, then emits RecordingErrorEvent { code: "ml_unavailable" }. |
| Upstream transcription provider slows down | ML emits BackpressureEvent; Core preserves audio and may pause live insights. ML emits BackpressureClearedEvent on recovery. |
| Speaker state changes while disconnected | Core persists the state to the database. On reconnect, Core sends StreamStartEvent with all current speaker states, so ML reconstructs its in-memory map without needing a pull endpoint. |
| ML pod restarts mid-session | Core detects the WebSocket drop and reconnects (possibly to a new pod). StreamStartEvent includes speaker states and voice profiles. ML fetches recent transcript segments from GET /transcriptions/{id}/segments?after_ms=... to rebuild its LLM context window, then resumes processing. Context quality degrades gracefully: the rolling buffer rebuilds over subsequent segments. |
| Drain exceeds budget | ML returns 202 Accepted and later writes results back via Core REST as background processing completes. |

Event Versioning Policy

All CloudEvents types use a .v1 suffix (e.g., com.wordloop.recording.start.v1). The versioning policy:

  • Additive changes (new optional fields, new event types) do not require a version bump. Consumers must ignore unknown fields.
  • Breaking changes (removed fields, changed semantics, changed required fields) require a new version suffix (.v2). The old type continues to be emitted alongside the new type for one release cycle to allow consumer migration.
  • Deprecation: A deprecated event type is annotated in the contract docs but continues to fire until all known consumers have migrated.

Consumers should be written defensively: parse known fields, ignore unknown fields, and tolerate missing optional fields.
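
One way a consumer could act on this policy is to split the version suffix from the event type and check it against the versions it knows how to handle, skipping anything else. The sketch below assumes a hypothetical `SUPPORTED` registry; the two entries in it are examples, not a complete list of event types.

```typescript
interface ParsedType { name: string; version: number; }

// Splits "com.wordloop.entity.changed.v1" into its name and numeric version.
function parseEventType(type: string): ParsedType | null {
  const match = /^(.+)\.v(\d+)$/.exec(type);
  if (match === null) return null;
  const name = match[1];
  const version = match[2];
  if (name === undefined || version === undefined) return null;
  return { name, version: Number(version) };
}

// Hypothetical registry of the versions this consumer understands.
const SUPPORTED: Record<string, number[]> = {
  "com.wordloop.entity.changed": [1],
  "com.wordloop.replay.expired": [1],
};

function canHandle(type: string): boolean {
  const parsed = parseEventType(type);
  if (parsed === null) return false;
  const versions = SUPPORTED[parsed.name];
  return versions !== undefined && versions.indexOf(parsed.version) !== -1;
}
```

During a migration window in which both `.v1` and `.v2` of a type are emitted, a consumer built this way keeps processing `.v1` and skips `.v2` until its registry is updated.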


Observability Conventions

Every service must include the following fields in structured log output for any operation related to a live recording session:

| Field | When present | Source |
| --- | --- | --- |
| meeting_id | Always | From the request or event |
| ml_session_id | During active recording | From RecordingStartedEvent or ML session |
| sequence | Audio chunk operations | From the chunk metadata |
| transcription_id | Transcript operations | From the transcription resource |
| traceparent | Always | From the incoming request/event |

These fields enable correlation of a single audio chunk or transcript segment across App → Core → ML → AssemblyAI → ML → Core → App, plus GCS writes and Pub/Sub messages.
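
A typed helper is one way to keep these fields consistent across call sites. The sketch below makes the two always-present fields mandatory and the rest optional; the `level` and `message` keys, the JSON-lines output, and the `formatRecordingLog` name are assumptions rather than part of the convention.

```typescript
interface RecordingLogFields {
  meeting_id: string;        // always present
  traceparent: string;       // always present
  ml_session_id?: string;    // during active recording
  sequence?: number;         // audio chunk operations
  transcription_id?: string; // transcript operations
}

type LogLevel = "info" | "warn" | "error";

// Produces one JSON log line. Optional fields that are not set are left
// out of the output, because JSON.stringify skips undefined properties.
function formatRecordingLog(level: LogLevel, message: string, fields: RecordingLogFields): string {
  return JSON.stringify({ level, message, ...fields });
}
```

For example, `formatRecordingLog("info", "chunk stored", { meeting_id, traceparent, sequence: 17 })` yields a line that can be joined with ML and Pub/Sub logs on `meeting_id` and `traceparent`.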


Recording Event History

Core persists a recording_event_history table that logs every recording state transition and significant event:

| Column | Type | Description |
| --- | --- | --- |
| id | UUID | Event ID |
| meeting_id | UUID | Meeting reference |
| event_type | text | e.g., started, stopped, error, gap_upload, compose_started, compose_completed |
| from_status | text | Previous recording status (nullable for initial events) |
| to_status | text | New recording status |
| metadata | jsonb | Event-specific data (error codes, sequence numbers, chunk counts) |
| created_at | timestamptz | When the event occurred |

This table is write-only during normal operation. It is the primary diagnostic tool for investigating recording issues in production.
