Person & Speaker Identity

People CRUD, speaker identification pipeline, voice profiles, and ML speaker matching events.

People are speaker identities. They can be referenced by tasks (assignee) and transcript segments (speaker attribution). This page covers person CRUD, the speaker identification pipeline that resolves anonymous diarisation labels to known people, and voice profile management. For shared semantics, see Infrastructure.

User-scoped identity: People are scoped to the authenticated user. Each user maintains their own set of people, with no cross-user sharing of person records or voice profiles. If User A records a meeting with Person X and User B later records a meeting with the same real-world person, User B must create their own Person record. Voice profile enrichment applies only within the owning user's data. This is a deliberate simplification for v1; organisation-level identity sharing is out of scope.

Resource Shape

{
  "id": "person-uuid",
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "title": "Product Manager",
  "role": "Product",
  "company": "WordLoop",
  "email": "avery@example.com",
  "voice_confidence": 0.91,
  "voice_model_status": "ready",
  "tags": ["team-alpha"],
  "created_at": "2026-04-15T10:00:00Z",
  "updated_at": "2026-05-01T09:15:00Z"
}

Valid voice_model_status values: untrained, training, ready, failed.
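
A minimal Python sketch of how a client might model this resource. It assumes that only id, display_name and the two timestamps are always present, and that a missing voice_model_status means untrained; neither rule is stated above. Parsing rejects any status outside the four documented values.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class VoiceModelStatus(str, Enum):
    """The four documented voice_model_status values."""
    UNTRAINED = "untrained"
    TRAINING = "training"
    READY = "ready"
    FAILED = "failed"


@dataclass
class Person:
    id: str
    display_name: str
    created_at: str
    updated_at: str
    full_name: Optional[str] = None
    title: Optional[str] = None
    role: Optional[str] = None
    company: Optional[str] = None
    email: Optional[str] = None
    voice_confidence: Optional[float] = None
    # Assumption: a missing status is treated as untrained.
    voice_model_status: VoiceModelStatus = VoiceModelStatus.UNTRAINED
    tags: List[str] = field(default_factory=list)

    @classmethod
    def from_json(cls, data: dict) -> "Person":
        # VoiceModelStatus(...) raises ValueError for any undocumented status.
        status = VoiceModelStatus(data.get("voice_model_status", "untrained"))
        optional = ("full_name", "title", "role", "company", "email", "voice_confidence")
        return cls(
            id=data["id"],
            display_name=data["display_name"],
            created_at=data["created_at"],
            updated_at=data["updated_at"],
            voice_model_status=status,
            tags=list(data.get("tags", [])),
            **{key: data.get(key) for key in optional},
        )
```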

REST API

GET /people

Lists people for the authenticated user. Used for the speaker-labelling autocomplete.

Auth: bearerAuth
Response: 200 PersonList
Query params: cursor, limit, q (search by name or email)
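
The paging and search semantics are not pinned down on this page. As an illustration only, the sketch below assumes keyset pagination ordered by id, an opaque URL-safe base64 cursor holding the last returned id, and a case-insensitive substring match for q over display_name, full_name and email. The function names, the default limit of 50, and the in-memory list standing in for storage are all hypothetical.

```python
import base64
from typing import Dict, List, Optional, Tuple


def encode_cursor(last_id: str) -> str:
    # Opaque to clients: URL-safe base64 of the last id on the page.
    return base64.urlsafe_b64encode(last_id.encode("utf-8")).decode("ascii")


def decode_cursor(cursor: str) -> str:
    return base64.urlsafe_b64decode(cursor.encode("ascii")).decode("utf-8")


def list_people(
    people: List[Dict],
    q: Optional[str] = None,
    cursor: Optional[str] = None,
    limit: int = 50,
) -> Tuple[List[Dict], Optional[str]]:
    """Return one page of people and the cursor for the next page (None if last)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    rows = sorted(people, key=lambda p: p["id"])  # stable keyset order
    if q:
        needle = q.casefold()
        rows = [
            p for p in rows
            if any(needle in (p.get(f) or "").casefold()
                   for f in ("display_name", "full_name", "email"))
        ]
    if cursor:
        after = decode_cursor(cursor)
        rows = [p for p in rows if p["id"] > after]
    page = rows[:limit]
    # Only hand out a cursor when more rows remain beyond this page.
    next_cursor = encode_cursor(page[-1]["id"]) if len(rows) > limit else None
    return page, next_cursor
```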

POST /people

Creates a person. Used during speaker labelling when the user adds a new person.

Auth: bearerAuth
Idempotency: Required
Response: 201 Created with Person + Location: /people/{id}
Side effects: Broadcasts EntityChangedEvent { entity: "person", action: "created" }

Request:

{
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "email": "avery@example.com"
}
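
Because this endpoint requires idempotency, a client should attach a key and reuse it across retries of the same logical create. The sketch below builds (but does not send) the request with Python's urllib. The Idempotency-Key header name, the helper name and the base URL are assumptions; the shared idempotency rules live on the Infrastructure page.

```python
import json
import uuid
import urllib.request
from typing import Optional


def build_create_person_request(
    base_url: str,
    token: str,
    body: dict,
    idempotency_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /people request.

    Reuse the same idempotency_key when retrying the same logical create so the
    server can recognise the retry instead of creating a duplicate person.
    """
    key = idempotency_key or str(uuid.uuid4())
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/people",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Assumed header name; see the Infrastructure page for the real rule.
            "Idempotency-Key": key,
        },
    )
```

Note that urllib stores header names with only the first letter capitalised, so the key reads back via get_header("Idempotency-key").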

GET /people/{id}

Returns a single person.

Auth: bearerAuth
Response: 200 Person
Errors: 404 person not found

PATCH /people/{id}

Updates person metadata.

Auth: bearerAuth
Response: 200 Person
Side effects: Broadcasts EntityChangedEvent { entity: "person", action: "updated" }

DELETE /people/{id}

Deletes a person. Transcript segments attributed to the person keep their speaker_label, but their person_id is cleared.

Auth: bearerAuth
Response: 204 No Content
Side effects: Broadcasts EntityChangedEvent { entity: "person", action: "deleted" }
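
The delete semantics can be sketched against in-memory records. The Segment type and delete_person helper are hypothetical; the point is that only person_id is cleared while speaker_label stays intact. How the API reports deleting a nonexistent person is not specified above, so the helper simply returns False.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    id: str
    speaker_label: str          # raw diarisation label, e.g. "speaker_1"
    person_id: Optional[str]    # resolved person, if any


def delete_person(people: Dict[str, dict], segments: List[Segment], person_id: str) -> bool:
    """Delete a person and detach them from transcript segments.

    Returns False if the person does not exist.
    """
    if person_id not in people:
        return False
    for segment in segments:
        if segment.person_id == person_id:
            segment.person_id = None  # speaker_label is deliberately left as-is
    del people[person_id]
    return True
```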

Speaker Identification Pipeline

During a live recording, AssemblyAI produces diarised transcript segments with anonymous labels (speaker_1, speaker_2). ML resolves these to known people through voice embedding comparison. The pipeline has four states:

State | Behaviour
unmatched | Compare this segment's embedding against the in-session voice profiles (pushed by Core). If the match confidence exceeds the threshold, transition to matched. Otherwise, increment the attempt count and retry on the next segment from this speaker.
matched | The speaker label is locked to a person. All future segments from this speaker are tagged immediately; no further voice comparison is needed.
exhausted | After N failed attempts (configurable, e.g. 5 segments), stop comparing for this speaker. The raw speaker_label is preserved, and the user can resolve it manually.
manual | Set when the user labels a speaker via POST /meetings/{id}/speaker-labels (see Meeting). Takes precedence over voice matching: ML will not attempt to match this speaker regardless of voice similarity.

Manual speaker labelling is documented on the Meeting page. The REST fallback for pushing speaker state to ML during session recovery is documented on the Recording page (POST /meetings/{id}/live-session/speaker-states).
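
The four states form a small per-speaker state machine. The sketch below is one way to model it, not the ML service's implementation: SpeakerTracker is a hypothetical name, the defaults of 5 attempts and a 0.88 threshold are the example values from this page, and best_match is a hypothetical callback that performs the voice comparison and is only invoked while the speaker is unmatched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class SpeakerState(str, Enum):
    UNMATCHED = "unmatched"
    MATCHED = "matched"
    EXHAUSTED = "exhausted"
    MANUAL = "manual"


@dataclass
class SpeakerTracker:
    """Per-speaker state for one diarisation label within a live session."""
    max_attempts: int = 5      # "N" in the table; 5 is the example value
    threshold: float = 0.88    # example threshold from the events on this page
    state: SpeakerState = SpeakerState.UNMATCHED
    person_id: Optional[str] = None
    attempts: int = 0

    def on_segment(self, best_match: Callable[[], Tuple[Optional[str], float]]) -> Optional[str]:
        """Handle one segment; return the person_id to tag it with, or None."""
        if self.state in (SpeakerState.MATCHED, SpeakerState.MANUAL):
            return self.person_id  # locked: no voice comparison
        if self.state is SpeakerState.EXHAUSTED:
            return None  # keep the raw speaker_label until the user resolves it
        person_id, score = best_match()
        if person_id is not None and score > self.threshold:
            self.state, self.person_id = SpeakerState.MATCHED, person_id
            return person_id
        self.attempts += 1
        if self.attempts >= self.max_attempts:
            self.state = SpeakerState.EXHAUSTED
        return None

    def label_manually(self, person_id: str) -> None:
        # A manual label wins from any state, including exhausted.
        self.state, self.person_id = SpeakerState.MANUAL, person_id
```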


ML Integration

Core → ML

WebSocket: SpeakerStateUpdatedEvent

Keeps ML aligned with the user's speaker-label changes during the live session. The manual state takes precedence over voice matching.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.speaker_state.updated.v1",
  "time": "2026-05-01T09:15:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "state": "manual",
    "person_id": "person-uuid"
  }
}
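
A minimal sketch of how Core might build this envelope. The field values mirror the example above; the helper name is hypothetical, traceparent is taken as an input rather than generated here, and the rule that a manual state must carry a person_id is an assumption.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional

SPEAKER_STATES = {"unmatched", "matched", "exhausted", "manual"}


def speaker_state_updated_event(
    meeting_id: str,
    speaker_label: str,
    state: str,
    person_id: Optional[str],
    traceparent: str,
) -> dict:
    """Build a SpeakerStateUpdatedEvent envelope matching the example above."""
    if state not in SPEAKER_STATES:
        raise ValueError(f"unknown speaker state: {state!r}")
    if state == "manual" and person_id is None:
        # Assumption: a manual label always names a person.
        raise ValueError("manual state requires a person_id")
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "wordloop-core/ml-ws",
        "type": "com.wordloop.ml.speaker_state.updated.v1",
        "time": now.replace("+00:00", "Z"),  # RFC 3339 UTC, e.g. 2026-05-01T09:15:00Z
        "traceparent": traceparent,
        "data": {
            "meeting_id": meeting_id,
            "speaker_label": speaker_label,
            "state": state,
            "person_id": person_id,
        },
    }
```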

WebSocket: VoiceProfilesUpdatedEvent

Refreshes the in-session voice profile cache when Core enrolls or updates a profile while a recording is active.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.voice_profiles.updated.v1",
  "time": "2026-05-01T09:16:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "profiles": [
      {
        "person_id": "person-uuid",
        "embedding_model": "ecapa-tdnn-v1",
        "embedding": [0.12, -0.34]
      }
    ]
  }
}
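
On the ML side, this update can be applied to a per-meeting cache. The sketch assumes the event is a merge (profiles not listed are kept) rather than a full replacement, which the description above does not settle; VoiceProfileCache is a hypothetical name. Candidates are filtered by embedding_model because embeddings from different models are not directly comparable.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

VOICE_PROFILES_UPDATED = "com.wordloop.ml.voice_profiles.updated.v1"


@dataclass
class VoiceProfileCache:
    # meeting_id -> person_id -> (embedding_model, embedding)
    by_meeting: Dict[str, Dict[str, Tuple[str, List[float]]]] = field(default_factory=dict)

    def apply_update(self, event: dict) -> None:
        if event.get("type") != VOICE_PROFILES_UPDATED:
            raise ValueError(f"unexpected event type: {event.get('type')!r}")
        data = event["data"]
        profiles = self.by_meeting.setdefault(data["meeting_id"], {})
        for profile in data["profiles"]:
            # Upsert: a newer embedding for the same person replaces the old one.
            profiles[profile["person_id"]] = (profile["embedding_model"], list(profile["embedding"]))

    def candidates(self, meeting_id: str, embedding_model: str) -> Dict[str, List[float]]:
        """Profiles comparable with an embedding produced by embedding_model."""
        return {
            person_id: embedding
            for person_id, (model, embedding) in self.by_meeting.get(meeting_id, {}).items()
            if model == embedding_model
        }
```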

ML → Core

WebSocket: SpeakerMatchProducedEvent

Reports a confident speaker-to-person match. Core attributes every segment with that speaker_label to the person and persists the speaker's entry in meeting_speaker_states as matched.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_match.v1",
  "time": "2026-05-01T09:04:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "person_id": "person-uuid",
    "score": 0.93,
    "threshold": 0.88,
    "state": "matched"
  }
}
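
A sketch of Core's handling of this event, using in-memory stand-ins for transcript segments and meeting_speaker_states (keyed here by meeting_id and speaker_label, which is an assumption). Skipping the update when the speaker is already manual is also an assumption: it guards against a match that was in flight when the user labelled the speaker.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Segment:
    id: str
    meeting_id: str
    speaker_label: str
    person_id: Optional[str] = None


SpeakerKey = Tuple[str, str]  # (meeting_id, speaker_label)


def apply_speaker_match(
    event: dict,
    segments: List[Segment],
    speaker_states: Dict[SpeakerKey, dict],
) -> int:
    """Apply a SpeakerMatchProducedEvent; return how many segments were tagged."""
    data = event["data"]
    key = (data["meeting_id"], data["speaker_label"])
    if speaker_states.get(key, {}).get("state") == "manual":
        return 0  # assumption: the user's manual label wins over a late match
    speaker_states[key] = {"state": "matched", "person_id": data["person_id"], "score": data["score"]}
    tagged = 0
    for segment in segments:
        if (segment.meeting_id, segment.speaker_label) == key:
            segment.person_id = data["person_id"]
            tagged += 1
    return tagged
```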

WebSocket: SpeakerExhaustedEvent

Tells Core that ML has stopped trying to match an unknown speaker after the bounded attempt count.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_exhausted.v1",
  "time": "2026-05-01T09:08:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_2",
    "attempt_count": 5,
    "state": "exhausted"
  }
}
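
Handling the exhausted event only touches speaker state; segments keep their raw speaker_label. In this hypothetical sketch, a speaker that is already matched or manual is not downgraded, which is an assumption about how Core resolves out-of-order events.

```python
from typing import Dict, Tuple


def apply_speaker_exhausted(event: dict, speaker_states: Dict[Tuple[str, str], dict]) -> bool:
    """Record that ML gave up on a speaker. Segments are not touched."""
    data = event["data"]
    if data.get("state") != "exhausted":
        raise ValueError(f"expected state 'exhausted', got {data.get('state')!r}")
    key = (data["meeting_id"], data["speaker_label"])
    if speaker_states.get(key, {}).get("state") in ("matched", "manual"):
        return False  # assumption: never downgrade an already resolved speaker
    speaker_states[key] = {"state": "exhausted", "attempt_count": data["attempt_count"]}
    return True
```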

Voice Profile Operations

Voice profiles power speaker identification. Core stores person records; ML owns embedding extraction and matching semantics.

POST /voice-profiles/matches

Compares a speaker embedding against enrolled voice profiles. Core supplies the candidate profiles explicitly via candidate_person_ids.

Auth: service auth
Response: 200 VoiceMatchResponse
Errors: 422 invalid embedding; 503 embedding model unavailable

Request:

{
  "meeting_id": "meeting-uuid",
  "speaker_label": "speaker_1",
  "embedding_model": "ecapa-tdnn-v1",
  "embedding": [0.12, -0.34],
  "candidate_person_ids": ["person-uuid"],
  "top_k": 3
}

Response:

{
  "matches": [
    {
      "person_id": "person-uuid",
      "score": 0.93,
      "threshold": 0.88,
      "decision": "matched"
    }
  ]
}
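
The scoring function belongs to ML and is not specified on this page. As an illustration, the sketch below assumes cosine similarity (a common choice for speaker embeddings such as ECAPA-TDNN), a decision of matched when the score exceeds the threshold, and unmatched otherwise. The unmatched value and the helper names are hypothetical. A ValueError raised here would plausibly map to the 422 invalid-embedding error.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("embedding dimension mismatch")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embedding has zero magnitude")
    return dot / norm


def match_voice(
    embedding: Sequence[float],
    candidates: Dict[str, List[float]],
    threshold: float,
    top_k: int = 3,
) -> dict:
    """Score candidates and return the top_k in VoiceMatchResponse shape."""
    scored = sorted(
        ((person_id, cosine_similarity(embedding, profile)) for person_id, profile in candidates.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return {
        "matches": [
            {
                "person_id": person_id,
                "score": round(score, 4),
                "threshold": threshold,
                # "unmatched" is a hypothetical value; only "matched" appears above.
                "decision": "matched" if score > threshold else "unmatched",
            }
            for person_id, score in scored[:top_k]
        ]
    }
```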

POST /voice-profiles

Creates or enriches a person's voice profile from post-meeting segment embeddings.

Auth: service auth
Idempotency: Required
Request Content-Type: multipart/form-data for audio samples, or application/json for segment references
Response: 201 Created (new profile) or 200 OK (existing profile enriched) with VoiceProfile + Location: /voice-profiles/{person_id}

Request:

{
  "person_id": "person-uuid",
  "meeting_id": "meeting-uuid",
  "segment_ids": ["segment-uuid"],
  "embedding_model": "ecapa-tdnn-v1"
}

Response:

{
  "person_id": "person-uuid",
  "embedding_model": "ecapa-tdnn-v1",
  "sample_count": 12,
  "quality_score": 0.91,
  "updated_at": "2026-05-01T10:15:00Z"
}
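
How ML folds new samples into a profile is not described here. One simple approach, assumed for illustration, keeps a running mean (centroid) of segment embeddings so that sample_count and the centroid can be updated without storing every sample. The VoiceProfile type and enrich_profile helper are hypothetical, and quality_score is not modelled. The returned flag could drive the 201 versus 200 response.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceProfile:
    person_id: str
    embedding_model: str
    centroid: List[float] = field(default_factory=list)
    sample_count: int = 0


def enrich_profile(profile: VoiceProfile, embeddings: List[List[float]], embedding_model: str) -> bool:
    """Fold segment embeddings into the profile's running mean.

    Returns True if the profile had no samples before this call (a create),
    False if an existing profile was enriched.
    """
    if embedding_model != profile.embedding_model:
        raise ValueError("cannot mix embeddings from different models")
    created = profile.sample_count == 0
    for embedding in embeddings:
        if not profile.centroid:
            profile.centroid = [0.0] * len(embedding)
        elif len(embedding) != len(profile.centroid):
            raise ValueError("embedding dimension mismatch")
        profile.sample_count += 1
        n = profile.sample_count
        # Incremental mean: c_n = c_(n-1) + (x - c_(n-1)) / n
        profile.centroid = [c + (x - c) / n for c, x in zip(profile.centroid, embedding)]
    return created
```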
