People CRUD, speaker identification pipeline, voice profiles, and ML speaker matching events.

Person & Speaker Identity

People are speaker identities. They can be referenced by tasks (assignee) and transcript segments (speaker attribution). This page covers person CRUD, the speaker identification pipeline that resolves anonymous diarisation labels to known people, and voice profile management. For shared semantics, see Infrastructure.

User-scoped identity: People are scoped to the authenticated user. Each user maintains their own set of people — there is no cross-user sharing of person records or voice profiles. If User A records a meeting with Person X, and User B later records with the same real-world person, User B must create their own Person record. Voice profile enrichment applies only within the owning user's data. This is a deliberate simplification for v1; organisation-level identity sharing is out of scope.

Resource Shape

{
  "id": "person-uuid",
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "title": "Product Manager",
  "role": "Product",
  "company": "WordLoop",
  "email": "avery@example.com",
  "voice_confidence": 0.91,
  "voice_model_status": "ready",
  "tags": ["team-alpha"],
  "created_at": "2026-04-15T10:00:00Z",
  "updated_at": "2026-05-01T09:15:00Z"
}

Valid voice_model_status values: untrained, training, ready, failed.
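
A client might model this shape roughly as follows. This is a minimal sketch: the `Person` class and `parse_person` helper are illustrative names rather than part of the API, and treating every field other than `id`, `display_name`, and `voice_model_status` as optional is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Status values listed in this section.
VOICE_MODEL_STATUSES = {"untrained", "training", "ready", "failed"}


@dataclass
class Person:
    id: str
    display_name: str
    voice_model_status: str
    # Optionality of the fields below is an assumption for illustration.
    full_name: Optional[str] = None
    title: Optional[str] = None
    role: Optional[str] = None
    company: Optional[str] = None
    email: Optional[str] = None
    voice_confidence: Optional[float] = None
    tags: list[str] = field(default_factory=list)
    created_at: Optional[str] = None
    updated_at: Optional[str] = None


def parse_person(payload: dict[str, Any]) -> Person:
    """Build a Person from a decoded JSON object, validating the status."""
    status = payload["voice_model_status"]
    if status not in VOICE_MODEL_STATUSES:
        raise ValueError(f"unknown voice_model_status: {status!r}")
    # Ignore keys this sketch does not model so newer fields do not break it.
    known = {k: v for k, v in payload.items() if k in Person.__dataclass_fields__}
    return Person(**known)
```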

REST API

`GET /people`

Lists people for the authenticated user. Used for the speaker-labelling autocomplete.


Auth	`bearerAuth`
Response	`200 PersonList`
Query params	`cursor`, `limit`, `q` (search by name/email)
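
For the autocomplete case, a client could assemble the list URL as sketched below. `BASE_URL` and the helper name are hypothetical; only the `cursor`, `limit`, and `q` parameters come from this page.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.example.com"  # hypothetical host, not from this page


def people_list_url(
    q: Optional[str] = None,
    cursor: Optional[str] = None,
    limit: Optional[int] = None,
) -> str:
    """Return the GET /people URL, omitting parameters that were not supplied."""
    params = {"q": q, "cursor": cursor, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/people" + (f"?{query}" if query else "")
```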

`POST /people`

Creates a person. Used during speaker labelling when the user adds a new person.


Auth	`bearerAuth`
Idempotency	Required
Response	`201 Created` with `Person` + `Location: /people/{id}`
Side effects	Broadcasts `EntityChangedEvent { entity: "person", action: "created" }`

{
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "email": "avery@example.com"
}
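
Because idempotency is required, a client has to attach a key to each create request. The sketch below builds such a request with the standard library. The `Idempotency-Key` header name and `BASE_URL` are assumptions; the authoritative header name is whatever the Infrastructure page defines.

```python
import json
import uuid
import urllib.request
from typing import Any, Optional

BASE_URL = "https://api.example.com"  # hypothetical host, not from this page


def build_create_person_request(
    token: str,
    body: dict[str, Any],
    idempotency_key: Optional[str] = None,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /people request."""
    key = idempotency_key or str(uuid.uuid4())
    return urllib.request.Request(
        url=f"{BASE_URL}/people",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Idempotency-Key": key,  # assumed header name
        },
    )
```

When retrying after a timeout, a caller would typically pass the same `idempotency_key` again rather than generate a new one, so the retry cannot create a second person.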

`GET /people/{id}`

Returns a single person.


Auth	`bearerAuth`
Response	`200 Person`
Errors	`404` person not found

`PATCH /people/{id}`

Updates person metadata.


Auth	`bearerAuth`
Response	`200 Person`
Side effects	Broadcasts `EntityChangedEvent { entity: "person", action: "updated" }`

`DELETE /people/{id}`

Deletes a person. Transcript segments attributed to that person keep their speaker_label, but their person_id is cleared.


Auth	`bearerAuth`
Response	`204 No Content`
Side effects	Broadcasts `EntityChangedEvent { entity: "person", action: "deleted" }`
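
The effect of a delete on transcript segments can be pictured with an in-memory sketch. The dictionary-based segment records and the `detach_person` name are illustrative only; the actual storage layer is not described on this page.

```python
from typing import Any


def detach_person(segments: list[dict[str, Any]], person_id: str) -> int:
    """Clear person_id on segments attributed to the deleted person.

    speaker_label is left untouched. Returns the number of segments changed.
    """
    cleared = 0
    for segment in segments:
        if segment.get("person_id") == person_id:
            segment["person_id"] = None
            cleared += 1
    return cleared
```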

Speaker Identification Pipeline

During a live recording, AssemblyAI produces diarised transcript segments with anonymous labels (speaker_1, speaker_2). ML resolves these to known people by comparing voice embeddings. Each speaker label in a session is tracked in one of four states:

State	Behaviour
`unmatched`	Compare this segment's embedding against in-session voice profiles (pushed by Core). If confidence exceeds the threshold → transition to `matched`. Otherwise, increment attempts and retry on the next segment from this speaker.
`matched`	The speaker label is locked to a person. All future segments from this speaker are tagged immediately — no further voice comparison needed.
`exhausted`	After N failed attempts (configurable, e.g. 5 segments), stop comparing for this speaker. The raw `speaker_label` is preserved. The user can manually resolve it.
`manual`	Set when the user labels a speaker via `POST /meetings/{id}/speaker-labels` (see Meeting). Takes precedence over voice matching — ML will not attempt to match this speaker regardless of voice similarity.

Manual speaker labelling is documented on the Meeting page. The REST fallback for pushing speaker state to ML during session recovery is documented on the Recording page (POST /meetings/{id}/live-session/speaker-states).
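
The four states above can be read as a small per-speaker state machine. The sketch below is one way to express it, not ML's actual implementation: `MAX_ATTEMPTS` uses the example value of 5, `THRESHOLD` is an assumed value borrowed from the example payloads on this page, and all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 5   # the configurable N; 5 is the example given in the table
THRESHOLD = 0.88   # assumed value, taken from the example events on this page


@dataclass
class SpeakerState:
    speaker_label: str
    state: str = "unmatched"
    person_id: Optional[str] = None
    attempts: int = 0


def on_segment(s: SpeakerState, best_person_id: Optional[str], score: float) -> SpeakerState:
    """Advance the state for one new segment from this speaker."""
    if s.state != "unmatched":
        return s  # matched, exhausted and manual skip voice comparison
    if best_person_id is not None and score > THRESHOLD:
        s.state, s.person_id = "matched", best_person_id
        return s
    s.attempts += 1
    if s.attempts >= MAX_ATTEMPTS:
        s.state = "exhausted"
    return s


def on_manual_label(s: SpeakerState, person_id: str) -> SpeakerState:
    """A user label overrides whatever state the speaker was in."""
    s.state, s.person_id = "manual", person_id
    return s
```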

ML Integration

Core → ML

WebSocket: `SpeakerStateUpdatedEvent`

Keeps ML aligned with user speaker-label changes during the live session. The manual state takes precedence over voice matching.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.speaker_state.updated.v1",
  "time": "2026-05-01T09:15:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "state": "manual",
    "person_id": "person-uuid"
  }
}
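
A sender could assemble this envelope along the following lines. The function name is hypothetical, and generating `id` as a UUID and `time` as the current UTC timestamp are assumptions based on the example values; the `source` and `type` strings are copied from the payload above.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Optional


def speaker_state_updated_event(
    meeting_id: str,
    speaker_label: str,
    state: str,
    person_id: Optional[str],
    traceparent: str,
) -> dict[str, Any]:
    """Build a SpeakerStateUpdatedEvent envelope as a plain dict."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": "wordloop-core/ml-ws",
        "type": "com.wordloop.ml.speaker_state.updated.v1",
        # Second-resolution UTC timestamp with a trailing Z, as in the example.
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "traceparent": traceparent,
        "data": {
            "meeting_id": meeting_id,
            "speaker_label": speaker_label,
            "state": state,
            "person_id": person_id,
        },
    }
```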

WebSocket: `VoiceProfilesUpdatedEvent`

Refreshes the in-session voice profile cache when Core enrolls or updates a profile while a recording is active.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.voice_profiles.updated.v1",
  "time": "2026-05-01T09:16:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "profiles": [
      {
        "person_id": "person-uuid",
        "embedding_model": "ecapa-tdnn-v1",
        "embedding": [0.12, -0.34]
      }
    ]
  }
}
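
On the receiving side, the event could feed a per-meeting cache like the one sketched here. `ProfileCache` is a hypothetical name, and treating the event as an upsert (listed profiles replace entries for the same person, unlisted profiles are kept) is an assumption; the page does not say whether the list is a full snapshot.

```python
from typing import Any


class ProfileCache:
    """In-session voice profiles, keyed by meeting and then by person."""

    def __init__(self) -> None:
        self._by_meeting: dict[str, dict[str, dict[str, Any]]] = {}

    def apply(self, event: dict[str, Any]) -> None:
        """Upsert every profile carried by a VoiceProfilesUpdatedEvent."""
        data = event["data"]
        profiles = self._by_meeting.setdefault(data["meeting_id"], {})
        for profile in data["profiles"]:
            profiles[profile["person_id"]] = {
                "embedding_model": profile["embedding_model"],
                "embedding": list(profile["embedding"]),
            }

    def profiles_for(self, meeting_id: str) -> dict[str, dict[str, Any]]:
        """Return a shallow copy of the cached profiles for one meeting."""
        return dict(self._by_meeting.get(meeting_id, {}))
```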

ML → Core

WebSocket: `SpeakerMatchProducedEvent`

Reports a confident speaker-to-person match. Core updates all matching segments and persists meeting_speaker_states as matched.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_match.v1",
  "time": "2026-05-01T09:04:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "person_id": "person-uuid",
    "score": 0.93,
    "threshold": 0.88,
    "state": "matched"
  }
}
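
Core's handling of this event might look like the following in-memory sketch. The data structures and the function name are illustrative, and the guard that skips speakers already in the manual state is an assumption inferred from the rule that manual labels take precedence.

```python
from typing import Any


def apply_speaker_match(
    event: dict[str, Any],
    segments: list[dict[str, Any]],
    speaker_states: dict[tuple[str, str], dict[str, Any]],
) -> int:
    """Tag matching segments and record the speaker as matched.

    Returns the number of segments updated.
    """
    data = event["data"]
    key = (data["meeting_id"], data["speaker_label"])
    if speaker_states.get(key, {}).get("state") == "manual":
        return 0  # assumed guard: never overwrite a user-supplied label
    speaker_states[key] = {"state": "matched", "person_id": data["person_id"]}
    updated = 0
    for segment in segments:
        if (segment["meeting_id"], segment["speaker_label"]) == key:
            segment["person_id"] = data["person_id"]
            updated += 1
    return updated
```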

WebSocket: `SpeakerExhaustedEvent`

Tells Core that ML has stopped trying to match an unknown speaker after the bounded attempt count.

{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_exhausted.v1",
  "time": "2026-05-01T09:08:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_2",
    "attempt_count": 5,
    "state": "exhausted"
  }
}
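
Since both ML → Core events carry a `state` field, a receiver can route them by `type` and sanity-check the state before persisting it. The sketch below is one possible shape; the function name and the decision to reject mismatched payloads are assumptions.

```python
from typing import Any, Optional

# Event type strings are copied from the example payloads on this page.
EXPECTED_STATE = {
    "com.wordloop.ml.speaker_match.v1": "matched",
    "com.wordloop.ml.speaker_exhausted.v1": "exhausted",
}


def resolve_state_change(event: dict[str, Any]) -> tuple[str, str, str, Optional[str]]:
    """Return (meeting_id, speaker_label, state, person_id) for an ML event."""
    expected = EXPECTED_STATE.get(event["type"])
    if expected is None:
        raise ValueError(f"unhandled event type: {event['type']}")
    data = event["data"]
    if data["state"] != expected:
        raise ValueError(f"state {data['state']!r} does not fit {event['type']}")
    # Exhausted events carry no person_id, so it may be None.
    return data["meeting_id"], data["speaker_label"], expected, data.get("person_id")
```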

Voice Profile Operations

Voice profiles power speaker identification. Core stores person records; ML owns embedding extraction and matching semantics.

`POST /voice-profiles/matches`

Compares a speaker embedding against enrolled voice profiles. Core names the candidate profiles explicitly through candidate_person_ids; top_k caps the number of matches returned.


Auth	service auth
Response	`200 VoiceMatchResponse`
Errors	`422` invalid embedding; `503` embedding model unavailable

Request:

{
  "meeting_id": "meeting-uuid",
  "speaker_label": "speaker_1",
  "embedding_model": "ecapa-tdnn-v1",
  "embedding": [0.12, -0.34],
  "candidate_person_ids": ["person-uuid"],
  "top_k": 3
}

Response:

{
  "matches": [
    {
      "person_id": "person-uuid",
      "score": 0.93,
      "threshold": 0.88,
      "decision": "matched"
    }
  ]
}
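
To make the request and response concrete, here is a minimal matching sketch. It assumes cosine similarity as the score, which this page does not confirm, and it invents the `"unmatched"` decision value for candidates below the threshold (only `"matched"` appears in the example). Function names are hypothetical.

```python
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; raises on a length mismatch (compare the 422 case)."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match(
    embedding: list[float],
    candidates: dict[str, list[float]],  # person_id -> enrolled embedding
    threshold: float = 0.88,             # assumed default, from the example
    top_k: int = 3,
) -> dict[str, Any]:
    """Score every candidate and return the best top_k, highest score first."""
    scored = []
    for person_id, enrolled in candidates.items():
        score = cosine(embedding, enrolled)
        scored.append({
            "person_id": person_id,
            "score": score,
            "threshold": threshold,
            # "unmatched" is an assumed value for below-threshold candidates.
            "decision": "matched" if score > threshold else "unmatched",
        })
    scored.sort(key=lambda m: m["score"], reverse=True)
    return {"matches": scored[:top_k]}
```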

`POST /voice-profiles`

Creates or enriches a person's voice profile from post-meeting segment embeddings.


Auth	service auth
Idempotency	Required
Request Content-Type	`multipart/form-data` for audio samples or `application/json` for segment references
Response	`201 Created` or `200 OK` with `VoiceProfile` + `Location: /voice-profiles/{person_id}`

Request:

{
  "person_id": "person-uuid",
  "meeting_id": "meeting-uuid",
  "segment_ids": ["segment-uuid"],
  "embedding_model": "ecapa-tdnn-v1"
}

Response:

{
  "person_id": "person-uuid",
  "embedding_model": "ecapa-tdnn-v1",
  "sample_count": 12,
  "quality_score": 0.91,
  "updated_at": "2026-05-01T10:15:00Z"
}
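
One plausible way to "enrich" a profile is to keep a running mean of segment embeddings alongside `sample_count`. The sketch below illustrates that idea only; the page does not describe how ML actually aggregates embeddings, and the `VoiceProfileState` class and `enrich` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceProfileState:
    person_id: str
    embedding_model: str
    centroid: list[float] = field(default_factory=list)
    sample_count: int = 0


def enrich(profile: VoiceProfileState, embedding: list[float]) -> VoiceProfileState:
    """Fold one segment embedding into the profile's running mean."""
    if profile.sample_count == 0:
        profile.centroid = list(embedding)
    else:
        if len(embedding) != len(profile.centroid):
            raise ValueError("embedding dimensions differ")
        n = profile.sample_count + 1
        # Incremental mean: new = old + (x - old) / n
        profile.centroid = [
            old + (x - old) / n for old, x in zip(profile.centroid, embedding)
        ]
    profile.sample_count += 1
    return profile
```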
