Person & Speaker Identity
People CRUD, speaker identification pipeline, voice profiles, and ML speaker matching events.
People are speaker identities. They can be referenced by tasks (assignee) and transcript segments (speaker attribution). This page covers person CRUD, the speaker identification pipeline that resolves anonymous diarisation labels to known people, and voice profile management. For shared semantics, see Infrastructure.
User-scoped identity: People are scoped to the authenticated user. Each user maintains their own set of people — there is no cross-user sharing of person records or voice profiles. If User A records a meeting with Person X, and User B later records with the same real-world person, User B must create their own Person record. Voice profile enrichment applies only within the owning user's data. This is a deliberate simplification for v1; organisation-level identity sharing is out of scope.
Resource Shape
{
"id": "person-uuid",
"display_name": "Avery Chen",
"full_name": "Avery Chen",
"title": "Product Manager",
"role": "Product",
"company": "WordLoop",
"email": "avery@example.com",
"voice_confidence": 0.91,
"voice_model_status": "ready",
"tags": ["team-alpha"],
"created_at": "2026-04-15T10:00:00Z",
"updated_at": "2026-05-01T09:15:00Z"
}
Valid voice_model_status values: untrained, training, ready, failed.
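For clients that want to validate the resource locally, the shape above can be mirrored in a small typed model. This is a minimal sketch, not part of the API: the page does not say which fields are optional, so treating everything except id and display_name as optional, and defaulting voice_model_status to untrained, are assumptions. parse_person is a hypothetical helper name.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# The four values listed on this page.
VALID_VOICE_MODEL_STATUSES = {"untrained", "training", "ready", "failed"}


@dataclass
class Person:
    """Client-side mirror of the Person resource (optionality is assumed)."""
    id: str
    display_name: str
    full_name: Optional[str] = None
    title: Optional[str] = None
    role: Optional[str] = None
    company: Optional[str] = None
    email: Optional[str] = None
    voice_confidence: Optional[float] = None
    voice_model_status: str = "untrained"  # assumed default
    tags: List[str] = field(default_factory=list)
    created_at: Optional[str] = None
    updated_at: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject any status outside the documented set.
        if self.voice_model_status not in VALID_VOICE_MODEL_STATUSES:
            raise ValueError(
                f"unknown voice_model_status: {self.voice_model_status!r}"
            )


def parse_person(payload: dict) -> Person:
    """Build a Person from a decoded JSON body, ignoring unknown keys."""
    known = {k: v for k, v in payload.items() if k in Person.__dataclass_fields__}
    return Person(**known)
```

Ignoring unknown keys keeps the client tolerant of fields added to the resource later.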
REST API
GET /people
Lists people for the authenticated user. Used for the speaker-labelling autocomplete.
| Auth | bearerAuth |
| Response | 200 PersonList |
| Query params | cursor, limit, q (search by name/email) |
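Cursor pagination over this endpoint could be consumed along the following lines. The PersonList shape is not defined on this page, so the items and next_cursor field names are assumptions, and fetch_page is a hypothetical callable that performs the authenticated GET /people request and returns the decoded body.

```python
from typing import Callable, Dict, Iterator, Optional

# Hypothetical transport hook: takes query params, returns the decoded
# PersonList body. Field names "items" and "next_cursor" are assumed.
FetchPage = Callable[[Dict[str, str]], dict]


def iter_people(
    fetch_page: FetchPage, q: Optional[str] = None, limit: int = 50
) -> Iterator[dict]:
    """Yield every person across pages, following the cursor until it ends."""
    cursor: Optional[str] = None
    while True:
        params: Dict[str, str] = {"limit": str(limit)}
        if q:
            params["q"] = q  # search by name/email
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(params)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break
```

Injecting the transport keeps the sketch independent of any particular HTTP client, which suits the autocomplete use case where the caller already holds an authenticated session.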
POST /people
Creates a person. Used during speaker labelling when the user adds a new person.
| Auth | bearerAuth |
| Idempotency | Required |
| Response | 201 Created with Person + Location: /people/{id} |
| Side effects | Broadcasts EntityChangedEvent { entity: "person", action: "created" } |
{
"display_name": "Avery Chen",
"full_name": "Avery Chen",
"email": "avery@example.com"
}
GET /people/{id}
Returns a single person.
| Auth | bearerAuth |
| Response | 200 Person |
| Errors | 404 person not found |
PATCH /people/{id}
Updates person metadata.
| Auth | bearerAuth |
| Response | 200 Person |
| Side effects | Broadcasts EntityChangedEvent { entity: "person", action: "updated" } |
DELETE /people/{id}
Deletes a person. Transcript segments that referenced the person keep their speaker_label, but their person_id is cleared.
| Auth | bearerAuth |
| Response | 204 No Content |
| Side effects | Broadcasts EntityChangedEvent { entity: "person", action: "deleted" } |
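The effect of a delete on transcript segments can be illustrated with an in-memory sketch. The dictionaries here stand in for Core's storage, and delete_person is a hypothetical function name; the real persistence layer is not described on this page.

```python
from typing import Dict, List


def delete_person(
    people: Dict[str, dict], segments: List[dict], person_id: str
) -> bool:
    """Remove a person and detach them from transcript segments.

    Returns False if the person is unknown, True once the delete is applied.
    """
    if person_id not in people:
        return False
    del people[person_id]
    for segment in segments:
        if segment.get("person_id") == person_id:
            # Only the link to the person is cleared; the diarisation
            # label (speaker_label) stays on the segment.
            segment["person_id"] = None
    return True
```

After the call, a segment such as {"speaker_label": "speaker_1", "person_id": "person-uuid"} still carries speaker_1, so the user can relabel the speaker later.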
Speaker Identification Pipeline
During a live recording, AssemblyAI produces diarised transcript segments with anonymous labels (speaker_1, speaker_2). ML resolves these to known people through voice embedding comparison. The pipeline has four states:
| State | Behaviour |
|---|---|
| unmatched | Compare this segment's embedding against in-session voice profiles (pushed by Core). If confidence exceeds the threshold → transition to matched. Otherwise, increment attempts and retry on the next segment from this speaker. |
| matched | The speaker label is locked to a person. All future segments from this speaker are tagged immediately; no further voice comparison is needed. |
| exhausted | After N failed attempts (configurable, e.g. 5 segments), stop comparing for this speaker. The raw speaker_label is preserved. The user can manually resolve it. |
| manual | Set when the user labels a speaker via POST /meetings/{id}/speaker-labels (see Meeting). Takes precedence over voice matching: ML will not attempt to match this speaker regardless of voice similarity. |
Manual speaker labelling is documented on the Meeting page. The REST fallback for pushing speaker state to ML during session recovery is documented on the Recording page (POST /meetings/{id}/live-session/speaker-states).
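The four states can be read as a small per-speaker state machine. The following is a minimal sketch of that reading, not ML's actual implementation: SpeakerTracker, on_segment, and set_manual are hypothetical names, the default threshold of 0.88 is borrowed from the sample events rather than specified, and the limit of 5 attempts is the page's own example value for N.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# compare() returns (best_person_id, best_score) for the current segment.
Compare = Callable[[], Tuple[Optional[str], float]]


@dataclass
class SpeakerState:
    state: str = "unmatched"
    person_id: Optional[str] = None
    attempts: int = 0


class SpeakerTracker:
    def __init__(self, max_attempts: int = 5, threshold: float = 0.88) -> None:
        self.max_attempts = max_attempts  # "N" in the table (assumed default)
        self.threshold = threshold        # illustrative value only
        self.speakers: Dict[str, SpeakerState] = {}

    def set_manual(self, speaker_label: str, person_id: str) -> None:
        """User labelling overrides whatever state the speaker was in."""
        self.speakers[speaker_label] = SpeakerState("manual", person_id)

    def on_segment(self, speaker_label: str, compare: Compare) -> SpeakerState:
        """Advance one speaker's state for a newly arrived segment."""
        s = self.speakers.setdefault(speaker_label, SpeakerState())
        if s.state != "unmatched":
            # matched, exhausted, manual: no voice comparison is run.
            return s
        person_id, score = compare()
        if person_id is not None and score > self.threshold:
            s.state, s.person_id = "matched", person_id
            return s
        s.attempts += 1
        if s.attempts >= self.max_attempts:
            s.state = "exhausted"
        return s
```

Passing the comparison as a callable makes the "no further voice comparison" rule explicit: the embedding work is only performed while the speaker is still unmatched.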
ML Integration
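The events in this section all share one envelope: specversion, id, source, type, time, traceparent, and data. These fields resemble the CloudEvents 1.0 attribute set, though this page does not name that spec. As a rough sketch of how a service might build and route such envelopes, assuming a simple type-keyed dispatcher (make_event and EventRouter are hypothetical names, and the traceparent value is supplied by the caller rather than generated here):

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict

Handler = Callable[[dict], None]


def make_event(source: str, event_type: str, data: dict, traceparent: str) -> dict:
    """Wrap a payload in the envelope used by the events in this section."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "time": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "traceparent": traceparent,  # propagated from the caller's trace
        "data": data,
    }


class EventRouter:
    """Dispatch incoming envelopes to a handler keyed by the type field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, event: dict) -> bool:
        """Return True if a handler ran, False for an unknown type."""
        handler = self._handlers.get(event.get("type", ""))
        if handler is None:
            return False
        handler(event["data"])
        return True
```

Keying on the versioned type string (for example com.wordloop.ml.speaker_match.v1) lets a receiver keep handling an older version of an event while a newer one is introduced.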
Core → ML
WebSocket: SpeakerStateUpdatedEvent
Keeps ML aligned with user speaker-label changes during the live session. The manual state takes precedence over voice matching.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-core/ml-ws",
"type": "com.wordloop.ml.speaker_state.updated.v1",
"time": "2026-05-01T09:15:00Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"speaker_label": "speaker_1",
"state": "manual",
"person_id": "person-uuid"
}
}
WebSocket: VoiceProfilesUpdatedEvent
Refreshes the in-session voice profile cache when Core enrolls or updates a profile while a recording is active.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-core/ml-ws",
"type": "com.wordloop.ml.voice_profiles.updated.v1",
"time": "2026-05-01T09:16:00Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"profiles": [
{
"person_id": "person-uuid",
"embedding_model": "ecapa-tdnn-v1",
"embedding": [0.12, -0.34]
}
]
}
}
ML → Core
WebSocket: SpeakerMatchProducedEvent
Reports a confident speaker-to-person match. Core updates all matching segments and persists meeting_speaker_states as matched.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-ml/ws",
"type": "com.wordloop.ml.speaker_match.v1",
"time": "2026-05-01T09:04:00Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"speaker_label": "speaker_1",
"person_id": "person-uuid",
"score": 0.93,
"threshold": 0.88,
"state": "matched"
}
}
WebSocket: SpeakerExhaustedEvent
Tells Core that ML has stopped trying to match an unknown speaker after the bounded attempt count.
{
"specversion": "1.0",
"id": "event-uuid",
"source": "wordloop-ml/ws",
"type": "com.wordloop.ml.speaker_exhausted.v1",
"time": "2026-05-01T09:08:00Z",
"traceparent": "00-...",
"data": {
"meeting_id": "meeting-uuid",
"speaker_label": "speaker_2",
"attempt_count": 5,
"state": "exhausted"
}
}
Voice Profile Operations
Voice profiles power speaker identification. Core stores person records; ML owns embedding extraction and matching semantics.
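This page does not specify how ML scores an embedding against a profile. As an illustration only, the sketch below assumes cosine similarity with a fixed threshold, which is a common choice for speaker embeddings but may not be what ML uses. rank_matches is a hypothetical function, and the no_match decision value is invented for the example; only matched appears in the samples on this page.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(
    embedding: Sequence[float],
    profiles: Dict[str, Sequence[float]],  # person_id -> enrolled embedding
    threshold: float = 0.88,               # illustrative, from the samples
    top_k: int = 3,
) -> List[dict]:
    """Score every candidate profile and return the top_k, best first."""
    scored = []
    for person_id, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        scored.append({
            "person_id": person_id,
            "score": score,
            "threshold": threshold,
            # "no_match" is a hypothetical label for below-threshold scores.
            "decision": "matched" if score > threshold else "no_match",
        })
    scored.sort(key=lambda m: m["score"], reverse=True)
    return scored[:top_k]
```

Returning the threshold alongside each score mirrors the sample payloads, where both values travel together so the receiver can see how close a decision was.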
POST /voice-profiles/matches
Compares a speaker embedding against enrolled voice profiles. Core supplies candidate profiles explicitly.
| Auth | service auth |
| Response | 200 VoiceMatchResponse |
| Errors | 422 invalid embedding; 503 embedding model unavailable |
Request:
{
"meeting_id": "meeting-uuid",
"speaker_label": "speaker_1",
"embedding_model": "ecapa-tdnn-v1",
"embedding": [0.12, -0.34],
"candidate_person_ids": ["person-uuid"],
"top_k": 3
}
Response:
{
"matches": [
{
"person_id": "person-uuid",
"score": 0.93,
"threshold": 0.88,
"decision": "matched"
}
]
}
POST /voice-profiles
Creates or enriches a person's voice profile from post-meeting segment embeddings.
| Auth | service auth |
| Idempotency | Required |
| Request Content-Type | multipart/form-data for audio samples or application/json for segment references |
| Response | 201 Created or 200 OK with VoiceProfile + Location: /voice-profiles/{person_id} |
{
"person_id": "person-uuid",
"meeting_id": "meeting-uuid",
"segment_ids": ["segment-uuid"],
"embedding_model": "ecapa-tdnn-v1"
}
Response:
{
"person_id": "person-uuid",
"embedding_model": "ecapa-tdnn-v1",
"sample_count": 12,
"quality_score": 0.91,
"updated_at": "2026-05-01T10:15:00Z"
}