# Wordloop Platform (/docs)

{/*
LLM-Context:
TL;DR: This is the root index of the Wordloop Platform documentation.
Wordloop is a monorepo consisting of:
- wordloop-core (Go, Port 4002): REST API, DB (pgvector), domain logic.
- wordloop-ml (Python/FastAPI, Port 4003): AI/ML tasks, transcription.
- wordloop-app (Next.js, Port 4001): Web frontend with SSR.
Core routing philosophy: Trace-First Development.
Dependencies mapping: Check knowledge-graph.json.
*/}

# Wordloop Platform [#wordloop-platform]

Meeting transcription, speaker identification, and AI-powered conversation intelligence.

## Services [#services]

| Service                                       | Language         | Port | Role                                   |
| --------------------------------------------- | ---------------- | ---- | -------------------------------------- |
| [wordloop-core](learn/services/core/index.md) | Go               | 4002 | REST API, domain logic, database       |
| [wordloop-ml](learn/services/ml/index.md)     | Python / FastAPI | 4003 | Transcription, speaker embeddings, LLM |
| [wordloop-app](learn/services/app/index.md)   | Next.js          | 4001 | Web frontend                           |

## Architecture at a glance [#architecture-at-a-glance]

## Navigating the Documentation [#navigating-the-documentation]

If you are new to the platform, we recommend following the sidebar from top to bottom:

1. **[Principles](principles/index.mdx)** — Start by understanding our core philosophy, engineering values, and system constraints.
2. **[Architecture](learn/architecture/overview.mdx)** — See how those principles are applied structurally across the system and infrastructure.
3. **[Development](start/quickstart.md)** — Learn how to spin up the entire platform locally via our custom `./dev` CLI.
4. **Services** — Dive deep into specific implementations for [Core](learn/services/core/index.md), [ML](learn/services/ml/index.md), and [App](learn/services/app/index.md).
5. **API & Schemas** — Reference material for system contracts.
# Postgres with pgvector as the production vector store (/docs/decisions/0001-postgres-for-vector-search)

# 0001 — Postgres with `pgvector` as the production vector store [#0001--postgres-with-pgvector-as-the-production-vector-store]

**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** core platform
**Supersedes:** —
**Superseded by:** —

## Context [#context]

Wordloop generates and stores embeddings for transcript chunks, speaker utterances, and recap summaries. A retrieval-augmented generation (RAG) workflow at read time uses these embeddings to supply context to model calls.

The default instinct when adding a GenAI feature is to reach for a dedicated vector database — Pinecone, Milvus, Weaviate, or similar. These systems offer specialised ANN indexes, horizontal scale, and purpose-built tooling. At our current scale, they also introduce an operational surface we do not need and a split-brain failure mode we actively want to avoid.

Embeddings in Wordloop are not an island. They exist **because** a specific transcript chunk exists. They must appear atomically with the chunk, be removed atomically when the chunk is removed, and obey the same authorisation rules the chunk does. A system where the transcript lives in Postgres and its embedding lives in a separate service that is updated "eventually" is a system where queries will silently return embeddings for deleted content or miss content that was just created — neither of which is acceptable.

## Decision [#decision]

Use PostgreSQL with the `pgvector` extension as the single production vector store. Embeddings live on the row they describe (or in a sibling table joined by primary key), committed in the same transaction as their source data.

## Consequences [#consequences]

**Atomic writes.** Inserting a transcript chunk and its embedding happens in one transaction. If the embedding fails to compute or save, the chunk rolls back. There is no asynchronous reconciliation process and no inconsistency window.
**One operational surface.** The database we already run, already back up, already monitor, already manage migrations for, is also the vector store. No second system to provision, secure, or teach on-call about.

**One authorisation model.** The row-level security rules that protect transcript data also protect the embeddings. We do not have to re-implement access control in a second system and hope the two models agree.

**Adequate performance at current scale.** `pgvector`'s IVFFlat and HNSW indexes are sufficient for our current and projected vector counts. We benchmark quarterly; we have not approached the scale where a purpose-built vector database would outperform `pgvector` by a margin that justifies the operational cost.

## Alternatives considered [#alternatives-considered]

* **Pinecone, Milvus, Weaviate.** Rejected for the split-brain failure mode and the second operational surface. Revisit if vector count per tenant exceeds \~10M and `pgvector` benchmarks degrade materially.
* **Embeddings in a denormalised column with in-Go cosine comparison.** Rejected for O(n) query cost — acceptable for small datasets in prototypes, unacceptable in production.
* **Embeddings in an object store with a hand-rolled ANN index.** Rejected for the cost of maintaining the index and the absence of transactional guarantees.

## Debt annotation [#debt-annotation]

**Principal:** None beyond the `pgvector` extension install, which is a single SQL statement per environment.

**Interest:** Low. `pgvector` is actively maintained and widely deployed; index tuning (IVFFlat `lists`, HNSW `ef_construction`) is a one-time cost per table.

**Multiplier:** Vector count per tenant. If a single tenant's embedding set grows to the point where a purpose-built vector database would outperform `pgvector`'s ANN indexes by a useful margin (empirically, in the tens of millions of vectors), revisit this decision. The migration path is well understood (dual-write, shadow-read, cut over), but non-trivial.
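For reference, the baseline that ANN indexes are measured against (and the in-process cosine comparison rejected above) is an exact nearest-neighbour search that scores every stored vector. A minimal Python sketch; the function names are illustrative only. In Postgres the same metric is available through `pgvector`'s cosine-distance operator `<=>`.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def exact_top_k(query, rows, k):
    """Full scan: score every stored vector, O(n * d) work per query.

    `rows` is an iterable of (id, vector) pairs. Returns the ids of the k
    most similar vectors, best first.
    """
    scored = ((cosine_similarity(query, vec), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in heapq.nlargest(k, scored)]
```

The cost grows linearly with the number of rows per query, which is why IVFFlat and HNSW (approximate, but sub-linear in practice) matter once a tenant's vector count is large.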
## Verification [#verification]

* `SELECT extname FROM pg_extension WHERE extname = 'vector';` returns a row on every environment.
* Each transcript chunk and its embedding are committed in the same transaction.
* No application code writes to an external vector service.

## Related [#related]

* [Postgres stack principle](/docs/principles/stack/postgres)
* [AI Engineering principle](/docs/principles/ai-native/ai-engineering)

# Next.js with Server Components for the web app (/docs/decisions/0002-nextjs-ssr-for-app)

# 0002 — Next.js with Server Components for `wordloop-app` [#0002--nextjs-with-server-components-for-wordloop-app]

**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** app platform
**Supersedes:** —
**Superseded by:** —

## Context [#context]

The Wordloop web app renders deeply nested AI-derived context: a Meeting contains TranscriptSegments, each segment has a speaker attribution (Person), the Meeting has a MeetingSynthesis with Topics and TalkingPoints, and a list of Tasks. Opening a Meeting is the single most common view in the product.

A client-side single-page application fetching this context produces a cascading waterfall. The client first fetches the Meeting, waits for the response, fetches the Transcription, waits, fetches Segments, waits, resolves Person records per speaker, waits, fetches the MeetingSynthesis, waits, fetches Tasks. Each hop is a full round trip between the browser and the edge — in practice, five to seven seconds of blank screen on a median connection before any meaningful content appears.

This is not a problem to optimise with skeleton screens or lazy loading. The waterfall is inherent to the data shape and the client-side fetch model.

## Decision [#decision]

Build `wordloop-app` on Next.js with the App Router and React Server Components. Meeting views, synthesis views, and the dashboard fetch their data on the server, close to the database, in a single round trip from the browser.
The client receives the fully resolved DOM with content already present. Client components remain where interactivity demands them: the live transcript stream, the editor, the command palette. These are bounded, named islands inside a server-rendered shell.

## Consequences [#consequences]

**Single round trip for the primary view.** Opening a Meeting is one request from the browser; the downstream data fetches happen server-side, close to the database, and in parallel wherever they do not depend on each other. Time-to-meaningful-paint drops from seconds to hundreds of milliseconds.

**Database queries colocate with the code that needs them.** A Server Component fetches its own data on the server. In practice it goes through our Go API rather than querying Postgres directly, but the programming model is the same: the fetch happens where the latency cost is lowest.

**Client bundles stay small.** Components that never run on the client are never shipped to the client. The JavaScript bundle for the Meeting view is a fraction of what it would be in a pure-SPA architecture.

**A sharper client/server boundary.** Server Components cannot use `useState`, `useEffect`, or browser APIs. The boundary is explicit and enforced by the framework, which catches a common class of hydration bugs at build time.

## Alternatives considered [#alternatives-considered]

* **Pure client-side React + Vite.** Rejected for the waterfall problem described above. Viable only if the data shape were flat, which it is not.
* **Remix / TanStack Start / other RSC-capable frameworks.** Considered equivalent in principle. Next.js chosen for the ecosystem maturity, the production track record of the App Router at our scale, and the team's existing expertise. Revisit if Next.js' direction diverges from our needs.
* **Hybrid: SPA shell + server-rendered HTML snippets.** Rejected for the cognitive overhead of maintaining two rendering models. Server Components give us the same benefit with a single programming model.

## Debt annotation [#debt-annotation]

**Principal:** Moderate.
The team has internalised the Server/Client Component boundary; new engineers spend their first week understanding when to use which.

**Interest:** Low to moderate. Next.js ships breaking changes in major versions; we pin and plan upgrades quarterly. The RSC model itself is stable.

**Multiplier:** Framework direction. If Next.js' architectural direction diverges materially from our needs, the cost of migrating is proportional to the size of the app. The Server Components abstraction is portable — Remix and TanStack Start implement the same conceptual model — so the migration risk is bounded.

## Verification [#verification]

* Primary Meeting view renders meaningful content in a single round trip (observed in Core Web Vitals on production).
* `next build` output shows Server Components are not included in client chunks.
* No data-fetch waterfalls in the Network panel for the dashboard or Meeting view.

## Related [#related]

* [Frontend stack principle](/docs/principles/stack/frontend)
* [App Service handbook](/docs/learn/services/app)

# Stateful containers for the ML service (/docs/decisions/0003-stateful-containers-for-ml)

# 0003 — Stateful containers for `wordloop-ml` [#0003--stateful-containers-for-wordloop-ml]

**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** ml platform
**Supersedes:** —
**Superseded by:** —

## Context [#context]

The ML service is responsible for real-time transcription of live Meeting audio, MeetingSynthesis generation from finalised Transcriptions, and embedding generation for retrieval. The transcription path is latency-critical: from the moment a person speaks to the moment the caption renders, the user-perceived budget is under one second.

Serverless function platforms — Lambda, Cloud Run with scale-to-zero, Vercel Edge — are excellent for bursty, stateless workloads with tolerant latency budgets. They are a poor fit for workloads that require:

1. Large model weights loaded into memory (several hundred MB to several GB).
2. Connection-level state for streaming audio frames.
3. Low start-up latency, because cold starts measured in seconds translate directly into user-visible silence during a live meeting.

A cold start of five to ten seconds on the first segment of a Meeting destroys the real-time experience. Warm-up pings mitigate but do not eliminate this, and the cost of keeping a serverless function permanently warm approaches the cost of a dedicated container.

## Decision [#decision]

Run `wordloop-ml` as long-lived FastAPI workers inside orchestrated containers. Models are loaded at container start and remain resident across requests. The container is the unit of scaling — we scale horizontally by adding more containers, not by spinning up more cold functions.

## Consequences [#consequences]

**Models stay warm.** The first segment of a Meeting transcribes with the same latency as the hundredth. No cold-start penalty on the user-visible path.

**Streaming state is preserved.** An audio stream's position, rolling buffer, and partial transcription state live in the container that handles the stream. No cross-invocation state-reconstruction step.

**Operational posture matches a normal service.** The ML service has rolling deploys, health checks, graceful shutdown, and horizontal scaling — the same operational shape as `wordloop-core`. On-call engineers use the same mental model.

**We pay for idle capacity.** A serverless model would scale to zero at night; our containers do not. At current traffic this is cheaper than the alternative (warm-keeping costs in a serverless model exceed the dedicated container cost), but the crossover point will change with usage patterns.

## Alternatives considered [#alternatives-considered]

* **Lambda / Cloud Functions with scale-to-zero.** Rejected for cold-start latency on the transcription hot path.
* **Cloud Run with always-on minimum instances.** Considered, and a reasonable alternative.
We chose explicit container orchestration because it also handles the streaming-state requirement cleanly; Cloud Run's per-request model is awkward for long-lived WebSocket-adjacent connections. Revisit if Cloud Run's streaming support matures.
* **Dedicated GPU nodes.** Not yet required — our current model mix runs adequately on CPU. If we adopt models that demand GPU inference, the decision to run stateful containers still holds; we add GPU node pools.
* **Batch transcription only (no real-time path).** Rejected as a product decision — live transcription is a core Wordloop feature.

## Debt annotation [#debt-annotation]

**Principal:** Moderate. Operating a stateful service means we handle graceful shutdown, connection draining, and rolling-deploy choreography ourselves. This is well-trodden ground and our Go core already does the same.

**Interest:** Steady. Container images must be rebuilt when model weights or the Python runtime update; that is a normal CI cost.

**Multiplier:** Model size. If model weights grow past what fits comfortably in a container's memory budget (low single-digit GB), we may need to split inference into a dedicated model-serving layer (Triton, Ray Serve) fronted by thin FastAPI workers. The service boundary stays the same; the implementation changes.

## Verification [#verification]

* Time-to-first-caption on a cold Meeting start is under one second at p95 (observed in production latency dashboards).
* No cold-start warm-up hack exists in the deploy pipeline (no scheduled pings, no keep-warm loop).
* Model weights are loaded exactly once per container process, at boot.
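The two properties this ADR relies on (weights loaded once per process and reused across requests, and streaming state held by the process that owns the connection) can be sketched without any framework. `ModelRegistry`, `StreamSession`, and the loader callable below are hypothetical names, not the service's code; in the FastAPI service, `registry.get()` would be called once during application start-up, for example from a lifespan handler.

```python
import threading
from collections import deque


class ModelRegistry:
    """Loads model weights once per process and keeps them resident."""

    def __init__(self, loader):
        self._loader = loader  # hypothetical: returns an in-memory model callable
        self._model = None
        self._lock = threading.Lock()
        self.load_count = 0

    def get(self):
        if self._model is None:
            with self._lock:
                # Re-check under the lock so concurrent first callers load only once.
                if self._model is None:
                    self._model = self._loader()
                    self.load_count += 1
        return self._model


class StreamSession:
    """Per-connection state that lives in the container handling the stream."""

    def __init__(self, model, max_frames=50):
        self.model = model
        self.buffer = deque(maxlen=max_frames)  # rolling audio buffer
        self.position = 0                       # frames received so far
        self.partial = ""                       # latest partial transcription

    def push(self, frame):
        self.buffer.append(frame)
        self.position += 1
        self.partial = self.model(list(self.buffer))
        return self.partial


if __name__ == "__main__":
    # Toy "model" that joins buffered frames; stands in for real inference.
    registry = ModelRegistry(lambda: (lambda frames: " ".join(frames)))
    registry.get()  # boot-time load
    session = StreamSession(registry.get(), max_frames=2)
    for frame in ["hello", "wordloop", "team"]:
        print(session.push(frame))
```

Because every session receives the same resident model object, a new stream pays no load cost, which is the "first segment transcribes like the hundredth" property in code form.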
## Related [#related]

* [ML Systems stack principle](/docs/principles/stack/ml-systems)
* [Real-Time system-design principle](/docs/principles/system-design/real-time)
* [ML Service handbook](/docs/learn/services/ml)

# Hosting-layer Link header for llms.txt discovery (/docs/decisions/0004-hosting-layer-llms-txt-link-header)

# 0004 — Hosting-layer `Link` header for `llms.txt` discovery [#0004--hosting-layer-link-header-for-llmstxt-discovery]

**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** docs platform
**Supersedes:** —
**Superseded by:** —

## Context [#context]

The `llms.txt` specification recommends that sites advertise their machine-readable index via the HTTP `Link` header with `rel="llms-txt"`, in addition to serving the file at `/llms.txt`. This lets agents discover the index without guessing at conventional paths and without parsing HTML.

The Wordloop documentation site is built with Next.js 15 in static-export mode (`output: 'export'`). Static export does not support runtime middleware, route handlers that mutate response headers, or `next.config.js` `headers()` for the exported bundle — those hooks are only honoured by the Node server, which we are not running in production. Consequently, the `Link` header cannot be set at the framework layer.

The site is served by Firebase Hosting, which supports per-path response headers declaratively in `firebase.json` under `hosting.headers`.

## Decision [#decision]

Set the `Link: ; rel="llms-txt", ; rel="llms-full-txt"` header on every response from Firebase Hosting, via the `firebase.json` `hosting.headers` array. Additionally, set `Content-Type: text/markdown; charset=utf-8` on every `**/*.md` path so the per-page markdown exports are served with the correct media type, and `text/plain` on the two `llms*.txt` files.

The header applies to all paths (`source: "/**"`).
The `rel` advertisement is cheap and universally safe — every Wordloop documentation page is a valid entry point for an agent that then looks up the index.

## Consequences [#consequences]

* Agents following the `llms.txt` discovery pattern via `curl -I` or a HEAD request find the index without needing to hardcode `/llms.txt`.
* The `.md` exports of each documentation page are served with the correct MIME type; command-line tooling (`curl`, `wget`) treats them as text.
* The configuration lives in `firebase.json` — a hosting-platform-specific file. If we ever migrate hosting providers, this configuration has to be reimplemented in the new provider's equivalent. This is captured in the debt annotation below.

## Alternatives considered [#alternatives-considered]

* **Set the header in a Next.js middleware.** Rejected: middleware is incompatible with static export.
* **Set the header via a meta tag in ``.** Rejected: meta equivalents of the `Link` header (``) are not part of the spec and not observed by agents doing header-only HEAD requests.
* **Add an Express shim in front of the static export.** Rejected: introducing a server just to set one header sacrifices the operational simplicity that motivated static export in the first place.
* **Rely on convention only (`/llms.txt` at the root).** Rejected: the spec explicitly recommends the header. It is cheap to set and the canonical way for agents to discover the index.

## Debt annotation [#debt-annotation]

**Principal:** \~1 hour. One `firebase.json` edit, one ADR, one test.

**Interest:** Near-zero. The configuration does not drift; the header string is stable.

**Multiplier:** Hosting migration. If we move off Firebase, the `firebase.json` block has to be translated to the new hosting provider's header syntax. The content of the header does not change; only the declaration site does. If we ever move to a self-hosted Next.js runtime, the header moves to middleware and `firebase.json` can be discarded.
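Because `firebase.json` is plain JSON, the `hosting.headers` block described in the Decision can be sketched by generating it. The Python below emits entries in Firebase Hosting's header syntax (a `source` glob plus a `headers` list of `key`/`value` pairs). The `Link` targets `/llms.txt` and `/llms-full.txt` are assumptions based on the file names in this ADR, and the helper names are hypothetical; the authoritative values live in the repository's `firebase.json`.

```python
import json

# Assumption: both index files are served from the site root.
LLMS_TXT = "/llms.txt"
LLMS_FULL_TXT = "/llms-full.txt"


def link_header_value():
    """RFC 8288-style Link value advertising both machine-readable indexes."""
    return f'<{LLMS_TXT}>; rel="llms-txt", <{LLMS_FULL_TXT}>; rel="llms-full-txt"'


def hosting_headers():
    """Build the `hosting.headers` array: Link everywhere, plus content types."""
    return [
        {
            "source": "/**",
            "headers": [{"key": "Link", "value": link_header_value()}],
        },
        {
            "source": "**/*.md",
            "headers": [
                {"key": "Content-Type", "value": "text/markdown; charset=utf-8"}
            ],
        },
        {
            "source": "/llms*.txt",
            "headers": [{"key": "Content-Type", "value": "text/plain; charset=utf-8"}],
        },
    ]


if __name__ == "__main__":
    print(json.dumps({"hosting": {"headers": hosting_headers()}}, indent=2))
```

Keeping the three entries separate mirrors the three Verification checks: one `curl -I` per rule.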
## Verification [#verification]

* `curl -I https://docs.wordloop.ai/docs/learn/architecture/overview` shows the `Link` header with both `llms-txt` and `llms-full-txt` targets.
* `curl -I https://docs.wordloop.ai/docs/learn/architecture/overview.md` returns `Content-Type: text/markdown; charset=utf-8`.
* `curl -I https://docs.wordloop.ai/llms.txt` returns `Content-Type: text/plain; charset=utf-8`.

## Related [#related]

* [Documentation principle](/docs/principles/foundations/documentation) — the dual-audience stance this header operationalises.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — the broader principle the discovery mechanism serves.

# Docs are canonical knowledge and skills are the agent execution layer (/docs/decisions/0005-docs-canonical-skills-execution-layer)

# 0005 — Docs are canonical knowledge and skills are the agent execution layer [#0005--docs-are-canonical-knowledge-and-skills-are-the-agent-execution-layer]

**Status:** Accepted
**Date:** 2026-05-01
**Deciders:** docs platform, agent tooling
**Supersedes:** —
**Superseded by:** —

## Context [#context]

Wordloop maintains both a documentation site and a set of agent skills. The docs site is built for humans and agents: it publishes navigable pages, `llms.txt`, `llms-full.txt`, per-page Markdown exports, and MCP resources. The skills are loaded by AI agents to guide task execution.

The previous stance kept docs and skills as fully separate surfaces. That avoided prompt-like content leaking into the docs site, but it also created a drift risk: durable engineering policy could be duplicated in both docs and skill files. We have already seen signs of this class of drift, such as stack-version claims differing between service docs and package metadata.

Modern skill design favours progressive disclosure: concise trigger metadata, a short operating contract, and selective loading of deeper references. This means skill files should not become large documentation mirrors.
They should tell the agent what to read, how to act, and how to verify.

## Decision [#decision]

The documentation site is the canonical source for durable engineering knowledge. Agent skills are the execution layer that selects, loads, and applies that knowledge safely.

A docs page owns:

* Principles and architecture guidance.
* Service handbooks and implementation conventions.
* Workflow guides and runbooks.
* ADRs and decision history.
* Generated reference material from specs, schemas, and code.
* Glossary and domain vocabulary.

A skill owns:

* Triggering and task routing.
* Which docs pages to read for each task shape.
* Tool usage, command sequencing, and safety gates.
* Verification steps and eval discipline.
* Agent-specific constraints that do not belong in human-facing docs.

Skills may reference docs pages by slug or MCP resource. Docs pages must not depend on skill internals for their meaning.

## Consequences [#consequences]

* Durable guidance has one canonical maintenance path.
* Human and agent readers consume the same engineering knowledge.
* Skills remain smaller, more triggerable, and easier to evaluate.
* Documentation changes can identify affected skills through a skill-to-doc map.
* Skill changes can identify which canonical docs pages need review.
* The docs site needs stronger freshness, metadata, and health checks because more agent behaviour depends on it.

## Alternatives considered [#alternatives-considered]

* **Keep docs and skills completely separate.** Rejected because it preserves duplicated policy and makes drift a review-discipline problem only.
* **Move most docs into skills.** Rejected because skills are not a good human-reading surface and large skill files weaken progressive disclosure.
* **Have skills fetch arbitrary public documentation at runtime.** Rejected as the default because public retrieval introduces prompt-injection and freshness risks.
Trusted local docs, generated Markdown exports, and the Wordloop MCP server are the default context path.
* **Generate skills entirely from docs.** Deferred. It may become useful for simple doc-reference sections, but skill trigger wording and safety gates still need deliberate evaluation.

## Debt annotation [#debt-annotation]

**Principal:** Medium. We need a skill-to-doc map, workflow docs, freshness metadata, and documentation health checks.

**Interest:** Low if automated checks run in CI; high if this remains a manual checklist.

**Multiplier:** Agent autonomy. The more agents rely on docs for task execution, the more expensive stale docs become.

## Verification [#verification]

* Each maintained skill declares its canonical docs dependencies in the skill-to-doc map.
* Documentation health checks validate that mapped docs pages exist.
* Stale active docs are flagged by review cadence.
* Skill updates include a docs review step.
* Docs updates include an affected-skills review step.

## Related [#related]

* [Documentation](/docs/principles/foundations/documentation)
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)

# Architecture Decision Records (/docs/decisions)

# Architecture Decision Records [#architecture-decision-records]

An ADR is how we remember *why*. Code shows what we built; commit history shows when it changed; ADRs show which options we rejected, what tradeoffs we accepted, and what debt we took on. The log is **append-only**: once an ADR is accepted, it is never edited — only superseded.

## Why ADRs matter on this team [#why-adrs-matter-on-this-team]

Two years from now, an engineer — or an agent — will look at a piece of Wordloop and ask "why is this like this?" The answer lives in the ADR.
Without it, every design decision regresses to "this is how it was when I got here," and the team loses the ability to challenge decisions on their merits because the merits have been forgotten.

We write ADRs for decisions that will be expensive to reverse and decisions that will surprise a reader who does not share our context.

## Statuses [#statuses]

| Status         | Meaning                                           |
| -------------- | ------------------------------------------------- |
| **Proposed**   | Authored but not yet accepted. Under discussion.  |
| **Accepted**   | Current, in force.                                |
| **Rejected**   | Considered and declined, with reasoning.          |
| **Deprecated** | No longer applicable, but historically important. |
| **Superseded** | Replaced by a later ADR (which links back).       |

## Log [#log]

*The catalogue populates as decisions are committed. Each entry includes title, status, author, date, and a Principal / Interest / Multiplier debt annotation — see [Engineering Principles / Documentation](/docs/principles/foundations/documentation) for the model.*

Authoring a new ADR? Copy the frontmatter and 7-section structure from any existing ADR in this directory. The title is the decision in plain language; the filename is `NNNN-kebab-case-decision.mdx` with the next available number.

# Add an API Endpoint (/docs/guides/add-api-endpoint)

# Add an API Endpoint [#add-an-api-endpoint]

## Goal [#goal]

Add a new endpoint to `wordloop-core`, following the spec-first workflow so that the server handler, the TypeScript client, and the reference docs all stay aligned.

## Prerequisites [#prerequisites]

* Local stack running (`./dev start all`) — see [Quickstart](/docs/start/quickstart).
* Familiarity with [API Design](/docs/principles/system-design/api-design) and [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) principles.

## Steps [#steps]

### 1. Update the OpenAPI spec [#1-update-the-openapi-spec]

The spec is the source of truth.
Open `specs/core-openapi.json` and add your endpoint:

* Path, method, operationId.
* Request and response schemas with descriptions on every field.
* Example payloads.
* Error responses mapped to our standard error codes ([Reference / Errors](/docs/reference/errors)).

### 2. Regenerate handlers and clients [#2-regenerate-handlers-and-clients]

```bash
./dev generate core
```

This produces the server-side handler stub and the TypeScript client surface. See [Code Generation](/docs/guides/code-generation) for details on what runs under the hood.

### 3. Implement the handler [#3-implement-the-handler]

Fill in the generated handler stub. Handlers stay thin — extract inputs, call the application service, shape the response. Business rules belong in the domain; orchestration belongs in the application service.

### 4. Write a service test [#4-write-a-service-test]

In the handler's test file, spin up a Testcontainers Postgres instance, make the HTTP call, and assert on behaviour and on the OTel trace shape. See [Testing](/docs/principles/foundations/testing) for the discipline.

### 5. Run the relevant checks [#5-run-the-relevant-checks]

```bash
./dev lint core
./dev test core
```

## Verification [#verification]

* `./dev test core` passes.
* The [Core API Reference](/docs/reference/api/core) renders the new endpoint automatically.
* Hitting the endpoint from the local frontend produces the expected response.

## Troubleshooting [#troubleshooting]

* **Generated code is out of date.** Re-run `./dev generate core` and commit the generated files.
* **Testcontainers failing to start.** Check `./dev status` and confirm that Docker is running.
* **Frontend cannot reach the endpoint.** The frontend uses the generated TypeScript client; re-running generation and restarting the Next.js dev server usually fixes it.

See [API Design](/docs/principles/system-design/api-design) for the stance this workflow expresses.
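The division of labour in step 3 (thin handler, orchestrating application service, rules in the domain) is language-agnostic. The sketch below uses Python for brevity rather than the Go of `wordloop-core`, and every name in it, including the `invalid_request` error code and the in-memory adapter, is hypothetical rather than taken from the generated stubs or the standard error catalogue.

```python
from dataclasses import dataclass


# Domain: business rules only, no HTTP or storage concerns.
@dataclass(frozen=True)
class Meeting:
    id: str
    title: str

    def __post_init__(self):
        if not self.title.strip():
            raise ValueError("meeting title must not be empty")


# Port: the application service depends on this interface, not on Postgres.
class MeetingRepository:
    def save(self, meeting: Meeting) -> None:
        raise NotImplementedError


# Adapter: an in-memory stand-in for the real database adapter.
class InMemoryMeetingRepository(MeetingRepository):
    def __init__(self):
        self.rows = {}

    def save(self, meeting: Meeting) -> None:
        self.rows[meeting.id] = meeting


# Application service: orchestration (build the entity, persist it).
class MeetingService:
    def __init__(self, repo: MeetingRepository):
        self.repo = repo

    def create_meeting(self, meeting_id: str, title: str) -> Meeting:
        meeting = Meeting(id=meeting_id, title=title)
        self.repo.save(meeting)
        return meeting


# Handler: extract inputs, call the service, shape the response.
def create_meeting_handler(service: MeetingService, body: dict) -> tuple[int, dict]:
    try:
        meeting = service.create_meeting(body["id"], body.get("title", ""))
    except (KeyError, ValueError) as exc:
        return 400, {"error": "invalid_request", "detail": str(exc)}
    return 201, {"id": meeting.id, "title": meeting.title}
```

Because the validation lives in `Meeting`, the handler never needs to know what makes a title valid, and the service test in step 4 can swap the in-memory adapter for a real database without touching the handler.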
# Add a Service (/docs/guides/add-service)

# Add a Service [#add-a-service]

## Goal [#goal]

Scaffold a new backend service that conforms to our platform conventions — hexagonal structure, OTel instrumentation, standard CI pipeline, `./dev` integration — from day one.

## Prerequisites [#prerequisites]

* An accepted [ADR](/docs/decisions) justifying the new service. "We could just add this to `wordloop-core`" is often the right answer; the ADR documents why it is not.
* Familiarity with [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture), the [Platform](/docs/principles/delivery/platform) stance, and [Go Services](/docs/principles/stack/go-services) or [ML Systems](/docs/principles/stack/ml-systems) depending on the language.

## Steps [#steps]

### 1. Use the scaffolding template [#1-use-the-scaffolding-template]

Our platform ships a bootstrapping template per supported language. It produces:

* The hexagonal directory layout (`domain/`, `ports/`, `adapters/`, `application/`).
* A stub HTTP server with OTel instrumentation configured.
* A standard CI pipeline definition.
* Dockerfile and Cloud Run deployment config.
* `./dev` integration (start, stop, logs, test, lint).

### 2. Register the service with the platform [#2-register-the-service-with-the-platform]

Add the service to the platform's service registry so that shared tooling — observability, feature flags, secrets — knows it exists. This is the step that makes the service "real" to the rest of the platform.

### 3. Write the first ADR [#3-write-the-first-adr]

A new service is a decision. Capture its purpose, its expected ownership, and the debt it carries (runtime cost, operational surface, coordination overhead) as an ADR.

### 4. Define the service's first SLO [#4-define-the-services-first-slo]

Before the service receives traffic, define the user-facing SLO it will live inside ([Reliability](/docs/principles/quality/reliability)).
An SLO-less service is a service that nobody can defend.

### 5. Write the service handbook [#5-write-the-service-handbook]

Create `content/docs/learn/services//` with `index.mdx`, `architecture.mdx`, and `implementation.mdx`. The handbook explains the "why" that the code cannot.

## Verification [#verification]

* `./dev start ` starts the service cleanly.
* `./dev test ` passes.
* The service is visible on the platform observability dashboard.
* A fresh engineer can open the service handbook and understand the service's shape.

## Troubleshooting [#troubleshooting]

* **OTel not exporting.** Check that the service registered its collector endpoint; the template defaults should work, but custom configuration may override them.
* **CI failing on the first push.** The template ships a minimal CI pipeline; extend it with service-specific tests as needed.

See [Platform](/docs/principles/delivery/platform) for the broader stance on service scaffolding.

# Code Generation (/docs/guides/code-generation)

# Code Generation [#code-generation]

The platform uses code generation pipelines to keep API contracts in sync across all services.

## Event types (AsyncAPI) [#event-types-asyncapi]

The AsyncAPI specification in `services/wordloop-core/asyncapi.yaml` is the single source of truth for all event-driven types (WebSocket events and Pub/Sub messages).
```bash
# Compile the AsyncAPI spec into typed event models for all services
./dev gen events
```

This produces:

| Target | Tool | Output |
| -------------- | -------------------------- | -------------------------------------------------------------------------- |
| **Go** | `asyncapi-codegen` | `services/wordloop-core/internal/provider/generated/asyncapi.gen.go` |
| **TypeScript** | `@asyncapi/cli` (Modelina) | `services/wordloop-app/lib/generated/asyncapi.ts` |
| **Python** | `@asyncapi/cli` (Modelina) | `services/wordloop-ml/src/wordloop/providers/generated/asyncapi_models.py` |

Consumer scripts (App, ML) try to fetch the spec from a running Core instance at `http://localhost:4002/asyncapi.yaml` first, and fall back to the local monorepo path for offline generation.

:::info
Core owns the spec and generates its own types locally. App and ML are consumers that pull the spec from Core — following the same pattern as OpenAPI client generation.
:::

## Core → ML client (oapi-codegen) [#core--ml-client-oapi-codegen]

`wordloop-core` generates a Go HTTP client for calling `wordloop-ml`'s API.

```bash
# Core must be running at localhost:4002 and ML at localhost:4003
./dev gen clients
```

Under the hood:

```bash
cd services/wordloop-core
WORDLOOP_ML_BASE_URL=http://127.0.0.1:4003 ./scripts/generate-clients.sh
```

**Adding a new external API client in Core:**

1. Create `internal/provider//`
2. Add an `oapi-codegen.yaml` config in that directory
3. Set `_BASE_URL` when running the script

## ML → Core client (openapi-python-client) [#ml--core-client-openapi-python-client]

`wordloop-ml` generates a Python client for calling `wordloop-core`'s API.

```bash
# Generated by the same command that produces Core's client
./dev gen clients
```

Under the hood:

```bash
cd services/wordloop-ml
./scripts/generate_wordloop_core_client.sh
```

The generated client is written to `src/wordloop/providers/wordloop_core/client/` and **must not be edited manually**.
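Because `./dev gen clients` overwrites the generated package on every run, any mapping or error handling belongs in hand-written code that sits beside it. That also keeps generated types out of the domain, in line with [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture). The following is a minimal sketch of that wrapper pattern, assuming a hypothetical generated call that returns a decoded JSON payload; `Meeting`, `MeetingReader`, `CoreMeetingAdapter`, and `FetchMeeting` are illustrative names, not the real generated API.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Protocol


@dataclass(frozen=True)
class Meeting:
    """Domain type owned by wordloop-ml; it never imports generated models."""
    id: str
    title: str


class MeetingReader(Protocol):
    """Port: what the domain needs from Core, expressed in domain types."""

    def get_meeting(self, meeting_id: str) -> Optional[Meeting]: ...


# Hypothetical stand-in for a generated endpoint function: takes an ID and
# returns the decoded response body, or None when Core reports "not found".
FetchMeeting = Callable[[str], Optional[dict]]


class CoreMeetingAdapter:
    """Adapter: hand-written wrapper that survives client regeneration."""

    def __init__(self, fetch_meeting: FetchMeeting) -> None:
        self._fetch_meeting = fetch_meeting

    def get_meeting(self, meeting_id: str) -> Optional[Meeting]:
        payload = self._fetch_meeting(meeting_id)
        if payload is None:
            return None
        # Translate wire fields into the domain type in exactly one place.
        return Meeting(id=payload["id"], title=payload.get("title", ""))
```

Because the adapter only depends on a callable, tests can pass a fake such as `lambda mid: {"id": mid, "title": "Standup"}` and exercise domain code without a running Core; only the adapter changes when the generated client's shape does.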
## App TypeScript client (Orval) [#app-typescript-client-orval]

`wordloop-app` generates TypeScript types, SWR hooks, and API functions from Core's OpenAPI spec.

```bash
# Generated by the same command, via Orval
./dev gen clients
```

Under the hood:

```bash
curl http://localhost:4002/openapi.json -o services/wordloop-app/openapi.json
cd services/wordloop-app && pnpm orval
```

The generated file is `lib/api/generated.ts` — **never edit it manually**. Use the wrapper hooks in `hooks/use-data.ts`.

## Regenerate everything [#regenerate-everything]

```bash
# All services must be running for clients to pull live specs
./dev gen all
```

This runs `events` → `clients` → `docs`.

# Correct Documentation Drift (/docs/guides/correct-documentation-drift)

# Correct Documentation Drift [#correct-documentation-drift]

## TL;DR [#tldr]

Do not fix drift by editing the first wrong-looking page. First classify the disagreement, identify the source of truth, decide whether the current system or the documented intent is correct, then update every affected surface in one change.

## Drift types [#drift-types]

| Drift type | Example | Default source of truth |
| ---------------------------- | ------------------------------------------------------- | ----------------------------------------------- |
| Docs vs code | Docs say Next.js 15; package metadata says Next.js 16. | Code and package metadata |
| Docs vs generated contract | Guide names an endpoint missing from OpenAPI. | OpenAPI or AsyncAPI source |
| Docs vs skill | Skill duplicates old architecture guidance. | Docs for knowledge; skill for execution |
| Data flow vs implementation | TDD says Core publishes an event that code never emits. | Active delivery decision, then code/tests |
| Diagram vs topology | Architecture diagram omits a service boundary. | Code, deployment config, specs, traces |
| ADR vs current docs | Principle page contradicts an accepted ADR. | ADR until superseded |
| Active bet vs delivered code | TDD intent differs from implementation. | Product decision: fix code or revise active TDD |
| Runbook vs operations | Runbook references a retired dashboard. | Current operational tooling |

## Workflow [#workflow]

### 1. Capture the mismatch [#1-capture-the-mismatch]

Write down the two or more conflicting claims. Be concrete:

* Page or file path.
* Claim text or diagram element.
* Source that contradicts it.
* Date or commit where the contradiction appeared, if known.

Avoid vague reports such as "docs are stale." They are not actionable.

### 2. Classify the surfaces [#2-classify-the-surfaces]

Mark each surface as one of:

* **Generated reference** — contracts, schemas, CLI tables, error catalogues.
* **Runtime source** — code, tests, migrations, deployment config, traces.
* **Active guidance** — principles, service handbooks, runbooks, active TDD docs.
* **Historical record** — accepted ADRs, delivered bets, incident records.
* **Agent execution** — skills and skill evals.

### 3. Identify the source of truth [#3-identify-the-source-of-truth]

Use this order unless the page states a stricter rule:

1. Generated contracts and schemas define public interfaces.
2. Code, migrations, deployment config, and tests define shipped behaviour.
3. Accepted ADRs define historical decisions until superseded.
4. Active bet and TDD docs define current delivery intent before shipping.
5. Principle and service handbook pages define durable guidance.
6. Skills define agent execution behaviour, not durable engineering knowledge.

### 4. Decide whether to fix code or docs [#4-decide-whether-to-fix-code-or-docs]

A mismatch does not always mean the docs are wrong. Ask:

* Did code drift away from an intentional design?
* Did the design change without the docs being updated?
* Did a generated reference fail to regenerate?
* Did a skill preserve old policy after docs changed?
* Did a decision change without a new ADR superseding the old one?
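The precedence order from step 3 gives a default answer before you work through these questions: when two surfaces disagree, the higher-ranked one is the starting assumption, and the questions decide whether to override it. A minimal sketch of that ordering as data, assuming one claim per surface; the `Surface` names are illustrative and not an existing platform API.

```python
from enum import IntEnum


class Surface(IntEnum):
    """Source-of-truth precedence from step 3; a lower value wins by default."""
    GENERATED_CONTRACT = 1  # OpenAPI, AsyncAPI, schemas
    SHIPPED_CODE = 2        # code, migrations, deployment config, tests
    ACCEPTED_ADR = 3        # historical decisions until superseded
    ACTIVE_TDD = 4          # active bet and TDD docs (pre-ship intent)
    DOCS_GUIDANCE = 5       # principles and service handbooks
    SKILL = 6               # agent execution behaviour only


def default_source_of_truth(claims: dict[Surface, str]) -> tuple[Surface, str]:
    """Return the surface (and its claim) that wins by default when claims conflict."""
    if not claims:
        raise ValueError("need at least one claim to compare")
    winner = min(claims)  # IntEnum members compare by their integer value
    return winner, claims[winner]


if __name__ == "__main__":
    conflict = {
        Surface.DOCS_GUIDANCE: "Next.js 15",
        Surface.SHIPPED_CODE: "Next.js 16",
    }
    print(default_source_of_truth(conflict))
```

This only ranks claims; it does not replace the judgement in this step. If shipped code drifted away from an intentional design, the right fix is still the code.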
If the documented design is still correct, fix code or create a delivery task. If shipped behaviour is correct, update active docs and skill references.

### 5. Update all affected surfaces [#5-update-all-affected-surfaces]

A complete drift correction may need changes to:

* Docs page content and `last_reviewed` metadata.
* Diagrams and data-flow descriptions.
* OpenAPI or AsyncAPI specs.
* Code, tests, migrations, or deployment config.
* ADRs when the decision changed.
* Skill context routing and verification steps.
* `llms.txt`, `llms-full.txt`, and Markdown exports.
* Skill-to-doc map entries.

### 6. Add a regression guard [#6-add-a-regression-guard]

Choose the cheapest guard that would have caught the drift:

* Health check for version strings, missing frontmatter, or broken links.
* Contract generation check for API/event reference drift.
* Diagram drift check for service-topology claims.
* Test or trace assertion for runtime flow claims.
* Skill eval for agent behaviour drift.
* Review-cadence change for pages that go stale quickly.

### 7. Verify [#7-verify]

Run the relevant commands:

```bash
# From the platform root
./dev docs health

# Or run the underlying script inside the docs service
cd services/wordloop-docs && pnpm run docs:health
```

Run service tests or generation commands when code, contracts, or generated docs changed.

## Data-flow and design-doc drift [#data-flow-and-design-doc-drift]

Data-flow drift is high risk because it misleads implementation and agent planning. Treat these checks as mandatory for active bets and service handbooks:

* Every service boundary in a data-flow diagram has a contract or explicit TODO.
* Every persistent object in a TDD has a schema plan or a reason it is transient.
* Every event shown in a diagram appears in AsyncAPI or is marked proposed.
* Every API operation shown in a guide appears in OpenAPI or is marked proposed.
* Every failure path that crosses a service boundary has an owner and response strategy.
* Every implementation milestone updates active TDD docs when it changes the design.
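The event check in the list above is cheap to automate once the events in a diagram can be listed. A minimal sketch, assuming the AsyncAPI document has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`) and that messages are declared under `components.messages`; the function names and event names are hypothetical.

```python
from collections.abc import Iterable, Mapping


def declared_messages(spec: Mapping) -> set[str]:
    """Message names declared under `components.messages` in a parsed AsyncAPI spec."""
    components = spec.get("components") or {}
    return set(components.get("messages") or {})


def undeclared_diagram_events(
    diagram_events: Iterable[str],
    spec: Mapping,
    proposed: Iterable[str] = (),
) -> list[str]:
    """Events shown in a diagram that are neither in AsyncAPI nor marked proposed."""
    missing = set(diagram_events) - declared_messages(spec) - set(proposed)
    return sorted(missing)  # sorted for stable, diff-friendly output


if __name__ == "__main__":
    spec = {"components": {"messages": {"MeetingCreated": {}, "TranscriptReady": {}}}}
    diagram = ["MeetingCreated", "TranscriptReady", "SpeakerIdentified", "RecapDrafted"]
    print(undeclared_diagram_events(diagram, spec, proposed=["RecapDrafted"]))
```

In a health check, a non-empty result would fail the run; listing the missing names, rather than returning a boolean, tells the author exactly which diagram element or spec entry to fix.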
## Hallucination controls [#hallucination-controls]

Use these controls when correcting drift with AI assistance:

* Ask the agent to cite local source files, specs, or docs slugs for factual claims.
* Prefer generated contracts and package metadata over prose memory.
* Do not accept newly invented standard names, endpoints, event names, or commands without checking the source.
* Require exact paths for changed files and exact commands for verification.
* Search the repository before introducing new terminology.
* Treat external web claims as untrusted until verified against an official source.

## Anti-patterns [#anti-patterns]

* **Patch one page and stop.** Drift is usually cross-surface.
* **Refresh dates without review.** A new date on stale claims is worse than an old date.
* **Rewrite history.** Supersede ADRs and annotate delivered bets instead.
* **Trust AI recall.** Use source files, contracts, and official references.
* **Leave no regression guard.** If the drift was expensive, add a check.

## Related [#related]

* [Documentation Freshness](/docs/operations/documentation-freshness)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Documentation](/docs/principles/foundations/documentation)

# Deploy (/docs/guides/deploy)

# Deploy [#deploy]

## Goal [#goal]

Take a merged change from `main` and see it running for all users, with a verified canary step in between.

## Prerequisites [#prerequisites]

* Change merged to `main` (we deploy from trunk — see [Progressive Delivery](/docs/principles/delivery/progressive-delivery)).
* Familiarity with the observability dashboard for the service being deployed.

## Steps [#steps]

### 1. CI triggers the deploy [#1-ci-triggers-the-deploy]

Every merge to `main` triggers the CI pipeline: run tests, build the container image, push it to Artifact Registry, and deploy it to a Cloud Run canary.

### 2. Watch the canary [#2-watch-the-canary]

The canary serves a small fraction of traffic.
The automated promotion gate compares canary SLO metrics — latency, error rate, user-journey success — against the current production revision. Monitor the release dashboard; in most cases, automated promotion handles it. Manual override is available when you want to pause or abort.

### 3. Promote or abort [#3-promote-or-abort]

* **Automated promote.** If canary metrics stay within tolerance for the watch window, traffic is shifted to 100%.
* **Automated abort.** If the canary burn rate exceeds the threshold, traffic is routed back to the previous revision and the team is paged.
* **Manual promote.** For releases with user-facing changes, a human can promote or hold.

### 4. Close the release [#4-close-the-release]

Once promotion is complete, close the release ticket, announce in the release channel, and verify the user-facing change behaves as expected.

## Verification [#verification]

* Current traffic is 100% on the new revision.
* SLO dashboards are green.
* Feature flags for the new release (if any) are in the expected state.

## Troubleshooting [#troubleshooting]

* **Canary aborted.** Check the release dashboard for the failing signal. Common causes: a dependency change that increases latency, or an environment variable missing in the new revision.
* **Deploy stuck "in progress."** Check Cloud Run logs for the service; a crash loop will block promotion.
* **SLO burn after promotion.** Roll back via the dashboard and file an incident ticket.

See [Progressive Delivery](/docs/principles/delivery/progressive-delivery) for the broader stance and [Operations / Runbooks](/docs/operations/runbooks) for post-deploy recovery procedures.

# Guides (/docs/guides)

# Guides [#guides]

Guides are **task-oriented**: each one walks you through completing a specific goal, from first command to verification. They assume you already know roughly why you want to do the thing; if you do not, follow the links into [Learn](/docs/learn) or [Engineering Principles](/docs/principles) from inside the guide.
## Developer workflow [#developer-workflow]

## How to read a guide [#how-to-read-a-guide]

Every guide is structured the same way: **Goal → Prerequisites → Steps → Verification → Troubleshooting**. If you find a step that fails in a way the guide does not cover, treat that as a bug in the documentation and open a PR against the guide itself — see [Your First Contribution](/docs/start/first-contribution).

# Keep Docs and Skills in Sync (/docs/guides/keep-docs-and-skills-in-sync)

# Keep Docs and Skills in Sync [#keep-docs-and-skills-in-sync]

## TL;DR [#tldr]

Docs hold durable engineering knowledge. Skills control agent execution. When either surface changes, update the skill-to-doc map, review the other surface, run documentation health checks, and evaluate any affected skill behaviour.

## When to use this workflow [#when-to-use-this-workflow]

Use this workflow when you:

* Change a principle, service handbook, workflow guide, runbook, or reference page that an agent skill may load.
* Create, edit, split, rename, or remove an agent skill.
* Move durable guidance from a skill into the docs site.
* Add a docs page that should become canonical context for an existing skill.
* Change skill trigger wording, safety gates, verification commands, or reference-loading instructions.
## Source-of-truth rule [#source-of-truth-rule]

| Content type | Canonical home |
| -------------------------------------------- | -------------------------------------- |
| Durable architecture and engineering policy | Docs site |
| Service-specific implementation conventions | Docs site |
| API, event, schema, CLI, and error reference | Generated docs where possible |
| Historical decisions | ADRs |
| Active delivery intent | Active bet and TDD docs |
| Skill triggering and task routing | Skill frontmatter and `SKILL.md` |
| Agent safety gates and verification workflow | Skill `SKILL.md` |
| Skill evaluation prompts and harness | Skill workspace or skill-factory evals |

## Workflow: changing docs [#workflow-changing-docs]

1. **Identify affected skills.** Check the skill-to-doc map for skills that depend on the page.
2. **Update the docs page.** Keep the page human-readable and agent-readable. Do not write prompt-like instructions into human docs.
3. **Update freshness metadata.** Change `last_reviewed` only after checking the claims against the source of truth.
4. **Review affected skills.** Check whether the skill still points to the right page, loads the right context, and verifies the right behaviour.
5. **Update skill references if needed.** Keep the skill concise; point to docs instead of copying durable guidance.
6. **Run health checks.** Use `./dev docs health` from the platform root.
7. **Run skill evals when behaviour changed.** If trigger wording, routing, or safety gates changed, run representative skill prompts before merging.

## Workflow: changing skills [#workflow-changing-skills]

1. **Decide whether the change is knowledge or execution.** Move durable knowledge to docs. Keep execution behaviour in the skill.
2. **Update the source skill.** Edit `tools/skill-factory/skills//` first; sync to `.agents/skills/` after review.
3. **Update the skill-to-doc map.** Add, remove, or rename canonical docs dependencies.
4. **Review mapped docs pages.** Confirm the docs still contain the knowledge the skill is expected to load.
5. **Create or update eval prompts.** Include should-trigger and should-not-trigger cases for trigger changes.
6. **Run health checks.** Confirm mapped docs pages and skill paths exist.
7. **Sync consumed skills.** Run `./dev sync skills` or copy the reviewed skill into `.agents/skills/` using the approved repository workflow.

## Skill-to-doc map rules [#skill-to-doc-map-rules]

Each maintained skill should declare:

* The skill name.
* The source skill path.
* The consumed skill path.
* Canonical docs dependencies by docs slug.
* Optional secondary docs used for specific task variants.
* The review owner.

The map is intentionally lightweight. It does not prove semantic correctness; it makes affected-surface review discoverable.

## Review checklist [#review-checklist]

* Does the skill still trigger for the right user prompts?
* Does the skill avoid triggering for adjacent but wrong prompts?
* Does the skill load canonical docs instead of duplicating them?
* Does the docs page avoid agent-only prompt language?
* Do docs, skills, code, generated specs, and ADRs agree on the source-of-truth hierarchy?
* Did `last_reviewed` change only after a real review?
* Did generated `llms-full.txt` and Markdown exports stay current?

## Anti-patterns [#anti-patterns]

* **Shadow policy in skills.** Durable rules copied into `SKILL.md` instead of linked to docs.
* **Prompt-shaped docs.** Human docs that read like system prompts.
* **Unmapped skills.** A skill that depends on docs but is invisible to health checks.
* **Blind freshness updates.** Changing `last_reviewed` without validating claims.
* **Eval-free trigger edits.** Changing trigger wording without testing realistic prompts.
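The skill-to-doc map rules translate naturally into a small typed record that a health check can validate. A minimal sketch, assuming docs slugs and repository paths are available as plain strings; `SkillMapEntry` and `map_entry_problems` are hypothetical names, not the actual health-check implementation. Consistent with the map rules, it only proves that references resolve, not that the docs are semantically correct.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass(frozen=True)
class SkillMapEntry:
    """One row of the skill-to-doc map, mirroring the fields listed in the rules."""
    name: str
    source_path: str              # skill source under tools/skill-factory/skills/
    consumed_path: str            # synced copy under .agents/skills/
    canonical_docs: tuple[str, ...]
    review_owner: str
    secondary_docs: tuple[str, ...] = ()


def map_entry_problems(
    entry: SkillMapEntry,
    known_slugs: set[str],
    # Default checks the filesystem relative to the current directory (the platform root).
    path_exists: Callable[[str], bool] = lambda p: Path(p).exists(),
) -> list[str]:
    """Return human-readable problems; an empty list means the entry resolves."""
    problems: list[str] = []
    if not entry.canonical_docs:
        problems.append(f"{entry.name}: no canonical docs dependencies")
    for slug in entry.canonical_docs + entry.secondary_docs:
        if slug not in known_slugs:
            problems.append(f"{entry.name}: unknown docs slug '{slug}'")
    for label, path in (("source", entry.source_path), ("consumed", entry.consumed_path)):
        if not path_exists(path):
            problems.append(f"{entry.name}: {label} path not found: {path}")
    if not entry.review_owner.strip():
        problems.append(f"{entry.name}: no review owner")
    return problems
```

Injecting `path_exists` keeps the check testable without touching the real repository, and returning messages rather than a boolean lets the health check report every broken reference in one run.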
## Related [#related]

* [Documentation](/docs/principles/foundations/documentation)
* [Documentation Freshness](/docs/operations/documentation-freshness)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)
* [Docs are canonical knowledge and skills are the agent execution layer](/docs/decisions/0005-docs-canonical-skills-execution-layer)

# Migrate the Schema (/docs/guides/migrate-schema)

# Migrate the Schema [#migrate-the-schema]

## Goal [#goal]

Change the Postgres schema in a way that is safe for production: additive first, reversible, and non-blocking on hot tables.

## Prerequisites [#prerequisites]

* Familiarity with [Postgres](/docs/principles/stack/postgres) and [Data Engineering](/docs/principles/system-design/data-engineering) principles.
* Local stack running (`./dev start infra`) so you can test the migration against a real database.

## Steps [#steps]

### 1. Draft the migration [#1-draft-the-migration]

Migrations live under `services/wordloop-core/migrations/` (or the equivalent directory for the service that owns the schema). Name them by timestamp and intent: `20260419123000_add_loops_archived_at.up.sql`.

Write the `.up.sql` **additively**:

* Add columns as nullable, or with a default expression that is cheap on a hot table.
* Add new tables as empty.
* Never rename or drop in a single migration — split into "add new", "backfill", "stop reading old", "drop old" across releases.

Write the `.down.sql` as an exact reverse, tested locally.

### 2. Test locally [#2-test-locally]

```bash
./dev migrate up
./dev migrate down
./dev migrate up
```

Round-tripping catches broken `.down.sql` early.

### 3. Backfill in a separate job [#3-backfill-in-a-separate-job]

If the column needs a non-trivial value on historical rows, write a backfill job that chunks through the table and commits in batches. Do **not** backfill inside the migration itself — a long-running update inside a migration holds locks for its entire duration, causes replication lag, and terrifies on-call engineers.

### 4. Coordinate with consumers [#4-coordinate-with-consumers]

If the schema change is part of a renaming or restructuring, the order of deploys matters:

* Deploy the code that reads both old and new columns.
* Run the migration.
* Backfill.
* Deploy the code that reads only the new column.
* In a later release, drop the old column.

### 5. Commit the migration and the code change together [#5-commit-the-migration-and-the-code-change-together]

The PR should include the migration and the code that uses it. Reviewers can see the full scope of the change.

## Verification [#verification]

* `./dev migrate status` shows the migration as applied.
* Service tests pass against the migrated schema.
* Rollback tested locally.
* [Database Reference](/docs/reference/database) regenerates cleanly.

## Troubleshooting [#troubleshooting]

* **`ALTER TABLE` is taking forever in staging.** On a large table, some DDL rewrites every row, for example adding a column whose default is volatile (such as `clock_timestamp()`) or changing a column's type. Since Postgres 11, adding a column with a constant default does not rewrite the table. Split the change into "add nullable → backfill → tighten to NOT NULL."
* **`.down.sql` fails.** Down migrations often break when the up migration contains data transformations. Consider whether the down is genuinely needed; some migrations are forward-only (and the code has to be able to tolerate that).

See [Postgres](/docs/principles/stack/postgres) for the stance that shapes this workflow.

# Run Tests (/docs/guides/run-tests)

# Run Tests [#run-tests]

## Goal [#goal]

Run the right tests for the change you are making — unit, service, or system — and read the output in a way that makes failures actionable.

## Prerequisites [#prerequisites]

* Local stack bootstrapped (`./dev start infra`) so that Testcontainers has a working Docker daemon.
* Familiarity with [Testing](/docs/principles/foundations/testing) — especially the "favour service tests over unit tests" and "emulate, don't mock" disciplines.

## Steps [#steps]

### 1. Run per-service tests [#1-run-per-service-tests]

```bash
./dev test core   # Go service tests for wordloop-core
./dev test ml     # Python tests + evals for wordloop-ml
./dev test app    # Vitest + React Testing Library for wordloop-app
./dev test        # Everything
```

Service tests spin up real Postgres and Pub/Sub containers where needed.

### 2. Run system tests [#2-run-system-tests]

System tests exercise multiple services together through their real APIs and trace assertions.

```bash
./dev test system
```

These take longer; run them before opening a PR that touches multiple services.

### 3. Run with race detection (Go) [#3-run-with-race-detection-go]

```bash
./dev test core -- -race
```

Concurrency bugs are far cheaper to catch with the race detector than to debug in production; run with `-race` on any change that touches goroutines.

### 4. Run ML evals [#4-run-ml-evals]

```bash
./dev test ml -- --evals
```

Runs the committed eval set. Regressions above the threshold fail the command.

## Verification [#verification]

* Exit code 0 on the targeted suites.
* Trace assertions pass (no missing spans).
* Coverage report (if enabled) shows the change is exercised.

## Troubleshooting [#troubleshooting]

* **"Cannot connect to Docker daemon."** Start Docker Desktop; verify with `./dev status`.
* **Testcontainers start slowly.** The first run pulls the Postgres image; subsequent runs use the cached image.
* **Flaky test.** Flakiness is a bug. File it; do not retry until green.

See [Testing](/docs/principles/foundations/testing) for the underlying stance.

# Learn the Platform (/docs/learn)

# Learn the Platform [#learn-the-platform]

This section is for understanding — the *why* and *how* behind Wordloop. It is not a tutorial (see [Start Here](/docs/start/quickstart)) and it is not a reference (see [Reference](/docs/reference)). It is the narrative layer that turns a repository of code into a system you can reason about.
## What you will find here [#what-you-will-find-here]

## How to read this section [#how-to-read-this-section]

Start with **Concepts** if the domain is new to you — understanding what a Meeting, Person, and MeetingSynthesis mean in code matters for every change downstream. Move to **Architecture** to see how services compose into a platform, then drop into a **Service** handbook when you need implementation-level depth.

If you want to know *what we believe* about building software at this scale and why, read [Engineering Principles](/docs/principles). If you want to *do something*, see [Guides](/docs/guides). If you want to *look something up*, see [Reference](/docs/reference).

# Documentation Freshness (/docs/operations/documentation-freshness)

# Documentation Freshness [#documentation-freshness]

## TL;DR [#tldr]

Every active documentation page needs an owner, a review cadence, and a visible freshness state. Stale docs are not automatically wrong, but they are lower-trust until reviewed. Historical records such as ADRs and delivered bets are handled differently: they are preserved, corrected with explicit notes when necessary, or superseded.

## Freshness states [#freshness-states]

| State | Meaning | Reader guidance |
| -------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------- |
| **Fresh** | `last_reviewed` is inside the review window. | Treat as current unless code or contracts prove otherwise. |
| **Review due** | The review window has passed. | Use with caution; verify against source of truth before making major changes. |
| **Stale** | The page is more than one review window overdue. | Do not use as authoritative without checking code, specs, traces, or owners. |
| **Generated** | The page is produced from code, contracts, or schemas. | Regenerate from source instead of editing by hand. |
| **Historical** | The page records past intent or decisions. | Preserve history; supersede or add correction notes instead of rewriting. |

## Review windows [#review-windows]

| Surface | Default review window | Status model | Source of truth |
| ------------------- | -----------------------------------: | ------------------- | ------------------------------------------ |
| Principles | 6 months | Active | Docs and accepted ADRs |
| Service handbooks | 3 months | Active | Code, package metadata, architecture docs |
| How-to guides | 6 months | Active | Commands, workflows, and tested paths |
| Runbooks | 3 months | Active | Operational reality and incident follow-up |
| API reference | Every contract change | Generated | OpenAPI specs |
| Event reference | Every contract change | Generated | AsyncAPI specs |
| Database reference | Every schema change | Generated or active | Migrations and schema introspection |
| Glossary | 6 months | Active | Domain vocabulary and product language |
| Active bet TDD docs | Every material implementation change | Active | Current delivery intent and code reality |
| Delivered bet docs | No expiry | Historical | Archived design record |
| ADRs | No expiry | Historical | Append-only decision record |
| Agent skills | Every skill or mapped docs change | Active | Skill source plus mapped docs pages |

## Required frontmatter [#required-frontmatter]

Active authored pages should include:

```yaml
title: Documentation
description: One sentence describing the page.
audience: engineers
owner: docs-platform
last_reviewed: 2026-05-01
review_frequency: P6M
status: active
source_of_truth: docs
```

Generated pages should declare that they are generated where the generator supports it:

```yaml
status: generated
source_of_truth: specs/core-openapi.json
```

Historical pages should not be forced into an active freshness cycle:

```yaml
status: historical
source_of_truth: accepted-adr
```

## Review triggers [#review-triggers]

Review a page before its normal review window when one of these events happens:

* A package, language runtime, framework, or infrastructure version changes.
* A public command, environment variable, port, endpoint, event, or schema changes.
* A service boundary or data-flow diagram changes.
* A skill starts depending on the page for agent execution.
* An incident exposes missing or misleading operational guidance.
* An ADR supersedes a decision that the page explains.
* A user or agent reports confusion caused by the page.

## Stale-page handling [#stale-page-handling]

1. **Classify the page.** Decide whether it is active, generated, or historical.
2. **Find the source of truth.** Use code, specs, migrations, traces, ADRs, or active design docs depending on the claim.
3. **Update the page or mark it historical.** Do not silently keep stale active guidance.
4. **Update `last_reviewed`.** Only update the date after checking the claims, not after touching formatting.
5. **Run documentation health checks.** Confirm metadata, internal links, skill-doc references, and generated corpora are still valid.
6. **Review affected skills.** If a skill depends on the page, check whether the skill's routing or verification steps need to change.

## What not to do [#what-not-to-do]

* Do not refresh `last_reviewed` without reviewing the claims.
* Do not rewrite accepted ADRs to make them current.
* Do not edit generated reference pages by hand.
* Do not hide stale badges because they are inconvenient.
* Do not rely on humans to notice stale stack versions, command names, or broken links when a script can check them.

## Commands [#commands]

Run the health check from the platform root:

```bash
./dev docs health
```

Run the underlying docs script directly when working inside the docs service:

```bash
cd services/wordloop-docs
pnpm run docs:health
```

## Related [#related]

* [Documentation](/docs/principles/foundations/documentation)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)

# Operations (/docs/operations)

# Operations [#operations]

The Operations section is written for the person staring at a red graph at 3am — or the one who will, one day. It is different from [Guides](/docs/guides): guides walk you through a happy-path operation you *want* to perform; runbooks walk you through a degraded state you *have to* respond to.

## When to use this section [#when-to-use-this-section]

## Writing for 3am [#writing-for-3am]

Operational documentation has a harsh audience: a stressed engineer under time pressure. The bar is high.

* **State the goal at the top.** Every runbook begins with "This runbook restores *X* when *Y*."
* **Number the steps.** Imperative sentences. Exact commands, exact flags, exact expected output.
* **Include rollback.** Every step that changes state must explain how to undo it.
* **Link to observability.** Every step that checks state must link to the dashboard that proves it.
* **Close with escalation.** If the runbook fails, who or what is next?

See [Engineering Principles / Reliability](/docs/principles/quality/reliability) for why we hold this bar.

# On-Call (/docs/operations/on-call)

# On-Call [#on-call]

On-call is the contract we sign with our users: if the platform breaks, someone is responsible for putting it back together, and that someone is paged promptly.
This page describes how the rotation is structured, how incidents are handled, and the tools an on-call engineer should have open before their shift starts.

## Rotation [#rotation]

Primary and secondary on-call shifts run in one-week blocks. The calendar is maintained in our paging system; pages route to the current primary with automatic escalation to the secondary if unacknowledged.

## Before your shift [#before-your-shift]

1. **Skim the last two weeks of incidents.** Patterns recur — knowing the last time this alert fired is usually the fastest lead.
2. **Confirm paging works.** Send yourself a test page; verify the escalation chain.
3. **Verify dashboard access.** Observability dashboards, feature-flag console, deploy dashboard, Cloud Run console, database console.
4. **Review recent deploys.** A page five minutes after a deploy is almost certainly about the deploy.

## When you are paged [#when-you-are-paged]

1. **Acknowledge within 5 minutes.** Even if you are not ready to act, acknowledging stops escalation.
2. **Open the incident channel.** The paging system creates one automatically; post your initial assessment there.
3. **Localise, don't rebuild.** Use [Troubleshooting](/docs/operations/troubleshooting) to find the matching diagnostic tree. Do not write new code in an incident unless necessary.
4. **Apply the relevant [runbook](/docs/operations/runbooks).** If none exists, write one during the postmortem.
5. **Escalate when stuck.** 30 minutes without progress is the soft threshold. Call the secondary; call the service owner; call the service leader.

## Communication [#communication]

The incident channel is the record. Post:

* What you saw (the symptom).
* What you checked (the diagnostic path).
* What you did (the mitigation).
* Who else is involved.

One line every few minutes is better than radio silence. Other engineers read the channel to decide whether to jump in; absence of updates reads as "this is handled" when it may not be.
## After the incident [#after-the-incident]

* **Close the page.** Confirm the alert is cleared.
* **Open a postmortem ticket.** Use the blameless postmortem template; name the specific reliability assumption that was invalidated.
* **File action items.** One concrete, closable ticket per action. "Be more careful" is not an action item.
* **Update the runbook.** If the runbook missed a step, fix it while the experience is fresh.

## Tools every on-call engineer should have ready [#tools-every-on-call-engineer-should-have-ready]

* Observability dashboards, pinned per service.
* Deploy dashboard with rollback on hand.
* Feature-flag console with write access.
* Cloud Run console with per-service revision access.
* Database console (read-only by default; write access only on demand, with an audit trail).
* The team's runbook index.

## Related [#related]

* [Reliability](/docs/principles/quality/reliability) — the SLO and error-budget model that shapes what gets paged.
* [Troubleshooting](/docs/operations/troubleshooting) — diagnostic trees for common symptoms.
* [Runbooks](/docs/operations/runbooks) — step-by-step recovery procedures.

# Troubleshooting (/docs/operations/troubleshooting)

# Troubleshooting [#troubleshooting]

This page is for the "something feels off" moment, before you know which runbook to follow. It is a set of diagnostic trees — start from the symptom you can see, follow the branch that narrows the cause, then consult the matching [runbook](/docs/operations/runbooks) or escalate.

## Symptom: the frontend is blank after sign-in [#symptom-the-frontend-is-blank-after-sign-in]

1. **Check the browser console.** Look for 401/403 from `wordloop-core` → Clerk token issue. Look for 5xx → backend issue.
2. **Check the Core service health.** Hit `/healthz` on Core. If it responds, the backend is up; the problem is in auth or in the specific call the app makes first.
3. **Check JWT verification logs** on Core for the incoming request.
A mismatch between the Clerk environment and the Core configuration will produce "token signature does not verify" here. ## Symptom: transcription lag is spiking [#symptom-transcription-lag-is-spiking] 1. **Check the ML service trace.** Filter for `transcribe.turn` spans with latency > SLO. If the model call itself is slow, the likely cause is the model provider or the network. 2. **Check the model-client adapter logs.** Rate-limit responses from the provider surface here. 3. **Check the audio queue depth.** If the queue is deep, consumers are not keeping up — scale the ML workers or investigate a backpressure signal. ## Symptom: WebSocket connections drop repeatedly [#symptom-websocket-connections-drop-repeatedly] 1. **Check the gateway logs** for timeout errors — that usually indicates a platform-layer idle timeout below our expected session length. 2. **Check the client reconnect pattern.** A flood of reconnects from one client suggests a client-side bug; a broader pattern suggests a server-side issue. 3. **Check for `BACKPRESSURE_SHED` error frames.** Each frame means the server dropped a low-priority message for a client that could not keep up; if many clients receive them at once, the server is likely overloaded — check the SLO dashboard. ## Symptom: deploys are failing in CI [#symptom-deploys-are-failing-in-ci] 1. **Check the CI logs for the failing step.** Most failures are one of: tests broke, image build broke, vulnerability scan flagged a dependency. 2. **If tests broke,** run them locally (`./dev test `) — a flaky test should be fixed, not retried. 3. **If the image build broke,** the cause is often a Dockerfile layer change or a base-image update. The CI log shows which layer failed. 4. **If the vulnerability scan flagged a dependency,** the dependency audit is doing its job. Upgrade the dependency or add a justified waiver. ## When to move to a runbook [#when-to-move-to-a-runbook] If you have localised the symptom to a known failure mode (database slow, cache cold, model provider degraded, Pub/Sub backed up), move to the corresponding [runbook](/docs/operations/runbooks) for the recovery procedure.
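The first two checks in the blank-frontend tree can be scripted. A minimal sketch, assuming Core runs locally on port 4002 and that `/healthz` answers with a 2xx status when the service is up; `core_is_healthy` and `triage_first_call` are hypothetical helpers, not part of the `./dev` CLI.

```python
import urllib.request

# Assumption: Core listens on localhost:4002 in local development and
# /healthz returns a 2xx status when the process is up.
CORE_HEALTH_URL = "http://localhost:4002/healthz"


def core_is_healthy(url: str = CORE_HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # Covers connection refused, timeouts, DNS failures, and HTTPError
        # (an OSError subclass that urlopen raises for error statuses).
        return False


def triage_first_call(status: int) -> str:
    """Map the status of the app's first call to Core onto the tree's branches."""
    if status in (401, 403):
        return "auth"  # Clerk token issue: check JWT verification logs on Core
    if 500 <= status <= 599:
        return "backend"  # backend issue: check Core health and logs
    return "other"  # not covered by this tree; inspect the specific call


if __name__ == "__main__":
    print("core healthy:", core_is_healthy())
```

A healthy `/healthz` combined with an `auth` result points at the Clerk/Core configuration mismatch described in step 3 rather than at an outage.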
## When to escalate [#when-to-escalate] * Symptom is user-visible and you cannot localise it within 10 minutes. * Symptom involves suspected security or privacy breach — escalate immediately ([Security](/docs/principles/quality/security), [Privacy](/docs/principles/quality/privacy)). * Symptom is a novel failure mode not covered by any runbook. Document it in the postmortem for future detection. See [On-Call](/docs/operations/on-call) for the escalation tree. # Engineering Manifesto (/docs/principles) # Engineering Manifesto [#engineering-manifesto] Software engineering is the discipline of managing complexity and optimising for change. Wordloop is a platform that processes high-volume asynchronous workloads and serves clients in real time at scale — so we lean hard on a solid technical foundation, frictionless developer velocity, and a rigorous engineering culture. > \[!IMPORTANT] > These principles are the shared vocabulary we use to decide what to build, how to build it, and what trade-offs we accept. Every page in this hub stands on its own and does not require context from any other document to be useful. The hub serves three audiences equally: engineers new to Wordloop learning how we think, experienced engineers returning for a stance on a specific domain, and AI agents working on a Wordloop task. ## What we believe [#what-we-believe] 1. **Complexity is the enemy; clarity is the goal.** We choose simple designs, simple tools, and simple processes — and we accept the cost of doing so. Speculative abstraction, premature generalisation, and fear of deletion all compound into the kind of complexity that slows teams down. 2. **Contracts are the single source of truth.** API specifications, event schemas, and database definitions are authoritative. Clients, tests, documentation, and UIs are derived from them. 
When a spec is wrong, everything downstream is wrong — and that is the correct failure mode, because one visible error beats silent drift across hand-maintained artefacts. 3. **Reliability is designed in, not patched in.** We build for failure from the first commit: idempotency at the API boundary, graceful degradation at the edges, backpressure when downstream systems slow, and observability as a design-time concern rather than an afterthought. 4. **We test the system, not the mock of the system.** Tests that run against real databases, real message brokers, and real HTTP stacks catch the bugs that mocked tests hide. Emulation beats mocking wherever the dependency can run in a container. 5. **Hexagonal architecture is how we structure services.** Ports and adapters, with dependencies flowing inward toward the domain. The predictable file topology is as valuable for the humans reading the code as it is for the agents writing it. 6. **Documentation is a product, not a by-product.** This site is versioned, reviewed, and shipped with the same discipline as code. It serves humans and AI agents, and the structures that help one help the other. 7. **Architectural decisions are append-only.** We record trade-offs as they are made, model them as debt (principal + interest + multiplier), and preserve the history. Re-litigating a past decision without a new decision record is how teams lose their memory. 8. **AI agents are first-class engineers.** They read our docs, write our code, review our diffs, and run our tooling. We design our codebase, our conventions, and this documentation so an agent can operate at the same level of quality as a senior engineer. ## How to read this hub [#how-to-read-this-hub] Start with the principle closest to your current task. Every page follows the same shape: a short statement of our stance, the industry context that makes it matter, the concrete principles we follow, and the anti-patterns we explicitly reject. 
* **[Testing](/docs/principles/foundations/testing)** — How we guarantee reliability with Continuous Risk Assurance: service tests over unit tests, high-fidelity emulation, observability-driven development, and risk-based coverage. More principle pages are being added as the hub expands to cover foundations, system design, our stack, quality, delivery, and AI-native development. Each new page is self-contained and lands on its own merits. # CLI Reference (/docs/reference/cli) # CLI Reference [#cli-reference] The Wordloop platform has fully replaced its legacy Makefiles with a bespoke, shell-native `./dev` interface that drives all local execution safely and predictably. All targets are run from the monorepo root. Run `./dev help` for a formatted list. ## Lifecycle [#lifecycle] | Command | Description | | ------------------------------------ | --------------------------------------------------------------------------------- | | `./dev start all` | Start infra (Docker) + Core, ML, App, Docs (native) | | `./dev start all --docker` | Start everything in Docker containers | | `./dev start infra` | Start shared infra only (Postgres, Pub/Sub, Storage, OTel) | | `./dev start [services...]` | Start specific services natively (e.g. `./dev start core ml`) | | `./dev start [services...] --docker` | Start specific services in Docker containers | | `./dev stop all` | Stop everything safely (Docker + native processes) | | `./dev stop wipe` | Destructive: stop everything and destroy all data volumes | | `./dev stop [services...]` | Stop specific services (auto-detects native vs Docker) | | `./dev logs all` | Tail logs for all running services | | `./dev logs [services...]` | Tail logs for specific services — supports multi-tail (e.g.
`./dev logs core ml`) | | `./dev attach db` | Drop into an interactive psql shell | | `./dev status` | Print local environment ports and endpoints | Services run **natively** by default with auto-reload (Air for Go, uvicorn for Python, HMR for Next.js). Use `--docker` to opt into Docker containers when needed. ## Quality [#quality] | Command | Description | | ------------------- | ------------------------------------------------------------- | | `./dev test all` | Run all test suites across all packages | | `./dev test system` | Run cross-service end-to-end tests via Pytest | | `./dev test smoke` | Run infrastructure health smoke tests | | `./dev test core` | Run Go test suites | | `./dev test ml` | Run Python Pytest suites | | `./dev test app` | Run TS Vitest suites | | `./dev lint all` | Run static analysis across all services | | `./dev lint core` | Run `go vet` on Core | | `./dev lint ml` | Run `ruff check` on ML | | `./dev lint app` | Run `eslint` on App | ## Utilities [#utilities] | Command | Description | | --------------------- | ------------------------------------------------- | | `./dev db migrate` | Apply all pending Core DB migrations | | `./dev db rollback` | Revert the single most recently applied migration | | `./dev db drop` | Destructive: completely drop the schema | | `./dev db shell` | Open the local PostgreSQL console | | `./dev dash obs` | Open the .NET Aspire observability dashboard | | `./dev dash api` | Open the ML API Swagger docs | | `./dev dash app` | Open the Next.js App | | `./dev dash docs` | Open the Fumadocs Documentation UI | | `./dev gcp pubsub` | Interact with local Pub/Sub emulator via gcloud | | `./dev gcp storage` | Query the local Storage emulator REST API | | `./dev gen api` | Generate OpenAPI schemas | | `./dev gen events` | Generate AsyncAPI structs across all services | | `./dev gen clients` | Rebuild typed API clients (Orval + Go + Python) | | `./dev gen docs` | Recompile OpenAPI
metadata for docs UI | | `./dev setup env` | Copy environment baseline configurations | | `./dev setup install` | Install workspace-wide package dependencies | ## System [#system] | Command | Description | | ------------------------ | --------------------------------------------------------------------------------- | | `./dev doctor` | Validate all system dependencies, Docker status, port availability, and env files | | `./dev completions zsh` | Output zsh auto-completion script | | `./dev completions bash` | Output bash auto-completion script | **First time?** Run `./dev doctor` immediately after cloning to verify your machine has everything needed. ### Enabling auto-completion [#enabling-auto-completion]
```bash
# Zsh: add to ~/.zshrc for permanent access
eval "$(./dev completions zsh)"

# Bash: add to ~/.bashrc
eval "$(./dev completions bash)"
```
After sourcing, typing `./dev ` and pressing Tab suggests available commands and sub-targets. ## Native vs Docker [#native-vs-docker] By default, `./dev start core` runs the Go service natively using Air for auto-reload. This means: * **File changes are detected automatically** — Air watches `.go` files and rebuilds in ~1 second * **Migrations run on every restart** — database schema is always current * **Logs go to `.dev/logs/`** — tail them with `./dev logs core` * **IDE debugging works** — you can also run Core from your IDE's debugger instead Use `--docker` when you need full containerized behavior (e.g., testing Dockerfiles, CI parity, or running without Go installed locally). ## Debug Environments [#debug-environments] By starting only some services (e.g., `./dev start infra core`), you deliberately leave others, such as `wordloop-ml`, stopped. You can then run those services from your IDE (for example, with a VS Code launch configuration) and get full breakpoint control while they depend on the containerized or native services that are already running.
## Resilience Model [#resilience-model] The CLI is designed for safety and resilience: * **Graceful shutdown**: `./dev stop` sends `SIGTERM` first, allowing services to flush connections and clean up. It falls back to `SIGKILL` only after a 3-second grace period. * **Subshell isolation**: All commands run in isolated subshells, preventing `cd` side-effects from changing your terminal's working directory. * **Port conflict detection**: `./dev doctor` and `./dev start` both check for port conflicts before launching services. * **No external dependencies**: Port checking uses native bash `/dev/tcp` instead of requiring `nc` or `netcat`. # Configuration (/docs/reference/configuration) # Configuration [#configuration] Every service in the Wordloop platform loads its configuration from environment variables, following the [Twelve-Factor App](https://12factor.net/) config principle. This page is the canonical catalogue of those variables — what they do, what their defaults are, and which service owns them. Local defaults are generated by `./dev setup env`. The variables listed here are the full contract; your local `.env` files typically override only the subset you need. ## Common variables [#common-variables] Platform-level variables; the Service(s) column lists which services read each one. | Variable | Service(s) | Default (local) | Purpose | | ----------------------------- | ---------- | ----------------------- | ------------------------------------------------------------------------------------------------------------ | | `APP_ENV` | all | `development` | `development`, `test`, `staging`, `production`. Controls auth mode, logging verbosity, and feature defaults. | | `DATABASE_URL` | core | derived | Postgres connection string. | | `PUBSUB_EMULATOR_HOST` | core, ml | `localhost:8085` | Local Pub/Sub emulator. Unset in production. | | `OTEL_EXPORTER_OTLP_ENDPOINT` | all | `http://localhost:4318` | Collector endpoint for traces, metrics, and logs.
| | `LOG_LEVEL` | all | `info` | `debug`, `info`, `warn`, `error`. | ## `wordloop-core` [#wordloop-core] | Variable | Default | Purpose | | ----------------------- | ---------------------- | ---------------------------------------- | | `CORE_PORT` | `4002` | HTTP + WebSocket port. | | `CLERK_SECRET_KEY` | — | Backend Clerk key for JWT verification. | | `CLERK_PUBLISHABLE_KEY` | — | Frontend-shared key; surfaced for debug. | | `STORAGE_BUCKET` | `wordloop-local-audio` | GCS bucket for audio artefacts. | ## `wordloop-ml` [#wordloop-ml] | Variable | Default | Purpose | | ---------------------- | ----------- | --------------------------------------------- | | `ML_PORT` | `4003` | FastAPI port. | | `MODEL_PROVIDER` | `anthropic` | Chooses which model adapter to load. | | `ANTHROPIC_API_KEY` | — | Set when `MODEL_PROVIDER=anthropic`. | | `OPENAI_API_KEY` | — | Set when `MODEL_PROVIDER=openai`. | | `ML_CACHE_TTL_SECONDS` | `3600` | Cache lifetime for deterministic model calls. | ## `wordloop-app` [#wordloop-app] | Variable | Default | Purpose | | ----------------------------------- | ----------------------- | ----------------------------------- | | `NEXT_PUBLIC_CORE_URL` | `http://localhost:4002` | URL the browser uses to reach Core. | | `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | — | Clerk frontend key. | | `APP_PORT` | `4001` | Next.js port. | ## Feature flags [#feature-flags] Feature flags are served dynamically — they are not environment variables. See the flag dashboard for the current state and owners. Progressive-delivery principles ([Progressive Delivery](/docs/principles/delivery/progressive-delivery)) govern how flags are created, rolled, and retired. ## Further reading [#further-reading] * [Quickstart](/docs/start/quickstart) — bootstrapping local `.env` files. * [Security](/docs/principles/quality/security) — the rules around secret handling. * [Twelve-Factor App](https://12factor.net/) — the philosophy behind environment-based config. 
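A service can enforce this contract at startup. The sketch below, for `wordloop-ml`, uses the defaults from the tables above; `load_ml_config` and `MLConfig` are hypothetical names, and the real service may use a settings library instead. It assumes `anthropic` and `openai` are the only supported values of `MODEL_PROVIDER`, and it fails fast when the matching API key is missing.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MLConfig:
    port: int
    model_provider: str
    api_key: str
    cache_ttl_seconds: int
    log_level: str


# Each provider listed in the wordloop-ml table and the key it requires.
PROVIDER_KEYS = {"anthropic": "ANTHROPIC_API_KEY", "openai": "OPENAI_API_KEY"}


def load_ml_config(env: Optional[Mapping[str, str]] = None) -> MLConfig:
    """Build config from environment variables, applying the documented defaults."""
    env = os.environ if env is None else env
    provider = env.get("MODEL_PROVIDER", "anthropic")
    if provider not in PROVIDER_KEYS:
        raise ValueError(f"unsupported MODEL_PROVIDER: {provider!r}")
    key_var = PROVIDER_KEYS[provider]
    api_key = env.get(key_var, "")
    if not api_key:
        # Fail at startup rather than on the first model call.
        raise ValueError(f"{key_var} must be set when MODEL_PROVIDER={provider}")
    return MLConfig(
        port=int(env.get("ML_PORT", "4003")),
        model_provider=provider,
        api_key=api_key,
        cache_ttl_seconds=int(env.get("ML_CACHE_TTL_SECONDS", "3600")),
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Accepting an optional mapping instead of always reading `os.environ` keeps the loader testable without mutating the process environment.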
# Database Schema (/docs/reference/database) # Database Schema [#database-schema] The Postgres database is owned exclusively by `wordloop-core`. Schema changes must be managed through versioned SQL migrations in `services/wordloop-core/scripts/migrations/`. Do not apply manual schema alterations. ## ER diagram [#er-diagram] ## Tables [#tables] ### `users` [#users] Primary user account linked to Auth0. | Column | Type | Notes | | ------------ | ---------------- | ------------------ | | `id` | UUID PK | | | `auth0_id` | TEXT UNIQUE | External identity | | `email` | TEXT | | | `name` | TEXT | | | `person_id` | UUID FK → people | Optional self-link | | `created_at` | TIMESTAMPTZ | | ### `people` [#people] Contacts and meeting participants. | Column | Type | Notes | | -------------------- | ----------- | ---------------------------------- | | `id` | UUID PK | | | `name` | TEXT | | | `role` | TEXT | Job title / role description | | `email` | TEXT | | | `company` | TEXT | | | `tags` | JSONB | | | `voice_model_status` | TEXT | `untrained` / `training` / `ready` | | `voice_confidence` | DECIMAL | | | `voice_vector` | vector(512) | Optional SpeechBrain embedding | ### `meetings` [#meetings] Recorded or uploaded conversations. | Column | Type | Notes | | ------------- | --------------- | --------------------------------------------- | | `id` | UUID PK | | | `user_id` | UUID FK → users | | | `title` | TEXT | | | `start_time` | TIMESTAMPTZ | | | `end_time` | TIMESTAMPTZ | | | `summary` | TEXT | AI-generated summary | | `key_points` | JSONB | | | `source_type` | TEXT | `recording` / `upload` / `text` / `anecdotal` | | `created_at` | TIMESTAMPTZ | | ### `meeting_audio_files` [#meeting_audio_files] Audio files attached to meetings. 
| Column | Type | Notes | | -------------- | ------------------ | --------- | | `id` | UUID PK | | | `meeting_id` | UUID FK → meetings | | | `storage_path` | TEXT | GCS path | | `file_name` | TEXT | | | `content_type` | TEXT | MIME type | | `file_size` | BIGINT | Bytes | | `created_at` | TIMESTAMPTZ | | ### `transcriptions` [#transcriptions] Records the transcription job details connected to a meeting. | Column | Type | Notes | | ---------------- | ------------------ | ------------------------------------------------------------------------------------------ | | `id` | UUID PK | | | `meeting_id` | UUID FK → meetings | | | `status` | TEXT | enum: `pending`, `transcribing`, `diarizing`, `extracting_features`, `completed`, `failed` | | `status_message` | TEXT | Optional error details | | `created_at` | TIMESTAMPTZ | | | `updated_at` | TIMESTAMPTZ | | ### `transcription_status_history` [#transcription_status_history] Audit log of transcription status changes. | Column | Type | Notes | | ------------------ | ------------------------ | ----- | | `id` | UUID PK | | | `transcription_id` | UUID FK → transcriptions | | | `status` | TEXT | | | `status_message` | TEXT | | | `created_at` | TIMESTAMPTZ | | ### `transcript_segments` [#transcript_segments] Timestamped chunks of transcribed speech. 
| Column | Type | Notes | | ------------------ | ------------------------ | --------------------------------------------- | | `id` | UUID PK | | | `transcription_id` | UUID FK → transcriptions | | | `person_id` | UUID FK → people | Nullable | | `speaker_label` | TEXT | Temporary label before identification | | `text` | TEXT | | | `start_time` | DECIMAL | Seconds from start | | `end_time` | DECIMAL | Seconds from start | | `confidence` | DECIMAL | Transcription confidence | | `is_final` | BOOLEAN | Indicates if segment is finalized (streaming) | | `feature_vector` | vector(512) | SpeechBrain embedding | ### `tasks` (formerly `action_items`) [#tasks-formerly-action_items] Actionable items extracted from meetings. | Column | Type | Notes | | ------------- | ------------------ | ----------------------- | | `id` | UUID PK | | | `user_id` | UUID FK → users | | | `content` | TEXT | | | `status` | TEXT | `pending` / `completed` | | `due_date` | DATE | | | `assigned_to` | UUID FK → people | | | `meeting_id` | UUID FK → meetings | | | `sub_tasks` | JSONB | | | `created_at` | TIMESTAMPTZ | | ### `notes` [#notes] Free-form notes attached to people or meetings. | Column | Type | Notes | | -------------- | --------------- | -------------------- | | `id` | UUID PK | | | `user_id` | UUID FK → users | | | `content` | TEXT | | | `subject_type` | TEXT | `PERSON` / `MEETING` | | `subject_id` | UUID | Polymorphic FK | | `tags` | JSONB | | | `created_at` | TIMESTAMPTZ | | | `updated_at` | TIMESTAMPTZ | | ### `ai_threads` [#ai_threads] Contextual AI conversation containers. | Column | Type | Notes | | -------------- | --------------- | -------------------- | | `id` | UUID PK | | | `user_id` | UUID FK → users | | | `context_type` | TEXT | `PERSON` / `MEETING` | | `context_id` | UUID | Polymorphic FK | | `created_at` | TIMESTAMPTZ | | ### `chat_messages` [#chat_messages] Individual messages within an AI thread. 
| Column | Type | Notes | | ------------ | --------------------- | ---------------------------------------- | | `id` | UUID PK | | | `thread_id` | UUID FK → ai\_threads | | | `role` | TEXT | `user` / `assistant` / `system` / `tool` | | `content` | TEXT | | | `tool_calls` | JSONB | | | `created_at` | TIMESTAMPTZ | | ## Migration history [#migration-history] Migrations are applied via `./dev db migrate` and live in `services/wordloop-core/scripts/migrations/`. | Version | Description | | ---------------- | ------------------------------------------------------------------ | | `20250709123530` | Initial schema (users, people) | | `20260309152000` | Meetings, transcripts, tasks, notes, AI threads | | `20260313204400` | Add `person_id` to users | | `20260315213000` | Rename `action_items` → `tasks` | | `20260324204621` | Add `meeting_audio_files` | | `20260324211500` | Add `meeting.status` | | `20260326090621` | Add `meeting.status_message` | | `20260327200316` | Update transcript segment fields | | `20260329060000` | Add `is_final` to transcript\_segments | | `20260329204000` | Add `meeting_status_history` (later dropped) | | `20260330203000` | Add pgvector extension, `transcriptions` table, and `voice_vector` | # Errors (/docs/reference/errors) # Errors [#errors] The Wordloop Core API follows RFC 9457 (`application/problem+json`) for error responses. Every error carries a `status` (HTTP code), a `title` (short stable description), and an optional `detail` string with context. Clients and AI agents should branch on `status` and `title` — these are stable and never renumbered. 
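A client can parse this envelope and decide whether to retry using only `status` and `title`. A minimal sketch: the retryable set mirrors the status-code table on this page (429, 504, 500), `Retry-After` is assumed to carry delay-seconds rather than an HTTP date, and `Problem`, `parse_problem`, and `retry_delay` are hypothetical helpers. The full-jitter backoff is one common choice, not a documented Wordloop policy.

```python
import json
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Problem:
    """The RFC 9457 members this page documents."""
    status: int
    title: str
    detail: Optional[str] = None
    instance: Optional[str] = None


def parse_problem(body: str) -> Problem:
    """Parse an application/problem+json body; extra members are ignored."""
    data = json.loads(body)
    return Problem(
        status=int(data["status"]),
        title=str(data["title"]),
        detail=data.get("detail"),
        instance=data.get("instance"),
    )


# Statuses the status-code table marks as retryable with backoff.
RETRYABLE_STATUSES = {429, 500, 504}


def retry_delay(
    problem: Problem,
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Seconds to wait before retrying, or None if the caller should not retry.

    `attempt` is zero-based. `retry_after` is the raw Retry-After header value,
    assumed here to be delay-seconds (HTTP-date values are not handled).
    """
    if problem.status not in RETRYABLE_STATUSES:
        return None
    if problem.status == 429 and retry_after is not None:
        return float(retry_after)
    # Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Branching on `status` rather than on `detail` follows the guidance above: `detail` is free text for humans and may change, while `status` and `title` are stable.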
## Envelope [#envelope] Every error response follows this shape: ```json { "status": 404, "title": "Not Found", "detail": "No meeting with the provided id exists.", "instance": "/meetings/abc123" } ``` Validation errors include an `errors` array of field-level diagnostics: ```json { "status": 422, "title": "Unprocessable Entity", "detail": "Request body did not match the schema.", "errors": [ { "message": "required", "path": "body.title", "value": "" } ] } ``` ## Common HTTP status codes [#common-http-status-codes] | Status | Title | Meaning | Action | | ------ | --------------------- | ----------------------------------------------------------------- | --------------------------------------------- | | 401 | Unauthorized | The request lacked a valid Clerk token or session. | Refresh the token or re-authenticate. | | 403 | Forbidden | The caller is authenticated but not authorised for this resource. | Confirm role and scope. Do not retry. | | 404 | Not Found | The resource does not exist. | Verify the identifier; check user visibility. | | 422 | Unprocessable Entity | The request body did not match the schema. | Inspect `errors` for field-level diagnostics. | | 409 | Conflict | An `Idempotency-Key` was reused with a different payload. | Generate a fresh key; retry. | | 429 | Too Many Requests | Per-caller rate limit exceeded. | Back off per `Retry-After`. | | 504 | Gateway Timeout | A downstream dependency timed out. | Retry with exponential backoff. | | 500 | Internal Server Error | Unexpected server error; details are captured in our observability tooling. | Retry with backoff; escalate if sustained. | ## WebSocket error frames [#websocket-error-frames] Real-time errors use a custom envelope on the wire: ```json { "type": "error", "error": { "code": "SESSION_EXPIRED", "message": "Session token expired; reconnect with a fresh one.", "details": { "session_id": "sess_..."
} } } ``` | Code | Meaning | Action | | ------------------- | ---------------------------------------------------------------------------------------------- | -------------------------------------------------- | | `SESSION_EXPIRED` | The WebSocket session token is no longer valid. | Fetch a new token; reconnect. | | `RESUME_FAILED` | The server could not resume the session at the supplied sequence. | Reconnect without a resume token; rehydrate state. | | `BACKPRESSURE_SHED` | Informational: the server dropped a low-priority message because the client could not keep up. | No client action required. | ## Further reading [#further-reading] * [API Design](/docs/principles/system-design/api-design) — the stance on structured errors. * [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — why stable codes matter for agent consumers. * [Core API Reference](/docs/reference/api/core) — per-endpoint error catalogues rendered from the OpenAPI spec. # Glossary (/docs/reference/glossary) # Glossary [#glossary] The authoritative vocabulary of Wordloop. When code, docs, or conversation refers to one of these terms, this page is what the term means. The domain-level concepts also appear in [Learn / Concepts](/docs/learn/concepts), where they are explained with more narrative context. ## A [#a] **ADR — Architecture Decision Record.** An append-only document capturing a significant, hard-to-reverse decision, with explicit debt annotations. See [Decisions](/docs/decisions). **Adapter.** A component that implements a [port](#p), bridging the domain to an external dependency (database, message broker, HTTP framework). Part of the [hexagonal](#h) architecture vocabulary. **AsyncAPI.** The machine-readable specification format we use to document event streams — the asynchronous counterpart to OpenAPI. See [Core Events Reference](/docs/reference/events/core-ws). 
## B [#b] **Backpressure.** The explicit control signal by which a producer is slowed when a consumer cannot keep up. In Wordloop, backpressure is designed into every real-time flow — we shed, coalesce, or block rather than buffering unbounded. See [Real-Time](/docs/principles/system-design/real-time). ## C [#c] **Canary.** A release shape where a small fraction of traffic reaches a new revision before it is promoted to 100%. See [Progressive Delivery](/docs/principles/delivery/progressive-delivery). **Clerk.** The third-party authentication provider we use for user identity. JWTs from Clerk are verified by `wordloop-core` on every request. **Core (wordloop-core).** The Go HTTP and WebSocket API that is the source of truth for Meetings, People, Transcriptions, Tasks, and real-time session state. ## D [#d] **DORA metrics.** Deployment frequency, lead time for changes, change failure rate, and mean time to recover — the four research-backed metrics we use to measure delivery performance. See [DevEx](/docs/principles/delivery/devex). ## E [#e] **Error budget.** The quantity of "bad" events allowed by an SLO over a rolling window. Consumed by outages; restored by uptime. See [Reliability](/docs/principles/quality/reliability). **Eval.** A scored comparison of model output against a reference. We run evals in CI to catch regressions in AI-driven behaviour. See [AI Engineering](/docs/principles/ai-native/ai-engineering). ## H [#h] **Hexagonal architecture.** The structural pattern — domain core, ports, adapters — that every non-trivial Wordloop service follows. See [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture). ## I [#i] **Idempotency key.** A client-supplied identifier that lets the server recognise and safely handle retried writes. Every write endpoint in Wordloop accepts one. See [API Design](/docs/principles/system-design/api-design). 
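Server-side, an idempotency key is typically checked against a record of earlier requests. This in-memory sketch illustrates the contract described in the Errors reference: the same key with the same payload replays the stored response, and the same key with a different payload is a conflict, surfaced as `409 Conflict`. `IdempotencyStore` and its helpers are hypothetical; a production version would persist keys in Postgres, ideally in the same transaction as the write, and would have to serialise concurrent retries (for example with a unique constraint on the key), which this sketch ignores.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


class IdempotencyConflict(Exception):
    """The key was reused with a different payload (surfaced as 409 Conflict)."""


@dataclass
class StoredResponse:
    payload_hash: str
    status: int
    body: Any


def payload_hash(payload: dict) -> str:
    # Canonical JSON so that key order in the payload does not change the hash.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class IdempotencyStore:
    """Hypothetical in-memory store; a real one would live in Postgres."""

    def __init__(self) -> None:
        self._responses: Dict[str, StoredResponse] = {}

    def handle(
        self,
        key: str,
        payload: dict,
        write: Callable[[dict], Tuple[int, Any]],
    ) -> Tuple[int, Any]:
        digest = payload_hash(payload)
        prior = self._responses.get(key)
        if prior is not None:
            if prior.payload_hash != digest:
                raise IdempotencyConflict(key)
            # Retry of an identical request: replay the stored response
            # without performing the write a second time.
            return prior.status, prior.body
        status, body = write(payload)
        self._responses[key] = StoredResponse(digest, status, body)
        return status, body
```

Hashing the payload, rather than storing it verbatim, is enough to detect reuse with different content while keeping the stored record small.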
**IDP — Internal Developer Platform.** The set of shared tooling, runtimes, and golden paths engineers use to build on Wordloop. See [Platform](/docs/principles/delivery/platform). ## J [#j] **JIT — Just-in-Time provisioning.** The pattern by which Wordloop creates a local User and Person record the first time a user signs in via Clerk. No webhooks, no seeding. See [Quickstart](/docs/start/quickstart). ## L [#l] **llms.txt.** The machine-readable index of a documentation site, consumed by AI agents. See [/llms.txt](/llms.txt) and [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems). ## M [#m] **MCP — Model Context Protocol.** The interoperable protocol we use to expose tools and resources to AI agents. See [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems). **Meeting.** The primary entity in Wordloop — a bounded session that is captured in the system, is attended by People, and produces a Transcription, a MeetingSynthesis, and Tasks. The `meetings` table and `/meetings` routes are the centre of the domain. **MeetingSynthesis.** The AI-generated summary attached to a Meeting. Contains a headline, prose summary, key points, Topics, and TalkingPoints. Produced by the ML service after the Transcription finalises. **ML (wordloop-ml).** The Python FastAPI runtime responsible for transcription, synthesis generation, and embedding. ## N [#n] **Note.** A free-form annotation attached to a Person or a Meeting via a polymorphic `subject_type` / `subject_id` pair. ## O [#o] **OpenAPI.** The machine-readable specification format we use to document HTTP APIs. Our server handlers and clients are generated from it. See [API Design](/docs/principles/system-design/api-design). **OTel — OpenTelemetry.** The vendor-neutral observability framework we use for traces, metrics, and logs. See [Observability](/docs/principles/quality/observability).
**Outbox pattern.** The transactional pattern by which a database write and an event emission are committed together, via an `outbox` table. See [Integration Patterns](/docs/principles/system-design/integration-patterns). ## P [#p] **Person.** A contact record representing someone who appeared in a Meeting, with or without a Wordloop account. Carries identity fields and an optional voice model for speaker attribution. **pgvector.** The Postgres extension we use as our production vector store. See [Postgres](/docs/principles/stack/postgres). **Port.** An interface declared by the domain describing a capability it needs, implemented by an [adapter](#a). Part of the [hexagonal](#h) architecture vocabulary. ## R [#r] **RAG — Retrieval-Augmented Generation.** The pattern of enriching a model call with retrieved context from our own data. See [AI Engineering](/docs/principles/ai-native/ai-engineering). **Runbook.** A step-by-step recovery procedure for a known failure mode. See [Operations / Runbooks](/docs/operations/runbooks). ## S [#s] **SLO — Service Level Objective.** A per-journey target for latency and success rate, measured over a rolling window. The foundation of [Reliability](/docs/principles/quality/reliability). ## T [#t] **Tag.** A user-defined label applied to Meetings, People, or Tasks for organisation. **TalkingPoint.** A specific point or claim within a Topic, surfaced as a bullet in the MeetingSynthesis view. **Task.** An action item extracted from a Meeting. Tasks are assignable, hierarchical, and tracked through to completion. Statuses: `pending`, `in_progress`, `completed`, `canceled`. **Topic.** A thematic cluster extracted from a Meeting's TranscriptSegments, carrying a name, summary, and the contributing segments. **Transcription.** The speech-to-text record attached to a Meeting, aggregating TranscriptSegments as they arrive from the ML service. 
**TranscriptSegment.** The atomic unit of a Transcription — one speaker turn, carrying a speaker label, an attributed Person, text, timestamps, a confidence score, and an `is_final` flag. ## U [#u] **User.** A Wordloop account holder, identified via Clerk and JIT-provisioned on first sign-in. Each User has an associated Person record. ## V [#v] **Voice model.** The speaker-identification vector attached to a Person, built from verified TranscriptSegments and used to attribute future segments to a specific Person. ## W [#w] **WebSocket.** The default transport for real-time streams in Wordloop. See [Real-Time](/docs/principles/system-design/real-time). # Reference (/docs/reference) # Reference [#reference] Reference material is **information-oriented**: terse, complete, and predictable. If you are returning to Wordloop after a break and need to remember the exact `./dev` flag, the JSON shape of an event, or what error code 4003 means — you are in the right section. ## Contract surfaces [#contract-surfaces] ## A note on sources of truth [#a-note-on-sources-of-truth] Wherever possible, reference pages are **generated from the same specs the code is generated from**. The API reference is rendered directly from `specs/*-openapi.json`, the events reference from `specs/*-asyncapi.yaml`, and the database schema from the live migrations. If a reference page seems to drift from reality, the spec is the canonical source — open an issue, then fix the spec, and the page will follow. # Your First Contribution (/docs/start/first-contribution) # Your First Contribution [#your-first-contribution] You have the platform running locally and you have read the relevant principle pages. Your first change is a chance to learn the tooling and the review culture, not to design a system. Pick something bounded. ## Good candidates for a first PR [#good-candidates-for-a-first-pr] * **Fix a typo or broken link in the docs.** The docs site lives in `services/wordloop-docs`.
Edit an MDX file, rebuild locally, open a PR. * **Add a missing Vale word** to our style dictionary when linting flags a legitimate term. This teaches you the quality-governance workflow. * **Tighten an existing test** against a real behaviour it does not yet cover. Reading code this closely surfaces real bugs; inventing new features does not. * **Improve a runbook** after you follow it and find a step that is unclear. ## The mechanics [#the-mechanics] 1. **Branch.** `git checkout -b your-name/short-description` from `main`. Keep names short and descriptive. 2. **Make the change.** Keep it small and focused. If you find yourself fixing two things at once, split them into two PRs. 3. **Run the relevant tests.** Use `./dev test ` for unit tests and `./dev test system` for cross-service integration. See [Run Tests](/docs/guides/run-tests). 4. **Lint your work.** `./dev lint all` covers Go, Python, and TypeScript; for docs changes, run Vale if it is configured. 5. **Commit.** Write a commit message that explains *why* the change is needed, not just what changed. Our history is a long-term artefact. 6. **Open a PR.** Include a description of the change, the reasoning, a test plan, and links to any related issues or decision records. 7. **Respond to review.** Reviewers may push back on naming, structure, or scope. Treat review comments as invitations to improve the change, not as attacks. 8. **Merge when green.** CI must pass and a reviewer must approve. ## After merge [#after-merge] Watch the deploy. Our [CI/CD pipeline](/docs/learn/architecture/infrastructure) builds a Docker image, pushes it to Artifact Registry, and deploys to Cloud Run. If anything breaks in production, the on-call engineer will be paged and may ask you to revert quickly. That is normal; it means the feedback loop is working. ## What to read next [#what-to-read-next] * [Guides](/docs/guides) — task-oriented how-tos for the common operations.
* [Engineering Principles](/docs/principles) — the stance behind the code you are about to touch. * [Reference / CLI](/docs/reference/cli) — every `./dev` command in one table. # Getting Started (/docs/start/quickstart) # Getting Started [#getting-started] ## The `./dev` CLI Driver [#the-dev-cli-driver] All local orchestration, testing, database migrations, and telemetry dashboards are driven exclusively by the custom `./dev` CLI tool located in the repository root. See the [CLI Reference](/docs/reference/cli) to get started! ## Prerequisites [#prerequisites] | Tool | Version | Purpose | | --------------------------------------------- | -------------- | ------------------------------------------------------------- | | [Docker](https://docs.docker.com/get-docker/) | Compose v2.20+ | Infrastructure services | | [Go](https://go.dev/) | 1.25+ | wordloop-core | | [Air](https://github.com/air-verse/air) | latest | Go auto-reload (`go install github.com/air-verse/air@latest`) | | [uv](https://github.com/astral-sh/uv) | latest | wordloop-ml Python env | | [pnpm](https://pnpm.io/) | latest | wordloop-app dependencies | | [ffmpeg](https://ffmpeg.org/) | latest | ML audio processing | {/* LLM-Context: TL;DR: This guide is a "Day Zero" guided walkthrough. It moves beyond raw commands mapping out how developers should use `./dev start infra` to bootstrap their local environment, and attach IDE debuggers (like VSCode / GoLand) specifically for service debugging. */} ## The "Day Zero" Guided Walkthrough [#the-day-zero-guided-walkthrough] Welcome to Wordloop! Instead of throwing a wall of terminal commands at you, this guide walks you through setting up your environment for an optimal local development experience, including hooking up your IDE debuggers. 
### Step 1: Environment Checks [#step-1-environment-checks] Before starting, validate your local toolchain: ```bash # Assumes you have cloned the repo and are at the root ./dev doctor ``` If `doctor` flags any missing dependencies (like Docker, Go, or Node) or occupied ports, follow its instructions to resolve them. ### Step 2: Bootstrapping Config & Secrets [#step-2-bootstrapping-config--secrets] Generate and configure your local environment files: ```bash ./dev setup env ``` This scaffolds `.env` and `.env.local` files across the monorepo. * **wordloop-ml:** Add your ML/AI API keys. * **wordloop-app & core:** Add your Clerk frontend and backend keys for authentication. ### Step 3: Install Package Dependencies [#step-3-install-package-dependencies] ```bash ./dev setup install ``` ### Step 4: Infrastructure & IDE Debugging [#step-4-infrastructure--ide-debugging] We use a Hybrid Development Model. Infrastructure (Postgres, Pub/Sub, etc.) runs in Docker, so you can run the application you are working on natively in your IDE. For example, if you are working on the App frontend but want to run Core natively so you can step through Go code: 1. **Start the dependencies in the background:** ```bash ./dev start infra ml app ``` *This starts the DB, Pub/Sub, the ML service, and the Next.js frontend.* 2. **Launch the Core service in your IDE:** * **VSCode:** Open the debug panel and run the "Launch Core API" configuration. * **GoLand:** Start a Debug run of `cmd/server/main.go`. Requests from the frontend will now hit your breakpoints in the Core API. If you just want to run everything locally without IDE debugging (e.g., verifying a PR): ```bash ./dev start all ``` ## The Hybrid Development Model [#the-hybrid-development-model] Infrastructure runs in Docker (stable, rarely changes). Application services run natively for instant feedback: | What runs | Where | Auto-reload?
| | ------------------------------- | ----------------- | ------------------------------- | | Postgres, PubSub, Storage, OTel | Docker containers | n/a | | Core API (Go) | Native via Air | ✅ Rebuilds on `.go` file change | | ML API (Python) | Native via uv | ✅ Restarts on `.py` file change | | App (Next.js) | Native via pnpm | ✅ HMR in browser | ### Typical workflows [#typical-workflows] ```bash # Full stack (recommended for daily work) ./dev start all # Infrastructure only (run services from your IDE) ./dev start infra # Infrastructure + specific services ./dev start infra core # Debug ML from IDE ./dev start infra core ml # Debug App from IDE # Force Docker containers (for integration testing) ./dev start core ml --docker ``` ### Tailing logs [#tailing-logs] Native service logs are written to `.dev/logs/` and can be tailed with the same CLI: ```bash ./dev logs core # Tail Core output ./dev logs ml # Tail ML output ./dev logs core ml # Multi-tail Core + ML simultaneously ./dev logs all # Tail everything (Docker) ``` ## Full stack in Docker [#full-stack-in-docker] For CI-like environments or full-stack integration testing: ```bash ./dev start all --docker # Everything in containers ./dev logs all # Tail all logs ./dev stop all # Stop everything ``` ## Authentication [#authentication] Authentication is handled automatically through **JIT provisioning**: 1. Sign in via Clerk (Google, email, or test accounts) in the browser 2. The Core API verifies the Clerk JWT 3. If the user doesn't exist locally yet, they're auto-created from the Clerk API 4. No webhook tunnels, no manual tokens, no database seeding System tests use a separate `APP_ENV=test` mode with raw UUID tokens. See [Testing](/docs/principles/foundations/testing) for details. 
## Linting [#linting] ```bash ./dev lint # Lints core (go vet), ml (ruff), and app (eslint) ./dev lint core # Go linter only ./dev lint ml # Python Ruff only ./dev lint app # TypeScript ESLint only ``` ## Checking status [#checking-status] ```bash ./dev status # Show nicely formatted dashboard of running services ``` See [CLI Reference](/docs/reference/cli) for the complete target list. # How We Work (/docs/work) # How We Work [#how-we-work] This section describes how we move from an observed problem to shipped customer value. The process is lean by design, enforcing that technical execution is strictly bound to clear intent and verified by automated tests from the very beginning. It answers the fundamental question: *How do you move fast without skipping the discovery that stops you building the wrong thing?* Work flows through four stages, each more concrete than the last: *** # Inside each stage [#inside-each-stage] ## 1. Problem Statement [#1-problem-statement] Most wasted work is caused by excellent execution of the wrong thing. Skipping from idea to solution — without pausing to understand the problem — leads to building with false confidence. A **Problem Statement** captures observed pain — real, evidenced, specific — alongside an **appetite**: a judgment about how much time this problem is worth solving. * **Appetite** is not an estimate of how long a solution will take. It is an opportunity cost judgment made *before* the solution is defined. You are betting that the problem is worth that much time. * Problem statements do not accumulate indefinitely. They are a curated list, updated as understanding evolves and retired when no longer relevant. * **Platform and infrastructure problems are valid problem statements.** The "who experiences it" can be internal — the engineering team, the system's reliability, the business's compliance posture. Feature bets routinely surface infrastructure gaps (e.g., a missing event backplane, no deletion cascade). 
The right response is to extract the gap as its own problem statement — not to expand the feature bet. The feature bet declares the constraint explicitly; the platform bet solves it. ## 2. Pitch [#2-pitch] Unformed ideas become backlogs. Backlogs create the illusion that everything is captured and considered, when really they are lists of things nobody explicitly said no to. Before a problem reaches the build phase, it is shaped into a **Pitch**. A pitch links a validated problem to a rough solution proposal. It is concrete enough to execute against but stays away from micro-detail. A pitch must contain: * **The problem** — what was observed, who experiences it, why it matters now. * **The appetite** — how much time to spend. * **A rough solution sketch** — the general approach to the solution. * **Rabbit holes** — approaches already considered and ruled out. Include plausible-looking approaches that would blow the appetite or the scope, and infrastructure assumptions the bet makes (e.g., "we assume sticky sessions, not a backplane"). * **Explicit no-gos** — what is completely out of scope. Include both obvious exclusions *and* natural extensions that users would reasonably expect but that don't belong in this version (e.g., pause/resume, mobile support, export/download). Vague no-gos invite scope creep — be specific about what's excluded and why. A funded pitch becomes a **Bet** — a commitment bounded by the appetite. ## 3. TDD: Foundations [#3-tdd-foundations] Technical Design bridges the intent of the pitch to parallel execution. We start by laying the technical foundation so that progress isn't blocked later by misaligned interfaces. ### UI Design [#ui-design] The **UI Design** doc translates the pitch's rough solution sketch into concrete, screen-level detail. It answers: *what exactly will the user see and do?* Organise it by screen — not by feature, not by user story. 
Each screen the bet touches gets its own section with: * **A wireframe** — even a rough sketch. This is the anchor; the text describes it. * **Layout** — the regions on screen and what content lives in each one. * **States** — what the user sees during loading, active use, empty states, errors, and degraded conditions. * **Key interactions** — what the user can do and what happens in response. After the screens, map the **user journeys** between them (how the user moves from entry point to final outcome), and list **edge cases** (anything unusual the system needs to handle visibly). **Be specific about data objects.** If a screen shows tasks, define what a task is: which fields it has, which are required, whether they nest, what states they can be in. If a screen has a text editor, say whether it's rich text or plain, whether it auto-saves or has a save button. These details directly determine the API contracts and database schema that come next. **Stay at the user level.** If you're specifying which service owns the logic, how the frontend integrates, or where data persists — you've gone too far. The UI Design doc describes what the user experiences, not how the system delivers it. System concerns belong in the Data Flow doc. **The output feeds directly into:** Data Flow diagrams (which service calls which), API contracts (what fields and endpoints exist), and database schemas (what gets stored). If someone can't design those artefacts from the UI Design doc alone, the doc isn't detailed enough. ### Data Flows [#data-flows] The **Data Flow** doc maps every user interaction from the UI Design through service boundaries. It answers: *what calls what, what data crosses each boundary, and what happens when something fails.* It is the primary input for API contracts and database schemas — if someone can't design those artefacts from this doc alone, the doc isn't complete. 
**Start with a system context graph.** Before drawing any sequences, draw the topology: which services exist, which protocols connect them, which data stores each service owns. This is a static map — it orients readers and makes the scope of the bet explicit. **Name flows after what triggers them.** Group related flows into logical Parts (e.g., "Session Lifecycle", "Streaming Processing", "Failure Modes"). A flow name describes what the user does or what system condition fires — not the implementation. **Use descriptive operation labels — never endpoint paths.** Diagram labels should read like `Create task (idempotent, echo-suppressed)` not `POST /meetings/{id}/tasks`. Header names, field names, and HTTP methods all belong in the Contracts doc. Each arrow in a flow is a **contract boundary** (what shape the data takes) and a **sequencing constraint** (downstream cannot build until upstream is agreed). Naming the operation is enough — the Contracts doc defines the shape precisely. **Failure modes are required, not optional.** For every significant service boundary in the bet, there must be at least one flow describing what happens when that boundary fails. If the UI Design doc models a "Degraded" or "Connectivity Lost" state, the Data Flow doc must show the recovery sequence. Resilience is a first-class design concern — not an afterthought. **Close with two required sections:** * **Design Decisions** — tradeoffs made, alternatives ruled out, constraints that drove choices. Captures reasoning that isn't visible in the diagrams. * **Boundary Inventory** — a table of every service-to-service boundary in the doc. Five columns: Boundary | Flows | From → To | Protocol | Data shape. Each row here becomes a contract entry in the Contracts doc. ### Contracts & Schemas [#contracts--schemas] The agreed API contracts (REST, WebSocket, Pub/Sub) and database schemas (PostgreSQL, object storage). Downstream UI can mock against the contract; upstream Core can build against it. 
## 4. TDD: Execution [#4-tdd-execution] Once the technical foundation is set, the bet is decomposed into deliverable units. * **Integration Milestones:** Points of user-visible value, where multiple pieces integrate into a cohesive feature or state change for the user. * **Domain Slices:** The smallest independently buildable and testable units of work. We **never** slice horizontally (e.g. building all databases, then all APIs, then all UI). We always slice vertically. A vertical slice can be a full feature connecting App -> Core -> ML, or a slice contained entirely within a single domain (e.g., being able to do CRUD on a Meeting in Core via the API). Slices must be independently deployable and verifiable. ### Tests as Proof of Delivery [#tests-as-proof-of-delivery] Every milestone and slice has a documented test overview, and matching **empty test stubs are generated in the test runner** before any production code is written. These tests are the **single source of truth** for progress. Red means work to do; green means proven. 1. **Service/system tests** (permanent) — implemented in the service repo or `tests/system/` during the build. 2. **Bet progress suite** (temporary) — mirrored in `tests/bets//`, run on demand via `./dev test bet `. *** ## Bet Operations [#bet-operations] The Golden Path CLI tools keep documentation in sync with the integration testing layout. ### Start a new bet [#start-a-new-bet] ```bash ./dev new bet ``` Promotes a pitch into an active bet at `work//` and creates the baseline test boundary suite in `tests/bets//`. The slug must be lowercase kebab-case (e.g. `speaker-navigation`). A pitch must exist first — run `./dev new pitch `.
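The kebab-case rule for bet slugs fits in one regular expression. A minimal sketch, assuming a slug is lowercase ASCII letters and digits separated by single hyphens; `isValidBetSlug` is a hypothetical helper, not a documented `./dev` function:

```typescript
// Hypothetical validator: the real ./dev CLI may enforce a slightly different rule.
// Assumes a slug is one or more lowercase alphanumeric words joined by single hyphens.
const BET_SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;

function isValidBetSlug(slug: string): boolean {
  return BET_SLUG_PATTERN.test(slug);
}

// Usage
console.log(isValidBetSlug("speaker-navigation")); // true
console.log(isValidBetSlug("Speaker_Navigation")); // false
```

The pattern also rejects leading, trailing, and doubled hyphens, which keeps slugs safe to use as directory names under `work/` and `tests/bets/`.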
### Scaffolding Architecture (TDD) [#scaffolding-architecture-tdd] ```bash # Scaffolds architectural boundaries ./dev new contract ./dev new schema # Scaffolds milestones and domains ./dev new milestone ./dev new slice ``` Generating a `slice` or a `milestone` also creates matching placeholder test stubs in `tests/bets/`. ### Run bet progress tests [#run-bet-progress-tests] ```bash ./dev test bet ``` Runs the bet progress suite on demand. Use the test output to verify that your delivery is progressing as intended. ### Archive a delivered bet [#archive-a-delivered-bet] ```bash ./dev archive bet ``` Moves the bet directory to `_archive/` and the associated test suite to `tests/bets/_archive//`. URL routing is preserved. # Authentication & Authorization (/docs/learn/architecture/auth) # Authentication & Authorization [#authentication--authorization] Wordloop delegates all identity management to **Clerk** and keeps a local user table only to anchor database relations. Internal services trust each other through a shared static token. Zero-trust principles apply at external boundaries; inherited trust applies internally. ## User Authentication Flow (Clerk) [#user-authentication-flow-clerk] Clerk acts as our authoritative identity provider (IdP). ### Frontend Implementation [#frontend-implementation] * **Identity Context:** `wordloop-app` uses `@clerk/nextjs` for all auth flows. * **Header Injection:** Clerk session JWTs are automatically injected into `wordloop-core` requests as `Authorization: Bearer ` by the Orval API clients via a custom fetch interceptor. ### Backend Validation [#backend-validation] * **Middleware:** `wordloop-core` runs Clerk middleware within the Huma framework. * **Verification:** The middleware verifies the JWT signature against the public keys published at Clerk's JWKS endpoint, then places the `clerk_user_id` in the request's `context.Context`.
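The header injection described under Frontend Implementation amounts to wrapping `fetch`. A minimal sketch, assuming a `getToken` function that resolves to the current session JWT (for example the `getToken` returned by Clerk's `useAuth()` hook); `createAuthedFetch` and the simplified `RequestOptions` type are hypothetical, and the wiring into Orval's custom mutator is not shown:

```typescript
// Minimal sketch of a fetch wrapper that injects the Clerk session token.
// Assumptions: `getToken` resolves to the current session JWT (or null when signed
// out); `createAuthedFetch`, `FetchLike`, and `RequestOptions` are hypothetical names.
interface RequestOptions {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
}

type FetchLike = (url: string, options?: RequestOptions) => Promise<unknown>;

function createAuthedFetch(
  baseFetch: FetchLike,
  getToken: () => Promise<string | null>,
): FetchLike {
  return async (url, options = {}) => {
    const token = await getToken();
    // Copy the caller's headers so the original options object is never mutated.
    const headers: Record<string, string> = { ...(options.headers ?? {}) };
    if (token) {
      headers["Authorization"] = `Bearer ${token}`;
    }
    return baseFetch(url, { ...options, headers });
  };
}
```

Looking the token up inside the wrapper means every generated client call picks up a current token, rather than one captured when the client was created.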
## Data Synchronization [#data-synchronization] To link auth identities with core business entities (like Meetings or Transcripts), users are synchronized into the local Postgres database. Database synchronization occurs asynchronously via Clerk Webhooks. 1. **User Creation:** When a user registers, Clerk fires a `user.created` webhook to `wordloop-core`. 2. **Database Sink:** Core verifies the Svix signature headers, parses the webhook payload, and idempotently upserts the record into the `users` table. ## Service-to-Service Authentication [#service-to-service-authentication] When internal services communicate outside of a user context (e.g., the ML engine pulling an audio binary from Core API endpoints), they authenticate with a static shared-secret token instead of a user JWT. * **Header Specification:** `Authorization: Bearer ` * **Assumed Scope:** Full administrative access. **Never expose the `SERVICE_AUTH_TOKEN` to the frontend or public-facing API routes.** This token bypasses user validation logic. # Optimistic Mutation with Echo-Suppressed Streaming (/docs/learn/architecture/data-flow) # Optimistic Mutation with Echo-Suppressed Streaming [#optimistic-mutation-with-echo-suppressed-streaming] This is Wordloop's core data architecture for all user-initiated CRUD operations. The pattern separates **writes** (REST) from **reads** (WebSocket) to achieve perceived zero-latency mutations with real-time multi-device synchronization. This pattern governs all entity-level operations — notes, tasks, topics, meeting metadata, and any future entity types. Audio streaming and ML-generated events use different pipelines documented in [System Workflows](/docs/learn/architecture/system-workflows). ## Why This Design [#why-this-design] Traditional request/response flows force the user to wait for the server round-trip before seeing results. Polling-based updates miss state changes between intervals. Full event sourcing introduces operational complexity that isn't justified for Wordloop's entity CRUD workloads.
This pattern sits in the pragmatic middle: | Concern | Approach | | --------------------- | -------------------------------------------------------------------------------------------------------------------- | | **Write path** | REST — transactional, idempotent, familiar error handling. The server is the single source of truth. | | **Read path** | WebSocket — server pushes complete entity payloads on every state change. No polling, no stale cache windows. | | **Perceived latency** | Optimistic updates — the client applies the change locally before the REST response. The UI responds in under 16ms. | | **Multi-device sync** | All connected clients for a user receive every state change via WebSocket. No refresh required. | | **Echo prevention** | Source-aware events — the originating client ignores its own echo by matching the `clientId` on the WebSocket event. | *** ## The Five-Step Data Loop [#the-five-step-data-loop] Every mutation follows this exact sequence: *** ## Step-by-Step Breakdown [#step-by-step-breakdown] ### Step 1 — Optimistic Update [#step-1--optimistic-update] When a user performs an action (add note, edit title, delete task), the client applies the change to local state **immediately**, before the network request fires. Three things happen: 1. **The change is applied to the UI.** The user sees the result instantly. 2. **A rollback snapshot is stored.** If the server rejects the mutation, the client reverts to this snapshot. 3. **A pending indicator is shown.** Optimistic entities render with a subtle visual cue (reduced opacity, syncing badge, or a small spinner) so the user understands the change is not yet confirmed. The indicator is removed when the REST response arrives. For entity **creation**, the client generates a temporary ID (a UUID prefixed with `temp_`) so the new entity can appear in the UI and be referenced before the server assigns a permanent ID. 
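Step 1's three moves (apply, snapshot, flag as pending) and the temp-ID swap described later under Optimistic ID Reconciliation can be combined in a small store. A minimal sketch, assuming entities are held in a `Map` keyed by id; `OptimisticStore` and its method names are hypothetical, and the UUID generator is injected (in a browser, `() => crypto.randomUUID()`):

```typescript
// Minimal sketch of Step 1: apply locally, keep a rollback snapshot, track pending.
// Assumptions: entities live in a Map keyed by id, and `newUuid` is supplied by the
// caller. All names are hypothetical.
type Entity = { id: string } & Record<string, unknown>;

class OptimisticStore {
  readonly entities = new Map<string, Entity>();
  // Rollback snapshots for unconfirmed changes; `undefined` marks an optimistic create.
  private readonly pending = new Map<string, Entity | undefined>();
  private readonly newUuid: () => string;

  constructor(newUuid: () => string) {
    this.newUuid = newUuid;
  }

  /** Shows a new entity immediately under a temporary `temp_` id. */
  create(fields: Record<string, unknown>): string {
    const tempId = `temp_${this.newUuid()}`;
    this.pending.set(tempId, undefined);
    this.entities.set(tempId, { ...fields, id: tempId });
    return tempId;
  }

  /** Applies an edit immediately, snapshotting the last confirmed state first. */
  update(id: string, patch: Record<string, unknown>): void {
    const current = this.entities.get(id);
    if (!current) throw new Error(`unknown entity ${id}`);
    if (!this.pending.has(id)) this.pending.set(id, { ...current });
    this.entities.set(id, { ...current, ...patch, id });
  }

  /** Drives the pending indicator (reduced opacity, syncing badge, spinner). */
  isPending(id: string): boolean {
    return this.pending.has(id);
  }

  /** REST success: swap local state, including any temp id, for the server entity. */
  confirm(localId: string, serverEntity: Entity): void {
    this.pending.delete(localId);
    this.entities.delete(localId);
    this.entities.set(serverEntity.id, serverEntity);
  }

  /** REST failure: restore the snapshot, or remove an entity that was never confirmed. */
  rollback(id: string): void {
    if (!this.pending.has(id)) return;
    const snapshot = this.pending.get(id);
    this.pending.delete(id);
    if (snapshot) this.entities.set(id, snapshot);
    else this.entities.delete(id);
  }
}
```

The store only remaps the entity's own key in `confirm`. If other local state refers to the temporary id (a child task pointing at a newly created parent, say), those references need the same `temp_` to server-id replacement.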
### Step 2 — REST Mutation [#step-2--rest-mutation] The mutation is sent to the appropriate REST endpoint with two critical headers: ```http POST /api/v1/notes HTTP/1.1 Authorization: Bearer X-Client-Id: abc-123 Content-Type: application/json { "meetingId": "mtg_01J...", "content": "Follow up with the design team" } ``` | Header | Purpose | | --------------- | ---------------------------------------------------------------------------------------------------------------------- | | `Authorization` | User identity (JWT from Clerk). Determines **who** is performing the action. | | `X-Client-Id` | Client instance identity. Determines **which device/tab** initiated the action. Used exclusively for echo suppression. | The REST response returns the **complete server-authoritative entity** — including the server-assigned `id`, `createdAt`, `updatedAt`, and `version` fields. The client uses this response to replace its temporary optimistic state with the confirmed server state. ### Step 3 — Event Broadcast [#step-3--event-broadcast] After the database write succeeds, Core publishes a WebSocket event to **all connected clients** within the event's scope. The event uses the CloudEvents envelope and carries the full entity payload: ```json { "specversion": "1.0", "type": "note.created", "source": "wordloop-core", "id": "evt_01J...", "data": { "id": "note_01J...", "meetingId": "mtg_01J...", "content": "Follow up with the design team", "createdAt": "2026-04-17T20:00:00Z", "updatedAt": "2026-04-17T20:00:00Z", "version": 1 }, "sourceClientId": "abc-123" } ``` Events carry the full entity state, not a delta. This keeps client logic simple — the receiving client replaces its local copy of the entity directly without applying patch operations or maintaining a change log. The trade-off is larger payloads, which is acceptable for Wordloop's entity sizes. 
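Step 2's `X-Client-Id` header needs a stable per-tab value, which the Client Identity section below pins down as a UUIDv4 kept in `sessionStorage`. A minimal sketch of that lookup, assuming a hypothetical storage key `wordloop.clientId`; the storage object and UUID generator are passed in so the sketch runs outside a browser (in the app they would be `window.sessionStorage` and `() => crypto.randomUUID()`):

```typescript
// Minimal sketch: one client ID per browser tab, persisted in sessionStorage so it
// survives a refresh of the same tab but is not shared with newly opened tabs.
// Assumptions: the storage key and all function names are hypothetical.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CLIENT_ID_KEY = "wordloop.clientId";

function getClientId(storage: StorageLike, newUuid: () => string): string {
  const existing = storage.getItem(CLIENT_ID_KEY);
  if (existing) return existing;
  const id = newUuid(); // UUIDv4, e.g. from crypto.randomUUID()
  storage.setItem(CLIENT_ID_KEY, id);
  return id;
}

// Extra headers for a REST mutation (Step 2). The Clerk `Authorization` header is
// added separately by the API client's fetch interceptor.
function mutationHeaders(clientId: string): Record<string, string> {
  return {
    "X-Client-Id": clientId,
    "Content-Type": "application/json",
  };
}
```

Caching the ID in `sessionStorage` rather than in memory is what lets it survive a page refresh. One caveat worth verifying: browsers copy `sessionStorage` when a tab is duplicated, so a duplicated tab can start with the same ID as its source and would then discard the original tab's mutations as echoes.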
### Step 4 — Echo Suppression [#step-4--echo-suppression] The originating client receives the WebSocket event and compares `sourceClientId` against its own client ID: ``` Incoming event sourceClientId: "abc-123" My clientId: "abc-123" → Match. Discard event (UI already reflects this from the optimistic update). ``` Without echo suppression, the originating client would render the change twice — once from the optimistic update and once from the WebSocket event — causing visual flicker and duplicate list entries. ### Step 5 — Cross-Device Sync [#step-5--cross-device-sync] Other clients connected for the same user receive the identical WebSocket event. Since their `clientId` does not match the `sourceClientId`, they apply the entity payload directly to their local UI state: ``` Incoming event sourceClientId: "abc-123" My clientId: "def-456" → No match. Apply entity to local state. UI updates in real time. ``` No REST call is needed. The WebSocket event contains the complete entity, so the receiving client has everything it needs to render the change. *** ## Client Identity [#client-identity] ### What Is a Client ID? [#what-is-a-client-id] A `clientId` is a UUID generated **per browser tab** when the application initializes. It is **not** tied to the user's authentication identity — a single user can have multiple client IDs across different tabs and devices. | Property | Value | | --------------- | --------------------------------------------------------------------------------------- | | **Scope** | One per browser tab / app instance | | **Lifetime** | Created on tab open, discarded on tab close | | **Persistence** | Stored in `sessionStorage` (survives page refresh within the same tab, not across tabs) | | **Format** | UUIDv4 (abbreviated to `abc-123` in the examples on this page) | ### Why Per-Tab, Not Per-Session?
[#why-per-tab-not-per-session] If the client ID were per-session (shared across tabs), a mutation from Tab A would suppress the WebSocket event in Tab B — meaning Tab B would never render the change. Per-tab IDs ensure that only the exact tab that initiated the mutation suppresses the echo. ### Why Client ID, Not Mutation ID? [#why-client-id-not-mutation-id] Some architectures use a unique `mutationId` per operation instead of a persistent `clientId`. The trade-off: | Approach | Pros | Cons | | ------------------------ | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | | **Client ID** (Wordloop) | Simpler — one header, no per-mutation tracking state. Echo suppression is a single string comparison. | Cannot distinguish between two of *your own* rapid mutations on the same entity (both are suppressed). | | **Mutation ID** | Each operation is individually tracked. Can precisely reconcile specific operations. | Requires a pending-mutation queue on the client and mutation-ID propagation through the server. | Wordloop uses `clientId` because entity operations are independent and non-overlapping — a user does not typically create the same note twice in rapid succession. If Wordloop ever introduces collaborative editing where individual keystrokes must be tracked, mutation IDs would be required. *** ## Event Scoping [#event-scoping] Not every connected client receives every event. 
Events are scoped to the relevant audience: | Scope | Events Delivered To | Example | | ----------- | ----------------------------------------- | -------------------------------------------------- | | **User** | All clients authenticated as that user | `note.created`, `task.updated`, `meeting.deleted` | | **Meeting** | All clients viewing that specific meeting | `transcript.segment.produced`, `insight.generated` | The WebSocket hub maintains a mapping of `userId → [clientId, clientId, ...]` and a subscription registry of `meetingId → [clientId, clientId, ...]`. When Core emits an event, the hub resolves the target audience and delivers to only those connections. *** ## Initial State Hydration [#initial-state-hydration] When a client first loads, the WebSocket is not yet connected. The client must establish initial state before subscribing to real-time updates. The sequence is: Between the REST response and the WebSocket connection being established, events can be lost. To handle this, the client should include a `since` timestamp (from the REST response's latest `updatedAt`) in the WebSocket connection handshake. Core replays any events that occurred after that timestamp during the connection setup. *** ## Edge Cases [#edge-cases] ### Mutation Failure and Rollback [#mutation-failure-and-rollback] If the REST call fails, the client must **undo the optimistic update** and surface the error. The rollback strategy depends on the error category: | Error Category | HTTP Status | Retry? | Client Behavior | | ----------------------------- | ------------- | ------ | -------------------------------------------------------------------------------------------------------------------- | | **Validation / Client Error** | 400, 409, 422 | ❌ No | Roll back immediately. Surface error to user. The request is structurally wrong and retrying won't help. | | **Authentication Error** | 401, 403 | ❌ No | Roll back. Redirect to login or refresh token. 
| | **Not Found** | 404 | ❌ No | Roll back. Entity was deleted by another client. Surface "item no longer exists" notification. | | **Server Error** | 500, 502, 503 | ✅ Yes | Retry with exponential backoff + jitter (1s, 2s, 4s, max 3 retries). Roll back only after all retries are exhausted. | | **Network Timeout** | — | ✅ Yes | Retry once. If it still fails, roll back and surface ambiguous error: "Changes may not have been saved." | For **network timeouts**, the client cannot know whether the server received and processed the request. If the mutation did succeed server-side, the WebSocket event will eventually deliver the confirmed state — at which point the client should silently accept it rather than showing a duplicate. ### Optimistic ID Reconciliation [#optimistic-id-reconciliation] When creating a new entity, the client uses a temporary ID (`temp_xxx`) for the optimistic update. When the REST response returns with the server-assigned ID, the client must **replace the temporary ID** everywhere it appears in local state: ``` Optimistic state: { id: "temp_abc", content: "..." } REST response: { id: "note_01J...", content: "...", createdAt: "..." } → Replace temp_abc → note_01J in all local state references ``` The subsequent WebSocket echo is suppressed by `clientId` matching, so no further reconciliation is needed for the originating client. ### WebSocket Event Arrives Before REST Response [#websocket-event-arrives-before-rest-response] The WebSocket event can arrive at the originating client **before** the REST response under high load. This is safe because: 1. Echo suppression discards the event regardless of timing (the `clientId` matches). 2. The REST response is the authoritative confirmation — it arrives independently and the client reconciles from it. No special handling is required. 
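The rollback table under Mutation Failure and Rollback reduces to a small decision function plus a delay schedule. A minimal sketch, assuming a failure is represented as an HTTP status or the literal `"timeout"`. Statuses the table does not list (such as 429 or 504) are treated as non-retryable here, and the ±30% jitter and 16s cap are borrowed from the reconnection table further down; both choices are assumptions, as are the function names:

```typescript
// Minimal sketch of the retry/rollback decision for a failed REST mutation.
// Assumption: a failure is an HTTP status code or the literal "timeout".
type Failure = number | "timeout";

function maxRetriesFor(failure: Failure): number {
  if (failure === "timeout") return 1; // retry once, then roll back with an ambiguous error
  if (failure === 500 || failure === 502 || failure === 503) return 3;
  // 400/409/422, 401/403, 404, and anything unlisted: roll back immediately.
  return 0;
}

// Exponential backoff with symmetric jitter. `attempt` is 1-based, so the base
// schedule is 1s, 2s, 4s, ... up to `capMs`. `random` must return values in [0, 1).
function backoffDelayMs(
  attempt: number,
  { baseMs = 1000, capMs = 16000, jitter = 0.3, random = Math.random } = {},
): number {
  const base = Math.min(baseMs * 2 ** (attempt - 1), capMs);
  const factor = 1 + jitter * (2 * random() - 1); // uniform in [1 - jitter, 1 + jitter)
  return Math.round(base * factor);
}
```

With `random` left as `Math.random`, attempts 1 to 3 yield roughly 1s, 2s, and 4s, each spread by up to 30% so that many clients recovering from the same outage do not retry in lockstep.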
### Concurrent Mutations (Last-Write-Wins) [#concurrent-mutations-last-write-wins] If two devices edit the same entity simultaneously, the **last write to reach the database wins**. Both REST calls succeed independently, and both produce WebSocket events. Each client receives the other client's update event and replaces its local state. Wordloop uses last-write-wins, not conflict resolution. This is appropriate for the current entity types (notes, tasks, meeting metadata) where conflicts are rare and the cost of a lost edit is low. If collaborative editing (e.g., simultaneous text editing within a note) is introduced, this section must be revisited with CRDTs or Operational Transform. ### Delete Race Condition [#delete-race-condition] If Client A deletes an entity while Client B is editing it: 1. Client A's `DELETE` succeeds. Core publishes `note.deleted` over WebSocket. 2. Client B receives `note.deleted` and removes the entity from its UI — even if Client B has unsaved optimistic changes. 3. If Client B's `PATCH` arrives at Core **after** the delete, Core returns `404 Not Found`. Client B rolls back its optimistic update and surfaces the error. The delete always wins. The client must handle the case where a `deleted` event arrives for an entity the user is currently editing by closing the editor and surfacing a notification. ### Stale Event Ordering [#stale-event-ordering] Under network jitter or high load, WebSocket events for the same entity can arrive out of order. Each entity carries a `version` field (monotonically incrementing integer) and an `updatedAt` timestamp: ``` Current local state: { id: "note_01J...", version: 3 } Incoming WS event: { id: "note_01J...", version: 2 } → Event version < local version. Discard as stale. ``` The client must **never apply an event whose version is less than or equal to the local version** for the same entity. 
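Steps 4 and 5, the delete rule, and the version check combine into a single decision per incoming event. A minimal sketch, assuming the envelope shape from Step 3, that every payload carries `id` and `version`, and that event types end in `.created`, `.updated`, or `.deleted`; `handleEntityEvent` and the type names are hypothetical:

```typescript
// Minimal sketch of the client-side WebSocket event handler.
// Assumptions: events follow the envelope shown in Step 3, payloads carry `id` and a
// monotonically increasing `version`, and type names end in .created/.updated/.deleted.
interface VersionedEntity {
  id: string;
  version: number;
  [field: string]: unknown;
}

interface EntityEvent {
  type: string; // e.g. "note.created"
  id: string; // event id
  data: VersionedEntity;
  sourceClientId?: string;
}

type Outcome = "echo" | "stale" | "applied" | "deleted";

function handleEntityEvent(
  state: Map<string, VersionedEntity>,
  event: EntityEvent,
  myClientId: string,
): Outcome {
  // Step 4: our own mutation, already reflected by the optimistic update.
  if (event.sourceClientId === myClientId) return "echo";

  // Deletes always win, regardless of version.
  if (event.type.endsWith(".deleted")) {
    state.delete(event.data.id);
    return "deleted";
  }

  // Stale ordering: never apply a version <= the one we already hold.
  const local = state.get(event.data.id);
  if (local && event.data.version <= local.version) return "stale";

  // Step 5: events carry full entity state, so replace rather than patch.
  state.set(event.data.id, event.data);
  return "applied";
}
```

Checking the echo first matters: the originating tab already holds the optimistic (and then REST-confirmed) state, so it never needs to compare versions for its own events.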
### Reconnection and Missed Events [#reconnection-and-missed-events]

When the WebSocket connection drops (network change, server restart, mobile backgrounding), the client misses every event published during the disconnection window.

The client tracks the `id` of the last received event. On reconnection, it sends this as `lastEventId` in the handshake. Core replays all events after that ID from a short-lived event buffer before resuming the live stream.

**Reconnection strategy:** The client uses **exponential backoff with jitter** to avoid thundering-herd reconnection storms when the server restarts:

| Attempt | Base Delay | With Jitter (±30%) |
| ------- | ---------- | ------------------ |
| 1 | 1s | 0.7s – 1.3s |
| 2 | 2s | 1.4s – 2.6s |
| 3 | 4s | 2.8s – 5.2s |
| 4 | 8s | 5.6s – 10.4s |
| 5+ | 16s (cap) | 11.2s – 20.8s |

The event buffer has a finite retention window. If the client has been disconnected for longer than that window, replay is not possible. In this case, the client must perform a full state re-fetch via REST (the same hydration flow as the initial page load) and then resume its WebSocket subscription.

### Idempotency on REST Retry [#idempotency-on-rest-retry]

If a REST mutation times out and the client retries, the server may process the same mutation twice — producing two WebSocket events for a single user action.

For **create** operations, the client should generate and send an `Idempotency-Key` header. Core checks this key against a short-lived cache and returns the cached response if the key has been seen, preventing duplicate creation and duplicate WebSocket events.

For **update** and **delete** operations, natural idempotency applies — updating to the same values or deleting an already-deleted entity produces the same result.

```http
POST /api/v1/notes HTTP/1.1
Idempotency-Key: idem_7f3a9c...
X-Client-Id: abc-123
```

### Partial Server Failure [#partial-server-failure]

If the database write succeeds but the WebSocket broadcast fails (hub crash, network partition between Core and hub):

* **Originating client**: Receives the REST `201 Created` response and knows the mutation succeeded. Its optimistic update is confirmed.
* **Other clients**: Miss the WebSocket event and do not update their UI.

This is an eventually consistent failure. Other clients will receive corrected state on their next REST fetch (page navigation, tab focus) or when the WebSocket reconnects and replays missed events. This is acceptable because the originating client — the device where the user performed the action — always sees the confirmed state.

### Rapid Mutations on the Same Entity [#rapid-mutations-on-the-same-entity]

If a user edits the same entity in rapid succession (typing a title, adjusting a slider), firing a REST call for every keystroke wastes bandwidth and creates ordering hazards where a slow early response overwrites a fast later one.

**Strategy: Debounce + Coalesce**

1. **Debounce the REST call.** Wait until the user pauses interaction (300–500ms of inactivity) before sending the mutation. The optimistic update still applies immediately on every keystroke — only the network request is debounced.
2. **Coalesce intermediate states.** Only the final state is sent to the server, not every intermediate value. If the user types "Hel", "Hell", "Hello", the server receives one `PATCH` with `"Hello"`.
3. **Cancel stale in-flight requests.** If a new mutation fires while a previous one is still in flight for the same entity, abort the previous request using `AbortController` to prevent a stale response from overwriting the newer state.
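The three steps above can be combined into one per-entity helper. The TypeScript below is a sketch under stated assumptions: `send` is a caller-supplied function that issues the `PATCH` and honours the `AbortSignal`; the 400ms default is simply a value inside the 300–500ms range; and `Scheduler`, `createDebouncedMutation`, and `timerScheduler` are hypothetical names, with `Scheduler` acting as a seam that lets tests replace real timers. The optimistic UI update is assumed to happen elsewhere, synchronously, on every change.

```typescript
// Issues the network mutation; must respect the abort signal.
type Send<T> = (value: T, signal: AbortSignal) => Promise<void>;

// Hypothetical timer seam so tests can drive time manually.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const timerScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// One instance per entity: debounces, coalesces, and cancels stale requests.
function createDebouncedMutation<T>(send: Send<T>, delayMs = 400, scheduler: Scheduler = timerScheduler) {
  let pending: { value: T } | undefined; // latest unsent value (coalesced)
  let timer: unknown;
  let inFlight: AbortController | undefined;

  function flush(): void {
    timer = undefined;
    if (pending === undefined) return;
    const { value } = pending;
    pending = undefined;

    inFlight?.abort(); // a newer value supersedes any request still in flight
    const controller = new AbortController();
    inFlight = controller;

    send(value, controller.signal)
      .catch(() => {
        // Aborts and failures are left to the rollback/revalidation logic,
        // which is outside the scope of this sketch.
      })
      .finally(() => {
        if (inFlight === controller) inFlight = undefined;
      });
  }

  return {
    // Call on every change, after applying the optimistic update.
    update(value: T): void {
      pending = { value }; // keep only the most recent value
      if (timer !== undefined) scheduler.clear(timer);
      timer = scheduler.set(flush, delayMs); // restart the inactivity window
    },
  };
}
```

Keeping one instance per entity ID scopes both the debounce window and the abort to that entity, so edits to different notes never cancel each other.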
```
User types:    H → He → Hel → Hello → [pauses 300ms]
Optimistic UI: H → He → Hel → Hello   (each applied immediately)
REST calls:    [none] → [none] → [none] → PATCH { content: "Hello" }
```

### Tab Focus Revalidation [#tab-focus-revalidation]

When a browser tab regains focus after being backgrounded, the WebSocket may have silently disconnected without triggering an error event (common on mobile browsers and after a laptop lid is closed). The client should treat the tab regaining focus as a trigger to:

1. **Check WebSocket health.** If the connection is dead, initiate reconnection with `lastEventId` replay.
2. **Revalidate stale queries.** SWR's `revalidateOnFocus` (or equivalent) re-fetches the current view's data via REST to catch any mutations that occurred while the tab was inactive.

This ensures the client is never silently stale after returning from the background.

### WebSocket Authentication Lifecycle [#websocket-authentication-lifecycle]

The WebSocket connection authenticates with a JWT during the initial handshake. Since JWTs have a finite lifetime, the connection must handle token expiry and session revocation:

**Token Refresh (Proactive):**

1. The client monitors its JWT expiration. A few minutes before expiry, it refreshes the token via the standard Clerk token refresh.
2. The client sends an `auth.refresh` message over the *existing* WebSocket with the new token.
3. Core validates the new token and associates it with the connection. No reconnection is needed.

**Session Revocation (Server-Initiated):**

1. When a user logs out from any device, or an admin revokes access, Core sends a `session.revoked` event to **all** WebSocket connections for that user.
2. Each client receives the event, closes the WebSocket, clears local state, and redirects to the login screen.
3. Core terminates the server-side connection after sending the event.

**Token Expired (Reactive):**

1. If the token expires without a proactive refresh (for example, because the client was backgrounded), Core sends a WebSocket close frame with code `4401` (a custom "Unauthorized" code in the 4000–4999 range that RFC 6455 reserves for private use).
2. The client refreshes its token and reconnects with the new JWT.

### Cache Reconciliation on Settled [#cache-reconciliation-on-settled]

After every mutation, whether it succeeds or fails, the client should revalidate the affected SWR cache key to ensure the local cache matches the server's authoritative state. This is the `onSettled` pattern:

1. **On success:** The REST response already contains the server-authoritative entity. The client updates the SWR cache with this response. A background revalidation is triggered to catch any concurrent mutations from other devices that may have occurred during the request.
2. **On error:** The rollback restores the snapshot, and a revalidation fetches the current server state to ensure the cache is clean.

This means that even if echo suppression, version comparison, or reconnection logic has a subtle bug, the cache self-heals within one mutation cycle.

***

## What This Pattern Does NOT Cover [#what-this-pattern-does-not-cover]

| Concern | Handled By |
| ------- | ---------- |
| **Audio streaming** | Dedicated binary WebSocket pipeline — see [Real-Time WebSocket Streaming](/docs/learn/architecture/system-workflows#2-real-time-websocket-streaming) |
| **ML-generated events** | Transcript segments and insights originate from Pub/Sub consumers, not REST mutations. These always flow through the WebSocket without echo suppression because no client initiated them. |
| **Authentication** | JWT validation on REST and WebSocket handshake — see [Authentication](/docs/learn/architecture/auth) |
| **Pub/Sub worker pipelines** | Asynchronous inter-service communication — see [Unified Asynchronous Meeting Finalization](/docs/learn/architecture/system-workflows#1-unified-asynchronous-meeting-finalization) |

# Infrastructure & Hosting (/docs/learn/architecture/infrastructure)

# Infrastructure & Hosting [#infrastructure--hosting]

Wordloop deploys entirely to managed Google Cloud infrastructure in production, with every application service running on serverless Cloud Run. For information about local development and emulation, see the [Local Infrastructure](local-infrastructure.md) page.

## Production Hosting [#production-hosting]

| Service | GCP Resource | Description |
| ------- | ------------ | ----------- |
| **wordloop-docs** | Firebase Hosting | Next.js/Fumadocs static site deployment. |
| **wordloop-app** | Cloud Run | Next.js server utilizing SSR and Route Handlers. |
| **wordloop-core** | Cloud Run | Go REST API. |
| **wordloop-ml** | Cloud Run (x2) | Deployed as two separate services: an HTTP web server and a Pub/Sub background worker. |
| **Database** | Cloud SQL | Managed Postgres 15 database instance. |
| **Messaging** | Cloud Pub/Sub | Managed topics and subscriptions. |

### API Routing (Production) [#api-routing-production]

To keep the frontend environment-agnostic, the Next.js `wordloop-app` implements a **Server-Side API Proxy**.

* All frontend fetches are directed to `/api/...`.
* A Next.js Route Handler proxies these requests to the underlying `wordloop-core` URL (defined via the `CORE_API_URL` environment variable at runtime).
* This prevents hardcoding backend URLs during the Next.js build step.

## Environment Configuration [#environment-configuration]

Configuration relies exclusively on environment variables injected at runtime.
There are NO configuration files deployed with the containers. See individual service handbooks for specifics.

# Local Infrastructure (/docs/learn/architecture/local-infrastructure)

# Local Infrastructure & Emulation [#local-infrastructure--emulation]

Wordloop uses a **hybrid local-first development model** orchestrated via our custom `./dev` CLI. Instead of running the entire stack in heavy, monolithic Docker containers, we segment the environment:

* **Infrastructure & Observability (Docker):** Backing services (Postgres, the Pub/Sub emulator, the Storage emulator) and telemetry tools (Aspire Dashboard) run in Docker.
* **Application Services (Native):** Codebases (`wordloop-core`, `wordloop-ml`, `wordloop-app`, `wordloop-docs`) run natively on your host machine. We use file-watching tools (`air` for Go, `uvicorn` for Python, and the Next.js dev server) to enable **instant hot-reloading**, bypassing the need to rebuild Docker images after every code change.

## Local Port Architecture [#local-port-architecture]

To prevent port collisions, every service runs on a fixed, dedicated port:

### Application Services (Native) [#application-services-native]

| Service | Role | Port | Tooling |
| ------- | ---- | ---- | ------- |
| **wordloop-app** | Next.js Frontend | `4001` | `next dev` |
| **wordloop-docs** | Fumadocs Site | `4000` | `next dev` |
| **wordloop-core** | Go REST API | `4002` | `air` (hot-reload) |
| **wordloop-ml** | Python API & Worker | `4003` | `uvicorn` (hot-reload) |

### Infrastructure Spaces (Docker) [#infrastructure-spaces-docker]

| Service | Image / Role | Port |
| ------- | ------------ | ---- |
| **Aspire Dashboard** | Local Observability UI | `18888` |
| **Postgres** | `postgres:15` | `5432` |
| **Pub/Sub** | `cloud-sdk:emulators` | `8085` |
| **Storage (GCS)** | `oittaa/gcp-storage-emulator` | `8086` |

* **Statefulness:** Postgres data is persisted in a local Docker volume (`db_data`). The emulators are ephemeral and start empty each time, so the Core service programmatically provisions the required Pub/Sub topics and Storage buckets on every boot.
* **Bootstrapping:** Use `./dev start all` to bring up the Docker infrastructure and the native host services concurrently. Run `./dev help` for finer-grained commands.

## Environment Configuration [#environment-configuration]

Configuration relies exclusively on environment variables injected at runtime. There are NO configuration files deployed with the containers. See individual service handbooks for specifics.

# Observability (/docs/learn/architecture/observability)

# Observability [#observability]

Instead of emitting fragmented logs, metrics, and traces, we generate high-cardinality, wide events (Spans) using **OpenTelemetry (OTel)**. These spans serve as the single source of truth for the health, performance, and behavior of the entire platform.

## Tracing Architecture [#tracing-architecture]

We use W3C Trace Context headers, together with W3C Baggage for identity, to propagate context across every service boundary, so that user identity and request context always stay attached to the symptom.

* **App (Next.js):** Generates the root span for user interactions, authenticates via Clerk, and injects `clerk_user_id` into OTel Baggage as `enduser.id`.
* **Core (Go):** Uses the OpenTelemetry Go SDK (`go.opentelemetry.io/otel/sdk`) to trace HTTP handlers, Postgres queries (via pgx), and Pub/Sub publishing. It automatically reads W3C Baggage from incoming requests and propagates it via Pub/Sub attributes.
* **ML (Python):** Uses `opentelemetry-python` to extract trace context and identity Baggage from incoming Pub/Sub messages, trace ML pipelines, and propagate context when calling Core.

### Span-Derived Metrics [#span-derived-metrics]

We do **not** manually instrument and roll up traditional RED (Rate, Errors, Duration) metrics at runtime. Emitting isolated metrics destroys the context necessary for debugging. Instead, our system relies on dynamic aggregations of our wide spans.
Because every span contains the exact duration, status code, and rich metadata (tenant IDs, roles), our observability backend continuously calculates and visualizes RED metrics derived directly from the trace stream. If an aggregate error rate spikes, engineers can click the spike to see the exact traces that generated it.

## Logging [#logging]

To ensure structural consistency, all logs are written as structured JSON and carry the OpenTelemetry context.

* **Go Logging:** Implemented via `slog` with an OpenTelemetry handler.
* **Python Logging:** Implemented via `structlog`, which binds the active OTel context to each entry.

Every log emitted within the scope of a request automatically inherits the `trace_id` and `span_id`, allowing developers to find any application log by looking at its parent trace.

## Telemetry Destinations & Sampling [#telemetry-destinations--sampling]

Our services act purely as OTLP (OpenTelemetry Protocol) emitters. They never communicate directly with the final observability storage backend. Data routing and sampling are centrally managed.

### Local Development (.NET Aspire) [#local-development-net-aspire]

Locally, all services export OTLP data to the **.NET Aspire Dashboard**.

1. Run `./dev dash obs` (or start it automatically via `./dev start infra`).
2. Access the UI at [http://localhost:18888](http://localhost:18888).
3. View Traces, Metrics, and Structured Logs across all services in real time.

Since `enduser.id` Baggage is propagated, you can search for a user's exact ID to trace their entire session timeline end-to-end.

### Production Pipeline & Tail-Based Sampling [#production-pipeline--tail-based-sampling]

In production, SDKs do not push directly to Google Cloud. We deploy instances of the **OpenTelemetry Collector** in gateway mode to act as an intermediary buffer. Because we use **Tail-Based Sampling** to control cost, the Collector must buffer the entire distributed trace before deciding whether to keep it.
Once the trace is complete, the Collector executes our sampling rules:

* **100% Sampling for Errors & High Latency:** If any span anywhere in the trace breaches our latency threshold or contains an error, the entire trace is preserved and exported to Google Cloud.
* **5% Sampling for Happy Paths:** For requests that succeed without anomalies, the Collector keeps 5% of traces and drops the remaining 95%, saving ingest and storage costs without sacrificing visibility into system failures.

# System Architecture (/docs/learn/architecture/overview)

# System Architecture Overview [#system-architecture-overview]

Wordloop is a localized, intelligence-first platform in which each service owns an isolated domain boundary and communicates through strictly typed, declarative contracts.

## High-Level Topology [#high-level-topology]

## Service Boundaries [#service-boundaries]

The platform is decoupled into three primary execution domains:

### `wordloop-core` (Go) [#wordloop-core-go]

The absolute system of record. Responsible for transactional orchestration, state management, Clerk webhook syncing, and exposing the primary REST API via [Huma](https://huma.rocks).

* [Core Service Handbook](../services/core/index.md)

### `wordloop-ml` (Python) [#wordloop-ml-python]

The async intelligence engine. Stateless, event-driven, and built on FastAPI. It consumes Pub/Sub events from Core, interfaces with external APIs (AssemblyAI), and uses a symmetric service token to push structured data back to Core.

* [ML Service Handbook](../services/ml/index.md)

### `wordloop-app` (Next.js) [#wordloop-app-nextjs]

The presentation layer built on React Server Components. Authenticates via Clerk and communicates with Core via Orval-generated API clients wrapped in a Next.js server-side proxy route.
* [App Service Handbook](../services/app/index.md)

## Communication Patterns [#communication-patterns]

The client–server data architecture follows the **[Optimistic Mutation with Echo-Suppressed Streaming](data-flow)** pattern: REST for writes, WebSocket for reads, with optimistic UI and source-aware echo suppression for multi-device sync. Contracts act as the sole source of truth. Hand-written API clients are forbidden.

| Pattern | Mechanism |
| ------- | --------- |
| **Mutations (CUD)** | REST via Orval-generated clients. Optimistic UI with rollback. Next.js proxies requests to avoid CORS. |
| **Streaming Reads** | WebSocket pushes complete entity payloads on every state change. Echo suppressed via `X-Client-Id`. |
| **Worker Dispatch** | GCP Pub/Sub with strict AsyncAPI schemas for inter-service async work. |
| **Internal Writebacks** | Internal REST calls authenticated via strict Service Tokens (ML → Core). |
| **Identity Sync** | Webhooks from Clerk ingested into the local Postgres `users` table. |

See the dedicated documentation for [Authentication](auth.md), [Data Flow](data-flow), [Observability](observability.md), and [Hosting](infrastructure.md).

# System Workflows (/docs/learn/architecture/system-workflows)

# System Workflows [#system-workflows]

This document outlines the key data pipelines and the order in which components interact across the Wordloop platform.

## 1. Unified Asynchronous Meeting Finalization [#1-unified-asynchronous-meeting-finalization]

Wordloop uses a single background processing pipeline to finalize *both* batch-uploaded raw audio files and severed or abandoned live WebSocket meetings.
By deferring all complex generation work to the asynchronous pipeline triggered by `TranscriptionJobMessage`, Wordloop protects the stateful live recording connections from cascading OOM crashes, while the same offline pipeline repairs (self-heals) broken streams.

## 2. Real-Time WebSocket Streaming [#2-real-time-websocket-streaming]

This is the synchronous audio pipeline. It is designed around high availability, zero in-memory buffering, and fan-out of data to multiple endpoints.

## 3. Voice Context Pipelines [#3-voice-context-pipelines]

Workflows for orchestrating speaker identity, embeddings, and context. Vector matching operations are computationally intensive, so the frontend must expect variable latency when querying nearest neighbors.

## 4. AI Chat Context Orchestration [#4-ai-chat-context-orchestration]

How meeting context is retrieved to ground conversational RAG queries.

# Concepts (/docs/learn/concepts)

# Concepts [#concepts]

A shared vocabulary is not a cosmetic concern. When every engineer on the team means the same thing by "segment," "synthesis," or "task," design conversations become faster and bugs become easier to describe. This page is the canonical glossary of the domain; use it when writing code, specs, or tests.

## Core entities [#core-entities]

**Meeting** — the primary unit of work in Wordloop. A Meeting is a bounded session captured in the system, tied to a user, optionally attended by multiple People, and producing a Transcription, a MeetingSynthesis, and Tasks. The `meetings` table and the `/meetings` routes are the center of gravity for the entire platform.

**Person** — a contact record representing an attendee of one or more Meetings. A Person carries identity fields (display name, email, title, company) and an optional voice model used to attribute TranscriptSegments to a speaker. People are distinct from Users — a User is someone with a Wordloop account; a Person is someone who appeared in a meeting, with or without an account.

**Transcription** — the speech-to-text record attached to a Meeting. A Transcription aggregates TranscriptSegments as they are produced in near-real-time by the ML service and reaches a `completed` status when the meeting closes.

**TranscriptSegment** — the atomic unit of the Transcription. Each segment carries a speaker label, the attributed Person (if matched), text, start and end timestamps, a confidence score, and an `is_final` flag. Most ML processing — embeddings, topic extraction, synthesis — operates over segments.

**MeetingSynthesis** — the AI-generated summary attached to a Meeting. Contains a headline, a prose summary, key points, a list of Topics, and nested TalkingPoints. Produced by the ML service after the Transcription finalises; can be regenerated on demand.

**Topic** — a thematic cluster extracted from a Meeting's segments. Topics carry a name, a summary, and the set of TranscriptSegments that contributed to them. A Meeting has many Topics; a Topic belongs to one Meeting.

**TalkingPoint** — a specific point or claim within a Topic. TalkingPoints are the most granular unit of the MeetingSynthesis, surfaced in the recap UI as bullets under each Topic.

**Task** — an action item extracted from a Meeting. Tasks are assignable, trackable, and hierarchical (via `parent_task_id`). They live beyond the Meeting itself and are the primary output a user acts on after review. Status values: `pending`, `in_progress`, `completed`, `canceled`.

## Supporting entities [#supporting-entities]

**User** — a Wordloop account holder, identified via Clerk. A User has an associated Person record (the voice model and contact info for their own participation in meetings). JIT-provisioned on first sign-in.

**Note** — a free-form annotation attached to any entity (`meeting`, `person`, `task`, etc.) via a polymorphic `subject_type` / `subject_id` pair.

**Tag** — a label a user can apply to Meetings, People, or Tasks for organisation.
## Cross-cutting concepts [#cross-cutting-concepts]

**Voice model** — the speaker-identification vector attached to a Person. The ML service matches incoming audio against stored voice vectors to attribute TranscriptSegments to a specific Person rather than an anonymous `SpeakerLabel`. Voice models are built incrementally from verified segments.

**JIT provisioning** — "just-in-time" user creation. When a user signs in via Clerk for the first time, the Core API reads their Clerk profile and creates both the local User record and the corresponding Person record on demand. No webhooks, no seeding.

**Echo suppression** — the mechanism by which a person's own outgoing audio is not re-ingested as incoming segments. A subtle but load-bearing piece of the real-time pipeline; see [Real-Time principles](/docs/principles/system-design/real-time) for the design model.

## Further reading [#further-reading]

* [Architecture Overview](/docs/learn/architecture/overview) — how these entities are distributed across services.
* [Data Flow](/docs/learn/architecture/data-flow) — the lifecycle of a segment, from microphone to synthesis.
* [Reference / Glossary](/docs/reference/glossary) — the complete, link-resolvable vocabulary.

# Platform Services (/docs/learn/services)

# Platform Services [#platform-services]

Wordloop is composed of three application services, each with a distinct responsibility, language, and runtime. This section contains one handbook per service — how it is structured, what it owns, and how to work on it. The documentation site itself (`wordloop-docs`) is a fourth deployable but is treated as a piece of platform tooling rather than an application surface; it is documented via the [Reference](/docs/reference) and [Guides](/docs/guides) sections.

# Runbooks (/docs/operations/runbooks)

# Runbooks [#runbooks]

A runbook is a script. It is written so a tired, stressed engineer can follow it at 3am and restore service without having to reason from first principles.
Each runbook in this section targets a specific, recognisable failure symptom and walks through detection, diagnosis, mitigation, and recovery.

## Runbook authoring [#runbook-authoring]

New runbooks are welcome — every incident we resolve should teach the team one. The template:

```markdown
# Runbook: 

**Owner:** 
**Last tested:** YYYY-MM-DD
**Pager rule:** 

## Goal

Restore when .

## Detection

How to confirm this is the failure you are hitting.

## Diagnosis

Fast checks to localise the fault.

## Mitigation

Immediate actions to restore user-facing health.

## Recovery

Steps to return to a fully healthy state.

## Rollback

How to undo each state-changing step.

## Escalation

When and whom to escalate to.

## Postmortem

Link to the incident doc once one exists.
```

## Available runbooks [#available-runbooks]

*The catalogue is populated as real incidents drive new runbooks. Writing a runbook "just in case" is usually wasted effort; writing one in the follow-up from an actual incident captures the specific, sharp-edged lessons a generic version would miss.*

See [On-Call](/docs/operations/on-call) for rotation logistics and [Troubleshooting](/docs/operations/troubleshooting) for exploratory diagnostic trees.

# Agent-Native Systems (/docs/principles/ai-native/agent-native-systems)

# Agent-Native Systems [#agent-native-systems]

## TL;DR [#tldr]

AI agents read our APIs, our events, and our documentation programmatically. Building agent-native systems means designing every interface — contract, spec, doc page — so that an agent can consume it without a human translator in the loop. MCP for structured tool surfaces, `llms.txt` for discoverable documentation, stable error codes, rich OpenAPI examples — the pieces compose into a system agents can work inside.

## Why this matters [#why-this-matters]

The organisation that takes agent-readiness seriously in 2026 gets a multiplier on every engineer's output.
Agents write code faster, answer questions faster, and onboard faster when the systems they are working against are designed for them. The organisation that treats agent-readiness as an afterthought pays for it in constant, low-grade friction: agents that need babysitting, outputs that need correction, onboarding that requires a human bootstrapping step for every task. The investment is modest; the return compounds.

## Our principles [#our-principles]

### 1. Every interface has a machine-consumable specification [#1-every-interface-has-a-machine-consumable-specification]

HTTP endpoints have OpenAPI; events have AsyncAPI; documentation has `llms.txt` and `.md` exports; the tools an agent should use have MCP schemas. An interface without a machine-consumable spec is off-limits to agents by default.

### 2. Specifications include descriptions, examples, and constraints [#2-specifications-include-descriptions-examples-and-constraints]

A spec that says a field is `string` without saying what the string represents is a spec an agent cannot use correctly. We write descriptions, give examples, enumerate finite domains, and state constraints explicitly. The standard is: a competent agent should be able to use the interface without reading the implementation.

### 3. MCP is our standard tool surface [#3-mcp-is-our-standard-tool-surface]

When we want agents to interact with Wordloop beyond reading, we expose the capability through a Model Context Protocol server. Tools are typed, documented, and error-reporting; resources are typed and fetchable. A bespoke prompt-engineering integration is a deprecated pattern; MCP is our interoperability layer.

### 4. `llms.txt` and `.md` exports are shipped alongside docs [#4-llmstxt-and-md-exports-are-shipped-alongside-docs]

Every docs site ships `llms.txt` (the index) and `llms-full.txt` (the consolidated corpus), plus a `.md` export for every page.
Agents navigate the docs the same way a human would, but through a plain-text channel that does not require HTML parsing.

### 5. Error responses are structured, stable, and actionable [#5-error-responses-are-structured-stable-and-actionable]

Every error carries a stable code, a human message, and machine-readable details. The code is catalogued in [Reference / Errors](/docs/reference/errors) and never renumbered. Agents branch on codes; they do not parse prose. This is the single highest-leverage API hygiene choice for agent-readiness.

### 6. Idempotency enables retry [#6-idempotency-enables-retry]

Agents retry. Systems that penalise retry — duplicate records, doubled charges, phantom events — cannot be worked against reliably. Every write endpoint accepts an idempotency key ([API Design](/docs/principles/system-design/api-design)); every event consumer de-duplicates ([Integration Patterns](/docs/principles/system-design/integration-patterns)).

### 7. Outputs are structured where it matters [#7-outputs-are-structured-where-it-matters]

When an agent is producing a structured result — a database record, an API payload, a configuration fragment — we use schema-constrained generation (JSON Schema, tool calling) rather than free-text-then-parse. Free-text parsing is how agent pipelines become brittle.

### 8. Documentation is reviewed for agent consumption [#8-documentation-is-reviewed-for-agent-consumption]

When we write a page, we ask: would an agent reading this through MCP understand what to do? If the page assumes visual hierarchy, colour, or context that does not survive serialisation, we re-shape it. Agent-readiness is a docs quality attribute, not a separate track of work.

## How we apply this [#how-we-apply-this]

* [/llms.txt](/llms.txt) and [/llms-full.txt](/llms-full.txt) — the canonical entry points.
* The MCP server at `scripts/mcp-server.ts` — the current tool and resource surface.
* [API Design](/docs/principles/system-design/api-design) — the OpenAPI discipline that makes our APIs agent-consumable.
* [Documentation](/docs/principles/foundations/documentation) — the dual-audience docs stance.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Auth flows that require human interaction.** A consent screen with a "click here" button is a dead end for an automated client. Design auth that supports programmatic token issuance.
* **Prose-only error responses.** `"something went wrong"` is unusable by any automated caller.
* **Undocumented "internal" APIs.** An API without a spec is an API that agents cannot use — which means humans will be asked to do the thing an agent should be doing.
* **MCP tools that wrap everything.** An MCP server that mirrors every endpoint in the API is noise. Expose the capabilities agents actually need, named in the agent's vocabulary.
* **Documentation that leans on rendered visuals.** An architecture diagram that exists only as a rendered image is a diagram an agent cannot read. Prefer Mermaid source in the Markdown.

## Further reading [#further-reading]

* *Model Context Protocol* ([modelcontextprotocol.io](https://modelcontextprotocol.io)) — the canonical MCP specification.
* *llms.txt specification* ([llmstxt.org](https://llmstxt.org)) — the dual-audience docs convention.
* *OpenAPI Specification* ([openapis.org](https://www.openapis.org)) — the HTTP contract format.
* *Anthropic's agent engineering posts* — practical patterns for building agents against real APIs.
* *Simon Willison's blog* ([simonwillison.net](https://simonwillison.net)) — ongoing, practical commentary on the state of tooling.

# AI Engineering (/docs/principles/ai-native/ai-engineering)

# AI Engineering [#ai-engineering]

## TL;DR [#tldr]

AI engineering is software engineering with a non-deterministic component in the loop. We treat prompts as code, evaluations as tests, context as a first-class design surface, and agents as distributed systems.
The discipline is about making probabilistic systems behave predictably enough to ship.

## Why this matters [#why-this-matters]

Every team that has tried to ship an AI feature has learned the same lesson the hard way: the part that feels like magic in a demo is the part that fails in unpredictable ways in production. The gap between "it works in the playground" and "it works for every user, every day" is where AI engineering happens. The discipline treats non-determinism as an engineering problem — measurable, testable, and addressable — rather than as an inherent limitation to shrug at.

## Our principles [#our-principles]

### 1. Prompts are code [#1-prompts-are-code]

Prompts live in version control, carry explicit versions, and are reviewed and tested. A prompt change is a code change; it ships through the same PR review as any other change. "We tweaked the prompt in the dashboard" is how a team loses the ability to reason about its own AI behaviour.

### 2. Evals are tests [#2-evals-are-tests]

Every meaningful AI behaviour has an eval: a scored comparison of model output against a reference. Evals run in CI; thresholds are committed; regressions block merge the same way unit-test failures do. Without evals, "did we make the model worse?" is unanswerable, which means every improvement is also a potential regression you will discover from users.

### 3. Context is the interface [#3-context-is-the-interface]

The content of the context window — which system prompt, which few-shot examples, which retrieved documents, which tool outputs — is the single biggest lever on model behaviour. We design it deliberately, measure its token budget, and treat it as a first-class interface. "Throw in everything relevant" is the anti-pattern that blows up the bill and dilutes the signal.

### 4. Retrieval matters more than the model [#4-retrieval-matters-more-than-the-model]

For most RAG systems, the retrieval layer determines the ceiling.
A clever model with bad retrieval gives confident nonsense; a boring model with good retrieval gives boring, correct answers. We invest in retrieval quality — indexing, ranking, reranking, chunk boundaries — before we invest in model choice.

### 5. Model outputs are validated at the boundary [#5-model-outputs-are-validated-at-the-boundary]

Every model output that crosses into code is validated: shape, length, content, and expected enumerations. Parse failures are handled explicitly, not allowed to propagate. A model output flowing into business logic without validation is an injection vector waiting to happen.

### 6. Agents are distributed systems [#6-agents-are-distributed-systems]

An agent loop (the model plans, the agent acts, the agent observes the result, the model re-plans) has all the problems of a distributed system: retries, idempotency, timeouts, failure isolation. We apply the same patterns ([Integration Patterns](/docs/principles/system-design/integration-patterns)): bounded retries, circuit breakers, auditable history. The hardest agent failures are system failures, not model failures.

### 7. Cost is part of the evaluation [#7-cost-is-part-of-the-evaluation]

A prompt that is 10% better but 5× more expensive is not obviously better. Evals track quality, latency, *and* cost, and decisions about which configuration to ship consider all three. Cost-unaware evaluation is how an AI feature becomes a cost incident after launch ([Cost Engineering](/docs/principles/delivery/cost-engineering)).

### 8. Human oversight is designed in [#8-human-oversight-is-designed-in]

For high-stakes AI outputs — a recap that a user will act on, an automated action taken on behalf of a user — we design the review point deliberately. The human reviewer gets a summary, not a wall of text; the review UX is built alongside the AI feature, not retrofitted. "Let the model do it" without a review loop is a promise the model will eventually break.
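
Principle 5 can be made concrete with a small boundary parser. This is a minimal Python sketch, assuming a hypothetical recap payload with `summary`, `sentiment`, and `action_items` fields; the field names, the enumeration, and the limits are illustrative, not Wordloop's actual schema. The point is that malformed output becomes an explicit, typed error at the edge instead of flowing into business logic.

```python
import json
from dataclasses import dataclass

# Hypothetical contract for a recap payload; not Wordloop's real schema.
ALLOWED_SENTIMENTS = frozenset({"positive", "neutral", "negative"})
MAX_SUMMARY_CHARS = 2000
MAX_ACTION_ITEMS = 20


class ModelOutputError(ValueError):
    """Raised when a model response does not match the expected contract."""


@dataclass(frozen=True)
class Recap:
    summary: str
    sentiment: str
    action_items: tuple[str, ...]


def parse_recap(raw: str) -> Recap:
    """Validate raw model text before it reaches business logic."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ModelOutputError(f"not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ModelOutputError("expected a JSON object")

    # Shape and length checks on the free-text field.
    summary = data.get("summary")
    if not isinstance(summary, str) or not summary.strip():
        raise ModelOutputError("summary must be a non-empty string")
    if len(summary) > MAX_SUMMARY_CHARS:
        raise ModelOutputError("summary exceeds length limit")

    # Enumerations are checked against a closed set; the type check runs
    # first so unhashable values (lists, dicts) fail cleanly.
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise ModelOutputError(f"unexpected sentiment: {sentiment!r}")

    items = data.get("action_items", [])
    if not isinstance(items, list) or not all(isinstance(i, str) for i in items):
        raise ModelOutputError("action_items must be a list of strings")
    if len(items) > MAX_ACTION_ITEMS:
        raise ModelOutputError("too many action items")

    return Recap(summary=summary.strip(), sentiment=sentiment, action_items=tuple(items))
```

A caller catches `ModelOutputError` and decides explicitly whether to retry the model call, fall back, or surface an error; it never receives a half-valid `Recap`.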
## How we apply this [#how-we-apply-this]

* [ML Systems](/docs/principles/stack/ml-systems) — the implementation principles for the Python ML service.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — the flip side, making our interfaces consumable by agents.
* [Observability](/docs/principles/quality/observability) — the trace surface for model calls.
* [Testing](/docs/principles/foundations/testing) — the broader testing discipline evals sit inside.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **"The model will figure it out."** Hope is not a design.
* **Prompts as configuration.** Untracked prompts drift silently, and evals cannot catch drift they are not told about.
* **Over-stuffed context windows.** Throwing the kitchen sink at the model is usually how quality *decreases*.
* **Skipping evals "this once."** This once becomes always. Evals compound when you have them and compound against you when you do not.
* **Agent loops without termination.** A loop without a clear exit condition is how a runaway agent becomes a runaway bill.
* **Deterministic reasoning on top of probabilistic output.** If you need a number, ask for a number in a structured schema. Do not regex-extract it from prose.

## Further reading [#further-reading]

* *Prompt Engineering Guide* ([promptingguide.ai](https://www.promptingguide.ai)) — the practitioner's summary of current patterns.
* *Evaluating and Reinforcing LLM Behaviors*, Shreya Shankar et al. — the academic grounding for eval design.
* *Anthropic's Building Effective Agents* — the reference for agent architecture patterns.
* *Context Engineering* (Shopify, 2024; see public writeups) — the emerging discipline that elevates context design to first-class engineering.
* *A Survey on Retrieval-Augmented Generation*, multiple authors — RAG ground truth.
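
Principles 2 and 7 of this page combine into a single CI gate: an eval run passes only if quality, latency, and cost all stay inside committed thresholds. This is a minimal Python sketch, assuming hypothetical per-case results (a 0 to 1 score against the reference, a latency, and a cost) produced by whatever eval harness is in use; the `CaseResult` and `Thresholds` types and the threshold values are placeholders, not an existing Wordloop tool.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class CaseResult:
    score: float       # 0.0-1.0 agreement with the reference output
    latency_ms: float
    cost_usd: float


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical committed thresholds; in practice these live in the repo.
    min_mean_score: float
    max_p95_latency_ms: float
    max_mean_cost_usd: float


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def gate(results: list[CaseResult], t: Thresholds) -> list[str]:
    """Return human-readable failures; an empty list means the gate passes."""
    if not results:
        raise ValueError("no eval results to gate on")
    failures = []
    score = mean(r.score for r in results)
    if score < t.min_mean_score:
        failures.append(f"mean score {score:.3f} < {t.min_mean_score}")
    latency = p95([r.latency_ms for r in results])
    if latency > t.max_p95_latency_ms:
        failures.append(f"p95 latency {latency:.0f}ms > {t.max_p95_latency_ms}ms")
    cost = mean(r.cost_usd for r in results)
    if cost > t.max_mean_cost_usd:
        failures.append(f"mean cost ${cost:.4f} > ${t.max_mean_cost_usd}")
    return failures
```

In CI, a non-empty result would be printed and the job would exit non-zero, blocking the merge the same way a failing unit test does. A configuration that wins on quality but breaks the cost threshold still fails.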
# AI-Native (/docs/principles/ai-native)

# AI-Native [#ai-native]

Wordloop is AI-native in two directions: the product runs on AI (transcription, recap, embedding), and the team builds with AI (agents write substantial code, read documentation programmatically, and contribute to reviews). Both directions demand a stance — on how models are integrated, how agents consume our interfaces, and how we keep a human on the hook for outcomes.

Related reading: [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural choice that, more than any other, determines how effectively an agent can contribute to a codebase.

# Cost Engineering (/docs/principles/delivery/cost-engineering)

# Cost Engineering [#cost-engineering]

## TL;DR [#tldr]

Cost is a non-functional requirement with a dashboard and a dollar sign. Every significant architectural decision considers cost-per-user and cost-per-call; every service has a budget it lives inside; surprising spend is an incident. FinOps is how we stay honest about the economics of running what we build.

## Why this matters [#why-this-matters]

Most teams discover cost too late — after a quarterly bill raises eyebrows in a meeting. By then, the decisions that drove the cost are in production, have consumers, and are expensive to reverse. Cost engineering is the discipline of making the economic consequences of decisions visible at the point of the decision. It turns cost from a finance concern into an engineering variable.

## Our principles [#our-principles]

### 1. Cost is a first-class metric [#1-cost-is-a-first-class-metric]

Cost-per-call, cost-per-user, cost-per-feature — all tracked alongside latency and error rate. A feature's success includes its unit economics, not just its engagement numbers. A team that does not know what its features cost cannot reason about the trade-offs that matter.

### 2. Budgets are set and defended [#2-budgets-are-set-and-defended]

Every significant service runs inside a cost budget. The budget is set at design time, reviewed monthly, and treated as a commitment. Exceeding the budget triggers the same response as exceeding any other SLO: investigate, remediate, or explicitly negotiate an increase.

### 3. Autoscaling is designed, not enabled [#3-autoscaling-is-designed-not-enabled]

Autoscaling is a tool with sharp edges. Aggressive autoscaling on a bursty workload can multiply cost without improving user experience; conservative autoscaling on a steady workload pays for headroom it never uses. Each scaling policy is tuned per workload against the production load profile, not set to vendor defaults and forgotten.

### 4. Cheap queries beat fast queries [#4-cheap-queries-beat-fast-queries]

The fastest query is the one that does not run. We cache what we can, compute what we must, and denormalise when the read-to-write ratio justifies it. A query that is both cheap and fast is rare; when the two conflict, the cheap version is usually the right default.

### 5. Egress is expensive; plan for it [#5-egress-is-expensive-plan-for-it]

Cloud egress is one of the most underestimated line items on a typical bill. Inter-region chatter, chatty logs, bulky screenshots uploaded constantly — these add up. We place data where its consumers are, batch where we can, and compress where it is cheap to do so.

### 6. AI spend has the same discipline [#6-ai-spend-has-the-same-discipline]

Every model call has a measured cost and a caching strategy. Prompts are versioned with token-count measurement; expensive prompts are justified by value. "Just pass the whole context to the largest model" is how an AI feature becomes a cost incident ([ML Systems](/docs/principles/stack/ml-systems)).

### 7. Reservations and commits where they pay [#7-reservations-and-commits-where-they-pay]

For predictable baseline workloads, reserved instances and committed-use discounts typically save 30–50% over on-demand pricing.
The discipline is to match the reservation to the baseline: over-reserving wastes committed spend on capacity we never use, while under-reserving leaves the discount on the table for load we could have predicted.

### 8. FinOps is a practice, not an office [#8-finops-is-a-practice-not-an-office]

Cost engineering is something every team does, not something a central team does on behalf of others. The central function provides tooling and visibility; the spending decisions are made by the teams that generate the spend.

## How we apply this [#how-we-apply-this]

* [Observability](/docs/principles/quality/observability) — the measurement substrate for cost per unit.
* [ML Systems](/docs/principles/stack/ml-systems) — the cost discipline for model calls.
* [Platform](/docs/principles/delivery/platform) — the shared infra that every team's cost sits on.
* [Performance](/docs/principles/quality/performance) — cheap code is often also fast code.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **"We will optimise cost later."** Later never comes; by then, the architecture is what it is.
* **Autoscale-and-forget.** Default autoscaling on a workload you have not profiled is how you get a thousand-dollar day.
* **Chatty logs forever.** Unstructured debug logs at volume are a non-trivial line on the bill.
* **AI calls without budget.** Model spend without a measured cost-per-request grows silently until it does not.
* **"It's just pennies."** Pennies × N × daily = a real number. Track it.

## Further reading [#further-reading]

* *Cloud FinOps*, Storment & Fuller — the canonical text on cross-functional cost management.
* *The Cost of Complexity*, Frederic Lardinois (various articles) — the essays on why complex architectures cost more than they appear.
* *AWS Well-Architected Framework — Cost Optimization pillar* — applicable beyond AWS, useful as a checklist.
* *FinOps Foundation framework* ([finops.org](https://finops.org)) — the practitioner's handbook.
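
The over- and under-reserving trade-off in principle 7 of this page is easy to see with a little arithmetic. This is a minimal Python sketch under loudly simplifying assumptions: a single hypothetical instance type, a flat hypothetical discount, hourly usage samples, and committed units billed every hour whether or not they are used. Real commitment programmes add term lengths, upfront options, and instance-family rules that this ignores.

```python
def period_cost_cents(usage: list[int], committed: int, rate_cents: int, discount_pct: int) -> int:
    """Total cost over the usage samples for a given commitment level.

    Committed units are billed every hour at the discounted rate, used or not;
    anything above the commitment is billed at the on-demand rate.
    """
    hours = len(usage)
    committed_cost = committed * rate_cents * (100 - discount_pct) * hours // 100
    overflow_units = sum(max(u - committed, 0) for u in usage)
    return committed_cost + overflow_units * rate_cents


def best_commitment(usage: list[int], rate_cents: int, discount_pct: int) -> int:
    """Cheapest commitment level between zero and the observed peak."""
    return min(
        range(max(usage) + 1),
        key=lambda c: period_cost_cents(usage, c, rate_cents, discount_pct),
    )
```

With a baseline of 10 units for 16 hours and a peak of 30 units for 8 hours, at 100 cents per unit-hour and an assumed 40% discount, committing to the baseline costs 30,400 cents for the day, against 40,000 fully on-demand and 43,200 when reserving for the peak. In this example the cheapest commitment is exactly the baseline, and reserving for the peak is worse than not reserving at all.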
# Developer Experience (/docs/principles/delivery/devex)

# Developer Experience [#developer-experience]

## TL;DR [#tldr]

A team ships as fast as its feedback loop lets it. We invest deliberately in the inner loop — the seconds between a code change and the evidence that the change works — because every second saved there is paid back a thousand times over across the team. `./dev` is our golden path, DORA metrics are how we measure the loop, and friction in the loop is an engineering bug.

## Why this matters [#why-this-matters]

The single largest predictor of a team's output, over months and years, is the quality of its feedback loop. A team that sees the result of a change in five seconds ships more and ships better than a team that sees it in five minutes — not because the individuals are smarter, but because the loop of hypothesis-and-test runs an order of magnitude more often. Developer experience is not a perk; it is an engineering lever.

## Our principles [#our-principles]

### 1. The inner loop is sacred [#1-the-inner-loop-is-sacred]

The inner loop is the sequence from "I think this code will work" to "yes or no, here is the evidence." We invest in making this loop as short as it can be: incremental compilation, test selection, hot reload, one-command bootstrapping, fast linting. Every second shaved off the inner loop multiplies across every engineer, every day.

### 2. `./dev` is the single entry point [#2-dev-is-the-single-entry-point]

Every local task — start, stop, test, lint, migrate, deploy, generate — runs through `./dev`. One command to remember, one tool to teach a new engineer, one surface to improve. Proliferating ad-hoc scripts in `Makefile`, `package.json`, and `bin/` is how a developer experience becomes a treasure hunt.

### 3. Golden paths, not mandatory paths [#3-golden-paths-not-mandatory-paths]

The golden path is the well-trodden, well-supported way to do a common task.
It is the default route, and the one new engineers and agents follow unless they have a reason not to. Deviation is allowed when a task genuinely does not fit, but the deviator pays the cost of their own tooling. Golden paths concentrate investment; mandatory paths breed resentment.

### 4. DORA metrics keep us honest [#4-dora-metrics-keep-us-honest]

Deployment frequency, lead time for changes, change failure rate, mean time to recover — the four DORA metrics are how we measure whether the delivery system is healthy. We track them, surface them, and react to them. A regression in any one of the four is a signal to invest in the loop.

### 5. Onboarding time-to-first-value is a design target [#5-onboarding-time-to-first-value-is-a-design-target]

A new engineer should reach their first local contribution — "I changed something and I can see the change" — on their first day. A new service should reach its first deploy in its first week. These are targets we hold ourselves to, and regressions here are treated as bugs.

### 6. Documentation is part of the loop [#6-documentation-is-part-of-the-loop]

A command you cannot find is a command you do not use. Every `./dev` subcommand has a reference entry, every golden path has a guide, every service has a handbook. The documentation exists so the loop does not depend on tribal memory.

### 7. Local environments match production shape [#7-local-environments-match-production-shape]

The local stack uses the same Postgres version, the same Pub/Sub contract, the same container runtime. "It works on my machine" goes away when the gap between the machines goes away. Emulation over mocks ([Testing](/docs/principles/foundations/testing)) applies here too.

### 8. Friction is filed as a bug [#8-friction-is-filed-as-a-bug]

If a process is painful, that pain is a bug. File it, prioritise it, fix it. "Everyone deals with it" is how chronic friction becomes chronic velocity loss.
The developer experience team — or whoever is the local maintainer of `./dev` — owns the backlog the same way a product team owns its user-bug backlog.

## How we apply this [#how-we-apply-this]

* [CLI Reference](/docs/reference/cli) — the surface of `./dev`.
* [Quickstart](/docs/start/quickstart) — the first-contact experience we measure.
* [Platform](/docs/principles/delivery/platform) — the broader internal platform `./dev` is a part of.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — the outer loop the inner loop feeds into.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **"Follow the README and read between the lines."** Onboarding that depends on tacit knowledge is not onboarding.
* **Five CLIs for five tasks.** `./dev` is one. A second CLI earns its existence by solving a problem `./dev` cannot.
* **Skip-the-test culture.** Fast but unreliable tests are worse than slow but reliable ones. The inner loop is made fast by honest investment, not by cheating.
* **DORA theatre.** Tracking the metric while not responding to it is worse than not tracking it at all.
* **Ignoring friction.** If you find a sharp edge, file the ticket. Do not route around it silently.

## Further reading [#further-reading]

* *Accelerate*, Forsgren, Humble, Kim — the empirical foundation for DORA metrics.
* *The DevOps Handbook*, Kim et al. — the full treatment of the inner-and-outer loop view.
* *Team Topologies*, Skelton & Pais — the organisational side of platform and golden paths.
* *Developer Experience: Concept and Definition* (Fagerholm & Münch, 2012) — the academic framing that predates the modern DevEx term.

# Delivery (/docs/principles/delivery)

# Delivery [#delivery]

Delivery is the discipline of turning code into running software that users can feel.
The pages in this section describe the four practices that determine whether our delivery loop is a source of leverage or a source of toil: developer experience, progressive delivery, platform engineering, and cost engineering.

# Platform (/docs/principles/delivery/platform)

# Platform [#platform]

## TL;DR [#tldr]

The platform is the substrate every application team builds on: the local stack, the CI/CD pipeline, the observability collector, the secrets manager, and the internal developer portal (IDP) that fronts all of it. We treat the platform as a product — it has users (us), a backlog, a quality bar, and explicit investment. A good platform makes the right thing the easy thing.

## Why this matters [#why-this-matters]

Every team in a multi-service organisation eventually arrives at the same realisation: the biggest drag on productivity is not the code the team writes, but the accumulated friction of the common plumbing every project has to assemble. A platform that handles the plumbing well turns that friction into a paved road. A platform that does not becomes a tax every project pays repeatedly. The quality of the platform is a direct multiplier on the output of every engineer who builds on it.

## Our principles [#our-principles]

### 1. Platform is a product, with users and a roadmap [#1-platform-is-a-product-with-users-and-a-roadmap]

The people who build the platform have explicit users — the application engineers — and treat their work as a product: backlog, priorities, measurement, feedback. A platform maintained "when we have time" decays; a platform treated as product investment compounds.

### 2. Self-service is the goal [#2-self-service-is-the-goal]

Every common task — spinning up a new service, requesting a secret, adding an OTel dashboard, changing a feature flag — should be self-service. When an application team has to file a ticket and wait for the platform team, the platform is the bottleneck. Self-service is the acid test.

### 3. Golden paths over policy [#3-golden-paths-over-policy]

We pave specific paths — how to create a service, how to deploy, how to observe — and make those paths the easiest route. Policy documents without paved paths produce compliance in form but drift in substance.

### 4. `./dev` is the platform's front door [#4-dev-is-the-platforms-front-door]

For local workflows, `./dev` is the abstraction over every underlying tool: Docker, pnpm, uv, Air, migrate. The platform team maintains `./dev`; application teams use it without needing to know what is under it. See [DevEx](/docs/principles/delivery/devex).

### 5. One paved-road CI pipeline [#5-one-paved-road-ci-pipeline]

One pipeline definition for every Go service; one for every Python service; one for every TypeScript service. Teams that deviate take on the cost of maintaining their own pipeline. This is how we prevent snowflake CI configurations from accumulating.

### 6. Observability is part of the platform [#6-observability-is-part-of-the-platform]

Traces, metrics, and logs flow through the same collector, into the same backend, onto the same dashboards. Observability that each team sets up independently ([Observability](/docs/principles/quality/observability)) is observability broken in five different ways.

### 7. The platform gets the same scrutiny as the product [#7-the-platform-gets-the-same-scrutiny-as-the-product]

Platform code is reviewed, tested, versioned, and deployed the same way product code is. A broken platform release can hurt every team at once, so the bar is actually higher. "It is just tooling, ship it" is how a platform becomes an obstacle.

### 8. Measure what the users feel [#8-measure-what-the-users-feel]

Platform success is measured by the application teams' outcomes (DORA metrics, onboarding time, number of tickets filed against the platform), not by the platform team's own output metrics, which can look excellent while the users are miserable.
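
Principle 8 leans on the DORA metrics, which are cheap to compute once every deploy is recorded. This is a minimal Python sketch, assuming a hypothetical `Deployment` record with commit and deploy timestamps, a failure flag, and a restore time; none of these names come from an existing Wordloop tool. It approximates time to restore as deploy-to-restore, whereas a production measure would start from when the failure was detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    caused_failure: bool = False             # did it degrade production?
    restored_at: Optional[datetime] = None   # when service recovered, if it failed


def dora_summary(deploys: list[Deployment], window_days: int) -> dict:
    """The four DORA metrics over a window of deployments."""
    if not deploys or window_days <= 0:
        raise ValueError("need at least one deploy and a positive window")
    lead_times = [d.deployed_at - d.committed_at for d in deploys]
    failures = [d for d in deploys if d.caused_failure]
    # Approximation: measured from deploy, not from failure detection.
    restore_times = [
        d.restored_at - d.deployed_at for d in failures if d.restored_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians rather than means keep one unusually slow change from dominating the lead-time number; the trend across windows matters more than any single value.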
## How we apply this [#how-we-apply-this]

* [CLI Reference](/docs/reference/cli) — the `./dev` surface.
* [DevEx](/docs/principles/delivery/devex) — the developer-facing experience the platform enables.
* [Observability](/docs/principles/quality/observability) — the centralised telemetry substrate.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — the CI/CD pipeline as a platform service.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Platform-as-gatekeeper.** A platform that says "no" more than it says "self-serve" is a bottleneck, not a platform.
* **Five ways to do one thing.** Historical pipelines that nobody cleaned up. The platform should consolidate.
* **Tooling that only the platform team can use.** If the API requires insider knowledge, the tool is incomplete.
* **"Platform investment later."** The platform is either invested in or decaying; there is no steady state.
* **Metrics for the platform's sake.** Measuring "tickets closed by platform team" without measuring application-team outcomes misses the point.

## Further reading [#further-reading]

* *Team Topologies*, Skelton & Pais — the canonical framing of platform teams and enabling teams.
* *Platform Engineering on Kubernetes*, Mauricio Salatino — the practical engineering view.
* *The DevOps Handbook*, Kim et al. — the broader cultural context the platform sits inside.
* *Backstage documentation* ([backstage.io](https://backstage.io)) — the archetype of an internal developer portal.

# Progressive Delivery (/docs/principles/delivery/progressive-delivery)

# Progressive Delivery [#progressive-delivery]

## TL;DR [#tldr]

Progressive delivery is how we decouple the act of deploying code from the act of releasing a feature. We ship to production multiple times a day from a single branch, but users see changes only when we open a flag, route a canary, or promote a cohort. The production environment is stable; the user experience is controlled independently.
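
The flag side of that decoupling can be sketched in a few lines. This is a minimal Python illustration, assuming a hypothetical in-process `Flag` record with a master switch, an allowlist for internal users, and a percentage rollout; it is not Wordloop's flag system, whose API this page does not describe. Hashing the flag name together with the user ID gives each user a stable bucket, so raising the percentage only ever adds users and never reshuffles who already has the feature.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Flag:
    name: str
    enabled: bool = False                          # master switch: off means nobody
    allow_users: frozenset[str] = frozenset()      # e.g. internal users
    rollout_percent: int = 0                       # cohort size, 0-100


def bucket(flag_name: str, user_id: str) -> int:
    """Deterministic bucket in [0, 100) for this flag and user."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    if not flag.enabled:
        return False
    if user_id in flag.allow_users:
        return True
    return bucket(flag.name, user_id) < flag.rollout_percent
```

Turning `enabled` off is the kill switch: every user loses the feature on the next evaluation, with no redeploy. Moving `rollout_percent` from 1 to 100 is the cohort promotion.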
## Why this matters [#why-this-matters]

The reason most teams avoid shipping often is that shipping carries risk — a bad deploy can break production for every user at once. Progressive delivery breaks the link. A deploy puts the code into production. A release makes the code reach users. With the two decoupled, deploys become small, frequent, and boring; releases become observable, controllable, and reversible. That asymmetry is how modern teams sustain a fast release cadence without a proportional rate of incidents.

## Our principles [#our-principles]

### 1. Trunk-based development with short-lived branches [#1-trunk-based-development-with-short-lived-branches]

Every change lands on `main` as soon as it is ready. Branches live for days, not weeks. Long-lived branches are how integration bugs accumulate quietly; trunk-based development surfaces them constantly, which makes them cheap to fix.

### 2. Deploy on every merge [#2-deploy-on-every-merge]

`main` is always deployable, and we deploy from it continuously. A merged PR reaches production within the deploy window, not hours or days later. This is enforced by automation; a team that relies on a human "release engineer" has already lost the bet on cadence.

### 3. Feature flags separate deploy from release [#3-feature-flags-separate-deploy-from-release]

A new feature is deployed behind a flag, defaulted off. The flag state decides who sees the feature — nobody, internal users, a cohort, everyone. A bad feature is disabled without a redeploy; a controversial feature is rolled out to 1% before 100%. Flags are a core primitive, not a third-party dependency.

### 4. Canary before promote [#4-canary-before-promote]

Every release that could affect latency, reliability, or user experience goes through a canary — a small fraction of traffic for a bounded window — before promotion. Canary signals (error rate, p99 latency, user-journey success) are automated comparisons, not eyeballs on a dashboard.

### 5. Release is reversible, cheaply [#5-release-is-reversible-cheaply]

Every release has a rollback path that any on-call engineer can execute in a few minutes. Database migrations are designed to be reversible ([Migrate the Schema](/docs/guides/migrate-schema)); flags can be flipped; canaries can be re-routed. "We can't roll that back" is a red flag on the release itself.

### 6. Flag hygiene is continuous [#6-flag-hygiene-is-continuous]

Flags are an asset and a debt. A long-lived flag that nobody remembers the purpose of is a drag on every future change. Every flag has an owner, a purpose, and an expiry date; stale flags are removed in the normal course of work.

### 7. Observability defines "healthy" [#7-observability-defines-healthy]

A release is healthy when the relevant user-journey SLOs are within tolerance ([Reliability](/docs/principles/quality/reliability)). Not when CPU is low, not when memory is steady — when users' journeys are succeeding at the rate they did before. The canary is evaluated against SLO burn rates.

### 8. The release story is the same for every service [#8-the-release-story-is-the-same-for-every-service]

One rollout model, one flag system, one canary pattern. Different services with different release mechanics multiply cognitive load and reduce the effectiveness of the on-call engineer. Consistency is a force multiplier.

## How we apply this [#how-we-apply-this]

* [DevEx](/docs/principles/delivery/devex) — the inner loop that feeds into continuous delivery.
* [Reliability](/docs/principles/quality/reliability) — the SLO surface that gates canary promotion.
* [Observability](/docs/principles/quality/observability) — the signal layer for release health.
* [Deploy](/docs/guides/deploy) — the canonical deploy workflow.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Release trains.** Batching up a month of changes and shipping them on Friday is how you get a huge, unreviewable deploy that breaks in ways nobody can localise.
* **Flags without expiry.** A flag that has been "temporary" for a year is permanent — and a permanent decision hidden inside a runtime config.
* **Canary-by-eyeball.** Promoting because the graph "looks fine" is a coin flip. Automate the comparison.
* **"We will test it in staging."** Staging has no users. A canary in production is the only test of production behaviour.
* **Commit-and-hope.** No canary, no flag, deploy to 100%. You will find out in the morning.

## Further reading [#further-reading]

* *Accelerate*, Forsgren, Humble, Kim — the data on trunk-based development and its outcomes.
* *Continuous Delivery*, Humble & Farley — the canonical treatment of the release pipeline.
* James Governor, *Progressive Delivery* (RedMonk, 2018) — the essay that named the practice.
* *Release It!* (2nd edition), Michael Nygard — the stability-pattern view of rollout.

# Code Craft (/docs/principles/foundations/code-craft)

# Code Craft [#code-craft]

## TL;DR [#tldr]

Code is read far more than it is written. Our craft is to write code that the next reader — human or agent — can understand, change, and delete with confidence. Simplicity is the default; abstraction is a cost that must be earned.

## Why this matters [#why-this-matters]

In a codebase that is alive for more than a year, the dominant cost is not writing code — it is understanding the code already there so you can change it. Every abstraction, every layer of indirection, every "flexible" interface is a tax on future readers. Our stance is that taxes must be justified. When we optimise for flexibility we have not yet needed, we pay a certain cost today against an uncertain benefit later; more often than not, the benefit never arrives and we are left with the cost.

## Our principles [#our-principles]

### 1. Simpler is better than clever [#1-simpler-is-better-than-clever]

A function that a tired engineer can understand in thirty seconds is worth more than a function that demonstrates the author's taste in type systems. Prefer plain data structures over clever abstractions, plain control flow over meta-programming, plain naming over in-joke naming. When "clever" and "clear" conflict, clear wins.

### 2. No speculative abstraction [#2-no-speculative-abstraction]

Do not build a generalisation until you have at least three concrete use cases driving the same shape. Premature abstractions are harder to change than the duplication they replace — because now you have to understand the abstraction, the use cases, and the compatibility between them before you can change any of them. Three similar lines of code are almost always better than a half-designed helper.

### 3. Deletion is a virtue [#3-deletion-is-a-virtue]

The code you delete cannot break, cannot require maintenance, cannot confuse the next reader, and cannot leak a vulnerability. When a feature is removed, the code should go with it — including the tests, the config flags, and the docs. Leaving dead code "just in case" is a bet that is almost always wrong: if we need it back, we will write a clearer version with the benefit of hindsight.

### 4. Names are the interface [#4-names-are-the-interface]

A badly named function is a broken interface even if its behaviour is correct, because every caller has to read the implementation to know what it does. We spend time on names. We rename aggressively when a better name becomes clear. Variables, functions, types, files, directories — all of them communicate, and a mismatch between name and behaviour is a bug.

### 5. Comments explain the "why" [#5-comments-explain-the-why]

Code explains the "what"; a comment that restates it is redundant. Names explain the "who" and "where."
The only thing left for a comment is the "why": the non-obvious constraint, the invariant that must hold, the bug that drove an odd choice, the reference to an ADR. If a comment would be obvious to anyone who read the surrounding code, it is noise.

### 6. Error handling is design, not decoration [#6-error-handling-is-design-not-decoration]

Errors are a first-class part of the interface, not an afterthought. We decide — explicitly — which errors a function can return, how callers are expected to respond, and where the boundary between recoverable and fatal lies. `err != nil` sprinkled through a codebase without a model behind it is a failure of design.

### 7. Validate at the boundary; trust the internal [#7-trust-the-boundary-distrust-the-internal]

We validate at system boundaries — user input, external APIs, message payloads — where the data is untrusted. We do not re-validate between internal callers in the same service; if an internal contract is wrong, the right fix is the contract, not a runtime check in every consumer. Defensive programming inside the trust boundary is a form of noise.

### 8. Dead code is a bug [#8-dead-code-is-a-bug]

Commented-out code, `_unused` variables, orphan functions, legacy configuration — all of it degrades the signal-to-noise ratio of the codebase. When we find it, we delete it. `git` preserves anything we lose; the working tree should contain only code that is alive today.

## How we apply this [#how-we-apply-this]

* [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural discipline that makes simplicity scalable.
* [Testing](/docs/principles/foundations/testing) — tests that exercise behaviour keep refactoring cheap.
* [Go Services](/docs/principles/stack/go-services) — the idioms that keep our Go code readable.
* [Frontend](/docs/principles/stack/frontend) — the conventions that keep our React code readable.
* [Decisions](/docs/decisions) — the ADRs that capture the "why" our comments do not.
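
The error-handling and boundary principles on this page fit together: parse untrusted input once at the edge, turn it into a value the rest of the service can trust, and make each failure an explicit, named outcome. This is a minimal Python sketch, assuming a hypothetical rename-meeting request; the error codes, the `MeetingTitle` type, and the length limit are illustrative, not taken from Wordloop's error catalogue or its Go services.

```python
from dataclasses import dataclass

MAX_TITLE_CHARS = 200  # hypothetical limit


class ValidationError(Exception):
    """A boundary failure with a stable, machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


@dataclass(frozen=True)
class MeetingTitle:
    """A title that has already passed boundary validation."""
    value: str


def parse_rename_request(payload: object) -> MeetingTitle:
    """Boundary: everything arriving here is untrusted."""
    if not isinstance(payload, dict):
        raise ValidationError("body_not_object", "request body must be a JSON object")
    raw = payload.get("title")
    if not isinstance(raw, str):
        raise ValidationError("title_missing", "title must be a string")
    title = raw.strip()
    if not title:
        raise ValidationError("title_empty", "title must not be blank")
    if len(title) > MAX_TITLE_CHARS:
        raise ValidationError("title_too_long", f"title exceeds {MAX_TITLE_CHARS} characters")
    return MeetingTitle(title)


def rename_meeting(meeting: dict, title: MeetingTitle) -> dict:
    """Internal: trusts the MeetingTitle type and does not re-validate."""
    return {**meeting, "title": title.value}
```

`rename_meeting` has no checks of its own. If it ever receives a bad title, the fix belongs in `parse_rename_request` (the contract), not in another runtime guard inside the service.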
## Anti-patterns we reject [#anti-patterns-we-reject]

* **Defensive programming without a threat model.** Guarding every internal call against nil is not robustness — it is distrust of our own type system.
* **"Might need it later" scaffolding.** Config flags for scenarios that do not exist, plugin systems with one plugin, interfaces with one implementation. Delete.
* **Fashion-driven refactors.** Rewriting working code to match a new pattern the team read about this week is debt, not progress.
* **Multi-paragraph docstrings.** If the function needs a multi-paragraph docstring to be understood, the function is wrong. Split it, rename it, or simplify it — then the docstring is not needed.
* **Backwards-compatibility shims for internal APIs.** If an API is fully internal, changing it is allowed and expected; compatibility layers are debt we impose on ourselves for no benefit.

## Further reading [#further-reading]

* *A Philosophy of Software Design*, John Ousterhout — deep-module principle, the cost of shallow abstractions.
* *Tidy First?*, Kent Beck — the economics of refactoring as a separable activity.
* *The Pragmatic Programmer*, Hunt & Thomas — the canonical treatment of names, duplication, and orthogonality.

# Documentation (/docs/principles/foundations/documentation)

# Documentation [#documentation]

## TL;DR [#tldr]

Documentation is an active product surface. Wordloop docs are the canonical source for durable engineering knowledge; agent skills are the execution layer that selects, loads, and applies that knowledge safely. We design documentation for humans and AI agents at the same time, organise it with Diátaxis, expose it through `llms.txt`, Markdown exports, and MCP, and enforce freshness with automation wherever a human would drift.

## Why this matters [#why-this-matters]

In 2026, documentation is part of the runtime environment for engineering work.
A human reads the site through navigation and search; an agent reads the same knowledge through MCP resources, `llms.txt`, `llms-full.txt`, and per-page Markdown exports. If those surfaces disagree, the system teaches different readers different truths. That is not a documentation problem; it is an engineering defect.

The operating model is simple: **docs hold the knowledge, skills control the agent behaviour**. Durable guidance belongs in the docs site where humans and agents can inspect it. Skill files stay concise and directive: they define when to trigger, what context to load, which tools to use, and which safety checks must run. This keeps prompts lean, reduces duplicated policy, and gives us one canonical place to correct factual drift.

## Our principles [#our-principles]

### 1. Documentation is canonical knowledge [#1-documentation-is-canonical-knowledge]

Architecture principles, service handbooks, workflow guides, glossary terms, ADRs, API references, and generated schemas belong in the docs site. A skill may point to these pages, but it does not become the source of truth for material that humans also need to understand.

### 2. Skills are the agent execution layer [#2-skills-are-the-agent-execution-layer]

Agent skills are a control surface, not a second documentation site. A skill owns triggering, task routing, tool use, safety constraints, verification steps, and context-loading instructions. It should say, for example, "read the App service handbook before changing `wordloop-app` data fetching," not duplicate the handbook in full.

### 3. AI-native documentation is first class [#3-ai-native-documentation-is-first-class]

Every important documentation surface must survive machine consumption. We publish `llms.txt` as the curated index, `llms-full.txt` as the consolidated corpus, `.md` exports for individual pages, and MCP resources for structured retrieval.
Agent-readiness is not an afterthought or an SEO trick; it is a quality attribute of the docs system.

### 4. Diátaxis is the structural frame [#4-diátaxis-is-the-structural-frame]

We organise by reader intent, not by our internal org chart. Tutorials teach, how-to guides solve, reference pages support lookup, and explanation pages build understanding. A page that mixes these jobs forces both humans and agents to infer the purpose from context, which makes retrieval weaker and maintenance harder.

### 5. Active docs replace passive docs [#5-active-docs-replace-passive-docs]

A page is not "done" when it is written. Active docs declare ownership, review cadence, freshness status, and source-of-truth boundaries. Pages that age past their review window are visibly flagged and reviewed as part of normal engineering work, not as a cleanup project.

### 6. Automation is the first reviewer [#6-automation-is-the-first-reviewer]

Automated checks enforce the cheap, high-signal rules: required frontmatter, broken internal links, stale review dates, invalid skill-to-doc references, stale generated corpora, and known version mismatches. Humans review accuracy, judgment, and usefulness. Automation handles the facts it can verify without fatigue.

### 7. Prefer generated reference over prose [#7-prefer-generated-reference-over-prose]

API specs, event contracts, database schemas, CLI command tables, and error catalogues have machine-readable sources. We render them from those sources instead of hand-writing reference pages. Hand-written reference material drifts; generated reference material can be rebuilt and checked.

### 8. Decisions are append-only [#8-decisions-are-append-only]

Hard-to-reverse decisions live in ADRs. Accepted ADRs are not edited to match current preference; they are superseded. Each ADR carries enough consequence and debt context for a future reader to understand why the decision existed, what it cost, and when to revisit it.

### 9. Metadata interoperability matters [#9-metadata-interoperability-matters]

Formal documentation standards are useful when they sharpen interoperability discipline. ISO/PAS 25955:2026 is a Publicly Available Specification for Data Documentation Initiative interoperability, not a generic agent-documentation linking standard. The lesson we apply is precise metadata, stable identifiers, and explicit relationships between documentation objects. For agent discovery specifically, Wordloop uses `llms.txt`, Markdown exports, MCP resources, and HTTP `Link` headers.

### 10. Drift is corrected at the source [#10-drift-is-corrected-at-the-source]

When code, docs, skills, specs, and design records disagree, we identify the source of truth before editing. Code and generated contracts win for shipped runtime behaviour. ADRs win for historical decisions. Active design docs win for current delivery intent until the shipped system proves otherwise. Skills win for agent execution behaviour only.

## Freshness model [#freshness-model]

| Surface                 |                        Review window | Freshness rule                                                                      |
| ----------------------- | -----------------------------------: | ----------------------------------------------------------------------------------- |
| Principles              |                             6 months | Review when operating model or engineering policy changes.                          |
| Service handbooks       |                             3 months | Review when code structure, stack versions, commands, or service boundaries change. |
| API and event reference |                Every contract change | Generated from OpenAPI and AsyncAPI sources.                                        |
| Runbooks                |                             3 months | Review after incidents, operational changes, or ownership changes.                  |
| Active bet and TDD docs | Every material implementation change | Keep design intent aligned with delivery reality.                                   |
| Delivered bet docs      |                           Historical | Freeze except for explicit correction notes.                                        |
| ADRs                    |                           Historical | Supersede instead of rewriting accepted records.                                    |
| Agent skills            |    Every skill or mapped docs change | Validate trigger logic, context routing, and verification steps.                    |

See [Documentation Freshness](/docs/operations/documentation-freshness) for the operational policy.

## How we apply this [#how-we-apply-this]

* [llms.txt](/llms.txt) and [llms-full.txt](/llms-full.txt) are the machine-readable entry points.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) defines the broader interface discipline for agent consumers.
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync) defines the change workflow for canonical docs and skill files.
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift) defines the triage workflow when docs, skills, code, specs, and design records disagree.
* [Decisions](/docs/decisions) records architectural decisions with append-only history.
* [Reference](/docs/reference) contains generated and lookup-oriented material.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Skill files as shadow docs.** A skill that duplicates durable engineering policy becomes stale faster than the canonical docs page.
* **Docs pages as prompts.** Documentation should explain systems and decisions; skills should instruct agents how to act.
* **Documentation as an afterthought.** Docs ship with the feature or the feature is incomplete.
* **Manual reference tables.** If a table can be generated from code, contracts, or schemas, generate it.
* **Unowned pages.** A page without owner and review cadence has no maintenance path.
* **Stale diagrams.** A diagram that does not match the system is worse than no diagram because it creates false confidence.
* **Screenshots as reference.** Screenshots are acceptable as evidence in incidents, not as canonical UI or architecture documentation.
* **Marketing-flavoured engineering docs.** Assertions need evidence, examples, or source-of-truth links.
* **Overstated standards claims.** Distinguish formal standards from emerging conventions. Name the standard, its scope, and why it applies.

## Further reading [#further-reading]

* [Diátaxis](https://diataxis.fr) — the structural model for tutorials, how-to guides, reference, and explanation.
* [llms.txt](https://llmstxt.org) — the emerging convention behind our AI-readable documentation index.
* [Model Context Protocol](https://modelcontextprotocol.io) — the protocol we use for structured agent access to docs resources and tools.
* [ISO/PAS 25955:2026](https://www.iso.org/standard/92127.html) — DDI interoperability specification; useful as a metadata-interoperability reference, not as an agent-discovery standard.
* *Docs for Developers*, Bhatti et al. — practical guidance for engineering documentation.
* *Living Documentation*, Cyrille Martraire — using code and automation to reduce documentation drift.

# Foundations (/docs/principles/foundations)

# Foundations [#foundations]

Foundations are the ideas that shape our engineering before any specific stack, service, or feature enters the conversation. They are deliberately stack-agnostic — the same principles should hold whether we are writing Go, Python, or TypeScript, whether the target is a backend API or a frontend surface, whether the change is large or small.

Four pages live here:

Read these before reading anything else in the principles hub. They are the filter through which every subsequent decision makes sense.

# Product Engineering (/docs/principles/foundations/product-engineering)

# Product Engineering [#product-engineering]

## TL;DR [#tldr]

We are product engineers before we are coders. Our job is to move user outcomes — not to ship tickets. Work is shaped before it is scheduled, scheduled against a fixed appetite rather than an estimate, and measured by the change it makes in user behaviour rather than the volume of code it produces.
## Why this matters [#why-this-matters]

The dominant failure mode of engineering teams in 2026 is not technical debt — it is building the wrong thing well. Feature factories optimise cycle time and output velocity and end up with a product surface that grows faster than the value it delivers. Product engineering is the discipline of resisting that. It says the unit of work is a user outcome, the unit of planning is an appetite, and the test of a PR is whether a real user can feel it.

## Our principles [#our-principles]

### 1. Outcomes over outputs [#1-outcomes-over-outputs]

An "output" is a feature shipped, a ticket closed, a migration completed. An "outcome" is a change in what a user can do, how quickly they can do it, or how reliably the system supports them. We plan around outcomes and let outputs be whatever shape is required to deliver them. A sprint ending with three closed tickets and no user-visible outcome is a sprint of failed work.

### 2. Shape work before scheduling it [#2-shape-work-before-scheduling-it]

No work enters a sprint without having been *shaped*: the problem stated in user terms, the rough solution sketched, the boundaries drawn to exclude rabbit holes. Shaped work is expensive upfront and cheap downstream. Unshaped work is the single biggest source of mid-sprint drift, scope creep, and late-breaking discovery that the whole approach was wrong.

### 3. Appetite, not estimate [#3-appetite-not-estimate]

We set an *appetite* — "this is worth about two weeks of one engineer's attention" — and then design a solution that fits inside it. If it cannot fit, we either reduce scope or reject the work. This inverts the usual flow: instead of estimating the cost of a fixed solution, we fix the cost and negotiate the solution. It forces the team to ask "what is the cheapest version of this that delivers the outcome?" and it kills the tendency of work to expand to the time available.

### 4. Kill your darlings [#4-kill-your-darlings]

If a feature is not moving an outcome, we remove it. Deletion is the most under-used tool in a product engineer's kit. Every line of code, every page of docs, every dashboard tile, every CLI flag that does not pay for its maintenance cost should be cut. A smaller, sharper product is cheaper to operate and easier for the next engineer to understand.

### 5. Instrument everything you ship [#5-instrument-everything-you-ship]

A feature that is not measured does not exist from a product engineering point of view. We decide the signal *before* we ship — event, dashboard, success criterion — and we check the signal after release. If we cannot measure it, we negotiate the feature until we can.

## How we apply this [#how-we-apply-this]

* [Run Tests](/docs/guides/run-tests) — we test the outcome, not the implementation.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — canaries and flags are the mechanism by which we measure outcomes safely.
* [Observability](/docs/principles/quality/observability) — the signal layer that makes outcome-based engineering possible.
* [Decisions](/docs/decisions) — the record of shaping decisions that cost us real time.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Velocity-as-KPI.** Story points per sprint measure nothing about user outcomes. Optimising for it corrupts the team.
* **Estimate-driven planning.** Estimates anchor on how long the team thinks work will take, not on how much the work is worth. We use appetites instead.
* **"Build it and they will come."** Launching a feature without a measurement plan is a signal that no one owns the outcome.
* **Technical-debt-for-its-own-sake projects.** Refactors without a user-visible payoff are a smell; wrap them inside an outcome that demands them.

## Further reading [#further-reading]

* *Shape Up*, Ryan Singer — the canonical treatment of shaped work and fixed appetites.
* *Inspired*, Marty Cagan — the product-engineering triad and its implications for how teams are built.
* *Escaping the Build Trap*, Melissa Perri — why feature-factory metrics corrupt outcomes.

# Testing (/docs/principles/foundations/testing)

# Testing [#testing]

## TL;DR [#tldr]

Tests are risk-weighted assertions about production behaviour — not boxes ticked for coverage. We favour high-fidelity service tests over solitary unit tests, emulate dependencies rather than mocking them, and treat observability signals as first-class test assertions.

## Why this matters [#why-this-matters]

The dominant failure mode of a test suite in 2026 is not that it is too small — it is that it passes while production breaks. Mocked dependencies drift from their real counterparts, unit tests assert on implementation rather than behaviour, and green CI gives a false sense of security. *Continuous Risk Assurance* is our name for the discipline that replaces "coverage as a target" with "risk as the thing we actually measure."

## Our principles [#our-principles]

### 1. Favour service tests over solitary unit tests [#1-favour-service-tests-over-solitary-unit-tests]

The "sociable" service test is our foundational unit of validation. We test from the API entry point through to real, ephemeral database containers. We reserve solitary unit tests exclusively for complex isolated algorithms (parsers, validators, pure computation). In a service-oriented codebase, the interesting bugs live at the boundaries — HTTP serialisation, SQL query correctness, event emission — and those are exactly what solitary unit tests mock away.

### 2. Emulate, don't mock [#2-emulate-dont-mock]

If a dependency can run in a container — Postgres, Pub/Sub, object storage — we emulate it via Testcontainers or equivalent. In-memory fakes miss critical data-integrity, serialisation, and networking issues.
The startup cost is worth the confidence gain; these are precisely the bugs that escape to production when you mock them out. Emulators are reset per test suite to maintain determinism and prevent test pollution.

### 3. Observability is a test surface [#3-observability-is-a-test-surface]

OpenTelemetry instrumentation is a design-time concern, not an afterthought. System tests assert that traces are unbroken end-to-end: a missing span, a lost TraceID, or a broken parent-child relationship is a test failure, not an instrumentation TODO. The boundary between "test" and "monitor" dissolves — both are asking whether the system is behaving as we claim.

### 4. Name tests by behaviour, not implementation [#4-name-tests-by-behaviour-not-implementation]

Every test follows a BDD-style name: `[Function] should [expected outcome] when [condition]`. This ensures the test log alone tells the story: an on-call engineer reading a failure can form a hypothesis without opening the test code. Names like `TestCreateLoop_Success` are banned — they convey nothing beyond what already appears on the dashboard.

### 5. Risk-based depth, not blanket coverage [#5-risk-based-depth-not-blanket-coverage]

Coverage percentages are meaningless without proof that the assertions catch real faults. We score modules using a risk matrix — Impact × Complexity × Change-frequency — before deciding on test depth. High-risk modules earn live system tests and chaos experiments; low-risk modules need only small tests and static analysis. Equal test depth everywhere is wasted effort.

### 6. Tests are part of the change, not after it [#6-tests-are-part-of-the-change-not-after-it]

A PR without tests is incomplete. A test added in a follow-up PR is a test that will never be written. We write tests alongside the code they verify, and we review the test with the same rigour as the code. If a change resists testing, that is a signal about the design of the code, not the design of the test.
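
The risk matrix in principle 5 can be sketched as a small scoring function. This is an illustration, not Wordloop tooling. The 1 to 5 scale, the use of a plain product, the tier names, and the thresholds (60 and 20) are all assumptions chosen for the example, not calibrated values.

```typescript
// Each factor is scored from 1 (low) to 5 (high). The scale, the product, and the
// tier thresholds below are illustrative assumptions, not calibrated values.
type RiskFactors = { impact: number; complexity: number; changeFrequency: number };

type TestDepth = "small-and-static" | "service-tests" | "system-and-chaos";

// Impact × Complexity × Change-frequency, giving a score between 1 and 125.
function riskScore({ impact, complexity, changeFrequency }: RiskFactors): number {
  for (const factor of [impact, complexity, changeFrequency]) {
    if (!Number.isInteger(factor) || factor < 1 || factor > 5) {
      throw new RangeError(`risk factor must be an integer from 1 to 5, got ${factor}`);
    }
  }
  return impact * complexity * changeFrequency;
}

// Map the score to the depth of testing a module has to earn.
function requiredTestDepth(factors: RiskFactors): TestDepth {
  const score = riskScore(factors);
  if (score >= 60) return "system-and-chaos";
  if (score >= 20) return "service-tests";
  return "small-and-static";
}
```

Under the naming rule in principle 4, a test for this function might be titled `requiredTestDepth should return system-and-chaos when impact, complexity, and change frequency are all high`.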
## How we apply this [#how-we-apply-this]

* [Run Tests](/docs/guides/run-tests) — how to invoke the suites locally and in CI.
* [Observability](/docs/principles/quality/observability) — the OTel-first stance that makes traces-as-assertions possible.
* [Reliability](/docs/principles/quality/reliability) — how tests compose with chaos and load experiments.
* [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural choice that makes tests cheap to write and fast to run.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Mocking the database.** A test that mocks the database asserts against your assumptions about the database, not against its actual behaviour. Use an ephemeral container.
* **Snapshot tests as a default.** Snapshots are a brittle, noisy substitute for behavioural assertions. They are acceptable only when the thing being snapshotted is a genuinely opaque artefact (a rendered email, a serialised response).
* **Coverage-gated CI.** "95% line coverage required" is a metric that can be gamed without improving real risk reduction. Use it as a read-out, never as a gate.
* **Shared staging environments as the integration test.** Staging has no hermetic guarantees, no reproducibility, and no determinism. It is a deployment target; it is not a test bed.
* **"It's hard to test, so we didn't."** That is a signal the code is badly designed. Fix the code.

## Further reading [#further-reading]

* *Accelerate*, Forsgren, Humble, Kim — the empirical case for continuous delivery and its testing discipline.
* *Working Effectively with Legacy Code*, Michael Feathers — seams, test doubles, and when each is appropriate.
* *Growing Object-Oriented Software, Guided by Tests*, Freeman & Pryce — the canonical treatment of outside-in service testing.
* *xUnit Test Patterns*, Gerard Meszaros — the vocabulary we use for test doubles, fixtures, and strategies.
# Accessibility (/docs/principles/quality/accessibility)

# Accessibility [#accessibility]

## TL;DR [#tldr]

Every user interface we ship meets WCAG 2.2 AA as a baseline. Keyboard, screen reader, and visual assistive technology are first-class targets, not after-launch polish. A feature that does not work for a keyboard user or a screen-reader user is not finished.

## Why this matters [#why-this-matters]

Accessibility is not a niche concern — a significant fraction of our users rely on assistive technology at some point. Beyond the moral case (equal access is a baseline), the design constraints that accessibility imposes — clear hierarchy, visible focus, semantic structure, predictable navigation — tend to produce better software for *every* user. An accessible interface is almost always also a clearer, calmer interface.

## Our principles [#our-principles]

### 1. WCAG 2.2 AA is the floor, not the ceiling [#1-wcag-22-aa-is-the-floor-not-the-ceiling]

We conform to WCAG 2.2 AA for every page, every component, every release. AA is the baseline, and we aim for AAA on critical journeys where the cost is bearable. Falling below AA is a bug; it is not a trade-off we make.

### 2. Keyboard first [#2-keyboard-first]

Every interactive element is reachable and usable with the keyboard. Tab order is logical, focus is always visible, and there are no keyboard traps. The design test is simple: can a power user — or a user who cannot use a mouse — complete every journey without touching the pointer?

### 3. Screen readers see what sighted users see [#3-screen-readers-see-what-sighted-users-see]

Semantic HTML first; ARIA only when HTML is not expressive enough. Headings form an outline, landmarks mark regions, form fields carry labels, images carry alt text, live regions announce updates. A screen reader should produce a narrative that matches what a sighted user sees — not a richer or poorer version of it.

### 4. Colour is never the only signal [#4-colour-is-never-the-only-signal]

A red error, a green success, a blue link — each one is accompanied by a label, an icon, or a structural cue. Colour-blind users exist; colour-only signalling is an exclusion.

### 5. Motion is optional [#5-motion-is-optional]

Animations respect `prefers-reduced-motion`. Large-scale parallax and aggressive transitions are used sparingly; for users with vestibular conditions, unrequested motion is not decoration, it is an accessibility failure.

### 6. Live regions are used sparingly and correctly [#6-live-regions-are-used-sparingly-and-correctly]

Real-time updates — transcription chunks appearing, participants joining — are announced via `aria-live` when they matter to the user's understanding. But over-announcement is as bad as under-announcement: noisy announcements train users to tune out the ones that matter.

### 7. Testing is multi-layered [#7-testing-is-multi-layered]

We run automated accessibility checks in CI (axe, Lighthouse accessibility audits), keyboard-walk every new journey manually, and run screen-reader walkthroughs on major features. Automated testing catches the common failures; humans catch the semantic ones.

### 8. Accessibility is reviewed like code [#8-accessibility-is-reviewed-like-code]

Accessibility issues are tracked, owned, and closed the same way any other bug is. The backlog does not accumulate "we will get to the a11y later" — that queue grows forever. Every PR author is expected to include the accessibility check in their definition-of-done.

## How we apply this [#how-we-apply-this]

* [Frontend](/docs/principles/stack/frontend) — the component-library patterns that make accessibility default.
* [App Service Handbook](/docs/learn/services/app) — the wordloop-app architectural view.
* [DevEx](/docs/principles/delivery/devex) — the CI gates that block accessibility regressions.
* [Performance](/docs/principles/quality/performance) — related budgets that compound with accessibility.

## Anti-patterns we reject [#anti-patterns-we-reject]

* **Placeholder text as label.** The placeholder disappears when the field is filled, and the label goes with it. Users who come back to check the field cannot tell what it was asking for. Use a visible label.
* **`<div>` as button.** A `div` with an `onClick` is invisible to keyboard, screen reader, and user agent. Use `<button>`.

## 3. Concrete Dependency Injection (Providers) [#3-concrete-dependency-injection-providers]

Rather than hand-writing brittle `fetch` calls scattered across multiple UI components, we rely entirely on API clients generated from our OpenAPI spec.

### Using the Generated Orval Client [#using-the-generated-orval-client]

`Orval` reads our OpenAPI spec and generates pure TypeScript hooks and fetchers. These act as our "Providers" in the Clean Architecture context. Components (the Domain) use them without caring about the underlying HTTP mechanism.

```tsx
// components/meeting-list.tsx
'use client'

// 1. Import the generated Provider
import { useGetMeetings } from '@/lib/providers/generated/wordloop';

export function MeetingList() {
  // 2. The Provider abstracts SWR caching, headers, and response typing
  const { data, error, isLoading } = useGetMeetings();

  // Placeholder markup for the loading and error states
  if (isLoading) return <p>Loading meetings…</p>;
  if (error || !data) return <p role="alert">Could not load meetings.</p>;

  // 3. Response types are generated from the same OpenAPI spec that Core serves
  return (
    <ul>
      {data.meetings.map((m) => (
        <li key={m.id}>{m.title}</li>
      ))}
    </ul>
  );
}
```

## 4. Idiomatic React & TypeScript Standards [#4-idiomatic-react--typescript-standards]

We do not aim to rewrite foundational guidance on writing excellent React and TypeScript code. Instead, we adhere to established industry baselines mapped to our internal engineering principles. We expect all Wordloop App engineers to know these well:

* [Next.js App Router Documentation](https://nextjs.org/docs/app) for framework-dictated rendering boundaries.
* [Total TypeScript Patterns](https://www.totaltypescript.com/) for strict TypeScript fundamentals.

Below is concrete guidance on how common TypeScript and React idioms map onto our system-enforced architectural invariants.

### Default to Server Components (Clean Architecture) [#default-to-server-components-clean-architecture]

**The React Idiom:** Start with React Server Components (RSC) and only use `'use client'` at the absolute leaf nodes.\
**The Principle Connection:** As defined in our [Service Architecture](/docs/principles/system-design/hexagonal-architecture), RSCs act as our "Inbound Adapters." They handle data fetching securely on the server without exposing network waterfalls to the client. This enforces a strict separation where UI interactivity (Client Components) is fully decoupled from data orchestration.

```tsx
// app/meetings/page.tsx
// This is a Server Component by default. No 'use client' directive.
import { getMeetingClient } from '@/lib/providers/api';
import { MeetingList } from '@/components/MeetingList';

export default async function Page() {
  // 1. Data orchestration stays securely on the server.
  const client = await getMeetingClient();
  const meetings = await client.listMeetings();

  // 2. We pass pure data down to the interactivity leaf.
  //    (The `meetings` prop name is illustrative.)
  return (
    <main>
      <h1>Your Meetings</h1>
      <MeetingList meetings={meetings} />
    </main>
  );
}
```

### Discriminated Unions for Predictable State (Resilience) [#discriminated-unions-for-predictable-state-resilience]

**The TypeScript Idiom:** Use strict discriminated union types instead of optional properties.\
**The Principle Connection:** Rather than letting server-action failures surface as thrown exceptions that crash the UI, we map every server action to a single result type. With a discriminated union, the TypeScript compiler forces the frontend engineer to check which state they are in before reading `data`, which leads to [Resilient Error Handling](/docs/principles/quality/reliability).

```typescript
// 1. The discriminated union explicitly separates the success and failure states.
export type ActionState<T> =
  | { success: true; data: T }
  | { success: false; error: string };

// Illustrative stand-ins so the example type-checks.
type Meeting = { id: string; title: string };
declare function showToast(message: string): void;
declare function renderMeeting(meeting: Meeting): void;

// 2. The UI is forced to check the discriminator before accessing data.
function handleResponse(response: ActionState<Meeting>) {
  if (!response.success) {
    // TS knows 'response' only has an 'error' here.
    showToast(response.error);
    return;
  }
  // TS knows 'response' is guaranteed to have 'data' here.
  renderMeeting(response.data);
}
```

# App Service (Next.js) (/docs/learn/services/app)

# App Service (Next.js) [#app-service-nextjs]

`wordloop-app` is the frontend UI. Built on the **Next.js 16 App Router**, the application prioritizes React Server Components (RSC) to render HTML on the server for a fast initial load, falling back to Client Components only for user interactivity.

## Architecture & Layout [#architecture--layout]

> [!IMPORTANT]
> The project enforces a strictly inward-facing dependency graph. Application routing logic can depend on internal business functions and hooks, but core business functions must never depend on UI primitives.
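
The inward-facing rule above can be checked mechanically. The sketch below is a hypothetical helper, not the project's actual lint setup. Given a file path relative to `services/wordloop-app/` and one of that file's import specifiers, it flags imports from `lib/` into UI code. It makes two assumptions: that the `@/` alias maps to the app root, as in the examples on this page, and that `app/`, `components/`, and `hooks/` count as UI layers.

```typescript
// Top-level directories treated as UI layers for this check. This list is an
// assumption based on the directory tree, not the project's lint config.
const UI_LAYERS = ["app", "components", "hooks"];

// Resolve an import specifier to a path relative to the app root.
// Returns null for package imports such as "react" or "zod".
function resolveSpecifier(fromFile: string, specifier: string): string | null {
  if (specifier.startsWith("@/")) return specifier.slice(2); // assumes "@/" maps to the app root
  if (!specifier.startsWith(".")) return null;
  const parts = fromFile.split("/").slice(0, -1); // directory of the importing file
  for (const segment of specifier.split("/")) {
    if (segment === "..") parts.pop();
    else if (segment !== "." && segment !== "") parts.push(segment);
  }
  return parts.join("/");
}

// True when a file under lib/ imports from a UI layer.
function violatesLibBoundary(fromFile: string, specifier: string): boolean {
  if (!fromFile.startsWith("lib/")) return false; // the rule only constrains lib/
  const target = resolveSpecifier(fromFile, specifier);
  if (target === null) return false;
  const [topLevel = ""] = target.split("/");
  return UI_LAYERS.includes(topLevel);
}
```

In practice, a rule like this is more likely to be enforced by a linter than by hand-written code, for example with ESLint's `no-restricted-imports` rule scoped to `lib/`. The sketch only spells out the logic.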
```text
services/wordloop-app/
├── app/                # Next.js App Router (Pages, Layouts, API Proxies)
├── components/         # UI Elements
│   ├── ui/             # Shadcn/Radix Primitives
│   └── /               # Bet-specific components
├── hooks/              # SWR hooks for client caching
├── lib/                # Core Logic (NO UI DEPENDENCIES ALLOWED)
│   ├── schemas/        # Zod schema definitions
│   ├── api.ts          # Generated HTTP client
│   └── utils.ts        # Pure logic and Tailwind mergers
├── orval.config.ts     # Code generation rules
└── globals.css         # Tailwind v4 configuration
```

## Local Development Workflow [#local-development-workflow]

1. **Start System Infrastructure**

   ```bash
   ./dev start infra core ml
   ```

   *(Boots the databases, the memory layer, and the backend services the app depends on.)*

2. **Start Next.js**

   ```bash
   cd services/wordloop-app
   pnpm run dev
   ```

   Visit [http://localhost:4001](http://localhost:4001) in your browser.

## Development Guidelines [#development-guidelines]

* **Tailwind v4 First:** Token declaration is strictly CSS-first. Please review our [Design Guide](design-guide.mdx) for UI patterns.
* **Strict Boundary Checks:** Always review the [Frontend Architecture Rules](architecture.mdx) before abstracting components or lifting state.

# UX Guide (/docs/learn/services/app/ux-guide)

# UX Design Guide: The Velocity Manifesto [#ux-design-guide-the-velocity-manifesto]

The interface is a transparent, frictionless layer between the user’s thought and their data. The application must feel weightless, preemptive, and immediately responsive. Every interaction is designed to keep the user in a state of uninterrupted flow.

## 1. Core Interaction Pillars [#1-core-interaction-pillars]

The fundamental rule of this application's UX is **velocity**. The UI must never ask permission to be useful or force the user to manage the system.

* **Flow-State Entry (Zero-Click):** Upon rendering any page or view, the primary input must be immediately focused.
The user should be able to begin typing the millisecond the application loads without reaching for a mouse or trackpad. * **Frictionless Inline Editing:** The distinction between "view mode" and "edit mode" is eliminated. Administrative tasks and content creation are the same action. Clicking any text element transforms it into an active input field in-place. Avoid dedicated edit screens or modals. * **Pre-emptive Architecture:** The application must anticipate the user's next action. Upon saving or submitting an entry, the UI must reset to a "Ready" state instantly. * **Optimistic UI:** Never make the user wait for a server response. Use optimistic updates to reflect changes in the UI instantaneously, handling data synchronization silently in the background. ## 2. Navigation & The Command Palette [#2-navigation--the-command-palette] The primary interface should remain sparse, dedicating maximum screen real estate to the user's content. Global navigation and complex actions are abstracted away from static menus. * **The Central "Brain":** The Command Palette is the operational core of the application. * **Keyboard-First Dominance:** Users must be able to navigate between loops, trigger global actions, and modify settings entirely via keyboard shortcuts utilizing the Command Palette. * **Context Switching:** The Command Palette allows users to instantly pivot between distinct tasks without losing their place in the primary interface. ## 3. User Perception & "Liquid Glass" [#3-user-perception--liquid-glass] The mental model for the interface is a continuous, physical sheet of glass that has been layered, etched, or frosted. * **Depth over Distance:** Visual hierarchy is communicated through translucency and backdrop blurring, not aggressive drop shadows. * **The Theme Split:** * *Milk (Light Mode):* Designed to feel luminous and airy, simulating natural light passing through frosted acrylic. 
* *Obsidian (Dark Mode):* Designed to feel deep and ink-like, utilizing high-contrast text against soft, dark depth to focus user attention. * **Spatial Rhythm:** Proximity dictates relationship. Maintain a strict, consistent grid for macro-spacing, but utilize aggressively tighter groupings for related internal elements to build immediate visual associations. ## 4. Interaction, Motion, & State Communication [#4-interaction-motion--state-communication] Movement and state changes must feel organic, physical, and calm. * **Liquid Transitions:** Elements do not snap instantly between states. Use organic easing curves so elements flow smoothly from one layout or state to the next. * **Micro-Friction:** Interactive elements should respond to the user's presence. Use subtle scale shifts on hover or press to make buttons and cards feel malleable and tactile. * **Dynamic Stacking Context:** When overlays (like the Command Palette) are invoked, the background must dynamically blur further. This visually "pushes" the main content deep into the background, reinforcing the physical layering of the glass interface and narrowing the user's focus. * **Calm Feedback Loops:** * *Success:* Use soft, organic Sage Green. It signals completion calmly, avoiding high-tension, vibrating colors. * *Error:* Use a muted Rose. It provides a clear, sophisticated warning signal that remains integrated with the soft glass aesthetic without resorting to harsh "stoplight" reds. * **Progressive Disclosure via Iconography:** Icons function strictly as wayfinders. To keep the interface sparse, action icons should remain hidden or low-opacity until the user hovers over the parent container, revealing functionality exactly when needed. 
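The **Optimistic UI** pillar from section 1 comes down to three steps: apply the change locally, sync it, and restore the previous value if the server rejects it. The sketch below expresses that pattern in Python to keep it language-neutral; `OptimisticStore` and the `sync` callback are hypothetical names, and the sync call runs inline here, whereas a real client would not block on it. In the app itself, SWR's `mutate` (SWR 2.x) offers `optimisticData` and `rollbackOnError` options for the same behavior.

```python
from collections.abc import Callable
from dataclasses import dataclass, field


@dataclass
class OptimisticStore:
    """Hypothetical client-side cache that applies edits before the server confirms them."""

    items: dict[str, str] = field(default_factory=dict)

    def update(self, key: str, value: str, sync: Callable[[str, str], None]) -> bool:
        """Show the edit immediately, then sync it; roll back only this key on failure."""
        previous = self.items.get(key)  # None means the key did not exist before this edit

        # 1. Reflect the change at once so the user never waits on the network.
        self.items[key] = value
        try:
            # 2. Persist the change (called inline in this sketch).
            sync(key, value)
            return True
        except Exception:
            # 3. Restore the value held before this edit so rejected data never lingers.
            if previous is None:
                self.items.pop(key, None)
            else:
                self.items[key] = previous
            return False
```

With `store = OptimisticStore({"title": "Draft"})`, a successful `store.update("title", "Weekly sync", save)` keeps the new title on screen, while a `sync` callback that raises restores the previous title and returns `False`. Rolling back a single key, rather than the whole cache, avoids discarding other in-flight edits.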
# Core Architecture Rules (/docs/learn/services/core/architecture) # Architecture Rules: Core Service (Go) [#architecture-rules-core-service-go] The `wordloop-core` service is the direct physical manifestation of our [Service Architecture Principles](/docs/principles/system-design/hexagonal-architecture). It strictly abides by Clean Architecture (Ports and Adapters) to protect business rules from infrastructure volatility. All domain logic resides in `internal/`; external dependencies point inwards. ## 1. Layers & Dependency Flow [#1-layers--dependency-flow] * **Domain (`internal/core/domain`):** Pure Go. Zero dependencies. Defines entities, validation rules, and sentinel errors. * **Gateways (`internal/core/gateway`):** Interfaces (Contracts) defining external data access. Depends only on Domain. * **Services (`internal/core/service`):** Business logic and orchestration. Depends on Domain and Gateways. * **Providers (`internal/provider`):** Concrete implementations (Postgres, PubSub). Depends on Domain and Gateways. * **Entrypoints (`internal/entrypoints`):** HTTP routes, JWT middleware, OpenAPI mappings. Depends on Domain and Services. ## 2. Context & Dependency Injection [#2-context--dependency-injection] * **Context is King:** `context.Context` MUST be the first parameter of every boundary function. This is critical for OpenTelemetry trace propagation. * **Constructor Injection:** Use `NewService(...)` functions returning concrete types while accepting interfaces. Avoid global singleton state to ensure code remains deterministic and testable. ## 3. Telemetry & Observation [#3-telemetry--observation] * **OpenTelemetry:** Every HTTP endpoint and background job must initialize a Root Span. Provider calls (e.g., executing a SQL query or publishing to Pub/Sub) must extract and cascade the span. * **Logging:** Use `slog` exclusively. Always inject `trace_id` and `span_id` dynamically from the current context. ## 4. 
Authentication & Security [#4-authentication--security]

* **Clerk Identity:** Entrypoints must use the validated Clerk JWT middleware. Do not trust user IDs from requests; extract them from the authenticated context token.
* **Testing:** Provider layer testing requires actual Postgres containers (`testcontainers`). We prioritize absolute database fidelity; thus, we interact directly with real instances rather than mocking external state.

## 5. Migrations [#5-migrations]

* All database schema updates must be written as discrete `.sql` migration files in `scripts/migrations/`.
* Migrations are immutable once applied: correct a mistake with a new migration rather than editing an existing file. Apply them solely through the programmatic runner (`./dev db migrate`).

# Core Implementation Guide (/docs/learn/services/core/implementation)

# Core Implementation Guide (Go) [#core-implementation-guide-go]

This guide translates WordLoop's overarching Engineering Principles into explicit, copy-pasteable Go code for the `wordloop-core` service.

## 1. Concrete Trace-First Development [#1-concrete-trace-first-development]

We rely on OpenTelemetry for all observability. Every inbound request starts a trace, and every outbound request cascades it.

### Initializing a Span [#initializing-a-span]

A new operation must start a span. If extracting from an HTTP Gin context, pass `c.Request.Context()`.

```go
import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

type TranscriptionService struct {
	tracer trace.Tracer
}

func (s *TranscriptionService) Process(ctx context.Context, meetingID string) error {
	// 1. Start the span
	ctx, span := s.tracer.Start(ctx, "TranscriptionService.Process")
	// 2. Guarantee it closes
	defer span.End()

	// 3. Enrich the span with concrete, searchable attributes
	span.SetAttributes(attribute.String("meeting.id", meetingID))

	// ... logic (pass ctx to every downstream call)
	return nil
}
```

### Passing Context [#passing-context]

**Context is King.** Do not store context in structs. Pass it as the first parameter to every single Domain, Service, and Provider function. If you drop the context, you sever the distributed trace.

## 2.
Concrete Error Handling [#2-concrete-error-handling]

We combine Go's `errors.Is` with package-level "Sentinel Errors" to prevent database or HTTP details from leaking into our Domain logic.

### Defining Sentinels [#defining-sentinels]

Define business rule errors in `internal/core/domain/errors.go`:

```go
package domain

import "errors"

var ErrMeetingNotFound = errors.New("meeting not found")
var ErrUnauthorized = errors.New("unauthorized access")
```

### Wrapping & Mapping Errors in Providers [#wrapping--mapping-errors-in-providers]

An Adapter (Provider) catching a third-party or infrastructure error must wrap it into a Domain error before returning it to the Service.

```go
package provider

import (
	"context"
	"database/sql"
	"errors"
	"fmt"

	"wordloop-core/internal/core/domain"
)

func (r *PostgresMeetingStore) GetMeeting(ctx context.Context, id string) (*domain.Meeting, error) {
	var meeting domain.Meeting
	// Illustrative query: the real column list depends on the meetings schema.
	err := r.db.QueryRowContext(ctx,
		"SELECT duration_seconds, status FROM meetings WHERE id = $1", id,
	).Scan(&meeting.DurationSeconds, &meeting.Status)
	if err != nil {
		if errors.Is(err, sql.ErrNoRows) {
			// Map infrastructure error to Domain concept
			return nil, fmt.Errorf("provider execution failed: %w", domain.ErrMeetingNotFound)
		}
		// %v (not %w) keeps the driver error out of the Service's errors.Is chain.
		return nil, fmt.Errorf("unexpected db error: %v", err)
	}
	return &meeting, nil
}
```

## 3. Concrete Dependency Injection [#3-concrete-dependency-injection]

We use interface injection (Ports) to satisfy dependencies.

### The Port (Defined by the Core) [#the-port-defined-by-the-core]

The interface belongs in `internal/core/gateway/` (or beside its consumer in `internal/core/service/`) and is strictly defined using Domain language.

```go
package gateway

import (
	"context"

	"wordloop-core/internal/core/domain"
)

type MeetingStore interface {
	GetMeeting(ctx context.Context, id string) (*domain.Meeting, error)
}
```

### The Wiring (Entrypoint) [#the-wiring-entrypoint]

Constructor injection is used to assemble the pieces at startup without relying on globals.

```go
// 1. Initialize the concrete Provider
dbProvider := provider.NewPostgresMeetingStore(sqlDB)

// 2.
Inject it into the Service (which only knows the Gateway Interface) meetingService := service.NewMeetingService(dbProvider) // 3. Inject the Service into the inbound HTTP route entrypoints.RegisterMeetingRoutes(router, meetingService) ``` ## 4. Idiomatic Go & Standards [#4-idiomatic-go--standards] We do not aim to rewrite foundational guidance on writing excellent Go code. Instead, we adhere to established industry baselines and strictly map them to our internal engineering principles. We expect all Wordloop Core engineers to intimately understand: * [Effective Go](https://go.dev/doc/effective_go) for language fundamentals. * [Uber Go Style Guide](https://github.com/uber-go/guide/blob/master/style.md) for practical, enterprise-grade formatting, concurrency, and pattern consensus. Below is concrete guidance on how overarching Go idioms manifest as system-enforced architectural invariants. ### Accept Interfaces, Return Structs (Clean Architecture) [#accept-interfaces-return-structs-clean-architecture] **The Go Idiom:** "Accept interfaces, return structs."\ **The Principle Connection:** This idiom is the bedrock of [Clean Architecture (Ports and Adapters)](/docs/principles/system-design/hexagonal-architecture). Gateways (Ports) define the interfaces. Services accept those interfaces. Providers return concrete struct representations. ```go // 1. The Gateway (Port) is an interface type Store interface { Get(ctx context.Context, id string) (*domain.Meeting, error) } // 2. The Service accepts the interface func NewService(store Store) *Service { return &Service{store: store} } // 3. The Provider (Adapter) returns the concrete struct type PostgresStore struct { /* ... 
*/ } func NewPostgresStore(db *sql.DB) *PostgresStore { return &PostgresStore{db: db} } ``` ### Goroutines and Context Loss (Trace-First) [#goroutines-and-context-loss-trace-first] **The Go Idiom:** "Don't leave goroutines hanging, and always pass context."\ **The Principle Connection:** We practice [Trace-First Observability](/docs/principles/quality/observability). Executing a background goroutine without passing context severs the OpenTelemetry trace, blinding our dashboards to system behavior. When spawning an asynchronous background task, use `context.WithoutCancel` (introduced in Go 1.21) or extract/inject the trace so the background span remains a child of the request trace—even if the HTTP client disconnects early. ```go func (s *Service) ProcessAsync(ctx context.Context) { // Prevent the goroutine from dying if the HTTP request closes early, // but preserve the Trace Context so the background task is observable. bgCtx := context.WithoutCancel(ctx) go func() { _, span := s.tracer.Start(bgCtx, "ProcessAsync.Background") defer span.End() // Execute asynchronous domain work... }() } ``` ### Immutability in the Domain (Domain Purity) [#immutability-in-the-domain-domain-purity] **The Go Idiom:** Receiver types (Pointer vs Value semantics).\ **The Principle Connection:** Our Domain layer must remain pure and free from unpredictable side effects. When creating methods on Domain entities that calculate or evaluate state **rather than modifying it**, enforce immutability by exclusively using **value receivers**. This ensures core business rules remain deterministic, trivially unit-testable, and free from accidental pointer mutation. ```go package domain // Meeting is our core domain entity. type Meeting struct { DurationSeconds int Status string } // CalculateCost utilizes a value receiver (m Meeting) instead of a pointer (*Meeting). // This guarantees the calculation logic cannot accidentally mutate the Meeting's state. 
func (m Meeting) CalculateCost(rate float64) float64 {
	return float64(m.DurationSeconds) * rate
}
```

# Core Service (Go) (/docs/learn/services/core)

# Core Service (Go) [#core-service-go]

`wordloop-core` is the platform's system of record. It handles all database interactions, transactional domain logic, and asynchronous job orchestration.

> \[!IMPORTANT]
> The Core service exposes a strictly typed REST API via [Huma](https://huma.rocks), which generates the OpenAPI contract from the Go handler types and validates incoming requests against it.

## Architecture & Layout [#architecture--layout]

The project strictly abides by Clean Architecture principles, enforcing strong boundaries between domain logic and side-effects.

```text
services/wordloop-core/
├── cmd/api/main.go        # Entrypoint (DI wiring & server boot)
├── internal/
│   ├── core/domain/       # Entities, Value Objects, Pure Logic
│   ├── core/service/      # Orchestration & Use-Cases
│   ├── entrypoints/       # HTTP Handlers (Huma), Clerk JWT Middleware
│   └── provider/          # Postgres, Pub/Sub, Storage Adapters
└── scripts/migrations/    # SQL up/down migrations
```

## Local Development Workflow [#local-development-workflow]

Run the Go server locally with standard Go tooling and the consolidated `./dev` CLI.

1. **Start Infrastructure Services**

   ```bash
   ./dev start infra
   ```

   *(Boots Postgres, Pub/Sub, Storage Emulators, and the OTel Aspire Dashboard)*

2. **Execute Database Migrations**

   ```bash
   ./dev db migrate
   ```

3. **Start the API Server**

   ```bash
   cd services/wordloop-core
   go run cmd/api/main.go
   ```

## Development Guidelines [#development-guidelines]

> \[!WARNING]\
> Always adhere to the [Core Architecture Rules](architecture.mdx). If your changes expose new HTTP endpoints, you must regenerate the OpenAPI client before committing. Run:

```bash
./dev gen api
```

# Architecture Rules (/docs/learn/services/ml/architecture)

# Architecture Rule: Clean Architecture for WordLoop ML [#architecture-rule-clean-architecture-for-wordloop-ml]

## 1.
Context & Scope [#1-context--scope] This rule physically applies our [Service Architecture Principles](/docs/principles/system-design/hexagonal-architecture) to all Python code within `src/wordloop`. We divide the code structurally based on its behavior: the core business logic remains isolated in `src/wordloop/core/`, inbound traffic is handled by `src/wordloop/entrypoints/`, and all external integrations belong in `src/wordloop/providers/`. ## 2. Architectural Layers (Inward Dependency Flow) [#2-architectural-layers-inward-dependency-flow] **Dependencies must only point INWARD.** Inner layers must never import from outer layers. ### **Domain** (`src/wordloop/core/domain`) [#domain-srcwordloopcoredomain] * **Purpose:** Business entities and core logic using `dataclasses` or `Pydantic`. * **Zero-Dependency Core:** Standard library and Pydantic/Dataclasses only. No I/O. * **Framework Agnostic Definitions:** Models must remain free from database-specific decorators or library-specific types (e.g., avoid `SQLAlchemy` ORM models here). * **Universal Vocabulary:** Define core application exceptions (`src/wordloop/core/exceptions.py`) and constants here. * **Testing:** Pure Unit Tests. Verify state transitions and business rules with zero mocks. ### **Gateways** (`src/wordloop/core/gateways`) [#gateways-srcwordloopcoregateways] * **Purpose:** `typing.Protocol` or `abc.ABC` definitions that define **capabilities**. * **Contractual Masters:** Gateways define *what* (e.g., `store`), never *how*. * **No Leaky Abstractions:** Signatures must use **Domain** entities. Never reference SDK types (e.g., `openai.ChatCompletion`) or transport types. * **The Golden Rule:** Use generic names. `publish(msg: Message)`, not `send_to_sqs(msg: Message)`. ### **Services** (`src/wordloop/core/services`) [#services-srcwordloopcoreservices] * **Purpose:** Use-case orchestration. This is where the application "decides" what happens. 
* **Dependency Injection:** Services depend on **Gateways** (Protocols/ABCs), not concrete Providers. * **Protocol Consumer Rule:** Services should return concrete Domain objects. If a Service is used by an Entrypoint, the **Entrypoint** defines the Protocol/Interface it requires from the Service. * **Transaction Boundaries:** Coordinate workflows by fetching data, applying domain logic, and persisting results. ### **Providers** (`src/wordloop/providers/`) [#providers-srcwordloopproviders] * **Purpose:** Concrete implementations of Gateway interfaces (The "Adapter"). * **Mapping (Domain Alignment):** Translates external SDK responses into **Domain Entities**. * **Error Wrapping:** Catch library-specific errors (e.g., `botocore.exceptions.ClientError`) and raise a corresponding **Core Exception** defined in the Domain layer. * **Testing:** Integration Tests only. Use `testcontainers-python` to verify actual I/O against real instances. ### **Entrypoints** (`src/wordloop/entrypoints/`) [#entrypoints-srcwordloopentrypoints] * **Purpose:** The interaction layer (FastAPI, CLI). * **Boundary Validation:** Entrypoints validate inputs, map request schemas to Domain objects, and call a Service. They delegate all business decisions to the Core. * **Testing:** Use `fastapi.testclient.TestClient` with mocked Services to verify routing and status codes. *** ## 3. Dependency Injection & State [#3-dependency-injection--state] * **Constructor Injection:** Use `__init__` for all dependencies. * **Explicit Lifecycles:** Initialize database clients solely at the entrypoint startup (e.g., `lifespan` in FastAPI) and inject them rather than relying on global singletons. * **Wiring:** All concrete Provider-to-Service wiring happens at the outermost edge (the Entrypoint or a dedicated `container.py`). *** ## 4. 
Integrity & System Testing [#4-integrity--system-testing]

### **Bootstrap Verification (The "Smoke" Test)** [#bootstrap-verification-the-smoke-test]

To ensure the application is wired correctly:

* **Wiring Test:** A test in `tests/system/test_bootstrap.py` that attempts to initialize the full dependency tree.
* **Validation:** Ensures that all required environment variables are present and that the DI container (or manual wiring) doesn't fail on startup.

### **Golden Path System Tests** [#golden-path-system-tests]

* **Location:** `tests/system/`.
* **Strategy:** Run a live instance of the app (e.g., using `uvicorn` in a subprocess or `TestClient` with real providers) against real infrastructure via `testcontainers`.
* **Zero Mocks:** These tests verify the "Golden Thread" from the API route all the way to the database/third-party SDK.
* **Scope:** Focus strictly on high-value success paths.

***

## 5. Core Engineering Standards [#5-core-engineering-standards]

1. **Pydantic Everywhere:** Use Pydantic for all data boundaries (Request/Response and Domain).
2. **Structured Logging:** Use a structured logging library instead of `print` statements, so every log record can be correlated with its trace.
3. **Acyclic Dependencies:** Prevent import cycles; a cycle is an immediate signal that layer responsibilities have leaked.
4. **Strict Boundaries:** Keep FastAPI `Depends`, `Request`, and SQL `Session` objects out of the Service and Domain layers.
5. **Clean Containers:** Always ensure `container.stop()` or similar cleanup is called in `pytest` fixtures to prevent resource leaks in CI.

# ML Implementation Guide (/docs/learn/services/ml/implementation)

# ML Implementation Guide (Python) [#ml-implementation-guide-python]

This guide translates WordLoop's overarching Engineering Principles into explicit, copy-pasteable Python code for the `wordloop-ml` service.

## 1.
Concrete Trace-First Development [#1-concrete-trace-first-development]

We rely on OpenTelemetry for all observability. Because Python requires explicit context propagation in background tasks, we extract and inject the W3C Trace Context (plus W3C Baggage, where it is used) by hand at those boundaries.

### Initializing a Span [#initializing-a-span]

A new operation must start a span. When the FastAPI app is instrumented (for example with `opentelemetry-instrumentation-fastapi`), request spans are created automatically, but background pipeline tasks must start their spans explicitly.

```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)


def process_audio(meeting_id: str) -> None:
    # 1. Start the span
    with tracer.start_as_current_span("ProcessAudio") as span:
        # 2. Enrich the span with concrete attributes
        span.set_attribute("meeting.id", meeting_id)
        # ... processing logic
```

### Passing Context [#passing-context]

When publishing to Pub/Sub or calling another service, you must explicitly inject the current trace context into the HTTP headers or message attributes.

## 2. Concrete Error Handling [#2-concrete-error-handling]

We use explicit Python `Exception` subclasses defined in our pure Domain to prevent external SDK errors from polluting our business logic.

### Defining Sentinels [#defining-sentinels]

Define business rule errors in `src/wordloop/core/exceptions.py`. They should inherit from a base `WordLoopError`:

```python
class WordLoopError(Exception):
    """Base exception for all Wordloop errors."""

    def __init__(self, message: str, cause: Exception | None = None) -> None:
        super().__init__(message)
        self.cause = cause


class ModelInferenceError(WordLoopError):
    """Raised when an AI model fails to return a valid response."""
```

### Wrapping & Mapping Errors in Providers [#wrapping--mapping-errors-in-providers]

An Adapter (Provider) interacting with the AssemblyAI SDK or OpenAI SDK *must* catch the library-specific error and raise a pure Domain exception.
```python
import assemblyai as aai

from wordloop.core.exceptions import ModelInferenceError


class AssemblyAIProvider:
    def transcribe(self, url: str) -> str:
        try:
            transcript = aai.Transcriber().transcribe(url)
            if transcript.error:
                raise ModelInferenceError(f"AssemblyAI failed: {transcript.error}")
            return transcript.text
        except aai.errors.AssemblyAIError as e:
            # Map infrastructure error to Domain concept
            raise ModelInferenceError("AssemblyAI SDK crashed", cause=e) from e
```

## 3. Concrete Dependency Injection [#3-concrete-dependency-injection]

We use Python's `Protocol` from the `typing` module to define Interfaces (Ports).

### The Port (Defined by the Core) [#the-port-defined-by-the-core]

The protocol belongs in `src/wordloop/core/gateways/` and strictly uses Domain language, completely ignorant of AssemblyAI or Postgres.

```python
from typing import Protocol

from wordloop.core.domain.models import TranscriptionResult


class TranscriptionProvider(Protocol):
    def transcribe(self, audio_uri: str) -> TranscriptionResult: ...
```

### The Wiring (Entrypoint) [#the-wiring-entrypoint]

The Service receives its provider through its constructor; FastAPI's `Depends` system resolves that provider for each request and hands it to the factory function.

```python
from fastapi import Depends

from wordloop.core.services import AudioService
from wordloop.providers.assembly import AssemblyAIProvider


# Dependency Injection function
def get_audio_service(
    provider: AssemblyAIProvider = Depends(),
) -> AudioService:
    # The AudioService requires a TranscriptionProvider protocol!
    return AudioService(provider=provider)
```

## 4. Idiomatic Python & Standards [#4-idiomatic-python--standards]

We do not aim to rewrite foundational guidance on writing excellent Python code. Instead, we adhere to established industry baselines and map them to our internal engineering principles. We expect all Wordloop ML engineers to understand:

* [PEP 8](https://peps.python.org/pep-0008/) for code style and layout conventions.
* [Google Python Style Guide](https://google.github.io/styleguide/pyguide) for enterprise-level structure and docstring consensus.

Below is concrete guidance on how overarching Python idioms manifest as system-enforced architectural invariants.

### Strict Typing over Duck Typing (Clean Architecture) [#strict-typing-over-duck-typing-clean-architecture]

**The Python Idiom:** Using strong static typing (`mypy`) instead of traditional dynamic duck-typing.\
**The Principle Connection:** [Clean Architecture (Ports and Adapters)](/docs/principles/system-design/hexagonal-architecture) relies heavily on explicit Contracts/Ports across boundaries. We enforce the use of `typing.Protocol` and strict type hints on all domain models so that `mypy` can verify dependency inversion statically, before any code runs.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TranscriptionRequest:
    audio_url: str
    target_language: str


# 1. A Protocol states the expected shape explicitly; mypy checks every provider against it.
class TranscriptionProvider(Protocol):
    def transcribe(self, request: TranscriptionRequest) -> str: ...
```

### Context Managers for Resource Leaks (Resilience) [#context-managers-for-resource-leaks-resilience]

**The Python Idiom:** Using `with` and `@contextmanager` for resource lifecycle management.\
**The Principle Connection:** We practice robust [Error Handling & Resilience](/docs/principles/quality/reliability). If an ML SDK or file stream throws an exception, failing to clean up memory or connections results in persistent leaks and eventual cluster death.

Always use Context Managers when handling stateful resources. This guarantees the `__exit__` cleanup executes even if your domain logic raises.
```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(audio_bytes: bytes) -> Iterator[str]:
    """Context manager to ensure ephemeral files are always deleted after processing."""
    # mkstemp creates the file securely; the deprecated mktemp only returns a race-prone name.
    fd, temp_path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        yield temp_path
    finally:
        # This cleanup is guaranteed to run, preventing disk exhaustion.
        if os.path.exists(temp_path):
            os.remove(temp_path)


def process(audio_bytes: bytes) -> str:
    # The file safely deletes itself the moment the block exits or throws.
    with temporary_audio_file(audio_bytes) as path:
        return run_inference(path)  # run_inference stands in for the actual model call
```

# ML Service (Python) (/docs/learn/services/ml)

# ML Service (Python) [#ml-service-python]

`wordloop-ml` operates as the platform's stateless asynchronous execution engine. It is responsible for processing audio payloads, interfacing securely with ML APIs (such as AssemblyAI), and normalizing telemetry constraints. The service exposes a synchronous REST interface via FastAPI but primarily executes within a custom worker consuming AsyncAPI Pub/Sub events.

## Architecture & Layout [#architecture--layout]

The Python stack adheres to pure Clean Architecture logic and utilizes modern Python (3.12+).

```text
services/wordloop-ml/
├── src/wordloop/
│   ├── core/domain/       # Pydantic state models (No logic leaks)
│   ├── core/gateways/     # typing.Protocol interface definitions
│   ├── core/services/     # Orchestration workflows
│   ├── entrypoints/       # FastAPI Routes, Pub/Sub Worker Consumers
│   └── providers/         # Concrete external integrations (AssemblyAI, GCP)
├── tests/                 # unit/ and system/
└── pyproject.toml         # `uv` managed dependencies
```

## Local Development Workflow [#local-development-workflow]

Our Python toolchain relies entirely on `uv` for ultra-fast, predictable dependency management and virtual environments.

1. **Start Platform Dependencies**

   ```bash
   ./dev start infra core
   ```

   *(Boots Emulators, Observability dashboard, and the Core Go service)*

2.
**Boot the API Server** ```bash cd services/wordloop-ml uv run wordloop-api ``` 3. **Boot the Async Worker (Pub/Sub)** ```bash cd services/wordloop-ml uv run wordloop-worker ``` ## Development Guidelines [#development-guidelines] * **Pydantic Everywhere:** Use Pydantic models to strictly serialize, deserialize, and validate I/O boundaries. * **Service Identity & Core Interaction:** When writing back to Core, ML must inject the `SERVICE_AUTH_TOKEN` generated via `./dev setup`. Never bypass interface restrictions. Always examine the [ML Architecture Rules](architecture.mdx) before injecting new dependency chains into a workflow. # Data Flow (/docs/work/_template/tdd/data-flow) {/* LLM CONTEXT — DATA FLOW DOC Bet: Purpose: Maps every user action from the UI Design through service boundaries. Services: App (Next.js) | Core (Go) | ML (Python) | [add/remove as needed] Persistent stores: PostgreSQL (Core) | GCS | [add/remove as needed] Protocol inventory: REST | WebSocket | Pub/Sub | [add/remove as needed] */} # Data Flow [#data-flow] Diagrams use **descriptive operation labels** — not endpoint paths, header names, or field names. Those belong in the Contracts doc. If you find yourself writing `POST /meetings/:id/tasks` or `Authorization: Bearer` in a diagram label, move it there. *** ## System Context [#system-context] *The topology of the system — which services exist and how they connect. This is a map, not a sequence. Draw it once, at the top, before any flows.* *Edit this graph to match the actual services in this bet. Every node shown here should appear as a participant in at least one flow below.* *** ## Flows [#flows] *Group flows into logical **Parts** — one Part per major phase of the user journey. Name each flow after what triggers it, not after the implementation.* ***Rule:** Labels describe the operation, never the implementation. "Create task (idempotent, echo-suppressed)" is correct. 
"POST /meetings/:id/tasks" belongs in Contracts.* ### Part 1 — \[Phase Name] [#part-1--phase-name] *What the user is doing and what the system is setting up during this phase.* #### Flow 1: \[Flow Name] [#flow-1-flow-name] *One sentence: what the user does to trigger this, and what state the system reaches when it completes.* *Explain non-obvious sequencing decisions — why async vs sync, why this ordering constraint — in a sentence below the diagram, not as diagram annotations.* *** #### Flow 2: \[Flow Name] [#flow-2-flow-name] *One sentence describing trigger and outcome.* *** ### Part 2 — \[Phase Name] [#part-2--phase-name] *What the system does continuously during this phase, and what the user sees in response.* #### Flow 3: \[Flow Name] [#flow-3-flow-name] *One sentence describing trigger and outcome.* *** ### Part N — Failure Modes [#part-n--failure-modes] *Failure flows are **required**. For every significant service boundary in this bet, there must be at least one flow showing what happens when that boundary fails and how the system recovers. Model failures that would cause data loss or silent breakage — not every possible error.* *If the UI Design doc models a "Degraded" or "Connectivity Lost" state, the corresponding recovery sequence must appear here.* #### Flow N: \[Failure Scenario Name] [#flow-n-failure-scenario-name] *What failure condition triggers this, which boundaries it affects, and the recovery sequence.* *** ## Design Decisions [#design-decisions] *Required. Record decisions that shaped the flows above. If a future engineer would ask "why did you do it this way?", it belongs here. Common categories:* * ***Infrastructure constraints** — what the bet assumes exists (or doesn't). E.g., "sticky session affinity, no pod-to-pod backplane." If the missing infrastructure is significant, extract a separate problem statement for it.* * ***Scope boundaries** — capabilities explicitly deferred to a future version and why. 
Reference the relevant No-Go in the pitch.* * ***Performance / latency choices** — what was optimised and what was traded. E.g., "pre-warm the upstream session on permission grant, not on first data."* * ***Lifecycle / cleanup policies** — what temporary data is created, when it's deleted, and what safety window exists.* * ***Protocol / pattern choices** — why this protocol over alternatives. E.g., "HTTP stream not WebSocket for service-to-service, because..."* | Decision | Alternatives considered | Why this | | ------------------ | ---------------------------- | -------------------------------------------- | | *What was decided* | *What else was on the table* | *The constraint or tradeoff that settled it* | *** ## Boundary Inventory [#boundary-inventory] *Every service-to-service boundary shown in the flows above. This table feeds directly into the Contracts doc — each row here becomes a contract entry.* | Boundary | Flows | From → To | Protocol | Data shape | | -------------------------------- | -------- | ------------------- | --------------------- | ------------------------------------------------------------ | | *Descriptive name for this call* | *Flow N* | *Service → Service* | *REST / WS / Pub/Sub* | *What information crosses: operation, key fields, direction* | # Overview (/docs/work/_template/tdd) # Technical Design Document [#technical-design-document] > **Status**: Draft | Agreed > **Author**: *@handle* > **Date**: *YYYY-MM-DD* ## Success Criteria [#success-criteria] *What does "solved" look like? Define measurable criteria from the user's perspective and the system's perspective. The problem context lives in the [Pitch](../pitch) — don't restate it here.* | Criterion | Measured by | | ---------------------- | ---------------------- | | *User-visible outcome* | *How you'll verify it* | ## Architectural Approach [#architectural-approach] *The approach taken and the key decisions made. What options were considered? Which was chosen and why? 
Not implementation detail — the reasoning. Link to the Design Decisions table in Data Flow for the full rationale.* ## Constraints [#constraints] *Architectural constraints discovered during design, beyond the [no-gos in the Pitch](../pitch). Include links to problem statements for known limitations that are deliberately deferred.* * *Example: \[constraint or deferred limitation]* ## Open Questions [#open-questions] *What is still unknown, who owns the answer, and when it must be resolved. An empty table is a warning sign.* | Question | Owner | Status | | ----------------------- | ------ | -------------------------------- | | *What is the question?* | *@who* | *To verify / Resolved / Blocked* | *** ## Navigation Map [#navigation-map] *Link to each TDD sub-document with a one-line description of what it covers. Helps readers orient quickly.* | Document | What it covers | | ------------------------ | ----------------------------------------------------- | | [UI Design](ui-design) | *Wireframes, screen states, and interaction patterns* | | [Data Flow](data-flow) | *Sequence diagrams and design decisions* | | [Contracts](contracts) | *API shapes for every boundary* | | [Schemas](schemas) | *Database table designs* | | [Milestones](milestones) | *Build plan broken into shippable slices* | *** ## Architecture Scaffolding [#architecture-scaffolding] To maintain structural consistency and immediate integration with the test suite, always use the CLI to generate the remaining TDD components. 
**Add a Milestone**

```bash
./dev new milestone {{BET_SLUG}}
```

**Add a Domain Slice** (Automatically connects to pytest suite)

```bash
./dev new slice {{BET_SLUG}}
```

**Design API Contracts**

```bash
# Scaffold the default contract tree for a service bet:
#   contracts/core/{rest,websocket,pubsub}.mdx
#   contracts/ml/{rest,websocket}.mdx
./dev new contracts {{BET_SLUG}}

# Scaffold one additional boundary when the Boundary Inventory calls for it:
./dev new contract {{BET_SLUG}}
```

Contract docs live under `tdd/contracts//.mdx`. The folder describes the API boundary between services in the ideal end state: REST resources and commands, WebSocket streams and events, Pub/Sub topics and event envelopes. Use the protocol-specific templates to start from the right checklist instead of a blank page.

**Design a Database Schema**

```bash
./dev new schema {{BET_SLUG}}
```

# UI Design (/docs/work/_template/tdd/ui-design)

# UI Design [#ui-design]

*This document describes what the user sees and does — not how the system delivers it. Walk through each screen the bet touches, then map the journeys between them. The goal: enough concrete detail that a data flow, API contracts, and database schema can be designed from this document alone.*

***Scope check:** If you're specifying which service owns the logic, how the frontend integrates, or where data persists — you've gone too far. That belongs in the Data Flow doc.*

***

## 1. Screens [#1-screens]

*For each screen, follow this structure: a brief description, a wireframe, the layout, the states it can be in, and the key interactions. One subsection per screen.*

### Screen Name [#screen-name]

*One sentence: what is the user trying to do on this screen?*

*Include a wireframe — even a rough sketch. The wireframe is the anchor; the text describes it.*

{/* ![Wireframe](/images/bets/bet-slug/screen_name.png) */}

**Layout:** *Describe the layout regions and what content lives in each one.
Be specific about what fields and controls exist.*

* **Region name:** *What's here. If it's an input, say what kind (free text, dropdown, rich text). If it auto-saves, say so. If there's a component being reused, name it.*

**States:**

| State                 | What the user sees                                                              |
| --------------------- | ------------------------------------------------------------------------------- |
| *Loading / skeleton*  | *What's visible while data arrives? Grey blocks? Spinner? Disabled controls?*   |
| *Active / happy path* | *Everything working normally.*                                                  |
| *Empty*               | *No data yet — what does the user see? A prompt? A placeholder?*                |
| *Error / degraded*    | *Something went wrong — what's visible, what still works, how does it recover?* |

*Think about: What does the user see in the first second? The first 10 seconds? After an hour?*

***For any feature that depends on a live connection or external service:** each failure mode gets its own named row — not just "Error / degraded". A live recording screen, for example, needs separate rows for Connecting, Degraded (ML down but audio continues), Connectivity Lost, and Reconnected. If the Data Flow doc will have a failure mode flow for it, this table needs a row for it. The two docs must stay in sync.*

**Key Interactions:** *What can the user do on this screen? For each interaction, describe what happens in response. Stay at the user level — "the task appears in the list" not "the API returns 201".*

* **Interaction name:** *What the user does → what they see in response.*

*Be specific about data objects. If the screen shows tasks, define what a task is: what fields does it have? Which are required? Can they be nested? What states can they be in? Does editing change their classification? Name the component if reusing one.*

***

### Another Screen [#another-screen]

*Repeat the same structure. Include screens for both the primary flow and any secondary views (tabs, modals, expanded states).*

***

## 2. User Journeys [#2-user-journeys]

*Map how the user moves between screens. One journey per major flow. Use simple ASCII diagrams — they're scannable and diffable. Include both the happy path and the key branches (permission denied, error recovery, etc.).*

### Primary Journey [#primary-journey]

```text
[Starting point] → [Screen A]
  │
  ▼
[Decision point] ──failure──→ [Error state / blocking modal]
  │ success
  ▼
[Screen B]
  │
  │ ← what the user does here, what the system shows them
  │
  ▼
[Screen C]
```

### Secondary Journey [#secondary-journey]

```text
[Screen C]
  │
  │ ← review, edit, explore
  │
  ├──→ [Screen D]
  │
  └──→ back to [Screen C]
```

***

## 3. Edge Cases [#3-edge-cases]

*Anything that doesn't fit neatly into a screen's state table. Focus on user-visible behaviour, not system internals.*

*Prompt yourself with these categories — not every bet will hit all of them, but each is worth considering:*

* ***Concurrent access** — same user in multiple tabs, same resource accessed by multiple users*
* ***Session boundaries** — what happens on tab close, browser crash, token expiry, long idle periods?*
* ***Resource limits** — very long sessions, very large data sets, quota exhaustion*
* ***Background/foreground** — what happens when the tab is backgrounded or the device sleeps?*

| Scenario                        | Behaviour                                |
| ------------------------------- | ---------------------------------------- |
| *What goes wrong or is unusual* | *What the user sees and can do about it* |

# Problem Statement (/docs/work/delivered/live-capture/01-problem-statement)

{/* LLM-Context: TL;DR: Problem Statement for the Meeting Recording bet. Problem: No live capture flow; meetings enter only via file upload or manual text entry. Appetite: Large (the most complex bet the platform has run so far). Why now: Live recording is the missing foundation for real-time AI value.
*/} # Problem Statement [#problem-statement] > **Status**: Accepted > > **Author**: Ryan Nel > > **Date**: 2026-04-18 *** ## Observed Problem [#observed-problem] Users need a way to capture meetings — both in-person and virtual — directly from the WordLoop app without relying on third-party recording tools or manual note-taking. Today, meetings can only enter the system via file upload or manual text entry — both of which are post-hoc and require the meeting to have already happened. There is no live capture flow. The core pain points are: 1. **No live capture** — users must remember to take notes or record externally, then import later. 2. **No real-time feedback** — users have no visibility into what's being captured until processing completes. 3. **Post-processing delay** — insights (talking points, tasks, topics) are only available after a batch transcription and synthesis pipeline finishes. *** ## Appetite [#appetite] **Large.** This is the most complex bet the platform has run so far. It introduces a binary audio streaming path, a bidirectional HTTP stream between Core and ML, a new recording lifecycle, speaker identification, post-meeting reprocessing, and audio playback — all coordinated across three services. The appetite is deliberately accepted before scoping begins. If the full scope doesn't fit, we cut scope — we don't extend the appetite. *** ## Why Now [#why-now] Live recording is the missing foundation for real-time AI value. File upload works but it delays every insight. The ML infrastructure (AssemblyAI, speaker diarisation, streaming insights) is already in place — this bet wires it to a live capture path. Without live recording, WordLoop remains a post-hoc tool. With it, it becomes a meeting partner. *** ## Output [#output] Check [Bet Sizing](../../sizing) to confirm the appetite judgment is realistic. Then move to [The Pitch](pitch). 
# Pitch (/docs/work/delivered/live-capture/02-pitch) {/* LLM-Context: TL;DR: Pitch for the Meeting Recording bet. Core sketch: extend the WebSocket connection for binary audio, stream to GCS and ML in parallel, route ML insights back to the client via the same connection, trigger post-meeting reprocessing via Pub/Sub. Key rabbit holes: audio encoding choice, degraded ML handling, speaker ID confidence threshold. */} # The Pitch [#the-pitch] > **Status**: Accepted > > **Author**: Ryan Nel *** ## Problem [#problem] See [Problem Statement](01-problem-statement): no live capture flow — meetings enter only via file upload or manual text entry. Appetite: Large. *** ## Solution Sketch [#solution-sketch] Extend the existing WebSocket connection to carry binary audio frames upstream. Core receives audio and fans out in parallel: stream to GCS for durable storage, and stream to ML for live transcription. ML routes segments and insights back to Core via a persistent HTTP stream — Core broadcasts them to the client via WebSocket and persists them asynchronously. When recording stops, Core publishes a `MeetingSessionTerminated` event. ML drains its AssemblyAI buffer and sends final segments. Core then triggers a post-meeting reprocessing job via Pub/Sub — the same pipeline used for file uploads, with `skip_tasks: true` to preserve tasks captured live. **This bet does not change the data architecture pattern.** It extends Optimistic Mutation with Echo-Suppressed Streaming to cover: * A new upstream path (audio frames) * A new downstream path (transcript segments and ML insights in real time) *** ## Rabbit Holes [#rabbit-holes] **Audio encoding.** The client captures audio in the browser. The ML service expects a specific format (PCM16 or WebM). Encoding decisions affect latency. We keep this simple: the client sends raw WebM chunks; Core forwards them as-is. No transcoding at Core. 
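The pass-through rule above can be illustrated with a minimal Python sketch. It is not Core's implementation (Core is written in Go); the sinks are hypothetical stand-ins for the GCS chunk writer and the ML stream, and numbering chunks from 0 is an assumption. The only format knowledge it relies on is that a WebM stream opens with the EBML header element ID `0x1A45DFA3`, so Core can sanity-check the first chunk without decoding or transcoding anything.

```python
from typing import Callable, Iterable

# Element ID of the EBML header that opens every WebM (Matroska) stream.
EBML_MAGIC = bytes.fromhex("1A45DFA3")

# Hypothetical sink signature: (sequence number, raw chunk bytes) -> None.
Sink = Callable[[int, bytes], None]


def forward_chunk(seq: int, payload: bytes, sinks: Iterable[Sink]) -> None:
    """Forward one browser WebM chunk to every sink byte-for-byte (no transcoding)."""
    # Assumption: sequence numbers start at 0, so chunk 0 must carry the EBML header.
    if seq == 0 and not payload.startswith(EBML_MAGIC):
        raise ValueError("chunk 0 does not start with a WebM EBML header")
    for sink in sinks:
        # Every sink receives the identical bytes; nothing is re-encoded.
        sink(seq, payload)
```

Calling the sinks one after another keeps the sketch simple; the fan-out described in the Solution Sketch runs the GCS and ML paths in parallel.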
**ML degradation.** If AssemblyAI is unavailable mid-recording, we cannot fail the session — the user is speaking. The bet requires graceful degradation: continue storing audio to GCS, show a warning, and recover via post-processing when services come back.

**Speaker identification confidence threshold.** Matching a voice embedding against known profiles requires a threshold. Too low: false matches. Too high: no matches. The threshold must be configurable without a deployment. Start with 0.85 and expose it as a server-side config value.

**Session state.** One active session per user. Core must enforce this: two concurrent recording sessions from the same user are an error, not a queuing scenario.

***

## No-Gos [#no-gos]

* No calendar integration — auto-starting from calendar events is a separate bet
* No multi-user collaborative recording
* No video capture — audio only
* No meeting bot integration (Zoom/Teams/Meet)
* No custom vocabulary or domain-specific tuning; use default AssemblyAI settings

***

## Output [#output]

Pitch is accepted. Move to [Design](design) to map the user journey and define what the UI needs before the API is designed.

# Data Flow (/docs/work/meeting-recording/tdd/data-flow)

{/* LLM-Context: TL;DR: Data flow for the Meeting Recording bet. 20 flows across 3 sections: — Part 1 (Live Session): Start Recording (incl. concurrent session guard), Live Audio→Transcription (always-on OPFS shadow buffer + chunk sequencing), Live Insights Pipeline (3a batched talking points + tasks via single LLM structured output with existing task list for LLM-native deduplication, 3b progressive speaker ID with lock-on), Degraded Mode (5 failure domains incl. GCS failure), Audio Gap Recovery, Audio Silence Detection, Duration Warning & Auto-Stop, Server-Side Inactivity Timeout (5 min default), Background Tab Audio Continuity (Web Workers).
— Part 2 (User Mutations): Notes Auto-Save, User Creates Task, Task Mutations (full CRUD), User Labels Speaker, Create New Person During Speaker Labelling. — Part 3 (Session End & Post-Meeting): Stop Recording (4-phase: drain ML → collect OPFS gaps → compose GCS → publish TranscriptionJob), Batched Gap Upload (50 chunks per multipart request), Post-Meeting Processing, Soft-Deleted Meeting Handling (204 on write-back), Transcription Processing Lifecycle, Audio Playback (signed URL + readiness). Each arrow = contract boundary. All contracts formalised on the Contracts page. */} # Data Flow [#data-flow] This document sits between [UI Design](ui-design) (which defines what the user sees) and [Contracts](contracts) (which formalise the API shapes). For each screen and interaction, it draws what calls what: which service initiates, which responds, what data crosses each boundary. Read each arrow two ways: it is a **contract boundary** (what shape the data takes) and a **sequencing constraint** (downstream cannot build until the upstream contract is published). ## System Context [#system-context] *** ## Part 1: Live Session [#part-1-live-session] Flows that run automatically during an active recording session — audio capture, ML insights, and system resilience. ### Flow 1: Start Recording [#flow-1-start-recording] The user opens the **New Meeting ▾** dropdown and selects **Start Live Recording**. After the browser grants microphone access, the app creates a meeting and initiates the recording session across all three services. **Pre-conditions:** The client checks for an active recording session before enabling the button. If one exists, "Start Live Recording" is disabled with a tooltip. Core enforces this server-side — if `StartRecordingCommand` arrives while a session is already running, it responds with `RecordingErrorEvent (session_conflict)`. 
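The server-side half of this pre-condition can be sketched as a small guard. This is a minimal, single-process Python sketch with hypothetical names; the dictionary returned on conflict is illustrative, and only the event name and the `session_conflict` code come from this page. A Core deployment running several pods would presumably need the check in shared state (for example the database) rather than process memory; that is an assumption, not something specified here.

```python
import threading
from typing import Dict, Optional


class ActiveRecordingGuard:
    """Hypothetical in-memory guard allowing one active recording per user."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}  # user_id -> meeting_id
        self._lock = threading.Lock()      # serialises concurrent StartRecordingCommands

    def start(self, user_id: str, meeting_id: str) -> Optional[dict]:
        """Register a session, or return an error payload if one is already running."""
        with self._lock:
            if user_id in self._active:
                # Illustrative shape; the doc only names the event and its code.
                return {"event": "RecordingErrorEvent", "code": "session_conflict"}
            self._active[user_id] = meeting_id
            return None

    def stop(self, user_id: str) -> None:
        """Release the slot on user stop, auto-stop, or inactivity timeout."""
        with self._lock:
            self._active.pop(user_id, None)
```

Releasing the slot from every stop path matters: if an abandoned session never called `stop`, the user could not start another recording, which is the problem the server-side inactivity timeout (Flow 17) addresses.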
If the browser denies microphone access, the app shows a blocking modal with a link to browser audio settings — no data flow occurs.

### Flow 2: Live Audio → Transcription (Lowest Latency Path) [#flow-2-live-audio--transcription-lowest-latency-path]

Audio flows from the browser microphone through Core and ML to AssemblyAI. Transcript segments return via the **ML WebSocket** — the same long-lived connection Core uses to send audio. Audio chunks flow upstream as binary frames; segments and insights flow downstream as CloudEvents text frames.

**OPFS shadow buffer:** Every audio chunk is simultaneously written to an always-on shadow buffer maintained by a dedicated Web Worker using the Origin Private File System (OPFS) `createSyncAccessHandle()` API, which browsers expose only inside dedicated workers. Each chunk carries a monotonically incrementing sequence number assigned in the browser. This buffer runs unconditionally — it captures audio regardless of Core or GCS connectivity. It is cleared only after Core confirms all chunks are safely in GCS (see Flow 16 and Flow 9).

**Chunk-based GCS writes:** Instead of a single streaming write to one file, each audio chunk is stored as a separate GCS object keyed by sequence number: `meetings/{id}/chunks/{seq:08d}.webm`. Only the first chunk carries the WebM EBML header; subsequent chunks contain raw Cluster data. This structure enables gap recovery — any chunk missed due to a connectivity failure can be backfilled from OPFS by sequence number. At session end, Core composes the chunk objects into the final `audio.webm` (see Flow 9).

Core streams segments directly to the client via WebSocket for minimum latency, and persists them to the database asynchronously in the background. The app distinguishes **interim** segments from **final** segments and replaces each interim segment in place when its final version arrives, so there is no layout shift.
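The chunk key layout and the session-end compose can be sketched together in Python. The sketch only plans compose requests and does not call the GCS API; it assumes GCS's limit of 32 source objects per compose request, which is what forces the hierarchical grouping. The function names and the `compose/levelN-…` naming for intermediate objects are hypothetical.

```python
from typing import List, Tuple

GCS_COMPOSE_LIMIT = 32  # max source objects GCS accepts in one compose request


def chunk_object_name(meeting_id: str, seq: int) -> str:
    """GCS object name for one audio chunk, following the documented key pattern."""
    # Zero-padding to 8 digits makes lexicographic order match numeric order.
    return f"meetings/{meeting_id}/chunks/{seq:08d}.webm"


def plan_compose(meeting_id: str, sources: List[str],
                 limit: int = GCS_COMPOSE_LIMIT) -> List[Tuple[str, List[str]]]:
    """Plan ordered (destination, sources) compose steps that end in audio.webm.

    Sources are grouped in order into batches of at most `limit`. Each batch is
    composed into an intermediate object, and intermediates are composed again
    until a single request can produce the final file.
    """
    if not sources:
        raise ValueError("nothing to compose")
    steps: List[Tuple[str, List[str]]] = []
    current = list(sources)
    level = 0
    while len(current) > limit:
        next_level: List[str] = []
        for i in range(0, len(current), limit):
            # Hypothetical naming for intermediate composite objects.
            dest = f"meetings/{meeting_id}/compose/level{level}-{i // limit:05d}.webm"
            steps.append((dest, current[i:i + limit]))
            next_level.append(dest)
        current = next_level
        level += 1
    steps.append((f"meetings/{meeting_id}/audio.webm", current))
    return steps
```

For 1,000 chunks this plans 32 intermediate composes followed by one final compose of those 32 intermediates. Because groups are taken in key order, the byte order of the final object matches the sequence order of the chunks.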
### Flow 3: Live Insights Pipeline [#flow-3-live-insights-pipeline] Talking points and tasks are extracted by the same LLM query, batched together as a single structured output call. This avoids redundant token spending — the transcript context is loaded once into the prompt cache and both extraction tasks run against it. All insights stream back through the ML WebSocket as CloudEvents text frames, following the dual-write pattern: Core fans out to the browser via WebSocket for latency, and persists to DB asynchronously for durability. **Context management:** ML maintains a rolling transcript buffer in memory, appending each finalised segment as it arrives. The full buffer is included in the LLM prompt as cached context. The prompt is always ordered as: `[system instructions] [schema] [anchor segments] [recent window] [latest segment]`. When the context budget is exceeded, segments are dropped from the **oldest non-anchored position** — the boundary between the anchor and the recent window — never from the front. Removing from the front would change the content immediately after the static cached prefix, invalidating every transcript token in the cache. Because the anchor only ever grows, the cached region (`[system] + [schema] + [anchor]`) expands over the course of the meeting and is never invalidated by trimming. #### Flow 3a: Live Talking Points & Tasks (Batched — Per Finalised Segment) [#flow-3a-live-talking-points--tasks-batched--per-finalised-segment] On every finalised transcript segment, ML sends the full rolling transcript buffer to the LLM requesting both the latest talking point and any new tasks in a single structured output call. The LLM returns both in one response. Talking points update immediately, and tasks are extracted opportunistically from the same call. **LLM-native task deduplication:** The current list of extracted tasks is appended to the dynamic suffix of each prompt. 
The LLM is instructed to return only tasks that are genuinely new — not already represented in the existing list. This delegates deduplication to the model, which handles paraphrase and semantic overlap naturally without a separate post-processing step. The prompt is structured for OpenAI prompt caching: `[system instructions] [output schema] [anchor segments]` forms the stable cached prefix that grows as the session progresses. `[recent window] [latest segment] [existing task list]` is the dynamic suffix appended on each call. Placing the task list in the dynamic suffix (not the cached prefix) keeps the cache hit rate high — the stable cached region is never invalidated by task accumulation. #### Flow 3b: Live Speaker Identification (Per Diarised Speaker) [#flow-3b-live-speaker-identification-per-diarised-speaker] AssemblyAI's transcript segments arrive pre-diarised — each segment carries a `speaker_label` (e.g. `speaker_1`, `speaker_2`). ML's job is to resolve each `speaker_label` to a known Person by matching voice embeddings against enrolled profiles. **Every segment gets a voice embedding.** Regardless of whether the speaker has been identified, ML extracts a voice embedding from the segment's audio and stores it on the segment. This happens unconditionally — embeddings are required for post-meeting RAG and future retrieval, not only for speaker matching. **Matching strategy:** Speaker matching runs separately, gated on the per-session map `speaker_label → { status, person_id?, attempts }`. This map lives in ML's memory for the hot path but is mirrored to a `meeting_speaker_states` table in Postgres on meaningful transitions. On session start — and on reconnect after a pod restart — Core pushes the current speaker states and voice profiles to ML via `StreamStartEvent`, so ML reconstructs its in-memory map without needing a pull endpoint. | State | Behaviour | Persisted? 
| | ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------ | | `unmatched` | Compare this segment's embedding against all enrolled voice profiles. If confidence exceeds the match threshold → transition to `matched`. Otherwise, increment `attempts` and retry on the next segment from this speaker. | Attempts tracked in-memory only — an unmatched speaker restarting at 0 on recovery is acceptable. | | `matched` | The speaker label is locked to a person. All future segments from this speaker are tagged immediately — no further voice comparison needed. | Yes — persisted to `meeting_speaker_states` (status + person reference) on transition. | | `exhausted` | After N failed attempts (configurable, e.g. 5 segments), stop comparing for this speaker. The raw `speaker_label` is preserved. The user can manually resolve it via Flow 7 (speaker labelling). | Yes — persisted to `meeting_speaker_states` on transition. | | `manual` | Set by Flow 7 when the user labels a speaker. Takes precedence over voice matching — ML will not attempt to match this speaker regardless of voice similarity. | Yes — written synchronously by Core (Flows 7/8) so it is immediately visible on any subsequent pod recovery. | *** ## Part 2: User Mutations [#part-2-user-mutations] Flows initiated by the user during or after a recording session. All follow the **Optimistic Mutation with Echo-Suppressed Streaming** pattern: the client updates local state immediately, sends the mutation via REST, and suppresses the returning WebSocket echo. ### Flow 4: Notes Auto-Save [#flow-4-notes-auto-save] The **Private Notes** scratchpad is the primary surface of the live recording view — it occupies the left column. 
Notes auto-save continuously with no explicit save button. The app debounces keystrokes and patches the meeting's notes field. ### Flow 5: User Creates Task [#flow-5-user-creates-task] The user's task is written via REST (not the streaming path) since it's a user-initiated mutation. Tasks have a description (required), assignee (optional), and due date (optional). ### Flow 6: Task Mutations (Full CRUD) [#flow-6-task-mutations-full-crud] Flow 5 covers task creation. The UI design specifies a full set of task mutations: edit, delete, toggle completion, nest under other tasks, assign a person, and set a due date. Editing a system-generated task converts it to user-owned. ### Flow 7: User Labels Speaker as Person [#flow-7-user-labels-speaker-as-person] When a user identifies "Speaker A" as a known Person (by clicking the speaker label on any transcript segment), the system reassigns all segments from that speaker and records the mapping in `meeting_speaker_states` as a manual override so that ML respects it immediately on any pod recovery. Voice profile enrichment from the session's embeddings happens during post-meeting processing, not here. ### Flow 8: Create New Person During Speaker Labelling [#flow-8-create-new-person-during-speaker-labelling] Flow 7 assumes the person already exists. The UI design says the user can "reassign to a known person or **add a new one**." When creating a new person, the UI handles this as two sequential operations: first create the person, then assign them to the speaker label using the same endpoint as Flow 7. The speaker-labels endpoint always receives an existing person reference — it has no knowledge of whether that person was just created or long-established. Voice profile enrichment happens during post-meeting processing. 
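Flows 7 and 8 write a `manual` state that ML must respect, which only makes sense against the per-speaker state machine from Flow 3b. The Python sketch below shows how voice matching, exhaustion and the manual override interact. It assumes the embedding comparison has already produced the best-scoring candidate for the segment (the similarity computation is omitted), uses the pitch's starting threshold of 0.85 and the table's example limit of 5 attempts, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

MATCH_THRESHOLD = 0.85  # pitch's starting value; configurable server-side
MAX_ATTEMPTS = 5        # "e.g. 5 segments" from the Flow 3b table


@dataclass
class SpeakerState:
    status: str = "unmatched"  # unmatched | matched | exhausted | manual
    person_id: Optional[str] = None
    attempts: int = 0          # kept in memory only


def on_segment(state: SpeakerState, best_person_id: Optional[str], best_score: float,
               threshold: float = MATCH_THRESHOLD, max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Apply one segment's best voice match; return True when the status changed.

    A True result marks a transition that would be persisted to
    meeting_speaker_states. Only `unmatched` speakers are compared.
    """
    if state.status != "unmatched":
        return False  # matched, exhausted and manual speakers skip voice comparison
    if best_person_id is not None and best_score > threshold:
        state.status, state.person_id = "matched", best_person_id
        return True
    state.attempts += 1
    if state.attempts >= max_attempts:
        state.status = "exhausted"  # raw speaker_label is kept for manual labelling
        return True
    return False


def label_manually(state: SpeakerState, person_id: str) -> None:
    """Flows 7/8: a user label overrides any voice-matching state."""
    state.status, state.person_id = "manual", person_id
```

Because `manual` is never `unmatched`, a later segment with a high similarity score cannot overwrite the user's label, which is the precedence rule the Flow 3b table describes.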
***

## Part 3: Session End & Post-Meeting [#part-3-session-end--post-meeting]

Flows triggered when a recording stops (user-initiated or automatic) and the subsequent background processing that upgrades all artefacts to final quality.

### Flow 9: Stop Recording [#flow-9-stop-recording]

The user presses **Stop Recording** (or the system auto-stops at the duration limit). The stop sequence is strictly ordered: drain ML first (to flush final transcript segments), then collect any remaining OPFS gaps, then compose the final audio file, then trigger post-meeting processing. This ordering ensures no audio is lost and the composed file includes all chunks.

### Flow 10: Duration Warning & Auto-Stop [#flow-10-duration-warning--auto-stop]

A configurable maximum recording duration (default: 4 hours) is enforced server-side. Core sends a warning at T-10 minutes and auto-stops at the limit. The auto-stop triggers the same post-meeting pipeline as a user-initiated stop.

### Flow 11: Post-Meeting Processing (Automatic, via Pub/Sub) [#flow-11-post-meeting-processing-automatic-via-pubsub]

Post-meeting processing runs automatically via the shared `TranscriptionJob` Pub/Sub worker. For live recordings, the job is published with `skip_tasks: true` so that the tasks captured during the session are preserved. The worker:

1. Batch-transcribes the full audio from GCS (higher accuracy)
2. Replaces transcript segments with the improved results
3. Generates headline, summary, topics, and finalises talking points
4. Extracts tasks (file upload flow only)

The Meeting Summary page shows a subtle progress indicator during re-processing and updates each artefact in real time as it completes.

### Flow 12: Transcription Processing Lifecycle [#flow-12-transcription-processing-lifecycle]

The `transcriptions` table tracks processing status through a defined state machine: `pending` → `transcribing` → `synthesizing` → `completed` (or `failed`).
The client uses this to show re-processing progress on the Meeting Summary page. ### Flow 13: Audio Playback (Signed URL Direct to GCS) [#flow-13-audio-playback-signed-url-direct-to-gcs] Core generates a short-lived signed URL. The client streams audio directly from Cloud Storage, with standard HTTP range requests for seeking. The audio player appears on the **Transcript tab** of the Meeting Summary page. If the audio file is still being processed, the endpoint returns `404` and the client retries with exponential backoff. ### Flow 14: Degraded Mode — Layered Resilience [#flow-14-degraded-mode--layered-resilience] The system has five independent failure domains. Each degrades gracefully — the OPFS shadow buffer ensures audio is never lost regardless of what fails on the backend. Core detects failures and notifies the client via `RecordingErrorEvent` with a specific error code. Recovery is automatic: when the broken link restores, Core sends a recovery signal and the client clears the warning. Gaps in GCS are filled via the gap upload sequence (Flow 16). 
| Failure                    | What breaks                                      | What still works                                                  | Error code            |
| -------------------------- | ------------------------------------------------ | ----------------------------------------------------------------- | --------------------- |
| App → Core (WS drops)      | All commands, events, audio streaming to Core    | OPFS shadow buffer captures all audio locally                     | Client-side `onclose` |
| Core → GCS (storage fails) | Chunk writes — audio gap accumulates in GCS      | Audio still flows via WS to Core; OPFS captures all audio locally | `storage_unavailable` |
| Core → ML (stream fails)   | Transcription, talking points, tasks, speaker ID | Audio→GCS (or OPFS on WS drop), notes auto-save                   | `ml_unavailable`      |
| ML → AssemblyAI            | Transcript segments                              | Audio→GCS, notes, voice embeddings                                | `transcoder_error`    |
| ML → OpenAI                | Talking points, task extraction, summaries       | Transcript, speaker ID, audio→GCS                                 | `insight_warning`     |

The diagram above shows the `ml_unavailable` path in detail. The remaining failure domains follow the same notification pattern:

**App → Core WS drop:** The browser's WebSocket `onclose` event fires. The OPFS shadow buffer captures all audio produced during the outage. On reconnect, the client sends `ResumeRecordingCommand` and Flow 16 backfills any missing GCS chunks before audio forwarding to ML resumes.

**Core → GCS failure:** Chunk writes fail — Core sends `RecordingErrorEvent (storage_unavailable)`. Audio continues flowing through the WebSocket; the OPFS shadow buffer captures the gap locally. On GCS recovery, Core sends `RecordingErrorEvent (storage_recovered, last_stored_seq)` and gap chunks are uploaded via Flow 16.

**ML → AssemblyAI failure:** Transcript segments stop arriving. Core sends `RecordingErrorEvent (transcoder_error)`. Voice embeddings are unaffected. Audio and notes continue. Missing transcript is rebuilt during post-meeting processing from the full audio in GCS.

**ML → OpenAI failure:** Talking points and task extraction stop.
Core sends `RecordingErrorEvent (insight_warning)`. Transcript and speaker ID are unaffected. Missing insights are rebuilt during post-meeting processing. ### Flow 15: Audio Silence Detection [#flow-15-audio-silence-detection] Two layers detect audio problems: the **browser** catches microphone issues locally, and **Core** catches broken streams server-side. **Client-side (primary):** The browser monitors the `MediaStream` via Web Audio API `AnalyserNode`. If the RMS level falls below a threshold for 10 consecutive seconds, the client shows an inline notice. No server round-trip needed — this is purely a UX signal. Clears automatically when audio levels recover or the first transcript segment arrives. **Server-side (secondary):** Core tracks time since the last audio chunk was received on the WebSocket. If no chunks arrive for 10 seconds while a session is active, Core sends a `RecordingErrorEvent`. This catches the case where the browser believes it's sending audio but the WebSocket stream is silently broken. ### Flow 16: Audio Gap Recovery [#flow-16-audio-gap-recovery] When a connectivity gap occurs — either the App→Core WebSocket drops or Core cannot write to GCS — the OPFS shadow buffer accumulates all audio produced during the outage. On recovery, the client compares its OPFS buffer against the last sequence number Core successfully stored in GCS, then uploads any missing chunks via REST. Core writes each to GCS by sequence number and deduplicates — chunks already stored are skipped. This same flow runs at session stop time if any gaps remain (see Flow 9). ### Flow 17: Server-Side Inactivity Timeout — **New** [#flow-17-server-side-inactivity-timeout--new] If no audio chunks arrive for a configurable period (default: 5 minutes) while a recording session is active, Core treats the session as abandoned and triggers the same stop sequence as Flow 9. 
This covers the case where the user closes their laptop lid, loses power, or otherwise disappears without explicitly stopping — the WebSocket heartbeat timeout (\~60 seconds) transitions the connection to closed, but the recording resource would remain in `active` state indefinitely without this secondary timeout. The inactivity timeout prevents abandoned sessions from blocking the concurrent-session guard.

### Flow 18: Background Tab Audio Continuity — **New** [#flow-18-background-tab-audio-continuity--new]

Chrome and other browsers aggressively throttle background tabs: JavaScript timers can be throttled to as little as one execution per minute, and some WebSocket activity may be delayed. However, `MediaRecorder` itself runs on a browser-internal thread and is **not** throttled when the tab is backgrounded.

The critical design choice: all audio chunk processing (sequence numbering, OPFS writes, and WebSocket sends) runs in a dedicated **Web Worker**, which is exempt from background tab throttling. The main thread only receives notifications for UI updates. This means audio capture and transmission continue uninterrupted when the user switches to another tab. The page title changes to "● Recording…" so the user can find the tab.

### Flow 19: Batched Gap Upload — **New** [#flow-19-batched-gap-upload--new]

When a large connectivity gap occurs (e.g., 30 minutes offline produces \~18,000 chunks), the client uploads gap chunks in batches rather than one at a time. The client reads chunks from OPFS in batches of 50, uploads each batch as a single multipart request to `POST /meetings/{id}/recording/chunks`, and uses the `remaining_missing_sequences` in the response to drive the next batch. A determinate progress indicator shows upload progress on the Meeting Summary page. If the browser closes mid-upload, the upload resumes from where it left off on the next page load (OPFS retains all chunks until `AudioChunkStoredEvent` confirms GCS receipt).
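The response-driven loop in Flow 19 can be sketched in Python. Here `post_batch` is a hypothetical stand-in for the multipart `POST /meetings/{id}/recording/chunks` request (reading chunk bytes from OPFS is omitted) and is assumed to return the server's `remaining_missing_sequences`. The batch size of 50 comes from this flow; the no-progress check is an added safeguard, not something this page specifies.

```python
from typing import Callable, Dict, List

BATCH_SIZE = 50  # chunks per multipart request (Flow 19)

# Hypothetical request signature: batch of sequence numbers -> parsed JSON response.
PostBatch = Callable[[List[int]], Dict[str, List[int]]]


def upload_gap(missing: List[int], post_batch: PostBatch,
               batch_size: int = BATCH_SIZE) -> int:
    """Upload missing chunk sequences in batches; return the number of requests sent."""
    remaining = sorted(set(missing))
    requests = 0
    while remaining:
        batch = remaining[:batch_size]
        response = post_batch(batch)  # stand-in for the multipart POST
        requests += 1
        # The server's view of what is still missing drives the next batch.
        next_remaining = sorted(set(response["remaining_missing_sequences"]))
        if len(next_remaining) >= len(remaining):
            raise RuntimeError("no progress on gap upload; retry later")
        remaining = next_remaining
    return requests
```

With the flow's own numbers, 18,000 missing chunks drain in 18,000 / 50 = 360 requests. Because each iteration trusts the server's list rather than local bookkeeping, restarting the loop after a browser crash simply resumes from whatever is still missing.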
### Flow 20: Soft-Deleted Meeting During Post-Processing — **New** [#flow-20-soft-deleted-meeting-during-post-processing--new]

Meetings are soft-deleted (flagged with `deleted_at`, not removed from the database). If a user soft-deletes a meeting while post-meeting processing is running, ML's write-back calls to Core REST will hit a soft-deleted resource. Core handles this gracefully: the write-back endpoints (`PUT /transcriptions/{id}/segments`, `PUT /meetings/{id}/synthesis`, `PATCH /transcriptions/{id}/status`) check `deleted_at` and return `204 No Content` if the meeting is soft-deleted. ML treats this as success and does not retry. Post-meeting processing completes silently: artefacts are written to the soft-deleted meeting's rows in the database but are never visible to the user. This avoids 404 errors, unnecessary retries, and DLQ noise.

***

## Design Decisions [#design-decisions]

Key architectural choices and their rationale. These are the "why" behind the flows above.

| Decision | Rationale |
| --- | --- |
| **Dual-write (WS + async DB)** | Stream to the client via WebSocket for minimum latency (\~200ms). Persist to the DB asynchronously so a DB hiccup doesn't block the live experience. |
| **Echo-suppressed optimistic mutations** | Client updates local state immediately (optimistic), sends via REST, then suppresses the returning WebSocket `EntityChanged` event using a session identifier. Gives instant UI feedback without double-rendering. |
| **WebSocket for Core↔ML** | Audio flows upstream as binary frames and insights flow downstream as CloudEvents text frames on the same long-lived WebSocket. Supports bidirectional control events (DrainCommand, BackpressureEvent), replay cursors for reconnection, and speaker state push — capabilities that would require a separate control channel with HTTP streaming. Core acts as a protocol bridge: browser-facing WebSocket on the client side, service-to-service WebSocket on the ML side. |
| **OPFS always-on shadow buffer** | Every audio chunk is written to the browser's Origin Private File System via `createSyncAccessHandle()` in a dedicated Web Worker before (or instead of) being sent to Core. The buffer runs unconditionally: it captures audio regardless of Core or GCS connectivity. This separates audio capture (which must never fail) from transport (which can be retried). The buffer is cleared only after Core confirms all chunks are safely in GCS. |
| **Chunk-based GCS writes + hierarchical compose** | Each audio chunk is stored as a separate GCS object keyed by sequence number (`meetings/{id}/chunks/{seq:08d}.webm`) rather than as a single streaming upload. This enables gap recovery: any chunk missed during a connectivity failure can be backfilled by sequence number. At session end, Core composes the chunk objects into the final `audio.webm` using GCS Compose, hierarchically in groups of ≤32 for recordings that exceed GCS's 32-object compose limit. |
| **GCS as the indestructible recording** | Audio always reaches GCS eventually, even across connectivity failures, because the OPFS shadow buffer guarantees local capture. Everything else (transcript, insights, tasks) can be rebuilt from the audio during post-meeting processing. |
| **Task preservation for live recordings** | Post-meeting re-processing replaces transcript segments and regenerates synthesis, but must not regenerate tasks. Users create and edit tasks during the live session; clobbering them would destroy user work. The Pub/Sub job carries a flag to skip task extraction when triggered from a live recording. |
| **Transcription status state machine** | `pending → transcribing → synthesizing → completed` gives the client granular progress without polling. Each transition fires an `EntityChanged` event. Fewer states reduce complexity while still distinguishing the two user-visible phases: transcript generation and insight synthesis. |
| **Signed URL with client-side rotation** | Client streams audio directly from GCS (no Core proxy). Signed URLs expire after 1 hour. Client sets a timer to refresh before expiry for seamless playback. |
| **Dual-layer silence detection** | Client-side `AnalyserNode` catches mic issues locally, with no server round-trip. Server-side chunk timeout catches broken streams the client can't detect. Neither alone covers both cases. |
| **Concurrent session as pre-condition (not a separate flow)** | The check is a guard on Flow 1, not an independent workflow. Client checks on page load; Core enforces atomically before starting a session. |
| **Batched LLM query (talking points + tasks)** | A single structured output call extracts both talking points and tasks from the same prompt. The rolling transcript buffer is loaded once into the prompt cache; adding a second extraction task to the same query costs almost nothing compared with a separate call. |
| **Rolling transcript buffer with prompt caching** | ML appends each finalised segment to an in-memory buffer. Each call to the LLM sends `[system instructions] [schema] [anchor] [recent window] [latest segment] [existing task list]`. OpenAI caches from the prompt start, so the cached region (`[system] + [schema] + [anchor]`) grows as the anchor grows. When the context budget is exceeded, segments are dropped from the **oldest non-anchored position** (between anchor and recent window), never from the front. Dropping from the front would change the content right after the static prefix, invalidating the entire transcript cache. Dropping from the middle preserves the cached prefix and keeps the most recent context intact. The existing task list in the dynamic suffix lets the LLM deduplicate naturally: it returns only tasks not already in the list, eliminating the need for a separate semantic similarity step. |
| **Speaker state externalised to `meeting_speaker_states`** | The in-memory `speaker_label → state` map is mirrored to Postgres on meaningful transitions (`matched`, `exhausted`, `manual`). Core pushes current speaker states and voice profiles to ML on every session start and WebSocket reconnect (via `StreamStartEvent`), so ML reconstructs its map without a pull endpoint. Attempts are tracked in-memory only; an unmatched speaker restarting at 0 on recovery is acceptable since it retries a bounded number of times before exhausting again. Manual overrides written by Core (Flows 7/8) are immediately visible on any reconnect, so user resolutions are never lost or re-overridden by voice matching. |
| **Progressive speaker matching with lock-on** | ML tries to match each diarised speaker to an enrolled voice profile. Once a confident match is found, the speaker label is locked: no further voice comparison is done for that speaker. Unknown speakers fail fast after a bounded number of attempts. Manual state (set by user labelling) takes precedence and cannot be overridden by voice matching. Voice profile enrichment from session embeddings is deferred to post-meeting processing. |
| **Sticky session affinity (not a backplane)** | Load balancer routes all WebSocket frames for a session to the same Core pod. No pod-to-pod event routing exists today. This is a known scaling constraint documented separately as a problem statement. |
| **Session not resumable after tab close (v1)** | OPFS data persists beyond tab close, but the recording session does not. If the user closes the tab, the session ends and post-meeting processing runs on whatever audio reached GCS. Session resume is a future enhancement, captured as a separate problem statement. |
| **WebSocket heartbeat (30s ping/pong)** | Detects zombie connections in about a minute rather than waiting for a TCP timeout (minutes). Two missed pongs trigger the client-side OPFS-bridges-the-gap path (Flow 16). |
| **Chunk integrity via CRC32** | Each chunk carries a CRC32 checksum on the hot path (\~10 chunks/sec). CRC32 is sufficient for detecting transmission corruption at this frequency. Core verifies on receipt; corrupted chunks are re-requested from OPFS during gap recovery. OPFS stores each chunk with a CRC32 integrity envelope so corruption can be detected on read. Gap recovery uploads via REST use the same CRC32. |
| **Pre-warm AssemblyAI on mic permission** | The upstream streaming session opens when the browser grants mic access, not when the first audio chunk arrives. Saves \~500–800ms on first-segment latency. |
| **Sequential post-meeting pipeline** | Post-meeting processing is always sequential: batch transcription first (replaces live segments with higher-accuracy results), then synthesis (headline, summary, topics, talking points). Synthesis depends on the final transcript, so the stages cannot be parallelised. Task extraction is skipped for live recordings (tasks were captured during the session). Each stage updates the transcription status, giving the client granular progress. |
| **Talking-point cadence: per finalised segment** | An LLM call fires on every finalised transcript segment. The rolling transcript buffer is already loaded in the prompt cache, so the marginal cost of each call is low (dynamic suffix only). This gives the fastest possible insight updates: the user sees talking points and tasks within seconds of speech. If cost becomes a concern at scale, the cadence can be relaxed to a batched window without changing the contract. |
| **GCS chunk lifecycle: 24h TTL after compose** | Once `audio.webm` is composed, chunk objects (`chunks/{seq:08d}.webm`) are no longer needed. A GCS lifecycle rule deletes them 24 hours after composition. The delay provides a safety window for debugging or re-composition. |

***

## Boundary Inventory [#boundary-inventory]

Every boundary shown in the diagrams above. Each becomes a contract on the [Contracts](contracts) page.

| Boundary | Flows | From → To | Protocol | Data shape |
| --- | --- | --- | --- | --- |
| Meeting CRUD | 1, 4 | App → Core | REST | Create meeting (live recording); patch meeting notes (echo-suppressed) |
| Recording commands | 1, 9 | App → Core | WebSocket | `StartRecordingCommand`, `StopRecordingCommand` |
| Audio streaming | 2 | App → Core → ML | WS (binary) → ML WS (binary) | Raw audio chunks (sequence-numbered; Core enriches with `ml_session_id`) |
| Live insights | 3a, 3b | ML → Core → App | ML WS (CloudEvents) → Browser WS | Talking points, tasks, embeddings, speaker matches, speaker exhausted |
| Task CRUD | 5, 6 | App → Core | REST | Task create/update/delete (idempotent create, echo-suppressed; cascading sub-task nesting) |
| Person creation | 8 | App → Core | REST | Create person |
| Speaker labels | 7, 8 | App → Core | REST | Speaker-to-person assignment, meeting-scoped (always references an existing person) |
| Notes auto-save | 4 | App → Core | REST | Meeting notes patch (debounced, echo-suppressed) |
| OPFS gap upload | 9, 16 | App → Core | REST | Sequence-numbered audio chunks from OPFS shadow buffer; Core deduplicates by sequence number |
| Degraded mode | 14, 16 | Core → App | WebSocket | `RecordingErrorEvent` (error code variants: `ml_unavailable`, `ml_recovered`, `storage_unavailable`, `storage_recovered`, `insight_warning`, `transcoder_error`, `no_audio_detected`, `session_conflict`) |
| Duration warning | 10 | Core → App | WebSocket | `RecordingDurationWarningEvent` |
| Concurrent session check | 1 | App → Core | REST | Active session read (read-only guard, no mutation) |
| Transcription status | 12 | ML → Core → App | REST → WebSocket | Transcription status transitions + `EntityChanged (transcription)` |
| Signed URL | 13 | App → Core → GCS | REST → GCS signed URL | Signed URL fetch (404 while processing, 200 when ready; 1-hour expiry, client-side rotation) |
| Post-meeting trigger | 9, 10 | Core → ML | Pub/Sub | `TranscriptionJob` (published after drain completes and audio is composed) |
| Synthesis write-back | 11 | ML → Core | REST | Transcript segments replace-all; meeting headline; synthesis artefacts (summary, topics, talking points); system-generated tasks |

# Overview (/docs/work/meeting-recording/tdd)

# Technical Design Document [#technical-design-document]

> **Status**: Agreed
>
> **Author**: Ryan Nel
>
> **Date**: 2026-05-01

## Success Criteria [#success-criteria]

| Criterion | Measured by |
| --- | --- |
| User can start a live recording from the browser and see a real-time transcript within 2 seconds of speech | End-to-end latency from mic input to transcript segment on screen |
| Audio is never lost, even across connectivity failures | Zero-gap rate: all chunks reach GCS via direct upload or OPFS gap recovery |
| Post-meeting artefacts (headline, summary, topics, talking points) reach final quality without user action | Transcription status reaches `completed` and all synthesis artefacts are present |
| Live session degrades gracefully — audio capture continues even if ML or insights fail | Recording produces a complete audio file even when ML is unavailable for part of the session |
| A single active recording per user at any time | Concurrent session guard enforced client-side and server-side |

## Architectural Approach [#architectural-approach]

The system connects browser-captured microphone audio to the existing ML pipeline (AssemblyAI transcription, OpenAI insights) via a streaming architecture with three layers of durability.

**Core path:** Browser → Core (WebSocket, binary frames) → ML (WebSocket) → AssemblyAI (real-time streaming). Insights flow back: ML → Core → Browser on the same WebSocket connections.

**Durability strategy:** Audio is protected at three levels:

1. **OPFS shadow buffer** — every chunk is written to the browser's Origin Private File System via a dedicated Web Worker before transport. This runs unconditionally.
2. **GCS chunk storage** — each chunk is stored as a separate GCS object keyed by sequence number. Gap recovery backfills any missing chunks from OPFS.
3. **Post-meeting reprocessing** — the composed audio file is batch-transcribed at higher accuracy, replacing live segments entirely.
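Level 2, combined with the hierarchical compose described in the Design Decisions table, can be sketched as a compose plan. This is a minimal sketch under stated assumptions, not Core's implementation: the chunk key format and the 32-source limit come from this document, while the intermediate object names under `compose/` and the `plan_compose` helper are hypothetical. The function only plans the ordered compose requests; it does not call GCS.

```python
from typing import List, Tuple

COMPOSE_LIMIT = 32  # GCS compose accepts at most 32 source objects per request


def chunk_key(meeting_id: str, seq: int) -> str:
    # Object key layout from the design: meetings/{id}/chunks/{seq:08d}.webm
    return f"meetings/{meeting_id}/chunks/{seq:08d}.webm"


def plan_compose(
    meeting_id: str, num_chunks: int, limit: int = COMPOSE_LIMIT
) -> List[Tuple[str, List[str]]]:
    """Return ordered (destination, sources) compose steps ending at audio.webm."""
    if num_chunks < 1:
        raise ValueError("need at least one chunk")
    if limit < 2:
        raise ValueError("limit must be at least 2 for the tree to shrink")
    final = f"meetings/{meeting_id}/audio.webm"
    level_keys = [chunk_key(meeting_id, seq) for seq in range(num_chunks)]
    steps: List[Tuple[str, List[str]]] = []
    level = 0
    while len(level_keys) > limit:
        next_level = []
        # Compose consecutive groups so sequence order is preserved.
        for i in range(0, len(level_keys), limit):
            # Hypothetical naming for intermediate composite objects.
            dest = f"meetings/{meeting_id}/compose/L{level}-{i // limit:05d}.webm"
            steps.append((dest, level_keys[i:i + limit]))
            next_level.append(dest)
        level_keys = next_level
        level += 1
    steps.append((final, level_keys))
    return steps
```

For a 30-minute gap-sized recording of 18,000 chunks, this plan uses 563 first-level composes, 18 second-level composes, and one final compose. Because each level shrinks the object count by up to 32×, even multi-hour recordings need only a few levels.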
**Key architectural decisions:**

* **Dual-write** (WebSocket for latency, async DB for durability) — a DB hiccup doesn't block the live experience
* **Echo-suppressed optimistic mutations** — instant UI feedback without double-rendering
* **Chunk-based GCS writes with hierarchical compose** — enables gap recovery by sequence number; compose at session end
* **Sequential post-meeting pipeline** — batch transcription must complete before synthesis runs (synthesis depends on the final transcript)
* **Task preservation** — post-meeting processing skips task extraction for live recordings to avoid clobbering user-created tasks

For the full rationale on all decisions, see the [Design Decisions table in Data Flow](data-flow#design-decisions).

## Constraints [#constraints]

Architectural constraints discovered during design, in addition to the [no-gos in the Pitch](../pitch#no-gos):

* **Sticky session affinity** — all WebSocket frames for a session route to the same Core pod. No pod-to-pod event routing (backplane) exists. This is a known scaling constraint, captured as a [problem statement](../../../problem-statements/backplane).
* **Session not resumable after tab close (v1)** — OPFS data persists, but the recording session does not. Tab close ends the session. Session recovery is a [separate problem statement](../../../problem-statements/session-recovery).
* **Desktop browsers only (this bet)** — Chrome/Edge primary, Safari 17+ best-effort. Mobile architecture should not be precluded.
* **Single AssemblyAI model** — no custom vocabulary or domain-specific tuning.
* **5-minute WebSocket replay buffer** — reconnects beyond 5 minutes require a full REST re-fetch. Captured as a [problem statement](../../../problem-statements/replay-buffer-optimization).
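The replay-buffer constraint can be illustrated with a small time-windowed buffer. This is a hypothetical sketch, not the actual Core code: it assumes each outbound event gets a monotonically increasing per-session cursor and that a reconnecting client sends the last cursor it saw. `ReplayBuffer`, `replay_since`, and the cursor scheme are illustrative names only.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional

REPLAY_WINDOW_SECONDS = 5 * 60  # the 5-minute window from the constraint above


@dataclass
class BufferedEvent:
    cursor: int      # assumed: monotonically increasing per session
    sent_at: float   # seconds, e.g. from time.monotonic()
    payload: str


class ReplayBuffer:
    """Hypothetical per-session buffer of recently sent events."""

    def __init__(self, window: float = REPLAY_WINDOW_SECONDS) -> None:
        self.window = window
        self._events: Deque[BufferedEvent] = deque()
        self._next_cursor = 1

    def append(self, payload: str, now: float) -> int:
        self._evict(now)
        event = BufferedEvent(self._next_cursor, now, payload)
        self._events.append(event)
        self._next_cursor += 1
        return event.cursor

    def replay_since(self, cursor: int, now: float) -> Optional[List[str]]:
        """Events after `cursor`, or None if the client must re-fetch via REST."""
        self._evict(now)
        oldest = self._events[0].cursor if self._events else self._next_cursor
        if cursor + 1 < oldest:
            return None  # some events after `cursor` have already been evicted
        return [e.payload for e in self._events if e.cursor > cursor]

    def _evict(self, now: float) -> None:
        # Drop events older than the replay window.
        while self._events and now - self._events[0].sent_at > self.window:
            self._events.popleft()
```

Returning `None` rather than a partial list makes the "beyond 5 minutes" case explicit: the caller cannot mistake a truncated replay for a complete one and must fall back to the REST re-fetch.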
## Open Questions [#open-questions]

| Question | Owner | Status |
| --- | --- | --- |
| Safari 17+ `createSyncAccessHandle()` support — confirmed in workers? | App | To verify during milestone 1 |
| GCS compose latency for long recordings (10k+ chunks) — need benchmarks | Core | To measure during milestone 2 |
| AssemblyAI v3 turn-based API migration timeline | ML | Monitoring — no action needed for v1 |

***

## Navigation Map [#navigation-map]

| Document | What it covers |
| --- | --- |
| [UI Design](ui-design) | Wireframes, screen states, and interaction patterns |
| [Data Flow](data-flow) | 20 sequence diagrams across live session, user mutations, and post-meeting processing |
| [Contracts](contracts) | API shapes for every boundary: REST, WebSocket, Pub/Sub, binary audio frames |
| [Schemas](schemas) | Database table designs for Core and ML |
| [Milestones](milestones) | Build plan broken into shippable slices |

***

## Architecture Scaffolding [#architecture-scaffolding]

To maintain structural consistency and immediate integration with the test suite, always use the CLI to generate the remaining TDD components.

**Add a Milestone**

```bash
./dev new milestone meeting-recording
```

**Add a Domain Slice** (automatically connects to the pytest suite)

```bash
./dev new slice meeting-recording
```

**Design a Contract**

```bash
./dev new contract meeting-recording
```

**Design a Database Schema**

```bash
./dev new schema meeting-recording
```

# UI Design (/docs/work/meeting-recording/tdd/ui-design)

# UI Design: Meeting Recording [#ui-design-meeting-recording]

***

## 1. Screens [#1-screens]

### The Entry Point [#the-entry-point]

The existing "New Meeting" button becomes a dropdown with two options: **Upload File** (existing) and **Start Live Recording** (new).
No new navigation patterns — we're extending what's already there.

[Figure: Start Recording Dropdown]

***

### Live Recording View [#live-recording-view]

A focused, distraction-free workspace. The user's private notes take centre stage; the system's output sits alongside as an ambient feed, never competing for attention.

[Figure: Active Recording UI Layout]

**Layout:**

* **Recording Banner (top):** Pulsing recording indicator, elapsed timer, Stop Recording button. If the ML connection drops, a degraded-mode warning appears inline here.
* **Left Column (primary) — Private Notes:** A single rich-text scratchpad per meeting. Auto-saves, with no save button. The user writes freely during the session and can continue editing after.
* **Right Column (secondary) — Context Panel:** Live transcript stream (auto-scrolling), real-time talking points, and extracted tasks.

**States:**

| State | What the user sees |
| --- | --- |
| **Mic Permission** | Browser permission dialog. Behind it: the recording view skeleton, a two-column layout with grey placeholder blocks. |
| **Connecting** | Skeleton visible. Banner shows "Connecting…" with a spinner instead of the pulsing dot. |
| **Awaiting First Segment** | Banner transitions to `● Recording 00:00`. Transcript area shows "Listening…" in muted text. Notes scratchpad is active and ready. |
| **Active** | Transcript streaming in the context panel. Talking points and tasks arriving. User writing notes. |
| **Degraded** | ML connection drops: transcript and sidebar freeze on the last-received content. An inline warning appears above: "Live insights paused — audio is still recording." Notes scratchpad and audio capture continue uninterrupted. Clears automatically on reconnect. |
| **Connectivity Lost** | The browser loses its connection to the server. Banner: "Connection lost — audio is being saved on this device. Nothing will be lost." Context panel freezes. Notes scratchpad remains active. A gap placeholder appears in the transcript at the point of the last received segment. |
| **Reconnected** | Connection restored. Banner: "Reconnected. Live transcript continues from here. The full transcript (including the gap) will regenerate when the meeting ends." Gap placeholder remains visible in the transcript until post-meeting processing completes. Live insights resume. |
| **Duration Warning** | Ten minutes before the 4-hour limit, a non-blocking banner: "Recording will automatically stop in 10 minutes." |

**Key Interactions:**

* **Auto-scroll:** Transcript auto-scrolls to the newest segment. Auto-scroll stops if the user scrolls up manually, and a "Jump to latest →" button appears.
* **Speaker labelling:** Each transcript segment shows a speaker label. The user can click it to reassign it to a known person or add a new one. Reassignments update all segments from that speaker.
* **Interim vs. final segments:** Interim segments display muted/italic and are replaced in place (no layout shift) when their final version arrives.
* **Tasks:** Each task has a description (required), assignee (optional), and due date (optional). Tasks can be nested. Tasks are labelled as either `system` or `user` generated. If the user edits a system-generated task, it becomes a user task. Tasks have a completion state (checkbox). Uses the standard task component.

***

### Meeting Summary — Overview [#meeting-summary--overview]

After the recording stops, the UI transitions to the Meeting Summary page. Two tabs at the top: **Overview** (active) and **Transcript**.

[Figure: Meeting Summary Overview Tab]

**Layout (single column, stacked):**

* **Summary:** Auto-generated headline, summary, and topics.
* **Tasks:** All tasks, both user-created and system-extracted.
* **Notes:** The user's private notes from the live session, still editable.

**States:**

| State | What the user sees |
| --- | --- |
| **Re-processing** | Summary, tasks, and notes visible immediately. A subtle progress indicator shows the background job is upgrading accuracy. Content updates in real time as each artefact completes. |
| **Complete** | All artefacts finalised. No progress indicator. |

**Key Interactions:**

* **Edit notes:** Notes remain fully editable after the session: same scratchpad, no mode change.
* **Task CRUD:** User can add, edit, and delete tasks. Editing a system-generated task promotes it to a user task. Tasks can be marked complete, nested under other tasks, and assigned to people with optional due dates.

***

### Meeting Summary — Transcript [#meeting-summary--transcript]

The **Transcript** tab, with an audio player pinned to the top.

[Figure: Meeting Summary Transcript Tab]

**Layout:**

* **Audio Player (fixed header):** Play/pause, ±15s skip, scrub bar, current/total time, playback speed (0.5×, 1×, 1.5×, 2×). Audio streams, so playback begins before the full file downloads.
* **Transcript:** Segments highlighted in sync with audio playback. Clicking a segment seeks to that moment.

**States:**

| State | What the user sees |
| --- | --- |
| **Audio Processing** | Player disabled (greyed out): "Audio is still being processed." Enables automatically when ready. |
| **Ready** | Full player controls active. Transcript clickable and synced. |
| **Audio Unavailable** | Player error state: "Audio unavailable." Auto-retries with a fresh URL. |

**Key Interactions:**

* **Synced highlighting:** The transcript segment matching the current playback time is visually highlighted and auto-scrolled into view.
* **Click-to-seek:** Clicking any transcript segment seeks the audio to that segment's start time.
* **Speaker labelling:** Each segment shows a speaker label. The user can click it to reassign it or add a new person. Reassignments update all segments from that speaker.

***

## 2. User Journeys [#2-user-journeys]

### Live Recording [#live-recording]

```text
[New Meeting ▾] → Start Live Recording
  │
  ▼
Mic Permission Prompt ──denied──→ Blocking modal (links to browser settings)
  │ granted
  ▼
Connecting → Live Recording View
  │
  │  ← user speaks, system streams transcript, talking points, tasks
  │  ← user writes notes, adds tasks, and labels transcript segments
  │
  ▼
[Stop Recording]
  │
  ▼
"Ending session…" → Meeting Summary (Overview tab)
  │
  └──→ Background re-processing upgrades all artefacts (except tasks)
```

### Post-Meeting Review [#post-meeting-review]

```text
Meeting Summary — Overview tab
  │
  │  ← review summary, edit notes, add/edit/delete tasks
  │
  ├──→ Transcript tab
  │      │
  │      │  ← play audio, follow along with synced transcript
  │      │  ← click segment to seek, label speakers
  │      │
  │      └──→ back to Overview tab
```

***

## 3. Edge Cases [#3-edge-cases]

| Scenario | Behaviour |
| --- | --- |
| **Concurrent session** | If a recording is already active, "Start Live Recording" is disabled with a tooltip. |
| **ML connection drops** | Transcript and sidebar freeze on last-received content. Inline warning appears. Notes and audio capture continue. Recovery is automatic. |
| **Browser loses server connection** | Audio saved to OPFS on the device, so nothing is lost. Banner: "Connection lost — audio is being saved on this device." Gap placeholder appears in the transcript at the disconnect point. On reconnect, audio uploads automatically in the background. |
| **GCS temporarily unavailable** | Audio continues flowing; a gap accumulates in cloud storage. The client shows a connectivity-degraded banner. The OPFS buffer captures the gap. Clears automatically when GCS recovers; the gap is backfilled from OPFS. |
| **Browser tab backgrounded** | Recording continues. Page title changes to "● Recording…" so the user can find the tab. |
| **Long session (60+ min)** | Transcript must virtualize: thousands of segments without virtual scrolling will freeze the browser. |
| **Auto-stop at 4 hours** | System stops recording and runs the standard post-meeting pipeline automatically. |
| **No audio detected (10s)** | Inline notice: "We're not detecting audio. Check your microphone." Clears on first segment. |
| **Same user, multiple tabs** | Only one tab may control a recording. If a recording is active in another tab, the second tab shows "Recording active in another tab" with a link to switch. No takeover: the original tab owns the session. |

# Data Flow (/docs/work/delivered/live-capture/03-tdd/data-flow)

{/* LLM-Context: TL;DR: Data flow for the Meeting Recording bet. 10 flows (numbered 1–8, with 3a–3c): Start Recording, Live Audio→Transcription, Live Talking Points (per segment), Live Task Extraction (~60s), Live Speaker ID, User Creates Task, User Labels Speaker, Stop Recording, Post-Meeting Processing, Audio Playback (signed URL direct to GCS). Each arrow = contract boundary + sequencing constraint. All contracts are in the Contracts page. */}

# Data Flow [#data-flow]

For each step in the [User Flow](user-flow), this page draws what calls what: which service initiates, which responds, what data crosses each boundary.
Read each arrow two ways: it is a **contract boundary** (what shape the data takes) and a **sequencing constraint** (downstream cannot build until the upstream contract is published).

## System Context [#system-context]

***

## Flow 1: Start Recording [#flow-1-start-recording]

## Flow 2: Live Audio → Transcription (Lowest Latency Path) [#flow-2-live-audio--transcription-lowest-latency-path]

Audio flows from the browser microphone through Core and ML to AssemblyAI. Transcript segments return via the **streaming HTTP response**, the same connection ML uses to receive audio. This is a bidirectional HTTP stream: audio chunks flow upstream, segments and insights flow downstream. Core streams segments directly to the client via WebSocket for minimum latency, and persists them to the database asynchronously in the background.

## Flow 3a: Live Talking Points (Fast — Per Finalised Segment) [#flow-3a-live-talking-points-fast--per-finalised-segment]

Talking points update on every finalised transcript segment. ML streams them back through the same HTTP stream as transcript segments. Core forwards them to the client via WebSocket and persists them to the database asynchronously, using the same dual-write pattern as transcript segments.

## Flow 3b: Live Task Extraction (Slow — Every \~60s) [#flow-3b-live-task-extraction-slow--every-60s]

Task extraction runs on a slower cadence. ML buffers segments and periodically checks for action items. Tasks also stream back through the HTTP stream, following the same dual-write pattern.

## Flow 3c: Live Speaker Identification (Per Segment) [#flow-3c-live-speaker-identification-per-segment]

Speaker identification is built into the live transcription flow. For every segment, ML extracts a voice embedding and persists it via Core. It then attempts to match the embedding against enrolled voice profiles. When a user later maps an AssemblyAI speaker label (e.g. "Speaker A") to a known Person, the system uses all segments with that speaker label to enrich that person's voice profile for improved future matching.

## Flow 4: User Creates Task During Recording [#flow-4-user-creates-task-during-recording]

This follows the standard optimistic-mutation pattern with echo-suppressed streaming. The user's task is written via REST (not the streaming path) since it is a user-initiated mutation.

## Flow 5: User Labels Speaker as Person [#flow-5-user-labels-speaker-as-person]

When a user identifies "Speaker A" as a known Person, the system enriches that person's voice profile using all segments attributed to that speaker label.

## Flow 6: Stop Recording [#flow-6-stop-recording]

## Flow 7: Post-Meeting Processing (Automatic, via Pub/Sub) [#flow-7-post-meeting-processing-automatic-via-pubsub]

Post-meeting processing runs automatically via the shared `TranscriptionJob` Pub/Sub worker. For live recordings, the job is published with `skip_tasks: true` to preserve tasks captured during the session. The worker:

1. Batch-transcribes the full audio from GCS (higher accuracy)
2. Replaces transcript segments with the improved results
3. Generates headline, summary, topics, and finalises talking points (`is_final: true`)
4. Extracts tasks when `skip_tasks: false` (file upload flow only)

## Flow 8: Audio Playback (Signed URL Direct to GCS) [#flow-8-audio-playback-signed-url-direct-to-gcs]

Core generates a short-lived signed URL. The client streams audio directly from Cloud Storage using that URL, with standard HTTP range requests for seeking.

***

## Boundary Inventory [#boundary-inventory]

Every boundary shown in the diagrams above. Each becomes a contract on the [Contracts](contracts) page.
| Boundary | From → To | Protocol | Data shape |
| --- | --- | --- | --- |
| Meeting CRUD | App → Core | REST | `POST/PATCH /meetings` |
| Recording commands | App → Core | WebSocket | `StartRecordingCommand`, `StopRecordingCommand` |
| Audio streaming | App → Core → ML | WebSocket (binary) → HTTP stream | Raw audio chunks |
| Live insights | ML → Core → App | HTTP stream → WebSocket | NDJSON events (5 types) |
| Speaker labels | App → Core | REST | `POST /meetings/{id}/speaker-labels` |
| Signed URL | App → Core → GCS | REST → GCS signed URL | `GET /meetings/{id}/audio-url` |
| Post-meeting trigger | Core → ML | Pub/Sub | `TranscriptionJob`, `MeetingSessionTerminated` |
| Synthesis write-back | ML → Core | REST | `PUT /synthesis`, `PATCH /meetings`, `PUT /segments` |

# User Flow (/docs/work/delivered/live-capture/03-tdd/user-flow)

{/* LLM-Context:
TL;DR: User flow for the Meeting Recording bet. Formerly the "Design" phase.
Contains 12 user stories with acceptance criteria from the original specification.
Screen inventory: Recording Controls (extend Meeting Detail), Live Recording View (new), Meeting Playback (extend Meeting Detail).
Key IA insight: live and playback views share the transcript — layout adapts by state.
*/}

# User Flow [#user-flow]

This page maps the user journey for the Meeting Recording bet. The user stories and acceptance criteria here directly determine what data each screen needs — which in turn determines the API endpoints and their shapes.

***

## User Stories [#user-stories]

### Story 1: Start a Live Recording [#story-1-start-a-live-recording]

> As a **WordLoop user**, I want to **press a single button in the app to start recording a meeting** so that **I don't need any external tools to capture what's being said**.
**Acceptance Criteria:**

* Given I am on a meeting view, when I press "Record", the system begins a live recording session for that meeting
* The app begins capturing audio from the device microphone and streaming it to the server
* The server confirms the session has started
* A recording indicator is visible in the UI for the duration of the session
* If I already have an active recording session, the system prevents starting a second one and shows an error

***

### Story 2: Watch Live Transcription [#story-2-watch-live-transcription]

> As a **WordLoop user**, I want to **see what's being said in real time while recording** so that **I can follow along and verify the system is capturing correctly**.

**Acceptance Criteria:**

* Given a recording is active, when the system produces a transcript segment, it appears in the UI
* Interim segments appear immediately and are visually distinct from final segments
* Final segments replace their corresponding interim segments
* The transcript scrolls automatically to keep the latest content visible
* Latency from audio capture to text on screen is \< 2 seconds under normal conditions

***

### Story 3: Watch Live Talking Points [#story-3-watch-live-talking-points]

> As a **WordLoop user**, I want to **see the current talking point as the meeting progresses** so that **I have a structured summary building up in real time**.

**Acceptance Criteria:**

* Given a recording is active, when the system detects a new or updated talking point, it appears in the UI
* Talking points created during a live session are marked as **draft**
* Talking points appear as a scrollable list alongside the transcript

***

### Story 4: Watch Live Tasks [#story-4-watch-live-tasks]

> As a **WordLoop user**, I want to **see tasks extracted from the conversation in real time** so that **action items aren't lost**.
**Acceptance Criteria:**

* Given a recording is active, when the system detects an action item, a task is created and appears in the UI
* System-extracted tasks are visually distinguishable from user-created tasks via a `source` indicator

***

### Story 5: Add Tasks During Recording [#story-5-add-tasks-during-recording]

> As a **WordLoop user**, I want to **manually add my own tasks while a meeting is being recorded** so that **I can capture action items the AI might miss**.

**Acceptance Criteria:**

* Given a recording is active, a task input field is available in the meeting view
* When I submit a task, it appears immediately in the task list (optimistic update)
* User-created tasks are tagged with `source: user` to distinguish them from system-extracted tasks

***

### Story 6: Stop Recording and Generate Summary [#story-6-stop-recording-and-generate-summary]

> As a **WordLoop user**, I want to **stop the recording and receive a complete meeting summary** so that **I have a structured artefact of the meeting**.

**Acceptance Criteria:**

* Given a recording is active, when I press "Stop", the system ends the recording session
* The system **automatically** generates a headline for the meeting
* The system **automatically** generates a summary and topics (with best-effort talking-point nesting)
* The UI transitions from the live recording view to the standard meeting detail view
* All generated content is visible when the meeting detail loads

***

### Story 7: Automatic Post-Meeting Re-Generation [#story-7-automatic-post-meeting-re-generation]

> As the **WordLoop system**, I want to **automatically re-process the meeting after recording ends** so that **the transcription and synthesis are as accurate as possible**.
**Acceptance Criteria:**

* Given a recording has stopped and the audio has been stored, the system automatically triggers a full offline re-processing job
* The system re-transcribes the full audio using the offline pipeline for higher accuracy
* The system re-generates talking points, topics, summary, and headline from the improved transcript
* Talking points and topics are promoted from **draft** to **final**
* **Tasks are NOT re-generated** — both user-created and system-extracted tasks from the live session are preserved
* The UI updates in real time as each re-generated artefact completes

***

### Story 8: Play Back Meeting Audio [#story-8-play-back-meeting-audio]

> As a **WordLoop user**, I want to **play back the recorded audio after a meeting ends** so that **I can revisit specific moments I may have missed**.

**Acceptance Criteria:**

* Given a meeting has a completed recording, an audio player is available on the meeting detail view
* The player supports play, pause, seek (scrubbing), and playback speed control (0.5×, 1×, 1.5×, 2×)
* Audio is streamed — the full file does not need to download before playback begins
* The player displays the current playback position and total duration

***

### Story 9: Synchronised Transcript Highlighting [#story-9-synchronised-transcript-highlighting]

> As a **WordLoop user**, I want to **see the transcript highlight in sync with audio playback** so that **I can follow along with what's being said**.
**Acceptance Criteria:**

* Given audio is playing, the transcript segment matching the current playback time is visually highlighted
* The transcript auto-scrolls to keep the highlighted segment visible
* When I click a transcript segment, the audio player seeks to that segment's start time and begins playing

***

### Story 10: Live Speaker Identification [#story-10-live-speaker-identification]

> As a **WordLoop user**, I want to **see who is speaking during a live recording** so that **the transcript is attributed to the correct person, not just "Speaker A"**.

**Acceptance Criteria:**

* Given a recording is active and speaker voice profiles have been enrolled, the system matches speaker voice embeddings against known person profiles in near real-time
* When a match exceeds the confidence threshold, the transcript segment shows the resolved person's name instead of the raw speaker label
* Unmatched speakers continue to display their raw speaker label (e.g., "Speaker A")
* Speaker resolution happens incrementally — early segments may remain unresolved and get updated as more audio is processed

***

### Story 11: Graceful Degradation During Recording [#story-11-graceful-degradation-during-recording]

> As a **WordLoop user**, I want to **keep recording even if AI services are temporarily unavailable** so that **I don't lose the meeting**.

**Acceptance Criteria:**

* Given a recording is active and the ML service becomes unavailable, the system continues capturing and storing audio
* The UI displays a clear message indicating that audio is being captured and transcription/insights will be generated when services recover
* Recovery is fully automatic — no user intervention required

***

### Story 12: Recording Duration Limit [#story-12-recording-duration-limit]

> As a **WordLoop user**, I want to **be warned when I'm approaching the recording time limit** so that **I can wrap up the meeting before it's cut off**.
**Acceptance Criteria:**

* Given a recording has been active for a configurable duration (default: 4 hours), the system automatically stops the recording
* A warning is shown to the user at a configurable interval before the limit (e.g., 10 minutes before)
* When the recording is auto-stopped, the standard post-meeting generation and reprocessing pipeline runs automatically

***

## Screen Inventory [#screen-inventory]

| Screen | Status | Data needed | Actions |
| --- | --- | --- | --- |
| Meeting Detail — Recording Controls | Extend existing | `meeting.source_type`, `meeting_audio_files.status` | Press Record, Press Stop |
| Live Recording View | Extend Meeting Detail | `TranscriptSegmentEvent` (WS), `TalkingPointEvent` (WS), `EntityChanged { entity: task }` (WS), `RecordingStartedEvent`, `RecordingDegradedEvent` | Add task (optimistic), Stop recording |
| Meeting Detail — Playback | Extend existing (post-recording) | `GET /meetings/{id}/audio-url`, transcript segments with `start_ms`/`end_ms`, `person_id` | Play/pause/seek/speed, click segment to seek |

***

## Information Architecture [#information-architecture]

### Live Recording View [#live-recording-view]

```
[Meeting Detail — Recording Active]
├── Recording Indicator (pulsing, top banner)
│   elapsed time | Degraded mode warning (conditional)
├── Main Area (two columns)
│   Left: Live Transcript (auto-scroll)
│     [interim segment — visually muted]
│     [final segment — speaker label or person name]
│   Right: Live Sidebar
│     ├── Talking Points (draft badge)
│     │     scrollable list, latest at top
│     └── Tasks
│           task input field
│           scrollable list (user vs system indicator)
└── [Stop Recording] button
```

### Meeting Playback View [#meeting-playback-view]

```
[Meeting Detail — After Recording]
├── Audio Player (fixed header)
│   ◀ 15s | ▶ Play | ▶▶ 15s | 01:23 / 42:17 | speed | scrub bar
└── Main Area (two columns)
    Left: Transcript
      [segment — highlighted when current, click to seek]
      speaker label or resolved person name per segment
    Right: Post-Meeting Sidebar
      ├── Talking Points (draft → final badge)
      ├── Topics
      └── Tasks
```

# Audio (/docs/work/meeting-recording/tdd/contracts/audio)

# Audio [#audio]

Audio chunks flow from the browser through Core to ML as binary WebSocket frames. This page covers the frame formats for both hops, chunk-based GCS storage, the OPFS shadow buffer, ML acknowledgement, and backpressure signalling.

For the recording lifecycle (start/stop/resume commands and events), see [Recording](recording). For shared connection semantics, see [Infrastructure](infrastructure).

## Browser → Core: Binary Audio Frame [#browser--core-binary-audio-frame]

Audio chunks are sent as binary WebSocket frames using a length-prefixed metadata envelope followed by raw audio bytes.

```text
uint32_be  metadata_length
utf8_json  metadata
raw_audio_bytes
```

Metadata schema:

```json
{
  "type": "com.wordloop.recording.audio_chunk.v1",
  "id": "chunk-event-uuid",
  "traceparent": "00-...",
  "meeting_id": "meeting-uuid",
  "sequence": 1842,
  "started_at_ms": 184200,
  "duration_ms": 100,
  "mime_type": "audio/webm",
  "crc32": "hex-encoded-crc32"
}
```

Core verifies the CRC32 checksum, stores the chunk by sequence number in GCS, enriches the metadata with `ml_session_id`, forwards the frame to ML over the ML WebSocket, and records the highest contiguous sequence. Duplicate sequences are acknowledged but not re-stored.

## Chunk-Based GCS Storage [#chunk-based-gcs-storage]

Each audio chunk is stored as a separate GCS object keyed by sequence number: `meetings/{id}/chunks/{seq:08d}.webm`. WebM encodes its EBML header in the first chunk; subsequent chunks contain raw Cluster data.
This structure enables gap recovery — any chunk missed due to a connectivity failure can be backfilled from OPFS by sequence number. At session end, Core composes the chunk objects into the final `audio.webm` using GCS Compose — hierarchically in groups of ≤32 for recordings that exceed GCS's 32-object compose limit.

## OPFS Shadow Buffer [#opfs-shadow-buffer]

Every audio chunk is simultaneously written to an always-on shadow buffer maintained by a dedicated Web Worker using the Origin Private File System (OPFS) `createSyncAccessHandle()` API. Each chunk carries a monotonically incrementing sequence number assigned in the browser.

This buffer runs unconditionally — it captures audio regardless of Core or GCS connectivity. It is cleared only after Core confirms that all chunks are safely in GCS.

### OPFS Chunk Storage Format [#opfs-chunk-storage-format]

Each chunk is stored in OPFS with an integrity envelope so that corrupted chunks can be detected during gap recovery:

```text
uint32_be  crc32
uint32_be  audio_length
raw_audio_bytes
```

The CRC32 is computed over the raw audio bytes. During gap recovery, the reader verifies the CRC32 before uploading. Chunks that fail verification are skipped — the post-meeting batch transcription will handle any resulting audio gaps.

## Core → ML: Binary Audio Frame [#core--ml-binary-audio-frame]

Core enriches the browser's binary frame with `ml_session_id` before forwarding it to ML. The binary framing structure is identical (length-prefixed metadata followed by raw audio), but the metadata differs from the browser→Core frame: `type` becomes `com.wordloop.ml.audio_chunk.v1` and `ml_session_id` is added.
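The enrichment step can be sketched in Python. This is a minimal illustration, not Core's implementation (Core is written in Go). The helper names `encode_frame`, `decode_frame`, `crc32_hex`, and `enrich_for_ml` are hypothetical. The sketch also assumes that `crc32` holds the standard zlib CRC-32 of the audio bytes as eight lowercase hex digits. It verifies the checksum before re-encoding, mirroring the browser→Core verification step.

```python
import json
import struct
import zlib


def encode_frame(metadata: dict, audio: bytes) -> bytes:
    """Build one binary frame: uint32_be length | UTF-8 JSON metadata | audio."""
    meta_bytes = json.dumps(metadata, separators=(",", ":")).encode("utf-8")
    return struct.pack(">I", len(meta_bytes)) + meta_bytes + audio


def decode_frame(frame: bytes) -> tuple[dict, bytes]:
    """Split a frame into (metadata, audio), validating the length prefix."""
    if len(frame) < 4:
        raise ValueError("frame too short for the length prefix")
    (meta_len,) = struct.unpack_from(">I", frame, 0)
    end = 4 + meta_len
    if end > len(frame):
        raise ValueError("metadata length exceeds frame size")
    metadata = json.loads(frame[4:end].decode("utf-8"))
    return metadata, frame[end:]


def crc32_hex(data: bytes) -> str:
    """Assumed encoding of "crc32": zlib CRC-32 as 8 lowercase hex digits."""
    return format(zlib.crc32(data) & 0xFFFFFFFF, "08x")


def enrich_for_ml(browser_frame: bytes, ml_session_id: str) -> bytes:
    """Verify a browser->Core frame and re-encode it as a Core->ML frame."""
    metadata, audio = decode_frame(browser_frame)
    if metadata.get("crc32") != crc32_hex(audio):
        raise ValueError(f"CRC32 mismatch for sequence {metadata.get('sequence')}")
    ml_metadata = dict(metadata)  # copy; the browser's fields carry over unchanged
    ml_metadata["type"] = "com.wordloop.ml.audio_chunk.v1"
    ml_metadata["ml_session_id"] = ml_session_id
    return encode_frame(ml_metadata, audio)
```

Only the metadata JSON changes. The audio bytes pass through untouched, and the 4-byte length prefix is recomputed for the larger metadata object.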
```text
uint32_be  metadata_length
utf8_json  metadata
raw_audio_bytes
```

Metadata schema:

```json
{
  "type": "com.wordloop.ml.audio_chunk.v1",
  "id": "chunk-event-uuid",
  "traceparent": "00-...",
  "meeting_id": "meeting-uuid",
  "ml_session_id": "ml-session-uuid",
  "sequence": 1842,
  "started_at_ms": 184200,
  "duration_ms": 100,
  "mime_type": "audio/webm",
  "crc32": "hex-encoded-crc32"
}
```

ML acknowledges processed audio progress through `AudioChunkAckEvent`, not per-frame WebSocket acks. This avoids chatty acknowledgements while still letting Core detect lag.

## ML → Core: `AudioChunkAckEvent` [#ml--core-audiochunkackevent]

Reports processed audio progress. Core uses this for diagnostics and backpressure decisions.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.audio_chunk.ack.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "last_sequence_received": 1842,
    "last_sequence_processed": 1841
  }
}
```

***

## Backpressure [#backpressure]

### ML → Core: `BackpressureEvent` [#ml--core-backpressureevent]

Tells Core that ML is falling behind. Core continues storing audio to GCS and may degrade live insights while preserving the recording.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.backpressure.v1",
  "time": "2026-05-01T09:05:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "reason": "provider_latency",
    "retry_after_ms": 1000,
    "queue_depth": 128
  }
}
```

### ML → Core: `BackpressureClearedEvent` — **New** [#ml--core-backpressureclearedevent--new]

Explicitly signals that ML has recovered from backpressure. Without this, Core must infer recovery from the absence of further `BackpressureEvent` messages or from `AudioChunkAckEvent` progress, which makes Core's state machine ambiguous.
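With an explicit clear event, Core's view of ML health reduces to a small state machine. One flag is driven by `BackpressureEvent` and `BackpressureClearedEvent`, and ack progress from `AudioChunkAckEvent` supports lag diagnostics. The Python sketch below is hypothetical: the class and method names are illustrative. Dropping duplicate event `id`s is an assumption, not a stated contract. Audio forwarding and GCS storage continue regardless of the flag; only live insights are gated.

```python
from dataclasses import dataclass, field

BACKPRESSURE = "com.wordloop.ml.backpressure.v1"
BACKPRESSURE_CLEARED = "com.wordloop.ml.backpressure_cleared.v1"
AUDIO_ACK = "com.wordloop.ml.audio_chunk.ack.v1"


@dataclass
class MlStreamState:
    """Hypothetical per-session view that Core keeps of one ML stream."""

    backpressured: bool = False
    retry_after_ms: int = 0
    last_sequence_sent: int = -1
    last_sequence_processed: int = -1
    # Assumption: duplicate event ids are dropped. A real implementation
    # would bound this set (for example, with a sliding window).
    seen_event_ids: set = field(default_factory=set)

    def on_audio_forwarded(self, sequence: int) -> None:
        # Audio keeps flowing to GCS and ML in both states; only insights degrade.
        self.last_sequence_sent = max(self.last_sequence_sent, sequence)

    def on_event(self, event: dict) -> None:
        if event["id"] in self.seen_event_ids:
            return
        self.seen_event_ids.add(event["id"])
        data = event.get("data", {})
        if event["type"] == BACKPRESSURE:
            self.backpressured = True
            self.retry_after_ms = data.get("retry_after_ms", 0)
        elif event["type"] == BACKPRESSURE_CLEARED:
            self.backpressured = False
            self.retry_after_ms = 0
        elif event["type"] == AUDIO_ACK:
            self.last_sequence_processed = max(
                self.last_sequence_processed, data["last_sequence_processed"]
            )
        # Any other type is ignored rather than treated as an error.

    @property
    def live_insights_enabled(self) -> bool:
        return not self.backpressured

    @property
    def processing_lag(self) -> int:
        """Chunks forwarded to ML that ML has not yet reported as processed."""
        return max(0, self.last_sequence_sent - self.last_sequence_processed)
```

The explicit clear event means recovery is a single transition. Without it, Core would have to guess from the absence of events or from ack progress.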
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.backpressure_cleared.v1",
  "time": "2026-05-01T09:05:30Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "queue_depth": 0
  }
}
```

***

## Client-Side Backpressure [#client-side-backpressure]

Core does not send an explicit backpressure event to the browser. Instead, the client monitors `WebSocket.bufferedAmount` on the Core-facing connection. If `bufferedAmount` exceeds a configurable threshold (default: 5 MB), the client stops sending `MediaRecorder` output over the socket and queues chunks in the OPFS shadow buffer only. When `bufferedAmount` drops below the resume threshold (default: 1 MB), the client resumes sending.

This uses the browser's own WebSocket send buffer as the flow-control signal rather than adding a custom protocol-level backpressure mechanism.

# Contracts (/docs/work/meeting-recording/tdd/contracts)

# Contracts [#contracts]

These contracts define the complete API surface for the Meeting Recording bet. Each page covers one entity end-to-end: its resource shape, REST operations, WebSocket events, ML integration, and Pub/Sub triggers — so you can understand how a single concept works across all protocols without jumping between files.

For shared concerns that apply across all entities — connection semantics, authentication, error format, CloudEvents envelope, Pub/Sub configuration, and failure modes — see [Infrastructure](./infrastructure).

**Specification alignment:** existing machine-readable specs live in `specs/core-openapi.json`, `specs/ml-openapi.json`, `specs/core-asyncapi-ws.yaml`, and `specs/core-asyncapi-pubsub.yaml`. These contract pages describe the ideal-state surface — both existing endpoints and new additions for live recording. New endpoints and fields introduced by this bet are marked with **New**.
***

## Entity Pages [#entity-pages]

| Entity | What it covers |
| --- | --- |
| [Meeting](./meeting) | Top-level resource. CRUD, expand parameter, speaker-label assignment, audio playback URL. |
| [Recording](./recording) | Live recording lifecycle. WebSocket commands and events, ML session management, OPFS gap repair, Pub/Sub drain. |
| [Audio](./audio) | Binary audio transport. Frame formats (browser→Core, Core→ML), chunk storage, acknowledgement, backpressure. |
| [Transcription](./transcription) | Transcript segments. CRUD, live streaming events, batch processing, ML write-back, Pub/Sub trigger. |
| [Synthesis](./synthesis) | ML-generated artefacts. Summary, topics, talking points. Read endpoints and ML write-back. |
| [Task](./task) | Action items. Full CRUD with sub-task nesting. User-created and system-generated (ML). |
| [Person](./person) | Speaker identity. CRUD, speaker identification pipeline, voice profiles, ML matching events. |

***

## ML → Core Write-Back Summary [#ml--core-write-back-summary]

ML writes durable meeting artefacts to Core REST, not to the browser and not directly to Core's database. Each write-back endpoint is documented on its entity page.
| ML output | Core REST target | Entity page |
| --- | --- | --- |
| Headline | `PATCH /meetings/{id}` | [Meeting](./meeting) |
| Live transcript append | `POST /transcriptions/{id}/segments` | [Transcription](./transcription) |
| Batch transcript replacement | `PUT /transcriptions/{id}/segments` | [Transcription](./transcription) |
| Transcription lifecycle | `PATCH /transcriptions/{id}/status` | [Transcription](./transcription) |
| Live talking point | `POST /meetings/{id}/talking-points` | [Synthesis](./synthesis) |
| Final synthesis | `PUT /meetings/{id}/synthesis` | [Synthesis](./synthesis) |
| System-generated tasks | `POST /tasks` | [Task](./task) |

***

## Open Problem: ML→Core Write-Back Resilience [#open-problem-mlcore-write-back-resilience]

If Core is unavailable when ML finishes post-meeting processing, ML cannot deliver results (transcript segments, synthesis, status updates). Today, ML retries with exponential backoff against Core REST. If Core remains down beyond the retry budget, the results are lost.

This is documented as a separate [problem statement](/docs/work/problem-statements/ml-writeback-resilience) for a future bet. Potential approaches include ML publishing results to Pub/Sub as a durable fallback, or persisting results to its own store for later delivery.

# Infrastructure (/docs/work/meeting-recording/tdd/contracts/infrastructure)

# Infrastructure [#infrastructure]

Shared semantics that apply across all entity contracts. Each entity page ([Meeting](meeting), [Recording](recording), [Audio](audio), [Transcription](transcription), [Synthesis](synthesis), [Task](task), [Person](person)) documents its own endpoints and events but relies on the conventions defined here.
***

## Core REST Semantics [#core-rest-semantics]

| Concern | Contract |
| --- | --- |
| Auth | User-facing calls require `bearerAuth` (Clerk JWT). ML write-back uses service-to-service auth (signed JWT or mTLS). |
| User scoping | All resources are implicitly scoped to the authenticated user's `sub` claim. Queries return only that user's data; mutations on another user's resource return `403`. `user_id` never appears in user-facing request or response bodies — it is derived from the token. Service-auth calls include `user_id` in the request body when acting on behalf of a user. |
| Trace context | All requests accept `traceparent` and `tracestate` headers. Core propagates trace context into WebSocket and Pub/Sub envelopes. |
| Idempotency | All `POST` requests require `Idempotency-Key: `. Retried requests return the original result with the same status code. |
| Echo suppression | User mutations accept `Client-Session-Id: `. WebSocket echoes caused by that client carry `sourceClientId` so the origin tab can discard them. |
| Errors | All errors return `application/problem+json` with RFC 9457 fields: `type`, `title`, `status`, `detail`, `instance`, and optional field-level `errors[]`. |
| Pagination | Cursor-based. Requests accept `cursor` and `limit` (default 20, max 100). Responses include `next_cursor`. |
| Location header | All `201 Created` responses include a `Location` header pointing to the new resource. |
| Rate limits | User-facing responses include `RateLimit-Limit`, `RateLimit-Remaining`, and `RateLimit-Reset`. |

***

## ML REST Semantics [#ml-rest-semantics]

| Concern | Contract |
| --- | --- |
| Auth | Service-to-service auth only. Core is the normal caller. Browser credentials are never accepted. |
| Trace context | `traceparent` and `tracestate` accepted on every request and copied into downstream provider calls (AssemblyAI, OpenAI). |
| Idempotency | Creating or draining sessions requires `Idempotency-Key: `. |
| Errors | `application/problem+json` with RFC 9457 fields. Validation errors include field-level `errors[]`. |
| Timeouts | Session create: 20 seconds. Drain: 30 seconds before returning `202 Accepted`. Voice operations: 30 seconds. |
| PII/audio handling | Raw audio is not persisted by ML unless explicitly part of a voice-profile enrichment operation. Live audio durability belongs to Core/GCS. |
| Location header | All `201 Created` responses include a `Location` header. |

***

## Browser WebSocket Connection [#browser-websocket-connection]

Core owns the only browser-facing WebSocket.

| Concern | Contract |
| --- | --- |
| Endpoint | `GET /ws` upgrade |
| Auth | `token=` query parameter, or `Authorization: Bearer ` when the edge supports forwarding headers |
| Client identity | `client_session_id=` query parameter; copied into `sourceClientId` on echoes caused by that client |
| Replay cursor | `last_event_id=` optional. Core replays durable entity-change events after this cursor when the replay buffer has not expired. |
| Replay buffer | Core retains the last **5 minutes** of durable events per user. If the client reconnects after the buffer has expired, it must do a full state re-fetch via REST. Core signals this by sending a `ReplayExpiredEvent` instead of replaying events. |
| Message size | JSON text frames: 64 KiB. Binary audio frames: 1 MiB. |
| Keepalive | Native WebSocket ping every 30 seconds; two missed pongs terminate the connection. |
| Load balancing | Live recording requires affinity to one Core pod for the life of the socket. The ideal edge is Layer 4 or an equivalent connection-stable route. This is a known constraint — see the [backplane problem statement](/docs/work/problem-statements/backplane). |

### Real-Time Pattern [#real-time-pattern]

**Optimistic Mutation + Echo-Suppressed Streaming** for entity mutations. **Bidirectional recording streaming** for audio. REST remains the source of truth for writes; WebSocket events keep every open tab in sync and deliver low-latency recording artefacts.

### CloudEvents Envelope [#cloudevents-envelope]

Every text frame is a CloudEvents v1.0 structured JSON event.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.entity.changed.v1",
  "time": "2026-05-01T09:00:00Z",
  "traceparent": "00-...",
  "tracestate": "vendor=value",
  "sourceClientId": "client-session-uuid",
  "data": {}
}
```

`sourceClientId` is present only when the event was caused by a specific UI session. The origin client discards matching echoes; other tabs and devices apply the event.

***

## ML WebSocket Connection [#ml-websocket-connection]

Core opens one WebSocket per ML live session after `POST /meetings/{id}/live-session` returns `websocket_url`. The browser never connects to ML directly — Core bridges the ML WebSocket to the browser WebSocket.
| Concern | Contract |
| --- | --- |
| Endpoint | `GET /meetings/{meeting_id}/live-session/stream` upgrade |
| Caller | Core only |
| Auth | Service bearer token or mTLS identity. Browser credentials are never accepted. |
| Trace context | Initial handshake includes `traceparent`; every CloudEvent also carries `traceparent`. |
| Replay cursor | Core may reconnect with `last_ml_event_id` and `last_audio_sequence`. ML de-duplicates audio by sequence and resumes output after the cursor when possible. |
| Keepalive | Native WebSocket ping every 30 seconds. Either side may close with code `1012` for service restart. |

### Text Frame Envelope [#text-frame-envelope]

All text frames are CloudEvents v1.0 structured JSON.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {}
}
```

***

## Cache Invalidation [#cache-invalidation]

### `EntityChangedEvent` [#entitychangedevent]

Generic cache-invalidation signal for single-entity mutations. For bulk operations (transcript replacement, synthesis update), entity pages define specific event types.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.entity.changed.v1",
  "time": "2026-05-01T09:05:00Z",
  "traceparent": "00-...",
  "sourceClientId": "client-session-uuid",
  "data": {
    "entity": "meeting",
    "action": "updated",
    "id": "meeting-uuid",
    "version": 42
  }
}
```

Valid entities: `meeting`, `person`, `task`, `note`, `transcription`, `transcript_segment`, `talking_point`, `synthesis`, `speaker_state`.

Valid actions: `created`, `updated`, `deleted`.
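The Python sketch below shows how a browser client might combine echo suppression with the `version` field when handling `EntityChangedEvent`. It assumes, although the contract does not say so, that `version` increases monotonically per entity, so repeated or out-of-order events can be dropped. `EntityChangeHandler` is hypothetical, and so is the `invalidate` callback, which stands in for whatever REST refetch the app performs.

```python
from typing import Callable

VALID_ENTITIES = {
    "meeting", "person", "task", "note", "transcription",
    "transcript_segment", "talking_point", "synthesis", "speaker_state",
}
VALID_ACTIONS = {"created", "updated", "deleted"}


class EntityChangeHandler:
    """Hypothetical client-side handler for com.wordloop.entity.changed.v1."""

    def __init__(self, client_session_id: str,
                 invalidate: Callable[[str, str, str], None]):
        self.client_session_id = client_session_id
        self.invalidate = invalidate  # e.g. refetch the entity over REST
        self.versions: dict[tuple[str, str], int] = {}

    def handle(self, event: dict) -> bool:
        """Return True if the event triggered a cache invalidation."""
        data = event.get("data", {})
        entity, action, entity_id = data.get("entity"), data.get("action"), data.get("id")
        if entity not in VALID_ENTITIES or action not in VALID_ACTIONS or entity_id is None:
            return False  # tolerate values this client does not understand
        key = (entity, entity_id)
        version = data.get("version")
        if version is not None:
            # Assumption: versions increase per entity, so older ones are stale.
            if version <= self.versions.get(key, -1):
                return False
            self.versions[key] = version
        if event.get("sourceClientId") == self.client_session_id:
            return False  # echo of this tab's own optimistic mutation
        self.invalidate(entity, action, entity_id)
        return True
```

Recording the version even for the tab's own echoes means that a later stale event for the same entity is still recognised as stale.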
### `ReplayExpiredEvent` — **New** [#replayexpiredevent--new]

Sent when the client reconnects with a `last_event_id` that is older than the replay buffer (5 minutes). The client must do a full state re-fetch via REST.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.replay.expired.v1",
  "time": "2026-05-01T09:35:00Z",
  "traceparent": "00-...",
  "data": {
    "last_event_id": "stale-event-uuid",
    "buffer_ttl_seconds": 300,
    "message": "Replay buffer expired. Full state re-fetch required."
  }
}
```

***

## Browser Reconnection Rules [#browser-reconnection-rules]

| Scenario | Contract |
| --- | --- |
| Browser loses socket (\< 5 min) | App reconnects with `last_event_id`. Core replays buffered events. If a recording is active, app also sends `ResumeRecordingCommand`. |
| Browser loses socket (> 5 min) | Core sends `ReplayExpiredEvent`. App does a full REST re-fetch. If a recording is active, app sends `ResumeRecordingCommand` for gap recovery. |
| Audio frames duplicated after reconnect | Core de-duplicates by `(meeting_id, sequence)` and checksum. |
| ML stream drops but Core socket remains | Core emits `RecordingErrorEvent { code: "ml_unavailable", severity: "degraded" }`; audio still writes to GCS. On recovery, emits `RecordingErrorEvent { code: "ml_recovered" }`. |
| Core drains for deploy | Core sends `RecordingErrorEvent { code: "backpressure" }` or closes after ping timeout; OPFS gap repair restores missing chunks on reconnect. |

***

## Pub/Sub Semantics [#pubsub-semantics]

Pub/Sub is for durable asynchronous work — not the live path. Live audio and ML outputs use WebSockets. Pub/Sub coordinates post-meeting processing, session termination/drain, and retryable background jobs.
Individual topics are documented on their entity pages ([Transcription](transcription), [Recording](recording)). All Pub/Sub payloads are CloudEvents v1.0 JSON.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/pubsub",
  "type": "com.wordloop.transcription.requested.v1",
  "time": "2026-05-01T10:00:00Z",
  "traceparent": "00-...",
  "tracestate": "vendor=value",
  "data": {}
}
```

| Concern | Contract |
| --- | --- |
| Delivery | At least once. Consumers must de-duplicate by CloudEvents `id` and business idempotency keys. |
| Ordering key | `meeting_id` for all topics. Ensures events for the same meeting are processed in order within a single subscriber. |
| Publishing | Core publishes through a transactional outbox — the event is written to an outbox table within the same database transaction as the state change, then delivered by a background relay. This guarantees at-least-once delivery without two-phase commit. |
| Traceability | `traceparent` is required whenever the originating HTTP/WebSocket request carried one. |

### Dead-Letter and Retry Configuration [#dead-letter-and-retry-configuration]

| Setting | Value | Rationale |
| --- | --- | --- |
| Max delivery attempts | 10 | Covers transient failures without infinite retry. |
| Initial backoff | 1 second | Fast retry for network blips. |
| Max backoff | 600 seconds (10 min) | Caps exponential growth. |
| Backoff multiplier | 2 | Standard exponential. |
| Dead-letter topic suffix | `-dlq` (e.g., `transcription-jobs-dlq`) | One DLQ per source topic. |
| DLQ retention | 14 days | Enough time for manual investigation and replay. |
| Ack deadline | 600 seconds | Long enough for batch transcription jobs. |

When a message exhausts its retry budget, Pub/Sub forwards it to the dead-letter topic. The DLQ subscription has no automatic consumers — an operator (or future automated triage) reviews and replays failed messages.

***

## ML Stream Health [#ml-stream-health]

### `StreamWarningEvent` [#streamwarningevent]

Reports recoverable ML-side degradation that does not rise to the level of backpressure. For audio-specific backpressure events, see [Audio](audio).

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.stream.warning.v1",
  "time": "2026-05-01T09:06:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "code": "insight_warning",
    "message": "Talking points are delayed; transcription continues."
  }
}
```

***

## ML Failure Semantics [#ml-failure-semantics]

| Failure | Contract |
| --- | --- |
| ML WebSocket disconnects | Core reconnects with `last_audio_sequence` and `last_ml_event_id`. ML de-duplicates audio and resumes output when possible. Core sends `StreamStartEvent` with current speaker states and voice profiles on every reconnect. |
| ML cannot reconnect | Core continues browser audio capture and GCS chunk storage, then emits the Core `RecordingErrorEvent { code: "ml_unavailable" }`. |
| Upstream transcription provider slows down | ML emits `BackpressureEvent`; Core preserves audio and may pause live insights. ML emits `BackpressureClearedEvent` on recovery. |
| Speaker state changes while disconnected | Core persists the state to the database. On reconnect, Core sends `StreamStartEvent` with all current speaker states — ML reconstructs its in-memory map without needing a pull endpoint. |
| ML pod restarts mid-session | Core detects the WebSocket drop and reconnects (possibly to a new pod). `StreamStartEvent` includes speaker states and voice profiles. ML fetches recent transcript segments from `GET /transcriptions/{id}/segments?after_ms=...` to rebuild its LLM context window, then resumes processing. Context quality degrades gracefully — the rolling buffer rebuilds over subsequent segments. |
| Drain exceeds budget | ML answers the drain request with `202 Accepted`, then writes results back through Core's REST API once background completion finishes. |

***

## Event Versioning Policy [#event-versioning-policy]

All CloudEvents types use a `.v1` suffix (e.g., `com.wordloop.recording.start.v1`). The versioning policy:

* **Additive changes** (new optional fields, new event types) do not require a version bump. Consumers must ignore unknown fields.
* **Breaking changes** (removed fields, changed semantics, changed required fields) require a new version suffix (`.v2`). The old type continues to be emitted alongside the new type for one release cycle to allow consumer migration.
* **Deprecation**: A deprecated event type is annotated in the contract docs but continues to fire until all known consumers have migrated.

Consumers should be written defensively: parse known fields, ignore unknown fields, and tolerate missing optional fields.
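A minimal Python sketch of a defensive consumer under this policy, assuming the `.v<N>` suffix is always the final dot-separated segment of `type`. `StreamWarning`, `split_type`, and `parse_stream_warning` are hypothetical names, not taken from any Wordloop codebase. The reader pins the exact type and version it understands, reads only the fields it knows, and maps a missing optional field to `None`.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed naming convention: "<base>.v<N>", e.g. "com.wordloop.ml.stream.warning.v1".
_VERSIONED_TYPE = re.compile(r"^(?P<base>.+)\.v(?P<version>\d+)$")


@dataclass(frozen=True)
class StreamWarning:
    meeting_id: str
    code: str
    message: Optional[str] = None  # optional field: tolerate its absence


def split_type(event_type: str) -> Tuple[str, int]:
    """Split a versioned CloudEvents type into (base type, version number)."""
    match = _VERSIONED_TYPE.match(event_type)
    if match is None:
        raise ValueError(f"event type has no version suffix: {event_type!r}")
    return match.group("base"), int(match.group("version"))


def parse_stream_warning(raw: str) -> Optional[StreamWarning]:
    """Tolerant reader for StreamWarningEvent v1 (hypothetical helper).

    Returns None for any other type or version, e.g. a .v2 emitted alongside
    .v1 during a migration window. Unknown fields are simply never read.
    """
    envelope = json.loads(raw)
    base, version = split_type(envelope["type"])
    if base != "com.wordloop.ml.stream.warning" or version != 1:
        return None
    data = envelope.get("data") or {}
    return StreamWarning(
        meeting_id=data["meeting_id"],
        code=data["code"],
        message=data.get("message"),  # missing optional field -> None
    )
```

With this shape, a `.v2` event published alongside `.v1` is skipped by a `.v1`-only consumer instead of being misparsed, and new optional fields added under the additive-change rule never break it.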
***

## Observability Conventions [#observability-conventions]

Every service must include the following fields in structured log output for any operation related to a live recording session:

| Field | When present | Source |
| --- | --- | --- |
| `meeting_id` | Always | From the request or event |
| `ml_session_id` | During active recording | From `RecordingStartedEvent` or ML session |
| `sequence` | Audio chunk operations | From the chunk metadata |
| `transcription_id` | Transcript operations | From the transcription resource |
| `traceparent` | Always | From the incoming request/event |

These fields enable correlation of a single audio chunk or transcript segment across App → Core → ML → AssemblyAI → ML → Core → App, plus GCS writes and Pub/Sub messages.

***

## Recording Event History [#recording-event-history]

Core persists a `recording_event_history` table that logs every recording state transition and significant event:

| Column | Type | Description |
| --- | --- | --- |
| `id` | UUID | Event ID |
| `meeting_id` | UUID | Meeting reference |
| `event_type` | text | e.g., `started`, `stopped`, `error`, `gap_upload`, `compose_started`, `compose_completed` |
| `from_status` | text | Previous recording status (nullable for initial events) |
| `to_status` | text | New recording status |
| `metadata` | jsonb | Event-specific data (error codes, sequence numbers, chunk counts) |
| `created_at` | timestamptz | When the event occurred |

This table is write-only during normal operation. It is the primary diagnostic tool for investigating recording issues in production.

# Meeting (/docs/work/meeting-recording/tdd/contracts/meeting)

# Meeting [#meeting]

A meeting is the top-level entity. It represents a conversation — whether captured live, uploaded as a file, or created as ad-hoc notes.
A meeting owns its transcription, synthesis, tasks, and audio. Recording is available via `?expand=recording` for meetings that have one.

For shared concerns that apply across all entities — authentication, error format, idempotency, echo suppression — see [Infrastructure](infrastructure). For the full recording lifecycle — commands, events, binary audio, ML session, and gap repair — see [Recording](recording).

***

## Resource Shape [#resource-shape]

```json
{
  "id": "meeting-uuid",
  "title": "Weekly Product Review",
  "headline": "Rollout plan review",
  "source_type": "live",
  "start_time": "2026-05-01T09:00:00Z",
  "end_time": "2026-05-01T10:00:00Z",
  "created_at": "2026-05-01T08:59:00Z",
  "attendees": [],
  "notes": "## Action Items\n- Follow up with design team\n- Review rollout plan",
  "transcription": {
    "id": "transcription-uuid",
    "status": "completed"
  },
  "synthesis": {
    "summary": "The team aligned on rollout sequencing.",
    "topics": [],
    "talking_points": []
  }
}
```

`headline` is auto-generated by ML from the meeting content and present on all meetings regardless of source type. It is included in the compact list shape so it can be displayed when listing meetings. ML writes it via `PATCH /meetings/{id}` with service auth.

**Expand parameter:** `GET /meetings/{id}` supports `?expand=transcription,synthesis,tasks,attendees,recording` to control which nested resources are included. Without expansion, only the top-level fields and summary references (e.g., `transcription.id`, `transcription.status`) are returned. `recording` is expand-only — it is never present in the default or compact shapes. List endpoints (`GET /meetings`) always return the compact form.
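The `expand` rules can be sketched as a small parser plus one rendering step for the expand-only `recording` field. This is an illustrative Python sketch with hypothetical helper names (`parse_expand`, `add_recording`); rejecting unrecognised values is an assumption, since the contract does not say how they are handled. It also encodes the rule that an expanded `recording` is `null` for a meeting without one.

```python
import json
from typing import Any, Dict, FrozenSet, Optional

# Values listed for the `expand` query parameter of GET /meetings/{id}.
ALLOWED_EXPANSIONS: FrozenSet[str] = frozenset(
    {"transcription", "synthesis", "tasks", "attendees", "recording"}
)


def parse_expand(raw: Optional[str]) -> FrozenSet[str]:
    """Parse a comma-separated `expand` value into a set of expansions.

    Raising on unknown values is an assumed policy, not part of the contract.
    """
    if not raw:
        return frozenset()
    requested = {part.strip() for part in raw.split(",") if part.strip()}
    unknown = requested - ALLOWED_EXPANSIONS
    if unknown:
        raise ValueError(f"unknown expand values: {sorted(unknown)}")
    return frozenset(requested)


def add_recording(body: Dict[str, Any], recording: Optional[Dict[str, Any]],
                  expand: FrozenSet[str]) -> Dict[str, Any]:
    """`recording` is expand-only; when expanded it is null if none exists."""
    if "recording" in expand:
        body["recording"] = recording  # None serialises as JSON null
    return body
```

Keeping the allow-list in one constant means the same set can back request validation and the OpenAPI enum for the parameter.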
**Expanded: recording** — when `?expand=recording` is included and the meeting has a recording:

```json
{
  "recording": {
    "status": "completed",
    "started_at": "2026-05-01T09:00:00Z",
    "stopped_at": "2026-05-01T10:00:00Z",
    "stop_reason": "user_requested",
    "last_received_sequence": 36000,
    "audio_available": true
  }
}
```

If the meeting has no recording, the `recording` field is `null` even when expanded.

Valid `source_type` values: `live`, `upload`, `text`, `anecdotal`.

***

## REST API [#rest-api]

### `POST /meetings` [#post-meetings]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | Required |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `201 Created` with `Meeting` + `Location: /meetings/{id}` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "meeting", action: "created" }` |

```json
{
  "title": "Weekly Product Review",
  "source_type": "live",
  "start_time": "2026-05-01T09:00:00Z"
}
```

### `GET /meetings` [#get-meetings]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 MeetingList` |
| **Query params** | `cursor`, `limit`, `has_active_recording` **New**, `source_type` |

The compact list shape includes: `id`, `title`, `headline`, `source_type`, `start_time`, `end_time`, `created_at`, `attendees` (compact: id + display\_name only), and `transcription` (compact: id + status only). Recording, synthesis, and tasks are never included in the list shape — use the detail endpoint with `?expand` to fetch them.

**New:** `has_active_recording=true` filters to meetings with an active live recording — the app uses this as the read-only guard for disabling **Start Live Recording**.
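A sketch of the compact list projection and the `has_active_recording` filter, written against a hypothetical in-memory meeting dictionary. Counting only recording status `active` as an active recording is an assumption; statuses such as `stopping` could arguably count too.

```python
from typing import Any, Dict, Iterable, List, Optional

Meeting = Dict[str, Any]  # hypothetical in-memory meeting record

COMPACT_FIELDS = ("id", "title", "headline", "source_type",
                  "start_time", "end_time", "created_at")


def to_compact(meeting: Meeting) -> Meeting:
    """Project a meeting onto the compact list shape used by GET /meetings."""
    compact = {name: meeting.get(name) for name in COMPACT_FIELDS}
    compact["attendees"] = [
        {"id": a["id"], "display_name": a["display_name"]}
        for a in meeting.get("attendees", [])
    ]
    transcription = meeting.get("transcription")
    compact["transcription"] = (
        {"id": transcription["id"], "status": transcription["status"]}
        if transcription else None
    )
    # recording, synthesis and tasks are deliberately never copied.
    return compact


def list_meetings(meetings: Iterable[Meeting],
                  has_active_recording: Optional[bool] = None) -> List[Meeting]:
    """Apply the optional has_active_recording filter, then project to compact."""
    def is_active(m: Meeting) -> bool:
        recording = m.get("recording") or {}
        return recording.get("status") == "active"  # assumption: only `active` counts

    selected = [m for m in meetings
                if has_active_recording is None or is_active(m) == has_active_recording]
    return [to_compact(m) for m in selected]
```

Building the compact shape from an explicit allow-list, rather than deleting fields from the full record, guarantees that a field added to `Meeting` later does not leak into list responses by accident.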
### `GET /meetings/{id}` [#get-meetingsid]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 Meeting` |
| **Query params** | `expand` (comma-separated: `transcription`, `synthesis`, `tasks`, `attendees`, `recording`) |
| **Errors** | `404` meeting not found |

### `PATCH /meetings/{id}` [#patch-meetingsid]

| | |
| --- | --- |
| **Auth** | `bearerAuth` or service auth |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `200 Meeting` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "meeting", action: "updated" }` |

User update:

```json
{
  "notes": "## Action Items\n- Follow up with design team"
}
```

ML write-back (service auth):

```json
{
  "headline": "Rollout plan review"
}
```

### `DELETE /meetings/{id}` [#delete-meetingsid]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `204 No Content` |
| **Errors** | `409` active recording in progress |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "meeting", action: "deleted" }` |

### `POST /meetings/{id}/speaker-labels` — **New** [#post-meetingsidspeaker-labels--new]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | Required |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `200 SpeakerLabelAssignment` |
| **Errors** | `404` meeting not found; `422` person not found; `422` speaker label not present in the meeting |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "transcript_segment", action: "updated" }`. If a live session is active, sends `SpeakerStateUpdatedEvent` to ML over the ML WebSocket. |

Request:

```json
{
  "speaker_label": "speaker_1",
  "person_id": "person-uuid"
}
```

Response:

```json
{
  "meeting_id": "meeting-uuid",
  "speaker_label": "speaker_1",
  "person_id": "person-uuid",
  "state": "manual",
  "updated_segment_count": 27
}
```

For how speaker labels feed the identification pipeline, see [Person & Speaker Identity](person).

### `GET /meetings/{id}/audio-url` — **New** [#get-meetingsidaudio-url--new]

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 AudioPlaybackUrl` |
| **Cache-Control** | `private, no-store` |
| **Errors** | `404` meeting not found; `404` audio still composing or unavailable |

```json
{
  "url": "https://storage.googleapis.com/signed-url",
  "expires_at": "2026-05-01T10:00:00Z",
  "mime_type": "audio/webm",
  "duration_ms": 3610000
}
```

# Person & Speaker Identity (/docs/work/meeting-recording/tdd/contracts/person)

# Person & Speaker Identity [#person--speaker-identity]

People are speaker identities. They can be referenced by tasks (assignee) and transcript segments (speaker attribution). This page covers person CRUD, the speaker identification pipeline that resolves anonymous diarisation labels to known people, and voice profile management.

For shared semantics, see [Infrastructure](infrastructure).

**User-scoped identity:** People are scoped to the authenticated user. Each user maintains their own set of people — there is no cross-user sharing of person records or voice profiles. If User A records a meeting with Person X, and User B later records with the same real-world person, User B must create their own Person record. Voice profile enrichment applies only within the owning user's data. This is a deliberate simplification for v1; organisation-level identity sharing is out of scope.
## Resource Shape [#resource-shape]

```json
{
  "id": "person-uuid",
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "title": "Product Manager",
  "role": "Product",
  "company": "WordLoop",
  "email": "avery@example.com",
  "voice_confidence": 0.91,
  "voice_model_status": "ready",
  "tags": ["team-alpha"],
  "created_at": "2026-04-15T10:00:00Z",
  "updated_at": "2026-05-01T09:15:00Z"
}
```

Valid `voice_model_status` values: `untrained`, `training`, `ready`, `failed`.

## REST API [#rest-api]

### `GET /people` [#get-people]

Lists people for the authenticated user. Used for the speaker-labelling autocomplete.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 PersonList` |
| **Query params** | `cursor`, `limit`, `q` (search by name/email) |

### `POST /people` [#post-people]

Creates a person. Used during speaker labelling when the user adds a new person.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | Required |
| **Response** | `201 Created` with `Person` + `Location: /people/{id}` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "person", action: "created" }` |

```json
{
  "display_name": "Avery Chen",
  "full_name": "Avery Chen",
  "email": "avery@example.com"
}
```

### `GET /people/{id}` [#get-peopleid]

Returns a single person.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 Person` |
| **Errors** | `404` person not found |

### `PATCH /people/{id}` [#patch-peopleid]

Updates person metadata.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 Person` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "person", action: "updated" }` |

### `DELETE /people/{id}` [#delete-peopleid]

Deletes a person.
Transcript segments retain the `speaker_label` but clear the `person_id`.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "person", action: "deleted" }` |

***

## Speaker Identification Pipeline [#speaker-identification-pipeline]

During a live recording, AssemblyAI produces diarised transcript segments with anonymous labels (`speaker_1`, `speaker_2`). ML resolves these to known people through voice embedding comparison. The pipeline has four states:

| State | Behaviour |
| --- | --- |
| `unmatched` | Compare this segment's embedding against in-session voice profiles (pushed by Core). If confidence exceeds the threshold → transition to `matched`. Otherwise, increment attempts and retry on the next segment from this speaker. |
| `matched` | The speaker label is locked to a person. All future segments from this speaker are tagged immediately — no further voice comparison needed. |
| `exhausted` | After N failed attempts (configurable, e.g. 5 segments), stop comparing for this speaker. The raw `speaker_label` is preserved. The user can manually resolve it. |
| `manual` | Set when the user labels a speaker via `POST /meetings/{id}/speaker-labels` (see [Meeting](meeting)). Takes precedence over voice matching — ML will not attempt to match this speaker regardless of voice similarity. |

Manual speaker labelling is documented on the [Meeting](meeting) page. The REST fallback for pushing speaker state to ML during session recovery is documented on the [Recording](recording) page (`POST /meetings/{id}/live-session/speaker-states`).
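The four-state pipeline behaves like a small per-label state machine. This Python sketch assumes cosine similarity as the score, a strict `>` comparison against the threshold (the table says confidence must exceed it), the `0.88` threshold seen in the example payloads, and a limit of 5 attempts. `SpeakerTracker`, `on_segment`, and `set_manual` are hypothetical names, not ML's actual implementation.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Sequence

MATCH_THRESHOLD = 0.88  # from the example payloads; assumed, not normative
MAX_ATTEMPTS = 5        # "e.g. 5 segments" in the state table


class SpeakerState(str, Enum):
    UNMATCHED = "unmatched"
    MATCHED = "matched"
    EXHAUSTED = "exhausted"
    MANUAL = "manual"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed scoring function for voice embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class SpeakerTracker:
    """Per-speaker-label state held by ML during a live session (hypothetical)."""
    state: SpeakerState = SpeakerState.UNMATCHED
    person_id: Optional[str] = None
    attempt_count: int = 0

    def set_manual(self, person_id: str) -> None:
        # A user label overrides any voice-matching outcome, from any state.
        self.state = SpeakerState.MANUAL
        self.person_id = person_id

    def on_segment(self, embedding: Sequence[float],
                   profiles: Dict[str, Sequence[float]]) -> Optional[str]:
        """Handle one diarised segment; return the person_id to tag it with."""
        if self.state in (SpeakerState.MATCHED, SpeakerState.MANUAL):
            return self.person_id  # locked: no further comparison
        if self.state is SpeakerState.EXHAUSTED:
            return None  # keep the raw speaker_label
        best_id, best_score = None, float("-inf")
        for candidate_id, profile in profiles.items():
            score = cosine_similarity(embedding, profile)
            if score > best_score:
                best_id, best_score = candidate_id, score
        if best_id is not None and best_score > MATCH_THRESHOLD:
            self.state, self.person_id = SpeakerState.MATCHED, best_id
            return best_id
        # No confident match: count the attempt and give up after the limit.
        self.attempt_count += 1
        if self.attempt_count >= MAX_ATTEMPTS:
            self.state = SpeakerState.EXHAUSTED
        return None
```

Because `matched` and `manual` short-circuit before any comparison, locked speakers cost nothing per segment, and `set_manual` can be applied in any state, which is how a user label takes precedence over voice matching.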
***

## ML Integration [#ml-integration]

### Core → ML [#core--ml]

#### WebSocket: `SpeakerStateUpdatedEvent` [#websocket-speakerstateupdatedevent]

Keeps ML aligned with user speaker-label changes during the live session. `manual` state takes precedence over voice matching.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.speaker_state.updated.v1",
  "time": "2026-05-01T09:15:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "state": "manual",
    "person_id": "person-uuid"
  }
}
```

#### WebSocket: `VoiceProfilesUpdatedEvent` [#websocket-voiceprofilesupdatedevent]

Refreshes the in-session voice profile cache when Core enrolls or updates a profile while a recording is active.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.voice_profiles.updated.v1",
  "time": "2026-05-01T09:16:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "profiles": [
      {
        "person_id": "person-uuid",
        "embedding_model": "ecapa-tdnn-v1",
        "embedding": [0.12, -0.34]
      }
    ]
  }
}
```

### ML → Core [#ml--core]

#### WebSocket: `SpeakerMatchProducedEvent` [#websocket-speakermatchproducedevent]

Reports a confident speaker-to-person match. Core updates all matching segments and persists `meeting_speaker_states` as `matched`.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_match.v1",
  "time": "2026-05-01T09:04:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_1",
    "person_id": "person-uuid",
    "score": 0.93,
    "threshold": 0.88,
    "state": "matched"
  }
}
```

#### WebSocket: `SpeakerExhaustedEvent` [#websocket-speakerexhaustedevent]

Tells Core that ML has stopped trying to match an unknown speaker after the bounded attempt count.
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.speaker_exhausted.v1",
  "time": "2026-05-01T09:08:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "speaker_label": "speaker_2",
    "attempt_count": 5,
    "state": "exhausted"
  }
}
```

***

## Voice Profile Operations [#voice-profile-operations]

Voice profiles power speaker identification. Core stores person records; ML owns embedding extraction and matching semantics.

### `POST /voice-profiles/matches` [#post-voice-profilesmatches]

Compares a speaker embedding against enrolled voice profiles. Core explicitly supplies the candidates to compare against (`candidate_person_ids`).

| | |
| --- | --- |
| **Auth** | service auth |
| **Response** | `200 VoiceMatchResponse` |
| **Errors** | `422` invalid embedding; `503` embedding model unavailable |

**Request:**

```json
{
  "meeting_id": "meeting-uuid",
  "speaker_label": "speaker_1",
  "embedding_model": "ecapa-tdnn-v1",
  "embedding": [0.12, -0.34],
  "candidate_person_ids": ["person-uuid"],
  "top_k": 3
}
```

**Response:**

```json
{
  "matches": [
    {
      "person_id": "person-uuid",
      "score": 0.93,
      "threshold": 0.88,
      "decision": "matched"
    }
  ]
}
```

### `POST /voice-profiles` [#post-voice-profiles]

Creates or enriches a person's voice profile from post-meeting segment embeddings.

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Request Content-Type** | `multipart/form-data` for audio samples or `application/json` for segment references |
| **Response** | `201 Created` (new profile) or `200 OK` (existing profile enriched) with `VoiceProfile` + `Location: /voice-profiles/{person_id}` |

**Request:**

```json
{
  "person_id": "person-uuid",
  "meeting_id": "meeting-uuid",
  "segment_ids": ["segment-uuid"],
  "embedding_model": "ecapa-tdnn-v1"
}
```

**Response:**

```json
{
  "person_id": "person-uuid",
  "embedding_model": "ecapa-tdnn-v1",
  "sample_count": 12,
  "quality_score": 0.91,
  "updated_at": "2026-05-01T10:15:00Z"
}
```

# Recording (/docs/work/meeting-recording/tdd/contracts/recording)

# Recording [#recording]

Recording state is a sub-resource of Meeting. This page covers the full lifecycle: starting, stopping, resuming, ML session orchestration, and gap repair. For binary audio frame formats and transport, see [Audio](audio). For shared connection semantics, see [Infrastructure](infrastructure).

**Recording creation:** The recording resource is created as a side effect of `StartRecordingCommand` (see WebSocket commands below). There is no `POST /meetings/{id}/recording` endpoint — the recording lifecycle is entirely driven by WebSocket commands. The REST surface provides read-only access to recording state and chunk management for gap recovery.

## Resource Shape [#resource-shape]

```json
{
  "meeting_id": "meeting-uuid",
  "status": "active",
  "started_at": "2026-05-01T09:00:00Z",
  "stopped_at": null,
  "stop_reason": null,
  "last_received_sequence": 1842,
  "missing_sequences": [1801, 1802],
  "audio_object_prefix": "meetings/meeting-uuid/chunks/",
  "degraded_reasons": ["ml_unavailable"],
  "max_duration_seconds": 14400,
  "ml_session_id": "ml-session-uuid"
}
```

Valid statuses: `active`, `stopping`, `composing`, `completed`, `failed`.
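`last_received_sequence` and `missing_sequences` can both be derived from the set of chunk sequences Core has durably stored. The sketch below assumes sequences start at 1 (consistent with the examples on this page) and uses hypothetical helper names. `highest_contiguous` computes the same quantity that `AudioChunkStoredEvent` reports, and `trim_opfs` shows how a client could discard shadow-buffer chunks at or below that mark.

```python
from typing import Iterable, List, Set


def highest_contiguous(stored: Set[int]) -> int:
    """Largest n such that sequences 1..n are all stored (0 if 1 is missing)."""
    n = 0
    while n + 1 in stored:
        n += 1
    return n


def missing_sequences(stored: Set[int], last_sequence: int) -> List[int]:
    """Sequences in 1..last_sequence that have not been durably stored."""
    return [seq for seq in range(1, last_sequence + 1) if seq not in stored]


def trim_opfs(opfs_sequences: Iterable[int], contiguous: int) -> List[int]:
    """Client side: keep only shadow-buffer chunks above the contiguous mark."""
    return sorted(seq for seq in opfs_sequences if seq > contiguous)
```

Applied to the resource-shape example (chunks 1 to 1842 received, 1801 and 1802 lost), the largest stored sequence is 1842, the missing list is `[1801, 1802]`, and the contiguous watermark is 1800. A production implementation would likely track the watermark incrementally rather than rescanning the set; the linear scan here just keeps the logic visible.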
## REST API [#rest-api]

### `GET /meetings/{id}/recording` [#get-meetingsidrecording]

Returns the recording state for a meeting, including audio-chunk continuity and degradation state. Returns `404` if the meeting has never been recorded.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 MeetingRecording` |
| **Cache-Control** | `private, no-store` |
| **Errors** | `404` meeting not found or never recorded; `403` belongs to another user |

### `GET /meetings/{id}/recording/missing-chunks` [#get-meetingsidrecordingmissing-chunks]

Returns the chunk sequences Core has not durably stored in GCS. The app calls this after reconnect or stop to determine which OPFS chunks to upload.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 MissingChunkList` |
| **Cache-Control** | `private, no-store` |

```json
{
  "meeting_id": "meeting-uuid",
  "missing_sequences": [1801, 1802],
  "accepted_mime_types": ["audio/webm"],
  "max_chunk_bytes": 1048576
}
```

### `GET /meetings/{id}/recording/chunk-inventory` — **New** (Diagnostic) [#get-meetingsidrecordingchunk-inventory--new-diagnostic]

Returns the full chunk inventory for a recording. Admin/diagnostic use only — not called by the app during normal operation.

| | |
| --- | --- |
| **Auth** | service auth |
| **Response** | `200 ChunkInventory` |
| **Cache-Control** | `private, no-store` |

```json
{
  "meeting_id": "meeting-uuid",
  "total_chunks_stored": 36000,
  "highest_contiguous_sequence": 35998,
  "gaps": [35999, 36000],
  "total_bytes": 172800000,
  "composition_status": "pending",
  "first_chunk_at": "2026-05-01T09:00:00Z",
  "last_chunk_at": "2026-05-01T10:00:00Z"
}
```

### `POST /meetings/{id}/recording/chunks` [#post-meetingsidrecordingchunks]

Uploads OPFS gap chunks.
Core verifies `sha256`, de-duplicates by sequence, stores each chunk at `meetings/{meeting_id}/chunks/{sequence}.webm`, and returns the remaining gap set. Only `multipart/form-data` is accepted — no base64-encoded JSON.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | Required |
| **Request Content-Type** | `multipart/form-data` |
| **Response** | `200 GapUploadResult` + `Location: /meetings/{id}/recording` |
| **Errors** | `409` audio already composed; `413` chunk too large; `422` checksum mismatch |

Each part includes:

| Field | Type | Description |
| --- | --- | --- |
| `sequence` | integer | Monotonic chunk sequence number |
| `started_at_ms` | integer | Chunk start offset in milliseconds |
| `duration_ms` | integer | Chunk duration in milliseconds |
| `mime_type` | string | `audio/webm` |
| `sha256` | string | Hex-encoded SHA-256 of the audio bytes |
| `audio` | binary | Raw audio chunk bytes |

```json
{
  "meeting_id": "meeting-uuid",
  "accepted_sequences": [1801],
  "remaining_missing_sequences": [1802],
  "last_contiguous_sequence": 1842
}
```

## Real-Time Events [#real-time-events]

### Browser → Core [#browser--core]

#### `StartRecordingCommand` [#startrecordingcommand]

Starts a live recording for a meeting. If the user already has an active recording, Core returns `RecordingErrorEvent` with `code: "session_conflict"`.

```json
{
  "specversion": "1.0",
  "id": "command-uuid",
  "source": "wordloop-app/ws",
  "type": "com.wordloop.recording.start.v1",
  "time": "2026-05-01T09:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "client_recording_id": "browser-generated-uuid",
    "audio_config": {
      "encoding": "webm",
      "sample_rate": 48000,
      "channels": 1,
      "chunk_duration_ms": 100
    },
    "max_duration_seconds": 14400
  }
}
```

#### `StopRecordingCommand` [#stoprecordingcommand]

Stops the recording.
The app includes the last sequence written to OPFS so Core can report gaps precisely.

```json
{
  "specversion": "1.0",
  "id": "command-uuid",
  "source": "wordloop-app/ws",
  "type": "com.wordloop.recording.stop.v1",
  "time": "2026-05-01T10:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "last_client_sequence": 36000,
    "opfs_manifest_sha256": "hex-encoded-sha256"
  }
}
```

#### `ResumeRecordingCommand` — **New** [#resumerecordingcommand--new]

Sent by the app after a WebSocket reconnect during an active recording. Carries the client's last known sequence so Core can report the GCS gap.

```json
{
  "specversion": "1.0",
  "id": "command-uuid",
  "source": "wordloop-app/ws",
  "type": "com.wordloop.recording.resume.v1",
  "time": "2026-05-01T09:30:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "last_client_sequence": 18000
  }
}
```

### Core → Browser [#core--browser]

#### `RecordingStartedEvent` [#recordingstartedevent]

Confirms that Core, ML, storage, and transcription pre-warm are ready. The banner transitions from **Connecting** to **Recording**.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.started.v1",
  "time": "2026-05-01T09:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "ml_session_id": "ml-session-uuid",
    "started_at": "2026-05-01T09:00:00Z",
    "max_duration_seconds": 14400
  }
}
```

#### `RecordingResumedEvent` — **New** [#recordingresumedevent--new]

Sent in response to `ResumeRecordingCommand`. Tells the app where GCS stands so the app knows which OPFS chunks to upload.
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.resumed.v1",
  "time": "2026-05-01T09:30:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "last_stored_sequence": 17500,
    "missing_sequences": [17501, 17502],
    "ml_session_id": "ml-session-uuid"
  }
}
```

#### `GapUploadCompleteEvent` — **New** [#gapuploadcompleteevent--new]

Confirms that all gap chunks have been received and stored. The app can clear OPFS and resume normal operation.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.gap_upload_complete.v1",
  "time": "2026-05-01T09:31:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "last_stored_sequence": 18000
  }
}
```

#### `RecordingStoppedEvent` [#recordingstoppedevent]

Confirms recording has stopped. The client calls `GET /meetings/{id}/recording/missing-chunks` to determine which OPFS gap chunks to upload before audio composition can proceed.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.stopped.v1",
  "time": "2026-05-01T10:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "reason": "user_requested",
    "last_received_sequence": 35998,
    "last_client_sequence": 36000,
    "post_processing_started": true
  }
}
```

Valid reasons: `user_requested`, `duration_limit`, `connection_closed`, `server_shutdown`, `storage_failure`.

#### `RecordingDurationWarningEvent` [#recordingdurationwarningevent]

Warns the client before server-side auto-stop.
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.duration_warning.v1",
  "time": "2026-05-01T12:50:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "remaining_seconds": 600,
    "auto_stop_at": "2026-05-01T13:00:00Z"
  }
}
```

#### `RecordingErrorEvent` [#recordingerrorevent]

Reports degraded or failed recording conditions. Recoverable conditions include a paired recovery code (e.g., `ml_unavailable` → `ml_recovered`).

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.error.v1",
  "time": "2026-05-01T09:10:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "code": "ml_unavailable",
    "severity": "degraded",
    "message": "Live insights paused. Audio is still recording.",
    "retry_after_ms": 1000
  }
}
```

Valid codes: `ml_unavailable`, `ml_recovered`, `storage_unavailable`, `storage_recovered`, `insight_warning`, `transcoder_error`, `no_audio_detected`, `session_conflict`, `audio_checksum_mismatch`, `backpressure`.

#### `AudioChunkStoredEvent` — **New** [#audiochunkstoredevent--new]

Periodically reports the highest contiguous sequence number durably stored in GCS. The client uses this to trim the OPFS shadow buffer during normal operation; without it, OPFS would grow without bound during long sessions. Core emits this event every 10 seconds (or every 100 chunks, whichever comes first) during an active recording.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.audio_chunk_stored.v1",
  "time": "2026-05-01T09:05:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "highest_contiguous_sequence": 5000,
    "total_chunks_stored": 5000
  }
}
```

## ML Integration [#ml-integration]

### Core → ML [#core--ml]

#### REST: `POST /meetings/{id}/live-session` [#rest-post-meetingsidlive-session]

Creates and pre-warms the ML side of a live recording.
ML opens the upstream AssemblyAI session, loads speaker states and voice profiles (pushed by Core in the request body), prepares the insight pipeline, and returns the WebSocket URL.

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required; key maps to `(meeting_id, transcription_id)` |
| **Response** | `201 Created` with `MLLiveSession` + `Location: /meetings/{id}/live-session` |
| **Errors** | `409` active session already exists for meeting; `503` transcription provider unavailable |

**Request:**

```json
{
  "meeting_id": "meeting-uuid",
  "transcription_id": "transcription-uuid",
  "user_id": "user-uuid",
  "audio_config": {
    "encoding": "webm",
    "sample_rate": 48000,
    "channels": 1,
    "chunk_duration_ms": 100
  },
  "speaker_states": [
    {
      "speaker_label": "speaker_1",
      "state": "manual",
      "person_id": "person-uuid",
      "attempt_count": 0
    }
  ],
  "voice_profiles": [
    {
      "person_id": "person-uuid",
      "embedding_model": "ecapa-tdnn-v1",
      "embedding": [0.12, -0.34]
    }
  ],
  "insight_policy": {
    "talking_point_cadence_seconds": 30,
    "talking_point_cadence_segments": 5,
    "task_extraction": "live"
  }
}
```

**Response:**

```json
{
  "id": "ml-session-uuid",
  "meeting_id": "meeting-uuid",
  "status": "ready",
  "websocket_url": "wss://ml.internal/meetings/meeting-uuid/live-session/stream",
  "expires_at": "2026-05-01T13:00:00Z",
  "max_duration_seconds": 14400
}
```

#### REST: `GET /meetings/{id}/live-session` [#rest-get-meetingsidlive-session]

Returns ML's authoritative view of a live session. Core uses this for diagnostics and recovery decisions.

| | |
| --- | --- |
| **Auth** | service auth |
| **Response** | `200 MLLiveSessionStatus` |
| **Errors** | `404` no active session for meeting |

```json
{
  "id": "ml-session-uuid",
  "meeting_id": "meeting-uuid",
  "status": "streaming",
  "last_audio_sequence_received": 1842,
  "last_audio_sequence_processed": 1841,
  "last_output_event_id": "event-uuid",
  "speaker_state_count": 3,
  "context_buffer_segment_count": 45,
  "degraded_reasons": []
}
```

Valid statuses: `created`, `ready`, `streaming`, `draining`, `completed`, `failed`, `expired`.

#### REST: `POST /meetings/{id}/live-session/drain` [#rest-post-meetingsidlive-sessiondrain]

Requests a graceful drain of the live session. Core normally sends `DrainCommand` over the ML WebSocket first; this REST endpoint is the idempotent fallback for control-plane cleanup. Uses POST (not DELETE) to carry the request body and express intent clearly.

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `202 Accepted` while draining; `204 No Content` if already drained |
| **Errors** | `404` no active session for meeting |

```json
{
  "reason": "user_requested",
  "last_received_sequence": 35998,
  "audio_composed_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm"
}
```

#### REST: `POST /meetings/{id}/live-session/speaker-states` [#rest-post-meetingsidlive-sessionspeaker-states]

Pushes a speaker-state update to ML outside the live WebSocket path. The normal path is `SpeakerStateUpdatedEvent` over WebSocket; REST is the fallback when Core recovers a session and needs to reconcile state. For the full speaker identification pipeline, see [Person & Speaker Identity](person).

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `204 No Content` |
| **Errors** | `404` no active session; `409` session already completed |

```json
{
  "speaker_label": "speaker_1",
  "state": "manual",
  "person_id": "person-uuid",
  "updated_at": "2026-05-01T09:15:00Z"
}
```

#### WebSocket: `StreamStartEvent` [#websocket-streamstartevent]

Sent immediately after the WebSocket opens. Confirms the session and gives ML the replay point for reconnects. **Includes current speaker states and voice profiles** so ML can reconstruct its in-memory state on every connection, whether initial or a reconnect.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.stream.start.v1",
  "time": "2026-05-01T09:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "ml_session_id": "ml-session-uuid",
    "transcription_id": "transcription-uuid",
    "last_audio_sequence": 0,
    "last_ml_event_id": null,
    "speaker_states": [
      {
        "speaker_label": "speaker_1",
        "state": "manual",
        "person_id": "person-uuid",
        "attempt_count": 0
      }
    ],
    "voice_profiles": [
      {
        "person_id": "person-uuid",
        "embedding_model": "ecapa-tdnn-v1",
        "embedding": [0.12, -0.34]
      }
    ]
  }
}
```

#### WebSocket: `DrainCommand` [#websocket-draincommand]

Requests a graceful drain. ML closes upstream transcription, flushes final segments, emits `StreamDrainedEvent`, then closes the WebSocket.
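The drain order described above can be sketched as an asyncio coroutine on the ML side. This is a minimal illustration under stated assumptions, not the ML service's actual handler: `close_upstream`, `send_to_core`, and `close_socket` are hypothetical callables standing in for the provider client and the Core WebSocket, and only a subset of the `StreamDrainedEvent` fields is filled in.

```python
import asyncio


async def drain_live_session(close_upstream, send_to_core, close_socket,
                             meeting_id, ml_session_id):
    """Hypothetical ML-side handler for DrainCommand.

    close_upstream: coroutine function that closes the transcription provider
        stream and returns (final_segments, last_audio_sequence_processed).
    send_to_core: coroutine function that sends one event to Core.
    close_socket: coroutine function that closes the Core WebSocket.
    """
    # 1. Close upstream transcription and collect the segments it still held.
    final_segments, last_processed = await close_upstream()

    # 2. Flush those final segments before announcing the drain.
    for segment in final_segments:
        await send_to_core({
            "type": "com.wordloop.ml.transcript.segment.v1",
            "data": {"meeting_id": meeting_id, "segment": segment},
        })

    # 3. StreamDrainedEvent is the last event ML sends for the session.
    await send_to_core({
        "type": "com.wordloop.ml.stream.drained.v1",
        "data": {
            "meeting_id": meeting_id,
            "ml_session_id": ml_session_id,
            "last_audio_sequence_processed": last_processed,
        },
    })

    # 4. Only now close the socket.
    await close_socket()


async def demo():
    """Run the drain against stub callables and return the call order."""
    log = []

    async def close_upstream():
        log.append("upstream_closed")
        return [{"id": "segment-uuid", "is_final": True}], 35998

    async def send_to_core(event):
        log.append(event["type"])

    async def close_socket():
        log.append("socket_closed")

    await drain_live_session(close_upstream, send_to_core, close_socket,
                             "meeting-uuid", "ml-session-uuid")
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The ordering is the point of the sketch: because `StreamDrainedEvent` goes out only after every flushed segment, Core can treat it as the end-of-stream marker.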
```json
{
  "specversion": "1.0",
  "id": "command-uuid",
  "source": "wordloop-core/ml-ws",
  "type": "com.wordloop.ml.recording.drain.v1",
  "time": "2026-05-01T10:00:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "reason": "user_requested",
    "last_audio_sequence": 35998,
    "audio_composed_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm"
  }
}
```

### ML → Core [#ml--core]

#### WebSocket: `StreamReadyEvent` [#websocket-streamreadyevent]

Confirms ML has opened the upstream transcription stream and is ready to receive audio.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.stream.ready.v1",
  "time": "2026-05-01T09:00:01Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "ml_session_id": "ml-session-uuid"
  }
}
```

#### WebSocket: `StreamDrainedEvent` [#websocket-streamdrainedevent]

Final ML event for a live session. Core can now close the socket, verify final audio, and publish post-meeting work.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.stream.drained.v1",
  "time": "2026-05-01T10:00:10Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "ml_session_id": "ml-session-uuid",
    "last_audio_sequence_processed": 35998,
    "final_segment_count": 812,
    "closed_provider_sessions": ["assemblyai"]
  }
}
```

## Pub/Sub [#pubsub]

### `meeting.session.terminated.v1` [#meetingsessionterminatedv1]

Signals ML to drain and finalise a live streaming session. Core publishes this when the user stops, the duration limit is reached, or the client disappears.

| | |
| --- | --- |
| **Producer** | Core |
| **Consumer** | ML streaming coordinator |
| **CloudEvents type** | `com.wordloop.meeting.session.terminated.v1` |
| **Ordering key** | `meeting_id` |
| **Dead-letter** | `meeting-session-terminated-dlq` |

```json
{
  "ml_session_id": "ml-session-uuid",
  "meeting_id": "meeting-uuid",
  "user_id": "user-uuid",
  "reason": "user_requested",
  "last_received_sequence": 35998,
  "audio_storage_prefix": "gs://wordloop-audio/meetings/meeting-uuid/chunks/",
  "audio_composed_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm"
}
```

Valid reasons: `user_requested`, `duration_limit`, `connection_closed`, `server_shutdown`, `storage_failure`.

# Synthesis (/docs/work/meeting-recording/tdd/contracts/synthesis)

# Synthesis [#synthesis]

Synthesis artefacts are the ML-generated summary, topics, and talking points for a meeting. During a live session, talking points stream incrementally. After post-meeting processing, synthesis is atomically replaced with the final version. The headline is a separate meeting-level field; see [Meeting](meeting).

For shared semantics, see [Infrastructure](infrastructure).

## Resource Shape [#resource-shape]

```json
{
  "summary": "The team aligned on rollout sequencing and follow-up owners.",
  "topics": [
    {
      "id": "topic-uuid",
      "name": "Launch readiness",
      "summary": "Discussed go/no-go criteria for next week's launch.",
      "is_final": true,
      "segments": [{ "segment_id": "segment-uuid" }]
    }
  ],
  "talking_points": [
    {
      "id": "talking-point-uuid",
      "content": "Design review is the next blocker.",
      "is_final": true,
      "segments": [{ "segment_id": "segment-uuid" }],
      "topic_id": "topic-uuid"
    }
  ]
}
```

## REST API [#rest-api]

### `GET /meetings/{id}/synthesis` [#get-meetingsidsynthesis]

Returns the synthesis artefacts for a meeting.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 MeetingSynthesis` |
| **Errors** | `404` meeting not found or synthesis not yet generated |

### `PUT /meetings/{id}/synthesis` — ML Write-Back [#put-meetingsidsynthesis--ml-write-back]

Atomically replaces final synthesis artefacts: summary, topics, and final talking points.

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `SynthesisUpdatedEvent` and `EntityChangedEvent { entity: "meeting", action: "updated" }` |

```json
{
  "summary": "The team aligned on rollout sequencing and follow-up owners.",
  "topics": [
    { "title": "Launch readiness", "segment_ids": ["segment-uuid"] }
  ],
  "talking_points": [
    { "content": "Design review is the next blocker.", "segment_ids": ["segment-uuid"] }
  ]
}
```

### `GET /meetings/{id}/talking-points` [#get-meetingsidtalking-points]

Returns talking points for a meeting. During a live session, the list includes draft (non-final) talking points.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TalkingPointList` |

### `POST /meetings/{id}/talking-points` — ML Write-Back [#post-meetingsidtalking-points--ml-write-back]

Creates or updates a live talking point emitted by ML.
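A minimal sketch of the create-or-update behaviour, assuming the ML-supplied `id` in the request body is the upsert key (the contract implies this but does not state it). `TalkingPointStore` is a hypothetical in-memory stand-in for Core's persistence; idempotency-key handling and broadcasting are left out.

```python
from dataclasses import dataclass, field


@dataclass
class TalkingPointStore:
    """Hypothetical in-memory talking-point table keyed by (meeting_id, id)."""

    rows: dict = field(default_factory=dict)

    def upsert(self, meeting_id, body):
        """Return (status_code, talking_point, location) for an ML write-back."""
        key = (meeting_id, body["id"])
        # 201 for the first write of this id, 200 for every later update.
        status = 200 if key in self.rows else 201
        talking_point = {
            "id": body["id"],
            "content": body["content"],
            "segment_ids": list(body.get("segment_ids", [])),
            "is_final": bool(body.get("is_final", False)),
        }
        self.rows[key] = talking_point
        location = f"/meetings/{meeting_id}/talking-points/{body['id']}"
        return status, talking_point, location
```

Keying on the ML-assigned `id` lets a draft talking point be promoted to final in place rather than duplicated.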

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `201 Created` or `200 OK` with `TalkingPoint` + `Location: /meetings/{id}/talking-points/{tp_id}` |
| **Side effects** | Broadcasts `TalkingPointEvent` for live clients; `EntityChangedEvent` for cache revalidation |

```json
{
  "id": "talking-point-uuid",
  "content": "The rollout plan needs design review before launch.",
  "segment_ids": ["segment-uuid"],
  "is_final": false
}
```

### `GET /meetings/{id}/topics` [#get-meetingsidtopics]

Returns topics extracted from the meeting transcript.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TopicList` |

***

## Real-Time Events [#real-time-events]

### Core → Browser [#core--browser]

#### `TalkingPointEvent` [#talkingpointevent]

Streams a live talking point without forcing the client to re-fetch synthesis.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.meeting.talking_point.v1",
  "time": "2026-05-01T09:04:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "talking_point": {
      "id": "talking-point-uuid",
      "content": "The rollout plan needs design review before launch.",
      "segment_ids": ["segment-uuid"],
      "is_final": false
    }
  }
}
```

#### `SynthesisUpdatedEvent` — **New** [#synthesisupdatedevent--new]

Signals that synthesis artefacts (summary, topics, talking points) have been updated. Fired after `PUT /meetings/{id}/synthesis` completes. Clients should reload via `GET /meetings/{id}/synthesis`.
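On the client, this event is only a prompt to reload. The sketch below assumes a hypothetical `fetch_synthesis` wrapper around `GET /meetings/{id}/synthesis` and uses the event's `data.version` to skip duplicate or out-of-order notifications; treating `version` as monotonically increasing per meeting is an assumption.

```python
SYNTHESIS_UPDATED = "com.wordloop.meeting.synthesis.updated.v1"


class SynthesisView:
    """Hypothetical client-side holder for one meeting's synthesis."""

    def __init__(self, meeting_id, fetch_synthesis):
        self.meeting_id = meeting_id
        # Hypothetical callable wrapping GET /meetings/{id}/synthesis.
        self.fetch_synthesis = fetch_synthesis
        self.version = 0
        self.body = None

    def on_event(self, event):
        """Reload when a newer version is announced; return True if reloaded."""
        if event.get("type") != SYNTHESIS_UPDATED:
            return False
        data = event["data"]
        if data["meeting_id"] != self.meeting_id:
            return False
        # Assumes versions only increase: ignore repeats and stale events.
        if data["version"] <= self.version:
            return False
        self.body = self.fetch_synthesis(self.meeting_id)
        self.version = data["version"]
        return True
```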
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.meeting.synthesis.updated.v1",
  "time": "2026-05-01T10:05:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "version": 2
  }
}
```

***

## ML Integration [#ml-integration]

### ML → Core [#ml--core]

#### WebSocket: `TalkingPointProducedEvent` [#websocket-talkingpointproducedevent]

Emits a live talking point from the batched insight pipeline. Talking points and [tasks](task) are extracted by the same LLM structured-output call.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.talking_point.v1",
  "time": "2026-05-01T09:04:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "talking_point": {
      "id": "talking-point-uuid",
      "content": "The rollout plan needs design review before launch.",
      "segment_ids": ["segment-uuid"],
      "is_final": false
    }
  }
}
```

# Task (/docs/work/meeting-recording/tdd/contracts/task)

# Task [#task]

Tasks are action items associated with a meeting. They can be user-created or system-generated by ML during live recording. Tasks support nesting (sub-tasks), assignment to people, and due dates.

For shared semantics, see [Infrastructure](infrastructure).

## Resource Shape [#resource-shape]

```json
{
  "id": "task-uuid",
  "meeting_id": "meeting-uuid",
  "content": "Send the rollout plan to design",
  "status": "pending",
  "source": "system",
  "assigned_to": "person-uuid",
  "due_date": "2026-05-08",
  "parent_task_id": null,
  "sub_task_summary": { "total": 2, "completed": 1 },
  "created_at": "2026-05-01T09:04:30Z"
}
```

Valid `source` values: `user`, `system`. Editing a system-generated task promotes it to `user`.

Valid `status` values: `pending`, `completed`.

## REST API [#rest-api]

### `GET /meetings/{id}/tasks` [#get-meetingsidtasks]

Returns tasks for a specific meeting.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TaskList` |
| **Query params** | `cursor`, `limit` |

### `GET /tasks` [#get-tasks]

Lists all tasks across meetings for the authenticated user.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TaskList` |
| **Query params** | `cursor`, `limit`, `meeting_id`, `status`, `assigned_to` |

### `POST /tasks` [#post-tasks]

Creates a task. User-created tasks come from the app; system-generated tasks come from ML through Core.

| | |
| --- | --- |
| **Auth** | `bearerAuth` or service auth |
| **Idempotency** | Required |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `201 Created` with `Task` + `Location: /tasks/{id}` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "task", action: "created" }`. During live sessions, Core also emits `TaskEvent` with the full task for immediate UI insertion. |

```json
{
  "meeting_id": "meeting-uuid",
  "content": "Send the rollout plan to design",
  "assigned_to": "person-uuid",
  "due_date": "2026-05-08",
  "parent_task_id": null,
  "source": "user"
}
```

### `GET /tasks/{id}` [#get-tasksid]

Returns a single task with sub-task summary.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 Task` |
| **Errors** | `404` task not found |

### `PATCH /tasks/{id}` [#patch-tasksid]

Updates a task. Editing a system-generated task promotes it to `source: "user"`.
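The promotion rule can be expressed as a pure function over the task resource. This sketch assumes that any change to an editable field counts as an edit, status changes included; the `EDITABLE_FIELDS` set is an illustrative choice drawn from the resource shape, not a documented allow-list.

```python
EDITABLE_FIELDS = {"content", "status", "assigned_to", "due_date"}
VALID_STATUSES = {"pending", "completed"}


def apply_task_patch(task, patch):
    """Return a patched copy of `task` (hypothetical PATCH /tasks/{id} logic)."""
    unknown = set(patch) - EDITABLE_FIELDS - {"source"}
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    # Clients may restate source: "user" but can never demote a task to "system".
    if patch.get("source", "user") != "user":
        raise ValueError("source can only be set to 'user'")
    if "status" in patch and patch["status"] not in VALID_STATUSES:
        raise ValueError(f"invalid status: {patch['status']!r}")

    updated = dict(task)
    edited = EDITABLE_FIELDS & set(patch)
    for key in edited:
        updated[key] = patch[key]
    # Any accepted edit promotes a system-generated task to a user task.
    if edited:
        updated["source"] = "user"
    return updated
```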

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `200 Task` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "task", action: "updated" }` |

```json
{
  "content": "Send the rollout plan to design and engineering",
  "status": "completed",
  "source": "user"
}
```

### `DELETE /tasks/{id}` [#delete-tasksid]

Deletes a task and cascades to sub-tasks.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "task", action: "deleted" }` |

### `GET /tasks/{id}/sub-tasks` [#get-tasksidsub-tasks]

Returns sub-tasks for a parent task.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TaskList` |

### `POST /tasks/{id}/sub-tasks` [#post-tasksidsub-tasks]

Creates a sub-task nested under a parent task.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | Required |
| **Echo suppression** | `Client-Session-Id` optional |
| **Response** | `201 Created` with `Task` + `Location: /tasks/{sub_id}` |
| **Side effects** | Broadcasts `EntityChangedEvent { entity: "task", action: "created" }` |

***

## Real-Time Events [#real-time-events]

### Core → Browser [#core--browser]

#### `TaskEvent` [#taskevent]

Carries system-generated tasks produced during live recording. User-originated task mutations use only `EntityChangedEvent`, because the client's optimistic local state already holds the full payload.
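A client might route the two event kinds as sketched below: `TaskEvent` carries the full task, so it is inserted directly, while `EntityChangedEvent` is only a signal and triggers a re-fetch. `TaskListState` and `refetch_task` (a wrapper around `GET /tasks/{id}`) are hypothetical, and the `data.id` field on `EntityChangedEvent` is taken from the Core WebSocket contract later in this document rather than from this page.

```python
TASK_EVENT = "com.wordloop.meeting.task.v1"
ENTITY_CHANGED = "com.wordloop.entity.changed.v1"


class TaskListState:
    """Hypothetical client-side task list for one meeting."""

    def __init__(self, meeting_id, refetch_task):
        self.meeting_id = meeting_id
        # Hypothetical callable wrapping GET /tasks/{id}; returns a Task dict.
        self.refetch_task = refetch_task
        self.tasks = {}

    def handle(self, event):
        data = event["data"]
        if event["type"] == TASK_EVENT:
            # TaskEvent carries the full task: insert without a round trip.
            if data["meeting_id"] == self.meeting_id:
                task = data["task"]
                self.tasks[task["id"]] = task
        elif event["type"] == ENTITY_CHANGED and data["entity"] == "task":
            if data["action"] == "deleted":
                # Sub-tasks removed by the server-side cascade are not tracked here.
                self.tasks.pop(data["id"], None)
            else:
                # EntityChangedEvent is only a signal: fetch the current state.
                task = self.refetch_task(data["id"])
                if task.get("meeting_id") == self.meeting_id:
                    self.tasks[task["id"]] = task
```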
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.meeting.task.v1",
  "time": "2026-05-01T09:04:30Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "task": {
      "id": "task-uuid",
      "content": "Send the rollout plan to design",
      "assigned_to": null,
      "due_date": null,
      "parent_task_id": null,
      "source": "system",
      "status": "pending"
    }
  }
}
```

***

## ML Integration [#ml-integration]

### ML → Core [#ml--core]

#### WebSocket: `TaskProducedEvent` [#websocket-taskproducedevent]

Emits a live system task from the same structured-output call as [talking points](synthesis). Core persists the task via `POST /tasks` and fans out `TaskEvent` to the browser.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.task.v1",
  "time": "2026-05-01T09:04:30Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "task": {
      "id": "task-uuid",
      "content": "Send the rollout plan to design",
      "assigned_to": null,
      "due_date": null,
      "parent_task_id": null,
      "source": "system"
    }
  }
}
```

# Transcription (/docs/work/meeting-recording/tdd/contracts/transcription)

# Transcription [#transcription]

A transcription tracks the processing lifecycle for a meeting's audio. Each meeting has at most one transcription. Transcript segments are the individual speaker-attributed text fragments produced during live recording and refined during post-meeting batch processing.

For shared semantics, see [Infrastructure](infrastructure).

## Resource Shapes [#resource-shapes]

### Transcription [#transcription-1]

```json
{
  "id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "status": "transcribing",
  "status_message": "Batch transcription in progress",
  "progress_percent": 45,
  "is_degraded": false,
  "created_at": "2026-05-01T09:00:00Z",
  "updated_at": "2026-05-01T10:01:00Z"
}
```

Valid statuses: `pending`, `transcribing`, `synthesizing`, `completed`, `failed`.
* **`pending`** — created but processing has not started (e.g., waiting for an audio upload or the first byte of live audio).
* **`transcribing`** — batch transcription and diarisation are in progress.
* **`synthesizing`** — transcript is complete; headline, summary, topics, and talking points are being generated.
* **`completed`** — all artefacts are final.
* **`failed`** — processing failed; `status_message` carries the reason.

### Transcript Segment [#transcript-segment]

```json
{
  "id": "segment-uuid",
  "source_sequence": 1842,
  "revision": 2,
  "speaker_label": "speaker_1",
  "person_id": "person-uuid",
  "text": "Let's follow up tomorrow.",
  "start_ms": 183900,
  "end_ms": 185100,
  "confidence": 0.94,
  "is_final": true,
  "feature_vector": [0.12, -0.34]
}
```

`source_sequence` is assigned by ML as a monotonic counter per transcription session. It is independent of the audio chunk sequence number: the relationship between audio chunks and transcript segments is not 1:1 (one chunk may produce zero or multiple segments). Deduplication uses `(transcription_id, source_sequence, revision)`.

## REST API [#rest-api]

### `GET /meetings/{id}/transcriptions` [#get-meetingsidtranscriptions]

Lists transcriptions for a meeting (currently always 0 or 1).

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 TranscriptionList` |

### `GET /transcriptions/{id}` [#get-transcriptionsid]

Returns transcription metadata and processing status.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Response** | `200 Transcription` |
| **Errors** | `404` transcription not found |

### `GET /transcriptions/{id}/segments` [#get-transcriptionsidsegments]

Returns transcript segments with cursor-based pagination. Supports time-range filtering for audio-synced views and ML context recovery.
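A caller that needs every segment in a time window, such as ML rebuilding LLM context after a restart, has to follow the cursor until it runs out. This sketch assumes a hypothetical `fetch_page` wrapper that returns `(segments, next_cursor)` with `next_cursor` set to `None` on the last page; the actual name of the cursor field in `TranscriptSegmentList` is not specified here.

```python
def fetch_all_segments(fetch_page, after_ms=None, before_ms=None, limit=100):
    """Collect every page of GET /transcriptions/{id}/segments.

    fetch_page: hypothetical callable taking the query params as keyword
        arguments and returning (segments, next_cursor).
    """
    if not 1 <= limit <= 500:  # documented default 100, max 500
        raise ValueError("limit must be between 1 and 500")
    params = {"limit": limit}
    if after_ms is not None:
        params["after_ms"] = after_ms
    if before_ms is not None:
        params["before_ms"] = before_ms

    segments = []
    cursor = None
    while True:
        # The first request carries no cursor; later ones resume from it.
        if cursor is not None:
            params["cursor"] = cursor
        page, cursor = fetch_page(**params)
        segments.extend(page)
        if cursor is None:
            return segments
```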

| | |
| --- | --- |
| **Auth** | `bearerAuth` or service auth |
| **Response** | `200 TranscriptSegmentList` |
| **Query params** | `cursor`, `limit` (default 100, max 500), `after_ms`, `before_ms`, `is_final` |

The `after_ms` and `before_ms` parameters filter by segment `start_ms`, enabling ML to fetch recent segments for LLM context recovery after a pod restart.

### `POST /transcriptions/{id}/segments` — ML Write-Back [#post-transcriptionsidsegments--ml-write-back]

Appends live transcript segments during an active session. Used for low-latency durable writes.

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | De-duplicates by `(transcription_id, source_sequence, revision)` — `source_sequence` is ML-assigned (monotonic per session), not the audio chunk sequence number |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `TranscriptSegmentEvent` for live clients and `EntityChangedEvent { entity: "transcript_segment" }` for cache revalidation |

```json
{
  "segments": [
    {
      "id": "segment-uuid",
      "source_sequence": 1842,
      "revision": 1,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  ]
}
```

### `PUT /transcriptions/{id}/segments` — ML Write-Back [#put-transcriptionsidsegments--ml-write-back]

Atomically replaces all transcript segments after batch transcription completes. This is the post-meeting quality pass — a new transcript version.
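The atomic swap can be sketched with Python's built-in `sqlite3` module standing in for Core's database: a delete and a bulk insert inside one transaction, so readers see either the old transcript or the new one, never a mix. The `transcript_segments` schema here is a cut-down assumption, and the `409` live-session check is omitted.

```python
import sqlite3

# Cut-down, hypothetical schema for illustration only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS transcript_segments (
    id TEXT PRIMARY KEY,
    transcription_id TEXT NOT NULL,
    text TEXT NOT NULL,
    start_ms INTEGER NOT NULL
)
"""


def replace_segments(conn, transcription_id, segments):
    """Replace every segment of one transcription atomically."""
    # The connection context manager commits on success and rolls back if
    # anything inside raises, which also undoes the DELETE.
    with conn:
        conn.execute(
            "DELETE FROM transcript_segments WHERE transcription_id = ?",
            (transcription_id,),
        )
        conn.executemany(
            "INSERT INTO transcript_segments (id, transcription_id, text, start_ms)"
            " VALUES (?, ?, ?, ?)",
            [(s["id"], transcription_id, s["text"], s["start_ms"]) for s in segments],
        )
    return len(segments)
```

If building the new rows fails halfway (for example, a segment missing `text`), the rollback leaves the previous transcript version intact.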

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `204 No Content` |
| **Errors** | `404` transcription not found; `409` live session still active |
| **Side effects** | Broadcasts `TranscriptRevisedEvent` (not `EntityChangedEvent`) — clients must reload the full segment list |

### `PATCH /transcriptions/{id}/status` — ML Write-Back [#patch-transcriptionsidstatus--ml-write-back]

Updates processing state for the Meeting Summary progress indicator.

| | |
| --- | --- |
| **Auth** | service auth |
| **Response** | `204 No Content` |
| **Side effects** | Inserts a `transcription_status_history` row; broadcasts `EntityChangedEvent { entity: "transcription", action: "updated" }` |

```json
{
  "status": "synthesizing",
  "message": "Generating summary and talking points",
  "progress_percent": 75
}
```

***

## Real-Time Events [#real-time-events]

### Core → Browser [#core--browser]

#### `TranscriptSegmentEvent` [#transcriptsegmentevent]

Carries a full segment for immediate rendering. Interim segments are replaced in-place by later events with the same `id` and a higher `revision`.
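The in-place replacement rule amounts to a keyed upsert guarded by `revision`. `apply_segment_event` and `ordered` are hypothetical client helpers; holding segments in a dict keyed by `id` and ordering them by `start_ms` for display are assumptions about the app's state, not part of the contract.

```python
def apply_segment_event(segments, event):
    """Upsert the segment from a TranscriptSegmentEvent; return True if the view changed.

    A later event with the same id replaces the stored segment only when its
    revision is higher, so duplicate or stale deliveries are ignored.
    """
    incoming = event["data"]["segment"]
    current = segments.get(incoming["id"])
    if current is not None and incoming["revision"] <= current["revision"]:
        return False
    segments[incoming["id"]] = incoming
    return True


def ordered(segments):
    """Render order follows audio time, not arrival order."""
    return sorted(segments.values(), key=lambda s: (s["start_ms"], s["id"]))
```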
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment": {
      "id": "segment-uuid",
      "revision": 2,
      "source_sequence": 1842,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  }
}
```

#### `TranscriptRevisedEvent` — **New** [#transcriptrevisedevent--new]

Signals that the entire transcript has been replaced by a post-meeting quality pass. Clients must reload the full segment list via `GET /transcriptions/{id}/segments`. This replaces the ambiguous `EntityChangedEvent { entity: "transcript_segment" }` for the bulk replacement case.

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.transcript.revised.v1",
  "time": "2026-05-01T10:05:00Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment_count": 812,
    "version": 2
  }
}
```

***

## ML Integration [#ml-integration]

### ML → Core [#ml--core]

#### WebSocket: `TranscriptSegmentProducedEvent` [#websocket-transcriptsegmentproducedevent]

Emits an interim or final transcript segment. Core immediately fans this out to the app via WebSocket and persists it through Core REST/domain services.
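Core can make this path safe against redelivery by keying on `(transcription_id, source_sequence, revision)`, the same tuple the segment write-back de-duplicates on. In this sketch `SegmentIngest`, `fan_out`, and `persist` are hypothetical, and the in-memory `seen` set stands in for what would more plausibly be a unique constraint in the database.

```python
class SegmentIngest:
    """Hypothetical Core-side handler for TranscriptSegmentProducedEvent."""

    def __init__(self, fan_out, persist):
        self.fan_out = fan_out  # broadcasts TranscriptSegmentEvent to the app
        self.persist = persist  # durable write via Core's domain services
        self.seen = set()       # stand-in for a database unique constraint

    def handle(self, event):
        """Process one event; return False if it is a duplicate."""
        data = event["data"]
        segment = data["segment"]
        key = (data["transcription_id"], segment["source_sequence"], segment["revision"])
        if key in self.seen:
            # Same segment revision delivered again (e.g. after a reconnect).
            return False
        # Fan out first for latency, then persist. If persisting raises, the
        # key stays unmarked and a redelivery is processed again.
        self.fan_out(data["meeting_id"], segment)
        self.persist(data["transcription_id"], segment)
        self.seen.add(key)
        return True
```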
```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.transcript.segment.v1",
  "time": "2026-05-01T09:03:05Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "transcription_id": "transcription-uuid",
    "segment": {
      "id": "segment-uuid",
      "source_sequence": 1842,
      "revision": 1,
      "speaker_label": "speaker_1",
      "person_id": null,
      "text": "Let's follow up tomorrow.",
      "start_ms": 183900,
      "end_ms": 185100,
      "confidence": 0.94,
      "is_final": true
    }
  }
}
```

#### WebSocket: `SegmentFeaturesProducedEvent` [#websocket-segmentfeaturesproducedevent]

Sends feature vectors for speaker matching and later voice-profile enrichment. Core persists vectors but does not broadcast them to the browser. For how these feed the speaker identification pipeline, see [Person & Speaker Identity](person).

```json
{
  "specversion": "1.0",
  "id": "event-uuid",
  "source": "wordloop-ml/ws",
  "type": "com.wordloop.ml.segment_features.v1",
  "time": "2026-05-01T09:03:06Z",
  "traceparent": "00-...",
  "data": {
    "meeting_id": "meeting-uuid",
    "segment_id": "segment-uuid",
    "speaker_label": "speaker_1",
    "embedding_model": "ecapa-tdnn-v1",
    "embedding": [0.12, -0.34]
  }
}
```

### ML Batch Processing [#ml-batch-processing]

Batch processing handles post-meeting transcription and synthesis. Pub/Sub is the normal trigger; REST provides a deterministic control surface for Core and tests.

#### `POST /transcription-jobs/{id}/run` [#post-transcription-jobsidrun]

Starts or resumes a post-meeting transcription job.
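The start-or-resume rules, together with the `404` and `409` responses in the table for this endpoint, can be sketched as a small registry on the ML side. Everything here is hypothetical: `JobRegistry` and `register` are illustrative (the contract does not say how a job becomes known), and reusing the Transcription status names for job state is an assumption.

```python
from dataclasses import dataclass

# Assumption: ML job states reuse the Transcription status names.
RUNNING = {"transcribing", "synthesizing"}


@dataclass
class Job:
    job_id: str
    audio_version: int = 0
    status: str = "pending"


class JobRegistry:
    """Hypothetical ML-side bookkeeping behind POST /transcription-jobs/{id}/run."""

    def __init__(self):
        self.jobs = {}

    def register(self, job_id):
        # Placeholder: how a job becomes known is not specified.
        self.jobs.setdefault(job_id, Job(job_id))

    def run(self, job_id, audio_version):
        """Return (http_status, job) for a start-or-resume request."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, None
        same_version = job.audio_version == audio_version
        if job.status in RUNNING:
            # Resume the in-flight run, or refuse a different audio version.
            return (202, job) if same_version else (409, job)
        if job.status == "completed" and same_version:
            # Replaying a finished run for the same audio changes nothing.
            return 202, job
        # Pending, failed, or completed for older audio: (re)start the run.
        job.audio_version = audio_version
        job.status = "transcribing"
        return 202, job
```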

| | |
| --- | --- |
| **Auth** | service auth |
| **Idempotency** | Required |
| **Response** | `202 Accepted` with job status |
| **Errors** | `404` job unknown; `409` job already running with a different audio version |

```json
{
  "meeting_id": "meeting-uuid",
  "transcription_id": "transcription-uuid",
  "storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
  "audio_version": 2,
  "task_extraction_policy": "skip",
  "speaker_profile_policy": "enrich_after_completion"
}
```

#### `GET /transcription-jobs/{id}` [#get-transcription-jobsid]

Returns ML job progress for diagnostics. Core remains the user-facing source of truth for transcription status.

| | |
| --- | --- |
| **Auth** | service auth |
| **Response** | `200 MLTranscriptionJobStatus` |

```json
{
  "id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "status": "transcribing",
  "progress_percent": 45,
  "current_stage": "batch_transcription",
  "started_at": "2026-05-01T10:01:00Z",
  "completed_at": null
}
```

***

## Pub/Sub [#pubsub]

### `transcription-jobs` [#transcription-jobs]

Dispatches batch transcription and synthesis work to ML after an audio upload completes or a live recording has composed `audio.webm`. This is the single actionable trigger for post-meeting processing.
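Google Cloud Pub/Sub delivers messages at least once, so the worker should expect duplicates. A minimal sketch keyed on the `transcription_id` plus `audio_version` pair that this topic uses as its idempotency key; `TranscriptionJobConsumer` and `process` are hypothetical, and a real worker would keep processed keys in durable storage rather than in memory.

```python
import json


class TranscriptionJobConsumer:
    """Hypothetical ML worker guard for the transcription-jobs subscription."""

    def __init__(self, process):
        # Hypothetical callable that runs batch transcription and synthesis.
        self.process = process
        # In-memory for illustration; a real worker needs durable storage.
        self.completed = set()

    def on_message(self, payload):
        """Handle one message body (JSON bytes); return what happened."""
        event = json.loads(payload)
        key = (event["transcription_id"], event["audio_version"])
        if key in self.completed:
            # Redelivered message for work that already finished.
            return "duplicate"
        self.process(event)
        # Mark done only after processing succeeds, so a crash mid-run
        # leaves the message eligible for redelivery.
        self.completed.add(key)
        return "processed"
```

A newer `audio_version` for the same transcription is a different key, so re-uploaded or re-composed audio is processed again.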

| | |
| --- | --- |
| **Producer** | Core |
| **Consumer** | ML post-meeting worker |
| **CloudEvents type** | `com.wordloop.transcription.requested.v1` |
| **Ordering key** | `meeting_id` |
| **Idempotency** | `transcription_id` plus `audio_version` |
| **Dead-letter** | `transcription-jobs-dlq` |

```json
{
  "transcription_id": "transcription-uuid",
  "meeting_id": "meeting-uuid",
  "user_id": "user-uuid",
  "storage_path": "gs://wordloop-audio/meetings/meeting-uuid/audio.webm",
  "audio_version": 2,
  "source_type": "live",
  "task_extraction_policy": "skip",
  "speaker_profile_policy": "enrich_after_completion"
}
```

Valid `source_type` values: `upload`, `live`.

Valid `task_extraction_policy` values: `extract`, `skip`, `replace_system`. Live recordings use `skip` because the tasks already captured during the live session are preserved.

Valid `speaker_profile_policy` values: `enrich_after_completion`, `skip`. Controls whether ML updates voice profiles with session embeddings.

### Consumer Outcomes [#consumer-outcomes]

| Event | Consumer outcome |
| --- | --- |
| `transcription.requested` | ML downloads audio, runs batch transcription/synthesis, writes results to Core REST, and updates status transitions. |
| `meeting.session.terminated` | ML drains AssemblyAI, flushes final live segments via Core REST, and closes its ML WebSocket connection. |

# Pub/Sub Events (/docs/work/delivered/live-capture/03-tdd/contracts/core/pubsub)

# Pub/Sub Events [#pubsub-events]

### `TranscriptionJobCloudEvent` [#transcriptionjobcloudevent]

Dispatched by Core when recording stops (or audio upload completes). Consumed by the ML post-meeting worker.
```yaml
data:
  transcription_id: string  # required
  meeting_id: string        # required
  storage_path: string      # required
  user_id: string           # required
  skip_tasks: boolean       # default: false — set to true for live recordings
```

### `MeetingTerminatedCloudEvent` [#meetingterminatedcloudevent]

Dispatched by Core when a live recording stops. Consumed by ML to drain its AssemblyAI pipeline.

```yaml
data:
  session_id: string  # required
  meeting_id: string  # required
  user_id: string     # required
```

***

# Core REST API (/docs/work/delivered/live-capture/03-tdd/contracts/core/rest)

# Core REST API [#core-rest-api]

### `GET /meetings/{id}/audio-url` [#get-meetingsidaudio-url]

Returns a short-lived signed URL for direct audio playback from Cloud Storage.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Cache-Control** | `private, no-store` |
| **Response** | `200 { url: string, expires_at: string }` |
| **Errors** | `404` meeting not found; `404` meeting has no audio |

***

### `POST /meetings` [#post-meetings]

Creates a meeting. Set `source_type: "recording"` for live recordings.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | `Idempotency-Key: ` header required |
| **Echo suppression** | `X-Client-Id: ` header |
| **Response** | `201 Created` with full `Meeting` body |

***

### `PATCH /meetings/{id}` [#patch-meetingsid]

Partially updates meeting metadata. Used by ML to set the `headline` after post-meeting processing.

| | |
| --- | --- |
| **Auth** | `bearerAuth` or `serviceAuth` |
| **Echo suppression** | `X-Client-Id: ` header |
| **Response** | `200` with full `Meeting` body |
| **Side effects** | Broadcasts `EntityChanged { entity: meeting, action: updated }` |

***

### `POST /meetings/{id}/speaker-labels` [#post-meetingsidspeaker-labels]

Action endpoint that associates an AssemblyAI speaker label with a known Person. Triggers a bulk `person_id` update across all segments sharing that label and enriches the Person's voice embedding.

| | |
| --- | --- |
| **Auth** | `bearerAuth` |
| **Idempotency** | `Idempotency-Key: ` header required |
| **Echo suppression** | `X-Client-Id: ` header |
| **Response** | `200 { updated_count: integer }` |
| **Errors** | `404` meeting not found; `422` `person_id` does not exist; `422` `speaker_label` not found in this meeting's transcript |

**Request body:**

```json
{
  "speaker_label": "Speaker A",
  "person_id": "uuid"
}
```

**Side effects:**

* All `transcript_segments` with `speaker_label = "Speaker A"` receive `person_id`
* Person's `voice_vector` updated from aggregated segment `feature_vector` values
* `EntityChanged { entity: transcript_segment, action: updated }` broadcast (echo-suppressed)
* `EntityChanged { entity: person, action: updated }` broadcast (echo-suppressed)

***

### `PUT /meetings/{id}/synthesis` [#put-meetingsidsynthesis]

Atomically replaces the meeting's AI-generated synthesis. Called by ML after post-meeting processing.

| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Echo suppression** | `X-Client-Id: ` header |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `EntityChanged { entity: meeting, action: updated }` |

***

### `POST /meetings/{id}/talking-points` [#post-meetingsidtalking-points]

Creates a talking point for a meeting. Called by ML to deliver live insights per finalised segment.

| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Idempotency** | `Idempotency-Key: ` header required |
| **Echo suppression** | `X-Client-Id: ` header |
| **Response** | `201 Created` with full `TalkingPoint` body |
| **Side effects** | Broadcasts `TalkingPointEvent` on WebSocket |

***

### `POST /transcriptions/{transcriptionId}/segments` [#post-transcriptionstranscriptionidsegments]

Appends transcript segments to a transcription. Used on the live path.

| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Response** | `204 No Content` |
| **Side effects** | Broadcasts `TranscriptSegmentEvent` on WebSocket for each final segment |

***

### `PUT /transcriptions/{transcriptionId}/segments` [#put-transcriptionstranscriptionidsegments]

Atomically replaces all segments for a transcription. Used by the post-meeting batch worker.
| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Response** | `204 No Content` |
| **Errors** | `404` transcription not found; `409` transcription is still in `processing` state |
| **Side effects** | Deletes existing segments, inserts new set, broadcasts `EntityChanged { entity: transcript_segment }` |

***

### `DELETE /meetings/{meetingId}/tasks?source=system` [#delete-meetingsmeetingidtaskssourcesystem]

Deletes all system-generated tasks for a meeting. Used by the post-meeting worker when re-extracting tasks.

| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Response** | `204 No Content` |

***

# Core WebSocket API (/docs/work/delivered/live-capture/03-tdd/contracts/core/websocket)

# Core WebSocket API [#core-websocket-api]

All events follow the CloudEvents v1.0 envelope. Every event that originates from a user action includes `sourceClientId` at the envelope root; the originating client discards events where `sourceClientId` matches its own `X-Client-Id`.

### `EntityChangedEvent` (Server → Client) [#entitychangedevent-server--client]

Cache-invalidation signal. Clients re-fetch the affected entity via REST on receipt.

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.entity.changed.v1",
  "time": "",
  "sourceClientId": "",
  "data": {
    "entity": "meeting | person | task | note | transcript_segment | talking_point",
    "action": "created | updated | deleted",
    "id": ""
  }
}
```

### `TranscriptSegmentEvent` (Server → Client) [#transcriptsegmentevent-server--client]

Live transcript segment during recording. Carries the full segment payload inline.
```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.transcript.segment.v1",
  "time": "",
  "data": {
    "meeting_id": "",
    "segment": {
      "id": "",
      "speaker_label": "Speaker A",
      "person_id": "",
      "text": "string",
      "start_ms": 1200,
      "end_ms": 2400,
      "confidence": 0.96,
      "is_final": true
    }
  }
}
```

### `TalkingPointEvent` (Server → Client) [#talkingpointevent-server--client]

Streamed per finalised segment during live recording. Carries the full payload inline.

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.meeting.talking_point.v1",
  "time": "",
  "sourceClientId": "",
  "data": {
    "meeting_id": "",
    "talking_point": {
      "id": "",
      "content": "string",
      "is_final": true,
      "segments": [""]
    }
  }
}
```

### `TaskEvent` (Server → Client) [#taskevent-server--client]

Streamed in batches (\~60s) during live recording.

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.meeting.task.v1",
  "time": "",
  "sourceClientId": "",
  "data": {
    "meeting_id": "",
    "task": {
      "id": "",
      "content": "string",
      "source": "system"
    }
  }
}
```

### `RecordingStartedEvent` (Server → Client) [#recordingstartedevent-server--client]

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.started.v1",
  "time": "",
  "data": {
    "meeting_id": "",
    "session_id": ""
  }
}
```

### `RecordingStoppedEvent` (Server → Client) [#recordingstoppedevent-server--client]

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.stopped.v1",
  "time": "",
  "data": {
    "meeting_id": ""
  }
}
```

### `RecordingDegradedEvent` (Server → Client) [#recordingdegradedevent-server--client]

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-core/ws",
  "type": "com.wordloop.recording.degraded.v1",
  "time": "",
  "data": {
    "meeting_id": "",
    "reason": "string"
  }
}
```

### `StartRecordingCommand` (Client → Server) [#startrecordingcommand-client--server]

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-app/ws",
  "type": "com.wordloop.recording.start.v1",
  "time": "",
  "data": {
    "meeting_id": "",
    "audio_config": {
      "encoding": "pcm16 | webm | mp3",
      "sample_rate": 16000,
      "channels": 1
    }
  }
}
```

### `StopRecordingCommand` (Client → Server) [#stoprecordingcommand-client--server]

```json
{
  "specversion": "1.0",
  "id": "",
  "source": "wordloop-app/ws",
  "type": "com.wordloop.recording.stop.v1",
  "time": "",
  "data": {
    "meeting_id": ""
  }
}
```

***

# ML NDJSON Stream API (/docs/work/delivered/live-capture/03-tdd/contracts/ml/ndjson-stream)

# ML HTTP Streaming API [#ml-http-streaming-api]

The ML service exposes a bidirectional HTTP streaming API: Core delivers audio through separate `POST` requests, and ML streams events back on a single long-lived response. Core opens the session when it receives `StartRecordingCommand` and holds the response open for the duration of the recording.

### `POST /streaming/sessions` [#post-streamingsessions]

Creates a streaming session. The HTTP response body remains open.

| | |
| --- | --- |
| **Auth** | `serviceAuth` (Core → ML, mTLS service token) |
| **Request Content-Type** | `application/json` |
| **Body** | `{ meeting_id, transcription_id, audio_config }` |
| **Response** | `201 Created` with `Content-Type: application/x-ndjson`; the first line of the body is a JSON object `{ session_id }`, followed by the streamed events |
| **Errors** | `409` a session for this `meeting_id` already exists; `503` AssemblyAI unavailable |

### `POST /streaming/sessions/{session_id}/audio` [#post-streamingsessionssession_idaudio]

Delivers a binary audio chunk to the active session.
| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Request Content-Type** | `application/octet-stream` |
| **Body** | Raw binary audio chunk |
| **Response** | `204 No Content` |
| **Errors** | `404` session not found; `410` session has been terminated |

### `DELETE /streaming/sessions/{session_id}` [#delete-streamingsessionssession_id]

Terminates the session cleanly. ML drains the AssemblyAI buffer and closes the streaming response.

| | |
| --- | --- |
| **Auth** | `serviceAuth` |
| **Response** | `204 No Content` |
| **Errors** | `404` session not found |

### Streaming Response Envelope [#streaming-response-envelope]

ML writes NDJSON events to the open response stream. Each newline-terminated line is one event.

```json
{ "type": "transcript_segment", "data": { "segment_id": "", "speaker_label": "Speaker A", "text": "Hello world", "start_ms": 1200, "end_ms": 2400, "confidence": 0.96, "is_final": true } }
{ "type": "feature_vector", "data": { "segment_id": "", "vector": [0.12, -0.34, 0.77] } }
{ "type": "speaker_match", "data": { "segment_id": "", "person_id": "", "score": 0.91 } }
{ "type": "talking_point", "data": { "id": "", "content": "Discussed Q3 roadmap", "is_final": false, "segments": [""] } }
{ "type": "task", "data": { "id": "", "content": "Follow up with design team", "source": "system" } }
```

Core routes each event type to the appropriate handler:

| Event type | Core action |
| --- | --- |
| `transcript_segment` | Broadcast `TranscriptSegmentEvent` on WS → async DB insert |
| `feature_vector` | Async DB update on segment (no WS broadcast) |
| `speaker_match` | Async DB update → broadcast `EntityChanged { entity: transcript_segment }` |
| `talking_point` | Broadcast `TalkingPointEvent` on WS → async DB upsert |
| `task` | Broadcast `TaskEvent` on WS → async DB insert |

***

# Full Meeting Experience (/docs/work/delivered/live-capture/03-tdd/milestones/full-experience)

# Milestone 3: Full Meeting Experience [#milestone-3-full-meeting-experience]

> **Integration Lead**: App engineer
> **Combines**: Milestone 2 + Slice 5 (App)

The complete meeting recording experience: record → live insights → post-meeting reprocessing → play back audio with synchronised transcript. The bet is complete when all 12 user stories pass.

**End-to-end tests:**

| Test | Assertion |
| --- | --- |
| Full upload → reprocess → playback flow | User records meeting, waits for reprocessing, plays audio, transcript highlights sync |
| Transcript click-to-seek | Clicking a transcript segment seeks audio to that position |
| Playback speed control | Audio plays at 0.5×, 1×, 1.5×, 2× — transcript sync maintains accuracy at all speeds |
| Speaker names in playback | Resolved person names display on transcript segments during playback |
| All 12 user stories pass | Manual walkthrough of every acceptance criterion from the [User Flow](user-flow) |

***

# App — Playback UI (/docs/work/delivered/live-capture/03-tdd/milestones/full-experience/slice-app)

# Slice 5: App — Playback UI + Transcript Synchronisation [#slice-5-app--playback-ui--transcript-synchronisation]

> **Owner**: App engineer
> **Domain**: App
> **Complexity**: M
> **Status**: 🔧 In Progress
> **Prerequisite**: Slice 4 merged

Audio playback with synchronised transcript highlighting.
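The highlight rule in the task list below (`start_ms ≤ currentTime < end_ms`) mixes units: an HTML audio element reports `currentTime` in seconds, while segments store milliseconds. The sketch below is in Python for illustration only (the App is Next.js); it converts once and binary-searches the sorted start times. The function names are hypothetical, and it assumes segments are sorted by `start_ms` and do not overlap.

```python
from bisect import bisect_right
from typing import Optional


def active_segment_index(segments: list, current_time_s: float) -> Optional[int]:
    """Index of the segment with start_ms <= t < end_ms, or None in a gap.

    Assumes `segments` is sorted by start_ms and non-overlapping.
    `current_time_s` is the audio element's currentTime, in seconds.
    """
    t_ms = current_time_s * 1000
    starts = [s["start_ms"] for s in segments]
    i = bisect_right(starts, t_ms) - 1  # last segment starting at or before t
    if i >= 0 and t_ms < segments[i]["end_ms"]:
        return i
    return None


def seek_target_s(segment: dict) -> float:
    """Click-to-seek: the value to assign to audio.currentTime."""
    return segment["start_ms"] / 1000
```

In a real component the `starts` list would be computed once per transcript rather than on every playback tick.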
### Tasks [#tasks]

* [x] Core: `GET /meetings/{id}/audio-url` and `GET /meetings/{id}/audio` (dev proxy)
* [x] App: `AudioPlayer` component — play/pause, ±10s skip, seek bar, playback speed
* [x] App: transcript sync — highlight segment where `start_ms ≤ currentTime < end_ms`
* [x] App: click-to-seek — clicking segment sets audio position
* [x] App: display resolved person names on segments
* [ ] App: auto-scroll — keep active transcript segment in view
* [ ] App: handle audio URL expiry — re-fetch before expiry
* [ ] App: regenerate OpenAPI client to include `person_id` on `TranscriptionSegment`

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Audio player renders with controls | `test_app` | Play/pause, skip, speed, and seek bar visible |
| Transcript highlight syncs with playback | `test_app` | Segment with matching time range has active style |
| Click-to-seek works | `test_app` | Clicking timestamp sets `audio.currentTime` to segment's `start_ms / 1000` |
| Auto-scroll follows playback | `test_app` | Active segment scrolled into viewport during playback |
| Speaker names displayed | `test_app` | Segment with `person_id` shows person name, not speaker label |
| Full playback flow end-to-end | `test_system` | User opens completed meeting, plays audio, sees synced transcript |

# Full Pipeline Operational (/docs/work/delivered/live-capture/03-tdd/milestones/full-pipeline)

# Milestone 2: Full Pipeline Operational [#milestone-2-full-pipeline-operational]

> **Integration Lead**: Core engineer
> **Combines**: Milestone 1 + Slice 4 (Core + ML)

After recording stops, the system automatically re-processes the meeting: a higher-accuracy transcript replaces the live version, talking points are finalised, a headline is generated, and speakers are identified by voice-profile matching.
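Slice 4 tests this step as a cosine-similarity match that sets a segment's `person_id`, and the ML stream's `speaker_match` event carries a `score`. A minimal Python sketch of such a match, assuming one query vector per speaker and a dict of enrolled voice profiles; the `0.8` threshold and the helper names are hypothetical, not values taken from the platform.

```python
import math
from typing import Optional


def cosine_similarity(a: list, b: list) -> float:
    """cos(a, b) = a.b / (|a| |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(query: list, voice_profiles: dict, threshold: float = 0.8) -> Optional[tuple]:
    """Return (person_id, score) for the most similar enrolled profile,
    or None when no profile reaches the (hypothetical) threshold."""
    best = None
    for person_id, voice_vector in voice_profiles.items():
        score = cosine_similarity(query, voice_vector)
        if best is None or score > best[1]:
            best = (person_id, score)
    if best is not None and best[1] >= threshold:
        return best
    return None
```

Returning `None` leaves the segment's `speaker_label` in place, which fits the expectation that only segments attributed to enrolled voice profiles show person names.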
**End-to-end tests:**

| Test | Assertion |
| --- | --- |
| Post-meeting reprocessing completes | After recording stops, transcript segments are replaced with a higher-accuracy version within 60s |
| Talking points finalised | Talking points show `is_final: true` after reprocessing |
| Headline generated automatically | Meeting has a non-null `headline` after reprocessing |
| Live tasks preserved | Both user-created and system-extracted tasks from the live session remain unchanged |
| Speaker identification | Segments attributed to enrolled voice profiles show person names |
| UI updates in real time | Each reprocessed artefact appears in the UI without page refresh (WebSocket events) |

***

# Core — Post-Meeting (/docs/work/delivered/live-capture/03-tdd/milestones/full-pipeline/slice-core)

# Slice 4: Post-Meeting Reprocessing + Speaker ID [#slice-4-post-meeting-reprocessing--speaker-id]

> **Owner**: Core engineer
> **Domain**: Core
> **Complexity**: M
> **Status**: ✅ Done

The automatic pipeline that runs after recording stops.
### Tasks [#tasks]

* [x] publish `TranscriptionJobMessage` and `MeetingTerminatedMessage` on session stop
* [x] consume `MeetingSessionTerminated` — drain AssemblyAI buffer; send final segments via REST
* [x] ML: skip task extraction when existing live tasks present
* [x] Core: `POST /transcriptions/{id}/remap-speaker` and `POST /transcriptions/{id}/identify-speakers`
* [x] ML: speaker centroid computation + identify + remap pipeline

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Post-meeting pipeline triggers automatically | `test_system` | Recording stop → transcript segments replaced with a higher-accuracy version |
| Tasks from live session preserved | `test_system` | After post-meeting processing, user-created and system tasks still present |
| Speaker labels resolved to `person_id` | `test_core` | Cosine similarity match → segment `person_id` updated |
| Talking points promoted to `is_final: true` | `test_system` | After post-meeting processing, talking points have `is_final: true` |
| Headline generated | `test_system` | Meeting has non-null `headline` after post-meeting processing |

***

# ML — Post-Meeting (/docs/work/delivered/live-capture/03-tdd/milestones/full-pipeline/slice-ml)

# Slice 4: Post-Meeting Reprocessing + Speaker ID [#slice-4-post-meeting-reprocessing--speaker-id]

> **Owner**: ML engineer
> **Domain**: ML
> **Complexity**: M
> **Status**: ✅ Done

The automatic pipeline that runs after recording stops.
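The task list below names a speaker centroid computation without specifying the aggregation. One plausible reading, sketched here in Python with a hypothetical helper, is the arithmetic mean of every segment `feature_vector` that shares a `speaker_label`; segments lacking either field are skipped.

```python
def speaker_centroids(segments: list) -> dict:
    """Average the feature vectors of all segments sharing a speaker_label.

    Returns {speaker_label: centroid}. Assumes every vector has the same length.
    """
    sums = {}
    counts = {}
    for seg in segments:
        label = seg.get("speaker_label")
        vec = seg.get("feature_vector")
        if label is None or vec is None:
            continue  # unlabeled or not-yet-embedded segment
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [acc + x for acc, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [x / counts[label] for x in total] for label, total in sums.items()}
```

Each resulting centroid is a single vector per label that can then be compared against enrolled voice profiles.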
### Tasks [#tasks]

* [x] publish `TranscriptionJobMessage` and `MeetingTerminatedMessage` on session stop
* [x] consume `MeetingSessionTerminated` — drain AssemblyAI buffer; send final segments via REST
* [x] ML: skip task extraction when existing live tasks present
* [x] Core: `POST /transcriptions/{id}/remap-speaker` and `POST /transcriptions/{id}/identify-speakers`
* [x] ML: speaker centroid computation + identify + remap pipeline

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Post-meeting pipeline triggers automatically | `test_system` | Recording stop → transcript segments replaced with a higher-accuracy version |
| Tasks from live session preserved | `test_system` | After post-meeting processing, user-created and system tasks still present |
| Speaker labels resolved to `person_id` | `test_core` | Cosine similarity match → segment `person_id` updated |
| Talking points promoted to `is_final: true` | `test_system` | After post-meeting processing, talking points have `is_final: true` |
| Headline generated | `test_system` | Meeting has non-null `headline` after post-meeting processing |

***

# Live Recording Operational (/docs/work/delivered/live-capture/03-tdd/milestones/live-recording)

# Milestone 1: Live Recording Operational [#milestone-1-live-recording-operational]

> **Integration Lead**: App engineer (closest to the user)
> **Combines**: Slice 1 (Core) + Slice 2 (ML) + Slice 3 (App)

The user can start a recording, watch live transcription and AI insights stream in, add manual tasks, and stop the recording. The core value proposition of the bet is functional.
**End-to-end tests:**

| Test | Assertion |
| --- | --- |
| Full live recording flow | User starts recording → sees live transcript → sees talking points → adds task → stops recording → meeting saved |
| Multi-speaker recording | Recording with 2+ speakers shows distinct speaker labels on segments |
| Graceful degradation | With ML unavailable: recording continues, audio captured, degraded banner shown |
| Duration limit enforcement | After configurable limit, recording auto-stops and post-processing triggers |
| Concurrent session prevention | Starting a second recording while one is active → error shown |

***

# App — Live UI (/docs/work/delivered/live-capture/03-tdd/milestones/live-recording/slice-app)

# Slice 3: App — Live Recording UI + Audio Streaming [#slice-3-app--live-recording-ui--audio-streaming]

> **Owner**: App engineer
> **Domain**: App
> **Complexity**: L
> **Prerequisite**: Slice 1 merged and `./dev gen all` run

The user-facing live recording experience.
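The task list below pairs optimistic task creation with echo suppression. A minimal Python sketch of that interaction (the App itself is Next.js, and the `TaskStore` class is hypothetical): the client shows the task immediately, then drops the `TaskEvent` echo whose `sourceClientId` matches its own `X-Client-Id`, while still applying events from other clients and events without a `sourceClientId`.

```python
import uuid


class TaskStore:
    """Hypothetical client-side task list: optimistic insert + echo suppression."""

    def __init__(self) -> None:
        # Sent as the X-Client-Id header on every mutation this client makes.
        self.client_id = str(uuid.uuid4())
        self.tasks = {}

    def add_optimistic(self, task_id: str, content: str) -> None:
        # Rendered immediately, before the server responds.
        self.tasks[task_id] = content

    def on_task_event(self, event: dict) -> bool:
        """Apply a TaskEvent unless it is this client's own echo; return whether it was applied."""
        if event.get("sourceClientId") == self.client_id:
            return False  # already applied optimistically
        task = event["data"]["task"]
        self.tasks[task["id"]] = task["content"]
        return True
```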
### Tasks [#tasks]

* [x] Add recording controls to Meeting Detail (Record / Stop button, state-aware)
* [x] Implement microphone capture: `MediaRecorder` → binary chunks → WebSocket
* [x] Handle `RecordingStartedEvent`, `RecordingStoppedEvent`, `RecordingDegradedEvent`
* [x] Render live transcript, talking points, and tasks
* [x] Task input during recording: optimistic mutation with echo suppression
* [x] Graceful error handling: ML unavailable, session already active

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Start recording → indicator visible | `test_app` | Recording indicator component renders when session active |
| Live transcript renders final segments | `test_app` | Final segment appears in transcript list |
| Interim segments visually distinct | `test_app` | Interim segment has `opacity` or `muted` style |
| Task creation is optimistic | `test_app` | Task appears in list before server response |
| Echo suppression works | `test_app` | WebSocket event with matching `sourceClientId` is ignored |
| Degraded mode shows banner | `test_app` | `RecordingDegradedEvent` → warning banner visible |
| Full live flow end-to-end | `test_system` | User starts recording, sees transcript, adds task, stops — all data persisted |

***

# Core — Schema, Endpoints, Events (/docs/work/delivered/live-capture/03-tdd/milestones/live-recording/slice-core)

# Slice 1: Core — Schema, Endpoints, WebSocket Events [#slice-1-core--schema-endpoints-websocket-events]

> **Owner**: Core engineer
> **Domain**: Core
> **Complexity**: L

The foundation everything else builds on. No UI can be built until these contracts exist and the API spec is regenerated.
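One of the test cases below checks that WebSocket events include `specversion`, `id`, `source`, `type`, and `time`. CloudEvents v1.0 itself only requires `id`, `source`, `specversion`, and `type`, so requiring `time` is a stricter, project-level rule. A Python sketch of such a check follows; the helper name is hypothetical, and the timestamp test uses `datetime.fromisoformat`, which only approximates RFC 3339 validation.

```python
from datetime import datetime

# CloudEvents v1.0 requires id, source, specversion and type; `time` is
# optional in the spec but required by this project's test case.
REQUIRED_ATTRS = ("specversion", "id", "source", "type", "time")


def envelope_errors(event: dict) -> list:
    """Return a list of problems; an empty list means the envelope passes."""
    errors = [f"missing {name}" for name in REQUIRED_ATTRS if not event.get(name)]
    if event.get("specversion") and event["specversion"] != "1.0":
        errors.append("specversion must be '1.0'")
    if event.get("time"):
        try:
            # Approximate RFC 3339 check; "Z" is rewritten for Python < 3.11.
            datetime.fromisoformat(event["time"].replace("Z", "+00:00"))
        except ValueError:
            errors.append("time is not an RFC 3339 timestamp")
    return errors
```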
### Tasks [#tasks]

* [x] Write and apply DB migrations (UUIDv7, `speaker_label`, `start_ms`/`end_ms`, `person_id`, `is_final`, `headline`, `status`)
* [x] Implement `GET /meetings/{id}/audio-url` — returns signed GCS URL
* [x] Implement `POST /meetings/{id}/speaker-labels` — bulk `person_id` assignment + voice profile enrichment
* [x] Implement `PUT /transcriptions/{id}/segments` — atomic replacement
* [x] Add new WebSocket events: `RecordingStartedEvent`, `RecordingStoppedEvent`, `RecordingDegradedEvent`, `TalkingPointEvent`, `TaskEvent`
* [x] Implement ML streaming session management and audio GCS upload stream
* [x] Regenerate API spec: `./dev gen all`

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Migrations apply cleanly | `test_core` | `./dev db migrate` succeeds with no errors |
| `GET /audio-url` returns signed URL | `test_core` | 200 with `url` and `expires_at` fields |
| `POST /speaker-labels` updates segments | `test_core` | All segments with matching `speaker_label` receive `person_id` |
| `PUT /segments` replaces atomically | `test_core` | New segments replace old; count matches input |
| WebSocket events conform to CloudEvents v1.0 | `test_core` | Events include `specversion`, `id`, `source`, `type`, `time` |
| Unauthenticated requests rejected | `test_core` | 401 for all new endpoints without auth |

***

# ML — Streaming API (/docs/work/delivered/live-capture/03-tdd/milestones/live-recording/slice-ml)

# Slice 2: ML — Streaming Session API + NDJSON Routing [#slice-2-ml--streaming-session-api--ndjson-routing]

> **Owner**: ML engineer
> **Domain**: ML
> **Complexity**: L

Introduces the resource-oriented streaming session API and the NDJSON event stream that Core consumes.
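Core consumes the NDJSON stream by parsing one JSON object per line and dispatching on `type`, as in the routing table of the ML streaming contract. A minimal Python sketch of that loop (Core is Go; the `route_ndjson` helper is hypothetical, and skipping blank lines and unknown types is an assumption, not documented behaviour):

```python
import json
from typing import Callable, Iterable


def route_ndjson(lines: Iterable[str], handlers: dict) -> None:
    """Parse one JSON event per line and call handlers[event["type"]](event["data"])."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        event = json.loads(line)
        handler: Callable[[dict], None] = handlers.get(event.get("type"))
        if handler is None:
            continue  # assumption: unknown event types are skipped
        handler(event["data"])
```

With a streaming HTTP client, `lines` would be the response body iterated line by line, and each handler would perform the broadcast and asynchronous write listed for its event type.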
### Tasks [#tasks]

* [x] Implement `POST /streaming/sessions` — response body remains open as `application/x-ndjson`
* [x] Implement `POST /streaming/sessions/{id}/audio` — audio chunk delivery
* [x] Implement `DELETE /streaming/sessions/{id}` — clean session termination with AssemblyAI buffer drain
* [x] Add NDJSON event types: `transcript_segment`, `feature_vector`, `speaker_match`, `talking_point`, `task`
* [x] Implement speaker identification logic and task extraction (\~60s cadence)
* [x] Extend post-meeting pipeline to honour `skip_tasks` flag
* [x] Regenerate API spec: `./dev gen all`

**Test cases:**

| Test | Location | Assertion |
| --- | --- | --- |
| Session lifecycle works end-to-end | `test_ml` | Create → audio → events stream → delete completes without error |
| All 5 NDJSON event types emitted | `test_ml` | Each event type appears in the stream with correct envelope |
| `skip_tasks=true` preserves live tasks | `test_ml` | Post-meeting pipeline does not call task extraction |
| Session conflict returns 409 | `test_ml` | Second `POST /streaming/sessions` for same meeting → 409 |

***

# Cloud Storage Schema (/docs/work/delivered/live-capture/03-tdd/schemas/core/gcs)

# Cloud Storage Schema [#cloud-storage-schema]

Bucket: `wordloop-meeting-audio`

Path format: `meetings/{meeting_id}/{uuid}.{ext}`

# PostgreSQL Schema (/docs/work/delivered/live-capture/03-tdd/schemas/core/postgres)

# PostgreSQL Schema [#postgresql-schema]

Target state of every table involved in this bet after all migrations are applied.
### `meetings` [#meetings]

```sql
CREATE TABLE meetings (
    id          UUID        NOT NULL DEFAULT uuidv7(),
    user_id     UUID        NOT NULL,
    title       TEXT        NOT NULL,
    headline    TEXT,                  -- ML-generated one-line summary
    source_type TEXT        NOT NULL,
    start_time  TIMESTAMPTZ NOT NULL,
    end_time    TIMESTAMPTZ,
    summary     TEXT,
    key_points  JSONB,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at  TIMESTAMPTZ,

    CONSTRAINT pk_meetings PRIMARY KEY (id),
    CONSTRAINT fk_meetings_user FOREIGN KEY (user_id)
        REFERENCES users (id) ON DELETE CASCADE,
    CONSTRAINT chk_meetings_source_type
        CHECK (source_type IN ('recording', 'upload', 'text', 'anecdotal')),
    CONSTRAINT chk_meetings_timeline
        CHECK (end_time IS NULL OR end_time > start_time)
);
```

### `meeting_audio_files` [#meeting_audio_files]

```sql
CREATE TABLE meeting_audio_files (
    meeting_id UUID        NOT NULL,
    gcs_path   TEXT,
    status     TEXT        NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),

    CONSTRAINT pk_meeting_audio_files PRIMARY KEY (meeting_id),
    CONSTRAINT fk_meeting_audio_meeting FOREIGN KEY (meeting_id)
        REFERENCES meetings (id) ON DELETE CASCADE,
    CONSTRAINT chk_meeting_audio_status CHECK (
        status IN ('recording', 'processing', 'completed', 'failed')
    ),
    CONSTRAINT chk_meeting_audio_path CHECK (
        status IN ('processing', 'failed') OR gcs_path IS NOT NULL
    )
);
```

### `transcript_segments` [#transcript_segments]

```sql
CREATE TABLE transcript_segments (
    id               UUID        NOT NULL DEFAULT uuidv7(),
    transcription_id UUID        NOT NULL,
    speaker_label    TEXT,
    person_id        UUID,
    text             TEXT        NOT NULL,
    start_ms         BIGINT      NOT NULL,
    end_ms           BIGINT      NOT NULL,
    confidence       REAL,
    is_final         BOOLEAN     NOT NULL DEFAULT false,
    is_highlighted   BOOLEAN     NOT NULL DEFAULT false,
    feature_vector   vector(512),
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),

    CONSTRAINT pk_transcript_segments PRIMARY KEY (id),
    CONSTRAINT fk_segments_transcription FOREIGN KEY (transcription_id)
        REFERENCES transcriptions (id) ON DELETE CASCADE,
    CONSTRAINT fk_segments_person FOREIGN KEY (person_id)
        REFERENCES people (id) ON DELETE SET NULL,
    CONSTRAINT chk_segments_timeline CHECK (end_ms > start_ms),
    CONSTRAINT chk_segments_confidence
        CHECK (confidence IS NULL OR (confidence >= 0 AND confidence <= 1))
);
```

### `talking_points` [#talking_points]

```sql
CREATE TABLE talking_points (
    id         UUID        NOT NULL DEFAULT uuidv7(),
    meeting_id UUID        NOT NULL,
    topic_id   UUID,
    content    TEXT        NOT NULL,
    is_final   BOOLEAN     NOT NULL DEFAULT false,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
    deleted_at TIMESTAMPTZ,

    CONSTRAINT pk_talking_points PRIMARY KEY (id),
    CONSTRAINT fk_talking_points_meeting FOREIGN KEY (meeting_id)
        REFERENCES meetings (id) ON DELETE CASCADE,
    CONSTRAINT fk_talking_points_topic FOREIGN KEY (topic_id)
        REFERENCES topics (id) ON DELETE SET NULL
);
```