# Wordloop Platform (/docs)
{/* LLM-Context: TL;DR:
This is the root index of the Wordloop Platform documentation.
Wordloop is a monorepo consisting of:
- wordloop-core (Go, Port 4002): REST API, DB (pgvector), domain logic.
- wordloop-ml (Python/FastAPI, Port 4003): AI/ML tasks, transcription.
- wordloop-app (Next.js, Port 4001): Web frontend with SSR.
Core routing philosophy: Trace-First Development.
Dependencies mapping: Check knowledge-graph.json.
*/}
# Wordloop Platform [#wordloop-platform]
Meeting transcription, speaker identification, and AI-powered conversation intelligence.
## Services [#services]
| Service | Language | Port | Role |
| --------------------------------------------- | ---------------- | ---- | -------------------------------------- |
| [wordloop-core](learn/services/core/index.md) | Go | 4002 | REST API, domain logic, database |
| [wordloop-ml](learn/services/ml/index.md) | Python / FastAPI | 4003 | Transcription, speaker embeddings, LLM |
| [wordloop-app](learn/services/app/index.md) | Next.js | 4001 | Web frontend |
## Architecture at a glance [#architecture-at-a-glance]
## Navigating the Documentation [#navigating-the-documentation]
If you are new to the platform, we recommend following the sidebar from top to bottom:
1. **[Principles](principles/index.mdx)** — Start by understanding our core philosophy, engineering values, and system constraints.
2. **[Architecture](learn/architecture/overview.mdx)** — See how those principles are applied structurally across the system and infrastructure.
3. **[Development](start/quickstart.md)** — Learn how to spin up the entire platform locally via our custom `./dev` CLI.
4. **Services** — Dive deep into specific implementations for [Core](learn/services/core/index.md), [ML](learn/services/ml/index.md), and [App](learn/services/app/index.md).
5. **API & Schemas** — Reference material for system contracts.
# Postgres with pgvector as the production vector store (/docs/decisions/0001-postgres-for-vector-search)
# 0001 — Postgres with `pgvector` as the production vector store [#0001--postgres-with-pgvector-as-the-production-vector-store]
**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** core platform
**Supersedes:** —
**Superseded by:** —
## Context [#context]
Wordloop generates and stores embeddings for transcript chunks, speaker utterances, and recap summaries. A retrieval-augmented generation (RAG) workflow at read time uses these embeddings to supply context to model calls.
The default instinct when adding a GenAI feature is to reach for a dedicated vector database — Pinecone, Milvus, Weaviate, or similar. These systems offer specialised ANN indexes, horizontal scale, and purpose-built tooling. At our current scale, they also introduce an operational surface we do not need and a split-brain failure mode we actively want to avoid.
Embeddings in Wordloop are not an island. They exist **because** a specific transcript chunk exists. They must appear atomically with the chunk, be removed atomically when the chunk is removed, and obey the same authorisation rules the chunk does. A system where the transcript lives in Postgres and its embedding lives in a separate service that is updated "eventually" is a system where queries will silently return embeddings for deleted content or miss content that was just created — neither of which is acceptable.
## Decision [#decision]
Use PostgreSQL with the `pgvector` extension as the single production vector store. Embeddings live on the row they describe (or in a sibling table joined by primary key), committed in the same transaction as their source data.
## Consequences [#consequences]
**Atomic writes.** Inserting a transcript chunk and its embedding happens in one transaction. If the embedding fails to compute or save, the chunk rolls back. There is no asynchronous reconciliation process and no inconsistency window.
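As a rough illustration of the rollback behaviour described above, the sketch below writes a chunk and its embedding inside one transaction. It makes several assumptions: `sqlite3` stands in for Postgres so the example runs without a database server, and the table names, column names, and `embed` function are hypothetical. With `pgvector`, the embedding column would be a vector type rather than text.

```python
import sqlite3


def embed(text: str) -> list[float]:
    """Hypothetical embedding call; fails on blank input to show rollback."""
    if not text.strip():
        raise ValueError("cannot embed empty text")
    return [float(len(text)), 1.0]


def insert_chunk(conn: sqlite3.Connection, text: str) -> None:
    # `with conn` commits on success and rolls back if anything raises,
    # so a chunk row never outlives a failed embedding.
    with conn:
        cur = conn.execute(
            "INSERT INTO transcript_chunks (body) VALUES (?)", (text,)
        )
        vector = embed(text)
        conn.execute(
            "INSERT INTO chunk_embeddings (chunk_id, embedding) VALUES (?, ?)",
            (cur.lastrowid, repr(vector)),
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transcript_chunks (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute(
    "CREATE TABLE chunk_embeddings (chunk_id INTEGER PRIMARY KEY, embedding TEXT)"
)

insert_chunk(conn, "Let's review the roadmap.")
try:
    insert_chunk(conn, "   ")  # embedding fails, so the chunk insert is rolled back
except ValueError:
    pass

chunk_count = conn.execute("SELECT COUNT(*) FROM transcript_chunks").fetchone()[0]
embedding_count = conn.execute("SELECT COUNT(*) FROM chunk_embeddings").fetchone()[0]
```

After the failed second call, both tables still hold exactly one row each: the chunk whose embedding could not be computed was never committed.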
**One operational surface.** The database we already run, already back up, already monitor, already manage migrations for, is also the vector store. No second system to provision, secure, or teach on-call about.
**One authorisation model.** The row-level security rules that protect transcript data also protect the embeddings. We do not have to re-implement access control in a second system and hope the two models agree.
**Adequate performance at current scale.** `pgvector`'s IVFFlat and HNSW indexes are sufficient for our current and projected vector counts. We benchmark quarterly; we have not approached the scale where a purpose-built vector database would outperform `pgvector` by a margin that justifies the operational cost.
## Alternatives considered [#alternatives-considered]
* **Pinecone, Milvus, Weaviate.** Rejected for the split-brain failure mode and the second operational surface. Revisit if vector count per tenant exceeds \~10M and `pgvector` benchmarks degrade materially.
* **Embeddings in a denormalised column with in-Go cosine comparison.** Rejected for O(n) query cost — acceptable for small datasets in prototypes, unacceptable in production.
* **Embeddings in an object store with a hand-rolled ANN index.** Rejected for the cost of maintaining the index and the absence of transactional guarantees.
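To make the O(n) cost of the rejected in-process comparison concrete, here is a minimal sketch of a brute-force cosine scan. It is written in Python purely for illustration (the rejected option was an in-Go comparison), and the function names and sample rows are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def brute_force_top_k(query, rows, k=3):
    """Score every stored embedding: one comparison per row, so O(n) per query."""
    scored = [(cosine_similarity(query, emb), row_id) for row_id, emb in rows]
    scored.sort(reverse=True)
    return [row_id for _, row_id in scored[:k]]


rows = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
nearest = brute_force_top_k([1.0, 0.0], rows, k=2)
```

Every query touches every row, which is the cost an ANN index avoids.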
## Debt annotation [#debt-annotation]
**Principal:** None beyond the `pgvector` extension install, which is a single SQL statement per environment.
**Interest:** Low. `pgvector` is actively maintained and widely deployed; index tuning (IVFFlat `lists`, HNSW `ef_construction`) is a one-time cost per table.
**Multiplier:** Vector count per tenant. If a single tenant's embedding set grows past the point where `pgvector`'s ANN indexes still hold up in our quarterly benchmarks (empirically, in the tens of millions of vectors), revisit this decision. The migration path is well understood (dual-write, shadow-read, cut over) but non-trivial.
## Verification [#verification]
* `SELECT extname FROM pg_extension WHERE extname = 'vector';` returns a row on every environment.
* Transcript chunk insertion and embedding insertion are committed in the same database transaction.
* No application code writes to an external vector service.
## Related [#related]
* [Postgres stack principle](/docs/principles/stack/postgres)
* [AI Engineering principle](/docs/principles/ai-native/ai-engineering)
# Next.js with Server Components for the web app (/docs/decisions/0002-nextjs-ssr-for-app)
# 0002 — Next.js with Server Components for `wordloop-app` [#0002--nextjs-with-server-components-for-wordloop-app]
**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** app platform
**Supersedes:** —
**Superseded by:** —
## Context [#context]
The Wordloop web app renders deeply nested AI-derived context: a Meeting contains TranscriptSegments, each segment has a speaker attribution (Person), the Meeting has a MeetingSynthesis with Topics and TalkingPoints, and a list of Tasks. Opening a Meeting is the single most common view in the product.
A client-side single-page application fetching this context produces a cascading waterfall. The client first fetches the Meeting, waits for the response, fetches the Transcription, waits, fetches Segments, waits, resolves Person records per speaker, waits, fetches the MeetingSynthesis, waits, fetches Tasks. Each hop is a full round trip between the browser and the edge — in practice, five to seven seconds of blank screen on a median connection before any meaningful content appears.
This is not a problem to optimise with skeleton screens or lazy loading. The waterfall is inherent to the data shape and the client-side fetch model.
## Decision [#decision]
Build `wordloop-app` on Next.js with the App Router and React Server Components. Meeting views, synthesis views, and the dashboard fetch their data on the server, close to the database, in a single round trip from the browser. The client receives the fully resolved DOM with content already present.
Client components remain where interactivity demands them: the live transcript stream, the editor, the command palette. These are bounded, named islands inside a server-rendered shell.
## Consequences [#consequences]
**Single round trip for the primary view.** Opening a Meeting is one request from the browser; all downstream data fetches happen server-side in parallel, close to the database. Time-to-meaningful-paint drops from seconds to hundreds of milliseconds.
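The difference between the two fetch models can be sketched as simple arithmetic. The round-trip figures below are placeholders chosen for illustration, not measurements, and the model ignores server processing time. It also treats the server-side hops as sequential, which is the pessimistic case.

```python
def waterfall_ms(hops: int, browser_rtt_ms: float) -> float:
    """Client-side waterfall: every dependent fetch pays a browser round trip."""
    return hops * browser_rtt_ms


def server_rendered_ms(hops: int, browser_rtt_ms: float, server_rtt_ms: float) -> float:
    """Server rendering: one browser round trip plus cheap server-side hops."""
    return browser_rtt_ms + hops * server_rtt_ms


# Illustrative numbers only.
HOPS = 6
client_side = waterfall_ms(HOPS, browser_rtt_ms=150)
server_side = server_rendered_ms(HOPS, browser_rtt_ms=150, server_rtt_ms=5)
```

Under these assumptions the browser-visible latency is dominated by one round trip instead of six, which is the shape of the improvement the decision relies on.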
**Database queries colocate with the code that needs them.** A Server Component fetches its own data on the server. In practice that fetch goes through our Go API rather than straight to Postgres, but the programming model is the same: the fetch happens where the latency cost is lowest.
**Client bundles stay small.** Components that never run on the client are never shipped to the client. The JavaScript bundle for the Meeting view is a fraction of what it would be in a pure-SPA architecture.
**A sharper client/server boundary.** Server Components cannot use `useState`, `useEffect`, or browser APIs. The boundary is explicit and enforced by the framework, which catches a common class of hydration bugs at build time.
## Alternatives considered [#alternatives-considered]
* **Pure client-side React + Vite.** Rejected for the waterfall problem described above. Viable only if the data shape were flat, which it is not.
* **Remix / TanStack Start / other RSC-capable frameworks.** Considered equivalent in principle. Next.js chosen for the ecosystem maturity, the production track record of the App Router at our scale, and the team's existing expertise. Revisit if Next.js' direction diverges from our needs.
* **Hybrid: SPA shell + server-rendered HTML snippets.** Rejected for the cognitive overhead of maintaining two rendering models. Server Components give us the same benefit with a single programming model.
## Debt annotation [#debt-annotation]
**Principal:** Moderate. The team has internalised the Server/Client Component boundary; new engineers spend their first week understanding when to use which.
**Interest:** Low to moderate. Next.js ships breaking changes in major versions; we pin and plan upgrades quarterly. The RSC model itself is stable.
**Multiplier:** Framework direction. If Next.js' architectural direction diverges materially from our needs, the cost of migrating is proportional to the size of the app. The Server Components abstraction is portable — Remix and TanStack Start implement the same conceptual model — so the migration risk is bounded.
## Verification [#verification]
* Primary Meeting view renders meaningful content in a single round trip (observed in Core Web Vitals on production).
* `next build` output shows Server Components are not included in client chunks.
* No data-fetch waterfalls in the Network panel for the dashboard or Meeting view.
## Related [#related]
* [Frontend stack principle](/docs/principles/stack/frontend)
* [App Service handbook](/docs/learn/services/app)
# Stateful containers for the ML service (/docs/decisions/0003-stateful-containers-for-ml)
# 0003 — Stateful containers for `wordloop-ml` [#0003--stateful-containers-for-wordloop-ml]
**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** ml platform
**Supersedes:** —
**Superseded by:** —
## Context [#context]
The ML service is responsible for real-time transcription of live Meeting audio, MeetingSynthesis generation from finalised Transcriptions, and embedding generation for retrieval. The transcription path is latency-critical: from the moment a person speaks to the moment the caption renders, the user-perceived budget is under one second.
Serverless function platforms (Lambda, Cloud Run with scale-to-zero, Vercel Edge) are excellent for bursty, stateless workloads with tolerant latency budgets. They are a poor fit for workloads that have:
1. Large model weights that must be loaded into memory (several hundred MB to several GB).
2. Connection-level state for streaming audio frames.
3. No tolerance for cold starts measured in seconds, which translate directly into user-visible silence during a live meeting.
A cold start of five to ten seconds on the first segment of a Meeting destroys the real-time experience. Warm-up pings mitigate but do not eliminate this, and the cost of keeping a serverless function permanently warm approaches the cost of a dedicated container.
## Decision [#decision]
Run `wordloop-ml` as long-lived FastAPI workers inside orchestrated containers. Models are loaded at container start and remain resident across requests. The container is the unit of scaling — we scale horizontally by adding more containers, not by spinning up more cold functions.
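The load-once pattern can be sketched with the standard library alone. The `Model` and `Worker` classes below are hypothetical stand-ins, not the service's actual code. In a FastAPI worker the same load would typically sit in a startup or lifespan handler.

```python
import time


class Model:
    """Hypothetical stand-in for a transcription model with heavy weights."""

    load_count = 0

    def __init__(self) -> None:
        Model.load_count += 1
        time.sleep(0.05)  # simulate an expensive weight load

    def transcribe(self, audio: bytes) -> str:
        return f"<{len(audio)} bytes transcribed>"


class Worker:
    """Long-lived worker: loads the model once at start, reuses it per request."""

    def __init__(self) -> None:
        self.model = Model()  # paid once, when the process starts

    def handle(self, audio: bytes) -> str:
        return self.model.transcribe(audio)  # no load on the request path


worker = Worker()
results = [worker.handle(b"\x00" * n) for n in (160, 320, 480)]
```

Three requests are served but `Model.load_count` stays at 1, which is the property the third verification bullet of this ADR checks for.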
## Consequences [#consequences]
**Models stay warm.** The first segment of a Meeting transcribes with the same latency as the hundredth. No cold-start penalty on the user-visible path.
**Streaming state is preserved.** An audio stream's position, rolling buffer, and partial transcription state live in the container that handles the stream. No cross-invocation state-reconstruction step.
**Operational posture matches a normal service.** The ML service has rolling deploys, health checks, graceful shutdown, and horizontal scaling — the same operational shape as `wordloop-core`. On-call engineers use the same mental model.
**We pay for idle capacity.** A serverless model would scale to zero at night; our containers do not. At current traffic this is cheaper than the alternative (warm-keeping costs in a serverless model exceed the dedicated container cost), but the crossover point will change with usage patterns.
## Alternatives considered [#alternatives-considered]
* **Lambda / Cloud Functions with scale-to-zero.** Rejected for cold-start latency on the transcription hot path.
* **Cloud Run with always-on minimum instances.** Considered, and a reasonable alternative. We chose explicit container orchestration because it also handles the streaming-state requirement cleanly; Cloud Run's per-request model is awkward for long-lived WebSocket-adjacent connections. Revisit if Cloud Run's streaming support matures.
* **Dedicated GPU nodes.** Not yet required — our current model mix runs adequately on CPU. If we adopt models that demand GPU inference, the decision to run stateful containers still holds; we add GPU node pools.
* **Batch transcription only (no real-time path).** Rejected as a product decision — live transcription is a core Wordloop feature.
## Debt annotation [#debt-annotation]
**Principal:** Moderate. Operating a stateful service means we handle graceful shutdown, connection draining, and rolling-deploy choreography ourselves. This is well-trodden ground and our Go core already does the same.
**Interest:** Steady. Container images must be rebuilt when model weights or the Python runtime update; that is a normal CI cost.
**Multiplier:** Model size. If model weights grow past what fits comfortably in a container's memory budget (low single-digit GB), we may need to split inference into a dedicated model-serving layer (Triton, Ray Serve) fronted by thin FastAPI workers. The service boundary stays the same; the implementation changes.
## Verification [#verification]
* Time-to-first-caption on a cold Meeting start is under one second at p95 (observed in production latency dashboards).
* No cold-start warm-up hack exists in the deploy pipeline (no scheduled pings, no keep-warm loop).
* Model weights are loaded exactly once per container process, at boot.
## Related [#related]
* [ML Systems stack principle](/docs/principles/stack/ml-systems)
* [Real-Time system-design principle](/docs/principles/system-design/real-time)
* [ML Service handbook](/docs/learn/services/ml)
# Hosting-layer Link header for llms.txt discovery (/docs/decisions/0004-hosting-layer-llms-txt-link-header)
# 0004 — Hosting-layer `Link` header for `llms.txt` discovery [#0004--hosting-layer-link-header-for-llmstxt-discovery]
**Status:** Accepted
**Date:** 2026-04-19
**Deciders:** docs platform
**Supersedes:** —
**Superseded by:** —
## Context [#context]
The `llms.txt` specification recommends that sites advertise their machine-readable index via the HTTP `Link` header with `rel="llms-txt"`, in addition to serving the file at `/llms.txt`. This lets agents discover the index without guessing at conventional paths and without parsing HTML.
The Wordloop documentation site is built with Next.js 15 in static-export mode (`output: 'export'`). Static export does not support runtime middleware, route handlers that mutate response headers, or `next.config.js` `headers()` for the exported bundle — those hooks are only honoured by the Node server, which we are not running in production. Consequently, the `Link` header cannot be set at the framework layer.
The site is served by Firebase Hosting, which supports per-path response headers declaratively in `firebase.json` under `hosting.headers`.
## Decision [#decision]
Set a `Link` header on every response from Firebase Hosting, via the `firebase.json` `hosting.headers` array, that advertises `/llms.txt` with `rel="llms-txt"` and `llms-full.txt` with `rel="llms-full-txt"`. Additionally, set `Content-Type: text/markdown; charset=utf-8` on every `**/*.md` path so the per-page markdown exports are served with the correct media type, and `text/plain` on the two `llms*.txt` files.
The header applies to all paths (`source: "/**"`). The `rel` advertisement is cheap and universally safe — every Wordloop documentation page is a valid entry point for an agent that then looks up the index.
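For orientation, the sketch below builds a `hosting.headers` fragment of the kind described and prints it as JSON. It is a guess at the shape, not a copy of the real `firebase.json`: the target paths inside the `Link` value and the `/llms-full.txt` source path are assumptions, and other `hosting` keys are omitted.

```python
import json

# The target paths inside the Link value are assumptions; substitute the
# paths the site actually serves.
LINK_VALUE = '</llms.txt>; rel="llms-txt", </llms-full.txt>; rel="llms-full-txt"'

MARKDOWN = {"key": "Content-Type", "value": "text/markdown; charset=utf-8"}
PLAIN = {"key": "Content-Type", "value": "text/plain; charset=utf-8"}

firebase_config = {
    "hosting": {
        "headers": [
            # Advertise the index files on every response.
            {"source": "/**", "headers": [{"key": "Link", "value": LINK_VALUE}]},
            # Per-page markdown exports.
            {"source": "**/*.md", "headers": [MARKDOWN]},
            # The two index files (second path is assumed).
            {"source": "/llms.txt", "headers": [PLAIN]},
            {"source": "/llms-full.txt", "headers": [PLAIN]},
        ]
    }
}

rendered = json.dumps(firebase_config, indent=2)
print(rendered)
```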
## Consequences [#consequences]
* Agents following the `llms.txt` discovery pattern via `curl -I` or a HEAD request find the index without needing to hardcode `/llms.txt`.
* The `.md` exports of each documentation page are served with the correct MIME type; command-line tooling (`curl`, `wget`) treats them as text.
* The configuration lives in `firebase.json` — a hosting-platform-specific file. If we ever migrate hosting providers, this configuration has to be reimplemented in the new provider's equivalent. This is captured in the debt annotation below.
## Alternatives considered [#alternatives-considered]
* **Set the header in a Next.js middleware.** Rejected: middleware is incompatible with static export.
* **Set the header via a meta tag in `<head>`.** Rejected: HTML equivalents of the `Link` header (such as `<link>` elements) are not part of the spec and not observed by agents doing header-only HEAD requests.
* **Add an Express shim in front of the static export.** Rejected: introducing a server just to set one header sacrifices the operational simplicity that motivated static export in the first place.
* **Rely on convention only (`/llms.txt` at the root).** Rejected: the spec explicitly recommends the header. It is cheap to set and the canonical way for agents to discover the index.
## Debt annotation [#debt-annotation]
**Principal:** \~1 hour. One `firebase.json` edit, one ADR, one test.
**Interest:** Near-zero. The configuration does not drift; the header string is stable.
**Multiplier:** Hosting migration. If we move off Firebase, the `firebase.json` block has to be translated to the new hosting provider's header syntax. The content of the header does not change; only the declaration site does. If we ever move to a self-hosted Next.js runtime, the header moves to middleware and `firebase.json` can be discarded.
## Verification [#verification]
* `curl -I https://docs.wordloop.ai/docs/learn/architecture/overview` shows the `Link` header with both `llms-txt` and `llms-full-txt` targets.
* `curl -I https://docs.wordloop.ai/docs/learn/architecture/overview.md` returns `Content-Type: text/markdown; charset=utf-8`.
* `curl -I https://docs.wordloop.ai/llms.txt` returns `Content-Type: text/plain; charset=utf-8`.
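The first check could be automated along the following lines. This is a simplified sketch: the parser only handles the `<target>; rel="name"` form, and the sample header value, including its target paths, is an assumption for illustration.

```python
import re

# Matches one `<target>; rel="name"` entry in a Link header value.
LINK_PART = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]+)"')


def link_targets(header_value: str) -> dict[str, str]:
    """Map each rel in a Link header value to its target URL."""
    return {rel: target for target, rel in LINK_PART.findall(header_value)}


def advertises_llms_index(header_value: str) -> bool:
    """True when both the llms-txt and llms-full-txt rels are present."""
    rels = link_targets(header_value)
    return "llms-txt" in rels and "llms-full-txt" in rels


# Example value; the target paths are assumptions.
sample = '</llms.txt>; rel="llms-txt", </llms-full.txt>; rel="llms-full-txt"'
targets = link_targets(sample)
```

A test could feed the `Link` value from a HEAD response into `advertises_llms_index` and fail the build when it returns `False`.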
## Related [#related]
* [Documentation principle](/docs/principles/foundations/documentation) — the dual-audience stance this header operationalises.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — the broader principle the discovery mechanism serves.
# Docs are canonical knowledge and skills are the agent execution layer (/docs/decisions/0005-docs-canonical-skills-execution-layer)
# 0005 — Docs are canonical knowledge and skills are the agent execution layer [#0005--docs-are-canonical-knowledge-and-skills-are-the-agent-execution-layer]
**Status:** Accepted
**Date:** 2026-05-01
**Deciders:** docs platform, agent tooling
**Supersedes:** —
**Superseded by:** —
## Context [#context]
Wordloop maintains both a documentation site and a set of agent skills. The docs site is built for humans and agents: it publishes navigable pages, `llms.txt`, `llms-full.txt`, per-page Markdown exports, and MCP resources. The skills are loaded by AI agents to guide task execution.
The previous stance kept docs and skills as fully separate surfaces. That avoided prompt-like content leaking into the docs site, but it also created a drift risk: durable engineering policy could be duplicated in both docs and skill files. We have already seen signs of this class of drift, such as stack-version claims differing between service docs and package metadata.
Modern skill design favours progressive disclosure: concise trigger metadata, a short operating contract, and selective loading of deeper references. This means skill files should not become large documentation mirrors. They should tell the agent what to read, how to act, and how to verify.
## Decision [#decision]
The documentation site is the canonical source for durable engineering knowledge. Agent skills are the execution layer that selects, loads, and applies that knowledge safely.
A docs page owns:
* Principles and architecture guidance.
* Service handbooks and implementation conventions.
* Workflow guides and runbooks.
* ADRs and decision history.
* Generated reference material from specs, schemas, and code.
* Glossary and domain vocabulary.
A skill owns:
* Triggering and task routing.
* Which docs pages to read for each task shape.
* Tool usage, command sequencing, and safety gates.
* Verification steps and eval discipline.
* Agent-specific constraints that do not belong in human-facing docs.
Skills may reference docs pages by slug or MCP resource. Docs pages must not depend on skill internals for their meaning.
## Consequences [#consequences]
* Durable guidance has one canonical maintenance path.
* Human and agent readers consume the same engineering knowledge.
* Skills remain smaller, more triggerable, and easier to evaluate.
* Documentation changes can identify affected skills through a skill-to-doc map.
* Skill changes can identify which canonical docs pages need review.
* The docs site needs stronger freshness, metadata, and health checks because more agent behaviour depends on it.
## Alternatives considered [#alternatives-considered]
* **Keep docs and skills completely separate.** Rejected because it preserves duplicated policy and makes drift a review-discipline problem only.
* **Move most docs into skills.** Rejected because skills are not a good human-reading surface and large skill files weaken progressive disclosure.
* **Have skills fetch arbitrary public documentation at runtime.** Rejected as the default because public retrieval introduces prompt-injection and freshness risks. Trusted local docs, generated Markdown exports, and the Wordloop MCP server are the default context path.
* **Generate skills entirely from docs.** Deferred. It may become useful for simple doc-reference sections, but skill trigger wording and safety gates still need deliberate evaluation.
## Debt annotation [#debt-annotation]
**Principal:** Medium. We need a skill-to-doc map, workflow docs, freshness metadata, and documentation health checks.
**Interest:** Low if automated checks run in CI; high if this remains a manual checklist.
**Multiplier:** Agent autonomy. The more agents rely on docs for task execution, the more expensive stale docs become.
## Verification [#verification]
* Each maintained skill declares its canonical docs dependencies in the skill-to-doc map.
* Documentation health checks validate mapped docs pages exist.
* Stale active docs are flagged by review cadence.
* Skill updates include a docs review step.
* Docs updates include an affected-skills review step.
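The "mapped docs pages exist" check might look something like the sketch below. The format of the skill-to-doc map is not specified in this ADR, so the dictionary shape, the skill names, and the page inventory are all hypothetical.

```python
def missing_doc_pages(
    skill_to_docs: dict[str, list[str]], existing_pages: set[str]
) -> dict[str, list[str]]:
    """Return, per skill, the mapped docs slugs that do not resolve to a page."""
    problems = {}
    for skill, slugs in skill_to_docs.items():
        missing = [slug for slug in slugs if slug not in existing_pages]
        if missing:
            problems[skill] = missing
    return problems


# Hypothetical map and page inventory.
skill_to_docs = {
    "add-endpoint": [
        "/docs/guides/add-api-endpoint",
        "/docs/principles/system-design/api-design",
    ],
    "add-service": ["/docs/guides/add-service", "/docs/guides/retired-page"],
}
existing_pages = {
    "/docs/guides/add-api-endpoint",
    "/docs/principles/system-design/api-design",
    "/docs/guides/add-service",
}
report = missing_doc_pages(skill_to_docs, existing_pages)
```

An empty report means every mapped slug resolves. A non-empty one names the skill and the broken reference, which is enough for a CI job to fail with a useful message.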
## Related [#related]
* [Documentation](/docs/principles/foundations/documentation)
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)
# Architecture Decision Records (/docs/decisions)
# Architecture Decision Records [#architecture-decision-records]
An ADR is how we remember *why*. Code shows what we built; commit history shows when it changed; ADRs show which options we rejected, what tradeoffs we accepted, and what debt we took on. The log is **append-only**: once an ADR is accepted, it is never edited — only superseded.
## Why ADRs matter on this team [#why-adrs-matter-on-this-team]
Two years from now, an engineer — or an agent — will look at a piece of Wordloop and ask "why is this like this?" The answer lives in the ADR. Without it, every design decision regresses to "this is how it was when I got here," and the team loses the ability to challenge decisions on their merits because the merits have been forgotten. We write ADRs for decisions that will be expensive to reverse and decisions that will surprise a reader who does not share our context.
## Statuses [#statuses]
| Status | Meaning |
| -------------- | ------------------------------------------------- |
| **Proposed** | Authored but not yet accepted. Under discussion. |
| **Accepted** | Current, in force. |
| **Rejected** | Considered and declined, with reasoning. |
| **Deprecated** | No longer applicable, but historically important. |
| **Superseded** | Replaced by a later ADR (which links back). |
## Log [#log]
*The catalogue populates as decisions are committed. Each entry includes title, status, author, date, and a Principal / Interest / Multiplier debt annotation — see [Engineering Principles / Documentation](/docs/principles/foundations/documentation) for the model.*
Authoring a new ADR? Copy the frontmatter and 7-section structure from any existing ADR in this directory. The title is the decision in plain language; the filename is `NNNN-kebab-case-decision.mdx` with the next available number.
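Picking the next available number can be sketched as a small helper. The function below is hypothetical, not part of the `./dev` CLI, and the example filenames are inferred from the ADR slugs with an assumed `.mdx` extension.

```python
import re

# NNNN-kebab-case.mdx; anything else (for example index.mdx) is ignored.
ADR_NAME = re.compile(r"^(\d{4})-[a-z0-9]+(?:-[a-z0-9]+)*\.mdx$")


def next_adr_filename(existing: list[str], title: str) -> str:
    """Build `NNNN-kebab-case-decision.mdx` using the next free number."""
    numbers = [int(m.group(1)) for name in existing if (m := ADR_NAME.match(name))]
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{max(numbers, default=0) + 1:04d}-{slug}.mdx"


existing = [
    "0001-postgres-for-vector-search.mdx",
    "0002-nextjs-ssr-for-app.mdx",
    "index.mdx",
]
filename = next_adr_filename(existing, "Use Pub/Sub for events")
```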
# Add an API Endpoint (/docs/guides/add-api-endpoint)
# Add an API Endpoint [#add-an-api-endpoint]
## Goal [#goal]
Add a new endpoint to `wordloop-core`, following the spec-first workflow so that the server handler, the TypeScript client, and the reference docs all stay aligned.
## Prerequisites [#prerequisites]
* Local stack running (`./dev start all`) — see [Quickstart](/docs/start/quickstart).
* Familiarity with [API Design](/docs/principles/system-design/api-design) and [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) principles.
## Steps [#steps]
### 1. Update the OpenAPI spec [#1-update-the-openapi-spec]
The spec is the source of truth. Open `specs/core-openapi.json` and add your endpoint:
* Path, method, operationId.
* Request and response schemas with descriptions on every field.
* Example payloads.
* Error responses mapped to our standard error codes ([Reference / Errors](/docs/reference/errors)).
### 2. Regenerate handlers and clients [#2-regenerate-handlers-and-clients]
```bash
./dev generate core
```
This produces the server-side handler stub and the TypeScript client surface. See [Code Generation](/docs/guides/code-generation) for details on what runs under the hood.
### 3. Implement the handler [#3-implement-the-handler]
Fill in the generated handler stub. Handlers stay thin — extract inputs, call the application service, shape the response. Business rules belong in the domain; orchestration belongs in the application service.
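The layering can be illustrated with a language-neutral sketch. The real handler is generated Go code. The Python below only shows the three responsibilities of a thin handler, and every name in it (`MeetingService`, `get_meeting_handler`, the `meeting_not_found` error code) is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Meeting:
    id: str
    title: str


class MeetingService:
    """Application service: orchestrates; domain rules live behind it."""

    def __init__(self) -> None:
        self._meetings = {"m1": Meeting(id="m1", title="Weekly sync")}

    def get_meeting(self, meeting_id: str) -> Optional[Meeting]:
        return self._meetings.get(meeting_id)


def get_meeting_handler(service: MeetingService, path_params: dict) -> tuple[int, dict]:
    meeting_id = path_params["meeting_id"]       # 1. extract inputs
    meeting = service.get_meeting(meeting_id)    # 2. call the application service
    if meeting is None:                          # 3. shape the response
        return 404, {"error": "meeting_not_found"}
    return 200, {"id": meeting.id, "title": meeting.title}


status, body = get_meeting_handler(MeetingService(), {"meeting_id": "m1"})
```

The handler contains no business rules: it could be regenerated or rewritten without touching the service or the domain.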
### 4. Write a service test [#4-write-a-service-test]
In the handler's test file, spin up the Testcontainers Postgres, make the HTTP call, assert on behaviour and on the OTel trace shape. See [Testing](/docs/principles/foundations/testing) for the discipline.
### 5. Run the relevant checks [#5-run-the-relevant-checks]
```bash
./dev lint core
./dev test core
```
## Verification [#verification]
* `./dev test core` passes.
* The [Core API Reference](/docs/reference/api/core) renders the new endpoint automatically.
* Hitting the endpoint from the local frontend produces the expected response.
## Troubleshooting [#troubleshooting]
* **Generated code is out of date.** Re-run `./dev generate core` and commit the generated files.
* **Testcontainers failing to start.** Check `./dev status` and that Docker is running.
* **Frontend cannot reach the endpoint.** The frontend uses the generated TypeScript client; re-running generation and restarting the Next.js dev server usually fixes it.
See [API Design](/docs/principles/system-design/api-design) for the stance this workflow expresses.
# Add a Service (/docs/guides/add-service)
# Add a Service [#add-a-service]
## Goal [#goal]
Scaffold a new backend service that conforms to our platform conventions — hexagonal structure, OTel instrumentation, standard CI pipeline, `./dev` integration — from day one.
## Prerequisites [#prerequisites]
* An accepted [ADR](/docs/decisions) justifying the new service. "We could just add this to `wordloop-core`" is often the right answer; the ADR documents why it is not.
* Familiarity with [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture), the [Platform](/docs/principles/delivery/platform) stance, and [Go Services](/docs/principles/stack/go-services) or [ML Systems](/docs/principles/stack/ml-systems) depending on the language.
## Steps [#steps]
### 1. Use the scaffolding template [#1-use-the-scaffolding-template]
Our platform ships a bootstrapping template per supported language. It produces:
* The hexagonal directory layout (`domain/`, `ports/`, `adapters/`, `application/`).
* A stub HTTP server with OTel instrumentation configured.
* A standard CI pipeline definition.
* Dockerfile and Cloud Run deployment config.
* `./dev` integration (start, stop, logs, test, lint).
### 2. Register the service with the platform [#2-register-the-service-with-the-platform]
Add the service to the platform's service registry so that shared tooling — observability, feature flags, secrets — knows it exists. This is the step that makes the service "real" to the rest of the platform.
### 3. Write the first ADR [#3-write-the-first-adr]
A new service is a decision. Capture its purpose, its expected ownership, and the debt it carries (runtime cost, operational surface, coordination overhead) as an ADR.
### 4. Define the service's first SLO [#4-define-the-services-first-slo]
Before the service receives traffic, define the user-facing SLO it will live inside ([Reliability](/docs/principles/quality/reliability)). An SLO-less service is a service that nobody can defend.
### 5. Write the service handbook [#5-write-the-service-handbook]
Create `content/docs/learn/services/<service>/` with `index.mdx`, `architecture.mdx`, and `implementation.mdx`. The handbook explains the "why" that the code cannot.
## Verification [#verification]
* `./dev start <service>` starts the service cleanly.
* `./dev test <service>` passes.
* The service is visible on the platform observability dashboard.
* A fresh engineer can open the service handbook and understand the shape.
## Troubleshooting [#troubleshooting]
* **OTel not exporting.** Check that the service registered its collector endpoint; the template defaults should work but custom configuration may override.
* **CI failing on the first push.** The template ships a minimal CI pipeline; extend it with service-specific tests as needed.
See [Platform](/docs/principles/delivery/platform) for the broader stance on service scaffolding.
# Code Generation (/docs/guides/code-generation)
# Code Generation [#code-generation]
The platform uses code generation pipelines to keep API contracts in sync across all services.
## Event types (AsyncAPI) [#event-types-asyncapi]
The AsyncAPI specification in `services/wordloop-core/asyncapi.yaml` is the single source of truth for all event-driven types (WebSocket events and Pub/Sub messages).
```bash
# Compile AsyncAPI spec to typed internal Events for all services
./dev gen events
```
This produces:
| Target | Tool | Output |
| -------------- | -------------------------- | -------------------------------------------------------------------------- |
| **Go** | `asyncapi-codegen` | `services/wordloop-core/internal/provider/generated/asyncapi.gen.go` |
| **TypeScript** | `@asyncapi/cli` (Modelina) | `services/wordloop-app/lib/generated/asyncapi.ts` |
| **Python** | `@asyncapi/cli` (Modelina) | `services/wordloop-ml/src/wordloop/providers/generated/asyncapi_models.py` |
Consumer scripts (App, ML) try to fetch the spec from a running Core instance at `http://localhost:4002/asyncapi.yaml` first, and fall back to the local monorepo path for offline generation.
:::info
Core owns the spec and generates its own types locally. App and ML are consumers that pull the spec from Core — following the same pattern as OpenAPI client generation.
:::
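The fetch-then-fall-back behaviour described above can be sketched as follows. This is a minimal illustration, not the actual consumer script: `load_spec`, `fetch_live`, and the two-second timeout are hypothetical names and values chosen for the example; only the Core URL and the idea of a local monorepo fallback come from this page.

```python
from pathlib import Path
from typing import Callable, Tuple
from urllib.error import URLError
from urllib.request import urlopen


def fetch_live(url: str, timeout: float = 2.0) -> str:
    """Download the spec from a running Core instance."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def load_spec(
    url: str,
    fallback: Path,
    fetch: Callable[[str], str] = fetch_live,
) -> Tuple[str, str]:
    """Return (spec_text, origin), where origin is "live" or "local".

    The live spec is preferred; any network failure falls back to the
    copy checked into the monorepo so generation still works offline.
    """
    try:
        return fetch(url), "live"
    except (URLError, OSError):
        # URLError subclasses OSError; both are listed for readability.
        return fallback.read_text(encoding="utf-8"), "local"
```

Passing `fetch` as a parameter keeps the fallback path testable without a running Core instance.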
## Core → ML client (oapi-codegen) [#core--ml-client-oapi-codegen]
`wordloop-core` generates a Go HTTP client for calling `wordloop-ml`'s API.
```bash
# Core must be running at localhost:4002 and ML at localhost:4003
./dev gen clients
```
Under the hood:
```bash
cd services/wordloop-core
WORDLOOP_ML_BASE_URL=http://127.0.0.1:4003 ./scripts/generate-clients.sh
```
**Adding a new external API client in Core:**
1. Create `internal/provider//`
2. Add an `oapi-codegen.yaml` config in that directory
3. Set `_BASE_URL` when running the script
## ML → Core client (openapi-python-client) [#ml--core-client-openapi-python-client]
`wordloop-ml` generates a Python client for calling `wordloop-core`'s API.
```bash
# Generated in the same run as Core's client
./dev gen clients
```
Under the hood:
```bash
cd services/wordloop-ml
./scripts/generate_wordloop_core_client.sh
```
The generated client is written to `src/wordloop/providers/wordloop_core/client/` and **must not be edited manually**.
## App TypeScript client (Orval) [#app-typescript-client-orval]
`wordloop-app` generates TypeScript types, SWR hooks, and API functions from Core's OpenAPI spec.
```bash
# Generated simultaneously via Orval
./dev gen clients
```
Under the hood:
```bash
curl http://localhost:4002/openapi.json -o services/wordloop-app/openapi.json
cd services/wordloop-app && pnpm orval
```
The generated file is `lib/api/generated.ts` — **never edit it manually**. Use the wrapper hooks in `hooks/use-data.ts`.
## Regenerate everything [#regenerate-everything]
```bash
# All services must be running for clients to pull live specs
./dev gen all
```
This runs: `events` → `clients` → `docs`.
# Correct Documentation Drift (/docs/guides/correct-documentation-drift)
# Correct Documentation Drift [#correct-documentation-drift]
## TL;DR [#tldr]
Do not fix drift by editing the first wrong-looking page. First classify the disagreement, identify the source of truth, decide whether the current system or the documented intent is correct, then update every affected surface in one change.
## Drift types [#drift-types]
| Drift type | Example | Default source of truth |
| ---------------------------- | ------------------------------------------------------- | ----------------------------------------------- |
| Docs vs code | Docs say Next.js 15; package metadata says Next.js 16. | Code and package metadata |
| Docs vs generated contract | Guide names an endpoint missing from OpenAPI. | OpenAPI or AsyncAPI source |
| Docs vs skill | Skill duplicates old architecture guidance. | Docs for knowledge; skill for execution |
| Data flow vs implementation | TDD says Core publishes an event that code never emits. | Active delivery decision, then code/tests |
| Diagram vs topology | Architecture diagram omits a service boundary. | Code, deployment config, specs, traces |
| ADR vs current docs | Principle page contradicts an accepted ADR. | ADR until superseded |
| Active bet vs delivered code | TDD intent differs from implementation. | Product decision: fix code or revise active TDD |
| Runbook vs operations | Runbook references a retired dashboard. | Current operational tooling |
## Workflow [#workflow]
### 1. Capture the mismatch [#1-capture-the-mismatch]
Write down the two or more conflicting claims. Be concrete:
* Page or file path.
* Claim text or diagram element.
* Source that contradicts it.
* Date or commit where the contradiction appeared, if known.
Avoid vague reports such as "docs are stale." They are not actionable.
### 2. Classify the surfaces [#2-classify-the-surfaces]
Mark each surface as one of:
* **Generated reference** — contracts, schemas, CLI tables, error catalogues.
* **Runtime source** — code, tests, migrations, deployment config, traces.
* **Active guidance** — principles, service handbooks, runbooks, active TDD docs.
* **Historical record** — accepted ADRs, delivered bets, incident records.
* **Agent execution** — skills and skill evals.
### 3. Identify the source of truth [#3-identify-the-source-of-truth]
Use this order unless the page states a stricter rule:
1. Generated contracts and schemas define public interfaces.
2. Code, migrations, deployment config, and tests define shipped behaviour.
3. Accepted ADRs define historical decisions until superseded.
4. Active bet and TDD docs define current delivery intent before shipping.
5. Principle and service handbook pages define durable guidance.
6. Skills define agent execution behaviour, not durable engineering knowledge.
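As a rough sketch, the default order can be encoded as a ranked list so that a script or reviewer picks the winning surface mechanically. The category labels and the function name below are hypothetical shorthand for the six levels listed above, not identifiers from the platform tooling.

```python
from typing import Iterable

# Hypothetical labels for the six levels, highest authority first.
PRECEDENCE = [
    "generated_contract",  # 1. generated contracts and schemas
    "runtime_source",      # 2. code, migrations, deployment config, tests
    "accepted_adr",        # 3. accepted ADRs
    "active_tdd",          # 4. active bet and TDD docs
    "durable_guidance",    # 5. principle and service handbook pages
    "skill",               # 6. skills
]


def pick_source_of_truth(conflicting_surfaces: Iterable[str]) -> str:
    """Return the surface kind that wins by default among those in conflict."""
    surfaces = list(conflicting_surfaces)
    if not surfaces:
        raise ValueError("at least one surface is required")
    # list.index raises ValueError for an unknown label, which is intended.
    return min(surfaces, key=PRECEDENCE.index)
```

For example, a conflict between a skill, a handbook page, and the code resolves to the code by default; a page that states a stricter rule would still override this order.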
### 4. Decide whether to fix code or docs [#4-decide-whether-to-fix-code-or-docs]
A mismatch does not always mean the docs are wrong. Ask:
* Did code drift away from an intentional design?
* Did the design change but docs were not updated?
* Did a generated reference fail to regenerate?
* Did a skill preserve old policy after docs changed?
* Did an ADR get superseded without a new ADR?
If the documented design is still correct, fix code or create a delivery task. If shipped behaviour is correct, update active docs and skill references.
### 5. Update all affected surfaces [#5-update-all-affected-surfaces]
A complete drift correction may need changes to:
* Docs page content and `last_reviewed` metadata.
* Diagrams and data-flow descriptions.
* OpenAPI or AsyncAPI specs.
* Code, tests, migrations, or deployment config.
* ADRs when the decision changed.
* Skill context routing and verification steps.
* `llms.txt`, `llms-full.txt`, and Markdown exports.
* Skill-to-doc map entries.
### 6. Add a regression guard [#6-add-a-regression-guard]
Choose the cheapest guard that would have caught the drift:
* Health check for version strings, missing frontmatter, or broken links.
* Contract generation check for API/event reference drift.
* Diagram drift check for service-topology claims.
* Test or trace assertion for runtime flow claims.
* Skill eval for agent behaviour drift.
* Review-cadence change for pages that stale quickly.
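A version-string health check is often the cheapest of these guards. The sketch below assumes a `package.json` with a `next` dependency and docs prose that mentions "Next.js <major>"; the function names are hypothetical, and this is not the check that `./dev docs health` actually runs.

```python
import json
import re
from typing import Optional, Set


def next_major_from_package(package_json_text: str) -> Optional[int]:
    """Read the major version of the `next` dependency, if declared."""
    dependencies = json.loads(package_json_text).get("dependencies", {})
    match = re.search(r"(\d+)", dependencies.get("next", ""))
    return int(match.group(1)) if match else None


def next_majors_in_docs(docs_text: str) -> Set[int]:
    """Collect every major version the prose claims for Next.js."""
    return {int(n) for n in re.findall(r"Next\.js\s+(\d+)", docs_text)}


def drifted_versions(package_json_text: str, docs_text: str) -> Set[int]:
    """Return majors mentioned in docs that disagree with package metadata."""
    actual = next_major_from_package(package_json_text)
    if actual is None:
        return set()
    return next_majors_in_docs(docs_text) - {actual}
```

A non-empty result is the signal to fail the check, with package metadata treated as the source of truth for the "docs vs code" drift type.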
### 7. Verify [#7-verify]
Run the relevant commands:
```bash
# From the platform root
./dev docs health
# Or directly, from inside the docs service
cd services/wordloop-docs && pnpm run docs:health
```
Run service tests or generation commands when code, contracts, or generated docs changed.
## Data-flow and design-doc drift [#data-flow-and-design-doc-drift]
Data-flow drift is high risk because it misleads implementation and agent planning. Treat these checks as mandatory for active bets and service handbooks:
* Every service boundary in a data-flow diagram has a contract or explicit TODO.
* Every persistent object in a TDD has a schema plan or a reason it is transient.
* Every event shown in a diagram appears in AsyncAPI or is marked proposed.
* Every API operation shown in a guide appears in OpenAPI or is marked proposed.
* Every failure path that crosses a service boundary has an owner and response strategy.
* Every implementation milestone updates active TDD docs when it changes the design.
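The event check in the list above reduces to a set difference. The sketch below assumes the event names have already been extracted from the diagram and from the AsyncAPI spec; the function name and the example event names in the usage note are hypothetical.

```python
from typing import Iterable, List


def undocumented_events(
    diagram_events: Iterable[str],
    asyncapi_events: Iterable[str],
    proposed_events: Iterable[str] = (),
) -> List[str]:
    """Return diagram events that are neither in AsyncAPI nor marked proposed."""
    missing = set(diagram_events) - set(asyncapi_events) - set(proposed_events)
    return sorted(missing)
```

With a diagram showing `meeting.created` and `transcript.ready`, a spec that defines only `meeting.created`, and nothing marked proposed, the result is `["transcript.ready"]`, which is the claim to fix or annotate. The same shape works for API operations checked against OpenAPI.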
## Hallucination controls [#hallucination-controls]
Use these controls when correcting drift with AI assistance:
* Ask the agent to cite local source files, specs, or docs slugs for factual claims.
* Prefer generated contracts and package metadata over prose memory.
* Do not accept newly invented standard names, endpoints, event names, or commands without checking the source.
* Require exact paths for changed files and exact commands for verification.
* Search the repository before introducing new terminology.
* Treat external web claims as untrusted until verified against an official source.
## Anti-patterns [#anti-patterns]
* **Patch one page and stop.** Drift is usually cross-surface.
* **Refresh dates without review.** A new date on stale claims is worse than an old date.
* **Rewrite history.** Supersede ADRs and annotate delivered bets instead.
* **Trust AI recall.** Use source files, contracts, and official references.
* **Leave no regression guard.** If the drift was expensive, add a check.
## Related [#related]
* [Documentation Freshness](/docs/operations/documentation-freshness)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Documentation](/docs/principles/foundations/documentation)
# Deploy (/docs/guides/deploy)
# Deploy [#deploy]
## Goal [#goal]
Take a merged change from `main` and see it running for all users, with a verified canary step in between.
## Prerequisites [#prerequisites]
* Change merged to `main` (we deploy from trunk — see [Progressive Delivery](/docs/principles/delivery/progressive-delivery)).
* Familiarity with the observability dashboard for the service being deployed.
## Steps [#steps]
### 1. CI triggers the deploy [#1-ci-triggers-the-deploy]
Every merge to `main` triggers the CI pipeline: run tests, build container image, push to Artifact Registry, deploy to Cloud Run canary.
### 2. Watch the canary [#2-watch-the-canary]
The canary serves a small fraction of traffic. The automated promotion gate compares canary SLO metrics (latency, error rate, user-journey success) against the current production revision.
Monitor the release dashboard; in most cases, automated promotion handles it. Manual override is available when you want to pause or abort.
### 3. Promote or abort [#3-promote-or-abort]
* **Automated promote.** If canary metrics are within tolerance for the watch window, traffic is shifted to 100%.
* **Automated abort.** If canary burn rate exceeds the threshold, traffic is routed back to the previous revision and the team is paged.
* **Manual promote.** For releases with user-facing changes, a human can promote or hold.
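The decision logic behind these outcomes can be sketched as below. Everything concrete here is an assumption made for illustration: the metric fields, the tolerance values, the burn-rate threshold, and the "hold" result for a canary that is neither healthy enough to promote nor burning fast enough to abort. The real gate's configuration is not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloMetrics:
    p95_latency_ms: float
    error_rate: float            # fraction of failed requests, 0.0 to 1.0
    journey_success_rate: float  # fraction of successful user journeys


def promotion_decision(
    canary: SloMetrics,
    production: SloMetrics,
    burn_rate: float,
    burn_threshold: float = 2.0,     # hypothetical threshold
    latency_tolerance: float = 0.10, # canary may be up to 10% slower
    rate_tolerance: float = 0.005,   # absolute slack on the two rates
) -> str:
    """Return "abort", "promote", or "hold" for one watch window."""
    if burn_rate > burn_threshold:
        return "abort"
    within_tolerance = (
        canary.p95_latency_ms <= production.p95_latency_ms * (1 + latency_tolerance)
        and canary.error_rate <= production.error_rate + rate_tolerance
        and canary.journey_success_rate >= production.journey_success_rate - rate_tolerance
    )
    return "promote" if within_tolerance else "hold"
```

Checking the burn rate first mirrors the ordering in the list: an abort condition wins even when the other metrics look acceptable.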
### 4. Close the release [#4-close-the-release]
Once promotion is complete, close the release ticket, announce in the release channel, and verify the user-facing change behaves as expected.
## Verification [#verification]
* Current traffic is 100% on the new revision.
* SLO dashboards are green.
* Feature flags for the new release (if any) are in the expected state.
## Troubleshooting [#troubleshooting]
* **Canary aborted.** Check the release dashboard for the failing signal. Common causes: a dependency change that increases latency, an environment variable missing in the new revision.
* **Deploy stuck "in progress."** Check Cloud Run logs for the service; a crash-loop will block promotion.
* **SLO burn after promotion.** Roll back via the dashboard; file the incident ticket.
See [Progressive Delivery](/docs/principles/delivery/progressive-delivery) for the broader stance and [Operations / Runbooks](/docs/operations/runbooks) for post-deploy recovery procedures.
# Guides (/docs/guides)
# Guides [#guides]
Guides are **task-oriented**: each one walks you through completing a specific goal, from first command to verification. They assume you already know roughly why you want to do the thing; if you do not, follow the links into [Learn](/docs/learn) or [Engineering Principles](/docs/principles) from inside the guide.
## Developer workflow [#developer-workflow]
## How to read a guide [#how-to-read-a-guide]
Every guide is structured the same way: **Goal → Prerequisites → Steps → Verification → Troubleshooting**. If you find a step that fails in a way the guide does not cover, treat that as a bug in the documentation and open a PR against the guide itself — see [Your First Contribution](/docs/start/first-contribution).
# Keep Docs and Skills in Sync (/docs/guides/keep-docs-and-skills-in-sync)
# Keep Docs and Skills in Sync [#keep-docs-and-skills-in-sync]
## TL;DR [#tldr]
Docs hold durable engineering knowledge. Skills control agent execution. When either surface changes, update the skill-to-doc map, review the other surface, run documentation health checks, and evaluate any affected skill behaviour.
## When to use this workflow [#when-to-use-this-workflow]
Use this workflow when you:
* Change a principle, service handbook, workflow guide, runbook, or reference page that an agent skill may load.
* Create, edit, split, rename, or remove an agent skill.
* Move durable guidance from a skill into the docs site.
* Add a docs page that should become canonical context for an existing skill.
* Change skill trigger wording, safety gates, verification commands, or reference-loading instructions.
## Source-of-truth rule [#source-of-truth-rule]
| Content type | Canonical home |
| -------------------------------------------- | -------------------------------------- |
| Durable architecture and engineering policy | Docs site |
| Service-specific implementation conventions | Docs site |
| API, event, schema, CLI, and error reference | Generated docs where possible |
| Historical decisions | ADRs |
| Active delivery intent | Active bet and TDD docs |
| Skill triggering and task routing | Skill frontmatter and SKILL.md |
| Agent safety gates and verification workflow | Skill SKILL.md |
| Skill evaluation prompts and harness | Skill workspace or skill-factory evals |
## Workflow: changing docs [#workflow-changing-docs]
1. **Identify affected skills.** Check the skill-to-doc map for skills that depend on the page.
2. **Update the docs page.** Keep the page human-readable and agent-readable. Do not write prompt-like instructions into human docs.
3. **Update freshness metadata.** Change `last_reviewed` only after checking the claims against the source of truth.
4. **Review affected skills.** Check whether the skill still points to the right page, loads the right context, and verifies the right behaviour.
5. **Update skill references if needed.** Keep the skill concise; point to docs instead of copying durable guidance.
6. **Run health checks.** Use `./dev docs health` from the platform root.
7. **Run skill evals when behaviour changed.** If trigger wording, routing, or safety gates changed, run representative skill prompts before merging.
## Workflow: changing skills [#workflow-changing-skills]
1. **Decide whether the change is knowledge or execution.** Move durable knowledge to docs. Keep execution behaviour in the skill.
2. **Update the source skill.** Edit `tools/skill-factory/skills//` first; sync to `.agents/skills/` after review.
3. **Update the skill-to-doc map.** Add, remove, or rename canonical docs dependencies.
4. **Review mapped docs pages.** Confirm the docs still contain the knowledge the skill is expected to load.
5. **Create or update eval prompts.** Include should-trigger and should-not-trigger cases for trigger changes.
6. **Run health checks.** Confirm mapped docs pages and skill paths exist.
7. **Sync consumed skills.** Run `./dev sync skills` or copy the reviewed skill into `.agents/skills/` using the approved repository workflow.
## Skill-to-doc map rules [#skill-to-doc-map-rules]
Each maintained skill should declare:
* The skill name.
* The source skill path.
* The consumed skill path.
* Canonical docs dependencies by docs slug.
* Optional secondary docs used for specific task variants.
* The review owner.
The map is intentionally lightweight. It does not prove semantic correctness; it makes affected-surface review discoverable.
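One possible shape for a map entry, and the lookup used when a docs page changes, is sketched below. The field names follow the list above, but the dataclass, the function, and any values passed to them are hypothetical; the actual map format may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SkillMapEntry:
    name: str
    source_path: str                      # where the skill is authored
    consumed_path: str                    # where agents load it from
    canonical_docs: Tuple[str, ...]       # docs slugs the skill depends on
    owner: str                            # review owner
    secondary_docs: Tuple[str, ...] = ()  # optional, task-variant docs


def affected_skills(entries: Iterable[SkillMapEntry], docs_slug: str) -> List[str]:
    """Name every skill that lists the changed docs slug as a dependency."""
    return [
        entry.name
        for entry in entries
        if docs_slug in entry.canonical_docs or docs_slug in entry.secondary_docs
    ]
```

This is the "identify affected skills" step in lookup form: it finds which skills to review, and leaves the judgement about whether they are still correct to the reviewer.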
## Review checklist [#review-checklist]
* Does the skill still trigger for the right user prompts?
* Does the skill avoid triggering for adjacent but wrong prompts?
* Does the skill load canonical docs instead of duplicating them?
* Does the docs page avoid agent-only prompt language?
* Do docs, skills, code, generated specs, and ADRs agree on the source-of-truth hierarchy?
* Did `last_reviewed` change only after a real review?
* Did generated `llms-full.txt` and Markdown exports stay current?
## Anti-patterns [#anti-patterns]
* **Shadow policy in skills.** Durable rules copied into SKILL.md instead of linked to docs.
* **Prompt-shaped docs.** Human docs that read like system prompts.
* **Unmapped skills.** A skill that depends on docs but is invisible to health checks.
* **Blind freshness updates.** Changing `last_reviewed` without validating claims.
* **Eval-free trigger edits.** Changing trigger wording without testing realistic prompts.
## Related [#related]
* [Documentation](/docs/principles/foundations/documentation)
* [Documentation Freshness](/docs/operations/documentation-freshness)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)
* [Docs are canonical knowledge and skills are the agent execution layer](/docs/decisions/0005-docs-canonical-skills-execution-layer)
# Migrate the Schema (/docs/guides/migrate-schema)
# Migrate the Schema [#migrate-the-schema]
## Goal [#goal]
Change the Postgres schema in a way that is safe for production: additive first, reversible, and non-blocking on hot tables.
## Prerequisites [#prerequisites]
* Familiarity with [Postgres](/docs/principles/stack/postgres) and [Data Engineering](/docs/principles/system-design/data-engineering) principles.
* Local stack running (`./dev start infra`) so you can test the migration against a real database.
## Steps [#steps]
### 1. Draft the migration [#1-draft-the-migration]
Migrations live under `services/wordloop-core/migrations/` (or the equivalent directory for the service that owns the schema). Name them by timestamp and intent: `20260419123000_add_loops_archived_at.up.sql`.
Write the `.up.sql` **additively**:
* Add columns as nullable, or with a default expression that is cheap on a hot table.
* Add new tables as empty.
* Never rename or drop in a single migration — split into "add new", "backfill", "stop reading old", "drop old" across releases.
Write the `.down.sql` as an exact reverse, tested locally.
### 2. Test locally [#2-test-locally]
```bash
./dev migrate up
./dev migrate down
./dev migrate up
```
Round-tripping catches broken `.down.sql` early.
### 3. Backfill in a separate job [#3-backfill-in-a-separate-job]
If the column needs a non-trivial value on historical rows, write a backfill job that chunks through the table and commits in batches. Do **not** backfill inside the migration itself: a long-running migration transaction holds locks, blocks replication, and terrifies on-call engineers.
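A batched backfill can be sketched as follows. To stay self-contained, the example uses Python's built-in `sqlite3` as a stand-in for Postgres, and the `loops.title` and `loops.title_length` columns are hypothetical; the point is the loop shape (walk the primary key in order, commit after every batch), not the specific SQL.

```python
import sqlite3


def backfill_title_length(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill the hypothetical `title_length` column; return the batch count."""
    batches = 0
    last_id = 0
    while True:
        # Keyset pagination: resume after the last id we processed.
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM loops WHERE id > ? AND title_length IS NULL "
                "ORDER BY id LIMIT ?",
                (last_id, batch_size),
            )
        ]
        if not ids:
            return batches
        conn.executemany(
            "UPDATE loops SET title_length = length(title) WHERE id = ?",
            [(row_id,) for row_id in ids],
        )
        conn.commit()  # one short transaction per batch
        last_id = ids[-1]
        batches += 1
```

Because each batch commits on its own and only touches rows that are still `NULL`, the job can be interrupted and rerun without redoing finished work.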
### 4. Coordinate with consumers [#4-coordinate-with-consumers]
If the schema change is part of a renaming or restructuring, the order of deploys matters:
* Deploy the code that reads both old and new columns.
* Run the migration.
* Backfill.
* Deploy the code that reads only the new column.
* In a later release, drop the old column.
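The first deploy in this sequence needs read code that tolerates both shapes. A minimal sketch, assuming a hypothetical rename from an old `deleted_at` column to a new `archived_at` column and rows represented as plain mappings:

```python
from typing import Any, Mapping, Optional


def read_archived_at(row: Mapping[str, Any]) -> Optional[str]:
    """Prefer the new column; fall back to the old one until backfill is done."""
    new_value = row.get("archived_at")  # new column (may be missing or NULL)
    if new_value is not None:
        return new_value
    return row.get("deleted_at")        # hypothetical old column
```

Once the backfill has completed and this fallback is no longer hit, the later deploy can read only the new column, and a subsequent release can drop the old one.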
### 5. Commit the migration and the code change together [#5-commit-the-migration-and-the-code-change-together]
The PR should include the migration and the code that uses it. Reviewers can see the full scope of the change.
## Verification [#verification]
* `./dev migrate status` shows the migration as applied.
* Service tests pass against the migrated schema.
* Rollback tested locally.
* [Database Reference](/docs/reference/database) regenerates cleanly.
## Troubleshooting [#troubleshooting]
* **`ALTER TABLE` is taking forever in staging.** On a large table, adding a `NOT NULL` column with a volatile default (or, on Postgres versions before 11, with any default) rewrites every row. Split into "add nullable → backfill → tighten to NOT NULL."
* **`.down.sql` fails.** Down migrations often break when the up migration contains data transformations. Consider whether the down is genuinely needed; some migrations are forward-only (and the code has to be able to tolerate that).
See [Postgres](/docs/principles/stack/postgres) for the stance that shapes this workflow.
# Run Tests (/docs/guides/run-tests)
# Run Tests [#run-tests]
## Goal [#goal]
Run the right tests for the change you are making — unit, service, or system — and read the output in a way that makes failures actionable.
## Prerequisites [#prerequisites]
* Local stack bootstrapped (`./dev start infra`) so that Testcontainers has a working Docker daemon.
* Familiarity with [Testing](/docs/principles/foundations/testing) — especially the "favour service tests over unit tests" and "emulate, don't mock" disciplines.
## Steps [#steps]
### 1. Run per-service tests [#1-run-per-service-tests]
```bash
./dev test core # Go service tests for wordloop-core
./dev test ml # Python tests + evals for wordloop-ml
./dev test app # Vitest + React Testing Library for wordloop-app
./dev test # Everything
```
Service tests spin up real Postgres and Pub/Sub containers where needed.
### 2. Run system tests [#2-run-system-tests]
System tests exercise multiple services together through their real APIs and trace assertions.
```bash
./dev test system
```
These take longer; run them before opening a PR that touches multiple services.
### 3. Run with race detection (Go) [#3-run-with-race-detection-go]
```bash
./dev test core -- -race
```
Data races are far easier to catch with the race detector than to debug after the fact; run with `-race` on any change that touches goroutines.
### 4. Run ML evals [#4-run-ml-evals]
```bash
./dev test ml -- --evals
```
Runs the committed eval set. Regressions above the threshold fail the command.
## Verification [#verification]
* Exit code 0 on the targeted suites.
* Trace assertions pass (no missing spans).
* Coverage report (if enabled) shows the change is exercised.
## Troubleshooting [#troubleshooting]
* **"Cannot connect to Docker daemon."** Start Docker Desktop; verify with `./dev status`.
* **Testcontainers starts slowly.** The first run pulls the Postgres image; subsequent runs use the cached image.
* **Flaky test.** Flakiness is a bug. File it; do not retry until green.
See [Testing](/docs/principles/foundations/testing) for the underlying stance.
# Learn the Platform (/docs/learn)
# Learn the Platform [#learn-the-platform]
This section is for understanding — the *why* and *how* behind Wordloop. It is not a tutorial (see [Start Here](/docs/start/quickstart)) and it is not a reference (see [Reference](/docs/reference)). It is the narrative layer that turns a repository of code into a system you can reason about.
## What you will find here [#what-you-will-find-here]
## How to read this section [#how-to-read-this-section]
Start with **Concepts** if the domain is new to you — understanding what a Meeting, Person, and MeetingSynthesis mean in code matters for every change downstream. Move to **Architecture** to see how services compose into a platform, then drop into a **Service** handbook when you need implementation-level depth.
If you want to know *what we believe* about building software at this scale and why, read [Engineering Principles](/docs/principles). If you want to *do something*, see [Guides](/docs/guides). If you want to *look something up*, see [Reference](/docs/reference).
# Documentation Freshness (/docs/operations/documentation-freshness)
# Documentation Freshness [#documentation-freshness]
## TL;DR [#tldr]
Every active documentation page needs an owner, a review cadence, and a visible freshness state. Stale docs are not automatically wrong, but they are lower-trust until reviewed. Historical records such as ADRs and delivered bets are handled differently: they are preserved, corrected with explicit notes when necessary, or superseded.
## Freshness states [#freshness-states]
| State | Meaning | Reader guidance |
| -------------- | ------------------------------------------------------ | ----------------------------------------------------------------------------- |
| **Fresh** | `last_reviewed` is inside the review window. | Treat as current unless code or contracts prove otherwise. |
| **Review due** | The review window has passed. | Use with caution; verify against source of truth before making major changes. |
| **Stale** | The page is more than one review window overdue. | Do not use as authoritative without checking code, specs, traces, or owners. |
| **Generated** | The page is produced from code, contracts, or schemas. | Regenerate from source instead of editing by hand. |
| **Historical** | The page records past intent or decisions. | Preserve history; supersede or add correction notes instead of rewriting. |
## Review windows [#review-windows]
| Surface | Default review window | Status model | Source of truth |
| ------------------- | -----------------------------------: | ------------------- | ------------------------------------------ |
| Principles | 6 months | Active | Docs and accepted ADRs |
| Service handbooks | 3 months | Active | Code, package metadata, architecture docs |
| How-to guides | 6 months | Active | Commands, workflows, and tested paths |
| Runbooks | 3 months | Active | Operational reality and incident follow-up |
| API reference | Every contract change | Generated | OpenAPI specs |
| Event reference | Every contract change | Generated | AsyncAPI specs |
| Database reference | Every schema change | Generated or active | Migrations and schema introspection |
| Glossary | 6 months | Active | Domain vocabulary and product language |
| Active bet TDD docs | Every material implementation change | Active | Current delivery intent and code reality |
| Delivered bet docs | No expiry | Historical | Archived design record |
| ADRs | No expiry | Historical | Append-only decision record |
| Agent skills | Every skill or mapped docs change | Active | Skill source plus mapped docs pages |
## Required frontmatter [#required-frontmatter]
Active authored pages should include:
```yaml
title: Documentation
description: One sentence describing the page.
audience: engineers
owner: docs-platform
last_reviewed: 2026-05-01
review_frequency: P6M
status: active
source_of_truth: docs
```
Generated pages should declare that they are generated where the generator supports it:
```yaml
status: generated
source_of_truth: specs/core-openapi.json
```
Historical pages should not be forced into an active freshness cycle:
```yaml
status: historical
source_of_truth: accepted-adr
```
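The `last_reviewed` and `review_frequency` fields are enough to derive the Fresh, Review due, and Stale states for active pages. The sketch below is one possible reading of the state table, assuming `review_frequency` is an ISO 8601 duration in whole months (such as `P6M`) and that "more than one review window overdue" means more than two windows after `last_reviewed`; the function names are hypothetical.

```python
import calendar
import re
from datetime import date


def add_months(start: date, months: int) -> date:
    """Shift a date forward by whole months, clamping to the month's last day."""
    month_index = start.year * 12 + (start.month - 1) + months
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def freshness_state(last_reviewed: date, review_frequency: str, today: date) -> str:
    """Classify an active page as "fresh", "review due", or "stale"."""
    match = re.fullmatch(r"P(\d+)M", review_frequency)
    if match is None:
        raise ValueError(f"unsupported review_frequency: {review_frequency!r}")
    window = int(match.group(1))
    review_due_on = add_months(last_reviewed, window)
    stale_after = add_months(last_reviewed, 2 * window)
    if today <= review_due_on:
        return "fresh"
    if today <= stale_after:
        return "review due"
    return "stale"
```

Generated and historical pages are deliberately out of scope here: their state comes from the `status` field, not from a date calculation.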
## Review triggers [#review-triggers]
Review a page before its normal review window when one of these events happens:
* A package, language runtime, framework, or infrastructure version changes.
* A public command, environment variable, port, endpoint, event, or schema changes.
* A service boundary or data-flow diagram changes.
* A skill starts depending on the page for agent execution.
* An incident exposes missing or misleading operational guidance.
* An ADR supersedes a decision that the page explains.
* A user or agent reports confusion caused by the page.
## Stale-page handling [#stale-page-handling]
1. **Classify the page.** Decide whether it is active, generated, or historical.
2. **Find the source of truth.** Use code, specs, migrations, traces, ADRs, or active design docs depending on the claim.
3. **Update the page or mark it historical.** Do not silently keep stale active guidance.
4. **Update `last_reviewed`.** Only update the date after checking the claims, not after touching formatting.
5. **Run documentation health checks.** Confirm metadata, internal links, skill-doc references, and generated corpora are still valid.
6. **Review affected skills.** If a skill depends on the page, check whether the skill's routing or verification steps need to change.
## What not to do [#what-not-to-do]
* Do not refresh `last_reviewed` without reviewing the claims.
* Do not rewrite accepted ADRs to make them current.
* Do not edit generated reference pages by hand.
* Do not hide stale badges because they are inconvenient.
* Do not rely on humans to notice stale stack versions, command names, or broken links when a script can check them.
## Commands [#commands]
Run the health check from the platform root:
```bash
./dev docs health
```
Run the underlying docs script directly when working inside the docs service:
```bash
cd services/wordloop-docs
pnpm run docs:health
```
## Related [#related]
* [Documentation](/docs/principles/foundations/documentation)
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync)
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift)
# Operations (/docs/operations)
# Operations [#operations]
The Operations section is written for the person staring at a red graph at 3am — or the one who will, one day. It is different from [Guides](/docs/guides): guides walk you through a happy-path operation you *want* to perform; runbooks walk you through a degraded state you *have to* respond to.
## When to use this section [#when-to-use-this-section]
## Writing for 3am [#writing-for-3am]
Operational documentation has a harsh audience: a stressed engineer under time pressure. The bar is high.
* **State the goal at the top.** Every runbook begins with "This runbook restores *X* when *Y*."
* **Number the steps.** Imperative sentences. Exact commands, exact flags, exact expected output.
* **Include rollback.** Every step that changes state must explain how to undo it.
* **Link to observability.** Every step that checks state must link to the dashboard that proves it.
* **Close with escalation.** If the runbook fails, who or what is next?
See [Engineering Principles / Reliability](/docs/principles/quality/reliability) for why we hold this bar.
# On-Call (/docs/operations/on-call)
# On-Call [#on-call]
On-call is the contract we sign with our users: if the platform breaks, someone is responsible for putting it back together, and that someone is paged promptly. This page describes how the rotation is structured, how incidents are handled, and the tools an on-call engineer should have open before their shift starts.
## Rotation [#rotation]
Primary and secondary on-call shifts run in one-week blocks. The calendar is maintained in our paging system; pages route to the current primary with automatic escalation to the secondary if unacknowledged.
## Before your shift [#before-your-shift]
1. **Skim the last two weeks of incidents.** Patterns recur — knowing the last time this alert fired is usually the fastest lead.
2. **Confirm paging works.** Send yourself a test page; verify the escalation chain.
3. **Verify dashboard access.** Observability dashboards, feature-flag console, deploy dashboard, Cloud Run console, database console.
4. **Review recent deploys.** A page five minutes after a deploy is almost certainly about the deploy.
## When you are paged [#when-you-are-paged]
1. **Acknowledge within 5 minutes.** Even if you are not ready to act, acknowledging stops escalation.
2. **Open the incident channel.** The paging system creates one automatically; post your initial assessment there.
3. **Localise, don't rebuild.** Use [Troubleshooting](/docs/operations/troubleshooting) to find the matching diagnostic tree. Do not write new code in an incident unless necessary.
4. **Apply the relevant [runbook](/docs/operations/runbooks).** If none exists, write one during the postmortem.
5. **Escalate when stuck.** 30 minutes without progress is the soft threshold. Call the secondary; call the service owner; call the service leader.
## Communication [#communication]
The incident channel is the record. Post:
* What you saw (the symptom).
* What you checked (the diagnostic path).
* What you did (the mitigation).
* Who else is involved.
One line every few minutes is better than radio silence. Other engineers read the channel to decide whether to jump in; absence of updates reads as "this is handled" when it may not be.
## After the incident [#after-the-incident]
* **Close the page.** Confirm the alert is cleared.
* **Open a postmortem ticket.** Use the blameless postmortem template; name the specific reliability assumption that was invalidated.
* **File action items.** One concrete, closable ticket per action. "Be more careful" is not an action item.
* **Update the runbook.** If the runbook missed a step, fix it while the experience is fresh.
## Tools every on-call engineer should have ready [#tools-every-on-call-engineer-should-have-ready]
* Observability dashboards, pinned per service.
* Deploy dashboard with rollback on hand.
* Feature-flag console with write access.
* Cloud Run console with per-service revision access.
* Database console (read-only by default; write access only on demand, with an audit trail).
* The team's runbook index.
## Related [#related]
* [Reliability](/docs/principles/quality/reliability) — the SLO and error-budget model that shapes what gets paged.
* [Troubleshooting](/docs/operations/troubleshooting) — diagnostic trees for common symptoms.
* [Runbooks](/docs/operations/runbooks) — step-by-step recovery procedures.
# Troubleshooting (/docs/operations/troubleshooting)
# Troubleshooting [#troubleshooting]
This page is for the "something feels off" moment, before you know which runbook to follow. It is a set of diagnostic trees — start from the symptom you can see, follow the branch that narrows the cause, then consult the matching [runbook](/docs/operations/runbooks) or escalate.
## Symptom: the frontend is blank after sign-in [#symptom-the-frontend-is-blank-after-sign-in]
1. **Check the browser console.** Look for 401/403 from `wordloop-core` → Clerk token issue. Look for 5xx → backend issue.
2. **Check the Core service health.** Hit `/healthz` on Core. If it responds, the backend is up; the problem is in auth or in the specific call the app makes first.
3. **Check JWT verification logs** on Core for the incoming request. A mismatch between the Clerk environment and the Core configuration will produce "token signature does not verify" here.
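The branches above can be written down as a small decision helper. This is an illustrative sketch only: `Triage` and `triage_blank_frontend` are hypothetical names, not part of any Wordloop tooling, and the mapping simply restates the checks in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triage:
    suspect: str    # which layer to look at first
    next_step: str  # the follow-up check from this section


def triage_blank_frontend(status: int, healthz_ok: bool) -> Triage:
    """Map the first failing response in the browser console to a lead.

    `status` is the HTTP status of the failing call to Core; `healthz_ok`
    is whether Core's /healthz endpoint responded.
    """
    if status in (401, 403):
        # Auth failures point at the Clerk token, not at Core's availability.
        return Triage("auth", "check JWT verification logs on Core")
    if 500 <= status <= 599:
        if healthz_ok:
            # Core is up, so the fault is in the specific call the app makes first.
            return Triage("endpoint", "inspect the first call the app makes")
        return Triage("backend", "Core is unhealthy; check Core service health")
    return Triage("unknown", "keep localising or escalate")
```

Calling `triage_blank_frontend(401, healthz_ok=True)` points at auth, which matches the first step in the list.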
## Symptom: transcription lag is spiking [#symptom-transcription-lag-is-spiking]
1. **Check the ML service trace.** Filter for `transcribe.turn` spans with latency above the SLO. If the model call itself is slow, the model provider or the network is the cause.
2. **Check the model-client adapter logs.** Rate-limit responses from the provider surface here.
3. **Check the audio queue depth.** If the queue is deep, consumers are not keeping up: scale the ML workers or investigate a backpressure signal.
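If you have exported spans from the trace backend, the first check reduces to a filter. The sketch below assumes a simplified span shape (`name`, `duration_ms`) and an SLO threshold passed in by the caller. Both are assumptions for illustration, not the schema of our observability stack.

```python
from typing import Iterable, List, TypedDict


class Span(TypedDict):
    name: str           # span name, e.g. "transcribe.turn"
    duration_ms: float  # wall-clock duration of the span


def slow_transcribe_spans(spans: Iterable[Span], slo_ms: float) -> List[Span]:
    """Return `transcribe.turn` spans slower than the SLO, slowest first."""
    slow = [
        s for s in spans
        if s["name"] == "transcribe.turn" and s["duration_ms"] > slo_ms
    ]
    return sorted(slow, key=lambda s: s["duration_ms"], reverse=True)
```

Sorting slowest-first puts the most useful trace to open at the top of the list.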
## Symptom: WebSocket connections drop repeatedly [#symptom-websocket-connections-drop-repeatedly]
1. **Check the gateway logs** for timeout errors — that usually indicates a platform-layer idle timeout below our expected session length.
2. **Check the client reconnect pattern.** A flood of reconnects from one client suggests a client-side bug; a broader pattern suggests a server-side issue.
3. **Check for `BACKPRESSURE_SHED` error frames.** If clients are being shed, the server is overloaded — check the SLO dashboard.
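The "one client versus many" distinction in step 2 can be made mechanical once you have a list of reconnect events. This is a rough sketch: the function name and the 80% dominance threshold are assumptions chosen for illustration, not tuned values from production.

```python
from collections import Counter
from typing import List


def reconnect_pattern(client_ids: List[str], dominance: float = 0.8) -> str:
    """Classify reconnect events by how concentrated they are.

    `client_ids` holds one entry per reconnect event. Returns "client-side"
    when a single client accounts for at least `dominance` of the events,
    "server-side" when reconnects are spread across clients, and "none"
    when there are no events.
    """
    if not client_ids:
        return "none"
    _, top_count = Counter(client_ids).most_common(1)[0]
    if top_count / len(client_ids) >= dominance:
        return "client-side"
    return "server-side"
```

Treat the result as a lead rather than a verdict; a single noisy client can still coincide with a server-side problem.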
## Symptom: deploys are failing in CI [#symptom-deploys-are-failing-in-ci]
1. **Check the CI logs for the failing step.** Most failures are one of: tests broke, image build broke, vulnerability scan flagged a dependency.
2. **If tests broke,** run them locally with the matching `./dev test` target (for example, `./dev test core`). A flaky test should be fixed, not retried.
3. **If the image build broke,** the cause is often a Dockerfile layer change or a base-image update. The CI log shows the failing layer.
4. **If the vulnerability scan flagged a dependency,** the dependency audit is doing its job. Upgrade the dependency or add a justified waiver.
## When to move to a runbook [#when-to-move-to-a-runbook]
If you have localised the symptom to a known failure mode (database slow, cache cold, model provider degraded, Pub/Sub backed up), move to the corresponding [runbook](/docs/operations/runbooks) for the recovery procedure.
## When to escalate [#when-to-escalate]
* Symptom is user-visible and you cannot localise it within 10 minutes.
* Symptom involves suspected security or privacy breach — escalate immediately ([Security](/docs/principles/quality/security), [Privacy](/docs/principles/quality/privacy)).
* Symptom is a novel failure mode not covered by any runbook. Document it in the postmortem for future detection.
See [On-Call](/docs/operations/on-call) for the escalation tree.
# Engineering Manifesto (/docs/principles)
# Engineering Manifesto [#engineering-manifesto]
Software engineering is the discipline of managing complexity and optimising for change. Wordloop is a platform that processes high-volume asynchronous workloads and serves clients in real time at scale — so we lean hard on a solid technical foundation, frictionless developer velocity, and a rigorous engineering culture.
> \[!IMPORTANT]
> These principles are the shared vocabulary we use to decide what to build, how to build it, and what trade-offs we accept. Every page in this hub stands on its own and does not require context from any other document to be useful.
The hub serves three audiences equally: engineers new to Wordloop learning how we think, experienced engineers returning for a stance on a specific domain, and AI agents working on a Wordloop task.
## What we believe [#what-we-believe]
1. **Complexity is the enemy; clarity is the goal.** We choose simple designs, simple tools, and simple processes — and we accept the cost of doing so. Speculative abstraction, premature generalisation, and fear of deletion all compound into the kind of complexity that slows teams down.
2. **Contracts are the single source of truth.** API specifications, event schemas, and database definitions are authoritative. Clients, tests, documentation, and UIs are derived from them. When a spec is wrong, everything downstream is wrong — and that is the correct failure mode, because one visible error beats silent drift across hand-maintained artefacts.
3. **Reliability is designed in, not patched in.** We build for failure from the first commit: idempotency at the API boundary, graceful degradation at the edges, backpressure when downstream systems slow, and observability as a design-time concern rather than an afterthought.
4. **We test the system, not the mock of the system.** Tests that run against real databases, real message brokers, and real HTTP stacks catch the bugs that mocked tests hide. Emulation beats mocking wherever the dependency can run in a container.
5. **Hexagonal architecture is how we structure services.** Ports and adapters, with dependencies flowing inward toward the domain. The predictable file topology is as valuable for the humans reading the code as it is for the agents writing it.
6. **Documentation is a product, not a by-product.** This site is versioned, reviewed, and shipped with the same discipline as code. It serves humans and AI agents, and the structures that help one help the other.
7. **Architectural decisions are append-only.** We record trade-offs as they are made, model them as debt (principal + interest + multiplier), and preserve the history. Re-litigating a past decision without a new decision record is how teams lose their memory.
8. **AI agents are first-class engineers.** They read our docs, write our code, review our diffs, and run our tooling. We design our codebase, our conventions, and this documentation so an agent can operate at the same level of quality as a senior engineer.
## How to read this hub [#how-to-read-this-hub]
Start with the principle closest to your current task. Every page follows the same shape: a short statement of our stance, the industry context that makes it matter, the concrete principles we follow, and the anti-patterns we explicitly reject.
* **[Testing](/docs/principles/foundations/testing)** — How we guarantee reliability with Continuous Risk Assurance: service tests over unit tests, high-fidelity emulation, observability-driven development, and risk-based coverage.
More principle pages are being added as the hub expands to cover foundations, system design, our stack, quality, delivery, and AI-native development. Each new page is self-contained and lands on its own merits.
# CLI Reference (/docs/reference/cli)
# CLI Reference [#cli-reference]
The Wordloop platform has replaced its legacy Makefiles with a bespoke, shell-native `./dev` CLI that drives all local workflows safely and predictably.
All targets are run from the monorepo root. Run `./dev help` for a formatted list.
## Lifecycle [#lifecycle]
| Command | Description |
| ------------------------------------ | --------------------------------------------------------------------------------- |
| `./dev start all` | Start infra (Docker) + Core, ML, App, Docs (native) |
| `./dev start all --docker` | Start everything in Docker containers |
| `./dev start infra` | Start shared infra only (Postgres, Pub/Sub, Storage, OTel) |
| `./dev start [services...]` | Start specific services natively (e.g. `./dev start core ml`) |
| `./dev start [services...] --docker` | Start specific services in Docker containers |
| `./dev stop all` | Stop everything safely (Docker + native processes) |
| `./dev stop wipe` | Destructive: stop everything and destroy all data volumes |
| `./dev stop [services...]` | Stop specific services (auto-detects native vs Docker) |
| `./dev logs all` | Tail logs for all running services |
| `./dev logs [services...]` | Tail logs for specific services — supports multi-tail (e.g. `./dev logs core ml`) |
| `./dev attach db` | Drop into an interactive psql shell |
| `./dev status` | Print local environment ports and endpoints |
Services run **natively** by default with auto-reload (Air for Go, uvicorn for Python, HMR for Next.js). Use `--docker` to opt into Docker containers when needed.
## Quality [#quality]
| Command | Description |
| ------------------- | ------------------------------------------------------------- |
| `./dev test all` | Execute all testing suites across all packages |
| `./dev test system` | Run end-to-end system tests across service boundaries via Pytest |
| `./dev test smoke` | Run infrastructure health smoke tests |
| `./dev test core` | Run Go test suites |
| `./dev test ml` | Run Python Pytest suites |
| `./dev test app` | Run TS Vitest suites |
| `./dev lint all` | Run static analysis across all services |
| `./dev lint core` | Run `go vet` on Core |
| `./dev lint ml` | Run `ruff check` on ML |
| `./dev lint app` | Run `eslint` on App |
## Utilities [#utilities]
| Command | Description |
| --------------------- | ------------------------------------------------- |
| `./dev db migrate` | Apply all pending Core DB migrations |
| `./dev db rollback` | Revert the single most recently applied migration |
| `./dev db drop` | Destructive: completely drop the schema |
| `./dev db shell` | Drop securely into the local PostgreSQL console |
| `./dev dash obs` | Open the .NET Aspire Observability UI Dashboard |
| `./dev dash api` | Open the ML API Swagger docs |
| `./dev dash app` | Open the Next.js App |
| `./dev dash docs` | Open the Fumadocs Documentation UI |
| `./dev gcp pubsub` | Interact with local Pub/Sub emulator via gcloud |
| `./dev gcp storage` | Query the local Storage emulator REST API |
| `./dev gen api` | Generate OpenAPI schemas |
| `./dev gen events` | Generate AsyncAPI structs across all services |
| `./dev gen clients` | Rebuild typed API clients (Orval + Go + Python) |
| `./dev gen docs` | Recompile OpenAPI metadata for docs UI |
| `./dev setup env` | Copy environment baseline configurations |
| `./dev setup install` | Install workspace-wide package dependencies |
## System [#system]
| Command | Description |
| ------------------------ | --------------------------------------------------------------------------------- |
| `./dev doctor` | Validate all system dependencies, Docker status, port availability, and env files |
| `./dev completions zsh` | Output zsh auto-completion script |
| `./dev completions bash` | Output bash auto-completion script |
**First time?** Run `./dev doctor` immediately after cloning to verify your machine has everything needed.
### Enabling auto-completion [#enabling-auto-completion]
```bash
# Zsh — add to ~/.zshrc for permanent access
eval "$(./dev completions zsh)"
# Bash — add to ~/.bashrc
eval "$(./dev completions bash)"
```
After sourcing, typing `./dev ` then pressing Tab will suggest available commands and sub-targets.
## Native vs Docker [#native-vs-docker]
By default, `./dev start core` runs the Go service natively using Air for auto-reload. This means:
* **File changes are detected automatically** — Air watches `.go` files and rebuilds in \~1 second
* **Migrations run on every restart** — database schema is always current
* **Logs go to `.dev/logs/`** — tail them with `./dev logs core`
* **IDE debugging works** — you can also run Core from your IDE's debugger instead
Use `--docker` when you need full containerized behavior (e.g., testing Dockerfiles, CI parity, or running without Go installed locally).
## Debug Environments [#debug-environments]
By starting services selectively (e.g., `./dev start infra core`), you intentionally leave services like `wordloop-ml` turned off. You can then run those services from your IDE (for example, with VS Code launch configurations) to get full breakpoint debugging while the rest of the stack runs natively or in containers.
## Resilience Model [#resilience-model]
The CLI is designed for safety and resilience:
* **Graceful shutdown**: `./dev stop` sends `SIGTERM` first, allowing services to flush connections and clean up. It falls back to `SIGKILL` only after a 3-second grace period.
* **Subshell isolation**: All commands run in isolated subshells, preventing `cd` side-effects from corrupting your terminal's working directory.
* **Port conflict detection**: `./dev doctor` and `./dev start` both check for port conflicts before launching services.
* **No external dependencies**: Port checking uses native bash `/dev/tcp` instead of requiring `nc` or `netcat`.
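The graceful-shutdown behaviour is easiest to see in code. The CLI itself is shell-native, so the following Python sketch is not its implementation; it only illustrates the same SIGTERM-then-SIGKILL pattern, and `stop_gracefully` is a hypothetical name.

```python
import subprocess
import sys


def stop_gracefully(proc: subprocess.Popen, grace_seconds: float = 3.0) -> str:
    """Ask a process to exit, forcing it only if the grace period elapses.

    Returns which path was taken: "already-exited", "SIGTERM", or "SIGKILL".
    """
    if proc.poll() is not None:
        return "already-exited"
    proc.terminate()  # SIGTERM on POSIX: lets the service flush and clean up
    try:
        proc.wait(timeout=grace_seconds)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "SIGKILL"


if __name__ == "__main__":
    # Stand-in for a long-running service.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_gracefully(child))
```

The grace period is the trade-off: long enough for connections to drain, short enough that a hung process does not block the next `./dev start`.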
# Configuration (/docs/reference/configuration)
# Configuration [#configuration]
Every service in the Wordloop platform loads its configuration from environment variables, following the [Twelve-Factor App](https://12factor.net/) config principle. This page is the canonical catalogue of those variables — what they do, what their defaults are, and which service owns them.
Local defaults are generated by `./dev setup env`. The variables listed here are the full contract; your local `.env` files typically override only the subset you need.
## Common variables [#common-variables]
Variables consumed by multiple services.
| Variable | Service(s) | Default (local) | Purpose |
| ----------------------------- | ---------- | ----------------------- | ------------------------------------------------------------------------------------------------------------ |
| `APP_ENV` | all | `development` | `development`, `test`, `staging`, `production`. Controls auth mode, logging verbosity, and feature defaults. |
| `DATABASE_URL` | core | derived | Postgres connection string. |
| `PUBSUB_EMULATOR_HOST` | core, ml | `localhost:8085` | Local Pub/Sub emulator. Unset in production. |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | all | `http://localhost:4318` | Collector endpoint for traces, metrics, and logs. |
| `LOG_LEVEL` | all | `info` | `debug`, `info`, `warn`, `error`. |
## `wordloop-core` [#wordloop-core]
| Variable | Default | Purpose |
| ----------------------- | ---------------------- | ---------------------------------------- |
| `CORE_PORT` | `4002` | HTTP + WebSocket port. |
| `CLERK_SECRET_KEY` | — | Backend Clerk key for JWT verification. |
| `CLERK_PUBLISHABLE_KEY` | — | Frontend-shared key; surfaced for debug. |
| `STORAGE_BUCKET` | `wordloop-local-audio` | GCS bucket for audio artefacts. |
## `wordloop-ml` [#wordloop-ml]
| Variable | Default | Purpose |
| ---------------------- | ----------- | --------------------------------------------- |
| `ML_PORT` | `4003` | FastAPI port. |
| `MODEL_PROVIDER` | `anthropic` | Chooses which model adapter to load. |
| `ANTHROPIC_API_KEY` | — | Set when `MODEL_PROVIDER=anthropic`. |
| `OPENAI_API_KEY` | — | Set when `MODEL_PROVIDER=openai`. |
| `ML_CACHE_TTL_SECONDS` | `3600` | Cache lifetime for deterministic model calls. |
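The provider-dependent key requirement in this table can be enforced at startup. The sketch below shows one way to do that; `MLConfig` and `load_ml_config` are hypothetical names, and it assumes `anthropic` and `openai` are the only supported providers. It is not the loader the ML service actually uses.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Which secret each provider needs, per the table above.
_KEY_FOR_PROVIDER = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}


@dataclass(frozen=True)
class MLConfig:
    port: int
    model_provider: str
    api_key: str
    cache_ttl_seconds: int


def load_ml_config(env: Mapping[str, str] = os.environ) -> MLConfig:
    """Build the ML config from environment variables, failing fast on gaps."""
    provider = env.get("MODEL_PROVIDER", "anthropic")
    key_name = _KEY_FOR_PROVIDER.get(provider)
    if key_name is None:
        raise ValueError(f"unsupported MODEL_PROVIDER: {provider!r}")
    api_key = env.get(key_name, "")
    if not api_key:
        raise ValueError(f"{key_name} must be set when MODEL_PROVIDER={provider}")
    return MLConfig(
        port=int(env.get("ML_PORT", "4003")),
        model_provider=provider,
        api_key=api_key,
        cache_ttl_seconds=int(env.get("ML_CACHE_TTL_SECONDS", "3600")),
    )
```

Passing the environment in as a mapping keeps the loader testable without mutating `os.environ`.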
## `wordloop-app` [#wordloop-app]
| Variable | Default | Purpose |
| ----------------------------------- | ----------------------- | ----------------------------------- |
| `NEXT_PUBLIC_CORE_URL` | `http://localhost:4002` | URL the browser uses to reach Core. |
| `NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY` | — | Clerk frontend key. |
| `APP_PORT` | `4001` | Next.js port. |
## Feature flags [#feature-flags]
Feature flags are served dynamically — they are not environment variables. See the flag dashboard for the current state and owners. Progressive-delivery principles ([Progressive Delivery](/docs/principles/delivery/progressive-delivery)) govern how flags are created, rolled, and retired.
## Further reading [#further-reading]
* [Quickstart](/docs/start/quickstart) — bootstrapping local `.env` files.
* [Security](/docs/principles/quality/security) — the rules around secret handling.
* [Twelve-Factor App](https://12factor.net/) — the philosophy behind environment-based config.
# Database Schema (/docs/reference/database)
# Database Schema [#database-schema]
The Postgres database is owned exclusively by `wordloop-core`.
Schema changes must be managed through versioned SQL migrations in `services/wordloop-core/scripts/migrations/`. Do not apply manual schema alterations.
## ER diagram [#er-diagram]
## Tables [#tables]
### `users` [#users]
Primary user account linked to Auth0.
| Column | Type | Notes |
| ------------ | ---------------- | ------------------ |
| `id` | UUID PK | |
| `auth0_id` | TEXT UNIQUE | External identity |
| `email` | TEXT | |
| `name` | TEXT | |
| `person_id` | UUID FK → people | Optional self-link |
| `created_at` | TIMESTAMPTZ | |
### `people` [#people]
Contacts and meeting participants.
| Column | Type | Notes |
| -------------------- | ----------- | ---------------------------------- |
| `id` | UUID PK | |
| `name` | TEXT | |
| `role` | TEXT | Job title / role description |
| `email` | TEXT | |
| `company` | TEXT | |
| `tags` | JSONB | |
| `voice_model_status` | TEXT | `untrained` / `training` / `ready` |
| `voice_confidence` | DECIMAL | |
| `voice_vector` | vector(512) | Optional SpeechBrain embedding |
### `meetings` [#meetings]
Recorded or uploaded conversations.
| Column | Type | Notes |
| ------------- | --------------- | --------------------------------------------- |
| `id` | UUID PK | |
| `user_id` | UUID FK → users | |
| `title` | TEXT | |
| `start_time` | TIMESTAMPTZ | |
| `end_time` | TIMESTAMPTZ | |
| `summary` | TEXT | AI-generated summary |
| `key_points` | JSONB | |
| `source_type` | TEXT | `recording` / `upload` / `text` / `anecdotal` |
| `created_at` | TIMESTAMPTZ | |
### `meeting_audio_files` [#meeting_audio_files]
Audio files attached to meetings.
| Column | Type | Notes |
| -------------- | ------------------ | --------- |
| `id` | UUID PK | |
| `meeting_id` | UUID FK → meetings | |
| `storage_path` | TEXT | GCS path |
| `file_name` | TEXT | |
| `content_type` | TEXT | MIME type |
| `file_size` | BIGINT | Bytes |
| `created_at` | TIMESTAMPTZ | |
### `transcriptions` [#transcriptions]
Records the transcription job details connected to a meeting.
| Column | Type | Notes |
| ---------------- | ------------------ | ------------------------------------------------------------------------------------------ |
| `id` | UUID PK | |
| `meeting_id` | UUID FK → meetings | |
| `status` | TEXT | enum: `pending`, `transcribing`, `diarizing`, `extracting_features`, `completed`, `failed` |
| `status_message` | TEXT | Optional error details |
| `created_at` | TIMESTAMPTZ | |
| `updated_at` | TIMESTAMPTZ | |
### `transcription_status_history` [#transcription_status_history]
Audit log of transcription status changes.
| Column | Type | Notes |
| ------------------ | ------------------------ | ----- |
| `id` | UUID PK | |
| `transcription_id` | UUID FK → transcriptions | |
| `status` | TEXT | |
| `status_message` | TEXT | |
| `created_at` | TIMESTAMPTZ | |
### `transcript_segments` [#transcript_segments]
Timestamped chunks of transcribed speech.
| Column | Type | Notes |
| ------------------ | ------------------------ | --------------------------------------------- |
| `id` | UUID PK | |
| `transcription_id` | UUID FK → transcriptions | |
| `person_id` | UUID FK → people | Nullable |
| `speaker_label` | TEXT | Temporary label before identification |
| `text` | TEXT | |
| `start_time` | DECIMAL | Seconds from start |
| `end_time` | DECIMAL | Seconds from start |
| `confidence` | DECIMAL | Transcription confidence |
| `is_final` | BOOLEAN | Indicates if segment is finalized (streaming) |
| `feature_vector` | vector(512) | SpeechBrain embedding |
### `tasks` (formerly `action_items`) [#tasks-formerly-action_items]
Actionable items extracted from meetings.
| Column | Type | Notes |
| ------------- | ------------------ | ----------------------- |
| `id` | UUID PK | |
| `user_id` | UUID FK → users | |
| `content` | TEXT | |
| `status` | TEXT | `pending` / `completed` |
| `due_date` | DATE | |
| `assigned_to` | UUID FK → people | |
| `meeting_id` | UUID FK → meetings | |
| `sub_tasks` | JSONB | |
| `created_at` | TIMESTAMPTZ | |
### `notes` [#notes]
Free-form notes attached to people or meetings.
| Column | Type | Notes |
| -------------- | --------------- | -------------------- |
| `id` | UUID PK | |
| `user_id` | UUID FK → users | |
| `content` | TEXT | |
| `subject_type` | TEXT | `PERSON` / `MEETING` |
| `subject_id` | UUID | Polymorphic FK |
| `tags` | JSONB | |
| `created_at` | TIMESTAMPTZ | |
| `updated_at` | TIMESTAMPTZ | |
### `ai_threads` [#ai_threads]
Contextual AI conversation containers.
| Column | Type | Notes |
| -------------- | --------------- | -------------------- |
| `id` | UUID PK | |
| `user_id` | UUID FK → users | |
| `context_type` | TEXT | `PERSON` / `MEETING` |
| `context_id` | UUID | Polymorphic FK |
| `created_at` | TIMESTAMPTZ | |
### `chat_messages` [#chat_messages]
Individual messages within an AI thread.
| Column | Type | Notes |
| ------------ | --------------------- | ---------------------------------------- |
| `id` | UUID PK | |
| `thread_id` | UUID FK → ai\_threads | |
| `role` | TEXT | `user` / `assistant` / `system` / `tool` |
| `content` | TEXT | |
| `tool_calls` | JSONB | |
| `created_at` | TIMESTAMPTZ | |
## Migration history [#migration-history]
Migrations are applied via `./dev db migrate` and live in `services/wordloop-core/scripts/migrations/`.
| Version | Description |
| ---------------- | ------------------------------------------------------------------ |
| `20250709123530` | Initial schema (users, people) |
| `20260309152000` | Meetings, transcripts, tasks, notes, AI threads |
| `20260313204400` | Add `person_id` to users |
| `20260315213000` | Rename `action_items` → `tasks` |
| `20260324204621` | Add `meeting_audio_files` |
| `20260324211500` | Add `meeting.status` |
| `20260326090621` | Add `meeting.status_message` |
| `20260327200316` | Update transcript segment fields |
| `20260329060000` | Add `is_final` to transcript\_segments |
| `20260329204000` | Add `meeting_status_history` (later dropped) |
| `20260330203000` | Add pgvector extension, `transcriptions` table, and `voice_vector` |
# Errors (/docs/reference/errors)
# Errors [#errors]
The Wordloop Core API follows RFC 9457 (`application/problem+json`) for error responses. Every error carries a `status` (HTTP code), a `title` (short stable description), and an optional `detail` string with context. Clients and AI agents should branch on `status` and `title` — these are stable and never renumbered.
## Envelope [#envelope]
All error responses follow this shape:
```json
{
  "status": 404,
  "title": "Not Found",
  "detail": "No meeting with the provided id exists.",
  "instance": "/meetings/abc123"
}
```
Validation errors include an `errors` array of field-level diagnostics:
```json
{
  "status": 422,
  "title": "Unprocessable Entity",
  "detail": "Request body did not match the schema.",
  "errors": [
    { "message": "required", "path": "body.title", "value": "" }
  ]
}
```
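A client can decode this envelope into a typed value before branching on it. The following is a minimal sketch, assuming only the fields shown in the examples above; `Problem`, `FieldError`, and `parse_problem` are hypothetical names rather than types from a generated Wordloop client.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FieldError:
    path: str
    message: str
    value: object = None


@dataclass(frozen=True)
class Problem:
    status: int
    title: str
    detail: str = ""
    instance: str = ""
    errors: Tuple[FieldError, ...] = ()


def parse_problem(body: str) -> Problem:
    """Decode an `application/problem+json` body into a Problem."""
    data = json.loads(body)
    errors = tuple(
        FieldError(
            path=e.get("path", ""),
            message=e.get("message", ""),
            value=e.get("value"),
        )
        for e in data.get("errors", [])  # only present on validation errors
    )
    return Problem(
        status=int(data["status"]),
        title=str(data["title"]),
        detail=data.get("detail", ""),
        instance=data.get("instance", ""),
        errors=errors,
    )


problem = parse_problem(
    '{"status": 404, "title": "Not Found", '
    '"detail": "No meeting with the provided id exists.", '
    '"instance": "/meetings/abc123"}'
)
print(problem.status, problem.title)
```

Optional fields default to empty values, so callers can branch on `status` and `title` without checking for missing keys.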
## Common HTTP status codes [#common-http-status-codes]
| Status | Title | Meaning | Action |
| ------ | --------------------- | ----------------------------------------------------------------- | --------------------------------------------- |
| 401 | Unauthorized | The request lacked a valid Clerk token or session. | Re-authenticate; refresh token. |
| 403 | Forbidden | The caller is authenticated but not authorised for this resource. | Confirm role and scope. Do not retry. |
| 404 | Not Found | The resource does not exist. | Verify the identifier; check user visibility. |
| 422 | Unprocessable Entity | The request body did not match the schema. | Inspect `errors` for field-level diagnostics. |
| 409 | Conflict | An `Idempotency-Key` was reused with a different payload. | Generate a fresh key; retry. |
| 429 | Too Many Requests | Per-caller rate limit exceeded. | Back off per `Retry-After`. |
| 504 | Gateway Timeout | A downstream dependency timed out. | Retry with exponential backoff. |
| 500 | Internal Server Error | Unexpected server error; details captured in our observability. | Retry with backoff; escalate if sustained. |
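The Action column translates into a small retry policy on the client side. The sketch below is one possible reading of it: the base delay, the cap, and the use of full jitter are assumptions added for illustration, and `retry_delay` is a hypothetical helper. It expects `Retry-After` to have been parsed into seconds already.

```python
import random
from typing import Optional

# Statuses the table marks as safe to retry with backoff.
RETRY_WITH_BACKOFF = {500, 504}


def retry_delay(
    status: int,
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 0.5,
    cap: float = 30.0,
) -> Optional[float]:
    """Return seconds to wait before retrying, or None if the same
    request should not simply be retried.

    `attempt` is zero-based. `retry_after` is the parsed Retry-After
    header in seconds, when the server sent one.
    """
    ceiling = min(cap, base * 2 ** attempt)
    if status == 429:
        # Honour the server's hint; fall back to the backoff ceiling.
        return retry_after if retry_after is not None else ceiling
    if status in RETRY_WITH_BACKOFF:
        # Full jitter spreads retries out so clients do not stampede.
        return random.uniform(0, ceiling)
    # 401 needs re-authentication, 409 needs a fresh Idempotency-Key,
    # and the remaining 4xx statuses need a corrected request.
    return None
```

Returning `None` for 401 and 409 is deliberate: those need a changed request (a new token or a fresh key), not a delayed copy of the same one.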
## WebSocket error frames [#websocket-error-frames]
Real-time errors use a custom envelope on the wire:
```json
{
  "type": "error",
  "error": {
    "code": "SESSION_EXPIRED",
    "message": "Session token expired; reconnect with a fresh one.",
    "details": { "session_id": "sess_..." }
  }
}
```
| Code | Meaning | Action |
| ------------------- | ---------------------------------------------------------------------------------------------- | -------------------------------------------------- |
| `SESSION_EXPIRED` | The WebSocket session token is no longer valid. | Fetch a new token; reconnect. |
| `RESUME_FAILED` | The server could not resume the session at the supplied sequence. | Reconnect without a resume token; rehydrate state. |
| `BACKPRESSURE_SHED` | Informational: the server dropped a low-priority message because the client could not keep up. | No client action required. |
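On the client, the table above maps naturally onto a dispatch from `code` to action. This is a hedged sketch: the `Action` enum and `action_for_frame` are hypothetical names, and treating unrecognised codes as `UNKNOWN` is an assumption about how a client might want to behave.

```python
import json
from enum import Enum
from typing import Optional


class Action(Enum):
    REFRESH_TOKEN_AND_RECONNECT = "refresh_token_and_reconnect"
    RECONNECT_WITHOUT_RESUME = "reconnect_without_resume"
    NO_ACTION = "no_action"
    UNKNOWN = "unknown"


# One entry per row of the error-code table.
_ACTIONS = {
    "SESSION_EXPIRED": Action.REFRESH_TOKEN_AND_RECONNECT,
    "RESUME_FAILED": Action.RECONNECT_WITHOUT_RESUME,
    "BACKPRESSURE_SHED": Action.NO_ACTION,
}


def action_for_frame(raw: str) -> Optional[Action]:
    """Return the client action for an error frame, or None for other frames."""
    frame = json.loads(raw)
    if frame.get("type") != "error":
        return None
    code = frame.get("error", {}).get("code", "")
    # Unrecognised codes are surfaced rather than silently ignored.
    return _ACTIONS.get(code, Action.UNKNOWN)
```

Keeping the mapping in a single dictionary means a new error code needs only one new line on the client.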
## Further reading [#further-reading]
* [API Design](/docs/principles/system-design/api-design) — the stance on structured errors.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — why stable codes matter for agent consumers.
* [Core API Reference](/docs/reference/api/core) — per-endpoint error catalogues rendered from the OpenAPI spec.
# Glossary (/docs/reference/glossary)
# Glossary [#glossary]
The authoritative vocabulary of Wordloop. When code, docs, or conversation refers to one of these terms, this page is what the term means. The domain-level concepts also appear in [Learn / Concepts](/docs/learn/concepts), where they are explained with more narrative context.
## A [#a]
**ADR — Architecture Decision Record.** An append-only document capturing a significant, hard-to-reverse decision, with explicit debt annotations. See [Decisions](/docs/decisions).
**Adapter.** A component that implements a [port](#p), bridging the domain to an external dependency (database, message broker, HTTP framework). Part of the [hexagonal](#h) architecture vocabulary.
**AsyncAPI.** The machine-readable specification format we use to document event streams — the asynchronous counterpart to OpenAPI. See [Core Events Reference](/docs/reference/events/core-ws).
## B [#b]
**Backpressure.** The explicit control signal by which a producer is slowed when a consumer cannot keep up. In Wordloop, backpressure is designed into every real-time flow — we shed, coalesce, or block rather than buffering unbounded. See [Real-Time](/docs/principles/system-design/real-time).
## C [#c]
**Canary.** A release shape where a small fraction of traffic reaches a new revision before it is promoted to 100%. See [Progressive Delivery](/docs/principles/delivery/progressive-delivery).
**Clerk.** The third-party authentication provider we use for user identity. JWTs from Clerk are verified by `wordloop-core` on every request.
**Core (wordloop-core).** The Go HTTP and WebSocket API that is the source of truth for Meetings, People, Transcriptions, Tasks, and real-time session state.
## D [#d]
**DORA metrics.** Deployment frequency, lead time for changes, change failure rate, and mean time to recover — the four research-backed metrics we use to measure delivery performance. See [DevEx](/docs/principles/delivery/devex).
## E [#e]
**Error budget.** The quantity of "bad" events allowed by an SLO over a rolling window. Consumed by outages; restored by uptime. See [Reliability](/docs/principles/quality/reliability).
**Eval.** A scored comparison of model output against a reference. We run evals in CI to catch regressions in AI-driven behaviour. See [AI Engineering](/docs/principles/ai-native/ai-engineering).
## H [#h]
**Hexagonal architecture.** The structural pattern — domain core, ports, adapters — that every non-trivial Wordloop service follows. See [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture).
## I [#i]
**Idempotency key.** A client-supplied identifier that lets the server recognise and safely handle retried writes. Every write endpoint in Wordloop accepts one. See [API Design](/docs/principles/system-design/api-design).
**IDP — Internal Developer Platform.** The set of shared tooling, runtimes, and golden paths engineers use to build on Wordloop. See [Platform](/docs/principles/delivery/platform).
## J [#j]
**JIT — Just-in-Time provisioning.** The pattern by which Wordloop creates a local User and Person record the first time a user signs in via Clerk. No webhooks, no seeding. See [Quickstart](/docs/start/quickstart).
## L [#l]
**llms.txt.** The machine-readable index of a documentation site, consumed by AI agents. See [/llms.txt](/llms.txt) and [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems).
## M [#m]
**MCP — Model Context Protocol.** The interoperable protocol we use to expose tools and resources to AI agents. See [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems).
**Meeting.** The primary entity in Wordloop — a bounded session that is captured in the system, attended by People, and producing a Transcription, a MeetingSynthesis, and Tasks. The `meetings` table and `/meetings` routes are the centre of the domain.
**MeetingSynthesis.** The AI-generated summary attached to a Meeting. Contains a headline, prose summary, key points, Topics, and TalkingPoints. Produced by the ML service after the Transcription finalises.
**ML (wordloop-ml).** The Python FastAPI runtime responsible for transcription, synthesis generation, and embedding.
## N [#n]
**Note.** A free-form annotation attached to any entity via a polymorphic `subject_type` / `subject_id` pair.
## O [#o]
**OpenAPI.** The machine-readable specification format we use to document HTTP APIs. Our server handlers and clients are generated from it. See [API Design](/docs/principles/system-design/api-design).
**OTel — OpenTelemetry.** The vendor-neutral observability framework we use for traces, metrics, and logs. See [Observability](/docs/principles/quality/observability).
**Outbox pattern.** The transactional pattern by which a database write and an event emission are committed together, via an `outbox` table. See [Integration Patterns](/docs/principles/system-design/integration-patterns).
## P [#p]
**Person.** A contact record representing someone who appeared in a Meeting, with or without a Wordloop account. Carries identity fields and an optional voice model for speaker attribution.
**pgvector.** The Postgres extension we use as our production vector store. See [Postgres](/docs/principles/stack/postgres).
**Port.** An interface declared by the domain describing a capability it needs, implemented by an [adapter](#a). Part of the [hexagonal](#h) architecture vocabulary.
## R [#r]
**RAG — Retrieval-Augmented Generation.** The pattern of enriching a model call with retrieved context from our own data. See [AI Engineering](/docs/principles/ai-native/ai-engineering).
**Runbook.** A step-by-step recovery procedure for a known failure mode. See [Operations / Runbooks](/docs/operations/runbooks).
## S [#s]
**SLO — Service Level Objective.** A per-journey target for latency and success rate, measured over a rolling window. The foundation of [Reliability](/docs/principles/quality/reliability).
## T [#t]
**Tag.** A user-defined label applied to Meetings, People, or Tasks for organisation.
**TalkingPoint.** A specific point or claim within a Topic, surfaced as a bullet in the MeetingSynthesis view.
**Task.** An action item extracted from a Meeting. Tasks are assignable, hierarchical, and tracked through to completion. Statuses: `pending`, `in_progress`, `completed`, `canceled`.
**Topic.** A thematic cluster extracted from a Meeting's TranscriptSegments, carrying a name, summary, and the contributing segments.
**Transcription.** The speech-to-text record attached to a Meeting, aggregating TranscriptSegments as they arrive from the ML service.
**TranscriptSegment.** The atomic unit of a Transcription — one speaker turn, carrying speaker label, attributed Person, text, timestamps, confidence score, and an `is_final` flag.
## U [#u]
**User.** A Wordloop account holder, identified via Clerk and JIT-provisioned on first sign-in. Each User has an associated Person record.
## V [#v]
**Voice model.** The speaker-identification vector attached to a Person, built from verified TranscriptSegments and used to attribute future segments to a specific Person.
## W [#w]
**WebSocket.** The default transport for real-time streams in Wordloop. See [Real-Time](/docs/principles/system-design/real-time).
# Reference (/docs/reference)
# Reference [#reference]
Reference material is **information-oriented**: terse, complete, and predictable. If you are returning to Wordloop after a break and need to remember the exact `./dev` flag, the JSON shape of an event, or what error code 4003 means — you are in the right section.
## Contract surfaces [#contract-surfaces]
## A note on sources of truth [#a-note-on-sources-of-truth]
Wherever possible, reference pages are **generated from the same specs the code is generated from**. The API reference is rendered directly from `specs/*-openapi.json`, the events reference from `specs/*-asyncapi.yaml`, and the database schema from the live migrations. If a reference page seems to drift from reality, the spec is the canonical source — open an issue, then fix the spec, and the page will follow.
# Your First Contribution (/docs/start/first-contribution)
# Your First Contribution [#your-first-contribution]
You have the platform running locally and you have read the relevant principle pages. Your first change is a chance to learn the tooling and the review culture, not to design a system. Pick something bounded.
## Good candidates for a first PR [#good-candidates-for-a-first-pr]
* **Fix a typo or broken link in the docs.** The docs site lives in `services/wordloop-docs`. Edit an MDX file, rebuild locally, open a PR.
* **Add a missing Vale word** to our style dictionary when linting flags a legitimate term. This teaches you the quality-governance workflow.
* **Tighten an existing test** against a real behaviour it does not yet cover. Real bugs fall out of this kind of reading; fabricating new features does not.
* **Improve a runbook** after you follow it and find a step that is unclear.
## The mechanics [#the-mechanics]
1. **Branch.** `git checkout -b your-name/short-description` from `main`. Keep names short and descriptive.
2. **Make the change.** Small and focused. If you find yourself fixing two things at once, split into two PRs.
3. **Run the relevant tests.** Use `./dev test ` for unit tests and `./dev test system` for cross-service integration. See [Run Tests](/docs/guides/run-tests).
4. **Lint your work.** `./dev lint all` covers Go, Python, and TypeScript; for docs changes run Vale if configured.
5. **Commit.** Write a commit message that explains *why* the change is needed, not just what changed. Our history is a long-term artefact.
6. **Open a PR.** Include a description of the change, the reasoning, a test plan, and links to any related issues or decision records.
7. **Respond to review.** Reviewers may push back on naming, structure, or scope. Treat review comments as invitations to improve the change, not as attacks.
8. **Merge when green.** CI must pass; a reviewer must approve.
## After merge [#after-merge]
Watch the deploy. Our [CI/CD pipeline](/docs/learn/architecture/infrastructure) builds a Docker image, pushes it to Artifact Registry, and deploys to Cloud Run. If anything breaks in production, the on-call engineer will be paged, and you may be asked to revert quickly. That is normal; it means the feedback loop is working.
## What to read next [#what-to-read-next]
* [Guides](/docs/guides) — task-oriented how-tos for the common operations.
* [Engineering Principles](/docs/principles) — the stance behind the code you are about to touch.
* [Reference / CLI](/docs/reference/cli) — every `./dev` command in one table.
# Getting Started (/docs/start/quickstart)
# Getting Started [#getting-started]
## The `./dev` CLI Driver [#the-dev-cli-driver]
All local orchestration, testing, database migrations, and telemetry dashboards are driven exclusively by the custom `./dev` CLI tool located in the repository root. See the [CLI Reference](/docs/reference/cli) to get started!
## Prerequisites [#prerequisites]
| Tool | Version | Purpose |
| --------------------------------------------- | -------------- | ------------------------------------------------------------- |
| [Docker](https://docs.docker.com/get-docker/) | Compose v2.20+ | Infrastructure services |
| [Go](https://go.dev/) | 1.25+ | wordloop-core |
| [Air](https://github.com/air-verse/air) | latest | Go auto-reload (`go install github.com/air-verse/air@latest`) |
| [uv](https://github.com/astral-sh/uv) | latest | wordloop-ml Python env |
| [pnpm](https://pnpm.io/) | latest | wordloop-app dependencies |
| [ffmpeg](https://ffmpeg.org/) | latest | ML audio processing |
{/* LLM-Context: TL;DR:
This guide is a "Day Zero" guided walkthrough. It goes beyond raw commands, showing
how developers should use `./dev start infra` to bootstrap their local environment
and attach IDE debuggers (such as VSCode or GoLand) for service debugging.
*/}
## The "Day Zero" Guided Walkthrough [#the-day-zero-guided-walkthrough]
Welcome to Wordloop! Instead of throwing a wall of terminal commands at you, this guide walks you through setting up your environment for an optimal local development experience, including hooking up your IDE debuggers.
### Step 1: Environment Checks [#step-1-environment-checks]
Before starting, validate your local toolchain:
```bash
# Assumes you have cloned the repo and are at the root
./dev doctor
```
If `doctor` flags any missing dependencies (like Docker, Go, or Node) or occupied ports, follow its provided instructions to resolve them.
### Step 2: Bootstrapping Config & Secrets [#step-2-bootstrapping-config--secrets]
Generate and configure your local environment files:
```bash
./dev setup env
```
This scaffolds `.env` and `.env.local` files across the monorepo.
* **wordloop-ml:** Edit to add ML/AI API keys.
* **wordloop-app & core:** Add your Clerk frontend & backend keys for authentication.
### Step 3: Install Package Dependencies [#step-3-install-package-dependencies]
```bash
./dev setup install
```
### Step 4: Infrastructure & IDE Debugging [#step-4-infrastructure--ide-debugging]
We use a Hybrid Development Model. Infrastructure (Postgres, Pub/Sub, etc.) runs in Docker, so you can run the service you are working on natively in your IDE.
If you are working on the App Frontend but want to run Core natively so you can step through Go code:
1. **Start the dependencies in the background:**
```bash
./dev start infra ml app
```
*This starts the DB, Pub/Sub, the ML service, and the Next.js frontend.*
2. **Launch the Core service in your IDE:**
* **VSCode:** Open the debug panel and run the "Launch Core API" configuration.
* **GoLand:** Run the `cmd/server/main.go` file with Debug context.
Now, any frontend requests will hit your breakpoints in the Core API.
If you just want to run everything locally without IDE debugging (e.g., verifying a PR):
```bash
./dev start all
```
## The Hybrid Development Model [#the-hybrid-development-model]
Infrastructure runs in Docker (stable, rarely changes). Application services run natively for instant feedback:
| What runs | Where | Auto-reload? |
| ------------------------------- | ----------------- | ------------------------------- |
| Postgres, PubSub, Storage, OTel | Docker containers | n/a |
| Core API (Go) | Native via Air | ✅ Rebuilds on `.go` file change |
| ML API (Python) | Native via uv | ✅ Restarts on `.py` file change |
| App (Next.js) | Native via pnpm | ✅ HMR in browser |
### Typical workflows [#typical-workflows]
```bash
# Full stack (recommended for daily work)
./dev start all
# Infrastructure only (run services from your IDE)
./dev start infra
# Infrastructure + specific services
./dev start infra core # Debug ML from IDE
./dev start infra core ml # Debug App from IDE
# Force Docker containers (for integration testing)
./dev start core ml --docker
```
### Tailing logs [#tailing-logs]
Native service logs are written to `.dev/logs/` and can be tailed with the same CLI:
```bash
./dev logs core # Tail Core output
./dev logs ml # Tail ML output
./dev logs core ml # Multi-tail Core + ML simultaneously
./dev logs all # Tail everything (Docker)
```
## Full stack in Docker [#full-stack-in-docker]
For CI-like environments or full-stack integration testing:
```bash
./dev start all --docker # Everything in containers
./dev logs all # Tail all logs
./dev stop all # Stop everything
```
## Authentication [#authentication]
Authentication is handled automatically through **JIT provisioning**:
1. Sign in via Clerk (Google, email, or test accounts) in the browser
2. The Core API verifies the Clerk JWT
3. If the user doesn't exist locally yet, they're auto-created from the Clerk API
4. No webhook tunnels, no manual tokens, no database seeding
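The provisioning flow above amounts to a find-or-create step that runs after the JWT has been verified. The following is a minimal sketch of that idea, not Core's actual Go implementation: the `LocalUser` and `ClerkProfile` shapes, the in-memory `Map` standing in for the database, the `fetchFromClerk` callback, and the ID scheme are all assumptions made for illustration.

```typescript
// Hypothetical shapes for illustration; Core's real schema may differ.
interface LocalUser {
  id: string;
  clerkUserId: string;
  email: string;
}

interface ClerkProfile {
  clerkUserId: string;
  email: string;
}

// Find-or-create, called only after the Clerk JWT has been verified.
// `users` stands in for the local database, keyed by Clerk user ID.
function provisionUser(
  users: Map<string, LocalUser>,
  clerkUserId: string,
  fetchFromClerk: (clerkUserId: string) => ClerkProfile
): LocalUser {
  const existing = users.get(clerkUserId);
  if (existing !== undefined) {
    return existing; // Already provisioned by an earlier request.
  }
  // First sign-in: pull the profile from Clerk and create the local record.
  const profile = fetchFromClerk(clerkUserId);
  const created: LocalUser = {
    id: "user_" + String(users.size + 1), // placeholder ID scheme
    clerkUserId: profile.clerkUserId,
    email: profile.email,
  };
  users.set(clerkUserId, created);
  return created;
}
```

Because the lookup happens on every authenticated request, a second request from the same user takes the early return and never calls Clerk again.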
System tests use a separate `APP_ENV=test` mode with raw UUID tokens. See [Testing](/docs/principles/foundations/testing) for details.
## Linting [#linting]
```bash
./dev lint # Lints core (go vet), ml (ruff), and app (eslint)
./dev lint core # Go linter only
./dev lint ml # Python Ruff only
./dev lint app # TypeScript ESLint only
```
## Checking status [#checking-status]
```bash
./dev status # Show nicely formatted dashboard of running services
```
See [CLI Reference](/docs/reference/cli) for the complete target list.
# How We Work (/docs/work)
# How We Work [#how-we-work]
This section describes how we move from an observed problem to shipped customer value. The process is lean by design, enforcing that technical execution is strictly bound to clear intent and verified by automated tests from the very beginning. It answers the fundamental question: *How do you move fast without skipping the discovery that stops you building the wrong thing?*
Work flows through four stages, each more concrete than the last: a Problem Statement, a Pitch, TDD Foundations, and TDD Execution.
***
# Inside each stage [#inside-each-stage]
## 1. Problem Statement [#1-problem-statement]
Most wasted work is caused by excellent execution of the wrong thing. Skipping from idea to solution — without pausing to understand the problem — leads to building with false confidence.
A **Problem Statement** captures observed pain — real, evidenced, specific — alongside an **appetite**: a judgment about how much time this problem is worth solving.
* **Appetite** is not an estimate of how long a solution will take. It is an opportunity cost judgment made *before* the solution is defined. You are betting that the problem is worth that much time.
* Problem statements do not accumulate indefinitely. They are a curated list, updated as understanding evolves and retired when no longer relevant.
* **Platform and infrastructure problems are valid problem statements.** The "who experiences it" can be internal — the engineering team, the system's reliability, the business's compliance posture. Feature bets routinely surface infrastructure gaps (e.g., a missing event backplane, no deletion cascade). The right response is to extract the gap as its own problem statement — not to expand the feature bet. The feature bet declares the constraint explicitly; the platform bet solves it.
## 2. Pitch [#2-pitch]
Unformed ideas become backlogs. Backlogs create the illusion that everything is captured and considered, when really they are lists of things nobody explicitly said no to.
Before a problem reaches the build phase, it is shaped into a **Pitch**. A pitch links a validated problem to a rough solution proposal. It is concrete enough to execute against but stays away from micro-detail.
A pitch must contain:
* **The problem** — what was observed, who experiences it, why it matters now.
* **The appetite** — how much time to spend.
* **A rough solution sketch** — the general approach to the solution.
* **Rabbit holes** — approaches already considered and ruled out. Include plausible-looking approaches that would blow the appetite or the scope, and infrastructure assumptions the bet makes (e.g., "we assume sticky sessions, not a backplane").
* **Explicit no-gos** — what is completely out of scope. Include both obvious exclusions *and* natural extensions that users would reasonably expect but that don't belong in this version (e.g., pause/resume, mobile support, export/download). Vague no-gos invite scope creep — be specific about what's excluded and why.
A funded pitch becomes a **Bet** — a commitment bounded by the appetite.
## 3. TDD: Foundations [#3-tdd-foundations]
Technical Design bridges the intent of the pitch to parallel execution. We start by laying the technical foundation so that progress isn't blocked later by misaligned interfaces.
### UI Design [#ui-design]
The **UI Design** doc translates the pitch's rough solution sketch into concrete, screen-level detail. It answers: *what exactly will the user see and do?*
Organise it by screen — not by feature, not by user story. Each screen the bet touches gets its own section with:
* **A wireframe** — even a rough sketch. This is the anchor; the text describes it.
* **Layout** — the regions on screen and what content lives in each one.
* **States** — what the user sees during loading, active use, empty states, errors, and degraded conditions.
* **Key interactions** — what the user can do and what happens in response.
After the screens, map the **user journeys** between them (how the user moves from entry point to final outcome), and list **edge cases** (anything unusual the system needs to handle visibly).
**Be specific about data objects.** If a screen shows tasks, define what a task is: which fields it has, which are required, whether they nest, what states they can be in. If a screen has a text editor, say whether it's rich text or plain, whether it auto-saves or has a save button. These details directly determine the API contracts and database schema that come next.
**Stay at the user level.** If you're specifying which service owns the logic, how the frontend integrates, or where data persists — you've gone too far. The UI Design doc describes what the user experiences, not how the system delivers it. System concerns belong in the Data Flow doc.
**The output feeds directly into:** Data Flow diagrams (which service calls which), API contracts (what fields and endpoints exist), and database schemas (what gets stored). If someone can't design those artefacts from the UI Design doc alone, the doc isn't detailed enough.
### Data Flows [#data-flows]
The **Data Flow** doc maps every user interaction from the UI Design through service boundaries. It answers: *what calls what, what data crosses each boundary, and what happens when something fails.* It is the primary input for API contracts and database schemas — if someone can't design those artefacts from this doc alone, the doc isn't complete.
**Start with a system context graph.** Before drawing any sequences, draw the topology: which services exist, which protocols connect them, which data stores each service owns. This is a static map — it orients readers and makes the scope of the bet explicit.
**Name flows after what triggers them.** Group related flows into logical Parts (e.g., "Session Lifecycle", "Streaming Processing", "Failure Modes"). A flow name describes what the user does or what system condition fires — not the implementation.
**Use descriptive operation labels — never endpoint paths.** Diagram labels should read like `Create task (idempotent, echo-suppressed)` not `POST /meetings/{id}/tasks`. Header names, field names, and HTTP methods all belong in the Contracts doc. Each arrow in a flow is a **contract boundary** (what shape the data takes) and a **sequencing constraint** (downstream cannot build until upstream is agreed). Naming the operation is enough — the Contracts doc defines the shape precisely.
**Failure modes are required, not optional.** For every significant service boundary in the bet, there must be at least one flow describing what happens when that boundary fails. If the UI Design doc models a "Degraded" or "Connectivity Lost" state, the Data Flow doc must show the recovery sequence. Resilience is a first-class design concern — not an afterthought.
**Close with two required sections:**
* **Design Decisions** — tradeoffs made, alternatives ruled out, constraints that drove choices. Captures reasoning that isn't visible in the diagrams.
* **Boundary Inventory** — a table of every service-to-service boundary in the doc. Five columns: Boundary | Flows | From → To | Protocol | Data shape. Each row here becomes a contract entry in the Contracts doc.
### Contracts & Schemas [#contracts--schemas]
The agreed API contracts (REST, WebSocket, Pub/Sub) and database schemas (PostgreSQL, object storage). Downstream UI can mock against the contract; upstream Core can build against it.
## 4. TDD: Execution [#4-tdd-execution]
Once the technical foundation is set, the bet is decomposed into deliverable units.
* **Integration Milestones:** Points of user-visible value. This is the integration of multiple pieces that results in a cohesive feature or state change for the user.
* **Domain Slices:** The smallest independently buildable and testable units of work. We **never** slice horizontally (e.g. building all databases, then all APIs, then all UI). We always slice vertically. A vertical slice can span services (a full feature connecting App -> Core -> ML) or sit entirely within a single domain (e.g., CRUD on a Meeting in Core via the API). Slices must be independently deployable and verifiable.
### Tests as Proof of Delivery [#tests-as-proof-of-delivery]
Every milestone and slice has its test overview properly documented, and corresponding **empty test stubs are generated in the test runner** before any production code is written.
These tests serve as the **single source of truth** for progress signaling. Red means work to do; green means proven. They live in two places:
1. **Service/system tests** (permanent) — implemented in the service repo or `tests/system/` during the build.
2. **Bet progress suite** (temporary) — mirrored in `tests/bets//`, run on demand via `./dev test bet `.
***
## Bet Operations [#bet-operations]
The Golden Path CLI tools keep the bet documentation in sync with the integration test layout.
### Start a new bet [#start-a-new-bet]
```bash
./dev new bet
```
Promotes a pitch into an active bet at `work//` and creates the baseline test boundary suite in `tests/bets//`. The slug must be lowercase kebab-case (e.g. `speaker-navigation`). A pitch must exist first — run `./dev new pitch `.
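If you want to check a slug before running the command, the kebab-case rule can be approximated with a regular expression. This is an illustrative sketch only: `isKebabCaseSlug` is a hypothetical helper, and the CLI's own validation may be stricter or looser than this pattern.

```typescript
// Lowercase kebab-case: runs of lowercase letters or digits joined by single hyphens.
// Assumption: this approximates the CLI's rule; it is not taken from the CLI source.
const KEBAB_CASE = /^[a-z0-9]+(-[a-z0-9]+)*$/;

function isKebabCaseSlug(slug: string): boolean {
  return KEBAB_CASE.test(slug);
}
```

Under this pattern `speaker-navigation` passes, while `Speaker-Navigation`, `speaker_navigation`, and `speaker--navigation` do not.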
### Scaffolding Architecture (TDD) [#scaffolding-architecture-tdd]
```bash
# Scaffolds architectural boundaries
./dev new contract
./dev new schema
# Scaffolds milestones and domains
./dev new milestone
./dev new slice
```
Generating a `slice` or a `milestone` will drop corresponding placeholder testing boundaries in `tests/bets/`.
### Run bet progress tests [#run-bet-progress-tests]
```bash
./dev test bet
```
Runs the bet progress suite on demand. Watch the test output to verify that your delivery is progressing as intended.
### Archive a delivered bet [#archive-a-delivered-bet]
```bash
./dev archive bet
```
Moves the bet directory to `_archive/` and the associated test suite to `tests/bets/_archive//`. URL routing is preserved.
# Authentication & Authorization (/docs/learn/architecture/auth)
# Authentication & Authorization [#authentication--authorization]
Wordloop delegates identity management entirely to **Clerk**, keeping a local user schema only to anchor database relations.
Internal services rely on symmetric tokens for system-level trust. Zero-trust principles apply at external boundaries; inherited trust applies internally.
## User Authentication Flow (Clerk) [#user-authentication-flow-clerk]
Clerk acts as our authoritative identity provider (IdP).
### Frontend Implementation [#frontend-implementation]
* **Identity Context:** `wordloop-app` uses `@clerk/nextjs` for all auth flows.
* **Header Injection:** JWT tokens are automatically injected into `wordloop-core` requests as `Authorization: Bearer ` by the Orval API clients via a custom fetch interceptor.
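The header injection described above can be pictured as a thin wrapper around a fetch-like function. This is a hedged sketch rather than the app's real interceptor: `withBearerToken`, `FetchLike`, `RequestOptions`, and the `getToken` callback are hypothetical names, and the minimal types are defined locally so the example does not depend on DOM typings or on Clerk's SDK.

```typescript
// Minimal fetch-like types so the sketch does not depend on DOM typings.
type RequestOptions = {
  method?: string;
  headers?: Record<string, string>;
  body?: string;
};
type FetchLike = (url: string, options?: RequestOptions) => Promise<unknown>;

// Wraps a fetch-like function so every request carries the current session token.
// `getToken` stands in for however the app obtains the Clerk JWT (an assumption).
function withBearerToken(
  fetchImpl: FetchLike,
  getToken: () => Promise<string | null>
): FetchLike {
  return async (url: string, options: RequestOptions = {}): Promise<unknown> => {
    const token = await getToken();
    // Copy the caller's headers so the original options object is not mutated.
    const headers: Record<string, string> = { ...(options.headers ?? {}) };
    if (token !== null) {
      headers["Authorization"] = "Bearer " + token;
    }
    return fetchImpl(url, { ...options, headers });
  };
}
```

Fetching the token per request, rather than caching it in the wrapper, keeps the sketch correct when short-lived tokens rotate.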
### Backend Validation [#backend-validation]
* **Middleware:** `wordloop-core` uses robust Clerk middleware within the Huma framework.
* **Verification:** The middleware verifies the JWT signature against the public keys published at Clerk's JWKS endpoint, then places the `clerk_user_id` into the request's `context.Context`.
## Data Synchronization [#data-synchronization]
To link auth identities with core business entities (like Meetings or Transcripts), users are synchronized into the local Postgres database.
Database synchronization occurs asynchronously via Clerk Webhooks.
1. **User Creation:** When a user registers, Clerk fires a `user.created` webhook to `wordloop-core`.
2. **Database Sink:** Core validates the Svix headers, parses the webhook payload, and idempotently upserts the record into the `users` table.
## Service-to-Service Authentication [#service-to-service-authentication]
When internal services communicate outside of standard user contexts (e.g., the ML engine pulling an audio binary from Core API endpoints), they use a static symmetric token.
* **Header Specification:** `Authorization: Bearer `
* **Assumed Scope:** Full administrative access.
**Never expose the `SERVICE_AUTH_TOKEN` to the frontend or public-facing API routes.** This token bypasses user validation logic.
# Optimistic Mutation with Echo-Suppressed Streaming (/docs/learn/architecture/data-flow)
# Optimistic Mutation with Echo-Suppressed Streaming [#optimistic-mutation-with-echo-suppressed-streaming]
This is Wordloop's core data architecture for all user-initiated CRUD operations. The pattern separates **writes** (REST) from **reads** (WebSocket) to achieve perceived zero-latency mutations with real-time multi-device synchronization.
This pattern governs all entity-level operations — notes, tasks, topics, meeting metadata, and any future entity types. Audio streaming and ML-generated events use different pipelines documented in [System Workflows](/docs/learn/architecture/system-workflows).
## Why This Design [#why-this-design]
Traditional request/response flows force the user to wait for the server round-trip before seeing results. Polling-based updates miss state changes between intervals. Full event sourcing introduces operational complexity that isn't justified for Wordloop's entity CRUD workloads.
This pattern sits in the pragmatic middle:
| Concern | Approach |
| --------------------- | -------------------------------------------------------------------------------------------------------------------- |
| **Write path** | REST — transactional, idempotent, familiar error handling. The server is the single source of truth. |
| **Read path** | WebSocket — server pushes complete entity payloads on every state change. No polling, no stale cache windows. |
| **Perceived latency** | Optimistic updates — the client applies the change locally before the REST response. The UI responds in under 16ms. |
| **Multi-device sync** | All connected clients for a user receive every state change via WebSocket. No refresh required. |
| **Echo prevention** | Source-aware events — the originating client ignores its own echo by matching the `clientId` on the WebSocket event. |
***
## The Five-Step Data Loop [#the-five-step-data-loop]
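Every mutation follows the same five-step sequence: optimistic update, REST mutation, event broadcast, echo suppression, and cross-device sync.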
***
## Step-by-Step Breakdown [#step-by-step-breakdown]
### Step 1 — Optimistic Update [#step-1--optimistic-update]
When a user performs an action (add note, edit title, delete task), the client applies the change to local state **immediately**, before the network request fires. Three things happen:
1. **The change is applied to the UI.** The user sees the result instantly.
2. **A rollback snapshot is stored.** If the server rejects the mutation, the client reverts to this snapshot.
3. **A pending indicator is shown.** Optimistic entities render with a subtle visual cue (reduced opacity, syncing badge, or a small spinner) so the user understands the change is not yet confirmed. The indicator is removed when the REST response arrives.
For entity **creation**, the client generates a temporary ID (a UUID prefixed with `temp_`) so the new entity can appear in the UI and be referenced before the server assigns a permanent ID.
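A minimal sketch of Step 1 for a note creation might look like the following. The `Note` shape, the `pending` flag, and the function names are assumptions for illustration; the real client state is likely richer. The UUID generator is injected so the sketch stays runtime-agnostic (in a browser it could be `() => crypto.randomUUID()`).

```typescript
// Hypothetical local shape; `pending` drives the "not yet confirmed" visual cue.
interface Note {
  id: string;
  content: string;
  pending: boolean;
}

interface OptimisticResult {
  next: Note[]; // state to render immediately
  snapshot: Note[]; // state to restore if the server rejects the mutation
  tempId: string; // temporary ID used until the server assigns a permanent one
}

// Applies an optimistic create without mutating the input array.
function applyOptimisticCreate(
  notes: Note[],
  content: string,
  makeUuid: () => string
): OptimisticResult {
  const tempId = "temp_" + makeUuid();
  // A shallow copy is enough as long as notes are never mutated in place.
  const snapshot = notes.slice();
  const next = notes.concat([{ id: tempId, content: content, pending: true }]);
  return { next, snapshot, tempId };
}

// Rollback restores the snapshot taken before the optimistic change.
function rollback(result: OptimisticResult): Note[] {
  return result.snapshot;
}
```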
### Step 2 — REST Mutation [#step-2--rest-mutation]
The mutation is sent to the appropriate REST endpoint with two critical headers:
```http
POST /api/v1/notes HTTP/1.1
Authorization: Bearer
X-Client-Id: abc-123
Content-Type: application/json

{
  "meetingId": "mtg_01J...",
  "content": "Follow up with the design team"
}
```
| Header | Purpose |
| --------------- | ---------------------------------------------------------------------------------------------------------------------- |
| `Authorization` | User identity (JWT from Clerk). Determines **who** is performing the action. |
| `X-Client-Id` | Client instance identity. Determines **which device/tab** initiated the action. Used exclusively for echo suppression. |
The REST response returns the **complete server-authoritative entity** — including the server-assigned `id`, `createdAt`, `updatedAt`, and `version` fields. The client uses this response to replace its temporary optimistic state with the confirmed server state.
### Step 3 — Event Broadcast [#step-3--event-broadcast]
After the database write succeeds, Core publishes a WebSocket event to **all connected clients** within the event's scope. The event uses the CloudEvents envelope and carries the full entity payload:
```json
{
  "specversion": "1.0",
  "type": "note.created",
  "source": "wordloop-core",
  "id": "evt_01J...",
  "data": {
    "id": "note_01J...",
    "meetingId": "mtg_01J...",
    "content": "Follow up with the design team",
    "createdAt": "2026-04-17T20:00:00Z",
    "updatedAt": "2026-04-17T20:00:00Z",
    "version": 1
  },
  "sourceClientId": "abc-123"
}
```
Events carry the full entity state, not a delta. This keeps client logic simple — the receiving client replaces its local copy of the entity directly without applying patch operations or maintaining a change log. The trade-off is larger payloads, which is acceptable for Wordloop's entity sizes.
### Step 4 — Echo Suppression [#step-4--echo-suppression]
The originating client receives the WebSocket event and compares `sourceClientId` against its own client ID:
```
Incoming event sourceClientId: "abc-123"
My clientId: "abc-123"
→ Match. Discard event (UI already reflects this from the optimistic update).
```
Without echo suppression, the originating client would render the change twice — once from the optimistic update and once from the WebSocket event — causing visual flicker and duplicate list entries.
### Step 5 — Cross-Device Sync [#step-5--cross-device-sync]
Other clients connected for the same user receive the identical WebSocket event. Since their `clientId` does not match the `sourceClientId`, they apply the entity payload directly to their local UI state:
```
Incoming event sourceClientId: "abc-123"
My clientId: "def-456"
→ No match. Apply entity to local state. UI updates in real time.
```
No REST call is needed. The WebSocket event contains the complete entity, so the receiving client has everything it needs to render the change.
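Steps 4 and 5 reduce to one comparison in the client's event handler. The sketch below assumes a simplified event type and a `Map` as the local entity store; `handleEntityEvent` and the type names are hypothetical, and only create and update events are covered.

```typescript
interface NoteEntity {
  id: string;
  meetingId: string;
  content: string;
  version: number;
}

// Simplified view of the event envelope: only the fields the handler needs.
interface EntityEvent {
  type: string;
  data: NoteEntity;
  sourceClientId: string;
}

// Decides what to do with an incoming WebSocket event for this tab.
function handleEntityEvent(
  local: Map<string, NoteEntity>,
  event: EntityEvent,
  myClientId: string
): "discarded" | "applied" {
  if (event.sourceClientId === myClientId) {
    // Step 4: our own echo. The optimistic update already covers it.
    return "discarded";
  }
  // Step 5: another tab or device made the change. Replace the full entity.
  local.set(event.data.id, event.data);
  return "applied";
}
```

Because events carry full entity state, applying one is a plain replace keyed by `id`, with no patch logic.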
***
## Client Identity [#client-identity]
### What Is a Client ID? [#what-is-a-client-id]
A `clientId` is a UUID generated **per browser tab** when the application initializes. It is **not** tied to the user's authentication identity — a single user can have multiple client IDs across different tabs and devices.
| Property | Value |
| --------------- | --------------------------------------------------------------------------------------- |
| **Scope** | One per browser tab / app instance |
| **Lifetime** | Created on tab open, discarded on tab close |
| **Persistence** | Stored in `sessionStorage` (survives page refresh within the same tab, not across tabs) |
| **Format**      | UUIDv4 (examples on this page use shortened placeholders such as `abc-123`)             |
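The lifetime and persistence rules in the table can be sketched as a get-or-create helper. The storage key, the `KeyValueStorage` interface, and the function name are assumptions; in a browser you would pass `window.sessionStorage` and a generator such as `() => crypto.randomUUID()`.

```typescript
// The subset of the Web Storage API this sketch needs.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const CLIENT_ID_KEY = "wordloop.clientId"; // hypothetical key name

// Returns this tab's client ID, creating and storing one on first use.
function getOrCreateClientId(
  storage: KeyValueStorage,
  makeUuid: () => string
): string {
  const existing = storage.getItem(CLIENT_ID_KEY);
  if (existing !== null) {
    return existing; // Page refresh within the same tab reuses the ID.
  }
  const created = makeUuid();
  storage.setItem(CLIENT_ID_KEY, created);
  return created;
}
```

Since `sessionStorage` is scoped to a single tab, each new tab starts with an empty store and therefore gets its own ID.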
### Why Per-Tab, Not Per-Session? [#why-per-tab-not-per-session]
If the client ID were per-session (shared across tabs), a mutation from Tab A would suppress the WebSocket event in Tab B — meaning Tab B would never render the change. Per-tab IDs ensure that only the exact tab that initiated the mutation suppresses the echo.
### Why Client ID, Not Mutation ID? [#why-client-id-not-mutation-id]
Some architectures use a unique `mutationId` per operation instead of a persistent `clientId`. The trade-off:
| Approach | Pros | Cons |
| ------------------------ | ----------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| **Client ID** (Wordloop) | Simpler — one header, no per-mutation tracking state. Echo suppression is a single string comparison. | Cannot distinguish between two of *your own* rapid mutations on the same entity (both are suppressed). |
| **Mutation ID** | Each operation is individually tracked. Can precisely reconcile specific operations. | Requires a pending-mutation queue on the client and mutation-ID propagation through the server. |
Wordloop uses `clientId` because entity operations are independent and non-overlapping — a user does not typically create the same note twice in rapid succession. If Wordloop ever introduces collaborative editing where individual keystrokes must be tracked, mutation IDs would be required.
***
## Event Scoping [#event-scoping]
Not every connected client receives every event. Events are scoped to the relevant audience:
| Scope | Events Delivered To | Example |
| ----------- | ----------------------------------------- | -------------------------------------------------- |
| **User** | All clients authenticated as that user | `note.created`, `task.updated`, `meeting.deleted` |
| **Meeting** | All clients viewing that specific meeting | `transcript.segment.produced`, `insight.generated` |
The WebSocket hub maintains a mapping of `userId → [clientId, clientId, ...]` and a subscription registry of `meetingId → [clientId, clientId, ...]`. When Core emits an event, the hub resolves the target audience and delivers to only those connections.
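The two registries and the audience lookup can be sketched as follows. This is an illustrative model, not the hub's actual implementation (which lives in Core): `Hub`, `Scope`, and the method names are hypothetical, and connection cleanup on disconnect is omitted.

```typescript
type Scope =
  | { kind: "user"; userId: string }
  | { kind: "meeting"; meetingId: string };

// Adds a client ID to the set stored under `key`, creating the set if needed.
function addToIndex(
  index: Map<string, Set<string>>,
  key: string,
  clientId: string
): void {
  let clients = index.get(key);
  if (clients === undefined) {
    clients = new Set<string>();
    index.set(key, clients);
  }
  clients.add(clientId);
}

class Hub {
  private byUser = new Map<string, Set<string>>(); // userId -> clientIds
  private byMeeting = new Map<string, Set<string>>(); // meetingId -> clientIds

  connect(userId: string, clientId: string): void {
    addToIndex(this.byUser, userId, clientId);
  }

  subscribe(meetingId: string, clientId: string): void {
    addToIndex(this.byMeeting, meetingId, clientId);
  }

  // Resolves which client connections should receive an event with this scope.
  resolveAudience(scope: Scope): string[] {
    const clients =
      scope.kind === "user"
        ? this.byUser.get(scope.userId)
        : this.byMeeting.get(scope.meetingId);
    return clients === undefined ? [] : Array.from(clients);
  }
}
```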
***
## Initial State Hydration [#initial-state-hydration]
When a client first loads, the WebSocket is not yet connected. The client must establish initial state before subscribing to real-time updates: it first fetches the current state over REST, then opens the WebSocket connection.
Between the REST response and the WebSocket connection being established, events can be lost. To handle this, the client should include a `since` timestamp (from the REST response's latest `updatedAt`) in the WebSocket connection handshake. Core replays any events that occurred after that timestamp during the connection setup.
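Both halves of the `since` mechanism can be sketched with two small functions: the client derives the timestamp from its REST payload, and the server selects the events to replay. The `LoggedEvent` shape and both function names are assumptions; how Core actually stores and replays events is not specified here.

```typescript
interface LoggedEvent {
  id: string;
  occurredAt: string; // RFC 3339 timestamp, e.g. "2026-04-17T20:00:00Z"
}

// Client side: the newest `updatedAt` in the REST response becomes `since`.
function latestUpdatedAt(entities: { updatedAt: string }[]): string | null {
  let latest: string | null = null;
  for (const entity of entities) {
    if (latest === null || Date.parse(entity.updatedAt) > Date.parse(latest)) {
      latest = entity.updatedAt;
    }
  }
  return latest;
}

// Server side: replay only the events that occurred after `since`.
function eventsToReplay(log: LoggedEvent[], since: string): LoggedEvent[] {
  const cutoff = Date.parse(since);
  return log.filter((event) => Date.parse(event.occurredAt) > cutoff);
}
```

A strict `>` skips events that share the cutoff timestamp. Switching to `>=` would replay them instead, which should be harmless here because applying a full-state event twice yields the same result.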
***
## Edge Cases [#edge-cases]
### Mutation Failure and Rollback [#mutation-failure-and-rollback]
If the REST call fails, the client must **undo the optimistic update** and surface the error. The rollback strategy depends on the error category:
| Error Category | HTTP Status | Retry? | Client Behavior |
| ----------------------------- | ------------- | ------ | -------------------------------------------------------------------------------------------------------------------- |
| **Validation / Client Error** | 400, 409, 422 | ❌ No | Roll back immediately. Surface error to user. The request is structurally wrong and retrying won't help. |
| **Authentication Error** | 401, 403 | ❌ No | Roll back. Redirect to login or refresh token. |
| **Not Found** | 404 | ❌ No | Roll back. Entity was deleted by another client. Surface "item no longer exists" notification. |
| **Server Error** | 500, 502, 503 | ✅ Yes | Retry with exponential backoff + jitter (1s, 2s, 4s, max 3 retries). Roll back only after all retries are exhausted. |
| **Network Timeout** | — | ✅ Yes | Retry once. If it still fails, roll back and surface ambiguous error: "Changes may not have been saved." |
For **network timeouts**, the client cannot know whether the server received and processed the request. If the mutation did succeed server-side, the WebSocket event will eventually deliver the confirmed state — at which point the client should silently accept it rather than showing a duplicate.
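The table above can be expressed as a small lookup on the client. The sketch below is illustrative: `classifyFailure` and `FailurePolicy` are hypothetical names, a `null` status stands for a network timeout, and treating any status not listed in the table as non-retryable is an assumption.

```typescript
interface FailurePolicy {
  retry: boolean;
  maxRetries: number;
  clientBehavior: string;
}

// `status` is the HTTP status code, or null when the request timed out without a response.
function classifyFailure(status: number | null): FailurePolicy {
  if (status === null) {
    return {
      retry: true,
      maxRetries: 1,
      clientBehavior: "retry once, then roll back and warn that changes may not have been saved",
    };
  }
  switch (status) {
    case 500:
    case 502:
    case 503:
      return {
        retry: true,
        maxRetries: 3,
        clientBehavior: "retry with exponential backoff and jitter, roll back after the last retry fails",
      };
    case 401:
    case 403:
      return { retry: false, maxRetries: 0, clientBehavior: "roll back, then redirect to login or refresh the token" };
    case 404:
      return { retry: false, maxRetries: 0, clientBehavior: "roll back and notify that the item no longer exists" };
    default:
      // Covers 400, 409, 422. Assumption: unlisted statuses are also treated as non-retryable.
      return { retry: false, maxRetries: 0, clientBehavior: "roll back immediately and surface the error" };
  }
}
```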
### Optimistic ID Reconciliation [#optimistic-id-reconciliation]
When creating a new entity, the client uses a temporary ID (`temp_xxx`) for the optimistic update. When the REST response returns with the server-assigned ID, the client must **replace the temporary ID** everywhere it appears in local state:
```
Optimistic state: { id: "temp_abc", content: "..." }
REST response: { id: "note_01J...", content: "...", createdAt: "..." }
→ Replace temp_abc → note_01J in all local state references
```
The subsequent WebSocket echo is suppressed by `clientId` matching, so no further reconciliation is needed for the originating client.
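A minimal sketch of the replacement step, assuming local state is a list of notes plus one reference to a selected note. `NotesState`, `selectedNoteId`, and `reconcileCreatedNote` are hypothetical names used only for illustration.

```typescript
interface LocalNote {
  id: string;
  content: string;
  createdAt?: string;
}

// Hypothetical shape of local state: the list itself plus one place that references an ID.
interface NotesState {
  notes: LocalNote[];
  selectedNoteId: string | null;
}

// Swap the optimistic placeholder for the server-confirmed entity and
// rewrite any reference that still points at the temporary ID.
function reconcileCreatedNote(state: NotesState, tempId: string, confirmed: LocalNote): NotesState {
  return {
    notes: state.notes.map((note) => (note.id === tempId ? confirmed : note)),
    selectedNoteId: state.selectedNoteId === tempId ? confirmed.id : state.selectedNoteId,
  };
}
```

Returning a new state object instead of mutating in place keeps the previous snapshot available for rollback.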
### WebSocket Event Arrives Before REST Response [#websocket-event-arrives-before-rest-response]
The WebSocket event can arrive at the originating client **before** the REST response under high load. This is safe because:
1. Echo suppression discards the event regardless of timing (the `clientId` matches).
2. The REST response is the authoritative confirmation — it arrives independently and the client reconciles from it.
No special handling is required.
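Because the check does not depend on timing, it reduces to one comparison. The sketch below assumes each WebSocket event envelope carries the originating `clientId` and that server-initiated events omit it; the `StreamEnvelope` shape is hypothetical.

```typescript
// Hypothetical envelope shape: only the fields needed for echo suppression.
interface StreamEnvelope {
  type: string;
  clientId?: string; // absent for events that no client initiated
}

// True when the event was caused by this client's own mutation and should be discarded.
function isOwnEcho(envelope: StreamEnvelope, localClientId: string): boolean {
  return envelope.clientId === localClientId;
}
```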
### Concurrent Mutations (Last-Write-Wins) [#concurrent-mutations-last-write-wins]
If two devices edit the same entity simultaneously, the **last write to reach the database wins**. Both REST calls succeed independently, and both produce WebSocket events. Each client receives the other client's update event and replaces its local state.
Wordloop uses last-write-wins, not conflict resolution. This is appropriate for the current entity types (notes, tasks, meeting metadata) where conflicts are rare and the cost of a lost edit is low. If collaborative editing (e.g., simultaneous text editing within a note) is introduced, this section must be revisited with CRDTs or Operational Transform.
### Delete Race Condition [#delete-race-condition]
If Client A deletes an entity while Client B is editing it:
1. Client A's `DELETE` succeeds. Core publishes `note.deleted` over WebSocket.
2. Client B receives `note.deleted` and removes the entity from its UI — even if Client B has unsaved optimistic changes.
3. If Client B's `PATCH` arrives at Core **after** the delete, Core returns `404 Not Found`. Client B rolls back its optimistic update and surfaces the error.
The delete always wins. The client must handle the case where a `deleted` event arrives for an entity the user is currently editing by closing the editor and surfacing a notification.
### Stale Event Ordering [#stale-event-ordering]
Under network jitter or high load, WebSocket events for the same entity can arrive out of order. Each entity carries a `version` field (monotonically incrementing integer) and an `updatedAt` timestamp:
```
Current local state: { id: "note_01J...", version: 3 }
Incoming WS event: { id: "note_01J...", version: 2 }
→ Event version < local version. Discard as stale.
```
The client must **never apply an event whose version is less than or equal to the local version** for the same entity.
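A minimal sketch of that rule, assuming local entities are held in a `Map` keyed by ID. `applyIfNewer` and `VersionedEntity` are hypothetical names.

```typescript
interface VersionedEntity {
  id: string;
  version: number;
}

// Apply the incoming entity only if its version is strictly greater than the local one.
// Returns true when the store was updated and false when the event was discarded as stale.
function applyIfNewer<T extends VersionedEntity>(store: Map<string, T>, incoming: T): boolean {
  const current = store.get(incoming.id);
  if (current !== undefined && incoming.version <= current.version) {
    return false;
  }
  store.set(incoming.id, incoming);
  return true;
}
```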
### Reconnection and Missed Events [#reconnection-and-missed-events]
When the WebSocket connection drops (network change, server restart, mobile backgrounding), events published during the disconnection window are missed by the client.
The client tracks the `id` of the last received event. On reconnection, it sends this as `lastEventId` in the handshake. Core replays all events after that ID from a short-lived event buffer before resuming the live stream.
**Reconnection strategy:** The client uses **exponential backoff with jitter** to avoid thundering-herd reconnection storms when the server restarts:
| Attempt | Base Delay | With Jitter (±30%) |
| ------- | ---------- | ------------------ |
| 1 | 1s | 0.7s – 1.3s |
| 2 | 2s | 1.4s – 2.6s |
| 3 | 4s | 2.8s – 5.2s |
| 4 | 8s | 5.6s – 10.4s |
| 5+ | 16s (cap) | 11.2s – 20.8s |
The event buffer has a finite retention window. If the client has been disconnected longer than the buffer window, a replay is not possible. In this case, the client must perform a full state re-fetch via REST (the same hydration flow as initial page load) and then resume WebSocket subscription.
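The delay schedule in the table can be computed as follows. This is a sketch: `reconnectDelayMs` is a hypothetical helper, and the `random` parameter is injected only so the jitter is testable.

```typescript
// Delay before reconnection attempt N (1-based): 1s doubling per attempt,
// capped at 16s, then spread by ±30% jitter.
function reconnectDelayMs(attempt: number, random: () => number = Math.random): number {
  const baseMs = Math.min(1000 * Math.pow(2, Math.max(attempt, 1) - 1), 16000);
  const jitter = (random() * 2 - 1) * 0.3; // uniformly distributed in [-0.3, +0.3)
  return Math.round(baseMs * (1 + jitter));
}
```

With the default `Math.random`, attempt 3 yields a delay between 2800 ms and 5200 ms, matching the third row of the table.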
### Idempotency on REST Retry [#idempotency-on-rest-retry]
If a REST mutation times out and the client retries, the server may process the same mutation twice — producing two WebSocket events for a single user action.
For **create** operations, the client should generate and send an `Idempotency-Key` header. Core checks this key against a short-lived cache and returns the cached response if the key has been seen, preventing duplicate creation and duplicate WebSocket events.
For **update** and **delete** operations, natural idempotency applies — updating to the same values or deleting an already-deleted entity produces the same result.
```http
POST /api/v1/notes HTTP/1.1
Idempotency-Key: idem_7f3a9c...
X-Client-Id: abc-123
```
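One way the server-side key check could work, sketched in TypeScript for illustration even though Core is written in Go. `IdempotencyCache` is a hypothetical in-memory structure and the TTL is an assumption; the source only says the cache is short-lived.

```typescript
// Runs a handler at most once per key within the TTL and replays the cached response otherwise.
class IdempotencyCache<R> {
  private entries = new Map<string, { response: R; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  execute(key: string, handler: () => R): R {
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      // Key already seen: return the original response without re-running the mutation,
      // so no duplicate entity and no duplicate WebSocket event are produced.
      return hit.response;
    }
    const response = handler();
    this.entries.set(key, { response, expiresAt: this.now() + this.ttlMs });
    return response;
  }
}
```

A production version would also need to handle two requests with the same key arriving concurrently, which this single-threaded sketch does not address.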
### Partial Server Failure [#partial-server-failure]
If the database write succeeds but the WebSocket broadcast fails (hub crash, network partition between Core and hub):
* **Originating client**: Receives the REST `201 Created` response and knows the mutation succeeded. Its optimistic update is confirmed.
* **Other clients**: Miss the WebSocket event and do not update their UI.
This is an eventually-consistent failure. Other clients will receive corrected state on their next REST fetch (page navigation, tab focus) or when the WebSocket reconnects and replays missed events. This is acceptable because the originating client — the device where the user performed the action — always sees the confirmed state.
### Rapid Mutations on the Same Entity [#rapid-mutations-on-the-same-entity]
If a user edits the same entity in rapid succession (typing a title, adjusting a slider), firing a REST call for every keystroke wastes bandwidth and creates ordering hazards where a slow early response overwrites a fast later one.
**Strategy: Debounce + Coalesce**
1. **Debounce the REST call.** Wait until the user pauses interaction (300–500ms of inactivity) before sending the mutation. The optimistic update still applies immediately on every keystroke — only the network request is debounced.
2. **Coalesce intermediate states.** Only the final state is sent to the server, not every intermediate value. If the user types "Hel", "Hell", "Hello" — the server receives one `PATCH` with `"Hello"`.
3. **Cancel stale in-flight requests.** If a new mutation fires while a previous one is still in-flight for the same entity, abort the previous request using `AbortController` to prevent a stale response from overwriting the newer state.
```
User types: H → He → Hel → Hello → [pauses 300ms]
Optimistic UI: H → He → Hel → Hello (each applied immediately)
REST calls: [none] → [none] → [none] → PATCH { content: "Hello" }
```
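Steps 1 and 2 can be sketched as a small trailing-edge debouncer. `DebouncedMutation` and `ScheduleFn` are hypothetical names; the timer is injected so the sketch does not depend on a particular runtime, and aborting an in-flight request (step 3) is left out.

```typescript
type CancelFn = () => void;
type ScheduleFn = (run: () => void, delayMs: number) => CancelFn;

class DebouncedMutation<T> {
  private cancelPending: CancelFn | null = null;
  private send: (value: T) => void;
  private quietMs: number;
  private schedule: ScheduleFn;

  constructor(send: (value: T) => void, quietMs: number, schedule: ScheduleFn) {
    this.send = send;
    this.quietMs = quietMs;
    this.schedule = schedule;
  }

  // Call on every change, after the optimistic update has been applied locally.
  // Each call cancels the previously scheduled send, so only the latest value goes out.
  update(latest: T): void {
    if (this.cancelPending !== null) {
      this.cancelPending();
    }
    this.cancelPending = this.schedule(() => {
      this.cancelPending = null;
      this.send(latest);
    }, this.quietMs);
  }
}
```

In a browser, the scheduler could be `(run, ms) => { const handle = setTimeout(run, ms); return () => clearTimeout(handle); }`.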
### Tab Focus Revalidation [#tab-focus-revalidation]
When a browser tab regains focus after being backgrounded, the WebSocket may have silently disconnected without triggering an error event (common on mobile browsers and laptop lid-close). The client should treat tab-focus as a trigger to:
1. **Check WebSocket health.** If the connection is dead, initiate reconnection with `lastEventId` replay.
2. **Revalidate stale queries.** SWR's `revalidateOnFocus` (or equivalent) re-fetches the current view's data via REST to catch any mutations that occurred while the tab was inactive.
This ensures the client is never silently stale after returning from background.
### WebSocket Authentication Lifecycle [#websocket-authentication-lifecycle]
The WebSocket connection authenticates with a JWT during the initial handshake. Since JWTs have a finite lifetime, the connection must handle token expiry and session revocation:
**Token Refresh (Proactive):**
1. The client monitors its JWT expiration. A few minutes before expiry, it refreshes the token via the standard Clerk token refresh.
2. The client sends an `auth.refresh` message over the *existing* WebSocket with the new token.
3. Core validates the new token and associates it with the connection. No reconnection is needed.
**Session Revocation (Server-Initiated):**
1. When a user logs out from any device, or an admin revokes access, Core sends a `session.revoked` event to **all** WebSocket connections for that user.
2. Each client receives the event, closes the WebSocket, clears local state, and redirects to the login screen.
3. Core terminates the server-side connection after sending the event.
**Token Expired (Reactive):**
1. If the token expires without a proactive refresh (client was backgrounded), Core sends a WebSocket close frame with code `4401` (custom "Unauthorized" code).
2. The client refreshes its token and reconnects with the new JWT.
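
Two pieces of the lifecycle above lend themselves to small pure helpers: deciding when to refresh proactively, and deciding what to do when the socket closes. This is a sketch with hypothetical names; the lead time before expiry is an assumption, since the source only says "a few minutes".

```typescript
// Custom close code Core sends when the token has expired.
const WS_CLOSE_UNAUTHORIZED = 4401;

// A JWT `exp` claim is expressed in seconds since the Unix epoch.
// Returns how long to wait before refreshing, or 0 if the refresh is already due.
function msUntilProactiveRefresh(expSeconds: number, nowMs: number, leadMs: number): number {
  return Math.max(expSeconds * 1000 - leadMs - nowMs, 0);
}

type CloseAction = "refresh-token-then-reconnect" | "reconnect-with-backoff";

// Decide how to recover from a closed connection based on the close code.
function actionForClose(closeCode: number): CloseAction {
  return closeCode === WS_CLOSE_UNAUTHORIZED
    ? "refresh-token-then-reconnect"
    : "reconnect-with-backoff";
}
```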
### Cache Reconciliation on Settled [#cache-reconciliation-on-settled]
After every mutation — whether it succeeds or fails — the client should revalidate the affected SWR cache key to ensure the local cache matches the server's authoritative state. This is the `onSettled` pattern:
1. **On success:** The REST response already contains the server-authoritative entity. The client updates the SWR cache with this response. A background revalidation is triggered to catch any concurrent mutations from other devices that may have occurred during the request.
2. **On error:** The rollback restores the snapshot, and a revalidation fetches the current server state to ensure the cache is clean.
This guarantees that even if echo suppression, version comparison, or reconnection logic has a subtle bug, the cache self-heals within one mutation cycle.
***
## What This Pattern Does NOT Cover [#what-this-pattern-does-not-cover]
| Concern | Handled By |
| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Audio streaming** | Dedicated binary WebSocket pipeline — see [Real-Time WebSocket Streaming](/docs/learn/architecture/system-workflows#2-real-time-websocket-streaming) |
| **ML-generated events** | Transcript segments and insights originate from Pub/Sub consumers, not REST mutations. These always flow through the WebSocket without echo suppression because no client initiated them. |
| **Authentication** | JWT validation on REST and WebSocket handshake — see [Authentication](/docs/learn/architecture/auth) |
| **Pub/Sub worker pipelines** | Asynchronous inter-service communication — see [Unified Asynchronous Meeting Finalization](/docs/learn/architecture/system-workflows#1-unified-asynchronous-meeting-finalization) |
# Infrastructure & Hosting (/docs/learn/architecture/infrastructure)
# Infrastructure & Hosting [#infrastructure--hosting]
Wordloop deploys entirely to managed Google Cloud serverless infrastructure in production. For information about local development and emulation, see the [Local Infrastructure](local-infrastructure.md) page.
## Production Hosting [#production-hosting]
| Service | GCP Resource | Description |
| ----------------- | ---------------- | -------------------------------------------------------------------------------------- |
| **wordloop-docs** | Firebase Hosting | Next.js/Fumadocs static site deployment. |
| **wordloop-app** | Cloud Run | Next.js server utilizing SSR and Route Handlers. |
| **wordloop-core** | Cloud Run | Go REST API. |
| **wordloop-ml** | Cloud Run (x2) | Deployed as two separate services: an HTTP web server and a Pub/Sub background worker. |
| **Database** | Cloud SQL | Managed Postgres 15 database instance. |
| **Messaging** | Cloud Pub/Sub | Managed topics and subscriptions. |
### API Routing (Production) [#api-routing-production]
To ensure the frontend is environment-agnostic, the Next.js `wordloop-app` implements a **Server-Side API Proxy**.
* All frontend fetches are directed to `/api/...`.
* A Next.js Route Handler proxies these requests to the underlying `wordloop-core` URL (defined via the `CORE_API_URL` environment variable at runtime).
* This prevents hardcoding backend URLs during the Next.js build step.
## Environment Configuration [#environment-configuration]
Configuration relies exclusively on environment variables injected at runtime. There are NO configuration files deployed with the containers. See individual service handbooks for specifics.
# Local Infrastructure (/docs/learn/architecture/local-infrastructure)
# Local Infrastructure & Emulation [#local-infrastructure--emulation]
Wordloop utilizes a **hybrid local-first development model** orchestrated via our custom `./dev` CLI.
Instead of running the entire stack in heavy, monolithic Docker containers, we segment the environment:
* **Infrastructure & Observability (Docker):** Stateful services (Postgres, Pub/Sub emulator, Storage emulator) and telemetry tools (Aspire Dashboard) run in Docker.
* **Application Services (Native):** Codebases (`wordloop-core`, `wordloop-ml`, `wordloop-app`, `wordloop-docs`) run natively on your host machine. We use file-watching dev tooling (`air` for Go, `uvicorn` for Python, and the Next.js dev server) to enable **instant hot-reloading**, bypassing the need to rebuild Docker images after every code change.
## Local Port Architecture [#local-port-architecture]
To prevent port collisions, all services follow a well-structured port layout:
### Application Services (Native) [#application-services-native]
| Service | Internal Target | Port | Tooling |
| ----------------- | ------------------- | ------ | ---------------------- |
| **wordloop-app** | Next.js Frontend | `4001` | `next dev` |
| **wordloop-docs** | Fumadocs Site | `4000` | `next dev` |
| **wordloop-core** | Go REST API | `4002` | `air` (hot-reload) |
| **wordloop-ml** | Python API & Worker | `4003` | `uvicorn` (hot-reload) |
### Infrastructure Spaces (Docker) [#infrastructure-spaces-docker]
| Service | Image / Role | Port |
| -------------------- | ----------------------------- | ------- |
| **Aspire Dashboard** | Local Observability UI | `18888` |
| **Postgres** | `postgres:15` | `5432` |
| **Pub/Sub** | `cloud-sdk:emulators` | `8085` |
| **Storage (GCS)** | `oittaa/gcp-storage-emulator` | `8086` |
* **Statefulness:** Postgres data is persisted in a local Docker volume (`db_data`). Emulators spin up ephemerally. The Core service programmatically provisions required Pub/Sub topics and buckets on boot.
* **Bootstrapping:** Use `./dev start all` to bring up the Docker infra and native host services concurrently. Run `./dev help` for more granularity.
## Environment Configuration [#environment-configuration]
Configuration relies exclusively on environment variables injected at runtime. There are NO configuration files deployed with the containers. See individual service handbooks for specifics.
# Observability (/docs/learn/architecture/observability)
# Observability [#observability]
Instead of emitting fragmented logs, metrics, and traces, we generate high-cardinality, wide events (Spans) using **OpenTelemetry (OTel)**. These spans serve as the single source of truth for the health, performance, and behavior of the entire platform.
## Tracing Architecture [#tracing-architecture]
We utilize W3C Trace Context headers to propagate traces across every service boundary, ensuring that identity and context are never severed from the symptom.
* **App (Next.js):** Generates the root span for user interactions, authenticates via Clerk, and injects `clerk_user_id` into OTel Baggage as `enduser.id`.
* **Core (Go):** Uses the OpenTelemetry Go SDK (`go.opentelemetry.io/otel`) to trace HTTP handlers, Postgres queries (via pgx), and Pub/Sub publishing. It automatically reads W3C Baggage from incoming requests and propagates it via Pub/Sub attributes.
* **ML (Python):** Uses `opentelemetry-python` to extract trace context and identity Baggage from incoming Pub/Sub messages, trace ML pipelines, and propagate context when calling Core.
### Span-Derived Metrics [#span-derived-metrics]
We do **not** manually instrument and roll up traditional RED (Rate, Errors, Duration) metrics at runtime. Emitting isolated metrics destroys the context necessary for debugging.
Instead, our system relies on dynamic aggregations of our wide spans. Because every span contains the exact duration, status code, and rich metadata (tenant IDs, roles), our observability backend continuously calculates and visualizes RED metrics derived directly from the trace stream. If an aggregate error rate spikes, engineers can simply click the spike to see the exact traces that generated it.
## Logging [#logging]
To ensure structural consistency, all logs are written as structured JSON and natively integrate the OpenTelemetry context.
* **Go Logging:** Implemented via `slog` with an OpenTelemetry handler.
* **Python Logging:** Implemented via `structlog`, which attaches the active OTel context to each log entry.
Every log emitted within the scope of a request automatically inherits the `trace_id` and `span_id`, allowing developers to find any application log by looking at its parent trace.
## Telemetry Destinations & Sampling [#telemetry-destinations--sampling]
Our services act purely as OTLP (OpenTelemetry Protocol) emitters. They never communicate directly with the final observability storage backend. Data routing and sampling are centrally managed.
### Local Development (.NET Aspire) [#local-development-net-aspire]
Locally, all services export OTLP data to the **.NET Aspire Dashboard**.
1. Run `./dev dash obs` (or start it automatically via `./dev start infra`).
2. Access the UI at [http://localhost:18888](http://localhost:18888).
3. You can view Traces, Metrics, and Structured Logs across all containers in real-time. Since `enduser.id` Baggage is propagated, you can search for a user's exact ID to trace their entire session timeline end-to-end.
### Production Pipeline & Tail-Based Sampling [#production-pipeline--tail-based-sampling]
In production, SDKs do not push directly to Google Cloud. We deploy instances of the **OpenTelemetry Collector Gateway** to act as an intermediary buffer.
Because we use **Tail-Based Sampling** to control telemetry cost, the Collector buffers every span of a distributed trace until the trace is complete, then applies our sampling rules:
* **100% Sampling for Errors & High Latency:** If any span anywhere in the trace breaches our latency threshold or contains an error, the entire trace is preserved and exported to Google Cloud.
* **5% Sampling for Happy Paths:** If the trace completed without anomalies, we keep 5% of such traces and drop the other 95% at the Collector level to save ingest and storage costs without sacrificing visibility into system failures.
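
The two rules amount to a single decision per completed trace. The sketch below shows only that decision logic; in practice it lives in the Collector's configuration rather than in application code, and `SpanSummary`, `shouldKeepTrace`, and the threshold parameter are hypothetical.

```typescript
interface SpanSummary {
  durationMs: number;
  isError: boolean;
}

// Decide whether a completed trace is exported.
// `random` is injected so the 5% branch can be exercised deterministically.
function shouldKeepTrace(
  spans: SpanSummary[],
  latencyThresholdMs: number,
  random: () => number = Math.random,
): boolean {
  // Rule 1: any error or slow span anywhere in the trace preserves the whole trace.
  const anomalous = spans.some((span) => span.isError || span.durationMs > latencyThresholdMs);
  if (anomalous) {
    return true;
  }
  // Rule 2: keep roughly 5% of healthy traces.
  return random() < 0.05;
}
```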
# System Architecture (/docs/learn/architecture/overview)
# System Architecture Overview [#system-architecture-overview]
Wordloop is a localized, intelligence-first platform structured so that each service owns an isolated domain boundary, communicating through strictly typed, declarative contracts.
## High-Level Topology [#high-level-topology]
## Service Boundaries [#service-boundaries]
The platform is decoupled into three primary execution domains:
### `wordloop-core` (Go) [#wordloop-core-go]
The absolute system of record. Responsible for transactional orchestration, state management, Clerk webhook syncing, and exposing the primary REST API via [Huma](https://huma.rocks).
* [Core Service Handbook](../services/core/index.md)
### `wordloop-ml` (Python) [#wordloop-ml-python]
The async intelligence engine. Stateless, event-driven, and built on FastAPI. It consumes Pub/Sub events from Core, interfaces with external APIs (AssemblyAI), and uses a symmetric service token to push structured data back to Core.
* [ML Service Handbook](../services/ml/index.md)
### `wordloop-app` (Next.js) [#wordloop-app-nextjs]
The presentation layer built on React Server Components. Authenticates via Clerk and communicates with Core via Orval-generated API clients wrapped in a Next.js server-side proxy route.
* [App Service Handbook](../services/app/index.md)
## Communication Patterns [#communication-patterns]
The client–server data architecture follows the **[Optimistic Mutation with Echo-Suppressed Streaming](data-flow)** pattern: REST for writes, WebSocket for reads, with optimistic UI and source-aware echo suppression for multi-device sync. Contracts act as the sole source of truth. Hand-written API clients are forbidden.
| Pattern | Mechanism |
| ----------------------- | --------------------------------------------------------------------------------------------------- |
| **Mutations (CUD)** | REST via Orval-generated clients. Optimistic UI with rollback. Next.js proxies to circumvent CORS. |
| **Streaming Reads** | WebSocket pushes complete entity payloads on every state change. Echo suppressed via `X-Client-Id`. |
| **Worker Dispatch** | GCP Pub/Sub utilizing strict AsyncAPI schemas for inter-service async work. |
| **Internal Writebacks** | Internal REST calls authenticated via strict Service Tokens (ML → Core). |
| **Identity Sync** | Webhooks from Clerk ingested to local Postgres `users` table. |
See the dedicated documentation for [Authentication](auth.md), [Data Flow](data-flow), [Observability](observability.md), and [Hosting](infrastructure.md).
# System Workflows (/docs/learn/architecture/system-workflows)
# System Workflows [#system-workflows]
This document outlines the vital data pipelines and chronological component interactions driving the Wordloop platform.
## 1. Unified Asynchronous Meeting Finalization [#1-unified-asynchronous-meeting-finalization]
Wordloop uses a single background processing pipeline capable of finishing *both* batch-uploaded raw audio files and finalizing severed/abandoned WebSocket Live meetings.
By deferring all complex generation algorithms to the asynchronous `TranscriptionJobMessage`, Wordloop protects the stateful live recording connections from cascading OOM crashes while ensuring offline tasks natively self-heal broken streams.
## 2. Real-Time WebSocket Streaming [#2-real-time-websocket-streaming]
The synchronous audio pipeline is designed around high availability, zero in-memory buffering, and multi-endpoint data dispersion.
## 3. Voice Context Pipelines [#3-voice-context-pipelines]
Workflows for orchestrating speaker identity, embeddings, and context.
Vector matching operations are computationally intensive. The frontend must expect varying latency when querying nearest neighbors.
## 4. AI Chat Context Orchestration [#4-ai-chat-context-orchestration]
Retrieving meeting context for intelligent conversational RAG queries.
# Concepts (/docs/learn/concepts)
# Concepts [#concepts]
A shared vocabulary is not a cosmetic concern. When every engineer on the team means the same thing by "segment," "synthesis," or "task," design conversations become faster and bugs become easier to describe. This page is the canonical glossary of the domain; use it when writing code, specs, or tests.
## Core entities [#core-entities]
**Meeting** — the primary unit of work in Wordloop. A Meeting is a bounded session captured in the system, tied to a user, optionally attended by multiple People, and producing a Transcription, a MeetingSynthesis, and Tasks. The `meetings` table and the `/meetings` routes are the center of gravity for the entire platform.
**Person** — a contact record representing an attendee of one or more Meetings. A Person carries identity fields (display name, email, title, company) and an optional voice model used to attribute TranscriptSegments to a speaker. People are distinct from Users — a User is someone with a Wordloop account; a Person is someone who appeared in a meeting, with or without an account.
**Transcription** — the speech-to-text record attached to a Meeting. A Transcription aggregates TranscriptSegments as they are produced in near-real-time by the ML service and reaches a `completed` status when the meeting closes.
**TranscriptSegment** — the atomic unit of the Transcription. Each segment carries a speaker label, the attributed Person (if matched), text, start and end timestamps, a confidence score, and an `is_final` flag. Most ML processing — embeddings, topic extraction, synthesis — operates over segments.
**MeetingSynthesis** — the AI-generated summary attached to a Meeting. Contains a headline, a prose summary, key points, a list of Topics, and nested TalkingPoints. Produced by the ML service after the Transcription finalises; can be regenerated on demand.
**Topic** — a thematic cluster extracted from a Meeting's segments. Topics carry a name, a summary, and the set of TranscriptSegments that contributed to them. A Meeting has many Topics; a Topic belongs to one Meeting.
**TalkingPoint** — a specific point or claim within a Topic. TalkingPoints are the most granular unit of the MeetingSynthesis, surfaced in the recap UI as bullets under each Topic.
**Task** — an action item extracted from a Meeting. Tasks are assignable, trackable, and hierarchical (via `parent_task_id`). They live beyond the Meeting itself and are the primary output a user acts on after review. Status values: `pending`, `in_progress`, `completed`, `canceled`.
## Supporting entities [#supporting-entities]
**User** — a Wordloop account holder, identified via Clerk. A User has an associated Person record (the voice model and contact info for their own participation in meetings). JIT-provisioned on first sign-in.
**Note** — a free-form annotation attached to any entity (`meeting`, `person`, `task`, etc.) via a polymorphic `subject_type` / `subject_id` pair.
**Tag** — a label a user can apply to Meetings, People, or Tasks for organisation.
## Cross-cutting concepts [#cross-cutting-concepts]
**Voice model** — the speaker-identification vector attached to a Person. The ML service matches incoming audio against stored voice vectors to attribute TranscriptSegments to a specific Person rather than an anonymous `SpeakerLabel`. Voice models are built incrementally from verified segments.
**JIT provisioning** — "just-in-time" user creation. When a user signs in via Clerk for the first time, the Core API reads their Clerk profile and creates both the local User record and the corresponding Person record on demand. No webhooks, no seeding.
**Echo suppression** — the mechanism by which a client discards WebSocket events that originated from its own mutations, identified by a matching `clientId`, so that an optimistic update is not applied a second time. A subtle but load-bearing piece of the real-time pipeline; see [Real-Time principles](/docs/principles/system-design/real-time) for the design model.
## Further reading [#further-reading]
* [Architecture Overview](/docs/learn/architecture/overview) — how these entities are distributed across services.
* [Data Flow](/docs/learn/architecture/data-flow) — the lifecycle of a segment, from microphone to synthesis.
* [Reference / Glossary](/docs/reference/glossary) — the complete, link-resolvable vocabulary.
# Platform Services (/docs/learn/services)
# Platform Services [#platform-services]
Wordloop is composed of three services, each with a distinct responsibility, language, and runtime. This section contains one handbook per service — how it is structured, what it owns, and how to work on it.
The documentation site itself (`wordloop-docs`) is a fourth deployable but is treated as a piece of platform tooling rather than an application surface; it is documented via the [Reference](/docs/reference) and [Guides](/docs/guides) sections.
# Runbooks (/docs/operations/runbooks)
# Runbooks [#runbooks]
A runbook is a script. It is written so a tired, stressed engineer can follow it at 3am and restore service without having to reason from first principles. Each runbook in this section targets a specific, recognisable failure symptom and walks through detection, diagnosis, mitigation, and recovery.
## Runbook authoring [#runbook-authoring]
New runbooks are welcome — every incident we resolve should teach the team one. The template:
```markdown
# Runbook:
**Owner:**
**Last tested:** YYYY-MM-DD
**Pager rule:**
## Goal
Restore <service> when <symptom>.
## Detection
How to confirm this is the failure you are hitting.
## Diagnosis
Fast checks to localise the fault.
## Mitigation
Immediate actions to restore user-facing health.
## Recovery
Steps to return to a fully healthy state.
## Rollback
How to undo each state-changing step.
## Escalation
When and whom to escalate to.
## Postmortem
Link to the incident doc once one exists.
```
## Available runbooks [#available-runbooks]
*The catalogue is populated as real incidents drive new runbooks. Writing a runbook "just in case" is usually wasted effort; writing one in the follow-up from an actual incident captures the specific, sharp-edged lessons a generic version would miss.*
See [On-Call](/docs/operations/on-call) for rotation logistics and [Troubleshooting](/docs/operations/troubleshooting) for exploratory diagnostic trees.
# Agent-Native Systems (/docs/principles/ai-native/agent-native-systems)
# Agent-Native Systems [#agent-native-systems]
## TL;DR [#tldr]
AI agents read our APIs, our events, and our documentation programmatically. Building agent-native systems means designing every interface — contract, spec, doc page — so that an agent can consume it without a human translator in the loop. MCP for structured tool surfaces, `llms.txt` for discoverable documentation, stable error codes, rich OpenAPI examples — the pieces compose into a system agents can work inside.
## Why this matters [#why-this-matters]
The organisation that takes agent-readiness seriously in 2026 gets a multiplier on every engineer's output. Agents write code faster, answer questions faster, and onboard faster when the systems they are working against are designed for them. The organisation that treats agent-readiness as an afterthought pays the cost in a constant low-grade friction: agents that need babysitting, outputs that need correction, onboarding that requires a human bootstrapping step for every task. The investment is modest; the return compounds.
## Our principles [#our-principles]
### 1. Every interface has a machine-consumable specification [#1-every-interface-has-a-machine-consumable-specification]
HTTP endpoints have OpenAPI; events have AsyncAPI; documentation has `llms.txt` and `.md` exports; the tools an agent should use have MCP schemas. An interface without a machine-consumable spec is off-limits to agents by default.
### 2. Specifications include descriptions, examples, and constraints [#2-specifications-include-descriptions-examples-and-constraints]
A spec that says a field is `string` without saying what the string represents is a spec an agent cannot use correctly. We write descriptions, give examples, enumerate finite domains, and state constraints explicitly. The standard is: a competent agent should be able to use the interface without reading the implementation.
### 3. MCP is our standard tool surface [#3-mcp-is-our-standard-tool-surface]
When we want agents to interact with Wordloop beyond reading, we expose the capability through a Model Context Protocol server. Tools are typed, documented, and report their errors; resources are typed and fetchable. A bespoke prompt-engineering integration is a deprecated pattern; MCP is the interoperability layer.
### 4. `llms.txt` and `.md` exports are shipped alongside docs [#4-llmstxt-and-md-exports-are-shipped-alongside-docs]
Every docs site ships `llms.txt` (the index) and `llms-full.txt` (the consolidated corpus), plus a `.md` export for every page. Agents navigate the docs the same way a human would, but through a plain-text channel that does not require HTML parsing.
### 5. Error responses are structured, stable, and actionable [#5-error-responses-are-structured-stable-and-actionable]
Every error carries a stable code, a human message, and machine-readable details. The code is catalogued in [Reference / Errors](/docs/reference/errors) and never renumbered. Agents branch on codes; they do not parse prose. This is the single highest-leverage API hygiene choice for agent-readiness.
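As a minimal sketch of what branching on codes looks like from the caller's side, assume a hypothetical JSON error envelope with `code`, `message`, and `details` fields. The code names below (`RATE_LIMITED`, `VALIDATION_FAILED`, `UNPARSEABLE_ERROR`) are invented for illustration; the real catalogue lives in Reference / Errors.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ApiError:
    """Hypothetical error envelope: stable code, human message, machine details."""
    code: str
    message: str
    details: dict = field(default_factory=dict)


def parse_error(body: str) -> ApiError:
    """Parse a JSON error body; anything that is not the expected shape gets a fallback code."""
    try:
        raw = json.loads(body)
        return ApiError(code=raw["code"], message=raw["message"], details=raw.get("details", {}))
    except (json.JSONDecodeError, KeyError, TypeError):
        return ApiError(code="UNPARSEABLE_ERROR", message=body)


def next_action(err: ApiError) -> str:
    """Decide what to do from the code alone; the message text is never inspected."""
    if err.code == "RATE_LIMITED":
        return f"retry_after:{err.details.get('retry_after_seconds', 1)}"
    if err.code == "VALIDATION_FAILED":
        return "fix_request"
    return "escalate"
```

A prose-only body such as `"something went wrong"` falls through to `escalate`, which is the only thing an automated caller can safely do with it.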
### 6. Idempotency enables retry [#6-idempotency-enables-retry]
Agents retry. Systems that penalise retry — duplicate records, doubled charges, phantom events — cannot be worked against reliably. Every write endpoint accepts an idempotency key ([API Design](/docs/principles/system-design/api-design)); every event consumer is de-duplicating ([Integration Patterns](/docs/principles/system-design/integration-patterns)).
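The retry-safe behaviour can be sketched with an in-memory stand-in for a write endpoint. `MeetingStore` and `create_meeting` are hypothetical names, not the wordloop-core API; a real implementation would persist the key and check it atomically with the write.

```python
from dataclasses import dataclass, field


@dataclass
class MeetingStore:
    """In-memory stand-in for a write endpoint's storage (illustrative only)."""
    meetings: list = field(default_factory=list)
    seen: dict = field(default_factory=dict)  # idempotency key -> stored response

    def create_meeting(self, idempotency_key: str, title: str) -> dict:
        # A repeated key returns the original response instead of writing again.
        if idempotency_key in self.seen:
            return self.seen[idempotency_key]
        record = {"id": len(self.meetings) + 1, "title": title}
        self.meetings.append(record)
        self.seen[idempotency_key] = record
        return record
```

Calling `create_meeting` twice with the same key yields one record and the same response both times, so an agent that times out and retries does no harm.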
### 7. Outputs are structured where it matters [#7-outputs-are-structured-where-it-matters]
When an agent is producing a structured result — a database record, an API payload, a configuration fragment — we use schema-constrained generation (JSON schema, tool calling) rather than free-text-then-parse. Free-text parsing is how agent pipelines become brittle.
### 8. Documentation is reviewed for agent consumption [#8-documentation-is-reviewed-for-agent-consumption]
When we write a page, we ask: would an agent reading this through MCP understand what to do? If the page assumes visual hierarchy, colour, or context that does not survive serialisation, we re-shape it. Agent-readiness is a docs quality attribute, not a separate track of work.
## How we apply this [#how-we-apply-this]
* [/llms.txt](/llms.txt) and [/llms-full.txt](/llms-full.txt) — the canonical entry points.
* The MCP server at `scripts/mcp-server.ts` — the current tool and resource surface.
* [API Design](/docs/principles/system-design/api-design) — the OpenAPI discipline that makes our APIs agent-consumable.
* [Documentation](/docs/principles/foundations/documentation) — the dual-audience docs stance.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Auth flows that require human interaction.** A consent screen with a "click here" button is a dead end for an automated client. Design auth that supports programmatic token issuance.
* **Prose-only error responses.** `"something went wrong"` is unusable by any automated caller.
* **Undocumented "internal" APIs.** An API without a spec is an API that agents cannot use — which means humans will be asked to do the thing an agent should be doing.
* **MCP tools that wrap everything.** An MCP server that mirrors every endpoint in the API is noise. Expose the capabilities agents actually need, named in the agent's vocabulary.
* **Documentation that leans on rendered visuals.** An architecture diagram nobody can parse from Markdown is a diagram an agent cannot read. Prefer Mermaid source in the Markdown.
## Further reading [#further-reading]
* *Model Context Protocol* ([modelcontextprotocol.io](https://modelcontextprotocol.io)) — the canonical MCP specification.
* *llms.txt specification* ([llmstxt.org](https://llmstxt.org)) — the dual-audience docs convention.
* *OpenAPI Specification* ([openapis.org](https://www.openapis.org)) — the HTTP contract format.
* *Anthropic's agent engineering posts* — practical patterns for building agents against real APIs.
* *Simon Willison's blog* ([simonwillison.net](https://simonwillison.net)) — ongoing, practical commentary on the state of tooling.
# AI Engineering (/docs/principles/ai-native/ai-engineering)
# AI Engineering [#ai-engineering]
## TL;DR [#tldr]
AI engineering is software engineering with a non-deterministic component in the loop. We treat prompts as code, evaluations as tests, context as a first-class design surface, and agents as distributed systems. The discipline is about making probabilistic systems behave predictably enough to ship.
## Why this matters [#why-this-matters]
Every team that has tried to ship an AI feature has learned the same lesson the hard way: the part that feels like magic in a demo is the part that fails in unpredictable ways in production. The gap between "it works in the playground" and "it works for every user, every day" is where AI engineering happens. The discipline treats the non-determinism as an engineering problem — measurable, testable, and addressable — rather than as an inherent limitation to shrug at.
## Our principles [#our-principles]
### 1. Prompts are code [#1-prompts-are-code]
Prompts live in version control, are reviewed, are tested, and are versioned. A prompt change is a code change; it ships through the same PR review as any other change. "We tweaked the prompt in the dashboard" is how a team loses the ability to reason about its own AI behaviour.
### 2. Evals are tests [#2-evals-are-tests]
Every meaningful AI behaviour has an eval: a scored comparison of model output against a reference. Evals run in CI; thresholds are committed; regressions block merge the same way unit-test failures do. Without evals, "did we make the model worse?" is unanswerable, which means every improvement is also a potential regression you will discover from users.
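A minimal sketch of an eval that can gate a merge, assuming exact-match scoring and a stand-in `fake_model` so it runs offline. The cases, the scoring function, and the threshold value are illustrative assumptions, not our committed eval suite.

```python
from typing import Callable

# Hypothetical eval set: (input, reference) pairs committed alongside the prompt.
EVAL_CASES = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]
PASS_THRESHOLD = 0.6  # committed threshold; a lower score fails the build


def exact_match(output: str, reference: str) -> float:
    """Score 1.0 when output equals the reference, ignoring case and outer whitespace."""
    return 1.0 if output.strip().lower() == reference.strip().lower() else 0.0


def run_eval(model: Callable[[str], str], cases=EVAL_CASES) -> float:
    """Return the mean score of the model across all cases."""
    scores = [exact_match(model(prompt), reference) for prompt, reference in cases]
    return sum(scores) / len(scores)


def fake_model(prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without network access."""
    canned = {"2 + 2": "4", "capital of France": "paris", "opposite of hot": "warm"}
    return canned.get(prompt, "")
```

In CI the check reduces to `assert run_eval(model) >= PASS_THRESHOLD`, which fails the build the same way a unit test would.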
### 3. Context is the interface [#3-context-is-the-interface]
The content of the context window — what system prompt, what few-shot examples, what retrieved documents, what tool outputs — is the single biggest lever on model behaviour. We design it deliberately, measure its token budget, and treat it as a first-class interface. "Throw in everything relevant" is the anti-pattern that blows up the bill and dilutes the signal.
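One way to make the token budget explicit is to pack retrieved documents by relevance until the budget is spent. This sketch assumes a crude word-count estimator in place of a real tokenizer; `estimate_tokens` and `pack_context` are hypothetical helpers.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def pack_context(system_prompt: str, documents: list[tuple[float, str]], budget: int) -> list[str]:
    """Keep the system prompt, then add documents by descending score while they fit."""
    used = estimate_tokens(system_prompt)
    if used > budget:
        raise ValueError("system prompt alone exceeds the token budget")
    packed = [system_prompt]
    for _score, doc in sorted(documents, key=lambda item: item[0], reverse=True):
        cost = estimate_tokens(doc)
        if used + cost <= budget:  # skip anything that would overflow the budget
            packed.append(doc)
            used += cost
    return packed
```

The point of the design is that exclusion is a deliberate decision: a low-scoring document that does not fit is dropped rather than squeezed in.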
### 4. Retrieval matters more than the model [#4-retrieval-matters-more-than-the-model]
For most RAG systems, the retrieval layer determines the ceiling. A clever model with bad retrieval gives confident nonsense; a boring model with good retrieval gives boring, correct answers. We invest in the retrieval quality — indexing, ranking, reranking, chunk boundaries — before we invest in the model choice.
### 5. Model outputs are validated at the boundary [#5-model-outputs-are-validated-at-the-boundary]
Every model output that crosses into code is validated: shape, length, content, and expected enumerations. Parse failures are handled explicitly, not allowed to propagate. A model output flowing into business logic without validation is an injection vector waiting to happen.
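A boundary validator might look like the following sketch. The `Recap` shape, the allowed sentiment values, and the length limit are assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass

ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}  # hypothetical enumeration
MAX_SUMMARY_CHARS = 500  # hypothetical length limit


@dataclass
class Recap:
    summary: str
    sentiment: str


class InvalidModelOutput(ValueError):
    """Raised when model output fails validation at the boundary."""


def parse_recap(raw: str) -> Recap:
    """Validate shape, length, and enumerations before the output reaches business logic."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise InvalidModelOutput(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise InvalidModelOutput("expected a JSON object")
    summary = data.get("summary")
    sentiment = data.get("sentiment")
    if not isinstance(summary, str) or not summary.strip():
        raise InvalidModelOutput("summary must be a non-empty string")
    if len(summary) > MAX_SUMMARY_CHARS:
        raise InvalidModelOutput("summary too long")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        raise InvalidModelOutput(f"unexpected sentiment: {sentiment!r}")
    return Recap(summary=summary, sentiment=sentiment)
```

Callers catch `InvalidModelOutput` explicitly and decide whether to retry, fall back, or surface the failure; nothing unvalidated flows onward.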
### 6. Agents are distributed systems [#6-agents-are-distributed-systems]
An agent loop (model plans, agent acts, agent observes, model re-plans) has all the problems of a distributed system: retries, idempotency, timeouts, failure isolation. We apply the same patterns ([Integration Patterns](/docs/principles/system-design/integration-patterns)): bounded retries, circuit breakers, auditable history. The hardest agent failures are system failures, not model failures.
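The bounded-retry and auditable-history patterns can be sketched as a loop with two hard limits. `run_agent`, `plan`, and `act` are hypothetical names; a real loop would also add timeouts and a circuit breaker around `act`.

```python
from typing import Callable, Optional


def run_agent(
    plan: Callable[[list[str]], Optional[str]],
    act: Callable[[str], str],
    max_steps: int = 5,
    max_retries: int = 2,
) -> list[str]:
    """Run plan -> act -> observe until the planner returns None or a bound is hit."""
    history: list[str] = []  # auditable record of every action and observation
    for _ in range(max_steps):
        action = plan(history)
        if action is None:  # explicit exit condition chosen by the planner
            return history
        for attempt in range(max_retries + 1):
            try:
                observation = act(action)
                history.append(f"{action} -> {observation}")
                break
            except RuntimeError as exc:
                history.append(f"{action} failed (attempt {attempt + 1}): {exc}")
        else:
            # Every attempt failed: stop instead of looping on a broken tool.
            history.append("stopped: retries exhausted")
            return history
    history.append("stopped: step budget exhausted")
    return history
```

Both limits end the run with a recorded reason, so a planner that never returns `None` cannot run indefinitely.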
### 7. Cost is part of the evaluation [#7-cost-is-part-of-the-evaluation]
A prompt that is 10% better but 5× more expensive is not obviously better. Evals track quality, latency, *and* cost, and decisions about which configuration to ship consider all three. Cost-unaware evaluation is how an AI feature becomes a cost incident after launch ([Cost Engineering](/docs/principles/delivery/cost-engineering)).
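To make the three-way trade-off concrete, here is a sketch of one possible selection rule: among configurations that clear a quality bar and a latency bar, ship the cheapest. The rule, the field names, and the numbers in the usage note are illustrative assumptions, not measured results.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    name: str
    quality: float            # eval score in [0, 1]
    p95_latency_ms: float
    cost_per_call_usd: float


def pick_config(results: list[EvalResult], min_quality: float, max_latency_ms: float) -> Optional[EvalResult]:
    """Among configs meeting the quality and latency bars, choose the cheapest."""
    eligible = [
        r for r in results
        if r.quality >= min_quality and r.p95_latency_ms <= max_latency_ms
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.cost_per_call_usd)
```

With a hypothetical baseline at quality 0.80 and $0.002 per call, and a longer prompt at 0.88 and $0.010 per call, a quality bar of 0.78 selects the baseline; only a bar above 0.80 justifies paying five times more.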
### 8. Human oversight is designed in [#8-human-oversight-is-designed-in]
For high-stakes AI outputs — a recap that a user will act on, an automated action taken on behalf of a user — we design the review point deliberately. The human reviewer gets a summary, not a wall of text; the review UX is built alongside the AI feature, not retrofitted. "Let the model do it" without a review loop is a promise the model will eventually fail.
## How we apply this [#how-we-apply-this]
* [ML Systems](/docs/principles/stack/ml-systems) — the implementation principles for the Python ML service.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) — the flip side, making our interfaces consumable by agents.
* [Observability](/docs/principles/quality/observability) — the trace surface for model calls.
* [Testing](/docs/principles/foundations/testing) — the broader testing discipline evals sit inside.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **"The model will figure it out."** Hope is not a design.
* **Prompts as configuration.** Untracked prompts drift silently, and evals cannot catch drift they are not told about.
* **Over-stuffed context windows.** Throwing the kitchen sink at the model is usually how quality *decreases*.
* **Skipping evals "this once."** This once becomes always. Evals compound when you have them and compound against you when you do not.
* **Agent loops without termination.** A loop without a clear exit condition is how a runaway agent becomes a runaway bill.
* **Deterministic reasoning on top of probabilistic output.** If you need a number, ask for a number in a structured schema. Do not regex-extract it from prose.
## Further reading [#further-reading]
* *Prompt Engineering Guide* ([promptingguide.ai](https://www.promptingguide.ai)) — the practitioner's summary of current patterns.
* *Evaluating and Reinforcing LLM Behaviors*, Shreya Shankar et al. — the academic grounding for eval design.
* *Anthropic's Building Effective Agents* — the reference for agent architecture patterns.
* *Context Engineering* (Shopify, 2024; see public writeups) — the emerging discipline that elevates context design to first-class engineering.
* *A Survey on Retrieval-Augmented Generation*, multiple authors — RAG ground truth.
# AI-Native (/docs/principles/ai-native)
# AI-Native [#ai-native]
Wordloop is AI-native in two directions: the product runs on AI (transcription, recap, embedding), and the team builds with AI (agents write substantial code, read documentation programmatically, and contribute to reviews). Both directions demand a stance — on how models are integrated, how agents consume our interfaces, and how we keep a human on the hook for outcomes.
Related reading: [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural choice that, more than any other, determines how effectively an agent can contribute to a codebase.
# Cost Engineering (/docs/principles/delivery/cost-engineering)
# Cost Engineering [#cost-engineering]
## TL;DR [#tldr]
Cost is a non-functional requirement with a dashboard and a dollar sign. Every significant architectural decision considers cost-per-user and cost-per-call; every service has a budget it lives inside; surprising spend is an incident. FinOps is how we stay honest about the economics of running what we build.
## Why this matters [#why-this-matters]
Most teams discover cost too late — after a quarterly bill raises eyebrows in a meeting. By then, the decisions that drove the cost are in production, have consumers, and are expensive to reverse. Cost engineering is the discipline of making the economic consequences of decisions visible at the point of the decision. It turns cost from a finance concern into an engineering variable.
## Our principles [#our-principles]
### 1. Cost is a first-class metric [#1-cost-is-a-first-class-metric]
Cost-per-call, cost-per-user, cost-per-feature — all tracked alongside latency and error rate. A feature's success includes its unit economics, not just its engagement numbers. A team that does not know what its features cost cannot reason about trade-offs that matter.
### 2. Budgets are set and defended [#2-budgets-are-set-and-defended]
Every significant service runs inside a cost budget. The budget is set at design time, reviewed monthly, and treated as a commitment. Exceeding budget triggers the same response as exceeding any other SLO: investigate, remediate, or explicitly negotiate an increase.
### 3. Autoscaling is designed, not enabled [#3-autoscaling-is-designed-not-enabled]
Autoscaling is a tool with sharp edges. Aggressive autoscaling on a bursty workload can multiply cost without improving user experience; conservative autoscaling on a steady workload wastes headroom. Each scaling policy is tuned per workload with the production load profile in mind, not set to vendor defaults and left.
### 4. Cheap queries beat fast queries [#4-cheap-queries-beat-fast-queries]
The fastest query is the one that does not run. We cache what we can, compute what we must, and denormalise when the read-to-write ratio justifies it. A query that is both cheap and fast is rare; when the two goals conflict, the cheap version is usually the right default.
### 5. Egress is expensive; plan for it [#5-egress-is-expensive-plan-for-it]
Cloud provider egress is the most mispriced line item in most bills. Inter-region chatter, chatty logs, bulky screenshots uploaded constantly — these add up. We place data where its consumers are, batch where we can, and compress where it is cheap to do so.
### 6. AI spend has the same discipline [#6-ai-spend-has-the-same-discipline]
Every model call has a measured cost and a caching strategy. Prompts are versioned with token-count measurement; expensive prompts are justified by value. "Just pass the whole context to the largest model" is how an AI feature becomes a cost incident ([ML Systems](/docs/principles/stack/ml-systems)).
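Measuring the cost of a call is simple arithmetic once token counts are recorded. The sketch below assumes per-million-token pricing; the `Pricing` values used in the usage note are placeholders, not any provider's actual rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD; substitute the provider's real rates."""
    input_per_million: float
    output_per_million: float


def call_cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one model call given its measured token counts."""
    total = input_tokens * pricing.input_per_million + output_tokens * pricing.output_per_million
    return total / 1_000_000
```

With placeholder prices of $3 and $15 per million input and output tokens, a call with 2,000 input tokens and 500 output tokens costs (2000 × 3 + 500 × 15) / 1,000,000 = $0.0135. Recording this figure per prompt version is what makes "expensive prompts are justified by value" checkable.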
### 7. Reservations and commits where they pay [#7-reservations-and-commits-where-they-pay]
For predictable baseline workloads, reserved instances and committed-use discounts can save roughly 30-50% over on-demand pricing. The discipline is to match the reservation to the baseline: over-reserving pays for capacity we never use, and under-reserving leaves the discount on the table.
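The matching problem can be worked through with a small cost model. This is a sketch under simplifying assumptions: reserved units are billed every hour whether used or not, demand above the reservation is billed on-demand, and the rates are made-up numbers.

```python
def total_cost(reserved_units: int, hourly_demand: list[int], on_demand_rate: float, reserved_rate: float) -> float:
    """Reserved units are paid for every hour; demand above them is billed on-demand."""
    reserved_cost = reserved_units * reserved_rate * len(hourly_demand)
    overflow_units = sum(max(0, demand - reserved_units) for demand in hourly_demand)
    return reserved_cost + overflow_units * on_demand_rate
```

For a demand profile of `[4, 4, 10, 4]` units with an on-demand rate of 1.0 and a reserved rate of 0.6 per unit-hour, reserving nothing costs 22.0, reserving the baseline of 4 costs 15.6, and reserving the peak of 10 costs 24.0. Reserving to the baseline wins; reserving to the peak is worse than not reserving at all.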
### 8. FinOps is a practice, not an office [#8-finops-is-a-practice-not-an-office]
Cost engineering is something every team does, not a team that does it on behalf of others. The central function provides tooling and visibility; the distributed decisions are made by the teams that built the spend.
## How we apply this [#how-we-apply-this]
* [Observability](/docs/principles/quality/observability) — the measurement substrate for cost per unit.
* [ML Systems](/docs/principles/stack/ml-systems) — the cost discipline for model calls.
* [Platform](/docs/principles/delivery/platform) — the shared infra that every team's cost sits on.
* [Performance](/docs/principles/quality/performance) — cheap code is often also fast code.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **"We will optimise cost later."** Later never comes; the architecture is what it is by then.
* **Autoscale-and-forget.** Default autoscaling on a workload you have not profiled is how you get a thousand-dollar day.
* **Chatty logs forever.** Unstructured debug logs at volume are a non-trivial line on the bill.
* **AI calls without budget.** Model spend without a measured cost-per-request grows silently until it does not.
* **"It's just pennies."** Pennies × N × daily = a real number. Track it.
## Further reading [#further-reading]
* *Cloud FinOps*, Storment & Fuller — the canonical text on cross-functional cost management.
* *The Cost of Complexity*, Frederic Lardinois (various articles) — the essays on why complex architectures cost more than they appear.
* *AWS Well-Architected Framework — Cost Optimization pillar* — applicable beyond AWS, useful as a checklist.
* *FinOps Foundation framework* ([finops.org](https://finops.org)) — the practitioner's handbook.
# Developer Experience (/docs/principles/delivery/devex)
# Developer Experience [#developer-experience]
## TL;DR [#tldr]
A team ships as fast as its feedback loop lets it. We invest deliberately in the inner loop — the seconds between a code change and the evidence that the change works — because every second saved there is paid back a thousand times over across the team. `./dev` is our golden path, DORA metrics are how we measure the loop, and friction in the loop is an engineering bug.
## Why this matters [#why-this-matters]
The single largest predictor of a team's output, over months and years, is the quality of its feedback loop. A team that sees the result of a change in five seconds ships more and ships better than a team that sees it in five minutes — not because the individuals are smarter, but because the loop of hypothesis-and-test runs an order of magnitude more often. Developer experience is not a perk; it is an engineering lever.
## Our principles [#our-principles]
### 1. The inner loop is sacred [#1-the-inner-loop-is-sacred]
The inner loop is the sequence from "I think this code will work" to "yes or no, here is the evidence." We invest in making this loop as short as it can be: incremental compilation, test selection, hot reload, one-command bootstrapping, fast linting. Every second shaved off the inner loop multiplies across every engineer, every day.
### 2. `./dev` is the single entry point [#2-dev-is-the-single-entry-point]
Every local task — start, stop, test, lint, migrate, deploy, generate — runs through `./dev`. One command to remember, one tool to teach a new engineer, one surface to improve. Proliferating ad-hoc scripts in `Makefile`, `package.json`, and `bin/` is how a developer experience becomes a treasure hunt.
### 3. Golden paths, not mandatory paths [#3-golden-paths-not-mandatory-paths]
The golden path is the well-trodden, well-supported way to do a common task. It is the default, and it is the path new engineers and agents follow by default. Deviation is allowed when a task genuinely does not fit, but the deviator pays the cost of their own tooling. Golden paths concentrate investment; mandatory paths breed resentment.
### 4. DORA metrics keep us honest [#4-dora-metrics-keep-us-honest]
Deployment frequency, lead time for changes, change failure rate, mean time to recover — the four DORA metrics are how we measure whether the delivery system is healthy. We track them, surface them, and react to them. A regression in any one of the four is a signal to invest in the loop.
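As a sketch of how the four metrics fall out of deploy records, assume each deploy carries a commit time, a deploy time, a failure flag, and a recovery time. The `Deploy` record and its field names are hypothetical, and the function expects a non-empty list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deploy:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    restored_at: Optional[datetime] = None  # set once a failed deploy was recovered


def dora_metrics(deploys: list[Deploy], window_days: int) -> dict:
    """Compute the four DORA metrics over a non-empty list of deploys."""
    failures = [d for d in deploys if d.failed]
    recoveries = [d.restored_at - d.deployed_at for d in failures if d.restored_at]
    return {
        "deploys_per_day": len(deploys) / window_days,
        "median_lead_time": median(d.deployed_at - d.committed_at for d in deploys),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recover": sum(recoveries, timedelta()) / len(recoveries) if recoveries else None,
    }
```

Tracking these per service over a rolling window is enough to see a regression in any one of the four.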
### 5. Onboarding time-to-first-value is a design target [#5-onboarding-time-to-first-value-is-a-design-target]
A new engineer should reach their first local contribution — "I changed something and I can see the change" — in their first day. A new service should reach its first deploy in the first week. These are targets we hold ourselves to, and regressions here are treated as bugs.
### 6. Documentation is part of the loop [#6-documentation-is-part-of-the-loop]
A command you cannot find is a command you do not use. Every `./dev` subcommand has a reference entry, every golden path has a guide, every service has a handbook. The documentation exists so the loop does not depend on tribal memory.
### 7. Local environments match production shape [#7-local-environments-match-production-shape]
The local stack uses the same Postgres version, the same Pub/Sub contract, the same container runtime. "It works on my machine" is eliminated by eliminating the gap between the machines. Emulation over mocks ([Testing](/docs/principles/foundations/testing)) applies here too.
### 8. Friction is filed as a bug [#8-friction-is-filed-as-a-bug]
If a process is painful, that pain is a bug. File it, prioritise it, fix it. "Everyone deals with it" is how chronic friction becomes chronic velocity loss. The developer experience team — or whoever is the local maintainer of `./dev` — owns the backlog the same way a product team owns its user-bug backlog.
## How we apply this [#how-we-apply-this]
* [CLI Reference](/docs/reference/cli) — the surface of `./dev`.
* [Quickstart](/docs/start/quickstart) — the first-contact experience we measure.
* [Platform](/docs/principles/delivery/platform) — the broader internal platform `./dev` is a part of.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — the outer loop the inner loop feeds into.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **"Follow the README and read between the lines."** Onboarding that depends on tacit knowledge is not onboarding.
* **Five CLIs for five tasks.** `./dev` is one. A second CLI earns its existence by solving a problem `./dev` cannot.
* **Skip-the-test culture.** Fast-but-unreliable tests are worse than slow-reliable tests. The inner loop is made fast by honest investment, not by cheating.
* **DORA theatre.** Tracking the metric while not responding to it is worse than not tracking it at all.
* **Ignoring friction.** If you find a sharp edge, file the ticket. Do not route around it silently.
## Further reading [#further-reading]
* *Accelerate*, Forsgren, Humble, Kim — the empirical foundation for DORA metrics.
* *The DevOps Handbook*, Kim et al. — the full treatment of the inner-and-outer loop view.
* *Team Topologies*, Skelton & Pais — the organisational side of platform and golden paths.
* *Developer Experience: Concept and Definition* (Fagerholm & Münch, 2012) — the academic framing that predates the modern DevEx term.
# Delivery (/docs/principles/delivery)
# Delivery [#delivery]
Delivery is the discipline of turning code into running software that users can feel. The pages in this section describe the four practices that determine whether our delivery loop is a source of leverage or a source of toil: developer experience, progressive delivery, platform engineering, and cost engineering.
# Platform (/docs/principles/delivery/platform)
# Platform [#platform]
## TL;DR [#tldr]
The platform is the substrate every application team builds on: the local stack, the CI/CD pipeline, the observability collector, the secrets manager, the IDP that fronts all of it. We treat the platform as a product — it has users (us), a backlog, a quality bar, and explicit investment. A good platform makes the right thing the easy thing.
## Why this matters [#why-this-matters]
Every team in a multi-service organisation eventually arrives at the same realisation: the biggest drag on productivity is not the code the team writes, but the accumulated friction of the common plumbing every project has to assemble. A platform that handles the plumbing well turns that friction into a paved road. A platform that does not becomes a tax every project pays repeatedly. The quality of the platform is a direct multiplier on the output of every engineer on top of it.
## Our principles [#our-principles]
### 1. Platform is a product, with users and a roadmap [#1-platform-is-a-product-with-users-and-a-roadmap]
The people who build the platform have explicit users — the application engineers — and treat their work as a product: backlog, priorities, measurement, feedback. A platform maintained "when we have time" decays; a platform treated as product investment compounds.
### 2. Self-service is the goal [#2-self-service-is-the-goal]
Every common task — spinning up a new service, requesting a secret, adding an OTel dashboard, changing a feature flag — should be self-service. When an application team has to file a ticket and wait for the platform team, the platform is the bottleneck. Self-service is the acid test.
### 3. Golden paths over policy [#3-golden-paths-over-policy]
We pave specific paths (how to create a service, how to deploy, how to observe) and we make those paths the easiest route. Policy documents without paved paths produce compliance in shape but drift in substance.
### 4. `./dev` is the platform's front door [#4-dev-is-the-platforms-front-door]
For local workflows, `./dev` is the abstraction over every underlying tool: Docker, pnpm, uv, Air, migrate. The platform team maintains `./dev`; application teams use it without needing to know what is under it. See [DevEx](/docs/principles/delivery/devex).
### 5. One paved-road CI pipeline [#5-one-paved-road-ci-pipeline]
One pipeline definition for every Go service; one for every Python service; one for every TypeScript service. Teams that deviate take on the cost of maintaining their own pipeline. This is how we prevent snowflake CI configurations from accumulating.
### 6. Observability is part of the platform [#6-observability-is-part-of-the-platform]
Traces, metrics, and logs flow through the same collector, into the same backend, onto the same dashboards. Observability set up by each team independently ([Observability](/docs/principles/quality/observability)) is observability broken in five different ways.
### 7. The platform gets the same scrutiny as the product [#7-the-platform-gets-the-same-scrutiny-as-the-product]
Platform code is reviewed, tested, versioned, and deployed the same way product code is. A broken platform release can hurt every team at once, so the bar is actually higher. "It is just tooling, ship it" is how a platform becomes an obstacle.
### 8. Measure what the users feel [#8-measure-what-the-users-feel]
Platform success is measured by the application teams' outcomes — DORA metrics, onboarding time, number of tickets filed against the platform. Not by the platform team's own output metrics, which can be excellent while the users are miserable.
## How we apply this [#how-we-apply-this]
* [CLI Reference](/docs/reference/cli) — the `./dev` surface.
* [DevEx](/docs/principles/delivery/devex) — the developer-facing experience the platform enables.
* [Observability](/docs/principles/quality/observability) — the centralised telemetry substrate.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — the CI/CD pipeline as a platform service.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Platform-as-gatekeeper.** A platform that says "no" more than it says "self-serve" is a bottleneck, not a platform.
* **Five ways to do one thing.** Historical pipelines that nobody cleaned up. The platform should consolidate.
* **Tooling that only the platform team can use.** If the API requires insider knowledge, the tool is incomplete.
* **"Platform investment later."** The platform is either invested in or decaying; there is no steady state.
* **Metrics for the platform's sake.** Measuring "tickets closed by platform team" without measuring application-team outcomes misses the point.
## Further reading [#further-reading]
* *Team Topologies*, Skelton & Pais — the canonical framing of platform teams and enabling teams.
* *Platform Engineering on Kubernetes*, Mauricio Salatino — the practical engineering view.
* *The DevOps Handbook*, Kim et al. — the broader cultural context the platform sits inside.
* *Backstage documentation* ([backstage.io](https://backstage.io)) — the archetype of an internal developer portal.
# Progressive Delivery (/docs/principles/delivery/progressive-delivery)
# Progressive Delivery [#progressive-delivery]
## TL;DR [#tldr]
Progressive delivery is how we decouple the act of deploying code from the act of releasing a feature. We ship to production multiple times a day from a single branch, but users see changes only when we open a flag, route a canary, or promote a cohort. The production environment is stable; the user experience is controlled independently.
## Why this matters [#why-this-matters]
The reason most teams avoid shipping often is that shipping carries risk — a bad deploy can break production for every user at once. Progressive delivery breaks the link. A deploy puts the code into production. A release makes the code reach users. With the two decoupled, deploys become small, frequent, and boring; releases become observable, controllable, and reversible. That asymmetry is how modern teams sustain a fast release cadence without a proportional rate of incidents.
## Our principles [#our-principles]
### 1. Trunk-based development with short-lived branches [#1-trunk-based-development-with-short-lived-branches]
Every change lands on `main` as soon as it is ready. Branch lifetimes are measured in days, not weeks. Long-lived branches are how integration bugs accumulate quietly; trunk-based development surfaces them constantly, which makes them cheap to fix.
### 2. Deploy on every merge [#2-deploy-on-every-merge]
Main is always deployable, and we deploy from it continuously. A merged PR reaches production within the deploy window — not hours or days later. This is enforced by automation; a team that relies on a human "release engineer" has already lost the bet on cadence.
### 3. Feature flags separate deploy from release [#3-feature-flags-separate-deploy-from-release]
A new feature is deployed behind a flag, defaulted off. The flag state decides who sees the feature — nobody, internal users, a cohort, everyone. A bad feature is disabled without a redeploy; a controversial feature is rolled to 1% before 100%. Flags are a core primitive, not a third-party dependency.
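A percentage rollout is commonly implemented by hashing the flag name and user ID into a stable bucket. The sketch below illustrates that technique; `bucket` and `is_enabled` are hypothetical helpers, not our flag system's API.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map (flag, user) to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int, internal_users: frozenset = frozenset()) -> bool:
    """Allow-listed users always see the flag; everyone else is gated by bucket."""
    if user_id in internal_users:
        return True
    return bucket(flag, user_id) < rollout_percent
```

Because the bucket depends only on the flag and the user, a user who sees the feature at 1% keeps seeing it at 50% and 100%, and setting `rollout_percent` to 0 turns it off for everyone outside the allow-list without a redeploy.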
### 4. Canary before promote [#4-canary-before-promote]
Every release that could affect latency, reliability, or user experience goes through a canary — a small fraction of traffic for a bounded window — before promoting. Canary signals (error rate, p99 latency, user journey success) are automated comparisons, not eyeballs on a dashboard.
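An automated comparison can be as small as the following sketch, which compares canary and baseline windows on error rate and p99 latency. The thresholds and the three verdicts are illustrative assumptions; a production gate would typically add a statistical test and user-journey signals.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    requests: int
    errors: int
    p99_latency_ms: float

    @property
    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0


def canary_verdict(
    baseline: WindowStats,
    canary: WindowStats,
    min_requests: int = 500,
    max_error_rate_delta: float = 0.005,
    max_latency_ratio: float = 1.2,
) -> str:
    """Return 'wait', 'rollback', or 'promote' from the two traffic windows."""
    if canary.requests < min_requests:
        return "wait"  # not enough traffic to judge
    if canary.error_rate - baseline.error_rate > max_error_rate_delta:
        return "rollback"
    if canary.p99_latency_ms > baseline.p99_latency_ms * max_latency_ratio:
        return "rollback"
    return "promote"
```

The thresholds are committed alongside the rollout configuration, so the decision to promote is reproducible rather than a judgment made on a dashboard.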
### 5. Release is reversible, cheaply [#5-release-is-reversible-cheaply]
Every release has a rollback path that can be executed in a few minutes by any on-call engineer. Database migrations are designed reversibly ([Migrate the Schema](/docs/guides/migrate-schema)); flags can be flipped; canaries can be re-routed. "We can't roll that back" is a red flag on the release itself.
### 6. Flag hygiene is continuous [#6-flag-hygiene-is-continuous]
Flags are an asset and a debt. A long-lived flag that nobody remembers the purpose of is a drag on every future change. Every flag has an owner, a purpose, and an expiry date; stale flags are removed in the normal course of work.
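Owner, purpose, and expiry can be carried as plain metadata on each flag, which makes stale flags a query rather than an archaeology project. The `Flag` record below is a hypothetical shape for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Flag:
    name: str
    owner: str
    purpose: str
    expires_on: date


def stale_flags(flags: list[Flag], today: date) -> list[str]:
    """Names of flags whose expiry date has passed, sorted for stable reporting."""
    return sorted(f.name for f in flags if f.expires_on < today)
```

A scheduled job that reports `stale_flags` to each owner is one way to keep removal in the normal course of work.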
### 7. Observability defines "healthy" [#7-observability-defines-healthy]
A release is healthy when the relevant user-journey SLOs are within tolerance ([Reliability](/docs/principles/quality/reliability)). Not when CPU is low, not when memory is steady — when users' journeys are succeeding at the rate they did before. The canary is evaluated against SLO burn rates.
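Burn rate is the observed failure rate divided by the error budget the SLO allows. A minimal sketch, assuming a request-based SLO expressed as a target fraction such as 0.999:

```python
def burn_rate(failed: int, total: int, slo_target: float) -> float:
    """Observed failure rate divided by the error budget (1 - SLO target)."""
    if total == 0:
        return 0.0  # no traffic in the window, nothing burned
    error_budget = 1.0 - slo_target
    return (failed / total) / error_budget
```

With a 99.9% target, 10 failures in 10,000 requests is a burn rate of about 1 (spending the budget exactly as fast as the SLO allows), while 50 failures is a burn rate of about 5. A canary whose burn rate is well above the baseline's is unhealthy regardless of how its CPU and memory look.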
### 8. The release story is the same for every service [#8-the-release-story-is-the-same-for-every-service]
One rollout model, one flag system, one canary pattern. Different services with different release mechanics multiply cognitive load and reduce the effectiveness of the on-call engineer. Consistency is a force multiplier.
## How we apply this [#how-we-apply-this]
* [DevEx](/docs/principles/delivery/devex) — the inner loop that feeds into continuous delivery.
* [Reliability](/docs/principles/quality/reliability) — the SLO surface that gates canary promotion.
* [Observability](/docs/principles/quality/observability) — the signal layer for release health.
* [Deploy](/docs/guides/deploy) — the canonical deploy workflow.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Release trains.** Batching up a month of changes and shipping them on Friday is how you get a huge, unreviewable deploy that breaks in ways nobody can localise.
* **Flags without expiry.** A flag that has been "temporary" for a year is permanent — and a permanent decision hidden inside a runtime config.
* **Canary-by-eyeball.** Promoting because the graph "looks fine" is a coin flip. Automate the comparison.
* **"We will test it in staging."** Staging has no users. A canary in production is the only test of production behaviour.
* **Commit-and-hope.** No canary, no flag, deploy to 100%. You will find out in the morning.
## Further reading [#further-reading]
* *Accelerate*, Forsgren, Humble, Kim — the data on trunk-based development and its outcomes.
* *Continuous Delivery*, Humble & Farley — the canonical treatment of the release pipeline.
* James Governor, *Progressive Delivery* (RedMonk, 2018) — the essay that named the practice.
* *Release It!* (2nd edition), Michael Nygard — the stability-pattern view of rollout.
# Code Craft (/docs/principles/foundations/code-craft)
# Code Craft [#code-craft]
## TL;DR [#tldr]
Code is read far more than it is written. Our craft is to write code that the next reader — human or agent — can understand, change, and delete with confidence. Simplicity is the default; abstraction is a cost that must be earned.
## Why this matters [#why-this-matters]
In a codebase that is alive for more than a year, the dominant cost is not writing code — it is understanding the code already there so you can change it. Every abstraction, every layer of indirection, every "flexible" interface is a tax on future readers. Our stance is that taxes must be justified. When we optimise for future flexibility we have not yet needed, we pay a certain cost today against an uncertain benefit later; more often than not, the benefit never arrives and we are left with the cost.
## Our principles [#our-principles]
### 1. Simpler is better than clever [#1-simpler-is-better-than-clever]
A function that a tired engineer can understand in thirty seconds is worth more than a function that demonstrates the author's taste in type systems. Prefer plain data structures over clever abstractions, plain control flow over meta-programming, plain naming over in-joke naming. When "clever" and "clear" conflict, clear wins.
### 2. No speculative abstraction [#2-no-speculative-abstraction]
Do not build a generalisation until you have at least three concrete use cases driving the same shape. Premature abstractions are harder to change than the duplication they replace — because now you have to understand the abstraction, the use cases, and the compatibility between them before you can change any of them. Three similar lines of code is almost always better than a half-designed helper.
### 3. Deletion is a virtue [#3-deletion-is-a-virtue]
The code you delete cannot break, cannot require maintenance, cannot confuse the next reader, and cannot leak a vulnerability. When a feature is removed, the code should go with it — including the tests, the config flags, and the docs. Leaving dead code "just in case" is a bet that is almost always wrong: if we need it back, we will write a clearer version with the benefit of hindsight.
### 4. Names are the interface [#4-names-are-the-interface]
A badly named function is a broken interface even if its behaviour is correct, because every caller has to read the implementation to know what it does. We spend time on names. We rename aggressively when a better name becomes clear. Variables, functions, types, files, directories — all of them communicate, and a mismatch between name and behaviour is a bug.
### 5. Comments explain the "why" [#5-comments-explain-the-why]
Code explains the "what", so a comment that restates it is redundant. Names explain the "who" and "where." The only thing left for a comment is the "why": the non-obvious constraint, the invariant that must hold, the bug that drove an odd choice, the reference to an ADR. If a comment would be obvious to anyone who read the surrounding code, it is noise.
### 6. Error handling is design, not decoration [#6-error-handling-is-design-not-decoration]
Errors are a first-class part of the interface, not an afterthought. We decide — explicitly — which errors a function can return, how callers are expected to respond, and where the boundary between recoverable and fatal is. `err != nil` sprinkled through a codebase without a model behind it is a failure of design.
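One way to picture an explicit error model is a small, closed set of error types with a documented caller response for each. The sketch below is written in Python for brevity; the names (`LoopError`, `get_loop`, `handle_request`) and the status-code mapping are hypothetical.

```python
from typing import Optional


class LoopError(Exception):
    """Base class for every error this module raises on purpose."""


class LoopNotFound(LoopError):
    """Recoverable: the caller can answer with a 404."""


class StorageUnavailable(LoopError):
    """Not recoverable here: the caller fails the request and lets alerting fire."""


def get_loop(store: Optional[dict], loop_id: str) -> str:
    """Return the loop title; raises LoopNotFound or StorageUnavailable only."""
    if store is None:
        raise StorageUnavailable("loop store is not reachable")
    if loop_id not in store:
        raise LoopNotFound(loop_id)
    return store[loop_id]


def handle_request(store: Optional[dict], loop_id: str) -> tuple:
    """The boundary decides how each declared error maps to a response."""
    try:
        return 200, get_loop(store, loop_id)
    except LoopNotFound:
        return 404, "loop not found"
    except StorageUnavailable:
        return 503, "try again later"


print(handle_request({"a1": "Weekly sync"}, "a1"))  # (200, 'Weekly sync')
print(handle_request({}, "a1"))                      # (404, 'loop not found')
```

The point of the sketch is that the set of errors and the expected response to each are decided once, in one place, rather than rediscovered by every caller.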
### 7. Trust the boundary; distrust the internal [#7-trust-the-boundary-distrust-the-internal]
We validate at system boundaries — user input, external APIs, message payloads — where the data is untrusted. We do not re-validate between internal callers in the same service; if an internal contract is wrong, the right fix is the contract, not a runtime check in every consumer. Defensive programming inside the trust boundary is a form of noise.
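A minimal sketch of the idea, assuming a hypothetical `CreateLoopRequest` payload: the boundary function checks everything once and returns a typed value, and internal code accepts that type without re-checking it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CreateLoopRequest:
    """A payload that has already passed boundary validation."""
    title: str
    participant_count: int


def parse_create_loop(payload: dict) -> CreateLoopRequest:
    """Boundary: the payload is untrusted, so every field is checked here."""
    title = payload.get("title")
    count = payload.get("participant_count")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("title must be a non-empty string")
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if not isinstance(count, int) or isinstance(count, bool) or count < 1:
        raise ValueError("participant_count must be a positive integer")
    return CreateLoopRequest(title=title.strip(), participant_count=count)


def summarise(request: CreateLoopRequest) -> str:
    """Internal: trusts the type and does not re-validate."""
    return f"{request.title} ({request.participant_count} participants)"


request = parse_create_loop({"title": " Standup ", "participant_count": 3})
print(summarise(request))  # Standup (3 participants)
```

If `summarise` ever receives bad data, the fix belongs in `parse_create_loop` or in the type, not in a second round of checks inside `summarise`.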
### 8. Dead code is a bug [#8-dead-code-is-a-bug]
Commented-out code, `_unused` variables, orphan functions, legacy configuration — all of it decays the signal-to-noise ratio of the codebase. When we find it, we delete it. `git` preserves anything we lose; the working tree should contain only code that is alive today.
## How we apply this [#how-we-apply-this]
* [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural discipline that makes simplicity scalable.
* [Testing](/docs/principles/foundations/testing) — tests that exercise behaviour keep refactoring cheap.
* [Go Services](/docs/principles/stack/go-services) — the idioms that keep our Go code readable.
* [Frontend](/docs/principles/stack/frontend) — the conventions that keep our React code readable.
* [Decisions](/docs/decisions) — the ADRs that capture the "why" our comments do not.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Defensive programming without a threat model.** Guarding every internal call against nil is not robustness — it is distrust of our own type system.
* **"Might need it later" scaffolding.** Config flags for scenarios that do not exist, plugin systems with one plugin, interfaces with one implementation. Delete.
* **Fashion-driven refactors.** Rewriting working code to match a new pattern the team read about this week is debt, not progress.
* **Multi-paragraph docstrings.** If the function needs a multi-paragraph docstring to be understood, the function is wrong. Split it, rename it, or simplify it — then the docstring is not needed.
* **Backwards-compatibility shims for internal APIs.** If it is fully internal, changing it is allowed and expected; compatibility layers are debt we impose on ourselves for no benefit.
## Further reading [#further-reading]
* *A Philosophy of Software Design*, John Ousterhout — deep-module principle, the cost of shallow abstractions.
* *Tidy First?*, Kent Beck — the economics of refactoring as a separable activity.
* *The Pragmatic Programmer*, Hunt & Thomas — the canonical treatment of names, duplication, and orthogonality.
# Documentation (/docs/principles/foundations/documentation)
# Documentation [#documentation]
## TL;DR [#tldr]
Documentation is an active product surface. Wordloop docs are the canonical source for durable engineering knowledge; agent skills are the execution layer that selects, loads, and applies that knowledge safely. We design documentation for humans and AI agents at the same time, organise it with Diátaxis, expose it through `llms.txt`, Markdown exports, and MCP, and enforce freshness with automation wherever a human would drift.
## Why this matters [#why-this-matters]
In 2026, documentation is part of the runtime environment for engineering work. A human reads the site through navigation and search; an agent reads the same knowledge through MCP resources, `llms.txt`, `llms-full.txt`, and per-page Markdown exports. If those surfaces disagree, the system teaches different readers different truths. That is not a documentation problem; it is an engineering defect.
The operating model is simple: **docs hold the knowledge, skills control the agent behaviour**. Durable guidance belongs in the docs site where humans and agents can inspect it. Skill files stay concise and directive: they define when to trigger, what context to load, which tools to use, and which safety checks must run. This keeps prompts lean, reduces duplicated policy, and gives us one canonical place to correct factual drift.
## Our principles [#our-principles]
### 1. Documentation is canonical knowledge [#1-documentation-is-canonical-knowledge]
Architecture principles, service handbooks, workflow guides, glossary terms, ADRs, API references, and generated schemas belong in the docs site. A skill may point to these pages, but it does not become the source of truth for material that humans also need to understand.
### 2. Skills are the agent execution layer [#2-skills-are-the-agent-execution-layer]
Agent skills are a control surface, not a second documentation site. A skill owns triggering, task routing, tool use, safety constraints, verification steps, and context-loading instructions. It should say, for example, "read the App service handbook before changing `wordloop-app` data fetching," not duplicate the handbook in full.
### 3. AI-native documentation is first class [#3-ai-native-documentation-is-first-class]
Every important documentation surface must survive machine consumption. We publish `llms.txt` as the curated index, `llms-full.txt` as the consolidated corpus, `.md` exports for individual pages, and MCP resources for structured retrieval. Agent-readiness is not an afterthought or an SEO trick; it is a quality attribute of the docs system.
### 4. Diátaxis is the structural frame [#4-diátaxis-is-the-structural-frame]
We organise by reader intent, not by our internal org chart. Tutorials teach, how-to guides solve, reference pages support lookup, and explanation pages build understanding. A page that mixes these jobs forces both humans and agents to infer the purpose from context, which makes retrieval weaker and maintenance harder.
### 5. Active docs replace passive docs [#5-active-docs-replace-passive-docs]
A page is not "done" when it is written. Active docs declare ownership, review cadence, freshness status, and source-of-truth boundaries. Pages that age past their review window are visibly flagged and reviewed as part of normal engineering work, not as a cleanup project.
### 6. Automation is the first reviewer [#6-automation-is-the-first-reviewer]
Automated checks enforce the cheap, high-signal rules: required frontmatter, broken internal links, stale review dates, invalid skill-to-doc references, stale generated corpora, and known version mismatches. Humans review accuracy, judgment, and usefulness. Automation handles the facts it can verify without fatigue.
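Two of these checks, required frontmatter and stale review dates, can be sketched in a few lines. The frontmatter keys below (`title`, `owner`, `last_reviewed`, `review_window_days`) are assumptions for illustration; the real schema may differ.

```python
from datetime import date, timedelta

# Hypothetical required keys; the real frontmatter schema may differ.
REQUIRED_KEYS = ("title", "owner", "last_reviewed", "review_window_days")


def check_page(frontmatter: dict, today: date) -> list:
    """Return human-readable problems; an empty list means the page passes."""
    problems = [f"missing frontmatter key: {key}" for key in REQUIRED_KEYS if key not in frontmatter]
    if problems:
        return problems  # the freshness check needs the keys above
    due = frontmatter["last_reviewed"] + timedelta(days=frontmatter["review_window_days"])
    if due < today:
        problems.append(f"review overdue since {due.isoformat()}")
    return problems


page = {
    "title": "Code Craft",
    "owner": "platform",
    "last_reviewed": date(2026, 1, 10),
    "review_window_days": 180,
}
print(check_page(page, today=date(2026, 6, 1)))  # []
print(check_page(page, today=date(2026, 8, 1)))  # ['review overdue since 2026-07-09']
```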
### 7. Prefer generated reference over prose [#7-prefer-generated-reference-over-prose]
API specs, event contracts, database schemas, CLI command tables, and error catalogues have machine-readable sources. We render them from those sources instead of hand-writing reference pages. Hand-written reference material drifts; generated reference material can be rebuilt and checked.
### 8. Decisions are append-only [#8-decisions-are-append-only]
Hard-to-reverse decisions live in ADRs. Accepted ADRs are not edited to match current preference; they are superseded. Each ADR carries enough consequence and debt context for a future reader to understand why the decision existed, what it cost, and when to revisit it.
### 9. Metadata interoperability matters [#9-metadata-interoperability-matters]
Formal documentation standards are useful when they sharpen interoperability discipline. ISO/PAS 25955:2026 is a Publicly Available Specification for Data Documentation Initiative interoperability, not a generic agent-documentation linking standard. The lesson we apply is precise metadata, stable identifiers, and explicit relationships between documentation objects. For agent discovery specifically, Wordloop uses `llms.txt`, Markdown exports, MCP resources, and HTTP `Link` headers.
### 10. Drift is corrected at the source [#10-drift-is-corrected-at-the-source]
When code, docs, skills, specs, and design records disagree, we identify the source of truth before editing. Code and generated contracts win for shipped runtime behaviour. ADRs win for historical decisions. Active design docs win for current delivery intent until the shipped system proves otherwise. Skills win for agent execution behaviour only.
## Freshness model [#freshness-model]
| Surface | Review window | Freshness rule |
| ----------------------- | -----------------------------------: | ----------------------------------------------------------------------------------- |
| Principles | 6 months | Review when operating model or engineering policy changes. |
| Service handbooks | 3 months | Review when code structure, stack versions, commands, or service boundaries change. |
| API and event reference | Every contract change | Generated from OpenAPI and AsyncAPI sources. |
| Runbooks | 3 months | Review after incidents, operational changes, or ownership changes. |
| Active bet and TDD docs | Every material implementation change | Keep design intent aligned with delivery reality. |
| Delivered bet docs | Historical | Freeze except for explicit correction notes. |
| ADRs | Historical | Supersede instead of rewriting accepted records. |
| Agent skills | Every skill or mapped docs change | Validate trigger logic, context routing, and verification steps. |
See [Documentation Freshness](/docs/operations/documentation-freshness) for the operational policy.
## How we apply this [#how-we-apply-this]
* [llms.txt](/llms.txt) and [llms-full.txt](/llms-full.txt) are the machine-readable entry points.
* [Agent-Native Systems](/docs/principles/ai-native/agent-native-systems) defines the broader interface discipline for agent consumers.
* [Keep Docs and Skills in Sync](/docs/guides/keep-docs-and-skills-in-sync) defines the change workflow for canonical docs and skill files.
* [Correct Documentation Drift](/docs/guides/correct-documentation-drift) defines the triage workflow when docs, skills, code, specs, and design records disagree.
* [Decisions](/docs/decisions) records architectural decisions with append-only history.
* [Reference](/docs/reference) contains generated and lookup-oriented material.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Skill files as shadow docs.** A skill that duplicates durable engineering policy becomes stale faster than the canonical docs page.
* **Docs pages as prompts.** Documentation should explain systems and decisions; skills should instruct agents how to act.
* **Documentation as an afterthought.** Docs ship with the feature or the feature is incomplete.
* **Manual reference tables.** If a table can be generated from code, contracts, or schemas, generate it.
* **Unowned pages.** A page without owner and review cadence has no maintenance path.
* **Stale diagrams.** A diagram that does not match the system is worse than no diagram because it creates false confidence.
* **Screenshots as reference.** Screenshots are acceptable as evidence in incidents, not as canonical UI or architecture documentation.
* **Marketing-flavoured engineering docs.** Assertions need evidence, examples, or source-of-truth links.
* **Overstated standards claims.** Distinguish formal standards from emerging conventions. Name the standard, its scope, and why it applies.
## Further reading [#further-reading]
* [Diátaxis](https://diataxis.fr) — the structural model for tutorials, how-to guides, reference, and explanation.
* [llms.txt](https://llmstxt.org) — the emerging convention behind our AI-readable documentation index.
* [Model Context Protocol](https://modelcontextprotocol.io) — the protocol we use for structured agent access to docs resources and tools.
* [ISO/PAS 25955:2026](https://www.iso.org/standard/92127.html) — DDI interoperability specification; useful as a metadata-interoperability reference, not as an agent-discovery standard.
* *Docs for Developers*, Bhatti et al. — practical guidance for engineering documentation.
* *Living Documentation*, Cyrille Martraire — using code and automation to reduce documentation drift.
# Foundations (/docs/principles/foundations)
# Foundations [#foundations]
Foundations are the ideas that shape our engineering before any specific stack, service, or feature enters the conversation. They are deliberately stack-agnostic — the same principles should hold whether we are writing Go, Python, or TypeScript, whether the target is a backend API or a frontend surface, whether the change is large or small.
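Four pages live here:

* [Code Craft](/docs/principles/foundations/code-craft)
* [Documentation](/docs/principles/foundations/documentation)
* [Product Engineering](/docs/principles/foundations/product-engineering)
* [Testing](/docs/principles/foundations/testing)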
Read these before reading anything else in the principles hub. They are the filter through which every subsequent decision makes sense.
# Product Engineering (/docs/principles/foundations/product-engineering)
# Product Engineering [#product-engineering]
## TL;DR [#tldr]
We are product engineers before we are coders. Our job is to move user outcomes — not to ship tickets. Work is shaped before it is scheduled, scheduled against a fixed appetite rather than an estimate, and measured by the change it makes in user behaviour rather than the volume of code it produces.
## Why this matters [#why-this-matters]
The dominant failure mode of engineering teams in 2026 is not technical debt — it is building the wrong thing well. Feature factories optimise cycle time and output velocity and end up with a product surface that grows faster than the value it delivers. Product engineering is the discipline of resisting that. It says the unit of work is a user outcome, the unit of planning is an appetite, and the test of a PR is whether a real user can feel it.
## Our principles [#our-principles]
### 1. Outcomes over outputs [#1-outcomes-over-outputs]
An "output" is a feature shipped, a ticket closed, a migration completed. An "outcome" is a change in what a user can do, how quickly they can do it, or how reliably the system supports them. We plan around outcomes and let outputs be whatever shape is required to deliver them. A sprint ending with three closed tickets and no user-visible outcome is a sprint of failed work.
### 2. Shape work before scheduling it [#2-shape-work-before-scheduling-it]
No work enters a sprint without having been *shaped*: the problem stated in user terms, the rough solution sketched, the boundaries drawn to exclude rabbit holes. Shaped work is expensive upfront and cheap downstream. Unshaped work is the single biggest source of mid-sprint drift, scope creep, and late-breaking discovery that the whole approach was wrong.
### 3. Appetite, not estimate [#3-appetite-not-estimate]
We set an *appetite* — "this is worth about two weeks of one engineer's attention" — and then design a solution that fits inside it. If it cannot fit, we either reduce scope or reject the work. This inverts the usual flow: instead of estimating the cost of a fixed solution, we fix the cost and negotiate the solution. It forces the team to ask "what is the cheapest version of this that delivers the outcome?" and it kills the tendency of work to expand to the time available.
### 4. Kill your darlings [#4-kill-your-darlings]
If a feature is not moving an outcome, we remove it. Deletion is the most under-used tool in a product engineer's kit. Every line of code, every page of docs, every dashboard tile, every CLI flag that does not pay for its maintenance cost should be cut. A smaller, sharper product is cheaper to operate and easier for the next engineer to understand.
### 5. Instrument everything you ship [#5-instrument-everything-you-ship]
A feature that is not measured does not exist from a product engineering point of view. We decide the signal *before* we ship — event, dashboard, success criterion — and we check the signal after release. If we cannot measure it, we negotiate the feature until we can.
## How we apply this [#how-we-apply-this]
* [Run Tests](/docs/guides/run-tests) — we test the outcome, not the implementation.
* [Progressive Delivery](/docs/principles/delivery/progressive-delivery) — canaries and flags are the mechanism by which we measure outcomes safely.
* [Observability](/docs/principles/quality/observability) — the signal layer that makes outcome-based engineering possible.
* [Decisions](/docs/decisions) — the record of shaping decisions that cost us real time.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Velocity-as-KPI.** Story points per sprint measure nothing about user outcomes. Optimising for it corrupts the team.
* **Estimate-driven planning.** Estimates anchor on how long the team thinks work will take, not on how much the work is worth. We use appetites instead.
* **"Build it and they will come."** Launching a feature without a measurement plan is a signal that no one owns the outcome.
* **Technical-debt-for-its-own-sake projects.** Refactors without a user-visible payoff are a smell; wrap them inside an outcome that demands them.
## Further reading [#further-reading]
* *Shape Up*, Ryan Singer — the canonical treatment of shaped work and fixed appetites.
* *Inspired*, Marty Cagan — the product-engineering triad and its implications for how teams are built.
* *Escaping the Build Trap*, Melissa Perri — why feature-factory metrics corrupt outcomes.
# Testing (/docs/principles/foundations/testing)
# Testing [#testing]
## TL;DR [#tldr]
Tests are risk-weighted assertions about production behaviour — not boxes ticked for coverage. We favour high-fidelity service tests over solitary unit tests, emulate dependencies rather than mocking them, and treat observability signals as first-class test assertions.
## Why this matters [#why-this-matters]
The dominant failure mode of a test suite in 2026 is not that it is too small — it is that it passes while production breaks. Mocked dependencies drift from their real counterparts, unit tests assert on implementation rather than behaviour, and green CI gives a false sense of security. *Continuous Risk Assurance* is our name for the discipline that replaces "coverage as a target" with "risk as the thing we actually measure."
## Our principles [#our-principles]
### 1. Favour service tests over solitary unit tests [#1-favour-service-tests-over-solitary-unit-tests]
The "sociable" service test is our foundational unit of validation. We test from the API entry point through to real, ephemeral database containers. We reserve solitary unit tests exclusively for complex isolated algorithms (parsers, validators, pure computation). In a service-oriented codebase, the interesting bugs live at the boundaries — HTTP serialisation, SQL query correctness, event emission — and those are exactly what solitary unit tests mock away.
### 2. Emulate, don't mock [#2-emulate-dont-mock]
If a dependency can run in a container — Postgres, Pub/Sub, object storage — we emulate it via Testcontainers or equivalent. In-memory fakes miss critical data-integrity, serialisation, and networking issues. The startup cost is strictly worth the confidence gain; these are precisely the bugs that escape to production when you mock them out. Emulators are reset per test suite to maintain determinism and prevent test pollution.
### 3. Observability is a test surface [#3-observability-is-a-test-surface]
OpenTelemetry instrumentation is a design-time concern, not an afterthought. System tests assert that traces are unbroken end-to-end: a missing span, a lost TraceID, or a broken parent-child relationship is a test failure, not an instrumentation TODO. The boundary between "test" and "monitor" dissolves — both are asking whether the system is behaving as we claim.
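The shape of such an assertion can be sketched without any tracing library. Here plain dicts stand in for exported spans (in a real system test they would come from an in-memory span exporter); the dict keys and span names are assumptions for illustration.

```python
def broken_spans(spans: list) -> list:
    """Return names of spans that break the trace: a foreign trace ID or a missing parent."""
    if not spans:
        return []
    trace_id = spans[0]["trace_id"]
    span_ids = {span["span_id"] for span in spans}
    broken = []
    for span in spans:
        parent = span["parent_id"]
        if span["trace_id"] != trace_id:
            broken.append(span["name"])  # the TraceID was lost across a boundary
        elif parent is not None and parent not in span_ids:
            broken.append(span["name"])  # the parent span was never recorded
    return broken


# Hypothetical spans collected during one request.
spans = [
    {"name": "POST /loops", "trace_id": "t1", "span_id": "a", "parent_id": None},
    {"name": "db.insert", "trace_id": "t1", "span_id": "b", "parent_id": "a"},
    {"name": "ml.transcribe", "trace_id": "t1", "span_id": "c", "parent_id": "zz"},
]
print(broken_spans(spans))  # ['ml.transcribe']
```

A system test would then assert that `broken_spans(...)` is empty, which turns a lost context propagation into a red build.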
### 4. Name tests by behaviour, not implementation [#4-name-tests-by-behaviour-not-implementation]
Every test follows a BDD-style name: `[Function] should [expected outcome] when [condition]`. This ensures the test log alone tells the story: an on-call engineer reading a failure can form a hypothesis without opening the test code. Names like `TestCreateLoop_Success` are banned — they convey nothing beyond what already appears on the dashboard.
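For illustration, here is how the naming pattern might look in a Python test file, where test identifiers use snake_case. The `create_loop` function is a hypothetical stand-in for real code under test.

```python
import unittest


def create_loop(existing_titles: set, title: str) -> str:
    """Hypothetical function under test: rejects duplicate titles."""
    if title in existing_titles:
        return "conflict"
    return "created"


class CreateLoopTests(unittest.TestCase):
    # Pattern: [function] should [expected outcome] when [condition]
    def test_create_loop_should_return_conflict_when_title_already_exists(self):
        self.assertEqual(create_loop({"Weekly sync"}, "Weekly sync"), "conflict")

    def test_create_loop_should_return_created_when_title_is_new(self):
        self.assertEqual(create_loop({"Weekly sync"}, "Retro"), "created")
```

Running the file with `python -m unittest -v` prints each test name, so a failure line already states the function, the expected outcome, and the condition.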
### 5. Risk-based depth, not blanket coverage [#5-risk-based-depth-not-blanket-coverage]
Coverage percentages are meaningless without proof that the assertions catch real faults. We score modules using a risk matrix — Impact × Complexity × Change-frequency — before deciding on test depth. High-risk modules earn live system tests and chaos experiments; low-risk modules need only small tests and static analysis. Equal test depth everywhere is wasted effort.
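The risk matrix can be sketched as a small scoring function. The 1 to 3 rating scale, the cut-offs, and the middle tier are assumptions made for this example; only the three factors and the high and low tiers come from the text above.

```python
def risk_score(impact: int, complexity: int, change_frequency: int) -> int:
    """Multiply the three factors, each rated 1 (low) to 3 (high)."""
    for value in (impact, complexity, change_frequency):
        if value not in (1, 2, 3):
            raise ValueError("each factor must be 1, 2, or 3")
    return impact * complexity * change_frequency


def depth_for(score: int) -> str:
    """Map a score to a test-depth tier; the cut-offs are placeholders."""
    if score >= 18:
        return "live system tests + chaos experiments"
    if score >= 6:
        return "service tests"  # assumed middle tier
    return "small tests + static analysis"


print(depth_for(risk_score(impact=3, complexity=3, change_frequency=2)))  # live system tests + chaos experiments
print(depth_for(risk_score(impact=1, complexity=2, change_frequency=1)))  # small tests + static analysis
```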
### 6. Tests are part of the change, not after it [#6-tests-are-part-of-the-change-not-after-it]
A PR without tests is incomplete. A test added in a follow-up PR is a test that will never be written. We write tests alongside the code they verify, and we review the test with the same rigour as the code. If a change resists testing, that is a signal about the design of the code, not the design of the test.
## How we apply this [#how-we-apply-this]
* [Run Tests](/docs/guides/run-tests) — how to invoke the suites locally and in CI.
* [Observability](/docs/principles/quality/observability) — the OTel-first stance that makes traces-as-assertions possible.
* [Reliability](/docs/principles/quality/reliability) — how tests compose with chaos and load experiments.
* [Hexagonal Architecture](/docs/principles/system-design/hexagonal-architecture) — the structural choice that makes tests cheap to write and fast to run.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Mocking the database.** A test that mocks the database is a test that asserts against your SQL-writing skill, not against database behaviour. Use an ephemeral container.
* **Snapshot tests as a default.** Snapshots are a brittle, noisy substitute for behavioural assertions. They are acceptable only when the thing being snapshotted is a genuinely opaque artefact (a rendered email, a serialised response).
* **Coverage-gated CI.** "95% line coverage required" is a metric that can be gamed without improving real risk reduction. Use it as a read-out, never as a gate.
* **Shared staging environments as the integration test.** Staging has no hermetic guarantees, no reproducibility, and no determinism. It is a deployment target; it is not a test bed.
* **"It's hard to test, so we didn't."** That is a signal the code is badly designed. Fix the code.
## Further reading [#further-reading]
* *Accelerate*, Forsgren, Humble, Kim — the empirical case for continuous delivery and its testing discipline.
* *Working Effectively with Legacy Code*, Michael Feathers — seams, test doubles, and when each is appropriate.
* *Growing Object-Oriented Software, Guided by Tests*, Freeman & Pryce — the canonical treatment of outside-in service testing.
* *xUnit Test Patterns*, Gerard Meszaros — the vocabulary we use for test doubles, fixtures, and strategies.
# Accessibility (/docs/principles/quality/accessibility)
# Accessibility [#accessibility]
## TL;DR [#tldr]
Every user interface we ship meets WCAG 2.2 AA as a baseline. Keyboard, screen reader, and visual assistive technology are first-class targets, not after-launch polish. A feature that does not work for a keyboard user or a screen-reader user is not finished.
## Why this matters [#why-this-matters]
Accessibility is not a niche concern — a significant fraction of our users rely on assistive technology at some point. Beyond the moral case (equal access is a baseline), the design constraints that accessibility imposes — clear hierarchy, visible focus, semantic structure, predictable navigation — tend to produce better software for *every* user. An accessible interface is almost always also a clearer, calmer interface.
## Our principles [#our-principles]
### 1. WCAG 2.2 AA is the floor, not the ceiling [#1-wcag-22-aa-is-the-floor-not-the-ceiling]
We conform to WCAG 2.2 AA for every page, every component, every release. AA is the baseline, and we aim for AAA on critical journeys where the cost is bearable. Falling below AA is a bug; it is not a trade-off we make.
### 2. Keyboard first [#2-keyboard-first]
Every interactive element is reachable and usable with the keyboard. Tab order is logical, focus is always visible, and there are no keyboard traps. The design test is simple: can a power user — or a user who cannot use a mouse — complete every journey without touching the pointer?
### 3. Screen readers see what sighted users see [#3-screen-readers-see-what-sighted-users-see]
Semantic HTML first; ARIA only when HTML is not expressive enough. Headings form an outline, landmarks mark regions, form fields carry labels, images carry alt text, live regions announce updates. A screen reader should produce a narrative that matches what a sighted user sees — not a richer or poorer version of it.
### 4. Colour is never the only signal [#4-colour-is-never-the-only-signal]
A red error, a green success, a blue link — each one is accompanied by a label, an icon, or a structural cue. Colour-blind users exist; colour-only signalling is an exclusion.
### 5. Motion is optional [#5-motion-is-optional]
Animations respect `prefers-reduced-motion`. Large-scale parallax and aggressive transitions are used sparingly; for users with vestibular conditions, unrequested motion is not decoration, it is an accessibility failure.
### 6. Live regions are used sparingly and correctly [#6-live-regions-are-used-sparingly-and-correctly]
Real-time updates — transcription chunks appearing, participants joining — are announced via `aria-live` when they matter to the user's understanding. But over-announcement is as bad as under-announcement; noisy announcements make screen readers ignore the ones that matter.
### 7. Testing is multi-layered [#7-testing-is-multi-layered]
We run automated accessibility checks in CI (axe, Lighthouse accessibility audits), keyboard-walk every new journey manually, and run screen-reader walkthroughs on major features. Automated testing catches the common failures; humans catch the semantic ones.
### 8. Accessibility is reviewed like code [#8-accessibility-is-reviewed-like-code]
Accessibility issues are tracked, owned, and closed the same way any other bug is. The backlog does not accumulate "we will get to the a11y later" — that queue grows forever. Every PR author is expected to include the accessibility check in their definition-of-done.
## How we apply this [#how-we-apply-this]
* [Frontend](/docs/principles/stack/frontend) — the component-library patterns that make accessibility default.
* [App Service Handbook](/docs/learn/services/app) — the wordloop-app architectural view.
* [DevEx](/docs/principles/delivery/devex) — the CI gates that block accessibility regressions.
* [Performance](/docs/principles/quality/performance) — related budgets that compound with accessibility.
## Anti-patterns we reject [#anti-patterns-we-reject]
* **Placeholder text as label.** The placeholder disappears when the field is filled; the label is gone. Users who come back to check the field see nothing. Use a visible label.
* **`<div>` as button.** A `div` with an `onClick` is invisible to keyboard, screen reader, and user agent. Use `