ML Implementation Guide

Concrete Python patterns for trace-first logging, clean architecture, and error handling.

This guide translates Wordloop's overarching Engineering Principles into explicit, copy-pasteable Python code for the wordloop-ml service.

1. Concrete Trace-First Development

We rely on OpenTelemetry for all observability. Python does not always carry trace context into background tasks, and never carries it across process boundaries on its own, so we must explicitly inject and extract the W3C Trace Context and Baggage headers.

Initializing a Span

A new operation must start a span. When FastAPI is instrumented with OpenTelemetry, request spans are created automatically, but for background pipeline tasks you must start the span explicitly.

from opentelemetry import trace
 
tracer = trace.get_tracer(__name__)
 
def process_audio(meeting_id: str) -> None:
    # 1. Start the span
    with tracer.start_as_current_span("ProcessAudio") as span:
        # 2. Enrich the span with concrete attributes
        span.set_attribute("meeting.id", meeting_id)
        
        # ... processing logic

Passing Context

When publishing to Pub/Sub or calling another service, you must explicitly inject the current trace context into the HTTP headers or message attributes.

2. Concrete Error Handling

We use explicit Python Exception subclasses defined in our pure Domain to prevent external SDK errors from polluting our business logic.

Defining Sentinels

Define business rule errors in src/wordloop/core/exceptions.py. They should inherit from a base WordLoopError:

class WordLoopError(Exception):
    """Base exception for all Wordloop errors."""
    def __init__(self, message: str, cause: Exception | None = None):
        super().__init__(message)
        self.cause = cause
 
class ModelInferenceError(WordLoopError):
    """Raised when an AI model fails to return a valid response."""

Wrapping & Mapping Errors in Providers

An Adapter (Provider) interacting with the AssemblyAI SDK or OpenAI SDK must catch the library-specific error and raise a pure Domain exception.

import assemblyai as aai
from wordloop.core.exceptions import ModelInferenceError
 
class AssemblyAIProvider:
    def transcribe(self, url: str) -> str:
        try:
            transcript = aai.Transcriber().transcribe(url)
            if transcript.error:
                # The SDK reported a failed transcript without raising.
                raise ModelInferenceError(f"AssemblyAI failed: {transcript.error}")
            return transcript.text
        except aai.errors.AssemblyAIError as e:
            # Map infrastructure error to Domain concept.
            # `from e` keeps the original traceback attached as __cause__.
            raise ModelInferenceError("AssemblyAI SDK crashed", cause=e) from e

3. Concrete Dependency Injection

We use Python's Protocol from the typing module to define Interfaces (Ports).

The Port (Defined by the Core)

The protocol belongs in src/wordloop/core/gateways/ and strictly uses Domain language, completely ignorant of AssemblyAI or Postgres.

from typing import Protocol
from wordloop.core.domain.models import TranscriptionResult
 
class TranscriptionProvider(Protocol):
    def transcribe(self, audio_uri: str) -> TranscriptionResult:
        ...

The Wiring (Entrypoint)

We use constructor injection: each service receives its dependencies as constructor arguments. FastAPI's Depends system resolves them automatically during the request lifecycle.

from fastapi import Depends
from wordloop.core.services import AudioService
from wordloop.providers.assembly import AssemblyAIProvider
 
# Dependency Injection function
def get_audio_service(
    provider: AssemblyAIProvider = Depends()
) -> AudioService:
    # The AudioService requires a TranscriptionProvider protocol!
    return AudioService(provider=provider)
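
One payoff of constructor injection is that a unit test can hand the service a fake provider with no FastAPI or network involved. The sketch below is self-contained and makes assumptions: the `TranscriptionResult` field, the `AudioService.count_words` method, and `FakeProvider` are hypothetical and only illustrate the constructor shape.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class TranscriptionResult:
    # Hypothetical shape; the real model lives in wordloop.core.domain.models.
    text: str


class TranscriptionProvider(Protocol):
    def transcribe(self, audio_uri: str) -> TranscriptionResult:
        ...


class AudioService:
    # Hypothetical minimal service; only the constructor shape matters here.
    def __init__(self, provider: TranscriptionProvider) -> None:
        self._provider = provider

    def count_words(self, audio_uri: str) -> int:
        return len(self._provider.transcribe(audio_uri).text.split())


class FakeProvider:
    """Satisfies TranscriptionProvider structurally; no inheritance needed."""

    def transcribe(self, audio_uri: str) -> TranscriptionResult:
        return TranscriptionResult(text="hello from a fake transcript")


# The service never learns which concrete provider it was given.
service = AudioService(provider=FakeProvider())
word_count = service.count_words("gs://example-bucket/meeting.wav")
```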

4. Idiomatic Python & Standards

We do not aim to rewrite foundational guidance on writing excellent Python code. Instead, we adhere to established industry baselines and map them to our internal engineering principles.

We expect all Wordloop ML engineers to understand:

Below is concrete guidance on how overarching Python idioms manifest as system-enforced architectural invariants.

Strict Typing over Duck Typing (Clean Architecture)

The Python Idiom: Using strong static typing (mypy) instead of traditional dynamic duck-typing.
The Principle Connection: Clean Architecture (Ports and Adapters) relies on explicit Contracts/Ports across boundaries. We enforce typing.Protocol and strict type hints on all domain models so that dependency inversion can be verified by static analysis (mypy) before the code runs, rather than discovered at runtime.

from typing import Protocol
from dataclasses import dataclass
 
@dataclass(frozen=True)
class TranscriptionRequest:
    audio_url: str
    target_language: str
 
# 1. We use a strictly typed Protocol instead of relying on duck-typed methods.
class TranscriptionProvider(Protocol):
    def transcribe(self, request: TranscriptionRequest) -> str:
        ...
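
A short usage sketch of these two types, assuming a hypothetical `EchoProvider` adapter and `run` helper. It shows a class matching the Protocol by shape alone, and the runtime effect of `frozen=True` when code tries to mutate a request.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Protocol


@dataclass(frozen=True)
class TranscriptionRequest:
    audio_url: str
    target_language: str


class TranscriptionProvider(Protocol):
    def transcribe(self, request: TranscriptionRequest) -> str:
        ...


class EchoProvider:
    """Hypothetical adapter that matches the Protocol by shape alone."""

    def transcribe(self, request: TranscriptionRequest) -> str:
        return f"[{request.target_language}] {request.audio_url}"


def run(provider: TranscriptionProvider, request: TranscriptionRequest) -> str:
    # A type checker rejects any `provider` whose transcribe() signature differs.
    return provider.transcribe(request)


request = TranscriptionRequest(audio_url="gs://example-bucket/a.wav", target_language="en")
output = run(EchoProvider(), request)

try:
    request.target_language = "de"  # type: ignore[misc]
    mutated = True
except FrozenInstanceError:
    # frozen=True turns accidental mutation into an immediate error.
    mutated = False
```

Note that the Protocol check itself happens only in the type checker; at runtime Python would still call any object that has a `transcribe` method.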

Context Managers for Resource Leaks (Resilience)

The Python Idiom: Using with and @contextmanager for resource lifecycle management.
The Principle Connection: We practice robust Error Handling & Resilience. If an ML SDK or file stream raises an exception and the code skips cleanup, memory, file handles, or connections leak; in a long-running worker those leaks accumulate until the process or node runs out of resources.

Always utilize Context Managers when handling stateful resources. This guarantees the __exit__ cleanup executes even if your domain logic crashes.

import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager

@contextmanager
def temporary_audio_file(audio_bytes: bytes) -> Iterator[str]:
    """Context manager to ensure ephemeral files are always deleted after processing."""
    # mkstemp creates the file atomically; tempfile.mktemp is deprecated and race-prone.
    fd, temp_path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        yield temp_path
    finally:
        # This cleanup is guaranteed to run, preventing disk exhaustion.
        if os.path.exists(temp_path):
            os.remove(temp_path)
 
def process():
    # The file is deleted the moment the block exits or raises.
    with temporary_audio_file(b"...") as path:
        result = run_inference(path)  # run_inference stands in for the model call
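
To see the guarantee in action, the sketch below forces a failure inside the block and checks that the file is gone afterwards. It is self-contained, creates the file with `tempfile.mkstemp`, and uses a hypothetical `failing_inference` function in place of a real model call.

```python
import os
import tempfile
from collections.abc import Iterator
from contextlib import contextmanager


@contextmanager
def temporary_audio_file(audio_bytes: bytes) -> Iterator[str]:
    fd, temp_path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(audio_bytes)
        yield temp_path
    finally:
        # Runs on normal exit and when the with-block raises.
        if os.path.exists(temp_path):
            os.remove(temp_path)


def failing_inference(path: str) -> str:
    # Hypothetical model call that always fails.
    raise RuntimeError(f"model crashed on {path}")


seen_path = ""
existed_inside = False
try:
    with temporary_audio_file(b"RIFF") as path:
        seen_path = path
        existed_inside = os.path.exists(path)
        failing_inference(path)
except RuntimeError:
    # The exception still propagates; only the cleanup is guaranteed.
    pass

exists_after = os.path.exists(seen_path)
```

The context manager does not swallow the error. The caller still has to decide how to handle it, typically by mapping it to a Domain exception.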