Key takeaways
- Define the runtime contract before binding the application to one model provider or agent framework.
- Separate product workflow from model execution, context providers, tools, policy, memory, and telemetry.
- Tool calls and memory writes are side effects and require schemas, authority, idempotency, and audit.
- Use stable error categories and deterministic fallbacks rather than provider-specific exceptions.
- Build failure injection, replay fixtures, evaluation gates, and production SLOs into the implementation.
- Long-running tasks require durable checkpoints and resumable state outside the serving process.
Runtime boundary
A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.
Receives
Product request, identity/tenant, runtime contract, adapters, configuration, policies, tools, context providers, and budgets.
Owns
Application-facing contract, component interfaces, orchestration, validation, error taxonomy, telemetry, test seams, and deployment readiness.
Emits
Structured response, trace, evidence, tool results, policy decisions, memory changes, errors, and durable checkpoints.
Does not own
Unbounded access to business systems or provider-specific behavior hidden behind a generic interface.
Failure modes
Contract drift, leaky adapter, ambiguous side effect, duplicate tool execution, context leakage, provider lock-in, missing trace, and untestable fallback.
Evidence and metrics
Contract-valid requests/responses, route/fallback, tool success, duplicate prevention, policy coverage, trace completeness, task success, latency, and cost.
Version the runtime contract
The contract describes identity, task, risk, permissions, context policy, route constraints, tools, memory, output, trace, deadline, and budget.
Implementation
Validate at the boundary, reject unknown or incompatible versions, and prefer additive evolution where possible.
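A minimal sketch of boundary validation, assuming a dotted `major.minor` version string and a hypothetical field set (`tenant_id`, `task`, `deadline_ms`, `budget_tokens`, plus a few optional fields). None of these names come from a real schema; they only illustrate rejecting incompatible versions while tolerating additive change. Unrecognized fields are dropped and reported rather than forwarded.

```python
from dataclasses import dataclass

# Hypothetical policy: this runtime understands major version 1 of the contract.
SUPPORTED_MAJOR = 1
REQUIRED_FIELDS = {"contract_version", "tenant_id", "task", "deadline_ms", "budget_tokens"}
# Optional fields, assumed to have been added additively in later minor versions.
OPTIONAL_FIELDS = {"risk", "tools", "memory_scope"}


class ContractError(ValueError):
    """Validation failure with a stable, machine-readable code."""

    def __init__(self, code, detail):
        super().__init__(f"{code}: {detail}")
        self.code = code


@dataclass(frozen=True)
class RunRequest:
    contract_version: str
    tenant_id: str
    task: str
    deadline_ms: int
    budget_tokens: int
    options: dict
    ignored: tuple  # field names that were dropped, kept for telemetry


def validate_request(raw: dict) -> RunRequest:
    missing = REQUIRED_FIELDS - set(raw)
    if missing:
        raise ContractError("missing_field", ", ".join(sorted(missing)))
    version = str(raw["contract_version"])
    major = version.split(".")[0]
    if not major.isdigit() or int(major) != SUPPORTED_MAJOR:
        raise ContractError("unsupported_version", version)
    options = {k: raw[k] for k in sorted(OPTIONAL_FIELDS & set(raw))}
    # Unknown fields are never forwarded to adapters; they are only reported.
    ignored = tuple(sorted(set(raw) - REQUIRED_FIELDS - OPTIONAL_FIELDS))
    return RunRequest(version, raw["tenant_id"], raw["task"],
                      raw["deadline_ms"], raw["budget_tokens"], options, ignored)
```

Dropping unknown fields instead of rejecting them is one possible choice: it lets a newer minor-version client talk to an older runtime, at the cost of silently ignoring input. A stricter deployment could reject them instead.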
Operational implications
Do not pass arbitrary provider request bodies through the product boundary.
Measure
Contract version, validation failure, deprecated field use, and client compatibility.
Separate workflow and execution
Product workflow owns user/business state while runtime execution owns governed model, context, tool, memory, and trace behavior.
Implementation
Use ports/adapters so product code depends on stable interfaces rather than provider SDKs.
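The ports/adapters split can be sketched with a structural interface. Everything here is illustrative: `ModelPort`, `EchoAdapter`, and `summarize_ticket` are hypothetical names, and the adapter is a stand-in that echoes words rather than calling any real provider SDK.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ModelResult:
    text: str
    input_tokens: int
    output_tokens: int


class ModelPort(Protocol):
    """Port the product workflow depends on; no provider SDK types appear here."""

    def complete(self, prompt: str, max_tokens: int) -> ModelResult: ...


class EchoAdapter:
    """Stand-in adapter. A real adapter would wrap a provider SDK behind this method."""

    def complete(self, prompt: str, max_tokens: int) -> ModelResult:
        words = prompt.split()
        kept = words[:max_tokens]  # treats one word as one token, for illustration only
        return ModelResult(" ".join(kept), len(words), len(kept))


def summarize_ticket(model: ModelPort, ticket_body: str) -> str:
    # Product workflow code: it knows the port, not the provider.
    result = model.complete(f"Summarize: {ticket_body}", max_tokens=4)
    return result.text
```

Because `summarize_ticket` only sees `ModelPort`, swapping providers or injecting a test double does not touch workflow code.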
Operational implications
This keeps provider retries, token streaming, and tool schemas out of core domain records.
Measure
Adapter coverage, provider-specific leakage, change impact, and test isolation.
Context providers
A provider returns bounded content plus provenance, classification, freshness, tenant scope, and retrieval evidence.
Implementation
Use typed queries and approved domain/semantic interfaces; enforce token and data-class budgets.
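As a rough illustration of a typed query with token and data-class budgets, the sketch below filters candidate items and records why each rejected source was excluded. The field names and the reason strings (`tenant_scope`, `data_class`, `token_budget`) are assumptions, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextQuery:
    tenant_id: str
    topic: str
    max_tokens: int
    allowed_classes: frozenset  # for example frozenset({"public", "internal"})


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str      # provenance reference
    tenant_id: str
    data_class: str
    tokens: int


def select_context(query: ContextQuery, candidates: list) -> tuple:
    """Return (selected items, rejected (source, reason) pairs) within the budgets."""
    selected, rejected, used = [], [], 0
    for item in candidates:
        if item.tenant_id != query.tenant_id:
            rejected.append((item.source, "tenant_scope"))
        elif item.data_class not in query.allowed_classes:
            rejected.append((item.source, "data_class"))
        elif used + item.tokens > query.max_tokens:
            rejected.append((item.source, "token_budget"))
        else:
            selected.append(item)
            used += item.tokens
    return selected, rejected
```

Returning the rejected sources alongside the selected ones gives the selected/rejected evidence a trace would need, without exposing the underlying store to the prompt layer.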
Operational implications
Raw database or vector-store access spreads security and business logic across prompts.
Measure
Retrieval latency, selected/rejected sources, tokens, freshness, and citation validity.
Model adapters and routing
A model adapter normalizes differences between providers and local engines; a route policy selects a route based on capability, privacy, cost, region, and latency requirements, and defines the fallback.
Implementation
Keep the policy centralized and return a route summary without hidden reasoning.
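One way to picture a centralized route policy is a single function that filters candidates by hard constraints and returns a summary of the choice. The `Route` and `Requirements` fields, the cheapest-first ranking, and the reason strings below are assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    region: str
    max_data_class: int   # highest data sensitivity level this route may handle
    cost_per_1k: float
    healthy: bool = True


@dataclass(frozen=True)
class Requirements:
    region: str
    data_class: int
    max_cost_per_1k: float


def choose_route(req: Requirements, candidates: list) -> dict:
    """Pick the cheapest compliant healthy route and summarize the decision."""
    compliant = [
        r for r in candidates
        if r.region == req.region
        and r.max_data_class >= req.data_class
        and r.cost_per_1k <= req.max_cost_per_1k
    ]
    if not compliant:
        return {"route": None, "reason": "no_compliant_route", "fallback": False}
    ranked = sorted(compliant, key=lambda r: r.cost_per_1k)
    for index, route in enumerate(ranked):
        if route.healthy:
            reason = "cheapest_compliant" if index == 0 else "primary_unhealthy"
            return {"route": route.name, "reason": reason, "fallback": index > 0}
    return {"route": None, "reason": "no_healthy_route", "fallback": False}
```

The returned dictionary is the route summary: it names the route and a reason code, which is enough to chart route distribution and fallback reasons.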
Operational implications
A provider SDK should not decide tenant policy or fallback implicitly.
Measure
Route distribution, fallback reason, provider errors, latency, quality, and cost.
Typed tools
Tools have versioned input/output schemas, permission, side-effect class, timeout, retry, idempotency, approval, and audit fields.
Implementation
Validate and authorize before invocation; verify the output afterward, and after an ambiguous failure check the authoritative system to learn whether the side effect actually occurred.
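The validate, authorize, then invoke order, together with idempotency and audit, might look like the following broker. It is a simplified sketch: `ToolSpec`, `ToolBroker`, the type-based argument check, and the in-memory result store are all hypothetical, and a real broker would persist idempotency keys durably.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolSpec:
    name: str
    required_args: dict              # argument name -> expected Python type
    permission: str
    handler: Callable[[dict], dict]


class ToolBroker:
    def __init__(self):
        self._tools = {}
        self._results = {}           # idempotency key -> stored result
        self.audit = []              # (tool, status, reason) tuples

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def invoke(self, name: str, args: dict, permissions: set, idempotency_key: str) -> dict:
        spec = self._tools.get(name)
        if spec is None:
            return self._record(name, "validation", "unknown_tool")
        for arg, expected in spec.required_args.items():
            if not isinstance(args.get(arg), expected):
                return self._record(name, "validation", f"bad_arg:{arg}")
        # Authorization comes from the caller's permissions, never from the tool description.
        if spec.permission not in permissions:
            return self._record(name, "auth", "missing_permission")
        if idempotency_key in self._results:
            self._record(name, "ok", "deduplicated")
            return self._results[idempotency_key]
        result = {"status": "ok", "output": spec.handler(args)}
        self._results[idempotency_key] = result
        self._record(name, "ok", "executed")
        return result

    def _record(self, name: str, status: str, reason: str) -> dict:
        self.audit.append((name, status, reason))
        return {"status": status, "reason": reason}
```

A repeated call with the same idempotency key returns the stored result instead of running the handler again, which is the duplicate-prevention behavior a retry after a timeout relies on.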
Operational implications
Tool descriptions help selection but do not establish permission.
Measure
Validation, auth/approval, duration, retry, idempotency, result validity, and side effects.
Explicit memory
Memory writes are structured proposals with scope, provenance, owner, confidence, expiry, and deletion policy.
Implementation
Separate working/session/long-term memory from systems of record; require review for durable or shared writes.
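A memory write expressed as a structured proposal could be reviewed along these lines. The scope names, the `shared` flag, and the decision strings are assumptions for illustration; the point is only that durable or shared writes are routed to review instead of being stored directly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryProposal:
    scope: str          # "working", "session", or "long_term"
    owner: str
    content: str
    provenance: str     # reference to where the claim came from
    confidence: float
    ttl: timedelta
    shared: bool = False


def review_proposal(proposal: MemoryProposal, now: datetime) -> dict:
    """Classify a proposed write as reject, needs_review, or accept."""
    if proposal.scope not in ("working", "session", "long_term"):
        return {"decision": "reject", "reason": "unknown_scope"}
    if not proposal.provenance:
        return {"decision": "reject", "reason": "missing_provenance"}
    if not 0.0 <= proposal.confidence <= 1.0:
        return {"decision": "reject", "reason": "bad_confidence"}
    expires_at = now + proposal.ttl
    # Durable or shared writes are never auto-accepted in this sketch.
    if proposal.scope == "long_term" or proposal.shared:
        return {"decision": "needs_review", "expires_at": expires_at}
    return {"decision": "accept", "expires_at": expires_at}
```

Every accepted or pending write carries an expiry computed at proposal time, so deletion policy does not depend on remembering to clean up later.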
Operational implications
Never store arbitrary model output or hidden chain-of-thought as durable memory.
Measure
Read/write by scope, approval, expiry, conflicts, deletion, and poisoning alerts.
Policy checkpoints
Policy runs at the boundary, on context access, at model routing, on tool proposals, on memory writes, at output release, and before any high-impact action.
Implementation
Use a policy decision point plus enforcement points; record decision ID, rule version, inputs by reference, effect, and reason code.
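The decision point and enforcement point split can be sketched as below. The two rules, the `rules-v3` version label, and the attribute names are invented for the example; inputs are recorded by reference as a short hash rather than stored in the decision log.

```python
import hashlib
import itertools
import json
from dataclasses import dataclass

RULE_VERSION = "rules-v3"  # hypothetical rule set label


@dataclass(frozen=True)
class Decision:
    decision_id: str
    rule_version: str
    input_ref: str      # hash of the inputs, not the inputs themselves
    effect: str         # "allow", "deny", or "challenge"
    reason_code: str


class PolicyDenied(Exception):
    def __init__(self, decision: Decision):
        super().__init__(decision.reason_code)
        self.decision = decision


class PolicyDecisionPoint:
    def __init__(self):
        self._ids = itertools.count(1)
        self.log = []

    def decide(self, checkpoint: str, attrs: dict) -> Decision:
        payload = json.dumps({"checkpoint": checkpoint, "attrs": attrs}, sort_keys=True)
        input_ref = hashlib.sha256(payload.encode()).hexdigest()[:12]
        if attrs.get("tenant") != attrs.get("resource_tenant"):
            effect, reason = "deny", "tenant_mismatch"
        elif checkpoint == "tool_proposal" and attrs.get("side_effect") == "write" \
                and not attrs.get("approved"):
            effect, reason = "challenge", "write_requires_approval"
        else:
            effect, reason = "allow", "rules_satisfied"
        decision = Decision(f"dec-{next(self._ids)}", RULE_VERSION, input_ref, effect, reason)
        self.log.append(decision)
        return decision


def enforce(pdp: PolicyDecisionPoint, checkpoint: str, attrs: dict, action):
    """Enforcement point: the action runs only on an explicit allow."""
    decision = pdp.decide(checkpoint, attrs)
    if decision.effect != "allow":
        raise PolicyDenied(decision)
    return action()
```

Here a `challenge` blocks the action just like a `deny`; a fuller runtime would route it to an approval flow and retry once approval is recorded.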
Operational implications
Policy text in a prompt is advisory, not enforcement.
Measure
Decisions, denies/challenges, latency, stale policy, and enforcement coverage.
Streaming and asynchronous work
The runtime emits accepted/progress/token/tool/approval/completed/failed events while durable tasks persist state outside a connection.
Implementation
Define ordered, versioned event schemas, and specify cancellation, reconnection, backpressure, and a replay cursor.
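Ordered events with a replay cursor can be illustrated with a per-task log and a client that resumes from the last sequence number it saw. `EventLog` is an in-memory stand-in for durable storage, and the event types are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int              # strictly increasing per task, starting at 1
    schema_version: int
    type: str
    payload: dict


class EventLog:
    """In-memory stand-in for a durable per-task event log."""

    def __init__(self):
        self._events = []

    def append(self, type: str, payload: dict) -> Event:
        event = Event(len(self._events) + 1, 1, type, payload)
        self._events.append(event)
        return event

    def replay(self, after_seq: int = 0) -> list:
        """Return every event after the client's cursor, in order."""
        return [e for e in self._events if e.seq > after_seq]


def consume(events: list, cursor: int) -> int:
    """Advance the cursor over events, failing loudly on a gap or reordering."""
    for event in events:
        if event.seq != cursor + 1:
            raise ValueError(f"expected seq {cursor + 1}, got {event.seq}")
        cursor = event.seq
    return cursor
```

A client that disconnects after sequence 2 reconnects with `replay(after_seq=2)` and receives only what it missed, so the connection itself carries no durable state.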
Operational implications
Do not keep an HTTP request open as the only durable state mechanism.
Measure
Event order/gaps, reconnect, cancellation, time to first event, and task completion.
Errors, retries, and idempotency
Stable categories distinguish validation, auth, policy, capacity, transient dependency, model, tool, timeout, cancellation, and internal failure.
Implementation
Centralize retry policy, use exponential backoff with jitter, honor deadlines, cap attempts, and attach idempotency keys to side effects.
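A centralized retry helper along these lines is one way to combine the pieces. It is a sketch, assuming two hypothetical exception classes for the transient and permanent categories, a "full jitter" delay drawn from zero up to the capped exponential bound, and a deadline expressed as a `time.monotonic()` timestamp. The `sleep`, `clock`, and `rng` parameters exist so tests can control time and randomness.

```python
import random
import time


class TransientError(Exception):
    """Classified as safe to retry (for example a dependency timeout)."""


class PermanentError(Exception):
    """Classified as not retryable (for example validation or policy failure)."""


def call_with_retry(op, *, max_attempts=4, base_delay=0.1, max_delay=2.0,
                    deadline=None, sleep=time.sleep, clock=time.monotonic,
                    rng=random.random):
    """Run op, retrying only TransientError with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted
            bound = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = rng() * bound  # full jitter: uniform in [0, bound)
            if deadline is not None and clock() + delay >= deadline:
                raise  # waiting would overrun the caller's deadline
            sleep(delay)
        # Any other exception, including PermanentError, propagates immediately.
```

For operations with side effects, `op` should carry an idempotency key so that a retried call cannot apply the effect twice.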
Operational implications
Retry only work classified as transient; after an ambiguous tool timeout, query the authoritative state before deciding whether to retry.
Measure
Error class, attempt, retry success, duplicate prevention, deadline, and compensation.
Testing and operations
Use contract tests, adapter fixtures, golden traces, provider doubles, evaluation datasets, failure injection, load tests, and production runbooks.
Implementation
Gate release on schema compatibility, quality, security, SLO, recovery, and rollback evidence.
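Failure injection with a provider double can be as small as the sketch below. `ProviderDouble` and `run_with_fallback` are hypothetical names: the double follows a scripted list of outcomes, and the function under test falls back deterministically from a primary to a secondary provider while recording a trace.

```python
class ProviderDouble:
    """Test double that follows a script of outcomes instead of calling a real provider."""

    def __init__(self, script):
        self._script = list(script)   # outcomes per call, for example ["timeout", "ok"]
        self.calls = 0

    def complete(self, prompt: str) -> str:
        self.calls += 1
        outcome = self._script.pop(0) if self._script else "ok"
        if outcome == "timeout":
            raise TimeoutError("injected timeout")
        return f"answer:{prompt}"


def run_with_fallback(primary, secondary, prompt: str) -> dict:
    """Try primary, then secondary, recording each attempt in the trace."""
    trace = []
    for name, provider in (("primary", primary), ("secondary", secondary)):
        try:
            text = provider.complete(prompt)
        except TimeoutError:
            trace.append((name, "timeout"))
            continue
        trace.append((name, "ok"))
        return {"text": text, "route": name, "trace": trace, "error": None}
    return {"text": None, "route": None, "trace": trace, "error": "timeout"}
```

A test can then assert on the route and the trace, for example that an injected primary timeout yields `route == "secondary"` and a two-entry trace. This checks runtime behavior rather than prompt text.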
Operational implications
Unit tests of prompt text do not prove runtime behavior.
Measure
Test coverage, evaluation pass rate, injected-failure recovery, goodput, trace completeness, and rollback time.
Reference tables
| Interface | Input | Output | Failure classes |
|---|---|---|---|
| Runtime boundary | Versioned request envelope | Accepted/rejected run | Validation/auth/policy |
| Context provider | Typed query and policy | Content plus provenance | Denied/not found/stale/dependency |
| Model adapter | Normalized model request | Events/result/usage | Capacity/provider/model/timeout |
| Route policy | Requirements and candidates | Selected route/fallback | No compliant route |
| Tool broker | Authorized tool invocation | Validated tool result | Validation/auth/approval/tool |
| Memory manager | Explicit read/write command | Versioned memory result | Conflict/denied/retention |
| Policy service | Decision input refs | Allow/deny/challenge | Unavailable/invalid policy |
| Trace sink | Structured event/span | Export acknowledgement | Dropped/backpressure |
Decision checklist
- What is the smallest stable product-facing runtime contract?
- Which interfaces isolate providers, context, tools, memory, policy, and traces?
- Where are identity and authority verified?
- Which operations are side effects and how are they idempotent?
- How are long-running tasks checkpointed and resumed?
- What stable error categories and fallback rules exist?
- Which evaluations and failure tests block deployment?
- What SLOs, budgets, and runbooks govern production?
Common mistakes
- Passing provider SDK request objects through the application.
- Letting model output invoke tools without deterministic controls.
- Using raw database queries as context contracts.
- Mixing systems of record with conversational memory.
- Retrying all exceptions uniformly.
- Using connection lifetime as workflow durability.
- Logging raw secrets and full sensitive context.
- Changing schemas without compatibility tests.
Sources and further reading
- JSON Schema specification
- OpenAPI Specification
- Model Context Protocol specification
- OpenTelemetry concepts
- Temporal documentation
- NIST AI Risk Management Framework
Last reviewed: 2026-06-21 UTC
