Key takeaways
- The request boundary establishes identity, authority, risk, budget, and output contract before model work begins.
- Control, context, execution, and trust are cross-cutting planes over the hardware-to-product stack.
- Providers and tools are replaceable adapters behind versioned interfaces.
- Durable workflow state is separate from model serving and from long-term memory.
- Policy decisions and trace evidence cross every privileged boundary.
- The same logical architecture can deploy in one process, distributed services, edge/cloud, or managed-provider combinations.
Runtime boundary
A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.
Receives
Product request, identity, deployment configuration, component catalogs, policies, model/tool/memory adapters, and operational constraints.
Owns
Logical component boundaries, interface responsibilities, state ownership, enforcement points, and replaceability criteria.
Emits
A component topology, interface contracts, data/control flows, failure domains, telemetry, and deployment mapping.
Does not own
Any single mandatory vendor implementation, or the assumption that every deployment must run every component as a separate service.
Failure modes
Shared mutable state, cross-plane coupling, leaky provider APIs, policy bypass, ambiguous ownership, trace gaps, and broad failure domains.
Evidence and metrics
Interface errors, dependency latency, policy coverage, trace completeness, component availability, recovery, and portability tests.
Request gateway and identity
The gateway authenticates actor/service, resolves tenant, validates the contract, applies rate/deadline/budget, and starts trace context.
Implementation
Keep validation and authority outside model prompts and provider adapters.
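The gateway steps above can be sketched as a single admission function that runs before any prompt is assembled. This is a minimal illustration, not a reference implementation: `RequestEnvelope`, `admit`, the `api_keys` lookup table, the `v1` contract version, and the deadline and budget caps are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
import uuid

SUPPORTED_CONTRACTS = {"v1"}  # hypothetical contract versions


@dataclass(frozen=True)
class RequestEnvelope:
    """What the rest of the runtime sees once the gateway accepts a request."""
    tenant_id: str
    actor_id: str
    contract_version: str
    deadline_ms: int
    token_budget: int
    trace_id: str


class Rejected(Exception):
    """Raised when the gateway refuses a request before any model work."""


def admit(raw: dict, api_keys: dict) -> RequestEnvelope:
    # Authenticate: map the presented key to (tenant, actor); unknown keys stop here.
    identity = api_keys.get(raw.get("api_key"))
    if identity is None:
        raise Rejected("unauthenticated")
    tenant_id, actor_id = identity

    # Validate the contract version before touching prompts or adapters.
    version = raw.get("contract_version")
    if version not in SUPPORTED_CONTRACTS:
        raise Rejected(f"unsupported contract: {version}")

    # Clamp deadline and budget to gateway-owned ceilings (illustrative numbers).
    deadline_ms = min(int(raw.get("deadline_ms", 30_000)), 60_000)
    token_budget = min(int(raw.get("token_budget", 2_000)), 8_000)

    # Start trace context at the boundary so every later step can join it.
    return RequestEnvelope(
        tenant_id, actor_id, version, deadline_ms, token_budget, uuid.uuid4().hex
    )
```

Because downstream components receive only the envelope, a prompt or provider adapter never sees the raw credential and cannot widen the budget on its own.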
Operational implications
The gateway may be in-process for a small deployment but remains a logical boundary.
Measure
Auth/validation, rate-limit, accepted/rejected, contract version, and trace creation.
Runtime coordinator
The coordinator executes the governed state machine and separates transient execution from durable checkpoints.
Implementation
Use explicit step results, cancellation, timeouts, retry classification, and state versions.
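As a rough sketch of those ideas, the loop below treats each step as a callable that returns an explicit result, classifies the result as retryable or fatal, checks for cancellation before every attempt, and bumps a state version on each checkpoint. All names (`Outcome`, `StepResult`, `Checkpoint`, `run_steps`) are hypothetical, per-step timeouts are omitted for brevity, and the checkpoint lives in memory where a real system would persist it.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    RETRYABLE = "retryable"
    FATAL = "fatal"


@dataclass(frozen=True)
class StepResult:
    outcome: Outcome
    output: object = None


@dataclass(frozen=True)
class Checkpoint:
    state_version: int = 0
    completed: tuple = ()  # names of steps that are already durable


def run_steps(steps, checkpoint, is_cancelled=lambda: False, max_attempts=3):
    """Run (name, callable) pairs in order; return (status, checkpoint)."""
    for name, step in steps:
        if name in checkpoint.completed:
            continue  # finished before a restart; do not repeat it
        for _attempt in range(max_attempts):
            if is_cancelled():
                return "cancelled", checkpoint
            result = step()
            if result.outcome is Outcome.OK:
                # Each successful step produces a new, versioned checkpoint.
                checkpoint = Checkpoint(
                    checkpoint.state_version + 1, checkpoint.completed + (name,)
                )
                break
            if result.outcome is Outcome.FATAL:
                return "failed", checkpoint
            # RETRYABLE: fall through and try again.
        else:
            return "failed", checkpoint  # retries exhausted
    return "completed", checkpoint
```

Resuming after a crash then amounts to calling `run_steps` again with the last stored checkpoint; completed steps are skipped rather than re-executed.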
Operational implications
Avoid one monolithic function containing provider, tool, policy, and storage details.
Measure
Step duration/status, attempt, checkpoint, cancellation, and task outcome.
Context plane
Context providers expose approved domain data, retrieval, files, memory, and semantic metrics with provenance and policy.
Implementation
Normalize provider results and apply classification/minimization before model assembly.
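One way to picture this is a two-stage filter: first map every provider record into a common shape with provenance and a classification label, then drop or trim items before they reach prompt assembly. The sketch below assumes a hypothetical three-level label scheme (`public`, `internal`, `restricted`) and a character limit as a stand-in for token-based minimization; none of these names come from a specific library.

```python
from dataclasses import dataclass

# Hypothetical sensitivity labels, least sensitive first.
LEVELS = ["public", "internal", "restricted"]


@dataclass(frozen=True)
class ContextItem:
    source: str          # provenance: which provider produced the item
    classification: str  # one of LEVELS
    text: str


def normalize(source: str, record: dict) -> ContextItem:
    # Unlabeled or unknown labels default to the most restrictive level.
    label = record.get("classification", "restricted")
    if label not in LEVELS:
        label = "restricted"
    return ContextItem(source, label, str(record.get("text", "")))


def minimize(items, max_level: str, max_chars: int):
    """Return (allowed_items, denied_count) for one model call."""
    ceiling = LEVELS.index(max_level)
    allowed, denied = [], 0
    for item in items:
        if LEVELS.index(item.classification) > ceiling:
            denied += 1  # counted so denied content can be measured
            continue
        allowed.append(
            ContextItem(item.source, item.classification, item.text[:max_chars])
        )
    return allowed, denied
```

Defaulting unlabeled records to `restricted` is a deliberate choice in this sketch: a provider that forgets to classify its output loses access rather than gaining it.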
Operational implications
Context is not equivalent to unrestricted database access.
Measure
Retrieval latency, source/citation, tokens, freshness, and denied content.
Model routing and adapters
The router chooses a compliant candidate; adapters normalize provider/local engine protocols and usage.
Implementation
Use a capability catalog and explicit fallback order. Keep provider details in adapter traces.
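A capability catalog with an explicit fallback order can be as simple as an ordered list that the router walks until a candidate satisfies the request. The following sketch is illustrative only: the candidate names, regions, and limits are invented, and the skip reasons are returned so they can be written to a trace.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    regions: frozenset
    max_context_tokens: int
    supports_tools: bool


# Hypothetical catalog; list order is the explicit fallback order.
CATALOG = [
    Candidate("local-small", frozenset({"eu", "us"}), 8_000, False),
    Candidate("hosted-large", frozenset({"us"}), 128_000, True),
    Candidate("hosted-eu", frozenset({"eu"}), 32_000, True),
]


def route(catalog, region, context_tokens, needs_tools, unavailable=frozenset()):
    """Return (chosen_candidate_or_None, [(skipped_name, reason), ...])."""
    skipped = []
    for candidate in catalog:
        if candidate.name in unavailable:
            skipped.append((candidate.name, "unavailable"))
        elif region not in candidate.regions:
            skipped.append((candidate.name, "residency"))
        elif context_tokens > candidate.max_context_tokens:
            skipped.append((candidate.name, "context"))
        elif needs_tools and not candidate.supports_tools:
            skipped.append((candidate.name, "tools"))
        else:
            return candidate, skipped
    return None, skipped  # no compliant candidate; the caller must fail closed
```

For example, `route(CATALOG, "eu", 4_000, True)` skips `local-small` for lacking tool support and `hosted-large` for residency, then selects `hosted-eu`.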
Operational implications
Routing changes can affect privacy, cost, quality, and residency.
Measure
Route/fallback, provider latency/error, tokens, cost, quality, and compliance.
Tool broker and execution sandbox
The broker discovers permitted tools, validates proposals, authorizes/approves, executes with idempotency, and validates results.
Implementation
Use narrowly scoped credentials and isolate generated/untrusted code.
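The broker stages described in this section (validate the proposal, authorize it, execute with idempotency) can be sketched as follows. `ToolBroker`, its constructor arguments, and the result dictionaries are hypothetical; sandboxing, approval workflows, and credential scoping are deliberately left out, and a production broker would likely scope idempotency keys per tenant and persist them.

```python
class ToolBroker:
    """Minimal broker: validate, authorize, then execute at most once per key."""

    def __init__(self, tools: dict, allowed: dict):
        self._tools = tools      # tool name -> (required argument names, callable)
        self._allowed = allowed  # actor -> set of permitted tool names
        self._results = {}       # idempotency key -> stored result

    def execute(self, actor: str, proposal: dict, idempotency_key: str) -> dict:
        # 1. Validate the model's proposal against the registered tool shape.
        name = proposal.get("tool")
        if name not in self._tools:
            return {"status": "rejected", "reason": "unknown_tool"}
        required, fn = self._tools[name]
        args = proposal.get("args", {})
        if set(args) != set(required):
            return {"status": "rejected", "reason": "invalid_args"}

        # 2. Authorize the actor; a valid proposal is not permission to act.
        if name not in self._allowed.get(actor, set()):
            return {"status": "denied", "reason": "not_authorized"}

        # 3. Execute once; a replayed key returns the stored result instead.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        result = {"status": "ok", "value": fn(**args)}
        self._results[idempotency_key] = result
        return result
```

The ordering matters: the side effect in step 3 is only reachable after the deterministic checks in steps 1 and 2 have passed.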
Operational implications
The tool subsystem is the primary boundary between probabilistic proposals and deterministic side effects.
Measure
Tool stage timings, policy decisions, approval, result, side effects, and sandbox events.
Memory and systems of record
The memory manager owns runtime memory scopes while domain services own authoritative business records.
Implementation
Use typed read/write commands, provenance, expiry, conflict, and deletion.
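A typed write command makes provenance, expiry, and conflict handling explicit instead of leaving them to each caller. The sketch below uses optimistic versioning to detect conflicts; `WriteMemory`, `MemoryStore`, and the field names are hypothetical, and an in-memory dictionary stands in for whatever storage a deployment actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteMemory:
    scope: str             # e.g. a session or user scope
    key: str
    value: str
    source: str            # provenance: where the value came from
    expires_at: float      # absolute expiry time in seconds
    expected_version: int  # 0 means "create"; otherwise the version last read


class MemoryStore:
    def __init__(self):
        self._items = {}

    def write(self, cmd: WriteMemory) -> str:
        current = self._items.get((cmd.scope, cmd.key))
        version = current["version"] if current else 0
        if cmd.expected_version != version:
            return "conflict"  # someone else wrote since the caller last read
        self._items[(cmd.scope, cmd.key)] = {
            "value": cmd.value,
            "source": cmd.source,
            "expires_at": cmd.expires_at,
            "version": version + 1,
        }
        return "written"

    def read(self, scope: str, key: str, now: float):
        item = self._items.get((scope, key))
        if item is None or item["expires_at"] <= now:
            return None  # expired entries are treated as absent
        return item

    def delete(self, scope: str, key: str) -> bool:
        return self._items.pop((scope, key), None) is not None
```

Authoritative business records are not written through this path; in the architecture described here they go to the owning domain service as domain commands.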
Operational implications
Do not make vector stores authoritative systems of record.
Measure
Memory hits/writes/conflicts/deletes and domain-command outcomes.
Policy and trust plane
A policy decision point evaluates versioned policy; enforcement points gate boundary, context, routing, tools, memory, and output.
Implementation
Record decision ID/effect/reason, fail behavior, policy version, and protected input references.
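A decision record along these lines might look like the sketch below. It stores a hash of the evaluated input rather than the input itself, and it makes the fail behavior an explicit parameter. `Decision`, `decide`, and `input_reference` are hypothetical names, and `evaluate` stands in for a call to whatever policy engine is deployed.

```python
from dataclasses import dataclass
import hashlib
import json
import uuid


@dataclass(frozen=True)
class Decision:
    decision_id: str
    effect: str          # "allow", "deny", or "challenge"
    reason: str
    policy_version: str
    input_ref: str       # reference to the input, not the input itself


def input_reference(payload: dict) -> str:
    # Canonical JSON so the same input always yields the same reference.
    canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(canonical).hexdigest()


def decide(evaluate, payload: dict, policy_version: str, fail_effect: str = "deny"):
    """Call the policy engine and always return a recordable Decision."""
    try:
        effect, reason = evaluate(payload)
    except Exception:
        # Fail behavior is explicit: an unreachable engine yields fail_effect.
        effect, reason = fail_effect, "policy_unavailable"
    return Decision(
        uuid.uuid4().hex, effect, reason, policy_version, input_reference(payload)
    )
```

Failing closed by default is an assumption of this sketch; some low-risk enforcement points may reasonably choose a different `fail_effect`, as long as the choice is recorded.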
Operational implications
A shared policy library without consistent enforcement can create false confidence.
Measure
Coverage, allow/deny/challenge, latency, unavailable decisions, and bypass attempts.
Telemetry, evaluation, and replay
Tracing correlates components; evaluation assesses output/outcome; replay reconstructs versions, state, and decisions.
Implementation
Use OpenTelemetry-compatible propagation, controlled attributes, evidence references, and workflow links.
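To make the propagation and controlled-attribute ideas concrete, the sketch below hand-rolls the W3C Trace Context `traceparent` header, which is the default propagation format in OpenTelemetry. It uses only the standard library for illustration; a real deployment would normally rely on the OpenTelemetry SDK's propagators instead. The attribute allowlist and function names are hypothetical.

```python
import re
import secrets

# version "00", 32-hex trace id, 16-hex parent span id, 2-hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# Hypothetical allowlist: only these attribute keys may be attached to spans.
ALLOWED_ATTRIBUTES = {"component", "step", "policy.version", "evidence.ref"}


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace at the request boundary."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_traceparent(parent: str) -> str:
    """Keep the trace id, mint a new span id for the next component."""
    match = TRACEPARENT.match(parent)
    if match is None:
        raise ValueError("malformed traceparent")
    trace_id, _parent_span, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


def controlled(attributes: dict) -> dict:
    """Drop any attribute that is not on the allowlist (e.g. raw prompts)."""
    return {k: v for k, v in attributes.items() if k in ALLOWED_ATTRIBUTES}
```

Passing `evidence.ref` instead of the content itself lets replay locate the evidence later without the trace carrying sensitive payloads.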
Operational implications
Do not store sensitive content merely to make replay convenient.
Measure
Trace completeness, evaluation coverage, evidence availability, and replay success.
Deployment variants
Small systems may deploy components in one process; larger systems separate model serving, workflow, policy, memory, and tools.
Implementation
Preserve logical contracts and trace context across process/network boundaries.
Operational implications
Service decomposition should follow scaling, security, ownership, or failure-isolation needs, not diagram aesthetics.
Measure
Network/dependency latency, availability, scaling, failure scope, and operating cost.
Portability tests
Replaceability is proven by contract tests and fixture parity, not by interface names.
Implementation
Maintain test adapters, capability conformance, trace fixtures, failure behavior, and migration/rollback.
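A conformance check in this spirit runs the same fixtures against every adapter and reports which parts of the contract fail, including failure behavior. The example below is a sketch under assumed conventions: the `complete` method, the `text` and `usage` result fields, and the rule that a non-positive `max_tokens` raises `ValueError` are all hypothetical contract terms, and both adapters are test doubles rather than real providers.

```python
from typing import Protocol


class ModelAdapter(Protocol):
    def complete(self, prompt: str, max_tokens: int) -> dict: ...


class EchoAdapter:
    """Test adapter that honors the assumed contract."""

    def complete(self, prompt: str, max_tokens: int) -> dict:
        if max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        text = prompt[:max_tokens]
        return {
            "text": text,
            "usage": {"input_tokens": len(prompt), "output_tokens": len(text)},
        }


class BrokenAdapter:
    """Same method name, but no usage data and no input validation."""

    def complete(self, prompt: str, max_tokens: int) -> dict:
        return {"text": prompt}


def conformance_failures(adapter: ModelAdapter) -> list:
    failures = []
    result = adapter.complete("hello", 3)
    if not isinstance(result.get("text"), str):
        failures.append("text")
    usage = result.get("usage")
    if not isinstance(usage, dict) or {"input_tokens", "output_tokens"} - set(usage):
        failures.append("usage")
    # Failure parity: invalid input must fail the same way for every adapter.
    try:
        adapter.complete("hello", 0)
        failures.append("failure_parity")
    except ValueError:
        pass
    return failures
```

Both adapters expose an identical interface name, yet only `EchoAdapter` returns an empty failure list, which is the point of testing contracts rather than signatures.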
Operational implications
Provider-neutral abstractions should not erase capabilities that matter; expose them through versioned extensions.
Measure
Conformance pass, migration effort, output parity, failure parity, and fallback.
Reference tables
| Component | Owns | Does not own |
|---|---|---|
| Gateway | Identity, boundary validation, budgets | Model execution |
| Coordinator/workflow | Task state and transitions | Provider-specific API |
| Context providers | Approved data retrieval and provenance | Final authorization to act |
| Router/adapters | Model selection and protocol normalization | Product business state |
| Tool broker | Authorized side-effect execution | Model reasoning |
| Memory manager | Runtime memory lifecycle | Authoritative domain records |
| Policy service | Versioned decisions | Enforcement itself, which belongs to the policy enforcement points (PEPs) |
| Telemetry/evaluation | Evidence and outcome assessment | Permission to store unrestricted data |
Decision checklist
- Which component owns every state mutation?
- Where are authentication, authorization, and policy enforced?
- Which adapters can be replaced independently?
- What durable state survives process failure?
- What data crosses trust boundaries?
- How are model and tool capacity isolated?
- Which single trace spans the whole task?
- Which failures are contained to a request, a worker, or the whole system?
- What conformance test proves portability?
Common mistakes
- Deploying a diagram with no interface or state ownership.
- Putting durable workflow state inside the model server.
- Letting context providers return unclassified raw records.
- Giving tool adapters independent retry policies.
- Using provider-neutral abstractions that hide privacy/cost/quality differences.
- Centralizing every component into one failure domain at scale.
- Splitting every logical component into a service prematurely.
Sources and further reading
- OpenTelemetry concepts
- Open Policy Agent
- Temporal documentation
- Model Context Protocol specification
- ONNX Runtime architecture
Last reviewed: 2026-06-21 UTC
