

Reference Architecture

Detailed AI runtime reference architecture with request gateway, identity, context providers, model router, inference adapters, tool broker, memory, policy, workflow, evaluation, telemetry, and deployment variants.

Audience: Technical readers · Reading time: 6 minutes · Status: Architecture

Key takeaways

  • The request boundary establishes identity, authority, risk, budget, and output contract before model work begins.
  • Control, context, execution, and trust are cross-cutting planes over the hardware-to-product stack.
  • Providers and tools are replaceable adapters behind versioned interfaces.
  • Durable workflow state is separate from model serving and from long-term memory.
  • Policy decisions and trace evidence cross every privileged boundary.
  • The same logical architecture can deploy in one process, distributed services, edge/cloud, or managed-provider combinations.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Product request, identity, deployment configuration, component catalogs, policies, model/tool/memory adapters, and operational constraints.

Owns

Logical component boundaries, interface responsibilities, state ownership, enforcement points, and replaceability criteria.

Emits

A component topology, interface contracts, data/control flows, failure domains, telemetry, and deployment mapping.

Does not own

A mandatory vendor implementation, or any assumption that every deployment needs every component as a separate service.

Failure modes

Shared mutable state, cross-plane coupling, leaky provider APIs, policy bypass, ambiguous ownership, trace gaps, and broad failure domains.

Evidence and metrics

Interface errors, dependency latency, policy coverage, trace completeness, component availability, recovery, and portability tests.

Request gateway and identity

The gateway authenticates the calling actor or service, resolves the tenant, validates the request contract, applies rate limits, deadlines, and budgets, and starts the trace context.

Implementation

Keep validation and authority outside model prompts and provider adapters.
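A minimal Python sketch of that boundary follows, assuming a hypothetical in-memory token table, contract-version set, and per-tenant token budget; `Gateway`, `Request`, `Admitted`, and `Rejected` are illustrative names, not any product's API. It shows authentication, tenant resolution, contract validation, and budget reservation all completing, and a trace ID being issued, before any model adapter is involved. Rate limiting and deadlines are omitted for brevity.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical credential store: token -> (actor, tenant). A real gateway
# would verify signed tokens with an identity provider instead.
TOKENS = {"tok-alice": ("alice", "acme")}
SUPPORTED_CONTRACTS = {"v1", "v2"}


@dataclass
class Request:
    token: str
    contract_version: str
    prompt: str
    max_tokens: int


@dataclass
class Admitted:
    actor: str
    tenant: str
    contract_version: str
    token_budget: int
    # New trace ID; downstream components attach their spans to it.
    trace_id: str = field(default_factory=lambda: secrets.token_hex(16))


class Rejected(Exception):
    pass


class Gateway:
    def __init__(self, tenant_budgets: Dict[str, int]):
        self.remaining = dict(tenant_budgets)  # tokens left per tenant

    def admit(self, req: Request) -> Admitted:
        # 1. Authenticate the caller and resolve the tenant.
        identity = TOKENS.get(req.token)
        if identity is None:
            raise Rejected("unauthenticated")
        actor, tenant = identity
        # 2. Validate the request against a supported contract version.
        if req.contract_version not in SUPPORTED_CONTRACTS:
            raise Rejected(f"unsupported contract {req.contract_version}")
        if not req.prompt or req.max_tokens <= 0:
            raise Rejected("invalid request body")
        # 3. Reserve budget before any model work starts.
        if self.remaining.get(tenant, 0) < req.max_tokens:
            raise Rejected("budget exceeded")
        self.remaining[tenant] -= req.max_tokens
        # 4. Admit with identity, contract, budget, and trace context fixed.
        return Admitted(actor, tenant, req.contract_version, req.max_tokens)
```

For example, `Gateway({"acme": 1000}).admit(Request("tok-alice", "v1", "Summarize Q3", 400))` admits the request and leaves 600 tokens in the tenant's budget. Nothing in the prompt can change who the caller is or how much budget they hold.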

Operational implications

The gateway may be in-process for a small deployment but remains a logical boundary.

Measure

Authentication and validation failures, rate-limit hits, accepted and rejected requests, contract version, and trace creation.

Runtime coordinator

The coordinator executes the governed state machine and separates transient execution from durable checkpoints.

Implementation

Use explicit step results, cancellation, timeouts, retry classification, and state versions.
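The sketch below, assuming hypothetical `Coordinator`, `StepResult`, and `Checkpoint` types, shows explicit step results with retry classification (retryable versus permanent), cooperative cancellation, an overall deadline, and a checkpoint whose version advances only after a step succeeds. Persisting the checkpoint to a durable store, such as a workflow engine, is left out; here it is an in-memory value.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Tuple


class Outcome(Enum):
    OK = "ok"
    RETRYABLE = "retryable"  # e.g. provider timeout or rate limit
    PERMANENT = "permanent"  # e.g. policy denial or invalid input


@dataclass
class StepResult:
    outcome: Outcome
    output: dict = field(default_factory=dict)
    reason: str = ""


@dataclass(frozen=True)
class Checkpoint:
    version: int = 0
    completed: tuple = ()
    data: dict = field(default_factory=dict)


Step = Callable[[dict], StepResult]


class Coordinator:
    def __init__(self, steps: List[Tuple[str, Step]], max_attempts: int = 3,
                 deadline_s: float = 30.0):
        self.steps = steps
        self.max_attempts = max_attempts
        self.deadline_s = deadline_s
        self.cancelled = False

    def cancel(self) -> None:
        self.cancelled = True  # checked before every attempt

    def run(self, cp: Checkpoint) -> Tuple[str, Checkpoint]:
        start = time.monotonic()
        for name, step in self.steps:
            if name in cp.completed:
                continue  # resuming: this step already has a checkpoint
            for _attempt in range(self.max_attempts):
                if self.cancelled:
                    return "cancelled", cp
                if time.monotonic() - start > self.deadline_s:
                    return "timed_out", cp
                result = step(dict(cp.data))
                if result.outcome is Outcome.OK:
                    # New checkpoint version only after the step succeeds.
                    cp = Checkpoint(cp.version + 1, cp.completed + (name,),
                                    {**cp.data, **result.output})
                    break
                if result.outcome is Outcome.PERMANENT:
                    return f"failed:{name}:{result.reason}", cp
                # RETRYABLE: fall through to the next attempt.
            else:
                return f"exhausted:{name}", cp
        return "completed", cp
```

Because steps receive only a copy of checkpoint data and return a typed result, provider, tool, and storage details stay inside the step functions rather than in the coordinator.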

Operational implications

Avoid one monolithic function containing provider, tool, policy, and storage details.

Measure

Step duration and status, attempt count, checkpoints, cancellations, and task outcome.

Context plane

Context providers expose approved domain data, retrieval, files, memory, and semantic metrics with provenance and policy.

Implementation

Normalize provider results and apply classification/minimization before model assembly.
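A minimal sketch, assuming a hypothetical raw record shape (`body`, `label`), a three-level classification scheme, and email redaction as the example minimization rule. Unlabeled or unknown records default to the most restrictive class, denied sources are reported for measurement, and every admitted passage keeps its source reference for citation.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

LEVELS = {"public": 0, "internal": 1, "restricted": 2}
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


@dataclass(frozen=True)
class ContextItem:
    source: str          # provenance, kept for citation
    classification: str  # "public", "internal", or "restricted"
    text: str


def normalize(raw: dict, source: str) -> ContextItem:
    # Hypothetical raw shape {"body": ..., "label": ...}. Unlabeled
    # records are treated as the most restrictive class.
    return ContextItem(source, raw.get("label", "restricted"), raw["body"].strip())


def assemble(items: List[ContextItem], clearance: str,
             max_chars: int = 2000) -> Tuple[str, List[str]]:
    passages, denied, used = [], [], 0
    for item in items:
        # Unknown labels are treated as restricted.
        if LEVELS.get(item.classification, 2) > LEVELS[clearance]:
            denied.append(item.source)  # measured, never sent to the model
            continue
        text = EMAIL.sub("[email]", item.text)  # minimization before assembly
        if used + len(text) > max_chars:
            continue  # skip passages that no longer fit the context budget
        passages.append(f"[{item.source}] {text}")
        used += len(text)
    return "\n".join(passages), denied
```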

Operational implications

Context is not equivalent to unrestricted database access.

Measure

Retrieval latency, source/citation, tokens, freshness, and denied content.

Model routing and adapters

The router chooses a compliant model candidate; adapters normalize the protocols and usage reporting of hosted providers and local inference engines.

Implementation

Use a capability catalog and explicit fallback order. Keep provider details in adapter traces.
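The following sketch assumes a hypothetical catalog of `ModelEntry` records and treats `call` as the adapter that hides each provider's protocol. The router walks an explicit fallback order, skips candidates that lack a required capability or violate residency, and records every skip, error, and success in a route trace.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class ModelEntry:
    name: str
    capabilities: FrozenSet[str]
    region: str


@dataclass
class RouteTrace:
    attempts: List[Tuple[str, str]] = field(default_factory=list)


class ProviderError(Exception):
    """Normalized error raised by adapters on provider failure."""


def route(catalog: List[ModelEntry], fallback_order: List[str], required: Set[str],
          region: str, call: Callable[[ModelEntry, str], str], prompt: str):
    by_name = {m.name: m for m in catalog}
    trace = RouteTrace()
    for name in fallback_order:  # explicit, reviewable order
        model = by_name[name]
        # Compliance filter: required capabilities and data residency.
        if not required <= model.capabilities or model.region != region:
            trace.attempts.append((name, "skipped:noncompliant"))
            continue
        try:
            out = call(model, prompt)  # adapter hides provider protocol
        except ProviderError as exc:
            trace.attempts.append((name, f"error:{exc}"))
            continue
        trace.attempts.append((name, "ok"))
        return out, trace
    raise RuntimeError(f"no compliant model succeeded: {trace.attempts}")
```

Filtering before fallback means an outage never silently routes a request to a model outside the permitted region or capability set.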

Operational implications

Routing changes can affect privacy, cost, quality, and residency.

Measure

Route/fallback, provider latency/error, tokens, cost, quality, and compliance.

Tool broker and execution sandbox

The broker discovers permitted tools, validates proposed calls, authorizes them (requesting approval where required), executes them idempotently, and validates results.

Implementation

Use narrowly scoped credentials and isolate generated/untrusted code.
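A minimal broker sketch, assuming hypothetical `ToolSpec` and `ToolBroker` types, a caller-specific permitted set, and an `approve` callback standing in for human or policy approval. It validates the proposed arguments against a declared schema, deduplicates on an idempotency key so a replayed proposal does not repeat the side effect, and checks the result shape. Sandboxing and credential scoping are outside this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


class ToolError(Exception):
    pass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    params: Dict[str, type]  # declared argument schema: name -> type
    needs_approval: bool
    run: Callable[..., dict]


class ToolBroker:
    def __init__(self, tools, permitted: Set[str],
                 approve: Callable[[str, dict], bool]):
        self.tools = {t.name: t for t in tools}
        self.permitted = permitted  # tools this caller is allowed to use
        self.approve = approve      # stand-in for human or policy approval
        self.results: Dict[str, dict] = {}  # idempotency key -> result

    def execute(self, key: str, name: str, args: dict) -> dict:
        if key in self.results:
            return self.results[key]  # replay: no second side effect
        if name not in self.permitted or name not in self.tools:
            raise ToolError(f"tool not permitted: {name}")
        spec = self.tools[name]
        # Treat the model's proposal as untrusted input: check the schema.
        if set(args) != set(spec.params) or not all(
                isinstance(args[k], t) for k, t in spec.params.items()):
            raise ToolError(f"invalid arguments for {name}")
        if spec.needs_approval and not self.approve(name, args):
            raise ToolError(f"approval refused for {name}")
        result = spec.run(**args)
        if not isinstance(result, dict) or "status" not in result:
            raise ToolError(f"invalid result from {name}")
        self.results[key] = result
        return result
```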

Operational implications

The tool subsystem is the primary boundary between probabilistic proposals and deterministic side effects.

Measure

Tool stage timings, policy decisions, approval, result, side effects, and sandbox events.

Memory and systems of record

The memory manager owns runtime memory scopes while domain services own authoritative business records.

Implementation

Use typed read/write commands, and record provenance, expiry, conflict resolution, and deletion for each entry.
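A sketch of typed memory commands, assuming a hypothetical in-memory `MemoryManager` keyed by scope and key. Writes carry provenance and a TTL and use optimistic concurrency (`expected_version`) to surface conflicts; expired entries read as absent; deletion works per scope. Authoritative business records remain in domain services and are not modeled here.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    key: str
    value: str
    source: str        # provenance, e.g. a trace ID or document reference
    version: int
    expires_at: float  # epoch seconds


@dataclass(frozen=True)
class Write:
    scope: str         # e.g. "user:alice" or "session:123"
    key: str
    value: str
    source: str
    expected_version: int  # 0 means "must not exist yet"
    ttl_s: float


class Conflict(Exception):
    pass


class MemoryManager:
    def __init__(self, clock=time.time):
        self.clock = clock
        self.store = {}  # (scope, key) -> MemoryRecord

    def read(self, scope: str, key: str) -> Optional[MemoryRecord]:
        rec = self.store.get((scope, key))
        if rec is None or rec.expires_at <= self.clock():
            return None  # expired entries are treated as absent
        return rec

    def write(self, cmd: Write) -> MemoryRecord:
        current = self.read(cmd.scope, cmd.key)
        current_version = current.version if current else 0
        if current_version != cmd.expected_version:
            raise Conflict(f"{cmd.key}: expected v{cmd.expected_version}, "
                           f"found v{current_version}")
        rec = MemoryRecord(cmd.key, cmd.value, cmd.source,
                           current_version + 1, self.clock() + cmd.ttl_s)
        self.store[(cmd.scope, cmd.key)] = rec
        return rec

    def delete_scope(self, scope: str) -> int:
        doomed = [k for k in self.store if k[0] == scope]
        for k in doomed:
            del self.store[k]
        return len(doomed)  # count for deletion metrics
```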

Operational implications

Do not make vector stores authoritative systems of record.

Measure

Memory hits/writes/conflicts/deletes and domain-command outcomes.

Policy and trust plane

A policy decision point evaluates versioned policy; enforcement points gate boundary, context, routing, tools, memory, and output.

Implementation

Record the decision ID, effect, and reason; the failure behavior (fail open or fail closed); the policy version; and references to protected inputs rather than the inputs themselves.
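A sketch of an enforcement point that records such a decision, assuming toy rules in `evaluate` standing in for a real policy engine and a hypothetical `POLICY_VERSION` label. It stores a hash of the input instead of the input itself, and fails closed (deny) when the decision point is unavailable; whether a given enforcement point should fail open or closed is a per-boundary choice.

```python
import hashlib
import json
import uuid
from dataclasses import dataclass
from typing import Tuple

POLICY_VERSION = "2026-06-01.3"  # hypothetical version label


@dataclass(frozen=True)
class Decision:
    decision_id: str
    effect: str          # "allow", "deny", or "challenge"
    reason: str
    policy_version: str
    input_ref: str       # hash of the input, not the input itself
    fail_mode: str       # "evaluated" or "fail_closed"


class PolicyUnavailable(Exception):
    pass


def evaluate(inp: dict) -> Tuple[str, str]:
    # Toy rules standing in for a real policy decision point.
    if inp["tenant"] != inp["resource_tenant"]:
        return "deny", "cross-tenant access"
    if inp["action"] == "tool.refund" and inp["amount"] > 100:
        return "challenge", "refund above auto-approval limit"
    return "allow", "in-tenant action within limits"


def enforce(inp: dict, pdp=evaluate) -> Decision:
    ref = hashlib.sha256(json.dumps(inp, sort_keys=True).encode()).hexdigest()[:16]
    try:
        effect, reason = pdp(inp)
        mode = "evaluated"
    except PolicyUnavailable:
        # Fail closed: no decision means no privileged action.
        effect, reason, mode = "deny", "policy decision unavailable", "fail_closed"
    return Decision(str(uuid.uuid4()), effect, reason, POLICY_VERSION, ref, mode)
```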

Operational implications

A shared policy library without consistent enforcement can create false confidence.

Measure

Coverage, allow/deny/challenge, latency, unavailable decisions, and bypass attempts.

Telemetry, evaluation, and replay

Tracing correlates components; evaluation assesses output/outcome; replay reconstructs versions, state, and decisions.

Implementation

Use OpenTelemetry-compatible propagation, controlled attributes, evidence references, and workflow links.
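A stdlib-only sketch of propagation plus controlled attributes, assuming the W3C Trace Context `traceparent` header format (`00-<trace-id>-<parent-id>-<flags>`), which OpenTelemetry propagates by default, and a hypothetical attribute allowlist. In production, the OpenTelemetry SDK's propagators and span API would replace this hand-rolled `Span`.

```python
import re
import secrets
from dataclasses import dataclass, field
from typing import Optional

# W3C Trace Context: version-traceid-parentid-flags, lowercase hex.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")
# Hypothetical allowlist; raw prompts and outputs are deliberately absent.
ALLOWED_ATTRS = {"tenant", "route.model", "policy.decision_id",
                 "tool.name", "evidence.ref"}


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    attributes: dict = field(default_factory=dict)

    def set_attribute(self, key: str, value) -> None:
        if key in ALLOWED_ATTRS:  # drop anything not explicitly allowed
            self.attributes[key] = value

    def traceparent(self, sampled: bool = True) -> str:
        return f"00-{self.trace_id}-{self.span_id}-{'01' if sampled else '00'}"


def start_span(name: str, incoming: Optional[str] = None) -> Span:
    m = TRACEPARENT.match(incoming or "")
    # All-zero IDs are invalid; a bad or missing header starts a new trace.
    if m and set(m.group(1)) != {"0"} and set(m.group(2)) != {"0"}:
        return Span(name, m.group(1), secrets.token_hex(8), m.group(2))
    return Span(name, secrets.token_hex(16), secrets.token_hex(8), None)
```

Evidence is linked by reference (`evidence.ref`) rather than copied into the span, which keeps replay possible without storing sensitive content in telemetry.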

Operational implications

Do not store sensitive content merely to make replay convenient.

Measure

Trace completeness, evaluation coverage, evidence availability, and replay success.

Deployment variants

Small systems may deploy components in one process; larger systems separate model serving, workflow, policy, memory, and tools.

Implementation

Preserve logical contracts and trace context across process/network boundaries.
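A sketch of one logical contract with two deployment mappings, assuming a hypothetical `PolicyClient` interface and a `transport` callable standing in for an HTTP or RPC client. Callers see the same `decide` contract either way; in the remote variant, trace context crosses the network as a `traceparent` header alongside the payload.

```python
import json
from typing import Callable, Dict, List, Protocol

SEEN_HEADERS: List[Dict[str, str]] = []  # what the fake server received


class PolicyClient(Protocol):
    def decide(self, action: str, tenant: str, traceparent: str) -> str: ...


def decide_locally(action: str, tenant: str) -> str:
    # Shared decision logic; the rule itself is a placeholder.
    return "deny" if action.startswith("admin.") else "allow"


class InProcessPolicy:
    """Small deployment: a direct call; trace context is already in-process."""

    def decide(self, action: str, tenant: str, traceparent: str) -> str:
        return decide_locally(action, tenant)


class RemotePolicy:
    """Distributed deployment: same contract over a serialized message."""

    def __init__(self, transport: Callable[[str, Dict[str, str]], str]):
        self.transport = transport  # stand-in for an HTTP or RPC client

    def decide(self, action: str, tenant: str, traceparent: str) -> str:
        body = json.dumps({"action": action, "tenant": tenant})
        reply = self.transport(body, {"traceparent": traceparent})
        return json.loads(reply)["effect"]


def fake_server(body: str, headers: Dict[str, str]) -> str:
    SEEN_HEADERS.append(headers)  # trace context arrives as a header
    req = json.loads(body)
    return json.dumps({"effect": decide_locally(req["action"], req["tenant"])})


def is_allowed(client: PolicyClient, action: str, tenant: str,
               traceparent: str) -> bool:
    # Callers depend only on the logical contract, not the deployment.
    return client.decide(action, tenant, traceparent) == "allow"
```

Swapping `InProcessPolicy` for `RemotePolicy` changes latency and failure modes, which is why the split should be driven by scaling, security, or ownership needs rather than by the diagram.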

Operational implications

Service decomposition should follow scaling, security, ownership, or failure needs, not diagram aesthetics.

Measure

Network/dependency latency, availability, scaling, failure scope, and operating cost.

Portability tests

Replaceability is proven by contract tests and fixture parity, not by interface names.

Implementation

Maintain test adapters, capability conformance suites, trace fixtures, failure-behavior tests, and migration and rollback procedures.
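A minimal conformance-suite sketch, assuming a hypothetical adapter signature (a prompt in, a dict with `text` and `usage` out), a normalized `AdapterError`, and deterministic fixtures. Exact text comparison stands in for output parity; suites for generative models would usually compare against tolerances or evaluators instead. The second adapter fails because it leaks a provider exception rather than normalizing it, which is the kind of failure-behavior gap the suite exists to catch.

```python
from typing import Callable, List


class AdapterError(Exception):
    """Normalized error every adapter must raise on provider failure."""


# Deterministic fixtures shared by every adapter under test.
FIXTURES = [
    {"prompt": "ping", "expect_text": "pong"},
    {"prompt": "", "expect_error": True},
]


def conformance(adapter: Callable[[str], dict]) -> List[str]:
    failures = []
    for fx in FIXTURES:
        label = repr(fx["prompt"])
        try:
            out = adapter(fx["prompt"])
        except AdapterError:
            if not fx.get("expect_error"):
                failures.append(f"{label}: unexpected AdapterError")
            continue
        except Exception as exc:  # provider error leaked through the adapter
            failures.append(f"{label}: leaked {type(exc).__name__}")
            continue
        if fx.get("expect_error"):
            failures.append(f"{label}: expected AdapterError")
        elif out.get("text") != fx["expect_text"]:
            failures.append(f"{label}: output mismatch")
        elif not isinstance(out.get("usage", {}).get("output_tokens"), int):
            failures.append(f"{label}: usage not normalized")
    return failures


def adapter_a(prompt: str) -> dict:
    if not prompt:
        raise AdapterError("empty prompt")
    return {"text": "pong", "usage": {"output_tokens": 1}}


def adapter_b(prompt: str) -> dict:
    if not prompt:
        raise ValueError("400 bad request")  # provider error, not normalized
    return {"text": "pong", "usage": {"output_tokens": 1}}
```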

Operational implications

Provider-neutral abstractions should not erase capabilities that matter; expose them through versioned extensions.

Measure

Conformance pass, migration effort, output parity, failure parity, and fallback.

Reference tables

Reference components

| Component | Owns | Does not own |
|---|---|---|
| Gateway | Identity, boundary validation, budgets | Model execution |
| Coordinator/workflow | Task state and transitions | Provider-specific APIs |
| Context providers | Approved data retrieval and provenance | Final authorization to act |
| Router/adapters | Model selection and protocol normalization | Product business state |
| Tool broker | Authorized side-effect execution | Model reasoning |
| Memory manager | Runtime memory lifecycle | Authoritative domain records |
| Policy service | Versioned decisions | Enforcement, which happens at policy enforcement points (PEPs) |
| Telemetry/evaluation | Evidence and outcome assessment | Permission to store unrestricted data |

Decision checklist

  1. Which component owns every state mutation?
  2. Where are authentication, authorization, and policy enforced?
  3. Which adapters can be replaced independently?
  4. What durable state survives process failure?
  5. What data crosses trust boundaries?
  6. How are model and tool capacity isolated?
  7. Can a single trace follow the whole task?
  8. Which failures are contained to a single request, a worker, or the whole system?
  9. What conformance test proves portability?
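The first question can be checked mechanically. The sketch below assumes hypothetical ownership declarations modeled on the reference components table and a list of observed `(component, state_key)` mutations, for example extracted from traces; it reports state with several owners, undeclared state, and mutations by a component that does not own the key.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical ownership declarations; key names are illustrative.
OWNERSHIP: Dict[str, Set[str]] = {
    "gateway": {"request.budget"},
    "coordinator": {"task.state", "task.checkpoint"},
    "memory_manager": {"memory.user", "memory.session"},
    "tool_broker": {"tool.idempotency"},
}


def check_ownership(ownership: Dict[str, Set[str]],
                    observed: List[Tuple[str, str]]) -> List[str]:
    """Return problems: shared ownership and mutations by non-owners."""
    owners = defaultdict(set)
    for component, keys in ownership.items():
        for key in keys:
            owners[key].add(component)
    problems = [f"{k}: multiple owners {sorted(c)}"
                for k, c in owners.items() if len(c) > 1]
    for component, key in observed:
        if key not in owners:
            problems.append(f"{key}: no declared owner")
        elif component not in owners[key]:
            problems.append(f"{key}: mutated by {component}, "
                            f"owned by {sorted(owners[key])}")
    return problems
```

A finding such as a model server writing `task.checkpoint` is exactly the "durable workflow state inside the model server" mistake listed below.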

Common mistakes

  • Deploying a diagram with no interface or state ownership.
  • Putting durable workflow state inside the model server.
  • Letting context providers return unclassified raw records.
  • Letting tool adapters retry independently of the coordinator's retry classification and the broker's idempotency controls.
  • Using provider-neutral abstractions that hide privacy/cost/quality differences.
  • Centralizing every component into one failure domain at scale.
  • Splitting every logical component into a service prematurely.

Sources and further reading


  1. OpenTelemetry concepts
     OpenTelemetry · Official documentation · accessed 2026-06-21 UTC

  2. Open Policy Agent
     OPA · Official documentation · accessed 2026-06-21 UTC

  3. Temporal documentation
     Temporal · Official documentation · accessed 2026-06-21 UTC

  4. Model Context Protocol specification
     MCP · Protocol specification · accessed 2026-06-21 UTC

  5. ONNX Runtime architecture
     ONNX Runtime · Official documentation · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC
