The ARuntime reference architecture connects model execution to product behavior through explicit runtime layers, cross-cutting controls, and versioned contracts. It is a blueprint for reasoning about ownership, not a requirement to deploy a separate service for every box.
Key takeaways
- Separate compiler, inference, serving, distributed, agentic, and product responsibilities even when one platform implements several.
- Use planes to distinguish management decisions, active execution, domain data, and evidence.
- Make identity, authority, budgets, versioning, recovery, and retention explicit across the lifecycle.
- Design failure behavior before optimizing the happy path.
[ar_diagram id="seven-layer-stack"]
Layered reference model
- Hardware substrate: processors, accelerators, memory, storage, fabric, operating system, drivers, isolation, and resource accounting.
- Kernel and communications layer: optimized operators, numerical libraries, collectives, transfer primitives, and hardware-specific implementations.
- Compiler and graph execution: graph capture, IR, rewriting, fusion, partitioning, lowering, code generation, scheduling, and memory planning.
- Inference engine: model loading, prefill, decode, cache allocation, batching, structured output, and token-level telemetry.
- Serving and distributed execution: APIs, repositories, admission, request scheduling, routing, autoscaling, rollout, parallelism, remote state, and node-failure handling.
- Agentic application runtime: task boundaries, actor and tenant context, context assembly, model route, tools, memory, policy, approvals, evaluation, recovery, and evidence.
- Product and workflow layer: user experience, domain rules, systems of record, accountable decisions, and definition of success.
The interfaces between layers should be narrow enough to test and substitute. A serving layer should not infer business authority from model text. An application runtime should not pretend to control numerical kernels through prompt instructions. A product should not treat a successful HTTP response as evidence that a task achieved its intended outcome.
Cross-cutting concerns
Identity and tenancy
Human, service, workload, and delegated identities; tenant isolation; session ownership; credential binding.
Configuration and secrets
Versioned configuration, secret references, late credential resolution, rotation, and non-disclosure to models.
Policy and security
Data classification, least privilege, egress, tool classes, approvals, rate limits, sandboxing, and incident response.
Observability and evaluation
Infrastructure, model, tool, policy, business, and quality signals correlated without requiring raw sensitive prompts.
Cost and capacity
Tokens, GPU time, queue budgets, tool calls, storage, human review, retries, and energy.
Versioning
Model, engine, compiler, prompt, tool, policy, schema, retrieval index, and application release.
Audit and evidence
Request identity, sources, decisions, side effects, artifacts, failures, recovery, and unresolved uncertainty.
Failure recovery
Timeouts, retries, cancellation, checkpoint, compensation, rollback, escalation, and safe termination.
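As one illustration of the failure-recovery concern, the sketch below bounds retries by both an attempt count and a deadline. It is a minimal sketch, not a prescribed implementation: the names `call_with_budget` and `BudgetExhausted`, the choice of `TimeoutError` as the only retryable error, and the injectable `clock` are all assumptions made for the example.

```python
import time


class BudgetExhausted(Exception):
    """Raised when the attempt or time budget runs out before success."""


def call_with_budget(action, *, max_attempts, deadline,
                     clock=time.monotonic, retryable=(TimeoutError,)):
    """Run `action` until it succeeds or a budget is exhausted.

    `deadline` is an absolute value on the same scale as `clock()`.
    Returns (result, attempts_used). Errors outside `retryable`
    propagate immediately instead of being retried.
    """
    attempts = 0
    last_error = None
    while attempts < max_attempts and clock() < deadline:
        attempts += 1
        try:
            return action(), attempts
        except retryable as exc:
            last_error = exc  # remember the cause for the failure record
    # Safe termination: report how far we got and chain the last cause.
    raise BudgetExhausted(f"stopped after {attempts} attempt(s)") from last_error
```

A caller would typically derive `deadline` from the request deadline (for example `time.monotonic() + remaining_seconds`) so that retries never outlive the request that triggered them.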
Control, execution, data, and evidence planes
[ar_diagram id="control-execution-planes"]
Control plane
Defines desired state and governs the fleet: identity, tenancy, configuration, policies, model and tool registries, provisioning, routing rules, rollout, quotas, and lifecycle.
Execution plane
Runs active work: model processes, sandboxes, workers, caches, tool adapters, workflow activities, and isolated session state.
Data plane
Carries prompts, tensors, tokens, cache transfers, tool payloads, domain reads/writes, and artifacts. Data-classification and minimization rules follow the data, not just the request.
Evidence plane
Persists correlated, minimized records of configuration, decisions, actions, outcomes, and failures. It should remain queryable when an execution worker has disappeared.
The evidence plane is separated because ordinary logs are often ephemeral, operationally scoped, or too sensitive for broad review. Evidence can be implemented using existing telemetry, event, and artifact systems; the architectural requirement is durable correlation and controlled access.
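The requirement of durable correlation with minimized content can be sketched as follows. This is a hypothetical illustration: `EvidenceRecord`, `EvidenceStore`, and `digest` are names invented for the example, and the in-memory dictionary only stands in for whatever durable, access-controlled store a real deployment would use.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    request_id: str    # correlation key shared by every record of a request
    kind: str          # e.g. "policy_decision", "tool_call", "failure"
    versions: dict     # e.g. {"model": "m-1", "policy": "p-7"}
    outcome: str
    input_digest: str  # hash of the sensitive input, never the input itself


def digest(raw: str) -> str:
    """Minimize sensitive text to a fixed-length SHA-256 fingerprint."""
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class EvidenceStore:
    """In-memory stand-in for a durable store that outlives workers."""

    def __init__(self):
        self._by_request = {}

    def append(self, record: EvidenceRecord) -> None:
        self._by_request.setdefault(record.request_id, []).append(record)

    def for_request(self, request_id: str) -> list:
        # Return a copy so callers cannot mutate stored evidence.
        return list(self._by_request.get(request_id, []))
```

Because records are keyed by request identity rather than by worker, a reviewer can still call `for_request` after the execution worker that produced them has gone away.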
Request lifecycle
[ar_diagram id="agentic-request-lifecycle"]
- Request admission: validate protocol, identity, tenant, quota, deadline, idempotency key, and contract version.
- Authority resolution: determine allowed data, models, tools, side effects, and delegation.
- Risk classification: assign consequence, reversibility, data classification, and review requirements.
- Context assembly: select sources and memory under provenance, freshness, scope, and token budgets.
- Model-route selection: choose deployment, region, precision, capability, cost, and fallback within policy.
- Inference: serving and engine layers admit, schedule, execute, and stream model output.
- Tool planning: treat model output as a proposal; construct a typed call rather than executing free-form text.
- Authorization: validate schema, permission class, credentials, egress, side-effect class, and approval conditions.
- Tool execution: apply timeout, idempotency, concurrency, isolation, and observability requirements.
- Validation: check tool result, output contract, domain constraints, and unexpected state changes.
- Human approval: pause without holding unnecessary compute; present evidence and expiration.
- Response finalization: produce the user result, citations, status, and known uncertainty.
- Memory decision: choose what may be retained, corrected, consolidated, exported, or deleted.
- Evidence persistence: correlate request, model, tool, policy, approval, artifact, and recovery records.
- Evaluation: score task outcome, safety, cost, latency, and evidence completeness.
- Incident or recovery handling: retry, compensate, restore, escalate, terminate, and preserve the failure record.
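The tool-planning and authorization steps above can be sketched as two small functions. Everything named here is an assumption for illustration: the `REGISTRY` contents, the tool names `lookup_order` and `refund_order`, the JSON proposal shape, and the three decision outcomes are hypothetical rather than part of any shipped contract.

```python
import json
from dataclasses import dataclass
from enum import Enum


class SideEffect(Enum):
    READ = "read"
    IRREVERSIBLE_WRITE = "irreversible_write"


@dataclass(frozen=True)
class ToolSpec:
    required_args: frozenset
    side_effect: SideEffect


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict


# Hypothetical registry; a real one would be owned by the control plane.
REGISTRY = {
    "lookup_order": ToolSpec(frozenset({"order_id"}), SideEffect.READ),
    "refund_order": ToolSpec(frozenset({"order_id", "amount"}),
                             SideEffect.IRREVERSIBLE_WRITE),
}


def plan_tool_call(model_text: str) -> ToolCall:
    """Treat model output as a proposal and build a typed call, or reject it."""
    try:
        proposal = json.loads(model_text)
    except json.JSONDecodeError as exc:
        raise ValueError("proposal is not valid JSON") from exc
    if not isinstance(proposal, dict):
        raise ValueError("proposal must be a JSON object")
    tool, args = proposal.get("tool"), proposal.get("args")
    if not isinstance(tool, str) or tool not in REGISTRY:
        raise ValueError("unknown tool")
    if not isinstance(args, dict) or set(args) != set(REGISTRY[tool].required_args):
        raise ValueError("arguments do not match the tool schema")
    return ToolCall(tool, args)


def authorize(call: ToolCall, granted_tools: set) -> str:
    """Return 'allow', 'deny', or 'require_approval' for a typed call."""
    if call.tool not in granted_tools:
        return "deny"
    if REGISTRY[call.tool].side_effect is SideEffect.IRREVERSIBLE_WRITE:
        return "require_approval"
    return "allow"
```

Keeping planning and authorization separate means free-form model text never reaches a tool adapter: it either becomes a schema-valid `ToolCall` or is rejected before any permission check runs.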
Contract boundaries
| Contract | Purpose | Required owner |
|---|---|---|
| Runtime request | Identity, task, risk, context, route, tools, memory, budget, output, trace, deadline, retention | Application/runtime boundary |
| Model route | Capability, deployment, region, precision, fallback, provider and budget constraints | Serving/application routing |
| Tool | Typed I/O, permission and side-effect class, authentication reference, retry, idempotency, compensation | Tool owner plus runtime registry |
| Policy decision | Allow, deny, transform, require approval, or escalate with reason and expiry | Policy authority |
| Evidence record | Correlated, minimized proof of decisions, actions, artifacts, failures, and uncertainty | Evidence plane owner |
| Trace | Infrastructure, model, tool, policy, business, and evaluation correlation | Observability owner |
Downloadable schemas are available in the developer reference. They are editorial reference contracts, not an industry standard or a shipped SDK.
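To make one row of the table concrete, a policy decision contract might be sketched as below. This is an assumption-laden illustration and not the downloadable schema: the field names `policy_version` and `expires_at`, the `Outcome` enum, and the `permits` helper are invented for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Outcome(Enum):
    ALLOW = "allow"
    DENY = "deny"
    TRANSFORM = "transform"
    REQUIRE_APPROVAL = "require_approval"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class PolicyDecision:
    outcome: Outcome
    reason: str            # machine-readable reason code
    policy_version: str    # which policy release produced the decision
    expires_at: datetime   # timezone-aware; stale decisions must be re-evaluated

    def permits(self, now: datetime) -> bool:
        """Only an unexpired ALLOW lets the action proceed unchanged."""
        return self.outcome is Outcome.ALLOW and now < self.expires_at


# Example: an allow decision that is valid for five minutes.
issued = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
decision = PolicyDecision(Outcome.ALLOW, "within_authority", "p-7",
                          issued + timedelta(minutes=5))
```

Carrying the expiry inside the decision keeps a cached or replayed approval from being honored after the conditions that justified it may have changed.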
Failure model
[ar_diagram id="failure-recovery-state-machine"]
| Failure | Detection | User-visible behavior | Retry and recovery | Evidence |
|---|---|---|---|---|
| Model timeout or refusal | Deadline/stop reason | Bounded failure or qualified fallback | Retry only within budget and policy; route fallback may change capability | Route, model version, attempts, final status |
| Invalid structured output | Schema/domain validation | No tool side effect | Repair or regenerate with bounded attempts | Validation errors and retry strategy |
| Context retrieval failure | Source timeout, empty result set, or freshness-rule violation | Report the missing context, or stop | Retry the source or use an approved fallback; never fabricate content | Requested and unavailable sources |
| Tool timeout | Tool deadline and status | Pending, failed, or partial result | Retry only if idempotent or operation status can be queried | Operation key and observed side effects |
| Authorization denial | Policy decision | Explain bounded denial | No automatic privilege expansion; approval only through configured path | Policy version, reason codes, authority |
| Partial side effect | Tool result plus system-of-record check | Show partial state and recovery status | Compensate or reconcile; do not retry blindly | Changed resources and compensation |
| Duplicate execution | Idempotency record | Return prior result or conflict | Never repeat irreversible effect | Original and duplicate request identifiers |
| Budget exhaustion | Token, time, cost, or action counters | Pause or return partial qualified result | Approval may extend budget; otherwise stop | Budget use and termination reason |
| Trace persistence failure | Evidence write status | Fail closed for evidence-required work or mark degraded | Buffer within bounded retention; reconcile later | Gap record and recovery status |
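The duplicate-execution row can be illustrated with an idempotency record keyed by the request's idempotency key. This is a minimal sketch under stated assumptions: `IdempotentExecutor`, `IdempotencyConflict`, and the payload `fingerprint` argument are hypothetical names, and the in-memory dictionary stands in for a durable record.

```python
class IdempotencyConflict(Exception):
    """The same key was reused with a different payload."""


class IdempotentExecutor:
    """Runs each side-effecting action at most once per idempotency key."""

    def __init__(self):
        self._records = {}  # key -> (fingerprint, result)

    def execute(self, key, fingerprint, action):
        """Return (result, was_duplicate).

        A repeated key with the same fingerprint returns the prior result
        without re-running `action`; a different fingerprint is a conflict.
        """
        prior = self._records.get(key)
        if prior is not None:
            prior_fingerprint, prior_result = prior
            if prior_fingerprint != fingerprint:
                raise IdempotencyConflict(key)
            return prior_result, True
        result = action()
        # Recorded only after success; if `action` raises, nothing is stored.
        self._records[key] = (fingerprint, result)
        return result, False
```

The sketch records a result only after the action completes, so it does not cover a crash between the side effect and the record. Handling that gap needs an in-progress marker or an operation-status query against the system of record, as the tool-timeout and partial-side-effect rows describe.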
Deployment profiles
- Embedded local: application, engine, and policy run on one device; use OS boundaries and local encrypted storage.
- Service-oriented: product calls shared model services; route and data contracts cross network boundaries.
- Disaggregated cluster: prefill, decode, cache, and routing scale independently; network and remote state become first-class.
- Managed agent runtime: control plane provisions isolated task environments and durable state.
- Hybrid: local context, tools, policy, or evidence combine with hosted model execution under explicit egress and fallback rules.
- Browser/edge: capability detection, model packaging, update, offline behavior, and device limits shape the architecture.
Architecture review checklist
- Name every runtime layer and execution unit.
- Identify state owner, lifetime, retention, and deletion for each state class.
- Pin contracts and versions crossing team or trust boundaries.
- Document actor, tenant, workload, and delegated identity.
- Classify tool side effects and prove idempotency or compensation.
- Specify overload, deadline, cancellation, fallback, and degraded modes.
- Keep system-of-record state outside model memory.
- Correlate evidence without storing unnecessary raw sensitive input.
- Test permission denial, partial side effects, crash recovery, duplicate delivery, and trace outage.
- Assign an accountable owner for every failure boundary.
