ARuntime Reference

Reference Architecture

A concrete layered reference architecture with planes, lifecycle, cross-cutting controls, and failure recovery.

Audience: Technical readers | Reading time: 6 minutes | Status: Architecture | Last reviewed:

The ARuntime reference architecture connects model execution to product behavior through explicit runtime layers, cross-cutting controls, and versioned contracts. It is a blueprint for reasoning about ownership, not a requirement to deploy a separate service for every box.

Key takeaways

  • Separate compiler, inference, serving, distributed, agentic, and product responsibilities even when one platform implements several.
  • Use planes to distinguish management decisions, active execution, domain data, and evidence.
  • Make identity, authority, budgets, versioning, recovery, and retention explicit across the lifecycle.
  • Design failure behavior before optimizing the happy path.

[Diagram: seven-layer stack]

Layered reference model

  1. Hardware substrate: processors, accelerators, memory, storage, fabric, operating system, drivers, isolation, and resource accounting.
  2. Kernel and communications layer: optimized operators, numerical libraries, collectives, transfer primitives, and hardware-specific implementations.
  3. Compiler and graph execution: graph capture, IR, rewriting, fusion, partitioning, lowering, code generation, scheduling, and memory planning.
  4. Inference engine: model loading, prefill, decode, cache allocation, batching, structured output, and token-level telemetry.
  5. Serving and distributed execution: APIs, repositories, admission, request scheduling, routing, autoscaling, rollout, parallelism, remote state, and node-failure handling.
  6. Agentic application runtime: task boundaries, actor and tenant context, context assembly, model route, tools, memory, policy, approvals, evaluation, recovery, and evidence.
  7. Product and workflow layer: user experience, domain rules, systems of record, accountable decisions, and definition of success.

The interfaces between layers should be narrow enough to test and substitute. A serving layer should not infer business authority from model text. An application runtime should not pretend to control numerical kernels through prompt instructions. A product should not treat a successful HTTP response as evidence that a task achieved its intended outcome.

Cross-cutting concerns

Identity and tenancy

Human, service, workload, and delegated identities; tenant isolation; session ownership; credential binding.

Configuration and secrets

Versioned configuration, secret references, late credential resolution, rotation, and non-disclosure to models.

Policy and security

Data classification, least privilege, egress, tool classes, approvals, rate limits, sandboxing, and incident response.

Observability and evaluation

Infrastructure, model, tool, policy, business, and quality signals correlated without requiring raw sensitive prompts.

Cost and capacity

Tokens, GPU time, queue budgets, tool calls, storage, human review, retries, and energy.

Versioning

Model, engine, compiler, prompt, tool, policy, schema, retrieval index, and application release.

Audit and evidence

Request identity, sources, decisions, side effects, artifacts, failures, recovery, and unresolved uncertainty.

Failure recovery

Timeouts, retries, cancellation, checkpoint, compensation, rollback, escalation, and safe termination.

Control, execution, data, and evidence planes

[Diagram: control and execution planes]

Control plane

Defines desired state and governs the fleet: identity, tenancy, configuration, policies, model and tool registries, provisioning, routing rules, rollout, quotas, and lifecycle.

Execution plane

Runs active work: model processes, sandboxes, workers, caches, tool adapters, workflow activities, and isolated session state.

Data plane

Carries prompts, tensors, tokens, cache transfers, tool payloads, domain reads/writes, and artifacts. Data-classification and minimization rules follow the data, not just the request.

Evidence plane

Persists correlated, minimized records of configuration, decisions, actions, outcomes, and failures. It should remain queryable when an execution worker has disappeared.

The evidence plane is separated because ordinary logs are often ephemeral, operationally scoped, or too sensitive for broad review. Evidence can be implemented using existing telemetry, event, and artifact systems; the architectural requirement is durable correlation and controlled access.
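
As an illustration of durable correlation with minimization, the following is a minimal sketch rather than a prescribed schema. The names `EvidenceRecord`, `EvidenceStore`, and `digest`, and every field name, are hypothetical; the in-memory list stands in for whatever durable telemetry, event, or artifact system an implementation actually uses.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical minimized evidence record; field names are illustrative."""
    request_id: str      # correlation key shared by all records of one request
    kind: str            # e.g. "policy_decision", "tool_call", "failure"
    config_version: str  # configuration in force when the event occurred
    outcome: str         # e.g. "allowed", "denied", "failed"
    payload_digest: str  # SHA-256 of the sensitive payload, never the payload


def digest(payload: str) -> str:
    """Return a hex SHA-256 digest so the record can reference a payload
    without storing its raw content."""
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EvidenceStore:
    """In-memory stand-in (assumption) for a durable, access-controlled store."""

    def __init__(self) -> None:
        self._records: list[EvidenceRecord] = []

    def append(self, record: EvidenceRecord) -> None:
        self._records.append(record)

    def by_request(self, request_id: str) -> list[EvidenceRecord]:
        """Query by correlation key; needs no live execution worker."""
        return [r for r in self._records if r.request_id == request_id]

    def export(self, request_id: str) -> str:
        """Serialize one request's records for review."""
        return json.dumps([asdict(r) for r in self.by_request(request_id)])
```

Because the store is keyed by request identity and holds digests instead of raw prompts, a reviewer can reconstruct what happened to a request after the worker has gone, without the record itself becoming a sensitive artifact.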

Request lifecycle

[Diagram: agentic request lifecycle]

  1. Request admission: validate protocol, identity, tenant, quota, deadline, idempotency key, and contract version.
  2. Authority resolution: determine allowed data, models, tools, side effects, and delegation.
  3. Risk classification: assign consequence, reversibility, data classification, and review requirements.
  4. Context assembly: select sources and memory under provenance, freshness, scope, and token budgets.
  5. Model-route selection: choose deployment, region, precision, capability, cost, and fallback within policy.
  6. Inference: serving and engine layers admit, schedule, execute, and stream model output.
  7. Tool planning: treat model output as a proposal; construct a typed call rather than executing free-form text.
  8. Authorization: validate schema, permission class, credentials, egress, side-effect class, and approval conditions.
  9. Tool execution: apply timeout, idempotency, concurrency, isolation, and observability requirements.
  10. Validation: check tool result, output contract, domain constraints, and unexpected state changes.
  11. Human approval: pause without holding unnecessary compute; present evidence and expiration.
  12. Response finalization: produce the user result, citations, status, and known uncertainty.
  13. Memory decision: choose what may be retained, corrected, consolidated, exported, or deleted.
  14. Evidence persistence: correlate request, model, tool, policy, approval, artifact, and recovery records.
  15. Evaluation: score task outcome, safety, cost, latency, and evidence completeness.
  16. Incident or recovery handling: retry, compensate, restore, escalate, terminate, and preserve the failure record.
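
Steps 7 and 8 above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the registry contents, the tool names `lookup_order` and `refund_order`, the JSON proposal shape, and the two side-effect classes are all hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    name: str
    required_args: dict   # argument name -> expected Python type
    side_effect_class: str  # "read" or "write" (illustrative classes)


@dataclass(frozen=True)
class ToolCall:
    tool: str
    args: dict
    side_effect_class: str


# Hypothetical registry; a real runtime would load this from its tool registry.
REGISTRY = {
    "lookup_order": ToolSpec("lookup_order", {"order_id": str}, "read"),
    "refund_order": ToolSpec(
        "refund_order", {"order_id": str, "amount_cents": int}, "write"
    ),
}


def plan_tool_call(model_output: str) -> ToolCall:
    """Treat model text as a proposal and build a typed call, or raise."""
    try:
        proposal = json.loads(model_output)
    except json.JSONDecodeError as exc:
        raise ValueError("model output is not valid JSON") from exc
    if not isinstance(proposal, dict):
        raise ValueError("proposal must be a JSON object")
    name = proposal.get("tool")
    spec = REGISTRY.get(name) if isinstance(name, str) else None
    if spec is None:
        raise ValueError("unknown tool")
    args = proposal.get("args")
    if not isinstance(args, dict) or set(args) != set(spec.required_args):
        raise ValueError("argument names do not match the tool schema")
    for key, expected in spec.required_args.items():
        # Strict type check: a bool is not accepted where an int is required.
        if type(args[key]) is not expected:
            raise ValueError(f"argument {key!r} has the wrong type")
    return ToolCall(spec.name, dict(args), spec.side_effect_class)


def authorize(call: ToolCall, allowed_tools: set, approved: bool = False) -> str:
    """Return "allow", "deny", or "require_approval" for a typed call."""
    if call.tool not in allowed_tools:
        return "deny"
    if call.side_effect_class == "write" and not approved:
        return "require_approval"
    return "allow"
```

Nothing the model emits is executed directly: free-form text that does not parse into a registered, correctly typed call never reaches authorization, and a write-class call is held until approval arrives through a configured path.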

Contract boundaries

Reference contracts and their owners

| Contract | Purpose | Required owner |
|---|---|---|
| Runtime request | Identity, task, risk, context, route, tools, memory, budget, output, trace, deadline, retention | Application/runtime boundary |
| Model route | Capability, deployment, region, precision, fallback, provider and budget constraints | Serving/application routing |
| Tool | Typed I/O, permission and side-effect class, authentication reference, retry, idempotency, compensation | Tool owner plus runtime registry |
| Policy decision | Allow, deny, transform, require approval, or escalate with reason and expiry | Policy authority |
| Evidence record | Correlated, minimized proof of decisions, actions, artifacts, failures, and uncertainty | Evidence plane owner |
| Trace | Infrastructure, model, tool, policy, business, and evaluation correlation | Observability owner |

Downloadable schemas are available in the developer reference. They are editorial reference contracts, not an industry standard or a shipped SDK.
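
To make one of these contracts concrete, here is a minimal sketch of a policy decision carrying an outcome, a reason, and an expiry. It is not one of the downloadable schemas; the class names, the `policy_version` field, and the `permits` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Outcome(Enum):
    """The five outcomes named in the policy decision contract."""
    ALLOW = "allow"
    DENY = "deny"
    TRANSFORM = "transform"
    REQUIRE_APPROVAL = "require_approval"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class PolicyDecision:
    outcome: Outcome
    reason: str           # human-readable reason or reason code
    policy_version: str   # hypothetical field: which policy produced this
    expires_at: datetime  # timezone-aware instant after which it is stale

    def permits(self, now: datetime) -> bool:
        """True only for an unexpired ALLOW; every other case needs a
        fresh decision or a different handling path."""
        return self.outcome is Outcome.ALLOW and now < self.expires_at
```

A caller might check a cached decision like this before acting on it:

```python
now = datetime.now(timezone.utc)
decision = PolicyDecision(
    Outcome.ALLOW, "within delegated scope", "policy-v3",
    expires_at=now + timedelta(minutes=5),
)
if not decision.permits(now):
    raise PermissionError(decision.reason)
```

Attaching the expiry to the decision itself means a long-running task cannot keep acting on an authorization that the policy authority would no longer grant.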

Failure model

[Diagram: failure recovery state machine]

Required failure semantics

| Failure | Detection | User-visible behavior | Retry and recovery | Evidence |
|---|---|---|---|---|
| Model timeout or refusal | Deadline/stop reason | Bounded failure or qualified fallback | Retry only within budget and policy; route fallback may change capability | Route, model version, attempts, final status |
| Invalid structured output | Schema/domain validation | No tool side effect | Repair or regenerate with bounded attempts | Validation errors and retry strategy |
| Context retrieval failure | Source timeout, empty set, freshness rule | State missing context or stop | Retry source, use approved fallback, never fabricate | Requested and unavailable sources |
| Tool timeout | Tool deadline and status | Pending, failed, or partial result | Retry only if idempotent or operation status can be queried | Operation key and observed side effects |
| Authorization denial | Policy decision | Explain bounded denial | No automatic privilege expansion; approval only through configured path | Policy version, reason codes, authority |
| Partial side effect | Tool result plus system-of-record check | Show partial state and recovery status | Compensate or reconcile; do not blind retry | Changed resources and compensation |
| Duplicate execution | Idempotency record | Return prior result or conflict | Never repeat irreversible effect | Original and duplicate request identifiers |
| Budget exhaustion | Token, time, cost, or action counters | Pause or return partial qualified result | Approval may extend budget; otherwise stop | Budget use and termination reason |
| Trace persistence failure | Evidence write status | Fail closed for evidence-required work or mark degraded | Buffer within bounded retention; reconcile later | Gap record and recovery status |
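
The tool-timeout and duplicate-execution rows can be sketched together. This is a minimal illustration assuming an in-memory idempotency record and a hypothetical `ToolTimeout` exception; a real runtime would persist the record and could also query operation status before deciding to retry.

```python
from typing import Callable


class ToolTimeout(Exception):
    """Hypothetical error raised when a tool misses its deadline."""


class IdempotentExecutor:
    """Runs tool operations under an idempotency key with bounded retries."""

    def __init__(self, max_attempts: int = 3) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self.max_attempts = max_attempts
        self._results: dict = {}  # idempotency key -> prior result

    def run(self, key: str, operation: Callable[[], object],
            idempotent: bool) -> object:
        # Duplicate execution: return the prior result, do not repeat the effect.
        if key in self._results:
            return self._results[key]
        # Tool timeout: retry only when the operation is idempotent.
        attempts = self.max_attempts if idempotent else 1
        last_error: Exception = ToolTimeout("operation was never attempted")
        for _ in range(attempts):
            try:
                result = operation()
            except ToolTimeout as exc:
                last_error = exc
                continue
            self._results[key] = result
            return result
        # Retries exhausted, or a non-idempotent call timed out with an
        # unknown outcome: surface the failure for reconciliation.
        raise last_error
```

A non-idempotent call that times out is deliberately not retried here, because its side effect may already have happened; the caller is expected to check the system of record and compensate or reconcile.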

Deployment profiles

  • Embedded local: application, engine, and policy run on one device; use OS boundaries and local encrypted storage.
  • Service-oriented: product calls shared model services; route and data contracts cross network boundaries.
  • Disaggregated cluster: prefill, decode, cache, and routing scale independently; network and remote state become first-class.
  • Managed agent runtime: control plane provisions isolated task environments and durable state.
  • Hybrid: local context, tools, policy, or evidence combine with hosted model execution under explicit egress and fallback rules.
  • Browser/edge: capability detection, model packaging, update, offline behavior, and device limits shape the architecture.

Architecture review checklist

  • Name every runtime layer and execution unit.
  • Identify state owner, lifetime, retention, and deletion for each state class.
  • Pin contracts and versions crossing team or trust boundaries.
  • Document actor, tenant, workload, and delegated identity.
  • Classify tool side effects and prove idempotency or compensation.
  • Specify overload, deadline, cancellation, fallback, and degraded modes.
  • Keep system-of-record state outside model memory.
  • Correlate evidence without storing unnecessary raw sensitive input.
  • Test permission denial, partial side effects, crash recovery, duplicate delivery, and trace outage.
  • Assign an accountable owner for every failure boundary.
