Agentic Runtimes

A production guide to agentic runtimes: identity, context, MCP tools and resources, typed execution, memory scopes, durable workflows, human approval, policy, replay, evaluation, and failure containment.

Audience: Technical readers · Reading time: 7 minutes · Status: Production guidance · Last reviewed: 2026-06-21 UTC

Key takeaways

  • An agent framework expresses flows; a production runtime must also own authority, state, durability, policy, audit, and recovery.
  • Tool descriptions are not permissions. High-impact actions require deterministic authorization and approval outside prompt text.
  • Working, session, and long-term memory have different lifetimes, trust, retention, and review requirements.
  • Durable execution must survive process and network failure without duplicating irreversible side effects.
  • MCP standardizes tool/resource exchange, but product identity, policy, security, and approval remain runtime responsibilities.
  • Replayable traces and evaluation gates are required to investigate and improve long-running behavior.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Actor, tenant, normalized task, purpose/risk, permitted context classes, model constraints, tool allowlist, memory policy, budget, deadline, and approvals.

Owns

Task state, context assembly, routing, tool brokerage, policy checkpoints, approvals, durable workflow, memory lifecycle, evaluation, and traceability.

Emits

Structured result, evidence, tool outcomes, policy decisions, memory changes, human-review state, replay handle, cost/timing, and compensation status.

Does not own

Authority not explicitly delegated, unrestricted database access, or permission inferred solely from model output.

Failure modes

Prompt injection, confused deputy, unauthorized tool use, duplicate side effects, memory poisoning, infinite loops, context leakage, and stale state.

Evidence and metrics

Task success, tool success, approvals, policy denies, retries, steps, context provenance, cost, latency, escalations, memory writes, and replay completeness.

Framework versus runtime

Frameworks compose prompts, nodes, or model-directed graphs. A production runtime establishes identity, authority, durable state, policy, tools, evaluation, and recovery.

Implementation

Map every framework component to a runtime responsibility and identify external services such as workflow, policy, secrets, and observability.

Operational implications

Do not claim production durability or security from a graph API alone.

Measure

Durable completion, recovery, policy coverage, tool errors, and audit completeness.

Request boundary and authority

Every run needs an actor, tenant, purpose, task, risk, data policy, tool permissions, budget, deadline, and approval rules.

Implementation

Use a versioned request envelope. Verify identity and resolve delegated authority before context or tools are exposed.
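A minimal sketch of such an envelope, assuming a Python runtime. The field names, the `SUPPORTED_VERSIONS` set, and the `verified_tenants` mapping (standing in for whatever an identity provider asserts) are all hypothetical and only illustrate the checks described above.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

SUPPORTED_VERSIONS = {"1"}  # hypothetical envelope schema versions


@dataclass(frozen=True)
class RequestEnvelope:
    version: str
    actor_id: str            # taken from a verified credential, never from prompt text
    tenant_id: str
    purpose: str
    task: str
    risk: str                # illustrative labels such as "low" or "high"
    tool_allowlist: frozenset
    budget_usd: float
    deadline: datetime


def validate_envelope(env, verified_tenants, now):
    """Return a list of boundary violations; an empty list means the run may start."""
    errors = []
    if env.version not in SUPPORTED_VERSIONS:
        errors.append("unsupported envelope version")
    # The tenant must match what was verified for this actor, not what the caller claims.
    if verified_tenants.get(env.actor_id) != env.tenant_id:
        errors.append("tenant does not match verified identity")
    if env.deadline <= now:
        errors.append("deadline already passed")
    if env.budget_usd <= 0:
        errors.append("budget must be positive")
    return errors


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
verified = {"user-1": "tenant-a"}  # assumption: produced by identity verification
envelope = RequestEnvelope(
    version="1", actor_id="user-1", tenant_id="tenant-a", purpose="support",
    task="summarize ticket", risk="low", tool_allowlist=frozenset({"search_docs"}),
    budget_usd=1.0, deadline=now + timedelta(minutes=5),
)
spoofed = replace(envelope, tenant_id="tenant-b")  # user-controlled tenant field

ok_violations = validate_envelope(envelope, verified, now)
spoofed_violations = validate_envelope(spoofed, verified, now)
```

Validation runs before any context provider or tool is reachable, so a rejected envelope never exposes data.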

Operational implications

Downstream services must not trust prompt claims or user-controlled tenant fields.

Measure

Boundary validation failures, permission scope, policy decisions, and rejected requests.

Context assembly and provenance

Context providers return content with source, classification, freshness, tenant scope, and retrieval rationale.

Implementation

Assemble the smallest useful context, label untrusted content as data, and record included/rejected sources.
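The selection and recording steps can be sketched as follows. `ContextItem`, `assemble_context`, and `render_untrusted` are hypothetical names, and the token counts are assumed to be supplied by the provider. Wrapping content in a labeled block marks it as data for the model; it does not by itself stop prompt injection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextItem:
    source: str
    classification: str   # illustrative labels: "public", "internal", "restricted"
    tenant_id: str
    tokens: int
    text: str


def assemble_context(items, tenant_id, allowed_classes, token_budget):
    """Keep items in the given order and record why each rejected item was dropped."""
    included, rejected, used = [], [], 0
    for item in items:
        if item.tenant_id != tenant_id:
            rejected.append((item.source, "tenant mismatch"))
        elif item.classification not in allowed_classes:
            rejected.append((item.source, "classification blocked"))
        elif used + item.tokens > token_budget:
            rejected.append((item.source, "token budget exceeded"))
        else:
            included.append(item)
            used += item.tokens
    return included, rejected


def render_untrusted(item):
    """Present retrieved text inside a labeled data block rather than as instructions."""
    return f"<retrieved-data source={item.source!r}>\n{item.text}\n</retrieved-data>"


items = [
    ContextItem("kb/refunds", "internal", "tenant-a", 300, "Refunds take five days."),
    ContextItem("kb/payroll", "restricted", "tenant-a", 200, "Payroll runs monthly."),
    ContextItem("kb/other", "internal", "tenant-b", 100, "Another tenant's note."),
    ContextItem("kb/faq", "internal", "tenant-a", 900, "Long FAQ text."),
]
included, rejected = assemble_context(
    items, "tenant-a", {"public", "internal"}, token_budget=1000
)
```

Both lists belong in the trace: the rejected list is what lets a reviewer later explain why a source was missing from the model's context.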

Operational implications

Avoid raw database access when business definitions require a semantic layer or typed domain API.

Measure

Context bytes/tokens, source count, freshness, classification blocks, citations, and retrieval latency.

MCP tools and resources

MCP defines client/server exchange for tools, resources, prompts, and capabilities.

Implementation

Expose only permitted servers/capabilities; validate schemas, URI/resource scope, lifecycle, and server identity.
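A sketch of two of these checks, written against plain dictionaries rather than any particular MCP SDK. The helper names are hypothetical; the only protocol detail assumed is that an advertised tool carries a `name` and an `inputSchema`. The URI check ignores the host part of the URI, which a real deployment would also need to constrain.

```python
import posixpath
from urllib.parse import unquote, urlparse


def filter_tools(advertised_tools, allowlist):
    """Expose only allowlisted tools that also declare an input schema."""
    return [
        tool for tool in advertised_tools
        if tool.get("name") in allowlist and isinstance(tool.get("inputSchema"), dict)
    ]


def resource_in_scope(uri, allowed_scheme, allowed_root):
    """Check that a resource URI stays under an allowed path after normalization."""
    parsed = urlparse(uri)
    if parsed.scheme != allowed_scheme:
        return False
    # Decode percent-escapes and collapse ".." so traversal cannot escape the root.
    path = posixpath.normpath(unquote(parsed.path))
    root = allowed_root.rstrip("/") + "/"
    return path.startswith(root)


advertised = [
    {"name": "search_docs", "inputSchema": {"type": "object"}},
    {"name": "delete_account", "inputSchema": {"type": "object"}},
    {"name": "no_schema"},
]
visible = filter_tools(advertised, {"search_docs", "no_schema"})
```

Filtering happens in the runtime, before the tool list reaches the model, so a tool the run is not permitted to use is never proposed in the first place.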

Operational implications

Protocol interoperability does not grant business authorization or prove server safety.

Measure

Capabilities negotiated, tool/resource errors, server identity, and protocol version.

Tool brokerage

The broker converts a model proposal into a deterministic, authorized, idempotent execution.

Implementation

Validate schema and target, authorize, apply budgets/rate limits, require approval, execute with timeout, validate output, redact, and audit.
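The ordering of those checks can be sketched as a small in-memory broker. Everything here (`ToolSpec`, `Broker`, the decision strings) is hypothetical, and the sketch deliberately omits timeouts, rate limits, output validation, and redaction to stay short.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    arg_types: dict               # argument name -> expected Python type
    side_effect: str              # "read" or "write"
    handler: Callable[..., Any]


@dataclass
class Broker:
    tools: dict
    allowed: set                                        # tools this run may use
    approved_keys: set = field(default_factory=set)     # idempotency keys with approval
    results: dict = field(default_factory=dict)         # idempotency key -> stored result
    audit: list = field(default_factory=list)

    def execute(self, name, args, idempotency_key):
        decision = self._decide(name, args, idempotency_key)
        self.audit.append((name, idempotency_key, decision))
        if decision != "allow":
            return {"status": decision}
        if idempotency_key in self.results:
            return self.results[idempotency_key]        # repeat call: no second side effect
        result = {"status": "ok", "output": self.tools[name].handler(**args)}
        self.results[idempotency_key] = result
        return result

    def _decide(self, name, args, key):
        spec = self.tools.get(name)
        if spec is None:
            return "deny:unknown_tool"
        well_typed = set(args) == set(spec.arg_types) and all(
            isinstance(args[k], t) for k, t in spec.arg_types.items()
        )
        if not well_typed:
            return "deny:invalid_arguments"
        if name not in self.allowed:
            return "deny:unauthorized"
        if spec.side_effect == "write" and key not in self.approved_keys:
            return "pending:approval_required"
        return "allow"


sent = []

def send_email(to, body):
    sent.append(to)
    return "queued"

broker = Broker(
    tools={"send_email": ToolSpec("send_email", {"to": str, "body": str}, "write", send_email)},
    allowed={"send_email"},
)
call = {"to": "a@example.com", "body": "hi"}
before_approval = broker.execute("send_email", call, "run-1:step-3")
broker.approved_keys.add("run-1:step-3")    # stands in for a completed human approval
first = broker.execute("send_email", call, "run-1:step-3")
repeat = broker.execute("send_email", call, "run-1:step-3")
```

The model only ever supplies `name` and `args`; the allowlist, approvals, and idempotency key come from the runtime.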

Operational implications

Do not let the model choose credentials or infer write authority.

Measure

Validation/authorization/approval latency, tool success, retries, side-effect class, and denials.

Memory scopes

Working memory serves one run; session memory spans a conversation/case; long-term memory crosses sessions; systems of record remain authoritative.

Implementation

Use explicit schemas, provenance, confidence, tenant, owner, expiry, review, deletion, and conflict policy.
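One way to make those fields concrete is a typed record plus an admission check run before any write. The `MemoryRecord` fields and the rules in `admit_write` are illustrative assumptions, not a prescribed schema; deletion and conflict handling are left out.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class MemoryRecord:
    scope: str                 # "working", "session", or "long_term"
    tenant_id: str
    owner_id: str
    key: str
    value: str
    source: str                # provenance: where the fact came from
    confidence: float
    expires_at: datetime
    reviewed_by: Optional[str] = None


def admit_write(record, now):
    """Return (accepted, reason) for a proposed memory write."""
    if record.expires_at <= now:
        return False, "already expired"
    if not record.source:
        return False, "missing provenance"
    if record.scope == "long_term" and record.reviewed_by is None:
        return False, "long-term write requires review"
    if not 0.0 <= record.confidence <= 1.0:
        return False, "confidence out of range"
    return True, "accepted"


now = datetime(2026, 1, 1, tzinfo=timezone.utc)
candidate = MemoryRecord(
    scope="long_term", tenant_id="tenant-a", owner_id="user-1",
    key="preferred_language", value="en", source="ticket-123",
    confidence=0.8, expires_at=now + timedelta(days=90),
)
unreviewed = admit_write(candidate, now)
reviewed = admit_write(replace(candidate, reviewed_by="reviewer-7"), now)
```

Because the record is structured, model text can only enter durable memory as the `value` of a field that has an owner, a source, and an expiry.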

Operational implications

Never write arbitrary model text directly into durable memory.

Measure

Reads/writes by scope, approvals, expiry/deletion, conflicts, poisoning detections, and hit value.

Durable execution

Long tasks persist versioned state after meaningful transitions and resume after process or dependency failure.

Implementation

Use workflow checkpoints, timers, idempotency keys, activity heartbeats, and explicit compensation.

Operational implications

A model retry may be non-deterministic; a tool write may have succeeded despite timeout. Query authoritative state before replay.
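The ambiguous-timeout case can be sketched with an in-memory checkpoint and a stand-in system of record. `FakeLedger` and `run_charge_step` are hypothetical; a real deployment would use a workflow engine and the target system's own idempotency support rather than these dictionaries.

```python
class FakeLedger:
    """Stand-in for a system of record that deduplicates on an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, key, amount, fail_after_commit=False):
        self.charges.setdefault(key, amount)    # the write commits at most once per key
        if fail_after_commit:
            raise TimeoutError("response lost after the write committed")
        return "charged"

    def lookup(self, key):
        return self.charges.get(key)


def run_charge_step(checkpoint, ledger, run_id, amount, fail_after_commit=False):
    key = f"{run_id}:charge"                    # stable across retries of this step
    if checkpoint.get(key) == "done":
        return "skipped: already checkpointed"
    # Query authoritative state before replaying a step that may have succeeded.
    if ledger.lookup(key) is not None:
        checkpoint[key] = "done"
        return "reconciled: write had already succeeded"
    checkpoint[key] = "started"
    ledger.charge(key, amount, fail_after_commit)
    checkpoint[key] = "done"
    return "executed"


checkpoint, ledger = {}, FakeLedger()
try:
    run_charge_step(checkpoint, ledger, "run-42", 25, fail_after_commit=True)
except TimeoutError:
    pass  # ambiguous outcome: the caller cannot tell whether the write landed
state_after_timeout = checkpoint["run-42:charge"]
second = run_charge_step(checkpoint, ledger, "run-42", 25)
third = run_charge_step(checkpoint, ledger, "run-42", 25)
```

The resumed run reconciles against the ledger instead of charging again, so the timeout produces one charge rather than two.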

Measure

Resume success, duplicate prevention, ambiguous outcomes, compensation, and task age.

Human approval

Privileged or irreversible actions pause with a clear proposal, target, evidence, side effects, risk, and expiry.

Implementation

Bind approval to exact normalized arguments and a single-use or scoped token; authenticate reviewer authority.
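A sketch of the binding, assuming canonical JSON is an adequate normalization for the arguments (real normalization of units, casing, or identifiers is domain-specific). `ApprovalStore` is a hypothetical in-memory store, timestamps are plain numbers, and reviewer authentication is out of scope here.

```python
import hashlib
import hmac
import json
import secrets


def canonical_digest(tool, args):
    """Hash the tool name and arguments in a key-order-independent form."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class ApprovalStore:
    def __init__(self):
        self._pending = {}   # token -> (digest of approved call, expiry timestamp)

    def grant(self, tool, args, expires_at):
        token = secrets.token_urlsafe(16)
        self._pending[token] = (canonical_digest(tool, args), expires_at)
        return token

    def consume(self, token, tool, args, now):
        # pop() makes the token single-use, even when the check below fails.
        entry = self._pending.pop(token, None)
        if entry is None:
            return False
        digest, expires_at = entry
        return now < expires_at and hmac.compare_digest(digest, canonical_digest(tool, args))


store = ApprovalStore()
approved_args = {"order": "A1", "amount": 10}

token = store.grant("refund", approved_args, expires_at=100.0)
tampered = store.consume(token, "refund", {"order": "A1", "amount": 20}, now=50.0)

token2 = store.grant("refund", approved_args, expires_at=100.0)
same_call = store.consume(token2, "refund", {"amount": 10, "order": "A1"}, now=50.0)
reused = store.consume(token2, "refund", approved_args, now=51.0)

token3 = store.grant("refund", approved_args, expires_at=100.0)
expired = store.consume(token3, "refund", approved_args, now=100.0)
```

If the agent changes any argument after review, the digest no longer matches and the action needs a fresh approval.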

Operational implications

A vague “approve agent” button delegates too much.

Measure

Approval rate/time, expiry, changes after review, unauthorized approvals, and post-action verification.

Evaluation and replay

Evaluation can gate model output, tool proposals, or final task success. Replay reconstructs control decisions from versions and protected references.

Implementation

Record evaluator/version/criteria/evidence and store trace/state references without exposing hidden chain-of-thought.
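As an illustration, an evaluation record can carry the evaluator identity and a pointer to evidence instead of raw model text. The field names, the `vault://` reference scheme, and the helper functions below are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationRecord:
    run_id: str
    step: int
    evaluator: str
    evaluator_version: str
    criteria: str
    score: float
    threshold: float
    evidence_ref: str     # pointer into a protected store, not the content itself

    @property
    def passed(self):
        return self.score >= self.threshold


def first_failure(records):
    """Return the first record that should halt execution, or None if all passed."""
    return next((r for r in records if not r.passed), None)


def replay_gaps(records, expected_steps):
    """List step numbers with no evaluation record, as a replay-completeness check."""
    seen = {r.step for r in records}
    return [step for step in range(expected_steps) if step not in seen]


records = [
    EvaluationRecord("run-42", 0, "schema-check", "1.2", "valid tool arguments", 1.0, 1.0, "vault://run-42/0"),
    EvaluationRecord("run-42", 1, "grounding", "0.9", "claims cite sources", 0.6, 0.8, "vault://run-42/1"),
    EvaluationRecord("run-42", 3, "schema-check", "1.2", "valid tool arguments", 1.0, 1.0, "vault://run-42/3"),
]
blocking = first_failure(records)
gaps = replay_gaps(records, expected_steps=4)
```

Recording the evaluator version alongside the score is what allows a later replay to explain why a gate passed or failed at the time.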

Operational implications

Replay may reproduce workflow decisions without identical stochastic text.

Measure

Evaluation coverage/score, blocked actions, replay completeness, trace gaps, and incident resolution.

Semantic layer integration

Governed metrics and business joins belong behind typed domain interfaces rather than arbitrary model-generated SQL.

Implementation

Expose approved semantic queries or APIs with identity, row/column policy, result limits, and provenance.
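A minimal sketch of such an interface: the model proposes a metric name and dimensions, and the runtime builds the query only from an approved catalog. `APPROVED_METRICS`, the metric name, and `build_semantic_query` are hypothetical, and column policy is reduced to a set of dimensions the actor may read.

```python
APPROVED_METRICS = {
    # hypothetical governed metric with a pinned definition version
    "monthly_revenue": {"version": "v3", "dimensions": {"region", "month", "customer_id"}},
}


def build_semantic_query(metric, dimensions, readable_columns, limit, max_limit=1000):
    """Turn a proposed metric request into a bounded, policy-checked query description."""
    spec = APPROVED_METRICS.get(metric)
    if spec is None:
        raise ValueError(f"unknown metric: {metric}")
    requested = set(dimensions)
    unknown = requested - spec["dimensions"]
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    denied = requested - set(readable_columns)
    if denied:
        raise PermissionError(f"denied fields: {sorted(denied)}")
    return {
        "metric": metric,
        "metric_version": spec["version"],      # recorded for provenance
        "dimensions": sorted(requested),
        "limit": min(limit, max_limit),         # result limit enforced by the runtime
    }


query = build_semantic_query(
    "monthly_revenue", ["region", "month"], {"region", "month"}, limit=50_000
)

try:
    build_semantic_query("monthly_revenue", ["customer_id"], {"region", "month"}, limit=10)
    denied_raised = False
except PermissionError:
    denied_raised = True
```

The model never writes SQL in this design; it can only pick from names the catalog already defines.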

Operational implications

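Routing queries through governed interfaces keeps metric definitions consistent, enforces access policy in one place, makes queries observable, and lets definitions change without rewriting prompts.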

Measure

Query validity, denied fields, result limits, metric version, and citation/provenance.

Reference tables

Agent stack boundaries

| Component | Primary responsibility | What it does not prove |
| --- | --- | --- |
| Agent framework | Express model-driven flow | Production durability or tool authority |
| Agentic runtime | Governed execution and state | Business authority beyond policy |
| Tool protocol | Discovery and typed exchange | Tool safety or user approval |
| Workflow engine | Durable steps, timers, retries | AI-specific context/evaluation |
| Observability layer | Traces, metrics, logs, evaluations | Permission to act |
| Product application | UX and business workflow | Low-level execution efficiency |

Tool call lifecycle

| Stage | Runtime action | Evidence |
| --- | --- | --- |
| Discover | Expose permitted capabilities | Catalog/server version and scope |
| Propose | Model returns typed call | Structured arguments |
| Validate | Schema, target, business rules | Validation result |
| Authorize | Policy/delegated authority | Decision ID/reason |
| Approve | Human/independent gate | Approver and expiry |
| Execute | Timeout, rate, idempotency, sandbox | Invocation and side-effect class |
| Validate result | Schema/safety checks | Status and redaction |
| Commit state | Workflow/memory update | Versioned state change |
| Trace | Link all events | Trace/replay handle |

Memory scope and control

| Scope | Lifetime | Typical content | Primary risk |
| --- | --- | --- | --- |
| Working | One task/run | Plan, intermediate results, counters | Context overflow/stale branch |
| Session/thread | Conversation or case | Preferences and unresolved state | Cross-user leakage |
| Long-term user | Across sessions | Approved stable facts/preferences | Poisoning/unwanted retention |
| Organizational | Shared durable knowledge | Policies and reviewed facts | Broad blast radius |
| System of record | Business-defined | Authoritative records | Irreversible side effects |

Decision checklist

  1. What identity and tenant scope enter every run?
  2. Which authority is delegated, for how long, and over which resources?
  3. How is context classified, minimized, and traced?
  4. Which tools are visible and which actions require approval?
  5. How are retries idempotent across model and tool steps?
  6. What memory scopes exist and who may write/delete them?
  7. How can a run resume after process or dependency failure?
  8. Which evaluation or policy gate can halt execution?
  9. What evidence is retained for replay without leaking secrets?

Common mistakes

  • Calling prompt templates and tool calling a production runtime.
  • Treating tool descriptions as authorization.
  • Giving agents raw database access instead of governed domain interfaces.
  • Writing model output directly into long-term memory.
  • Retrying irreversible tools after ambiguous timeouts.
  • Holding accelerator reservations idle while tool calls run.
  • Logging secrets or full sensitive prompts.
  • Assuming MCP supplies product-specific policy.
  • Exposing hidden chain-of-thought instead of evidence and decisions.

Sources and further reading


  1. Model Context Protocol specification. MCP · Protocol specification · accessed 2026-06-21 UTC
  2. MCP tools. MCP · Protocol specification · accessed 2026-06-21 UTC
  3. MCP resources. MCP · Protocol specification · accessed 2026-06-21 UTC
  4. Temporal durable execution. Temporal · Official documentation · accessed 2026-06-21 UTC
  5. LangGraph persistence. LangGraph · Official documentation · accessed 2026-06-21 UTC
  6. OpenTelemetry concepts. OpenTelemetry · Official documentation · accessed 2026-06-21 UTC
  7. NIST AI Risk Management Framework. NIST · Government framework · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC
