Execution Models

Compare eager, graph, JIT, AOT, interpreted, dataflow, and hybrid AI execution models, including dynamic shapes, guards, warmup, caches, fallback, and deployment trade-offs.

Audience: Technical readers · Reading time: 6 minutes · Status: Foundational

Key takeaways

  • Eager execution maximizes flexibility and debuggability but exposes host-language and per-operation dispatch costs.
  • Graph execution enables whole-program optimization, partitioning, memory planning, and reproducible artifacts.
  • JIT compilation specializes against observed inputs and hardware; AOT compilation moves work into a controlled build pipeline.
  • Dynamic shapes are implemented through symbolic dimensions, guards, profiles, padding, recompilation, or fallback; they are not one universal capability.
  • Most production systems are hybrid and must expose graph breaks, cache behavior, and fallback as operational evidence.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Framework code or an exported graph, tensor shapes and dtypes, control-flow assumptions, target capabilities, and optimization policy.

Owns

Capture semantics, specialization boundaries, compilation timing, cache keys, graph-break behavior, and fallback policy.

Emits

Immediate operator calls, an optimized graph, compiled modules, guard sets, cache entries, or an executable artifact.

Does not own

Business workflow, model-quality acceptance, request authorization, or deployment rollout.

Failure modes

Graph breaks, recompilation storms, unsupported control flow, stale caches, silent fallback, and shape-dependent correctness drift.

Evidence and metrics

Capture coverage, compile time, warmup latency, cache hit rate, recompilation count, fallback share, peak memory, and steady-state latency.

Eager and imperative execution

Operations execute as the host program reaches them. Ordinary language control flow and side effects are easy to express, but the runtime sees less of the whole program.

Implementation

Keep preprocessing, model calls, and postprocessing explicit. Profile host dispatch and synchronization rather than attributing all time to kernels.
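A thin timing wrapper is one way to keep the three stages explicit and to see where host-side time goes. The sketch below is a minimal illustration in plain Python. `DispatchProfiler`, `preprocess`, `model`, and `postprocess` are hypothetical stand-ins, not part of any framework. On an accelerator that runs work asynchronously, a wrapper like this would mostly capture host dispatch time unless the device is synchronized before the clock is read.

```python
import time
from collections import defaultdict


class DispatchProfiler:
    """Accumulates host-side wall time and call counts per stage name."""

    def __init__(self):
        self.total_s = defaultdict(float)
        self.calls = defaultdict(int)

    def run(self, name, fn, *args):
        start = time.perf_counter()
        result = fn(*args)
        self.total_s[name] += time.perf_counter() - start
        self.calls[name] += 1
        return result


# Hypothetical stages; a real system would call framework operators here.
def preprocess(xs):
    return [float(x) for x in xs]


def model(xs):
    return [2.0 * x + 1.0 for x in xs]


def postprocess(ys):
    return [round(y, 2) for y in ys]


prof = DispatchProfiler()
x = prof.run("preprocess", preprocess, [1, 2, 3])
y = prof.run("model", model, x)
out = prof.run("postprocess", postprocess, y)

for name in ("preprocess", "model", "postprocess"):
    print(f"{name}: calls={prof.calls[name]} total_s={prof.total_s[name]:.6f}")
```

Reporting time per stage rather than one total makes it harder to attribute all latency to kernels by default.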

Operational implications

Use eager mode for exploration, debugging, unsupported dynamic behavior, or controlled fallbacks. Avoid letting an accidental eager path become an invisible production default.

Measure

Per-op dispatch time, host CPU, synchronization count, eager fallback rate, and end-to-end latency.

Static graph execution

A captured graph makes operations, data dependencies, constants, types, and often shape constraints available before repeated execution.

Implementation

Validate graph semantics, run rewrites, partition supported subgraphs, plan buffers, and serialize a versioned artifact or execution plan.
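The pipeline above can be sketched on a toy graph. This is an illustrative model only, assuming a made-up `Node` type, a tiny op vocabulary, and a hypothetical `SUPPORTED` set. Real runtimes use much richer intermediate representations and rewrite passes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    op: str                      # "const", "input", "add", "mul", "custom"
    inputs: tuple = ()
    value: Optional[float] = None


def validate(graph):
    """Check that every input refers to a node defined earlier."""
    seen = set()
    for node in graph:
        missing = [i for i in node.inputs if i not in seen]
        if missing:
            raise ValueError(f"{node.name} uses undefined inputs {missing}")
        seen.add(node.name)


def fold_constants(graph):
    """Rewrite: replace an add of known constants with a single constant."""
    consts, out = {}, []
    for node in graph:
        if node.op == "const":
            consts[node.name] = node.value
        if node.op == "add" and all(i in consts for i in node.inputs):
            value = sum(consts[i] for i in node.inputs)
            node = Node(node.name, "const", (), value)
            consts[node.name] = value
        out.append(node)
    return out


def prune_unused(graph, output):
    """Drop nodes that the output no longer depends on."""
    needed, kept = {output}, []
    for node in reversed(graph):
        if node.name in needed:
            needed.update(node.inputs)
            kept.append(node)
    return kept[::-1]


def partition(graph, supported):
    """Group consecutive nodes by whether the assumed backend supports them."""
    parts = []
    for node in graph:
        backend = "accelerator" if node.op in supported else "fallback"
        if parts and parts[-1][0] == backend:
            parts[-1][1].append(node.name)
        else:
            parts.append((backend, [node.name]))
    return parts


SUPPORTED = {"const", "input", "mul"}   # hypothetical backend capability
graph = [
    Node("a", "const", (), 2.0),
    Node("b", "const", (), 3.0),
    Node("c", "add", ("a", "b")),
    Node("x", "input"),
    Node("y", "mul", ("x", "c")),
    Node("z", "custom", ("y",)),
]
validate(graph)
optimized = prune_unused(fold_constants(graph), output="z")
parts = partition(optimized, SUPPORTED)
print(len(graph), len(optimized), parts)
```

The node counts before and after rewriting, and the partition list with its fallback segment, correspond to the kind of evidence a release would record.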

Operational implications

Treat graph capture coverage and unsupported nodes as release evidence. A graph that executes partly on an unintended backend can remain numerically correct while missing SLOs.

Measure

Graph node count, optimized node count, partition count, provider placement, load time, and parity results.

Tracing, scripting, and export

Tracing observes operations for representative inputs, while scripting or export mechanisms attempt to preserve more explicit control flow and constraints.

Implementation

Version representative fixtures, record which branches were captured, and include dynamic-shape constraints or guards in the artifact manifest.
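A toy tracer shows why recording captured branches matters: tracing only sees the paths its fixtures exercise. The sketch below is plain Python with hypothetical names (`trace`, `model`, `manifest`, `ALL_BRANCHES`). It does not reproduce any framework's tracing API, and it assumes the full set of branches is known in advance.

```python
ALL_BRANCHES = {"non_negative", "negative"}   # assumed to be known up front


def model(x, record):
    """Toy model with data-dependent control flow."""
    if x >= 0:
        record["branches"].add("non_negative")
        record["ops"].append("double")
        return 2 * x
    record["branches"].add("negative")
    record["ops"].append("negate")
    return -x


def trace(fn, fixtures):
    """Run fn on each fixture, recording the ops and branches observed."""
    record = {"ops": [], "branches": set()}
    for x in fixtures:
        fn(x, record)
    return record


def manifest(record, fixture_version):
    """Summarize what the trace did and did not capture."""
    captured = sorted(record["branches"])
    return {
        "fixture_version": fixture_version,
        "captured_branches": captured,
        "uncaptured_branches": sorted(ALL_BRANCHES - record["branches"]),
        "branch_coverage": len(captured) / len(ALL_BRANCHES),
    }


# Fixtures that only exercise the non-negative path.
m = manifest(trace(model, [1.0, 3.0]), fixture_version="fixtures-v1")
print(m)
```

With these fixtures the manifest reports the `negative` branch as uncaptured, which is the signal to add a fixture or test that branch independently.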

Operational implications

Re-run export when framework, model, tokenizer, or shape assumptions change. Test rarely used branches independently.

Measure

Captured branch coverage, export failures, guard failures, and unrepresented branch incidents.

Just-in-time compilation

JIT compiles at first use or when an existing specialization does not match. It can optimize for the actual device and shapes.

Implementation

Build strong cache keys from model hash, compiler/runtime version, device capability, precision, and shape constraints. Bound the number of variants.
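As a rough sketch of this idea, the key can be a hash over every field that affects generated code, and the cache can evict the least recently used variant once a limit is reached. The function and class names, the field values, and the string "modules" below are all hypothetical placeholders, not taken from any specific compiler.

```python
import hashlib
import json
from collections import OrderedDict


def cache_key(model_bytes, compiler_version, device_capability, precision, shape_constraints):
    """Hash every input that can change the generated code."""
    payload = {
        "model": hashlib.sha256(model_bytes).hexdigest(),
        "compiler": compiler_version,
        "device": device_capability,
        "precision": precision,
        "shapes": shape_constraints,
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


class BoundedCompileCache:
    """LRU cache that caps the number of compiled variants."""

    def __init__(self, max_variants):
        self.max_variants = max_variants
        self.entries = OrderedDict()
        self.hits = self.misses = self.evictions = 0

    def get_or_compile(self, key, compile_fn):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1
        artifact = compile_fn()
        self.entries[key] = artifact
        if len(self.entries) > self.max_variants:
            self.entries.popitem(last=False)   # evict least recently used
            self.evictions += 1
        return artifact


cache = BoundedCompileCache(max_variants=2)
for batch in (1, 8, 1, 16):
    key = cache_key(b"weights", "compiler-1.0", "sm_90", "fp16", {"batch": batch})
    # The lambda stands in for a real compilation step.
    cache.get_or_compile(key, lambda b=batch: f"module[batch={b}]")

print(cache.hits, cache.misses, cache.evictions)
```

The hit, miss, and eviction counters map directly onto cache hit rate and variants per model, so the bound can be tuned against observed traffic instead of guessed.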

Operational implications

Separate cold, warm, and recompilation latency. Production nodes that compile code require security, storage, and cache-eviction controls.

Measure

Cold compile time, JIT cache hit rate, variants per model, recompilations per request class, and warm goodput.

Ahead-of-time compilation

AOT creates target-ready artifacts before deployment. It can reduce target footprint and make startup, signing, and reproducibility more predictable.

Implementation

Store toolchain, target, ABI, precision, shape profiles, and artifact hashes. Build a compatibility matrix for runtime, driver, and hardware.
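One possible shape for such a record is a small manifest stored next to the artifact and checked at load time. The sketch below is an assumption-laden illustration: the `ArtifactManifest` fields mirror the list above, while the host description format, the field values, and `check_compatibility` are hypothetical.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ArtifactManifest:
    toolchain: str
    target: str
    abi: str
    precision: str
    shape_profiles: tuple
    artifact_sha256: str


def check_compatibility(manifest, artifact, host):
    """Return a list of reasons the artifact should not be loaded on this host."""
    problems = []
    if sha256_hex(artifact) != manifest.artifact_sha256:
        problems.append("artifact hash mismatch")
    for field in ("target", "abi"):
        if host.get(field) != getattr(manifest, field):
            problems.append(f"{field} mismatch: host={host.get(field)!r}")
    if manifest.precision not in host.get("precisions", ()):
        problems.append(f"precision {manifest.precision} unsupported")
    return problems


artifact = b"compiled-module-bytes"   # placeholder payload
manifest = ArtifactManifest(
    toolchain="example-compiler 1.0",
    target="arm64-npu",
    abi="v2",
    precision="int8",
    shape_profiles=((1, 128), (1, 512)),
    artifact_sha256=sha256_hex(artifact),
)

ok = check_compatibility(
    manifest, artifact,
    {"target": "arm64-npu", "abi": "v2", "precisions": ("int8", "fp16")},
)
bad = check_compatibility(
    manifest, artifact + b"x",
    {"target": "x86_64", "abi": "v2", "precisions": ("fp16",)},
)
print(ok, bad)
```

Returning every mismatch, rather than failing on the first, makes it easier to fill in a compatibility matrix from load-time reports.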

Operational implications

Use AOT where cold starts, edge footprint, regulated build provenance, or offline deployment matter more than late specialization.

Measure

Build duration, artifact size, load/warmup time, compatibility failures, and reproducibility hashes.

Dynamic shapes, guards, and profiles

Dynamic-shape support ranges from bounded symbolic dimensions to multiple static profiles or generalized kernels.

Implementation

Model the production shape distribution, define accepted ranges, and decide whether out-of-profile inputs pad, recompile, route, or fail.
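A small policy function can make that decision explicit. The sketch below assumes three compiled sequence-length profiles and an accepted upper bound. The bucket sizes, the `plan` and `padding_overhead` helpers, and the choice to route rather than recompile out-of-profile inputs are all illustrative assumptions.

```python
import bisect

BUCKETS = [128, 256, 512]   # assumed compiled sequence-length profiles
MAX_ACCEPTED = 1024         # assumed documented upper bound


def plan(seq_len):
    """Return (action, bucket, padding) for one input length."""
    if seq_len < 1 or seq_len > MAX_ACCEPTED:
        return ("fail", None, 0)
    i = bisect.bisect_left(BUCKETS, seq_len)
    if i < len(BUCKETS):
        bucket = BUCKETS[i]
        return ("pad", bucket, bucket - seq_len)
    # Accepted but larger than any profile: send to a general path.
    return ("route", None, 0)


def padding_overhead(lengths):
    """Fraction of padded elements among all elements sent to profiles."""
    padded = [(b, p) for action, b, p in map(plan, lengths) if action == "pad"]
    total = sum(b for b, _ in padded)
    return sum(p for _, p in padded) / total if total else 0.0


for n in (100, 128, 600, 2000):
    print(n, plan(n))
print(padding_overhead([100, 128, 200]))
```

Running the policy over a sample of production lengths gives shape-profile coverage and padding overhead before any compiled artifact exists.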

Operational implications

Alert on guard failures and compilation churn. Test worst-case memory, not only common shapes.

Measure

Shape-profile coverage, guard failure rate, padding overhead, compilation churn, and peak memory by profile.

Hybrid execution

Production systems commonly compile a stable model region while retaining dynamic routing, retrieval, tools, or unsupported operators outside it.

Implementation

Define every boundary, data transfer, synchronization point, and fallback. Keep product workflow separate from model-execution optimization.
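To illustrate an explicit boundary with a recorded fallback, the sketch below guards a stand-in compiled region and logs which path each request took. All names (`guard`, `compiled_region`, `eager_region`, `handle`) are hypothetical, and both regions compute the same result so that only the path differs.

```python
from collections import Counter


def guard(x):
    """Assumed specialization: the compiled region only accepts length 4."""
    return len(x) == 4


def compiled_region(x):
    # Stand-in for a compiled hot path.
    return [v * 2 for v in x]


def eager_region(x):
    # Stand-in for the flexible eager fallback.
    return [v * 2 for v in x]


path_counts = Counter()
trace_log = []


def handle(request_id, x):
    """Execute one request and record which path it used."""
    path = "compiled" if guard(x) else "eager_fallback"
    y = compiled_region(x) if path == "compiled" else eager_region(x)
    path_counts[path] += 1
    trace_log.append({"request": request_id, "path": path})
    return y


handle("r1", [1, 2, 3, 4])
handle("r2", [1, 2, 3])
fallback_share = path_counts["eager_fallback"] / sum(path_counts.values())
print(trace_log, fallback_share)
```

Because the outputs are identical on both paths, only the trace log and the fallback share reveal that a request left the compiled region.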

Operational implications

Prefer explicit hybrid architecture over claims that an entire application is compiled. Trace which path each request used.

Measure

Compiled-region coverage, boundary transfer time, synchronization count, fallback share, and end-to-end goodput.

Reference tables

Execution model comparison
| Model | Compilation point | Flexibility | Primary strength | Primary risk |
|---|---|---|---|---|
| Eager / imperative | None or per operation | Highest | Debugging and dynamic behavior | Dispatch overhead and weak global optimization |
| Static graph | Export or load | Moderate | Whole-graph rewrites and reproducibility | Unsupported dynamic behavior |
| JIT | First use or specialization | High behind guards | Device- and shape-specific code | Warmup and recompilation storms |
| AOT | Build/package time | Lowest | Predictable startup and compact targets | Compatibility matrix and reduced flexibility |
| Dataflow / streaming | Varies | Pipeline-oriented | Continuous asynchronous inputs | Backpressure and state coordination |
| Hybrid | Multiple stages | Balanced | Compiled hot path with dynamic boundaries | Boundary and fallback complexity |

Decision checklist

  1. Which inputs, shapes, dtypes, and branches occur in production?
  2. Where may graph breaks occur and what synchronization do they introduce?
  3. Can production nodes compile code or must artifacts be built and signed elsewhere?
  4. How many compiled variants can be cached safely?
  5. What happens when an operator, shape, or device is unsupported?
  6. How are warmup, cache misses, and recompilation represented in SLOs?

Common mistakes

  • Calling a framework “compiled” without identifying capture coverage and fallback behavior.
  • Benchmarking only one warmed shape while production traffic has broad variability.
  • Treating dynamic-shape support as unlimited instead of documenting ranges and guards.
  • Shipping AOT artifacts without runtime, driver, precision, and target metadata.
  • Hiding CPU or eager fallback because outputs remain numerically valid.

Sources and further reading


  1. torch.compile. PyTorch · Official documentation · accessed 2026-06-21 UTC
  2. torch.export. PyTorch · Official documentation · accessed 2026-06-21 UTC
  3. ONNX Runtime high-level design. ONNX Runtime · Official documentation · accessed 2026-06-21 UTC
  4. ExecuTorch overview. PyTorch · Official documentation · accessed 2026-06-21 UTC
  5. StableHLO specification. OpenXLA · Official specification · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC
