Execution Models - aRuntime.com

Key takeaways

Eager execution maximizes flexibility and debuggability but exposes host-language and per-operation dispatch costs.
Graph execution enables whole-program optimization, partitioning, memory planning, and reproducible artifacts.
JIT compilation specializes against observed inputs and hardware; AOT compilation moves work into a controlled build pipeline.
Dynamic shapes are implemented through symbolic dimensions, guards, profiles, padding, recompilation, or fallback; they are not one universal capability.
Most production systems are hybrid and must expose graph breaks, cache behavior, and fallback as operational evidence.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Framework code or an exported graph, tensor shapes and dtypes, control-flow assumptions, target capabilities, and optimization policy.

Owns

Capture semantics, specialization boundaries, compilation timing, cache keys, graph-break behavior, and fallback policy.

Emits

Immediate operator calls, an optimized graph, compiled modules, guard sets, cache entries, or an executable artifact.

Does not own

Business workflow, model-quality acceptance, request authorization, or deployment rollout.

Failure modes

Graph breaks, recompilation storms, unsupported control flow, stale caches, silent fallback, and shape-dependent correctness drift.

Evidence and metrics

Capture coverage, compile time, warmup latency, cache hit rate, recompilation count, fallback share, peak memory, and steady-state latency.

Eager and imperative execution

Operations execute as the host program reaches them. Ordinary language control flow and side effects are easy to express, but the runtime sees less of the whole program.

Implementation

Keep preprocessing, model calls, and postprocessing explicit. Profile host dispatch and synchronization rather than attributing all time to kernels.
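
As a minimal sketch of attributing time to individual operator calls rather than to kernels alone, the toy below wraps each call in a host-side timer. The `OpProfiler` class and the `add` and `scale` operators are hypothetical stand-ins, not any framework's API. The toy is synchronous, so wall time per call is meaningful here; on an asynchronous device a real measurement would need the framework's own profiler and explicit synchronization.

```python
import time
from collections import defaultdict


class OpProfiler:
    """Hypothetical per-op timer for an eager, synchronous toy runtime."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    def run(self, name, fn, *args):
        # Time one operator call as the host program reaches it.
        start = time.perf_counter()
        result = fn(*args)
        self.seconds[name] += time.perf_counter() - start
        self.calls[name] += 1
        return result

    def report(self):
        # Mean wall time per call, keyed by operator name.
        return {name: self.seconds[name] / self.calls[name] for name in self.calls}


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def scale(a, k):
    return [x * k for x in a]


profiler = OpProfiler()
x = [1.0, 2.0, 3.0]
for _ in range(100):
    y = profiler.run("add", add, x, x)
    y = profiler.run("scale", scale, y, 0.5)
```

Comparing the call counts with the per-call means from `profiler.report()` shows whether many small operations or a few large ones dominate host time.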

Operational implications

Use eager mode for exploration, debugging, unsupported dynamic behavior, or controlled fallbacks. Avoid letting an accidental eager path become an invisible production default.

Measure

Per-op dispatch time, host CPU, synchronization count, eager fallback rate, and end-to-end latency.

Static graph execution

A captured graph makes operations, data dependencies, constants, types, and often shape constraints available before repeated execution.

Implementation

Validate graph semantics, run rewrites, partition supported subgraphs, plan buffers, and serialize a versioned artifact or execution plan.
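
The partitioning step can be sketched as follows, assuming a graph that is already flattened into a linear, topologically ordered node list. The `Node` type, the operator names, and the backend labels are illustrative assumptions; real partitioners work on a DAG and weigh transfer costs.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Node:
    name: str
    op: str


def partition(nodes, supported_ops):
    """Group consecutive nodes by whether the accelerated backend supports them."""
    parts = []
    for is_supported, run in groupby(nodes, key=lambda n: n.op in supported_ops):
        backend = "accelerated" if is_supported else "fallback"
        parts.append((backend, [n.name for n in run]))
    return parts


graph = [
    Node("conv1", "conv"),
    Node("relu1", "relu"),
    Node("nms", "custom_nms"),  # hypothetical unsupported operator
    Node("fc", "matmul"),
]
parts = partition(graph, supported_ops={"conv", "relu", "matmul"})

# Nodes that did not land on the intended backend; useful as release evidence.
fallback_nodes = [
    name for backend, names in parts if backend == "fallback" for name in names
]
```

In this toy, one unsupported node splits the graph into three partitions, which is the kind of placement detail worth recording alongside the artifact.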

Operational implications

Treat graph capture coverage and unsupported nodes as release evidence. A graph that executes partly on an unintended backend can remain numerically correct while missing SLOs.

Measure

Graph node count, optimized node count, partition count, execution-provider (backend) placement, load time, and parity results.

Tracing, scripting, and export

Tracing observes operations for representative inputs, while scripting or export mechanisms attempt to preserve more explicit control flow and constraints.

Implementation

Version representative fixtures, record which branches were captured, and include dynamic-shape constraints or guards in the artifact manifest.
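
To illustrate why captured branches should be recorded, the sketch below uses a hypothetical `Tracer` that logs the operations and branch labels seen for each fixture. It is a toy model of tracing, not any framework's tracer: the point is only that an input-dependent branch that no fixture exercises never appears in the capture.

```python
class Tracer:
    """Hypothetical recorder: logs the ops and branches seen for one input."""

    def __init__(self):
        self.ops = []
        self.branches = set()

    def op(self, name, value):
        self.ops.append(name)
        return value

    def branch(self, label):
        self.branches.add(label)


def model(x, t):
    # Data-dependent control flow: only one side runs per input.
    if sum(x) > 0:
        t.branch("positive")
        return t.op("scale", [v * 2 for v in x])
    t.branch("non_positive")
    return t.op("negate", [-v for v in x])


ALL_BRANCHES = {"positive", "non_positive"}


def branch_coverage(fixtures):
    """Return (branches seen, branches missing) across all fixtures."""
    seen = set()
    for x in fixtures:
        t = Tracer()
        model(x, t)
        seen |= t.branches
    return seen, ALL_BRANCHES - seen


# Both fixtures take the same branch, so the other branch is never captured.
seen, missing = branch_coverage([[1, 2], [3, 4]])
```

A non-empty `missing` set is a signal to add fixtures or to switch to a capture mechanism that preserves the control flow explicitly.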

Operational implications

Re-run export when framework, model, tokenizer, or shape assumptions change. Test rarely used branches independently.

Measure

Captured branch coverage, export failures, guard failures, and incidents caused by branches missing from the capture.

Just-in-time compilation

JIT compiles at first use or when an existing specialization does not match. It can optimize for the actual device and shapes.

Implementation

Build strong cache keys from model hash, compiler/runtime version, device capability, precision, and shape constraints. Bound the number of variants.
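
One way to sketch such a key and a bounded variant cache, using only the standard library. The field values (`"abc123"`, `"1.0"`, `"sm_80"`, `"fp16"`) are placeholders, and `BoundedCompileCache` is a hypothetical class, not part of any compiler runtime; least-recently-used eviction is an assumed policy.

```python
import hashlib
import json
from collections import OrderedDict


def cache_key(model_hash, compiler_version, device_capability, precision, shape):
    """Hash every input that can change the generated code."""
    payload = json.dumps(
        {
            "model": model_hash,
            "compiler": compiler_version,
            "device": device_capability,
            "precision": precision,
            "shape": list(shape),
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class BoundedCompileCache:
    """LRU cache that caps the number of compiled variants."""

    def __init__(self, max_variants):
        self.max_variants = max_variants
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compile(self, key, compile_fn):
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        artifact = compile_fn()
        self.entries[key] = artifact
        if len(self.entries) > self.max_variants:
            self.entries.popitem(last=False)  # evict least recently used
        return artifact


cache = BoundedCompileCache(max_variants=2)
for shape in [(1, 128), (1, 128), (1, 256), (1, 512), (1, 128)]:
    key = cache_key("abc123", "1.0", "sm_80", "fp16", shape)
    cache.get_or_compile(key, lambda s=shape: f"compiled{s}")
```

With a cap of two variants, the final `(1, 128)` request misses because that entry was evicted earlier. That is the recompilation churn the variant bound trades against memory and storage.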

Operational implications

Separate cold, warm, and recompilation latency. Production nodes that compile code require security, storage, and cache-eviction controls.
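
A small sketch of why the three latency classes should be reported separately. The request log and its numbers are invented for illustration, and the phase labels are assumed to be attached by the serving layer.

```python
from collections import defaultdict
from statistics import median

# Hypothetical request log: (phase, latency in milliseconds).
samples = [
    ("cold", 4200.0),
    ("warm", 12.0),
    ("warm", 11.0),
    ("recompile", 1800.0),
    ("warm", 13.0),
]


def latency_by_phase(samples):
    """Summarize latency per phase instead of blending all requests."""
    grouped = defaultdict(list)
    for phase, ms in samples:
        grouped[phase].append(ms)
    return {
        phase: {"count": len(v), "median_ms": median(v), "max_ms": max(v)}
        for phase, v in grouped.items()
    }


summary = latency_by_phase(samples)

# A single blended mean hides the fact that warm requests are fast.
blended_mean_ms = sum(ms for _, ms in samples) / len(samples)
```

In this invented log the blended mean is roughly a hundred times the warm median, so a single aggregate would describe none of the three request classes accurately.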

Measure

Cold compile time, JIT cache hit rate, variants per model, recompilations per request class, and warm goodput.

Ahead-of-time compilation

AOT creates target-ready artifacts before deployment. It can reduce target footprint and make startup, signing, and reproducibility more predictable.

Implementation

Store toolchain, target, ABI, precision, shape profiles, and artifact hashes. Build a compatibility matrix for runtime, driver, and hardware.
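
The stored metadata could take the shape of a manifest like the one sketched below. `ArtifactManifest`, its field names, and the example values (`"examplecc-1.2.3"`, `"arm64-example"`, `"v1"`) are hypothetical; a real manifest would follow the conventions of the toolchain and packaging system in use, and would also cover runtime and driver versions.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ArtifactManifest:
    toolchain: str
    target: str
    abi: str
    precision: str
    shape_profiles: tuple
    artifact_sha256: str


def build_manifest(artifact_bytes, **fields):
    """Attach a content hash to the build metadata."""
    digest = hashlib.sha256(artifact_bytes).hexdigest()
    return ArtifactManifest(artifact_sha256=digest, **fields)


def check_compatible(manifest, artifact_bytes, runtime_target, runtime_abi):
    """Return a list of problems; an empty list means the artifact may load."""
    problems = []
    if hashlib.sha256(artifact_bytes).hexdigest() != manifest.artifact_sha256:
        problems.append("artifact hash mismatch")
    if manifest.target != runtime_target:
        problems.append("target mismatch")
    if manifest.abi != runtime_abi:
        problems.append("ABI mismatch")
    return problems


blob = b"example compiled module"  # stand-in for real artifact bytes
manifest = build_manifest(
    blob,
    toolchain="examplecc-1.2.3",
    target="arm64-example",
    abi="v1",
    precision="int8",
    shape_profiles=((1, 128), (8, 128)),
)
manifest_json = json.dumps(asdict(manifest), sort_keys=True)
```

Running `check_compatible` at load time turns a compatibility-matrix violation into an explicit, countable failure instead of undefined behavior on the target.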

Operational implications

Use AOT where cold starts, edge footprint, regulated build provenance, or offline deployment matter more than late specialization.

Measure

Build duration, artifact size, load/warmup time, compatibility failures, and reproducibility hashes.

Dynamic shapes, guards, and profiles

Dynamic-shape support ranges from bounded symbolic dimensions to multiple static profiles or generalized kernels.

Implementation

Model the production shape distribution, define accepted ranges, and decide whether out-of-profile inputs pad, recompile, route, or fail.
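
A minimal sketch of such a policy for a single dynamic sequence-length dimension. The profile lengths, the accepted upper bound, and the padding threshold are assumed values chosen for illustration, and routing to another replica is omitted for brevity.

```python
from enum import Enum


class Action(Enum):
    RUN = "run"
    PAD = "pad"
    RECOMPILE = "recompile"
    FAIL = "fail"


# Hypothetical policy values for one dynamic sequence-length dimension.
PROFILES = (128, 256, 512)  # lengths with a precompiled profile
MAX_ACCEPTED = 2048         # documented upper bound on accepted lengths
MAX_PAD_FRACTION = 0.5      # pad only if wasted work stays at or below this share


def decide(seq_len):
    """Return (action, target_len) for one request length."""
    if seq_len < 1 or seq_len > MAX_ACCEPTED:
        return Action.FAIL, None
    if seq_len in PROFILES:
        return Action.RUN, seq_len
    larger = [p for p in PROFILES if p > seq_len]
    if larger:
        target = min(larger)
        if (target - seq_len) / target <= MAX_PAD_FRACTION:
            return Action.PAD, target
    # In range, but no profile is close enough: specialize a new variant.
    return Action.RECOMPILE, seq_len
```

For example, `decide(100)` pads to the 128 profile, `decide(10)` recompiles because padding would waste most of the work, and `decide(4096)` fails because it exceeds the documented range.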

Operational implications

Alert on guard failures and compilation churn. Test worst-case memory, not only common shapes.

Measure

Shape-profile coverage, guard failure rate, padding overhead, compilation churn, and peak memory by profile.

Hybrid execution

Production systems commonly compile a stable model region while retaining dynamic routing, retrieval, tools, or unsupported operators outside it.

Implementation

Define every boundary, data transfer, synchronization point, and fallback. Keep product workflow separate from model-execution optimization.
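
The sketch below shows one way to make the fallback boundary explicit and to record which path served each request. `compiled_region`, `eager_region`, and `UnsupportedInput` are hypothetical stand-ins; the compiled region is assumed to be specialized to length-4 inputs.

```python
from collections import Counter

path_counts = Counter()


class UnsupportedInput(Exception):
    """Raised by the hypothetical compiled region when a guard fails."""


def compiled_region(x):
    # Stand-in for a compiled module specialized to length-4 inputs.
    if len(x) != 4:
        raise UnsupportedInput(f"expected length 4, got {len(x)}")
    return [v * 2 for v in x]


def eager_region(x):
    # Slower reference path that accepts any length.
    return [v * 2 for v in x]


def run(x):
    """Execute one request and record which path served it."""
    try:
        result = compiled_region(x)
        path = "compiled"
    except UnsupportedInput:
        result = eager_region(x)
        path = "fallback"
    path_counts[path] += 1
    return result, path


for request in ([1, 2, 3, 4], [1, 2], [5, 6, 7, 8]):
    run(request)

fallback_share = path_counts["fallback"] / sum(path_counts.values())
```

Both paths return the same values here, so only the recorded path and `fallback_share` reveal that a third of the requests missed the compiled region.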

Operational implications

Prefer explicit hybrid architecture over claims that an entire application is compiled. Trace which path each request used.

Measure

Compiled-region coverage, boundary transfer time, synchronization count, fallback rate, and end-to-end goodput.

Reference tables

Execution model comparison
Model	Compilation point	Flexibility	Primary strength	Primary risk
Eager / imperative	None or per operation	Highest	Debugging and dynamic behavior	Dispatch overhead and weak global optimization
Static graph	Export or load	Moderate	Whole-graph rewrites and reproducibility	Unsupported dynamic behavior
JIT	First use or specialization	High behind guards	Device- and shape-specific code	Warmup and recompilation storms
AOT	Build/package time	Lowest	Predictable startup and compact targets	Compatibility matrix and reduced flexibility
Dataflow / streaming	Varies	Pipeline-oriented	Continuous asynchronous inputs	Backpressure and state coordination
Hybrid	Multiple stages	Balanced	Compiled hot path with dynamic boundaries	Boundary and fallback complexity

Decision checklist

Which inputs, shapes, dtypes, and branches occur in production?
Where may graph breaks occur and what synchronization do they introduce?
Can production nodes compile code or must artifacts be built and signed elsewhere?
How many compiled variants can be cached safely?
What happens when an operator, shape, or device is unsupported?
How are warmup, cache misses, and recompilation represented in SLOs?

Common mistakes

Calling a framework “compiled” without identifying capture coverage and fallback behavior.
Benchmarking only one warmed shape while production traffic has broad variability.
Treating dynamic-shape support as unlimited instead of documenting ranges and guards.
Shipping AOT artifacts without runtime, driver, precision, and target metadata.
Hiding CPU or eager fallback because outputs remain numerically valid.

Sources and further reading

torch.compile
PyTorch · Official documentation · accessed 2026-06-21 UTC

torch.export
PyTorch · Official documentation · accessed 2026-06-21 UTC

ONNX Runtime high-level design
ONNX Runtime · Official documentation · accessed 2026-06-21 UTC

ExecuTorch overview
PyTorch · Official documentation · accessed 2026-06-21 UTC

StableHLO specification
OpenXLA · Official specification · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC
