ARuntime Reference

Runtime Selection Guide

Select runtime components by workload, artifact, hardware, deployment boundary, state, authority, SLO, operations, and evidence requirements.

Audience: Technical readers | Reading time: 2 minutes | Status: Production guidance | Last reviewed:

Select an AI runtime architecture by workload, execution unit, deployment boundary, service objective, state, trust, and failure behavior. Do not begin with a universal product ranking.

Key takeaways

  • Name the runtime layer before comparing products.
  • Prefer the smallest architecture that meets consequence and operating needs.
  • Verify capabilities against official documentation and a workload-specific proof.

Start with the job

Execute a portable model

Evaluate portable graph/inference runtimes.

Maximize hardware-specific performance

Evaluate vendor-optimized compilers and engines.

Serve an LLM at concurrency

Evaluate generative engines plus model serving.

Scale one model across hosts

Add distributed inference.

Run in browser, mobile, or edge

Evaluate packaging, hardware delegates, footprint, offline operation, and update mechanisms.

Run long-lived tool-using work

Add an agentic application runtime and durable workflow semantics.

Selection questions

  1. What is the primary execution unit?
  2. Which model formats, operations, shapes, precisions, and hardware are required?
  3. What latency, goodput, availability, deadline, energy, and cost objectives apply?
  4. What state exists, who owns it, and how long must it survive?
  5. What data and trust boundaries are crossed?
  6. What external side effects occur?
  7. What failure, retry, compensation, and approval behavior is required?
  8. What evidence must be available?
  9. Which capabilities are verified for the exact version?
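One way to keep the nine questions from being skipped is to record the answers in a structure and list whatever is still blank. The following is a minimal sketch; the `SelectionProfile` class, its field names, and the sample answers are all hypothetical and are not part of any runtime's API.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SelectionProfile:
    """One free-text answer per selection question (hypothetical field names)."""
    execution_unit: Optional[str] = None         # Q1: primary execution unit
    model_requirements: Optional[str] = None     # Q2: formats, ops, shapes, precisions, hardware
    service_objectives: Optional[str] = None     # Q3: latency, goodput, availability, cost
    state_ownership: Optional[str] = None        # Q4: what state, who owns it, how long
    trust_boundaries: Optional[str] = None       # Q5: data and trust boundaries crossed
    side_effects: Optional[str] = None           # Q6: external side effects
    failure_behavior: Optional[str] = None       # Q7: failure, retry, compensation, approval
    evidence: Optional[str] = None               # Q8: evidence that must be available
    verified_capabilities: Optional[str] = None  # Q9: capabilities verified for the exact version


def unanswered(profile: SelectionProfile) -> List[str]:
    """Return the names of questions that still have no answer."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Illustrative answers only; real answers come from the workload owner.
profile = SelectionProfile(
    execution_unit="single request against an exported graph",
    service_objectives="p95 latency under 50 ms",
)
print(unanswered(profile))
```

A shortlist is probably premature while `unanswered` still returns entries, since each open question can change which runtime layer is needed.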

Category matrix

Selection by primary need

Need                                     | Primary category                 | Common additions
-----------------------------------------+----------------------------------+--------------------------------------
Cross-hardware exported model            | Portable graph/inference runtime | Target backend, application packaging
Lowest latency on one accelerator family | Hardware-optimized engine        | Model server, profiler
High-concurrency generative API          | Generative engine                | Serving, gateway, observability
Multi-node large-model inference         | Distributed runtime              | Serving, cache tier, scheduler
Device-local inference                   | Edge/mobile/browser runtime      | Update, fallback, telemetry
Tool-using durable task                  | Agentic application runtime      | Workflow, policy, sandbox, evidence
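The matrix can also be held as data so that a selection tool or review checklist reads from one source. This sketch copies the rows of the matrix verbatim; the `CATEGORY_MATRIX` name and the `select_category` helper are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Need -> (primary category, common additions), copied from the category matrix.
CATEGORY_MATRIX: Dict[str, Tuple[str, List[str]]] = {
    "cross-hardware exported model": (
        "Portable graph/inference runtime",
        ["Target backend", "Application packaging"],
    ),
    "lowest latency on one accelerator family": (
        "Hardware-optimized engine",
        ["Model server", "Profiler"],
    ),
    "high-concurrency generative api": (
        "Generative engine",
        ["Serving", "Gateway", "Observability"],
    ),
    "multi-node large-model inference": (
        "Distributed runtime",
        ["Serving", "Cache tier", "Scheduler"],
    ),
    "device-local inference": (
        "Edge/mobile/browser runtime",
        ["Update", "Fallback", "Telemetry"],
    ),
    "tool-using durable task": (
        "Agentic application runtime",
        ["Workflow", "Policy", "Sandbox", "Evidence"],
    ),
}


def select_category(need: str) -> Tuple[str, List[str]]:
    """Look up a need by name, ignoring case and surrounding whitespace."""
    key = need.strip().lower()
    if key not in CATEGORY_MATRIX:
        raise KeyError(f"No matrix row for need: {need!r}")
    return CATEGORY_MATRIX[key]


category, additions = select_category("Device-local inference")
print(category, additions)
```

The lookup returns a category, not a product. Product capabilities still need to be verified for the exact version under evaluation.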

Deployment decision

Choose embedded, local service, cluster service, managed cloud, browser, edge, confidential, or hybrid based on latency, privacy, connectivity, operations, hardware, and compliance. Include migration and exit paths; API compatibility alone may not preserve model behavior or operational semantics.

Trust and consequence

As consequences increase, require stronger identity, tool contracts, isolation, approval, idempotency, evidence, and incident response. Low-risk read-only generation can use a simpler path. Scale controls with the consequence of the action, not with a product's marketing label.
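Scaling controls with consequence can be sketched as a cumulative mapping. The three tiers below and the assignment of controls to each tier are assumptions chosen for illustration. The guide names the controls but does not prescribe tiers, so each team would set its own thresholds.

```python
from enum import IntEnum
from typing import Dict, List


class Consequence(IntEnum):
    """Hypothetical consequence tiers, ordered from lowest to highest."""
    READ_ONLY = 0      # generation with no external side effects
    REVERSIBLE = 1     # side effects that can be undone or compensated
    IRREVERSIBLE = 2   # side effects that cannot be undone


# Controls introduced at each tier (an assumed assignment, not a standard).
CONTROLS_ADDED: Dict[Consequence, List[str]] = {
    Consequence.READ_ONLY: ["identity", "evidence"],
    Consequence.REVERSIBLE: ["tool contracts", "isolation", "idempotency"],
    Consequence.IRREVERSIBLE: ["approval", "incident response"],
}


def required_controls(level: Consequence) -> List[str]:
    """Accumulate the controls of every tier up to and including `level`."""
    controls: List[str] = []
    for tier in Consequence:
        if tier <= level:
            controls.extend(CONTROLS_ADDED[tier])
    return controls


print(required_controls(Consequence.READ_ONLY))
print(required_controls(Consequence.IRREVERSIBLE))
```

Because the mapping is cumulative, a higher tier never drops a control that a lower tier required.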

Evaluation process

  1. Shortlist by required category and verified features.
  2. Build a representative conformance and workload suite.
  3. Measure correctness, quality, latency, goodput, reliability, cost, and operations.
  4. Test failure, upgrade, rollback, security, and portability.
  5. Record caveats and unverified dimensions.
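The measurement step can be made concrete with a small summary over recorded samples. This sketch assumes goodput means requests per second that both succeed and meet a deadline, and it uses nearest-rank percentiles. The `Sample` type, the deadline, and the sample values are illustrative and not measured results.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    latency_ms: float
    ok: bool  # True if the response passed the correctness check


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples: List[Sample], deadline_ms: float, wall_seconds: float) -> Dict[str, float]:
    """Summarize latency, throughput, goodput, and error rate for one run."""
    latencies = [s.latency_ms for s in samples]
    good = [s for s in samples if s.ok and s.latency_ms <= deadline_ms]
    return {
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "throughput_rps": len(samples) / wall_seconds,
        "goodput_rps": len(good) / wall_seconds,
        "error_rate": sum(not s.ok for s in samples) / len(samples),
    }


# Made-up samples for illustration only.
samples = [Sample(40, True), Sample(55, True), Sample(120, True), Sample(60, False)]
report = summarize(samples, deadline_ms=100, wall_seconds=2.0)
print(report)
```

Reporting throughput and goodput together shows how much of the served traffic met the objective. A runtime can look fast on raw throughput while missing deadlines or returning incorrect results.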

Architecture decision record

Record context, decision, alternatives, layer ownership, versions, sources, workload, assumptions, risks, migration, validation results, and review date. Revisit when workload, model, hardware, or product capability changes.
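A record with these fields could be kept as structured data so that the revisit triggers are checked mechanically. The following is a minimal sketch; the `DecisionRecord` class, the `needs_review` method, and every sample value are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List

# Changes that the guide says should prompt a revisit.
REVISIT_TRIGGERS = {"workload", "model", "hardware", "product capability"}


@dataclass
class DecisionRecord:
    context: str
    decision: str
    alternatives: List[str]
    layer_ownership: Dict[str, str]   # runtime layer -> owning team
    versions: Dict[str, str]          # component -> exact version evaluated
    sources: List[str]
    workload: str
    assumptions: List[str]
    risks: List[str]
    migration: str
    validation_results: str
    review_date: date

    def needs_review(self, today: date, changed: Iterable[str] = ()) -> bool:
        """True if the review date has passed or a revisit trigger has changed."""
        return today >= self.review_date or bool(REVISIT_TRIGGERS & set(changed))


# Placeholder values for illustration only.
record = DecisionRecord(
    context="Device-local inference for an offline feature",
    decision="Edge/mobile/browser runtime",
    alternatives=["Managed cloud endpoint"],
    layer_ownership={"inference": "platform team"},
    versions={"example-runtime": "0.0.0"},
    sources=["official documentation for the evaluated version"],
    workload="single-user, intermittent connectivity",
    assumptions=["model fits the device memory budget"],
    risks=["update path depends on app store release cadence"],
    migration="keep the exported model format portable",
    validation_results="see workload suite report",
    review_date=date(2030, 1, 1),
)
print(record.needs_review(date(2029, 6, 1)))
print(record.needs_review(date(2029, 6, 1), changed=["hardware"]))
```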

Maintenance record

Found an error, outdated capability, or unclear category boundary? Submit a correction with a supporting source.