Runtime Directory

Browse a vendor-neutral AI runtime directory covering compiler and graph runtimes, inference engines, model servers, serving platforms, and edge, browser, and local runtimes.

Audience: Technical readers · Reading time: 7 minutes · Status: Research · Last reviewed: 2026-06-21 UTC

Key takeaways

  • Directory categories describe primary responsibility; many products span more than one layer.
  • Profiles retain official sources and UTC review dates rather than copying vendor marketing.
  • A model format, protocol, or compiler IR is listed only when its role is explicitly distinguished from a complete runtime.
  • Directory inclusion is not an endorsement, maturity claim, or performance ranking.
  • Use the comparison guide and selection guide before treating two entries as direct alternatives.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Official project documentation, repositories, standards, release information, category definitions, and reviewed profile metadata.

Owns

Classification, source provenance, review cadence, correction workflow, and neutral profile language.

Emits

Filterable profiles with category, stack layer, maintainer, capabilities, sources, review date, and related aRuntime.com guidance.

Does not own

Vendor certification, universal benchmarks, support guarantees, pricing, or a declaration that one system is best.

Failure modes

Stale project status, category confusion, copied marketing claims, broken official links, unscoped comparisons, and profile drift.

Evidence and metrics

Profiles reviewed, source quality, review age, broken links, correction time, category coverage, and unresolved claims.

How to use the directory

Filter by name, category, layer, maintainer, or capability, then open a profile and its official sources. Treat each profile as a starting point for a requirements-driven proof.
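The filtering step described above can be sketched in a few lines. This is an illustration in Python, not the site's implementation: the dictionary layout and the `filter_profiles` helper are assumptions, and the three sample entries are copied from the profile grid on this page.

```python
# Three entries copied from the profile grid; the dict layout is an assumption.
PROFILES = [
    {"name": "ONNX Runtime", "category": "Compiler/graph runtime",
     "layer": "Layers 2-3", "maintainer": "ONNX Runtime project",
     "capabilities": ["ONNX", "execution providers", "CPU", "CUDA",
                      "TensorRT", "OpenVINO"]},
    {"name": "vLLM", "category": "LLM inference engine",
     "layer": "Layer 3", "maintainer": "vLLM project",
     "capabilities": ["LLM", "PagedAttention", "continuous batching",
                      "prefix caching"]},
    {"name": "NVIDIA Triton Inference Server", "category": "Model server",
     "layer": "Layer 4", "maintainer": "NVIDIA",
     "capabilities": ["model repository", "HTTP", "gRPC",
                      "dynamic batching"]},
]


def filter_profiles(profiles, query):
    """Return profiles whose name, category, layer, maintainer, or any
    capability contains the query, ignoring case.

    An empty query returns every profile, mirroring the unfiltered view.
    """
    needle = query.strip().lower()
    if not needle:
        return list(profiles)

    def searchable(profile):
        return [profile["name"], profile["category"], profile["layer"],
                profile["maintainer"], *profile["capabilities"]]

    return [p for p in profiles
            if any(needle in text.lower() for text in searchable(p))]


names = [p["name"] for p in filter_profiles(PROFILES, "batching")]
```

Here `names` holds vLLM and NVIDIA Triton Inference Server, the two sample entries that list a batching capability. A match only shortlists a profile; its official sources still need to be read against the requirements.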

Implementation

Keep profile fields structured and versionable. Link names to canonical profile URLs and preserve source access dates.
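One way to keep a profile structured and versionable is a typed record that serializes to stable JSON. The sketch below is an assumption for illustration: the `ProfileRecord` class, the `schema_version` field, the slug, and the `BASE_URL` placeholder are hypothetical and do not describe the site's actual PHP data or URL scheme.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical placeholder, not the directory's real URL scheme.
BASE_URL = "https://example.invalid/runtime-directory/"


@dataclass
class ProfileRecord:
    slug: str                 # hypothetical URL-safe identifier
    name: str
    category: str
    layer: str
    maintainer: str
    reviewed_utc: str         # ISO 8601 date of the last review
    capabilities: list[str] = field(default_factory=list)
    schema_version: int = 1   # hypothetical: bump when the field set changes

    def canonical_url(self) -> str:
        """Single canonical link target for every mention of this profile."""
        return f"{BASE_URL}{self.slug}/"

    def to_json(self) -> str:
        # Sorted keys and fixed indentation keep version-control diffs small.
        return json.dumps(asdict(self), sort_keys=True, indent=2)


record = ProfileRecord(
    slug="vllm",
    name="vLLM",
    category="LLM inference engine",
    layer="Layer 3",
    maintainer="vLLM project",
    reviewed_utc="2026-06-21",
    capabilities=["LLM", "PagedAttention", "continuous batching",
                  "prefix caching"],
)
restored = ProfileRecord(**json.loads(record.to_json()))
```

Because the record round-trips through JSON without loss, the same data can seed a profile page, a filter index, and a review report.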

Operational implications

Schedule review for time-sensitive features and project status. Route corrections through the public correction workflow.

Measure

Profile review age, source-link health, missing fields, and correction turnaround.
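Review age and missing fields are simple to compute once profiles are structured. The following is a minimal sketch under stated assumptions: the required-field list and the 180-day threshold are hypothetical choices, not policies stated by the directory.

```python
from datetime import date

# Hypothetical required-field list; the directory's actual schema may differ.
REQUIRED_FIELDS = ("name", "category", "layer", "maintainer",
                   "reviewed", "sources")


def review_age_days(reviewed_iso: str, today: date) -> int:
    """Days elapsed between the recorded review date and `today`."""
    return (today - date.fromisoformat(reviewed_iso)).days


def missing_fields(profile: dict) -> list[str]:
    """Names of required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not profile.get(name)]


def stale_profiles(profiles, today: date, max_age_days: int = 180):
    """Names of profiles whose review is older than the assumed threshold."""
    return [p["name"] for p in profiles
            if review_age_days(p["reviewed"], today) > max_age_days]


profile = {
    "name": "vLLM",
    "category": "LLM inference engine",
    "layer": "Layer 3",
    "maintainer": "vLLM project",
    "reviewed": "2026-06-21",
}
age = review_age_days(profile["reviewed"], date(2026, 7, 21))
gaps = missing_fields(profile)
```

In this example `age` is 30 and `gaps` is `["sources"]`, because the sample profile carries no source list.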

Category boundaries

Compiler runtimes, inference engines, model servers, serving platforms, local runtimes, edge/browser runtimes, and agentic infrastructure solve overlapping but distinct problems.

Implementation

Record primary and secondary layers, delegated backends, model formats, hardware targets, and external components.

Operational implications

Avoid a flat feature checklist across unlike categories. Use the taxonomy before comparison.
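A taxonomy check before comparison might look like the sketch below. The `Classification` record and the three labels returned by `comparison_scope` are hypothetical; they illustrate one possible policy rather than the directory's own rules. The sample classifications are taken from the profile grid on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    name: str
    primary_category: str
    layers: frozenset[int]
    delegated_backends: tuple[str, ...] = ()
    is_complete_runtime: bool = True


def comparison_scope(a: Classification, b: Classification) -> str:
    """Label how two entries may be compared (assumed policy)."""
    if not (a.is_complete_runtime and b.is_complete_runtime):
        return "component, not a runtime alternative"
    if a.primary_category == b.primary_category:
        return "same category"
    if a.layers & b.layers:
        return "overlapping layers: needs scope statement"
    return "different layers: needs scope statement"


vllm = Classification("vLLM", "LLM inference engine", frozenset({3}))
trt_llm = Classification("TensorRT-LLM", "LLM inference engine",
                         frozenset({3}))
triton = Classification("NVIDIA Triton Inference Server", "Model server",
                        frozenset({4}))
tvm = Classification("Apache TVM", "Compiler/graph runtime", frozenset({2}))
onnx_rt = Classification("ONNX Runtime", "Compiler/graph runtime",
                         frozenset({2, 3}),
                         delegated_backends=("TensorRT", "OpenVINO"))
stablehlo = Classification("StableHLO", "Compiler IR / operation set",
                           frozenset({2}), is_complete_runtime=False)
```

Under this policy, `comparison_scope(triton, tvm)` asks for a scope statement rather than a feature checklist, and `comparison_scope(stablehlo, tvm)` flags StableHLO as a component rather than an alternative runtime.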

Measure

Category coverage, ambiguous profiles, external dependency count, and classification corrections.

Profile evidence

Profiles should prefer official documentation and repositories, retain a UTC review date, and qualify changing features or project status.

Implementation

Store source title, publisher, URL, type, access date, and page sections used.

Operational implications

Do not publish live pricing, performance, maintenance, or support claims without current verification and scope.

Measure

Primary-source share, source age, broken links, and unverified claims.
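The source fields and the primary-source share metric can be sketched together. Everything here is illustrative: the `Source` record, the set of types counted as primary, the placeholder URLs, and the third-party entry are assumptions added to make the ratio visible. The two official titles come from this page's source list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    title: str
    publisher: str
    url: str
    source_type: str
    accessed_utc: str
    sections: tuple[str, ...] = ()   # page sections actually used


# Assumption: which source types count as primary.
PRIMARY_TYPES = {"Official documentation", "Official repository", "Standard"}


def primary_source_share(sources) -> float:
    """Fraction of sources whose type is in PRIMARY_TYPES (0.0 if empty)."""
    if not sources:
        return 0.0
    primary = sum(1 for s in sources if s.source_type in PRIMARY_TYPES)
    return primary / len(sources)


sources = [
    Source("vLLM documentation", "vLLM",
           "https://example.invalid/vllm-docs",          # placeholder URL
           "Official documentation", "2026-06-21"),
    Source("KServe ServingRuntime", "KServe",
           "https://example.invalid/kserve-docs",        # placeholder URL
           "Official documentation", "2026-06-21"),
    Source("Hypothetical third-party write-up", "Example blog",
           "https://example.invalid/blog-post",          # hypothetical entry
           "Third-party article", "2026-06-21"),
]
share = primary_source_share(sources)
```

With two official entries out of three, `share` is two thirds. A falling share over time is a signal that profiles are drifting toward secondary material.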

Runtime profiles

The filterable profile grid below is generated from the same structured PHP data used to seed individual runtime profile posts.

Implementation

Server-render the full directory and use lightweight vanilla JavaScript only for progressive filtering.
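Server rendering with progressive filtering usually means emitting every profile as complete HTML and attaching filter metadata as attributes that a script may read later. The sketch below shows that idea in Python for illustration only. The site itself is described as using PHP data, and the markup, class name, `data-category` attribute, and placeholder URLs are assumptions.

```python
from html import escape


def render_profile(profile: dict) -> str:
    """Render one profile as standalone HTML that works without scripts."""
    capabilities = ", ".join(profile["capabilities"])
    return (
        f'<article class="profile" '
        f'data-category="{escape(profile["category"], quote=True)}">'
        f'<h3><a href="{escape(profile["url"], quote=True)}">'
        f'{escape(profile["name"])}</a></h3>'
        f'<p>{escape(capabilities)}</p>'
        '</article>'
    )


def render_directory(profiles) -> str:
    """Render every profile; a filter script can hide entries afterwards."""
    return "\n".join(render_profile(p) for p in profiles)


page = render_directory([
    {"name": "NVIDIA Triton Inference Server", "category": "Model server",
     "url": "https://example.invalid/triton/",           # placeholder URL
     "capabilities": ["model repository", "HTTP", "gRPC",
                      "dynamic batching"]},
    {"name": "KServe", "category": "Model serving platform",
     "url": "https://example.invalid/kserve/",           # placeholder URL
     "capabilities": ["Kubernetes", "serving", "autoscaling"]},
])
```

Because the full list is in the markup, a filter script only needs to toggle visibility, and a failure to load it leaves every profile link intact.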

Operational implications

The directory remains usable without JavaScript and exposes shareable canonical profile URLs.

Measure

Rendered profile count, filter accessibility, keyboard behavior, and profile-link validity.

Filterable runtime profiles

Filter the reviewed profiles by runtime name, category, layer, maintainer, or capability. All profiles remain visible and usable without JavaScript.


ONNX Runtime

ONNX graph execution with execution providers for heterogeneous targets.

Category
Compiler/graph runtime
Layer
Layers 2-3
Maintainer
ONNX Runtime project
Reviewed
2026-06-21 UTC

ONNX · execution providers · CPU · CUDA · TensorRT · OpenVINO

vLLM

LLM inference and serving engine using PagedAttention, continuous batching, prefix caching, and related serving optimizations.

Category
LLM inference engine
Layer
Layer 3
Maintainer
vLLM project
Reviewed
2026-06-21 UTC

LLM · PagedAttention · continuous batching · prefix caching

SGLang

Serving framework focused on efficient structured language model programs and prefix reuse through RadixAttention.

Category
LLM inference and structured generation runtime
Layer
Layer 3
Maintainer
SGLang project
Reviewed
2026-06-21 UTC

LLM · RadixAttention · structured generation · prefix reuse

NVIDIA Triton Inference Server

Model serving platform with model repositories, HTTP/gRPC APIs, schedulers, dynamic batching, and multiple backends.

Category
Model server
Layer
Layer 4
Maintainer
NVIDIA
Reviewed
2026-06-21 UTC

model repository · HTTP · gRPC · dynamic batching

NVIDIA TensorRT

SDK for optimizing and running neural network inference on NVIDIA GPUs and related targets.

Category
Inference compiler/runtime
Layer
Layers 2-3
Maintainer
NVIDIA
Reviewed
2026-06-21 UTC

TensorRT engine · NVIDIA GPU · precision optimization

TensorRT-LLM

NVIDIA LLM inference stack for building TensorRT-based LLM engines and runtimes.

Category
LLM inference engine
Layer
Layer 3
Maintainer
NVIDIA
Reviewed
2026-06-21 UTC

LLM · TensorRT · NVIDIA GPU

Apache TVM

ML compiler stack for importing models, transforming IR, scheduling tensor programs, and generating target code.

Category
Compiler/graph runtime
Layer
Layer 2
Maintainer
Apache TVM
Reviewed
2026-06-21 UTC

Relax · TensorIR · BYOC · autotuning

StableHLO

Portable high-level operation set used in compiler pipelines; it is not a complete runtime by itself.

Category
Compiler IR / operation set
Layer
Layer 2
Maintainer
OpenXLA
Reviewed
2026-06-21 UTC

IR · OpenXLA · HLO

ExecuTorch

PyTorch on-device inference stack for mobile, embedded, and edge targets.

Category
Edge/mobile runtime
Layer
Layer 3
Maintainer
PyTorch
Reviewed
2026-06-21 UTC

on-device · mobile · edge · delegates

LiteRT

Google's on-device runtime, the successor to TensorFlow Lite, for high-performance edge and mobile deployment.

Category
Edge/mobile runtime
Layer
Layer 3
Maintainer
Google AI Edge
Reviewed
2026-06-21 UTC

on-device · FlatBuffer · GPU · NPU · delegates

WebNN

Web API for constructing and executing neural network graphs using operating-system and hardware ML capabilities.

Category
Browser runtime API
Layer
Layer 3
Maintainer
W3C
Reviewed
2026-06-21 UTC

browser · NPU · GPU · CPU

WebGPU

Web API that exposes GPU compute paths used by browser AI runtimes and libraries.

Category
Browser compute API
Layer
Layers 1-3
Maintainer
W3C
Reviewed
2026-06-21 UTC

browser · GPU · compute shaders

OpenVINO

Inference toolkit and runtime stack for Intel hardware targets and optimized model execution.

Category
Inference toolkit
Layer
Layers 2-3
Maintainer
Intel
Reviewed
2026-06-21 UTC

Intel · CPU · iGPU · NPU · OpenVINO

KServe

Kubernetes-native model serving pattern for production inference services and rollout workflows.

Category
Model serving platform
Layer
Layer 4
Maintainer
KServe project
Reviewed
2026-06-21 UTC

Kubernetes · serving · autoscaling

Ray Serve

Pythonic serving layer for scaling model-backed applications and inference services.

Category
Model serving framework
Layer
Layer 4
Maintainer
Ray project
Reviewed
2026-06-21 UTC

serving · Python · autoscaling

BentoML

Packaging and serving layer for model APIs and deployment workflows.

Category
Model serving framework
Layer
Layer 4
Maintainer
BentoML project
Reviewed
2026-06-21 UTC

serving · packaging · APIs

Reference tables

Directory category guide
| Category | Primary responsibility | Typical input | Typical output | What it is not |
|---|---|---|---|---|
| Compiler / graph runtime | Import, optimize, partition, lower, and execute graphs | Framework graph or portable model | Optimized graph, executable plan, predictions | A complete serving platform |
| LLM inference engine | Efficient prefill, decode, KV cache, batching, and streaming | LLM weights and token requests | Generated token stream | A complete agent runtime |
| Model server | Expose model execution through APIs and lifecycle controls | Network request and model repository | Prediction/stream plus server telemetry | A Kubernetes control plane |
| Serving platform | Deploy, scale, route, and roll out model services | Runtime definition and deployment spec | Managed inference service | The model kernel itself |
| Edge/mobile runtime | Execute prepared models on constrained devices | AOT model program and local input | On-device prediction | A universal cloud service |
| Browser runtime/API | Execute model graphs or kernels in a web sandbox | Web assets and client input | Client-local prediction | Guaranteed support on every browser |
| Agentic runtime infrastructure | Coordinate context, tools, memory, policy, durability, and evaluation | Governed task envelope | Traceable task outcome | Only a prompt or tool-calling library |

Decision checklist

  1. Which runtime layer and responsibility does the reader need?
  2. Is the candidate a complete product, a backend, a format, a protocol, or a compiler component?
  3. Which official source verifies the feature or status being evaluated?
  4. What model, hardware, deployment, and operating requirements must the candidate satisfy?
  5. Which external components are required to form a production stack?
  6. What proof and review date are required before selection?

Common mistakes

  • Treating directory order as a ranking.
  • Comparing a model server directly with a graph compiler without a scope statement.
  • Copying unqualified vendor performance claims.
  • Calling a model format or protocol a complete runtime.
  • Leaving project status and feature claims undated.
  • Assuming a listed capability is enabled by default for every model and hardware target.

Sources and further reading


  1. ONNX Runtime high-level design
    ONNX Runtime · Official documentation · accessed 2026-06-21 UTC

  2. vLLM documentation
    vLLM · Official documentation · accessed 2026-06-21 UTC

  3. Triton Inference Server architecture
    NVIDIA · Official documentation · accessed 2026-06-21 UTC

  4. KServe ServingRuntime
    KServe · Official documentation · accessed 2026-06-21 UTC

  5. ExecuTorch overview
    PyTorch · Official documentation · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC

Maintenance record

Found an error, outdated capability, or unclear category boundary? Submit a correction with a supporting source.