Runtime Directory - aRuntime.com

Key takeaways

Directory categories describe primary responsibility; many products span more than one layer.
Profiles retain official sources and UTC review dates rather than copying vendor marketing.
A model format, protocol, or compiler IR is listed only when its role is explicitly distinguished from a complete runtime.
Directory inclusion is not an endorsement, maturity claim, or performance ranking.
Use the comparison guide and selection guide before treating two entries as direct alternatives.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Official project documentation, repositories, standards, release information, category definitions, and reviewed profile metadata.

Owns

Classification, source provenance, review cadence, correction workflow, and neutral profile language.

Emits

Filterable profiles with category, stack layer, maintainer, capabilities, sources, review date, and related aRuntime.com guidance.

Does not own

Vendor certification, universal benchmarks, support guarantees, pricing, or a declaration that one system is best.

Failure modes

Stale project status, category confusion, copied marketing claims, broken official links, unscoped comparisons, and profile drift.

Evidence and metrics

Profiles reviewed, source quality, review age, broken links, correction time, category coverage, and unresolved claims.

How to use the directory

Filter by name, category, layer, maintainer, or capability, then open a profile and its official sources. Treat each profile as a starting point for your own requirements-driven evaluation.

Implementation

Keep profile fields structured and versionable. Link names to canonical profile URLs and preserve source access dates.
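
As a minimal sketch of what "structured and versionable" could look like, the following Python dataclass holds one profile and serializes it to stable JSON that diffs cleanly under version control. The `Profile` class, its field names, the `slug` field, and the `https://aruntime.example/runtimes/` base URL are illustrative assumptions, not the directory's actual schema or URL scheme.

```python
import json
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class Profile:
    """One directory entry. Field names are hypothetical."""
    name: str
    category: str
    layer: str
    maintainer: str
    reviewed_utc: date                  # UTC review date shown on the profile
    slug: str                           # stable identifier used in the canonical URL
    capabilities: tuple[str, ...] = ()

    def canonical_url(self, base: str = "https://aruntime.example/runtimes/") -> str:
        # The base URL is a placeholder; the real URL scheme is not stated in the source.
        return f"{base}{self.slug}/"

    def to_json(self) -> str:
        # Sorted keys and fixed indentation keep the output stable between runs.
        record = asdict(self)
        record["reviewed_utc"] = self.reviewed_utc.isoformat()
        return json.dumps(record, indent=2, sort_keys=True)


vllm = Profile(
    name="vLLM",
    category="LLM inference engine",
    layer="Layer 3",
    maintainer="vLLM project",
    reviewed_utc=date(2026, 6, 21),
    slug="vllm",
    capabilities=("PagedAttention", "continuous batching", "prefix caching"),
)
print(vllm.canonical_url())
print(vllm.to_json())
```

Keeping the review date as an ISO 8601 string in the serialized form means a change to it shows up as a one-line diff.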

Operational implications

Schedule review for time-sensitive features and project status. Route corrections through the public correction workflow.

Measure

Profile review age, source-link health, missing fields, and correction turnaround.
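
Two of these measures can be computed directly from profile data. The sketch below assumes a profile is a plain dictionary and that the five fields in `REQUIRED_FIELDS` are mandatory; both are assumptions made for illustration, and the example profile and the "today" date are hypothetical.

```python
from datetime import date

# Assumed set of mandatory fields; the directory's real schema may differ.
REQUIRED_FIELDS = ("name", "category", "layer", "maintainer", "reviewed")


def review_age_days(reviewed: date, today: date) -> int:
    """Days since the profile was last reviewed (both dates in UTC)."""
    return (today - reviewed).days


def missing_fields(profile: dict) -> list[str]:
    """Required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not profile.get(name)]


# Hypothetical profile with one field deliberately left blank.
profile = {
    "name": "Example Runtime",
    "category": "Model server",
    "layer": "Layer 4",
    "maintainer": "",
    "reviewed": date(2026, 6, 21),
}

age = review_age_days(profile["reviewed"], today=date(2026, 9, 19))
print(age)                      # 90
print(missing_fields(profile))  # ['maintainer']
```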

Category boundaries

Compiler runtimes, inference engines, model servers, serving platforms, local runtimes, edge/browser runtimes, and agentic infrastructure solve overlapping but distinct problems.

Implementation

Record primary and secondary layers, delegated backends, model formats, hardware targets, and external components.

Operational implications

Avoid a flat feature checklist across unlike categories. Use the taxonomy before comparison.
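
One way to apply the taxonomy before comparison is a small guard that classifies how two entries relate before any feature table is built. This is a sketch under assumptions: the `Classification` structure, the `comparison_scope` function, and its three return labels are hypothetical, and the two "Example" entries are invented to show the overlapping case. The four real entries use only the primary categories listed in this directory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    """Category placement for one entry (illustrative structure)."""
    name: str
    primary: str
    secondary: frozenset = frozenset()   # other categories the product also spans


def comparison_scope(a: Classification, b: Classification) -> str:
    """Describe how two entries relate before comparing their features."""
    if a.primary == b.primary:
        return "direct"
    shared = ({a.primary} | a.secondary) & ({b.primary} | b.secondary)
    if shared:
        return "overlapping: compare only within " + ", ".join(sorted(shared))
    return "unlike: write a scope statement first"


vllm = Classification("vLLM", "LLM inference engine")
trt_llm = Classification("TensorRT-LLM", "LLM inference engine")
triton = Classification("NVIDIA Triton Inference Server", "Model server")
tvm = Classification("Apache TVM", "Compiler/graph runtime")

# Hypothetical entries showing a secondary-category overlap.
example_a = Classification("Example A", "Model server", frozenset({"Serving platform"}))
example_b = Classification("Example B", "Serving platform")

print(comparison_scope(vllm, trt_llm))         # direct
print(comparison_scope(triton, tvm))           # unlike: write a scope statement first
print(comparison_scope(example_a, example_b))  # overlapping: compare only within Serving platform
```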

Measure

Category coverage, ambiguous profiles, external dependency count, and classification corrections.

Profile evidence

Profiles should prefer official documentation and repositories, retain a UTC review date, and qualify changing features or project status.

Implementation

Store source title, publisher, URL, type, access date, and page sections used.
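
The six stored fields map naturally onto a small record type with a completeness check. The sketch below is an assumption about shape, not the site's implementation: the `Source` class, its field names, and the rule that URLs must be absolute `https` addresses are hypothetical, and the URL and section name in the example are placeholders.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass(frozen=True)
class Source:
    """One cited source. Field names are illustrative."""
    title: str
    publisher: str
    url: str
    source_type: str                    # e.g. "Official documentation"
    accessed_utc: date                  # date the page was read, in UTC
    sections_used: tuple[str, ...] = ()

    def problems(self) -> list[str]:
        """Return readable issues; an empty list means the record is complete."""
        issues = []
        parsed = urlparse(self.url)
        if parsed.scheme != "https" or not parsed.netloc:
            issues.append("url is not an absolute https URL")
        for label, value in (("title", self.title),
                             ("publisher", self.publisher),
                             ("source_type", self.source_type)):
            if not value.strip():
                issues.append(f"{label} is empty")
        if not self.sections_used:
            issues.append("no page sections recorded")
        return issues


complete = Source(
    title="vLLM documentation",
    publisher="vLLM",
    url="https://docs.example.org/vllm/",   # placeholder URL, not the real address
    source_type="Official documentation",
    accessed_utc=date(2026, 6, 21),
    sections_used=("Overview",),            # hypothetical section name
)
incomplete = Source("", "vLLM", "docs.example.org", "Official documentation", date(2026, 6, 21))

print(complete.problems())    # []
print(incomplete.problems())
```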

Operational implications

Do not publish live pricing, performance, maintenance, or support claims without current verification and scope.

Measure

Primary-source share, source age, broken links, and unverified claims.
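
Primary-source share and source age reduce to simple arithmetic over the stored source records. In this sketch the `PRIMARY_TYPES` set, the dictionary keys, and the four sample records are assumptions chosen for illustration.

```python
from datetime import date

# Assumed source types counted as primary; adjust to the directory's own taxonomy.
PRIMARY_TYPES = {"Official documentation", "Official repository", "Standard"}


def primary_source_share(sources: list[dict]) -> float:
    """Fraction of sources whose type counts as primary (0.0 for an empty list)."""
    if not sources:
        return 0.0
    primary = sum(1 for s in sources if s["type"] in PRIMARY_TYPES)
    return primary / len(sources)


def oldest_source_age_days(sources: list[dict], today: date) -> int:
    """Age in days of the least recently accessed source (0 for an empty list)."""
    return max(((today - s["accessed"]).days for s in sources), default=0)


# Hypothetical source records.
sources = [
    {"type": "Official documentation", "accessed": date(2026, 6, 21)},
    {"type": "Official repository", "accessed": date(2026, 6, 21)},
    {"type": "Standard", "accessed": date(2026, 5, 22)},
    {"type": "Blog post", "accessed": date(2026, 6, 1)},
]

print(primary_source_share(sources))                       # 0.75
print(oldest_source_age_days(sources, date(2026, 6, 21)))  # 30
```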

Runtime profiles

The filterable profile grid below is generated from the same structured PHP data used to seed individual runtime profile posts.

Implementation

Server-render the full directory and use lightweight vanilla JavaScript only for progressive filtering.

Operational implications

The directory remains usable without JavaScript and exposes shareable canonical profile URLs.

Measure

Rendered profile count, filter accessibility, keyboard behavior, and profile-link validity.

Filterable runtime profiles

Filter the reviewed profiles by runtime name, category, layer, maintainer, or capability. All profiles remain visible and usable without JavaScript.

ONNX Runtime

ONNX graph execution with execution providers for heterogeneous targets.

Category: Compiler/graph runtime
Layer: Layers 2-3
Maintainer: ONNX Runtime project
Reviewed: 2026-06-21 UTC

vLLM

LLM inference and serving engine using PagedAttention, continuous batching, prefix caching, and related serving optimizations.

Category: LLM inference engine
Layer: Layer 3
Maintainer: vLLM project
Reviewed: 2026-06-21 UTC

SGLang

Serving framework focused on efficient structured language model programs and prefix reuse through RadixAttention.

Category: LLM inference and structured generation runtime
Layer: Layer 3
Maintainer: SGLang project
Reviewed: 2026-06-21 UTC

NVIDIA Triton Inference Server

Model serving platform with model repositories, HTTP/gRPC APIs, schedulers, dynamic batching, and multiple backends.

Category: Model server
Layer: Layer 4
Maintainer: NVIDIA
Reviewed: 2026-06-21 UTC

NVIDIA TensorRT

SDK for optimizing and running neural network inference on NVIDIA GPUs and related targets.

Category: Inference compiler/runtime
Layer: Layers 2-3
Maintainer: NVIDIA
Reviewed: 2026-06-21 UTC

TensorRT-LLM

NVIDIA LLM inference stack for building TensorRT-based LLM engines and runtimes.

Category: LLM inference engine
Layer: Layer 3
Maintainer: NVIDIA
Reviewed: 2026-06-21 UTC

Apache TVM

ML compiler stack for importing models, transforming IR, scheduling tensor programs, and generating target code.

Category: Compiler/graph runtime
Layer: Layer 2
Maintainer: Apache TVM
Reviewed: 2026-06-21 UTC

StableHLO

Portable high-level operation set used in compiler pipelines; it is not a complete runtime by itself.

Category: Compiler IR / operation set
Layer: Layer 2
Maintainer: OpenXLA
Reviewed: 2026-06-21 UTC

ExecuTorch

PyTorch on-device inference stack for mobile, embedded, and edge targets.

Category: Edge/mobile runtime
Layer: Layer 3
Maintainer: PyTorch
Reviewed: 2026-06-21 UTC

LiteRT

Google on-device runtime successor to TensorFlow Lite for high-performance edge and mobile deployment.

Category: Edge/mobile runtime
Layer: Layer 3
Maintainer: Google AI Edge
Reviewed: 2026-06-21 UTC

WebNN

Web API for constructing and executing neural network graphs using operating-system and hardware ML capabilities.

Category: Browser runtime API
Layer: Layer 3
Maintainer: W3C
Reviewed: 2026-06-21 UTC

WebGPU

Web API that exposes GPU compute paths used by browser AI runtimes and libraries.

Category: Browser compute API
Layer: Layers 1-3
Maintainer: W3C
Reviewed: 2026-06-21 UTC

OpenVINO

Inference toolkit and runtime stack for Intel hardware targets and optimized model execution.

Category: Inference toolkit
Layer: Layers 2-3
Maintainer: Intel
Reviewed: 2026-06-21 UTC

KServe

Kubernetes-native model serving platform for production inference services and rollout workflows.

Category: Model serving platform
Layer: Layer 4
Maintainer: KServe project
Reviewed: 2026-06-21 UTC

Ray Serve

Pythonic serving layer for scaling model-backed applications and inference services.

Category: Model serving framework
Layer: Layer 4
Maintainer: Ray project
Reviewed: 2026-06-21 UTC

BentoML

Packaging and serving layer for model APIs and deployment workflows.

Category: Model serving framework
Layer: Layer 4
Maintainer: BentoML project
Reviewed: 2026-06-21 UTC

Reference tables

Directory category guide
Category	Primary responsibility	Typical input	Typical output	What it is not
Compiler / graph runtime	Import, optimize, partition, lower, and execute graphs	Framework graph or portable model	Optimized graph, executable plan, predictions	A complete serving platform
LLM inference engine	Efficient prefill, decode, KV cache, batching, and streaming	LLM weights and token requests	Generated token stream	A complete agent runtime
Model server	Expose model execution through APIs and lifecycle controls	Network request and model repository	Prediction/stream plus server telemetry	A Kubernetes control plane
Serving platform	Deploy, scale, route, and roll out model services	Runtime definition and deployment spec	Managed inference service	The model kernel itself
Edge/mobile runtime	Execute prepared models on constrained devices	AOT model program and local input	On-device prediction	A universal cloud service
Browser runtime/API	Execute model graphs or kernels in a web sandbox	Web assets and client input	Client-local prediction	Guaranteed support on every browser
Agentic runtime infrastructure	Coordinate context, tools, memory, policy, durability, and evaluation	Governed task envelope	Traceable task outcome	Only a prompt or tool-calling library

Decision checklist

Which runtime layer and responsibility does the reader need?
Is the candidate a complete product, a backend, a format, a protocol, or a compiler component?
Which official source verifies the feature or status being evaluated?
What model, hardware, deployment, and operating requirements must the candidate satisfy?
Which external components are required to form a production stack?
What proof and review date are required before selection?

Common mistakes

Treating directory order as a ranking.
Comparing a model server directly with a graph compiler without a scope statement.
Copying unqualified vendor performance claims.
Calling a model format or protocol a complete runtime.
Leaving project status and feature claims undated.
Assuming a listed capability is enabled by default for every model and hardware target.

Sources and further reading

ONNX Runtime high-level design
ONNX Runtime · Official documentation · accessed 2026-06-21 UTC

vLLM documentation
vLLM · Official documentation · accessed 2026-06-21 UTC

Triton Inference Server architecture
NVIDIA · Official documentation · accessed 2026-06-21 UTC

KServe ServingRuntime
KServe · Official documentation · accessed 2026-06-21 UTC

ExecuTorch overview
PyTorch · Official documentation · accessed 2026-06-21 UTC

Last reviewed: 2026-06-21 UTC
