Key takeaways
- Directory categories describe primary responsibility; many products span more than one layer.
- Profiles retain official sources and UTC review dates rather than copying vendor marketing.
- A model format, protocol, or compiler IR is listed only when its role is explicitly distinguished from a complete runtime.
- Directory inclusion is not an endorsement, maturity claim, or performance ranking.
- Use the comparison guide and selection guide before treating two entries as direct alternatives.
Runtime boundary
A useful architecture states what a layer receives, owns, emits, measures, and refuses to own. Stating that boundary for the directory keeps overlapping products from being treated as interchangeable.
Receives
Official project documentation, repositories, standards, release information, category definitions, and reviewed profile metadata.
Owns
Classification, source provenance, review cadence, correction workflow, and neutral profile language.
Emits
Filterable profiles with category, stack layer, maintainer, capabilities, sources, review date, and related aRuntime.com guidance.
Does not own
Vendor certification, universal benchmarks, support guarantees, pricing, or a declaration that one system is best.
Failure modes
Stale project status, category confusion, copied marketing claims, broken official links, unscoped comparisons, and profile drift.
Evidence and metrics
Profiles reviewed, source quality, review age, broken links, correction time, category coverage, and unresolved claims.
How to use the directory
Filter by name, category, layer, maintainer, or capability, then open a profile and its official sources. Treat each profile as a starting point for a requirements-driven proof.
Implementation
Keep profile fields structured and versionable. Link names to canonical profile URLs and preserve source access dates.
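One way to keep profile fields structured and versionable is sketched below. It is only an illustration: the `Profile` class, the slug rule, and the `BASE_URL` value are assumptions made for this example, not the directory's actual schema or URL layout. The field values come from the ONNX Runtime entry in this directory.

```python
import json
import re
from dataclasses import asdict, dataclass
from datetime import date

# Hypothetical base path; the real canonical URL layout is not stated in this document.
BASE_URL = "https://example.com/runtimes/"


@dataclass(frozen=True)
class Profile:
    name: str
    category: str
    layer: str
    maintainer: str
    reviewed: date  # UTC review date
    capabilities: tuple[str, ...] = ()

    @property
    def slug(self) -> str:
        # Lowercase the name and collapse every non-alphanumeric run into one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", self.name.lower()).strip("-")

    @property
    def canonical_url(self) -> str:
        return f"{BASE_URL}{self.slug}/"

    def to_json(self) -> str:
        # Sorted keys and ISO dates keep diffs small when the record is versioned.
        record = asdict(self)
        record["reviewed"] = self.reviewed.isoformat()
        return json.dumps(record, indent=2, sort_keys=True)


onnx_runtime = Profile(
    name="ONNX Runtime",
    category="Compiler/graph runtime",
    layer="Layers 2-3",
    maintainer="ONNX Runtime project",
    reviewed=date(2026, 6, 21),
)
```

Calling `onnx_runtime.to_json()` yields a text record that can be committed and diffed, and `onnx_runtime.canonical_url` gives one stable link target per profile name.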
Operational implications
Schedule reviews of time-sensitive features and project status. Route corrections through the public correction workflow.
Measure
Profile review age, source-link health, missing fields, and correction turnaround.
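Two of these measures, review age and missing fields, can be computed directly from a profile record. The sketch below assumes a plain dictionary per profile; the field names in `REQUIRED_FIELDS` and the 180-day staleness threshold are illustrative assumptions, not values defined by the directory.

```python
from datetime import date

# Assumed field names, modeled on the fields the directory says each profile emits.
REQUIRED_FIELDS = (
    "name", "category", "layer", "maintainer",
    "capabilities", "sources", "reviewed",
)


def review_age_days(reviewed: date, today: date) -> int:
    """Days elapsed between the review date and today."""
    return (today - reviewed).days


def is_stale(reviewed: date, today: date, max_age_days: int = 180) -> bool:
    """True when the review is older than the (assumed) allowed age."""
    return review_age_days(reviewed, today) > max_age_days


def missing_fields(record: dict) -> list[str]:
    """Required fields that are absent or empty in a profile record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```

A record that lists a name, category, layer, maintainer, and review date but no capabilities or sources would report `["capabilities", "sources"]` as missing.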
Category boundaries
Compiler runtimes, inference engines, model servers, serving platforms, local runtimes, edge/browser runtimes, and agentic infrastructure solve overlapping but distinct problems.
Implementation
Record primary and secondary layers, delegated backends, model formats, hardware targets, and external components.
Operational implications
Avoid a flat feature checklist across unlike categories. Use the taxonomy before comparison.
Measure
Category coverage, ambiguous profiles, external dependency count, and classification corrections.
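Using the taxonomy before comparison can be partly mechanized. The following sketch assumes layer labels written the way this directory writes them ("Layer 3", "Layers 2-3") and flags pairs of entries that share no stack layer. The function names are hypothetical, and a shared layer is only a precondition for comparison, not evidence that two entries are direct alternatives.

```python
import re


def parse_layers(label: str) -> set[int]:
    """Turn a label such as 'Layer 4' or 'Layers 2-3' into a set of layer numbers."""
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    if not numbers:
        raise ValueError(f"no layer number in {label!r}")
    # A range such as 2-3 covers every layer between its endpoints.
    return set(range(min(numbers), max(numbers) + 1))


def shared_layers(label_a: str, label_b: str) -> set[int]:
    """Layers that both entries occupy."""
    return parse_layers(label_a) & parse_layers(label_b)


def needs_scope_statement(label_a: str, label_b: str) -> bool:
    """True when the entries share no layer, so a comparison must state its scope."""
    return not shared_layers(label_a, label_b)
```

For example, an entry on Layer 3 and an entry on Layer 4 share no layer, so `needs_scope_statement("Layer 3", "Layer 4")` returns `True`.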
Profile evidence
Profiles should prefer official documentation and repositories, retain a UTC review date, and qualify changing features or project status.
Implementation
Store source title, publisher, URL, type, access date, and page sections used.
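A source record with these six fields might look like the sketch below. The `Source` class, the `SOURCE_TYPES` vocabulary, and the checks in `problems` are assumptions for illustration; the URL and section name in the example are placeholders, not the real location or structure of the cited page.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse

# Assumed vocabulary, based on the source kinds the directory says it receives.
SOURCE_TYPES = {"documentation", "repository", "standard", "release"}


@dataclass(frozen=True)
class Source:
    title: str
    publisher: str
    url: str
    type: str
    accessed: date  # access date, UTC
    sections: tuple[str, ...] = ()  # page sections actually used

    def problems(self) -> list[str]:
        """Human-readable problems found in this record; empty when it looks complete."""
        found = []
        parts = urlparse(self.url)
        if parts.scheme != "https" or not parts.netloc:
            found.append("url must be an absolute https URL")
        if self.type not in SOURCE_TYPES:
            found.append(f"unknown source type: {self.type}")
        if not self.sections:
            found.append("no page sections recorded")
        return found


design_doc = Source(
    title="ONNX Runtime high-level design",
    publisher="ONNX Runtime project",
    url="https://example.org/docs/design",  # placeholder URL
    type="documentation",
    accessed=date(2026, 6, 21),
    sections=("Overview",),  # placeholder section name
)
```

Recording the sections used, not only the URL, makes it easier to re-verify a claim when the page changes.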
Operational implications
Do not publish live pricing, performance, maintenance, or support claims without current verification and a stated scope.
Measure
Primary-source share, source age, broken links, and unverified claims.
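Primary-source share and the count of unverified claims are simple ratios and filters over profile data. The sketch below assumes each source is a dictionary with a boolean `primary` flag and each claim is a dictionary with an optional `source_url`; both shapes are hypothetical.

```python
def primary_source_share(sources: list[dict]) -> float:
    """Fraction of sources flagged as primary (official) sources; 0.0 for an empty list."""
    if not sources:
        return 0.0
    primary = sum(1 for source in sources if source.get("primary"))
    return primary / len(sources)


def unverified_claims(claims: list[dict]) -> list[str]:
    """Text of every claim that has no source URL attached."""
    return [claim["text"] for claim in claims if not claim.get("source_url")]
```

A profile with three official sources out of four has a primary-source share of 0.75; any claim returned by `unverified_claims` should be qualified or removed before publication.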
Runtime profiles
The filterable profile grid below is generated from the same structured PHP data used to seed individual runtime profile posts.
Implementation
Server-render the full directory and use lightweight vanilla JavaScript only for progressive filtering.
Operational implications
The directory remains usable without JavaScript and exposes shareable canonical profile URLs.
Measure
Rendered profile count, filter accessibility, keyboard behavior, and profile-link validity.
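The filtering rule itself is small. The page implements it in vanilla JavaScript; the Python sketch below only illustrates the matching logic under assumed record shapes. An empty query matches everything, which mirrors the requirement that all profiles stay visible when no filter is applied. The capability lists are taken from the profile summaries in this directory.

```python
PROFILES = [
    {"name": "vLLM", "category": "LLM inference engine", "layer": "Layer 3",
     "maintainer": "vLLM project",
     "capabilities": ["continuous batching", "prefix caching"]},
    {"name": "NVIDIA Triton Inference Server", "category": "Model server",
     "layer": "Layer 4", "maintainer": "NVIDIA",
     "capabilities": ["dynamic batching", "HTTP/gRPC APIs"]},
    {"name": "KServe", "category": "Model serving platform", "layer": "Layer 4",
     "maintainer": "KServe project", "capabilities": []},
]


def matches(profile: dict, query: str) -> bool:
    """Case-insensitive substring match over name, category, layer, maintainer, capabilities."""
    needle = query.strip().lower()
    if not needle:
        return True  # no filter applied: every profile stays visible
    haystack = [
        profile["name"], profile["category"], profile["layer"],
        profile["maintainer"], *profile.get("capabilities", []),
    ]
    return any(needle in value.lower() for value in haystack)


def filter_profiles(profiles: list[dict], query: str) -> list[dict]:
    """Profiles that match the query, in their original order."""
    return [profile for profile in profiles if matches(profile, query)]
```

Because the filter preserves the original order, the filtered view carries no more ranking meaning than the full directory does.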
Filterable runtime profiles
Filter the reviewed profiles by runtime name, category, layer, maintainer, or capability. All profiles remain visible and usable without JavaScript.
ONNX Runtime
ONNX graph execution with execution providers for heterogeneous targets.
- Category: Compiler/graph runtime
- Layer: Layers 2-3
- Maintainer: ONNX Runtime project
- Reviewed: 2026-06-21 UTC
vLLM
LLM inference and serving engine using PagedAttention, continuous batching, prefix caching, and related serving optimizations.
- Category: LLM inference engine
- Layer: Layer 3
- Maintainer: vLLM project
- Reviewed: 2026-06-21 UTC
SGLang
Serving framework focused on efficient structured language model programs and prefix reuse through RadixAttention.
- Category: LLM inference and structured generation runtime
- Layer: Layer 3
- Maintainer: SGLang project
- Reviewed: 2026-06-21 UTC
NVIDIA Triton Inference Server
Model serving platform with model repositories, HTTP/gRPC APIs, schedulers, dynamic batching, and multiple backends.
- Category: Model server
- Layer: Layer 4
- Maintainer: NVIDIA
- Reviewed: 2026-06-21 UTC
NVIDIA TensorRT
SDK for optimizing and running neural network inference on NVIDIA GPUs and related targets.
- Category: Inference compiler/runtime
- Layer: Layers 2-3
- Maintainer: NVIDIA
- Reviewed: 2026-06-21 UTC
TensorRT-LLM
NVIDIA LLM inference stack for building TensorRT-based LLM engines and runtimes.
- Category: LLM inference engine
- Layer: Layer 3
- Maintainer: NVIDIA
- Reviewed: 2026-06-21 UTC
Apache TVM
ML compiler stack for importing models, transforming IR, scheduling tensor programs, and generating target code.
- Category: Compiler/graph runtime
- Layer: Layer 2
- Maintainer: Apache TVM
- Reviewed: 2026-06-21 UTC
StableHLO
Portable high-level operation set used in compiler pipelines; it is not a complete runtime by itself.
- Category: Compiler IR / operation set
- Layer: Layer 2
- Maintainer: OpenXLA
- Reviewed: 2026-06-21 UTC
ExecuTorch
PyTorch on-device inference stack for mobile, embedded, and edge targets.
- Category: Edge/mobile runtime
- Layer: Layer 3
- Maintainer: PyTorch
- Reviewed: 2026-06-21 UTC
LiteRT
Google's on-device runtime, the successor to TensorFlow Lite, for high-performance edge and mobile deployment.
- Category: Edge/mobile runtime
- Layer: Layer 3
- Maintainer: Google AI Edge
- Reviewed: 2026-06-21 UTC
WebNN
Web API for constructing and executing neural network graphs using operating-system and hardware ML capabilities.
- Category: Browser runtime API
- Layer: Layer 3
- Maintainer: W3C
- Reviewed: 2026-06-21 UTC
WebGPU
Web API that exposes GPU compute paths used by browser AI runtimes and libraries.
- Category: Browser compute API
- Layer: Layers 1-3
- Maintainer: W3C
- Reviewed: 2026-06-21 UTC
OpenVINO
Inference toolkit and runtime stack for Intel hardware targets and optimized model execution.
- Category: Inference toolkit
- Layer: Layers 2-3
- Maintainer: Intel
- Reviewed: 2026-06-21 UTC
KServe
Kubernetes-native model serving pattern for production inference services and rollout workflows.
- Category: Model serving platform
- Layer: Layer 4
- Maintainer: KServe project
- Reviewed: 2026-06-21 UTC
Ray Serve
Pythonic serving layer for scaling model-backed applications and inference services.
- Category: Model serving framework
- Layer: Layer 4
- Maintainer: Ray project
- Reviewed: 2026-06-21 UTC
BentoML
Packaging and serving layer for model APIs and deployment workflows.
- Category: Model serving framework
- Layer: Layer 4
- Maintainer: BentoML project
- Reviewed: 2026-06-21 UTC
Reference tables
| Category | Primary responsibility | Typical input | Typical output | What it is not |
|---|---|---|---|---|
| Compiler / graph runtime | Import, optimize, partition, lower, and execute graphs | Framework graph or portable model | Optimized graph, executable plan, predictions | A complete serving platform |
| LLM inference engine | Efficient prefill, decode, KV cache, batching, and streaming | LLM weights and token requests | Generated token stream | A complete agent runtime |
| Model server | Expose model execution through APIs and lifecycle controls | Network request and model repository | Prediction/stream plus server telemetry | A Kubernetes control plane |
| Serving platform | Deploy, scale, route, and roll out model services | Runtime definition and deployment spec | Managed inference service | The model kernel itself |
| Edge/mobile runtime | Execute prepared models on constrained devices | AOT model program and local input | On-device prediction | A universal cloud service |
| Browser runtime/API | Execute model graphs or kernels in a web sandbox | Web assets and client input | Client-local prediction | Guaranteed support on every browser |
| Agentic runtime infrastructure | Coordinate context, tools, memory, policy, durability, and evaluation | Governed task envelope | Traceable task outcome | Only a prompt or tool-calling library |
Decision checklist
- Which runtime layer and responsibility does the reader need?
- Is the candidate a complete product, a backend, a format, a protocol, or a compiler component?
- Which official source verifies the feature or status being evaluated?
- What model, hardware, deployment, and operating requirements must the candidate satisfy?
- Which external components are required to form a production stack?
- What proof and review date are required before selection?
Common mistakes
- Treating directory order as a ranking.
- Comparing a model server directly with a graph compiler without a scope statement.
- Copying unqualified vendor performance claims.
- Calling a model format or protocol a complete runtime.
- Leaving project status and feature claims undated.
- Assuming a listed capability is enabled by default for every model and hardware target.
Sources and further reading
- ONNX Runtime high-level design
- vLLM documentation
- Triton Inference Server architecture
- KServe ServingRuntime
- ExecuTorch overview
Last reviewed: 2026-06-21 UTC
