Embodied and real-time AI runtimes connect model execution to sensors, world state, and actuators under latency, power, and safety constraints. Average throughput is insufficient when a missed deadline can destabilize physical control.
Key takeaways
- Worst-case execution time, jitter, stale data, and actuation authority matter more than benchmark peak throughput.
- Sensor pipelines, model inference, and control loops contend for shared memory and compute.
- Safety-critical actuation requires deterministic guards outside probabilistic model output.
Embodied constraints
Robotics, vehicles, cameras, and industrial systems run with small batches, continuous streams, limited power, thermal throttling, intermittent connectivity, and heterogeneous SoCs. Inputs have timestamps and validity windows; outputs may expire before they are applied.
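As a minimal sketch of timestamps and validity windows, the snippet below tags a value with its capture time and refuses to apply it once it has expired. The names `Stamped`, `valid_for`, and `apply_if_fresh` are hypothetical, and the sketch assumes all timestamps come from one monotonic clock in seconds.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Stamped:
    """A value tagged with its capture time and a validity window (seconds)."""
    value: Any
    captured_at: float  # assumed: monotonic clock, seconds
    valid_for: float    # assumed: how long the value may still be acted on

    def is_valid(self, now: float) -> bool:
        age = now - self.captured_at
        # A negative age means the timestamp is from the future: treat as invalid.
        return 0.0 <= age <= self.valid_for


def apply_if_fresh(cmd: Stamped, now: float, actuate: Callable[[Any], None]) -> bool:
    """Apply the command only while it is inside its validity window."""
    if cmd.is_valid(now):
        actuate(cmd.value)
        return True
    return False
```

A caller would typically pass `time.monotonic()` as `now` and treat a `False` return as a signal to fall back to a default action rather than to retry the expired output.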
Deadlines and determinism
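Classify each deadline as hard, firm, or soft: a missed hard deadline is a system failure, a firm-deadline result is worthless once late, and a soft-deadline result loses value gradually. Use priority-aware scheduling, bounded queues, watchdogs, and a safe degraded mode. Deterministic kernels or static execution plans may sacrifice peak performance to reduce jitter and improve reproducibility.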
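One way to make the deadline classes concrete is to map each class to a reaction when a stage overruns its budget, and to pair that with a watchdog that trips when the loop stops reporting progress. This is an illustrative sketch: the `Deadline`, `Action`, `on_completion`, and `Watchdog` names are assumptions, and a real system would enforce this in the scheduler or in hardware rather than in application code.

```python
from enum import Enum
from typing import Optional


class Deadline(Enum):
    HARD = "hard"  # a miss is treated as a system failure
    FIRM = "firm"  # a late result is useless and is discarded
    SOFT = "soft"  # a late result is still usable at reduced value


class Action(Enum):
    USE = "use"
    DISCARD = "discard"
    USE_DEGRADED = "use_degraded"
    SAFE_MODE = "safe_mode"


def on_completion(kind: Deadline, elapsed_s: float, budget_s: float) -> Action:
    """Decide what to do with a result given how long it took."""
    if elapsed_s <= budget_s:
        return Action.USE
    if kind is Deadline.HARD:
        return Action.SAFE_MODE
    if kind is Deadline.FIRM:
        return Action.DISCARD
    return Action.USE_DEGRADED


class Watchdog:
    """Software watchdog: expired unless kicked within timeout_s."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_kick: Optional[float] = None

    def kick(self, now: float) -> None:
        self.last_kick = now

    def expired(self, now: float) -> bool:
        # Never kicked counts as expired, so startup fails safe.
        return self.last_kick is None or (now - self.last_kick) > self.timeout_s
```

Treating a never-kicked watchdog as expired is a deliberate choice here: the system starts in the degraded mode and must earn its way out.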
Unified-memory contention
On edge SoCs with unified memory, camera frames, lidar data, preprocessing buffers, model weights, KV state, and display or actuation buffers all draw on the same memory bandwidth. Runtime planning must account for the whole pipeline rather than model inference in isolation. Excessive cache traffic or model-weight streaming can starve sensor processing of bandwidth and delay it.
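A back-of-the-envelope budget shows why the whole pipeline has to be planned together. The sketch below sums per-stream bandwidth demand and compares it with a usable share of peak bandwidth. Every number in the usage note (frame size, weight size, rates, peak bandwidth, and the 0.7 usable fraction) is an illustrative assumption, not a measurement of any device.

```python
from typing import Dict, Tuple


def stream_gbps(bytes_per_item: float, items_per_s: float, passes: float = 1.0) -> float:
    """Bandwidth in GB/s for a stream whose items are touched `passes` times."""
    return bytes_per_item * items_per_s * passes / 1e9


def plan(demands_gbps: Dict[str, float], peak_gbps: float,
         usable_fraction: float = 0.7) -> Tuple[float, float, bool]:
    """Return (total demand, usable budget, whether the demand fits)."""
    total = sum(demands_gbps.values())
    budget = peak_gbps * usable_fraction
    return total, budget, total <= budget


# Hypothetical workload: one 1080p RGB camera at 30 fps, written once and
# read once, plus 2 GB of weights streamed from memory at 20 inferences/s.
demands = {
    "camera": stream_gbps(1920 * 1080 * 3, 30, passes=2),
    "weights": stream_gbps(2e9, 20),
}
total, budget, fits = plan(demands, peak_gbps=50.0)
```

Under these assumed numbers the camera needs well under 1 GB/s while weight streaming alone needs 40 GB/s, so the plan does not fit: the model, not the sensor, is what crowds the sensor path.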
Graph capture and execution state
Capturing a stable graph can reduce per-kernel launch overhead. Research also explores snapshotting model execution state for rapid restore or branching. Such techniques require exact compatibility with model version, buffers, token position, recurrent state, and device configuration. They should be treated as low-level execution artifacts, not general application checkpoints.
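The exact-compatibility requirement can be sketched as a fingerprint stored alongside the snapshot and rechecked before restore. This is a hedged illustration only: `ExecContext`, its fields, and `can_restore` are hypothetical names that mirror the list above, and no particular runtime's snapshot format is implied.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExecContext:
    """Everything a snapshot is assumed to depend on (hypothetical fields)."""
    model_version: str
    buffer_layout: Tuple[Tuple[str, Tuple[int, ...], str], ...]  # (name, shape, dtype)
    token_position: int
    recurrent_state_digest: str
    device_config: str


def fingerprint(ctx: ExecContext) -> str:
    """Stable SHA-256 digest over every compatibility-relevant field."""
    payload = json.dumps(asdict(ctx), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def can_restore(snapshot_fingerprint: str, current: ExecContext) -> bool:
    """Allow restore only on an exact match; any drift means rebuild instead."""
    return snapshot_fingerprint == fingerprint(current)
```

The check is deliberately all-or-nothing. A partial match, such as the same model on a different device configuration, is treated as incompatible rather than patched up.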
Dataflow and middleware
Streaming runtimes connect acquisition, preprocessing, inference, tracking, fusion, and output. NVIDIA DeepStream is one production example of a pipeline-oriented video analytics runtime. [ar_cite id="deepstream" label="DeepStream"] Zero-copy transport, bounded buffers, timestamps, and backpressure reduce latency and avoid stale frames.
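A bounded, timestamped buffer can be sketched in a few lines. The version below uses a drop-oldest policy, so the consumer always sees recent frames, and counts what it discards. It is a generic illustration, not DeepStream's API, and the `LatestFrames` name and its methods are hypothetical. A blocking `push` would be the alternative when upstream backpressure is preferred over dropping.

```python
from collections import deque
from typing import Any, Deque, Optional, Tuple


class LatestFrames:
    """Bounded frame buffer with drop-oldest overflow and stale-frame skipping."""

    def __init__(self, capacity: int) -> None:
        self._q: Deque[Tuple[float, Any]] = deque(maxlen=capacity)
        self.dropped = 0  # frames lost to overflow or staleness

    def push(self, ts: float, frame: Any) -> None:
        if len(self._q) == self._q.maxlen:
            self.dropped += 1  # deque(maxlen=...) evicts the oldest entry on append
        self._q.append((ts, frame))

    def pop_fresh(self, now: float, max_age_s: float) -> Optional[Tuple[float, Any]]:
        """Return the oldest frame that is still fresh, skipping stale ones."""
        while self._q:
            ts, frame = self._q.popleft()
            if now - ts <= max_age_s:
                return ts, frame
            self.dropped += 1
        return None
```

The `dropped` counter doubles as a health signal: a rising value means the consumer cannot keep up and the pipeline should shed load upstream.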
Safety boundary
- Validate model output against physical and domain constraints.
- Keep emergency stop and invariant enforcement outside the model.
- Reject stale or out-of-order observations.
- Bound the action space and the model's actuation authority.
- Record correlated sensor, model, policy, and actuator events so each action can be traced to its inputs.
- Fail to a safe state when timing or confidence requirements are not met.
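Several of the rules above can be combined into one deterministic guard that sits between the model and the actuator. The sketch assumes a single scalar speed command and a zero-speed safe state. `Limits`, `Guard`, `SAFE_COMMAND`, and the reason strings are hypothetical, and returning the safe command for every rejection is one possible policy, not the only one.

```python
import math
from dataclasses import dataclass
from typing import Tuple

SAFE_COMMAND = 0.0  # assumed safe state: commanded speed of zero


@dataclass(frozen=True)
class Limits:
    max_speed: float       # bound on actuation authority
    max_age_s: float       # oldest observation the guard will act on
    min_confidence: float  # below this, the model output is ignored


class Guard:
    """Deterministic checks applied to every model output before actuation."""

    def __init__(self, limits: Limits) -> None:
        self.limits = limits
        self.last_ts = float("-inf")

    def filter(self, speed: float, confidence: float,
               obs_ts: float, now: float) -> Tuple[float, str]:
        """Return (command, reason); the reason is suitable for audit logs."""
        if obs_ts <= self.last_ts:
            return SAFE_COMMAND, "out_of_order"
        self.last_ts = obs_ts
        if now - obs_ts > self.limits.max_age_s:
            return SAFE_COMMAND, "stale"
        if not math.isfinite(speed):
            return SAFE_COMMAND, "invalid"
        if confidence < self.limits.min_confidence:
            return SAFE_COMMAND, "low_confidence"
        clamped = max(-self.limits.max_speed, min(self.limits.max_speed, speed))
        reason = "ok" if clamped == speed else "clamped"
        return clamped, reason
```

The guard never consults the model to decide whether the model is trustworthy. Every branch depends only on timestamps, fixed limits, and the reported confidence, which keeps its behavior testable and reproducible.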
Metrics
Measure sensor-to-action latency, deadline-miss rate, jitter, stale-frame rate, memory-bandwidth utilization, thermal throttling, power, model quality under device precision, fallback frequency, safe-stop behavior, and recovery time. Report worst-case and percentile behavior, not only averages.
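To illustrate why averages alone mislead, the sketch below summarizes a set of latency samples with percentiles, the worst case, the deadline-miss rate, and jitter. It assumes the nearest-rank percentile definition and peak-to-peak jitter, and both are choices rather than the only valid definitions. The function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], deadline_ms: float) -> Dict[str, float]:
    """Tail-aware latency summary for one pipeline stage or the full loop."""
    misses = sum(1 for x in latencies_ms if x > deadline_ms)
    return {
        "mean": sum(latencies_ms) / len(latencies_ms),
        "p50": percentile(latencies_ms, 50),
        "p99": percentile(latencies_ms, 99),
        "max": max(latencies_ms),
        "miss_rate": misses / len(latencies_ms),
        "jitter": max(latencies_ms) - min(latencies_ms),  # peak-to-peak
    }
```

With a handful of samples where a single outlier exceeds the deadline, the mean stays close to the typical value while `p99`, `max`, and `miss_rate` expose the overrun, and that overrun is the behavior that matters for control stability.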
