Edge, mobile, and TinyML runtimes execute models under constraints on memory, power, heat, storage, connectivity, and update delivery. Their architecture prioritizes a predictable footprint, a fit to device capability, privacy, and graceful fallback.
Key takeaways
- Model packaging and hardware delegate compatibility are deployment contracts.
- On-device execution improves data placement but does not automatically solve model, application, or telemetry privacy.
- Battery drain, thermal throttling, and failed updates are runtime concerns.
Scope
Edge devices range from powerful desktops and mobile SoCs to embedded controllers. A runtime may support CPU, GPU, DSP, NPU, or microcontroller kernels, often with different operator coverage and quantization requirements for each. TinyML emphasizes statically bounded memory and minimal dependencies.
Model packaging
A model package includes the graph or compiled artifact, weights, tokenizer, metadata, preprocessing, version, signature, and compatibility constraints. Update systems need atomic activation and rollback so that a partially downloaded model cannot brick the application.
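
Atomic activation and rollback can be sketched as follows. This is a minimal illustration, not a complete update system: the file names, the `.prev` suffix, and the `activate` and `rollback` helpers are hypothetical, and a SHA-256 checksum from a manifest stands in for real signature verification, which a production system would perform separately.

```python
import hashlib
import os
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large model files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def activate(staged: Path, active: Path, expected_sha256: str) -> bool:
    """Promote a staged model only if it is complete and matches the manifest hash."""
    if not staged.exists() or sha256_of(staged) != expected_sha256:
        staged.unlink(missing_ok=True)  # discard a partial or corrupt download
        return False
    if active.exists():
        # Keep a copy of the current model so rollback stays possible.
        shutil.copy2(active, active.with_name(active.name + ".prev"))
    os.replace(staged, active)  # atomic rename when both paths share a filesystem
    return True


def rollback(active: Path) -> bool:
    """Restore the previously active model, if one was kept."""
    previous = active.with_name(active.name + ".prev")
    if not previous.exists():
        return False
    os.replace(previous, active)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        active, staged = Path(tmp, "model.bin"), Path(tmp, "model.bin.staged")
        active.write_bytes(b"v1")
        staged.write_bytes(b"v2")
        print(activate(staged, active, hashlib.sha256(b"v2").hexdigest()))
        print(active.read_bytes())
```

The active path is never left empty: the old file is copied aside before the rename, so a crash between the two steps leaves either the old or the new model in place.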
Hardware delegates
A delegate assigns the operations it supports to a device backend and leaves unsupported operations on a fallback path, typically the CPU. Measure both the data transfers between backends and the share of work that falls back, because a small unsupported region can erase accelerator gains. Verify the exact OS, device, driver, runtime, model, and precision combination.
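
To make the cost of a small unsupported region concrete, the sketch below partitions a model into backend segments and counts the boundaries between them. It assumes a purely linear operator sequence and a hypothetical set of supported operator names; real runtimes partition a graph and apply their own heuristics, so treat this only as an illustration of why one unsupported operator can split a model into several segments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    backend: str            # "delegate" or "cpu" (the assumed fallback path)
    ops: tuple[str, ...]


def partition(ops: list[str], supported: set[str]) -> list[Segment]:
    """Group consecutive ops by whether the delegate supports them."""
    segments: list[Segment] = []
    for op in ops:
        backend = "delegate" if op in supported else "cpu"
        if segments and segments[-1].backend == backend:
            # Extend the current segment instead of opening a new one.
            segments[-1] = Segment(backend, segments[-1].ops + (op,))
        else:
            segments.append(Segment(backend, (op,)))
    return segments


def boundary_crossings(segments: list[Segment]) -> int:
    """Each boundary implies a tensor handoff between backends."""
    return max(len(segments) - 1, 0)


if __name__ == "__main__":
    ops = ["conv", "relu", "custom_norm", "conv", "relu", "softmax"]
    plan = partition(ops, supported={"conv", "relu", "softmax"})
    for segment in plan:
        print(segment.backend, segment.ops)
    print("crossings:", boundary_crossings(plan))
```

In this example a single unsupported `custom_norm` operator produces three segments and two handoffs, which is the kind of figure worth logging per device and model version.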
Resource scheduling
Use bounded memory pools, static plans where possible, explicit thread limits, and thermal-aware workload control. Interactive workloads should yield to critical device functions. Robotics and sensor pipelines need deadline and priority semantics rather than average throughput alone.
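
Deadline and priority semantics, combined with thermal-aware control, might look like the following sketch. The `Job` fields, the millisecond cost estimates, and the rule that only priority 0 work runs while throttled are all assumptions chosen for illustration; a real scheduler would also handle preemption and measured rather than estimated costs.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                         # lower value means more critical
    deadline_ms: int                      # absolute deadline on the same clock as now_ms
    name: str = field(compare=False)
    est_cost_ms: int = field(compare=False)


def plan(jobs, now_ms, throttled, max_priority_when_throttled=0):
    """Return (names to run in order, [(name, reason)] for skipped jobs)."""
    heap = list(jobs)
    heapq.heapify(heap)                   # ordered by priority, then deadline
    run, skipped = [], []
    clock = now_ms
    while heap:
        job = heapq.heappop(heap)
        if throttled and job.priority > max_priority_when_throttled:
            skipped.append((job.name, "thermal"))    # shed non-critical work
        elif clock + job.est_cost_ms > job.deadline_ms:
            skipped.append((job.name, "deadline"))   # would finish too late
        else:
            run.append(job.name)
            clock += job.est_cost_ms
    return run, skipped


if __name__ == "__main__":
    jobs = [
        Job(1, 100, "caption", 40),
        Job(0, 50, "obstacle", 20),
        Job(1, 30, "stale", 20),
    ]
    print(plan(jobs, now_ms=0, throttled=False))
    print(plan(jobs, now_ms=0, throttled=True))
```

Skipping a job that cannot meet its deadline, rather than running it late, is what separates this from a throughput-oriented queue.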
Offline behavior and updates
Define which features remain available without a network, how model and policy versions are selected, and when hosted fallback is allowed. Cache integrity, storage pressure, rollback, and expiry should be visible to the user or operator.
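
One way to make these rules explicit is a per-feature policy that is evaluated before each request. The sketch below is an assumption-laden illustration: `FeaturePolicy`, `Route`, and the `local_model_valid` flag (meaning the cached model passed integrity and expiry checks) are hypothetical names, not part of any particular runtime.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    HOSTED = "hosted"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class FeaturePolicy:
    works_offline: bool            # feature may run on the local model
    hosted_fallback_allowed: bool  # policy permits sending the request off-device


def choose_route(policy: FeaturePolicy, online: bool, local_model_valid: bool) -> Route:
    """Prefer local execution; use hosted fallback only when policy allows it."""
    if policy.works_offline and local_model_valid:
        return Route.LOCAL
    if online and policy.hosted_fallback_allowed:
        return Route.HOSTED
    return Route.UNAVAILABLE


if __name__ == "__main__":
    dictation = FeaturePolicy(works_offline=True, hosted_fallback_allowed=False)
    print(choose_route(dictation, online=True, local_model_valid=False))
```

Returning an explicit `UNAVAILABLE` route gives the application something to show the user or operator instead of silently falling back.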
Privacy and telemetry
Local inference can keep raw input on-device, but downloaded models, crash logs, analytics, remote fallback, and tool calls may still disclose data. Apply minimization and consent to telemetry. Clearly distinguish local processing from local storage and from local control, because one does not imply the others.
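
Minimization and consent can be enforced at the point where a telemetry event is built. The sketch below uses an allow-list so that new fields are excluded by default; the field names and the `build_event` helper are hypothetical and would depend on the application's own telemetry schema.

```python
from typing import Optional

# Assumed allow-list: only operational fields, never raw inputs or outputs.
ALLOWED_FIELDS = frozenset({"model_version", "latency_ms", "delegate", "error_code"})


def build_event(raw: dict, consent: bool) -> Optional[dict]:
    """Return a minimized event, or None when the user has not consented."""
    if not consent:
        return None
    return {key: value for key, value in raw.items() if key in ALLOWED_FIELDS}


if __name__ == "__main__":
    raw = {
        "model_version": "1.4.0",
        "latency_ms": 38,
        "prompt": "text the user typed",   # must not leave the device
    }
    print(build_event(raw, consent=True))
    print(build_event(raw, consent=False))
```

An allow-list fails closed: a field added to the raw record later is dropped until someone deliberately permits it.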
Selection checklist
- Supported devices, OS versions, operators, shapes, and precisions
- Artifact size, install/update size, startup time, peak memory, and thermal behavior
- Fallback behavior and privacy consequences
- Offline capability and model expiration
- Hardware delegate observability and failure reporting
- Signed updates, rollback, and supply-chain provenance
Device failure model
On-device execution fails differently from a managed service. The model package may exceed available storage, a delegate may reject an operation, the operating system may reclaim memory, thermal throttling may push work past its deadline, or an update may leave the application and model schema out of sync. Treat these as designed states rather than unexpected errors. The application should be able to report capability, decline unsupported work, select a smaller model, defer execution, or use a policy-approved hosted fallback.
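
Treating these conditions as designed states can be as simple as returning an explicit decision instead of raising an error. The following sketch assumes hypothetical `Variant` and `DeviceState` records and a simplified rule set (defer while throttled, otherwise pick the largest variant that fits); it leaves out delegate rejection and schema checks for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    peak_memory_mb: int
    storage_mb: int


@dataclass(frozen=True)
class DeviceState:
    free_memory_mb: int
    free_storage_mb: int
    throttled: bool


def decide(variants: list[Variant], state: DeviceState) -> tuple[str, Optional[str]]:
    """Return (decision, variant name), where decision is run, defer, or decline."""
    if state.throttled:
        return ("defer", None)              # retry once the device has cooled
    # Try the largest variant first, then fall back to smaller ones.
    for variant in sorted(variants, key=lambda v: v.peak_memory_mb, reverse=True):
        fits_memory = variant.peak_memory_mb <= state.free_memory_mb
        fits_storage = variant.storage_mb <= state.free_storage_mb
        if fits_memory and fits_storage:
            return ("run", variant.name)
    return ("decline", None)                # report lack of capability


if __name__ == "__main__":
    variants = [Variant("large", 900, 1200), Variant("small", 200, 300)]
    print(decide(variants, DeviceState(400, 2000, throttled=False)))
```

A caller that receives `decline` can then consult its fallback policy rather than discovering the failure through a crash.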
Offline operation also changes evidence handling. Telemetry may need to remain local until connectivity returns, and sensitive inputs should not be queued for upload merely because a trace exporter is unavailable. Use bounded local buffers, redaction at collection time, explicit retention, and a clear rule for whether missing telemetry blocks high-risk work.
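
A bounded buffer with redaction at collection time and explicit retention could be sketched like this. The email pattern is a deliberately simple stand-in for whatever redaction rules the application needs, and `TelemetryBuffer` with its parameters is a hypothetical helper rather than an existing library class.

```python
import re
from collections import deque

# Simplified pattern for illustration; real redaction rules would be broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class TelemetryBuffer:
    def __init__(self, max_events: int, retention_s: float):
        # deque(maxlen=...) drops the oldest entry when full, bounding memory.
        self._events = deque(maxlen=max_events)
        self._retention_s = retention_s

    def record(self, message: str, now_s: float) -> None:
        """Redact before storing, so sensitive text is never queued."""
        self._events.append((now_s, EMAIL.sub("[redacted]", message)))

    def drain(self, now_s: float) -> list[str]:
        """Return events still within retention and empty the buffer."""
        fresh = [msg for ts, msg in self._events if now_s - ts <= self._retention_s]
        self._events.clear()
        return fresh


if __name__ == "__main__":
    buffer = TelemetryBuffer(max_events=2, retention_s=60)
    buffer.record("login user@example.com failed", now_s=0)
    print(buffer.drain(now_s=10))
```

Whether an empty or overflowing buffer should block high-risk work remains a policy decision that this sketch does not make.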
Evaluation matrix
| Dimension | Questions |
|---|---|
| Compatibility | Which OS, chipset, delegate, operator set, and model package versions are supported? |
| Resources | What are peak memory, persistent storage, thermal, battery, and startup costs? |
| Lifecycle | How are models signed, staged, rolled back, and removed? |
| Privacy | Which data stays local, and what telemetry or fallback payload may leave? |
| Reliability | What happens when a delegate, model, sensor, or network path is unavailable? |
