Choose a deployment pattern based on data placement, hardware, latency, availability, cost, operations, and trust, not on a local-versus-cloud slogan.
Key takeaways
- A deployment pattern is a set of failure and authority boundaries, not simply a hosting location.
- Data locality and classification must be explicit.
- Fallback and rollback behavior should be tested.
Patterns
- Local process
- Desktop application
- Browser
- Edge or mobile
- Single-node server
- Kubernetes or cluster
- Serverless
- Hybrid or federated
Placement decision
| Question | Why it matters |
|---|---|
| Data locality and classification | Limits where context, inference, tools, and evidence are allowed to run and be stored. |
| Cold-start tolerance | Decides whether an on-demand pattern is acceptable or the runtime must stay warm. |
| Hardware availability | Bounds what a local, edge, or mobile device can run and how much device variability must be absorbed. |
| Operational ownership | Settles who ships updates, watches the runtime, and responds to incidents. |
| Network and provider failure | Determines which dependencies sit on the critical path and what fallback is required. |
| Update and rollback | Determines how quickly a bad artifact, runtime, or policy version can be withdrawn. |

For each question, record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
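
One way to keep the Runtime Decision Record honest is to make it a small data structure that can report its own gaps. The sketch below is illustrative only: the class names, field names, and the example entry are assumptions, not a prescribed schema. It holds one entry per question from the table and lists the questions that still lack a constraint, assumption, or trade-off.

```python
from dataclasses import dataclass, field

# Question names copied from the placement table; the tuple itself is a hypothetical helper.
QUESTIONS = (
    "data locality and classification",
    "cold-start tolerance",
    "hardware availability",
    "operational ownership",
    "network and provider failure",
    "update and rollback",
)


@dataclass(frozen=True)
class Entry:
    constraint: str
    assumption: str
    trade_off: str


@dataclass
class RuntimeDecisionRecord:
    pattern: str
    entries: dict[str, Entry] = field(default_factory=dict)

    def missing(self) -> list[str]:
        """Return the questions that have no entry or an entry with a blank field."""
        gaps = []
        for question in QUESTIONS:
            entry = self.entries.get(question)
            fields = (entry.constraint, entry.assumption, entry.trade_off) if entry else ()
            if not fields or not all(text.strip() for text in fields):
                gaps.append(question)
        return gaps


# Example values are invented for illustration.
record = RuntimeDecisionRecord(pattern="single-node server")
record.entries["cold-start tolerance"] = Entry(
    constraint="first response must meet the agreed deadline",
    assumption="the runtime stays resident between requests",
    trade_off="higher idle cost",
)
print(record.missing())
```

A review gate could refuse to approve a placement while `missing()` is non-empty.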
Failure and fallback
Define behavior for network loss, provider failure, device pressure, cache loss, invalid artifacts, and unavailable tools. A fallback must preserve data policy and output contracts; it should not silently broaden authority.
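The rule that a fallback must preserve data policy and output contracts without broadening authority can be checked mechanically before a fallback is ever taken. The following is a minimal sketch under assumed names: `Target`, its fields, the data classes, and the scope strings are all hypothetical, and a real system would likely draw them from its policy store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    name: str
    allowed_data_classes: frozenset[str]  # data classes this target may receive
    output_schema: str                    # identifier of the output contract
    scopes: frozenset[str]                # authority the target can exercise


def fallback_is_safe(primary: Target, fallback: Target, data_class: str) -> bool:
    """True only if the fallback keeps data policy, output contract, and authority."""
    return (
        data_class in fallback.allowed_data_classes        # data policy preserved
        and fallback.output_schema == primary.output_schema  # same output contract
        and fallback.scopes <= primary.scopes              # authority not broadened
    )


# Hypothetical targets for illustration.
local = Target("local", frozenset({"public", "internal", "restricted"}),
               "answer-v1", frozenset({"read"}))
hosted = Target("hosted", frozenset({"public", "internal"}),
                "answer-v1", frozenset({"read"}))
hosted_tools = Target("hosted-tools", frozenset({"public", "internal"}),
                      "answer-v1", frozenset({"read", "write"}))

print(fallback_is_safe(local, hosted, "internal"))
print(fallback_is_safe(local, hosted, "restricted"))
print(fallback_is_safe(local, hosted_tools, "internal"))
```

When the check fails, the safer default is to refuse or degrade the request rather than route it anyway.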
Implementation checklist
- Document the control, execution, data, and evidence locations.
- Pin artifact, runtime, and policy versions.
- Test cold start, steady state, overload, failure, and rollback.
- Expose data movement and hosted fallback to users where relevant.
- Record cost and capacity assumptions.
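As an illustration of the pinning item in the checklist, the sketch below compares pinned versions with what a deployment actually reports. The pin names, version strings, and placeholder artifact bytes are assumptions made for the example; only `hashlib.sha256` is a real library call.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest used as the pin for a binary artifact."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_pins(pins: dict[str, str], observed: dict[str, str]) -> list[str]:
    """Return the names whose observed version is missing or differs from the pin."""
    return [name for name, pinned in pins.items() if observed.get(name) != pinned]


# Placeholder bytes stand in for a real artifact file.
artifact = b"placeholder artifact bytes"

pins = {
    "artifact": digest(artifact),
    "runtime": "1.4.2",       # hypothetical runtime version
    "policy": "2024-06-01",   # hypothetical policy revision
}
observed = {
    "artifact": digest(artifact),
    "runtime": "1.4.3",       # drifted from the pin
    "policy": "2024-06-01",
}

print(verify_pins(pins, observed))
```

Running the same comparison at startup and after rollback gives a cheap signal that the deployed combination is the one that was tested.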
Topology trade-offs
A deployment pattern is a set of failure and authority boundaries, not simply a hosting location. A local process minimizes network dependencies but inherits device variability and update constraints. A shared service improves fleet utilization but introduces tenancy, queueing, regional placement, and provider dependencies. A disaggregated cluster can scale prefill, decode, cache, and routing independently, but makes the network and remote state part of the critical path. A managed task runtime improves isolation and durability at the cost of provisioning and control-plane complexity.
Evaluate the complete request path. Context may be local while inference is hosted; tools may be inside a private network while policy is centralized; evidence may require a separate region or retention tier. Draw each transfer, classify the data, assign a deadline, and state what happens when the transfer fails.
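The request-path exercise can be written down as data rather than as a diagram alone. In this hedged sketch, every name is an assumption: the `Transfer` fields, the location names, the data classes, and the `allowed` policy map are invented for illustration. Each hop carries a data class, a deadline, and a required failure behavior, and a validator flags hops without a deadline or with data that the destination may not receive.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    FAIL_CLOSED = "fail closed"
    RETRY = "retry"
    DEGRADE = "degrade"


@dataclass(frozen=True)
class Transfer:
    source: str
    destination: str
    data_class: str
    deadline_ms: int
    on_failure: OnFailure  # required field, so failure behavior cannot be left unstated


def validate(path: list[Transfer], allowed: dict[str, set[str]]) -> list[str]:
    """Return one message per problem; `allowed` maps destination to permitted data classes."""
    problems = []
    for t in path:
        hop = f"{t.source} -> {t.destination}"
        if t.deadline_ms <= 0:
            problems.append(f"{hop}: no deadline")
        if t.data_class not in allowed.get(t.destination, set()):
            problems.append(f"{hop}: {t.data_class} data not allowed at destination")
    return problems


# Hypothetical request path and placement policy.
path = [
    Transfer("device", "hosted-inference", "internal", 2000, OnFailure.DEGRADE),
    Transfer("hosted-inference", "private-tools", "internal", 500, OnFailure.RETRY),
    Transfer("private-tools", "evidence-store", "restricted", 0, OnFailure.FAIL_CLOSED),
]
allowed = {
    "hosted-inference": {"public", "internal"},
    "private-tools": {"public", "internal", "restricted"},
    "evidence-store": {"public", "internal"},
}

for problem in validate(path, allowed):
    print(problem)
```

Destinations missing from the policy map are treated as permitting nothing, which keeps an undocumented hop from passing silently.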
Migration strategy
- Baseline the current latency, error, cost, state, and recovery behavior.
- Introduce a stable request contract before moving execution.
- Separate state that must survive from process-local caches.
- Run the new path in shadow or read-only mode where possible.
- Canary by workload and risk class rather than by random traffic alone.
- Test rollback with in-flight work, duplicate delivery, and partial side effects.
- Retire the old path only after evidence, dashboards, and incident procedures are operational.
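Canarying by workload and risk class, rather than by random traffic alone, might look like the sketch below. The rollout table, workload names, risk labels, and fractions are hypothetical. Hashing the request identifier gives a stable bucket, so a given request takes the same path on every retry.

```python
import hashlib

# Hypothetical rollout table: (workload, risk class) -> fraction sent to the new path.
ROLLOUT = {
    ("summarize", "low"): 0.50,
    ("summarize", "high"): 0.05,
    ("tool-call", "high"): 0.0,
}


def bucket(request_id: str) -> float:
    """Stable value in [0, 1) derived from the request id."""
    h = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(h[:4], "big") / 2**32


def use_new_path(workload: str, risk: str, request_id: str) -> bool:
    """Route to the new path only within the fraction allowed for this class."""
    fraction = ROLLOUT.get((workload, risk), 0.0)  # unlisted classes stay on the old path
    return bucket(request_id) < fraction


print(use_new_path("tool-call", "high", "req-123"))
print(use_new_path("unlisted-workload", "low", "req-123"))
```

Raising a fraction widens the canary for that class only, and setting it back to `0.0` is the rollback for that class.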
