Deployment Patterns

Choose a deployment pattern based on data placement, hardware, latency, availability, cost, operations, and trust, not on a local-versus-cloud slogan.

Key takeaways

Data locality and classification must be explicit.
Fallback and rollback behavior should be tested.

Patterns

Local process
Desktop application
Browser
Edge or mobile
Single-node server
Kubernetes or cluster
Serverless
Hybrid or federated

Placement decision

Consideration	What to record
Data locality and classification	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
Cold-start tolerance	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
Hardware availability	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
Operational ownership	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
Network and provider failure	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
Update and rollback	Record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
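
The table above can be turned into a small completeness check. The sketch below is only an illustration: the class names, field names, and example values are assumptions, since the format of the Runtime Decision Record is not specified here.

```python
from dataclasses import dataclass, field

# The six placement considerations listed in the table above.
PLACEMENT_QUESTIONS = (
    "Data locality and classification",
    "Cold-start tolerance",
    "Hardware availability",
    "Operational ownership",
    "Network and provider failure",
    "Update and rollback",
)


@dataclass(frozen=True)
class DecisionEntry:
    """One answered consideration (field names are hypothetical)."""
    question: str
    constraint: str
    assumption: str
    accepted_trade_off: str


@dataclass
class RuntimeDecisionRecord:
    """Hypothetical container for the entries of one deployment."""
    deployment: str
    entries: list = field(default_factory=list)

    def missing_questions(self):
        """Return considerations with no entry yet, in table order."""
        answered = {entry.question for entry in self.entries}
        return [q for q in PLACEMENT_QUESTIONS if q not in answered]


# Example values are invented for illustration.
record = RuntimeDecisionRecord(deployment="example-single-node")
record.entries.append(
    DecisionEntry(
        question="Cold-start tolerance",
        constraint="First response within the interactive deadline",
        assumption="Model artifact is already on local disk",
        accepted_trade_off="Higher idle memory to keep the runtime warm",
    )
)
print(record.missing_questions())
```

A check like this could run in review or CI so that a placement decision is not approved while any consideration is still unanswered.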

Failure and fallback

Define behavior for network loss, provider failure, device pressure, cache loss, invalid artifacts, and unavailable tools. A fallback must preserve data policy and output contracts; it should not silently broaden authority.
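
One way to make the fallback rule concrete is to filter candidates before selecting one. The following is a minimal sketch under stated assumptions: `Target`, `RequestPolicy`, the region labels, and the permission names are all hypothetical, and a real system would likely check more than these three properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """A place a request could run (all fields are illustrative)."""
    name: str
    data_regions: frozenset   # locations where this target processes data
    output_schema: str        # version of the output contract it honors
    authority: frozenset      # tools or permissions it can exercise


@dataclass(frozen=True)
class RequestPolicy:
    allowed_regions: frozenset
    output_schema: str
    granted_authority: frozenset


def is_acceptable(target: Target, policy: RequestPolicy) -> bool:
    """A fallback must not move data, change the contract, or add authority."""
    return (
        target.data_regions <= policy.allowed_regions
        and target.output_schema == policy.output_schema
        and target.authority <= policy.granted_authority
    )


def choose_fallback(candidates, policy: RequestPolicy) -> Optional[Target]:
    """Return the first acceptable candidate, or None to fail closed."""
    for target in candidates:
        if is_acceptable(target, policy):
            return target
    return None


policy = RequestPolicy(
    allowed_regions=frozenset({"device", "eu"}),
    output_schema="v2",
    granted_authority=frozenset({"read_docs"}),
)
candidates = [
    # Rejected: would process data outside the allowed regions.
    Target("hosted-us", frozenset({"us"}), "v2", frozenset({"read_docs"})),
    # Rejected: would silently broaden authority with write access.
    Target("hosted-eu-admin", frozenset({"eu"}), "v2",
           frozenset({"read_docs", "write_docs"})),
    # Accepted: same data policy, same contract, no extra authority.
    Target("local-small", frozenset({"device"}), "v2", frozenset({"read_docs"})),
]

chosen = choose_fallback(candidates, policy)
print(chosen.name if chosen else "no acceptable fallback; fail closed")
```

Returning `None` rather than the "least bad" candidate is a deliberate choice in this sketch: when nothing satisfies the policy, the caller has to surface the failure instead of degrading silently.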

Implementation checklist

Document the control, execution, data, and evidence locations.
Pin artifact, runtime, and policy versions.
Test cold start, steady state, overload, failure, and rollback.
Expose data movement and hosted fallback to users where relevant.
Record cost and capacity assumptions.
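
The version-pinning item in the checklist above can be checked mechanically at startup or in CI. This is a hedged sketch: the manifest keys, version strings, and artifact bytes are assumptions chosen for illustration, not a prescribed format.

```python
import hashlib

# Hypothetical manifest keys; a real manifest may use different names.
PINNED_KEYS = ("artifact_sha256", "runtime_version", "policy_version")


def sha256_hex(data: bytes) -> str:
    """Content digest used to pin an artifact by what it contains."""
    return hashlib.sha256(data).hexdigest()


def check_pins(manifest: dict, observed: dict) -> list:
    """Return mismatch descriptions; an empty list means all pins match."""
    problems = []
    for key in PINNED_KEYS:
        pinned = manifest.get(key)
        if pinned is None:
            problems.append(f"{key}: not pinned")
        elif observed.get(key) != pinned:
            problems.append(
                f"{key}: pinned {pinned!r}, observed {observed.get(key)!r}"
            )
    return problems


# Stand-in bytes for an artifact file; values are invented for the example.
artifact_bytes = b"example model weights"
manifest = {
    "artifact_sha256": sha256_hex(artifact_bytes),
    "runtime_version": "1.4.2",
    "policy_version": "policy-7",
}

observed_ok = dict(manifest)
observed_drift = dict(manifest, runtime_version="1.5.0")

print(check_pins(manifest, observed_ok))
print(check_pins(manifest, observed_drift))
```

Pinning the artifact by digest rather than by filename means a silently replaced file is detected even when its name and path are unchanged.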

Topology trade-offs

A deployment pattern is a set of failure and authority boundaries, not simply a hosting location. A local process minimizes network dependencies but inherits device variability and update constraints. A shared service improves fleet utilization but introduces tenancy, queueing, regional placement, and provider dependencies. A disaggregated cluster can scale prefill, decode, cache, and routing independently, but makes the network and remote state part of the critical path. A managed task runtime improves isolation and durability at the cost of provisioning and control-plane complexity.

Evaluate the complete request path. Context may be local while inference is hosted; tools may be inside a private network while policy is centralized; evidence may require a separate region or retention tier. Draw each transfer, classify the data, assign a deadline, and state what happens when the transfer fails.
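
The request-path exercise described above can be captured as data so that it is reviewable and testable. The sketch below is illustrative only: the location names, data classes, deadlines, and approval table are assumptions, and it treats the hops as strictly sequential.

```python
from dataclasses import dataclass
from enum import Enum


class OnFailure(Enum):
    FAIL_CLOSED = "fail_closed"  # stop the request and report the failure
    RETRY = "retry"              # try the same transfer again
    DEGRADE = "degrade"          # continue without this hop's result


@dataclass(frozen=True)
class Transfer:
    """One hop in the request path (names and classes are hypothetical)."""
    source: str
    destination: str
    data_class: str
    deadline_ms: int
    on_failure: OnFailure


def unapproved(transfers, approved):
    """Return transfers whose destination is not approved for their data class."""
    return [
        t for t in transfers
        if t.destination not in approved.get(t.data_class, set())
    ]


def total_deadline_ms(transfers):
    """Worst-case budget, assuming the hops run one after another."""
    return sum(t.deadline_ms for t in transfers)


REQUEST_PATH = [
    Transfer("local-context", "hosted-inference", "confidential", 200,
             OnFailure.FAIL_CLOSED),
    Transfer("hosted-inference", "private-tools", "internal", 500,
             OnFailure.DEGRADE),
    Transfer("private-tools", "evidence-store-other-region", "confidential",
             1000, OnFailure.RETRY),
]

# Which destinations each data class may reach (invented for the example).
APPROVED = {
    "confidential": {"hosted-inference"},
    "internal": {"hosted-inference", "private-tools"},
}

for t in unapproved(REQUEST_PATH, APPROVED):
    print(f"unapproved: {t.data_class} data to {t.destination}")
print(total_deadline_ms(REQUEST_PATH))
```

Because every transfer must name an `on_failure` value, the structure itself prevents a hop from being added without stating what happens when it fails.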

Migration strategy

Baseline the current latency, error, cost, state, and recovery behavior.
Introduce a stable request contract before moving execution.
Separate state that must survive from process-local caches.
Run the new path in shadow or read-only mode where possible.
Canary by workload and risk class rather than by random traffic alone.
Test rollback with in-flight work, duplicate delivery, and partial side effects.
Retire the old path only after evidence, dashboards, and incident procedures are operational.
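
The canary step above can be sketched as a routing table keyed by workload and risk class, combined with a deterministic hash so that a retried request stays on the same path. The workload names, risk classes, and percentages below are assumptions for illustration.

```python
import hashlib

# Share of traffic (0-100) sent to the new path for each
# (workload, risk class) pair. Pairs not listed stay on the old path.
CANARY_PERCENT = {
    ("embed", "low"): 100,
    ("summarize", "low"): 50,
    ("summarize", "high"): 5,
    ("tool-call", "low"): 10,
}


def bucket(request_id: str) -> int:
    """Map a request ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def route(request_id: str, workload: str, risk_class: str) -> str:
    """Return "new" or "old" for this request."""
    percent = CANARY_PERCENT.get((workload, risk_class), 0)
    return "new" if bucket(request_id) < percent else "old"


# The same request always takes the same path, which keeps retries and
# duplicate deliveries from straddling the old and new implementations.
print(route("req-123", "summarize", "low"))
print(route("req-123", "tool-call", "high"))
```

Defaulting unlisted pairs to the old path means a new workload or risk class is never exposed to the new path by accident; each exposure is an explicit entry in the table.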
