Serverless runtimes trade infrastructure management for cold-start, duration, state, networking, and provider constraints.
Key takeaways
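- Function or container invocation is one of several serverless patterns, alongside managed model endpoints, event-driven workflows, and scale-to-zero agent sessions.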
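- Cold-start cost must be stated explicitly, covering container startup, model or adapter loading, connection establishment, and cache warmup.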
- Fallback and rollback behavior should be tested.
Patterns
- Function or container invocation
- Managed model endpoint
- Event-driven workflow
- Scale-to-zero agent session
Placement decision
| Question | Why it matters |
|---|---|
| Cold start | Startup latency includes container startup, model or adapter loading, connection establishment, and cache warmup, not only function dispatch. |
| Execution duration | Each activity must finish within the platform's execution limit; streaming sessions and long waits do not fit in one invocation. |
| Ephemeral storage | Local files and caches are lost when the instance is released, so large local state favors a resident runtime. |
| Concurrency and quota | Provider concurrency limits and quotas bound how far the workload can scale under burst. |
| Checkpoint location | Durable workflow state, operation keys, and approval state must live in external services so a retried activity can resume. |
| Cost under burst | Elastic scaling makes spend follow traffic, so cost and capacity assumptions must be stated. |

For each question, record the constraint, assumption, and accepted trade-off in the Runtime Decision Record.
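
One way to keep the Runtime Decision Record honest is to hold it as structured data and check it for gaps. The sketch below is illustrative only: the `DecisionEntry` fields, the `missing_questions` helper, and the sample values are assumptions, not a format the source prescribes.

```python
from dataclasses import dataclass, asdict
import json

# The six placement questions from the table above.
QUESTIONS = (
    "Cold start",
    "Execution duration",
    "Ephemeral storage",
    "Concurrency and quota",
    "Checkpoint location",
    "Cost under burst",
)

@dataclass(frozen=True)
class DecisionEntry:
    """Hypothetical record shape: one entry per placement question."""
    question: str
    constraint: str
    assumption: str
    accepted_trade_off: str

def missing_questions(entries):
    """Return placement questions that have no entry yet, in table order."""
    answered = {entry.question for entry in entries}
    return [q for q in QUESTIONS if q not in answered]

def to_json(entries):
    """Serialize the record so it can be stored next to the deployment."""
    return json.dumps([asdict(entry) for entry in entries], indent=2)

# Illustrative values only; they are not measurements or recommendations.
entries = [
    DecisionEntry(
        question="Cold start",
        constraint="First response within an agreed latency budget",
        assumption="Model adapter is loaded from a regional cache",
        accepted_trade_off="First request after idle is slower than steady state",
    ),
]

print(missing_questions(entries))
print(to_json(entries))
```

A review gate could refuse a placement until `missing_questions` returns an empty list.
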
Failure and fallback
Define behavior for network loss, provider failure, device pressure, cache loss, invalid artifacts, and unavailable tools. A fallback must preserve data policy and output contracts; it should not silently broaden authority.
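
The rule that a fallback must not broaden authority can be checked mechanically before a fallback target is ever selected. The following is a minimal sketch under stated assumptions: the `Target` fields (`data_regions`, `scopes`, `output_schema`) and the sample targets are hypothetical stand-ins for whatever your data policy and output contracts actually record.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    """Hypothetical description of a runtime that can serve a request."""
    name: str
    data_regions: frozenset  # regions where the target may process data
    scopes: frozenset        # authority (tool or API scopes) the target holds
    output_schema: str       # identifier of the output contract it honors

def eligible_fallbacks(primary, candidates):
    """Keep candidates that stay within the primary's data regions and
    scopes and that produce the same output contract."""
    return [
        c for c in candidates
        if c.data_regions <= primary.data_regions  # no new data locations
        and c.scopes <= primary.scopes             # no broader authority
        and c.output_schema == primary.output_schema
    ]

primary = Target("edge-fn", frozenset({"eu"}), frozenset({"read:docs"}), "answer.v1")
candidates = [
    # Rejected: would move data to a region the primary does not use.
    Target("hosted-large", frozenset({"eu", "us"}), frozenset({"read:docs"}), "answer.v1"),
    # Accepted: same region, same scopes, same contract.
    Target("local-small", frozenset({"eu"}), frozenset({"read:docs"}), "answer.v1"),
    # Rejected: holds a write scope the primary lacks.
    Target("admin-fn", frozenset({"eu"}), frozenset({"read:docs", "write:docs"}), "answer.v1"),
]

print([c.name for c in eligible_fallbacks(primary, candidates)])
```

Rejecting by default means a misconfigured fallback fails closed instead of silently gaining access.
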
Implementation checklist
- Document the control, execution, data, and evidence locations.
- Pin artifact, runtime, and policy versions.
- Test cold start, steady state, overload, failure, and rollback.
- Expose data movement and hosted fallback to users where relevant.
- Record cost and capacity assumptions.
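
Version pinning from the checklist above can be verified at startup by comparing what is deployed against what was pinned. This is a sketch, assuming pins are kept as a flat mapping; the key names (`artifact_sha256`, `runtime`, `policy`) and the sample values are illustrative, not a required layout.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Content digest used to pin an artifact by what it contains."""
    return hashlib.sha256(data).hexdigest()

def verify_pins(pins: dict, actual: dict) -> list:
    """Return the sorted names whose deployed value is missing or differs
    from the pinned value."""
    return sorted(
        name for name, expected in pins.items() if actual.get(name) != expected
    )

# Placeholder bytes stand in for a real model or package artifact.
artifact = b"model weights placeholder"

pins = {
    "artifact_sha256": sha256_hex(artifact),
    "runtime": "python3.12",
    "policy": "policy-v7",
}
deployed = {
    "artifact_sha256": sha256_hex(artifact),
    "runtime": "python3.12",
    "policy": "policy-v8",  # drifted from the pin
}

print(verify_pins(pins, deployed))
```

A non-empty result is a reason to refuse traffic or roll back before any request is served.
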
State, duration, and external effects
Serverless execution is a good fit for bounded, independently retryable activities. Long model downloads, warm caches, streaming sessions, approval waits, and durable agent loops should not be forced into one invocation. Keep durable workflow state, operation keys, artifacts, and approval state in external services; let short-lived workers perform idempotent activities and release resources while waiting.
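
The split between external state and short-lived workers can be sketched with an operation key that makes a repeated invocation return the recorded result instead of redoing the work. Everything named here is an assumption for illustration: `OperationStore` is an in-memory stand-in for a durable external service, and `resize_image` is a hypothetical activity.

```python
_MISSING = object()  # sentinel so that a recorded result of None is still a hit

class OperationStore:
    """In-memory stand-in for an external durable store (assumption)."""

    def __init__(self):
        self._results = {}

    def get(self, operation_key):
        return self._results.get(operation_key, _MISSING)

    def put_if_absent(self, operation_key, result):
        # setdefault keeps the first recorded result if two workers race.
        return self._results.setdefault(operation_key, result)

def run_activity(store, operation_key, activity, payload):
    """Run an activity at most once per operation key, as seen by the caller."""
    recorded = store.get(operation_key)
    if recorded is not _MISSING:
        return recorded
    # The key is passed through so downstream systems can deduplicate too.
    result = activity(operation_key, payload)
    return store.put_if_absent(operation_key, result)

calls = []

def resize_image(operation_key, payload):
    """Hypothetical activity; records each call so retries are visible."""
    calls.append(operation_key)
    return {"operation_key": operation_key, "width": payload["width"] // 2}

store = OperationStore()
first = run_activity(store, "resize:img-42:v1", resize_image, {"width": 800})
retry = run_activity(store, "resize:img-42:v1", resize_image, {"width": 800})
print(first == retry, len(calls))
```

This sketch deduplicates only the recorded result. Two workers that start before either has recorded a result can both execute the activity, so any external effect still needs its own idempotency check on the same key.
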
Timeout semantics must distinguish work that never started from work that may have committed an external effect. A platform retry is safe only when the activity is idempotent or when the runtime can query authoritative operation status. Cold-start measurements should include container startup, model or adapter loading, connection establishment, and cache warmup, not only function dispatch.
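
The retry rule in this paragraph can be written as a small decision function. This is a sketch, not a platform API: the `Outcome` and `Action` names are hypothetical, and escalating when neither condition holds is an assumed policy that the source does not specify.

```python
from enum import Enum

class Outcome(Enum):
    NOT_STARTED = "not_started"  # rejected or never dispatched; no effect possible
    COMPLETED = "completed"      # finished and acknowledged
    UNKNOWN = "unknown"          # timed out after dispatch; an effect may have committed

class Action(Enum):
    DONE = "done"
    RETRY = "retry"
    QUERY_STATUS = "query_status"
    ESCALATE = "escalate"        # assumed policy: hand off to an operator or workflow

def next_action(outcome, idempotent, status_queryable):
    """Decide what to do after an invocation ends or times out."""
    if outcome is Outcome.COMPLETED:
        return Action.DONE
    if outcome is Outcome.NOT_STARTED:
        return Action.RETRY          # nothing ran, so repeating is safe
    if idempotent:
        return Action.RETRY          # repeating cannot duplicate the effect
    if status_queryable:
        return Action.QUERY_STATUS   # ask the authoritative system first
    return Action.ESCALATE           # blind retry could double-commit

print(next_action(Outcome.UNKNOWN, idempotent=False, status_queryable=True))
```

The important branch is the last one: a timeout with an unknown outcome on a non-idempotent activity is never retried blindly.
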
Suitability test
| Question | Prefer serverless when | Prefer a resident runtime when |
|---|---|---|
| State | State is external and activities are replay-safe | Large local state or cache residency dominates |
| Duration | Activities complete within bounded limits | Sessions stream or wait for long periods |
| Traffic | Bursty demand benefits from rapid elasticity | Steady load benefits from warm capacity |
| Hardware | Required accelerators and packages are supported | Specialized topology or tuning is required |
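
The suitability table can be applied as a simple checklist function. The sketch below assumes that any single "no" tips the decision toward a resident runtime; the table itself does not assign weights, so treat that rule, and the `recommend` name, as illustrative.

```python
def recommend(state_external, bounded_duration, bursty_traffic, hardware_supported):
    """Return (recommendation, reasons) from the four suitability questions.

    Each argument is True when the "Prefer serverless when" column holds.
    """
    answers = {
        "State": state_external,
        "Duration": bounded_duration,
        "Traffic": bursty_traffic,
        "Hardware": hardware_supported,
    }
    # Questions whose answer points toward a resident runtime.
    against = [question for question, ok in answers.items() if not ok]
    recommendation = "serverless" if not against else "resident"
    return recommendation, against

# Example: external state and bursty traffic, but sessions stream for long periods.
print(recommend(
    state_external=True,
    bounded_duration=False,
    bursty_traffic=True,
    hardware_supported=True,
))
```

Returning the failing questions alongside the recommendation gives the Runtime Decision Record something concrete to cite.
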
