Key takeaways
- Hybrid routing should be policy-driven and visible, not an opaque fallback after failure.
- Each path may use a different model, precision, context capacity, and latency profile; output contracts must remain explicit.
- State, memory, and cache do not automatically follow requests across boundaries.
- Failover can change privacy, cost, and capability and may require consent or approval.
- Observability must correlate path selection with outcomes across boundaries without unnecessarily centralizing sensitive raw data.
Runtime boundary
A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.
Receives
Task, actor, data classification, device capability, connectivity, SLO, budget, available models, and state location.
Owns
Route policy, capability negotiation, fallback, state transfer, output normalization, and path-specific controls.
Emits
Route decision, execution result, fallback reason, state synchronization, privacy/cost record, and cross-boundary trace.
Does not own
Permission to move data across boundaries merely because another route is available.
Failure modes
Silent cloud fallback, state divergence, incompatible output, stale local model, duplicate action, partition, and route thrash.
Evidence and metrics
Route distribution, fallback reason, latency/cost by path, data transferred, version parity, offline success, and conflicts.
Policy-based route selection
Route inputs include classification, actor/tenant policy, device capability, model quality, context, deadline, cost, connectivity, capacity, and tool requirements.
Implementation
Evaluate before data movement and emit a route decision with policy version and reason.
Operational implications
A privacy-required task may fail closed rather than silently use a public endpoint.
Measure
Route distribution, policy reason, fail-closed count, latency, and cost.
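As an illustration of evaluating policy before any data moves, the following minimal sketch returns a route decision carrying a policy version and a reason, and fails closed when no permitted route is usable. The data-class labels, route names, `POLICY_VERSION`, and `ALLOWED_ROUTES` table are all hypothetical; a real policy would also weigh deadline, cost, capacity, and tool requirements.

```python
from dataclasses import dataclass
from typing import Optional

POLICY_VERSION = "example-policy-v1"  # hypothetical identifier

# Hypothetical policy: routes each data class may use, in preference order.
ALLOWED_ROUTES = {
    "sensitive": ["local", "private_cloud"],
    "public": ["local", "managed_cloud"],
}


@dataclass(frozen=True)
class RouteRequest:
    data_class: str       # illustrative labels: "sensitive" or "public"
    device_capable: bool  # the local runtime can serve this task
    online: bool          # remote routes are reachable


@dataclass(frozen=True)
class RouteDecision:
    route: Optional[str]  # None means the request failed closed
    reason: str
    policy_version: str


def decide_route(req: RouteRequest) -> RouteDecision:
    """Choose a route before data movement; fail closed if none is permitted."""
    for route in ALLOWED_ROUTES.get(req.data_class, []):
        if route == "local" and req.device_capable:
            return RouteDecision(route, "local_capable", POLICY_VERSION)
        if route != "local" and req.online:
            return RouteDecision(route, "remote_permitted", POLICY_VERSION)
    # Unknown data classes and unreachable routes end here, never on a public endpoint.
    return RouteDecision(None, "fail_closed_no_permitted_route", POLICY_VERSION)
```

Because the decision is a plain value, it can be logged as-is to feed the route distribution and fail-closed count.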
Capability negotiation
Clients and edge nodes expose supported runtime, model, precision, memory, and API capabilities.
Implementation
Use signed or trusted capability data, probe actual support, and select approved tiers.
Operational implications
Marketing labels on devices and stale capability caches can misroute work.
Measure
Capability tier, probe failure, route mismatch, and fallback.
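One way to combine signed capability data with a probe is sketched below, assuming the declaration arrives as a JSON payload with an HMAC-SHA256 signature and that the probe result is already available as a dictionary. The tier names, field names, and thresholds are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Optional

# Hypothetical approved tiers, most demanding first.
APPROVED_TIERS = [
    {"tier": "large-fp16", "precision": "fp16", "min_memory_mb": 8192},
    {"tier": "small-int8", "precision": "int8", "min_memory_mb": 2048},
]


def verify_declared(payload: bytes, signature_hex: str, key: bytes) -> Optional[dict]:
    """Return declared capabilities only if the HMAC-SHA256 signature matches."""
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signature_hex):
        return None
    return json.loads(payload)


def select_tier(declared: dict, probed: dict) -> Optional[str]:
    """Pick the first approved tier supported by both the declaration and the probe."""
    precisions = set(declared["precisions"]) & set(probed["precisions"])
    memory_mb = min(declared["memory_mb"], probed["memory_mb"])
    for tier in APPROVED_TIERS:
        if tier["precision"] in precisions and memory_mb >= tier["min_memory_mb"]:
            return tier["tier"]
    return None  # no approved tier: the caller falls back or fails closed
```

Taking the intersection means an optimistic declaration cannot select a tier that the probe did not confirm.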
Output contract and model parity
Different routes may use different models or quantization.
Implementation
Normalize to a versioned response envelope and declare capability/quality differences.
Operational implications
Do not promise semantic identity when a smaller local model is a degraded path.
Measure
Contract validity, quality by route, citation parity, and user-visible degradation.
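A versioned envelope that declares route differences could look like the sketch below. The field names, `ENVELOPE_VERSION`, and the `ROUTE_PROFILES` table (including model identifiers) are hypothetical placeholders, not a standard schema.

```python
from dataclasses import asdict, dataclass

ENVELOPE_VERSION = "1.0"  # hypothetical contract version

# Hypothetical per-route profiles; "degraded" marks a declared lower-quality path.
ROUTE_PROFILES = {
    "cloud": {"model_id": "large-model", "precision": "fp16", "degraded": False},
    "local": {"model_id": "small-model", "precision": "int8", "degraded": True},
}


@dataclass(frozen=True)
class ResponseEnvelope:
    envelope_version: str
    route: str
    model_id: str
    precision: str
    degraded: bool
    text: str


def normalize(route: str, raw_text: str) -> dict:
    """Wrap a route-specific result in the shared envelope."""
    profile = ROUTE_PROFILES[route]
    envelope = ResponseEnvelope(
        envelope_version=ENVELOPE_VERSION,
        route=route,
        text=raw_text.strip(),
        **profile,
    )
    return asdict(envelope)


def is_valid(envelope: dict) -> bool:
    """Check the contract: expected version and exactly the declared fields."""
    expected = set(ResponseEnvelope.__dataclass_fields__)
    return set(envelope) == expected and envelope["envelope_version"] == ENVELOPE_VERSION
```

The envelope makes the structure identical across routes while leaving the quality difference explicit, so callers can surface degradation instead of assuming parity.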
State and memory
Working state can remain local while approved summaries or authoritative records sync centrally.
Implementation
Identify authoritative stores, version records, detect conflicts, and use idempotent commands.
Operational implications
KV cache is execution-specific and is not product memory; do not merge them conceptually.
Measure
Sync latency, conflicts, duplicate prevention, stale state, and memory route.
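Version records, conflict detection, and idempotent commands can be sketched together with an in-memory store. `AuthoritativeStore` and its method names are hypothetical, and a production store would persist both maps durably.

```python
from dataclasses import dataclass, field


class ConflictError(Exception):
    """Raised when a write is based on a stale record version."""


@dataclass
class AuthoritativeStore:
    records: dict = field(default_factory=dict)  # key -> (version, value)
    applied: dict = field(default_factory=dict)  # command_id -> resulting version

    def apply(self, command_id: str, key: str, base_version: int, value: str) -> int:
        # Idempotency: a replayed command returns its original outcome unchanged.
        if command_id in self.applied:
            return self.applied[command_id]
        current_version, _ = self.records.get(key, (0, None))
        # Conflict detection: the writer must have seen the current version.
        if base_version != current_version:
            raise ConflictError(
                f"{key}: based on v{base_version}, current is v{current_version}"
            )
        new_version = current_version + 1
        self.records[key] = (new_version, value)
        self.applied[command_id] = new_version
        return new_version
```

A client that syncs after working locally sends the version it last saw; a `ConflictError` signals divergence to resolve rather than a write to overwrite.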
Fallback and consent
Failure can trigger alternate local, private, managed, or non-AI paths.
Implementation
Declare which fallbacks are permitted by data class and user/tenant policy; disclose route changes.
Operational implications
Fallback that changes residency or side effects may require approval.
Measure
Fallback rate/reason, consent/approval, data transfer, and success.
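A declared fallback table might be expressed as in the sketch below, which only selects a fallback that the data class permits and that has been approved when approval is required. The route names, `PERMITTED_FALLBACKS`, and `NEEDS_APPROVAL` are hypothetical examples.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Hypothetical policy: permitted fallbacks per data class, in preference order.
PERMITTED_FALLBACKS = {
    "sensitive": ["private_cloud", "non_ai"],
    "public": ["managed_cloud", "non_ai"],
}
# Hypothetical: fallbacks that change residency and therefore need approval.
NEEDS_APPROVAL = {"private_cloud", "managed_cloud"}


@dataclass(frozen=True)
class FallbackDecision:
    route: Optional[str]  # None means no permitted fallback exists
    reason: str
    disclosure: str       # text to surface to the user or audit log


def choose_fallback(data_class: str, failed_route: str, failure_reason: str,
                    approved_routes: Set[str]) -> FallbackDecision:
    """Select the first permitted, approved fallback and describe the change."""
    for route in PERMITTED_FALLBACKS.get(data_class, []):
        if route == failed_route:
            continue
        if route in NEEDS_APPROVAL and route not in approved_routes:
            continue
        note = f"Switched from {failed_route} to {route}: {failure_reason}"
        return FallbackDecision(route, failure_reason, note)
    note = f"No permitted fallback for {failed_route}: {failure_reason}"
    return FallbackDecision(None, failure_reason, note)
```

Without approval, a sensitive task in this sketch drops to the non-AI path instead of moving data to another residency.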
Network partitions
Disconnected operation requires bounded queues, expiry, local policy, and reconciliation.
Implementation
Classify read-only/local-safe tasks versus actions requiring current central authority.
Operational implications
On reconnect, query authoritative outcomes before replaying writes.
Measure
Offline success, queue age, expired tasks, conflicts, and duplicate attempts.
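The queueing and reconciliation rules above can be sketched as a bounded queue that drops expired writes and asks the central authority whether a command was already applied before sending it. `OfflineQueue`, `already_applied`, and `send` are hypothetical names; the two callables stand in for whatever authoritative API is actually used.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class QueuedWrite:
    command_id: str
    payload: str
    expires_at: float  # epoch seconds after which the write must not be sent


class OfflineQueue:
    def __init__(self, max_items: int):
        self.max_items = max_items
        self.items = deque()

    def enqueue(self, write: QueuedWrite) -> bool:
        """Refuse new work when full rather than silently dropping older writes."""
        if len(self.items) >= self.max_items:
            return False
        self.items.append(write)
        return True

    def reconcile(self, already_applied: Callable[[str], bool],
                  send: Callable[[QueuedWrite], None],
                  now: Optional[float] = None) -> dict:
        """On reconnect: drop expired writes, skip known outcomes, send the rest."""
        now = time.time() if now is None else now
        stats = {"sent": 0, "expired": 0, "duplicates": 0}
        while self.items:
            write = self.items.popleft()
            if write.expires_at <= now:
                stats["expired"] += 1
            elif already_applied(write.command_id):
                stats["duplicates"] += 1
            else:
                send(write)
                stats["sent"] += 1
        return stats
```

The returned counters map directly onto the expired-task and duplicate-attempt measures.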
Route stability
Marginal network or device signals can cause repeated path switching.
Implementation
Use hysteresis, session affinity, minimum route lifetime, and health windows.
Operational implications
Route thrash increases latency, model downloads, and state movement, and makes behavior inconsistent for the user.
Measure
Switch frequency, session route changes, download waste, and latency variance.
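Hysteresis and a minimum route lifetime can be sketched with two thresholds and a timestamp. The signal name `link_quality`, the threshold values, and the 30-second lifetime are hypothetical; session affinity and health windows are not covered here.

```python
class StableRouter:
    """Switch routes only on a clear signal and after a minimum lifetime."""

    def __init__(self, go_remote_above: float = 0.7, go_local_below: float = 0.3,
                 min_lifetime_s: float = 30.0):
        self.go_remote_above = go_remote_above
        self.go_local_below = go_local_below
        self.min_lifetime_s = min_lifetime_s
        self.route = "local"
        self.since = 0.0   # time of the last switch (or start)
        self.switches = 0  # feeds the switch-frequency metric

    def update(self, link_quality: float, now: float) -> str:
        # Minimum route lifetime: ignore signals shortly after a switch.
        if now - self.since < self.min_lifetime_s:
            return self.route
        target = self.route
        # Hysteresis: separate thresholds leave a dead band between 0.3 and 0.7.
        if self.route == "local" and link_quality > self.go_remote_above:
            target = "remote"
        elif self.route == "remote" and link_quality < self.go_local_below:
            target = "local"
        if target != self.route:
            self.route, self.since = target, now
            self.switches += 1
        return self.route
```

A signal hovering around a single cutoff would flip the route on every sample; with the dead band, values between the two thresholds leave the current route in place.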
Hybrid observability
The trace records route inputs, decision, model/runtime version, data movement, fallback, state sync, and result.
Implementation
Export metadata/references under privacy rules and retain local evidence while offline.
Operational implications
Central observability should not negate local privacy by collecting raw prompts.
Measure
Trace completeness, route attribution, data transfer, privacy violations, and delayed upload.
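A trace that carries references instead of raw prompts, and that is buffered locally while offline, might look like the sketch below. The field names, `build_trace`, and `TraceExporter` are hypothetical; the prompt is replaced by a keyed digest computed with a key that is assumed to stay on the device.

```python
import hashlib
import hmac
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass(frozen=True)
class RouteTrace:
    trace_id: str
    route: str
    policy_version: str
    model_version: str
    fallback_reason: str
    bytes_transferred: int
    prompt_ref: str  # keyed digest, never the raw prompt


def build_trace(trace_id: str, prompt: str, route: str, policy_version: str,
                model_version: str, local_key: bytes,
                fallback_reason: str = "", bytes_transferred: int = 0) -> dict:
    """Build exportable metadata; the prompt itself stays on the device."""
    ref = hmac.new(local_key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()
    return asdict(RouteTrace(trace_id, route, policy_version, model_version,
                             fallback_reason, bytes_transferred, ref))


class TraceExporter:
    """Retain traces locally while offline and upload them once connected."""

    def __init__(self):
        self.pending = []

    def record(self, trace: dict) -> None:
        self.pending.append(trace)

    def flush(self, online: bool, upload: Callable[[dict], None]) -> int:
        if not online:
            return 0
        count = len(self.pending)
        for trace in self.pending:
            upload(trace)
        self.pending.clear()
        return count
```

A keyed digest is used because an unkeyed hash of short or predictable text can be guessed by trying candidate inputs.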
Reference tables
| Condition | Route | Fallback | Required disclosure |
|---|---|---|---|
| Sensitive input, approved device | Local/edge | Fail closed or private cloud | No managed-cloud transfer |
| Large public document | Cloud model | Smaller local summary | Capability/quality difference |
| Offline field operation | On-device | Queue approved sync | Offline model/version |
| Private enterprise RAG | Private cluster | No public fallback | Residency/source refs |
| Consumer interactive feature | Local when capable | Explicit managed service | Telemetry/fallback path |
Decision checklist
- Which data classes may cross each route boundary?
- Is every route decision deterministic and auditable?
- How do output contracts differ by path?
- Where do authoritative memory and business state live?
- What happens during a network partition?
- How is fallback disclosed and approved?
- How are duplicate side effects prevented during reconciliation?
- What metrics detect route thrash or unexpected cloud use?
Common mistakes
- Silently falling back from local to cloud.
- Treating product memory and KV cache as the same state.
- Assuming different models produce interchangeable outputs.
- Replaying queued writes without checking authoritative outcome.
- Switching routes on every transient signal.
- Centralizing raw telemetry that defeats local privacy.
Sources and further reading
- Web Neural Network API
- ExecuTorch overview
- KServe architecture
- NIST Privacy Framework
Last reviewed: 2026-06-21 UTC
