Run AI inference in or near the browser with capability detection, model delivery, cache control, offline behavior, and privacy-preserving fallback.
Key takeaways
- Primary risks: large downloads, unsupported devices, exposure of origin data, persistent caches, thermal load, and silent fallback to hosted execution.
- Keep authoritative domain state outside model memory.
- Measure task outcomes, safe failure, and evidence, not output fluency alone.
Problem
Running inference in or near the browser means the application itself must handle capability detection, model delivery, cache control, offline behavior, and privacy-preserving fallback.
Principal risks: large downloads, unsupported devices, exposure of origin data, persistent caches, thermal load, and silent fallback to hosted execution.
Why runtime layers are needed
A single model invocation cannot reliably own identity, authorization, durable state, external side effects, recovery, or evidence. The runtime therefore combines the compiler, inference, and serving path with application-level controls. For in-browser execution, these controls include capability detection, artifact verification, storage policy, and route policy.
Reference architecture
- Progressive web/application shell
- WebGPU/WebNN/WASM capability detector
- Signed model and tokenizer artifacts
- Origin-scoped cache and storage budget
- Worker-based inference to protect UI responsiveness
- Hosted fallback endpoint with explicit route policy
- Telemetry and privacy controls
Request flow
- Detect browser, hardware API, memory, storage, and policy capability.
- Select a compatible model variant and precision.
- Download with integrity validation and progress/cancellation.
- Initialize the runtime in a worker and warm only necessary resources.
- Run local inference with bounded context and concurrency.
- Fall back to hosted execution only when allowed and visible.
- Cache or evict artifacts according to storage and retention policy.
- Update atomically and preserve a working prior version.
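The sketch below covers the first two request-flow steps and the fallback rule in step five, written as a pure selection function. It is a minimal sketch under one assumption: capability fields have already been gathered by feature checks, for example `"gpu" in navigator` for WebGPU, `navigator.deviceMemory` where the browser exposes it, and `navigator.storage.estimate()` for quota. The types, variant ids, and sizes are hypothetical.

```typescript
// Hypothetical shapes for illustration; none of these types is a browser API.
type Backend = "webgpu" | "webnn" | "wasm";

interface DeviceCapabilities {
  backends: Backend[];           // backends that passed feature detection
  deviceMemoryGiB: number;       // e.g. navigator.deviceMemory, where exposed
  storageAvailableBytes: number; // e.g. quota minus usage from navigator.storage.estimate()
}

interface ModelVariant {
  id: string;
  backend: Backend;
  precision: "q4" | "q8" | "f16";
  downloadBytes: number;
  minMemoryGiB: number;
}

interface RoutePolicy {
  hostedFallbackAllowed: boolean; // set by deployment or user policy, never by the model
}

type ExecutionPlan =
  | { route: "local"; variant: ModelVariant }
  | { route: "hosted"; reason: string } // caller must show this to the user first
  | { route: "unavailable"; reason: string };

// Pick the first compatible variant; variants are ordered best-first.
function planExecution(
  caps: DeviceCapabilities,
  variants: ModelVariant[],
  policy: RoutePolicy,
): ExecutionPlan {
  const compatible = variants.find(
    (v) =>
      caps.backends.includes(v.backend) &&
      caps.deviceMemoryGiB >= v.minMemoryGiB &&
      caps.storageAvailableBytes >= v.downloadBytes,
  );
  if (compatible !== undefined) {
    return { route: "local", variant: compatible };
  }
  const reason = "no model variant fits this device";
  return policy.hostedFallbackAllowed
    ? { route: "hosted", reason }
    : { route: "unavailable", reason };
}

// Usage: a 4 GiB device with only WASM falls through to the quantized variant.
const variants: ModelVariant[] = [
  { id: "small-f16-webgpu", backend: "webgpu", precision: "f16", downloadBytes: 1_200_000_000, minMemoryGiB: 8 },
  { id: "small-q4-wasm", backend: "wasm", precision: "q4", downloadBytes: 400_000_000, minMemoryGiB: 4 },
];
const plan = planExecution(
  { backends: ["wasm"], deviceMemoryGiB: 4, storageAvailableBytes: 2_000_000_000 },
  variants,
  { hostedFallbackAllowed: false },
);
console.log(plan.route); // prints: local
```

Keeping selection separate from detection means the logic can be tested without a browser. The function returns a `hosted` plan only when policy allows it. The caller is expected to tell the user before sending any data, which matches the "only when allowed and visible" rule in step five.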
Contracts
- Capability contract records required APIs, memory, artifact size, precision, and fallback.
- Route contract distinguishes local browser, remote browser-adjacent, and hosted model processing.
- Storage contract defines artifact cache, user data, expiration, and deletion.
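The three contracts can be expressed as versioned types that the runtime checks before each request. This sketch is hypothetical: the field names are illustrative rather than a published schema, and only the route check is implemented. The check refuses any route the contract does not list, and it refuses off-device routes that the user has not been shown.

```typescript
// Hypothetical contract types; field names are illustrative, not a published schema.
type Route = "local-browser" | "remote-browser-adjacent" | "hosted";

interface CapabilityContract {
  version: string;
  requiredApis: string[]; // e.g. ["webgpu"] or ["wasm"]
  minMemoryGiB: number;
  artifactBytes: number;
  precision: string;
  fallback: Route | "none";
}

interface RouteContract {
  version: string;
  allowedRoutes: Route[];       // routes this feature may use at all
  routesLeavingDevice: Route[]; // routes where user data leaves the browser
  disclosureRequired: boolean;  // user must be shown the route before data leaves
}

interface StorageContract {
  version: string;
  artifactExpiryDays: number;
  userDataRetentionDays: number;
  deleteOnRequest: boolean; // user-initiated deletion clears artifacts and user data
}

type RouteDecision = { ok: true } | { ok: false; reason: string };

// Enforce the route contract for a single request.
function checkRoute(contract: RouteContract, route: Route, userWasShownRoute: boolean): RouteDecision {
  if (!contract.allowedRoutes.includes(route)) {
    return { ok: false, reason: `route "${route}" is not allowed by contract ${contract.version}` };
  }
  const leavesDevice = contract.routesLeavingDevice.includes(route);
  if (leavesDevice && contract.disclosureRequired && !userWasShownRoute) {
    return { ok: false, reason: `route "${route}" would send data off-device without disclosure` };
  }
  return { ok: true };
}
```

Returning a reason string, rather than a bare boolean, gives the trace something concrete to record when a request is refused.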
Treat the runtime's request, tool, policy-and-approval, evidence, and trace schemas as versioned reference boundaries.
Failure modes
- WebGPU/WebNN unavailable or disabled
- Model download interrupted or integrity mismatch
- Storage quota eviction
- Worker crash or device loss
- Tab suspension
- Thermal throttling or UI jank
- Fallback sends data outside stated boundary
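One way to keep recovery explicit is to map each failure mode above to exactly one action and let the compiler enforce that no mode goes unhandled. The failure kinds and recovery names below are hypothetical labels for the listed failures, not a browser API.

```typescript
// Hypothetical failure taxonomy mirroring the failure-mode list.
type Failure =
  | { kind: "backend-unavailable" }                   // WebGPU/WebNN missing or disabled
  | { kind: "download-failed"; integrityMismatch: boolean }
  | { kind: "storage-evicted" }                       // quota pressure removed cached artifacts
  | { kind: "worker-crashed" }
  | { kind: "device-lost" }                           // e.g. GPU device lost
  | { kind: "tab-suspended" }
  | { kind: "thermal-throttled" }                     // or sustained UI jank
  | { kind: "boundary-violation"; route: string };    // data sent outside the stated boundary

type Recovery =
  | "try-next-variant"
  | "retry-download"
  | "discard-and-redownload"
  | "restart-worker"
  | "resume-on-visible"
  | "reduce-load"
  | "block-and-report";

function recoveryFor(f: Failure): Recovery {
  switch (f.kind) {
    case "backend-unavailable":
      return "try-next-variant"; // e.g. fall back from WebGPU to a WASM variant
    case "download-failed":
      // Never resume from bytes that failed verification.
      return f.integrityMismatch ? "discard-and-redownload" : "retry-download";
    case "storage-evicted":
      return "retry-download";
    case "worker-crashed":
    case "device-lost":
      return "restart-worker";
    case "tab-suspended":
      return "resume-on-visible";
    case "thermal-throttled":
      return "reduce-load"; // e.g. smaller variant or lower concurrency
    case "boundary-violation":
      return "block-and-report";
    default: {
      const unhandled: never = f; // compile error if a failure kind lacks a recovery
      return unhandled;
    }
  }
}
```

Because of the `never` assignment in the `default` branch, adding a new failure kind is a compile error until that kind is given a recovery. The sketch deliberately never retries or reroutes a boundary violation automatically.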
Security considerations
- Apply CSP and origin isolation appropriate to WordPress/application constraints.
- Verify model artifacts and avoid third-party script access to sensitive state.
- Keep inference in workers and validate every message that crosses the worker boundary.
- Do not treat local execution as protection from malicious page code.
- Make remote fallback and analytics explicit.
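To validate messages, a worker can treat every incoming `postMessage` payload as untrusted `unknown` data and accept only a small, typed protocol. The message shapes and the size limits below are assumptions chosen for illustration.

```typescript
// Hypothetical page-to-worker protocol.
type InferRequest = { type: "infer"; id: string; prompt: string; maxTokens: number };
type CancelRequest = { type: "cancel"; id: string };
type WorkerRequest = InferRequest | CancelRequest;

const MAX_PROMPT_CHARS = 16_000; // assumed limit; tune to the model's context window
const MAX_TOKENS = 1_024;        // assumed limit

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a typed request, or null if the message must be ignored.
function parseWorkerRequest(data: unknown): WorkerRequest | null {
  if (!isRecord(data)) return null;
  const { type, id, prompt, maxTokens } = data;
  if (typeof id !== "string" || id.length === 0) return null;
  if (type === "cancel") return { type: "cancel", id };
  if (
    type === "infer" &&
    typeof prompt === "string" &&
    prompt.length <= MAX_PROMPT_CHARS &&
    typeof maxTokens === "number" &&
    Number.isInteger(maxTokens) &&
    maxTokens > 0 &&
    maxTokens <= MAX_TOKENS
  ) {
    return { type: "infer", id, prompt, maxTokens };
  }
  return null; // unknown type or out-of-bounds fields
}
```

Inside the worker, this check would run at the top of the message handler, e.g. `self.onmessage = (e) => { const req = parseWorkerRequest(e.data); if (req === null) return; /* dispatch */ }`. It guards against malformed or oversized messages. As noted above, it does not protect against malicious code already running in the page.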
Observability
Correlate request, model route, context sources, tool operations, policy decisions, approvals, artifacts, failures, recovery, and domain outcome. Apply redaction and retention before exporting traces.
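The following is a minimal sketch of redaction and retention applied before export. It assumes a hypothetical `TraceEvent` shape and a 30-day retention window. Expired events are dropped, and prompt text and context sources are replaced by counts, so correlation fields survive without user content.

```typescript
// Hypothetical trace event; field names are illustrative.
interface TraceEvent {
  requestId: string;
  route: "local-browser" | "remote-browser-adjacent" | "hosted";
  modelVariant: string;
  outcome: "success" | "fallback" | "failure";
  timestampMs: number;
  prompt?: string;           // user content: never exported verbatim
  contextSources?: string[]; // may include URLs or document titles
}

// Exported shape keeps correlation fields but replaces content with counts.
interface ExportedEvent extends Omit<TraceEvent, "prompt" | "contextSources"> {
  promptChars: number;
  contextSourceCount: number;
}

const RETENTION_MS = 30 * 24 * 60 * 60 * 1000; // assumed 30-day retention policy

function prepareForExport(events: TraceEvent[], nowMs: number): ExportedEvent[] {
  return events
    .filter((e) => nowMs - e.timestampMs <= RETENTION_MS) // apply retention first
    .map(({ prompt, contextSources, ...rest }) => ({
      ...rest,
      promptChars: prompt?.length ?? 0,
      contextSourceCount: contextSources?.length ?? 0,
    }));
}
```

Destructuring the sensitive fields out, rather than deleting them from a copy, means any content field added to `TraceEvent` later has to be handled explicitly here.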
Evaluation and metrics
- Capability success by device class
- Download and initialization time
- Local inference latency and UI responsiveness
- Cache hit/eviction
- Offline success
- Fallback rate
- Energy/thermal impact where measurable
- Privacy-boundary violations
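Several of these metrics reduce to per-device-class ratios over session records. The record shape below is hypothetical. Cache hit rate is computed only over sessions that ran locally, because hosted or unavailable sessions never load local artifacts.

```typescript
// Hypothetical per-session record, aggregated after redaction.
interface SessionRecord {
  deviceClass: string;           // e.g. "desktop-webgpu", "mobile-wasm"
  localVariantSelected: boolean; // capability detection found a usable variant
  route: "local" | "hosted" | "unavailable";
  artifactsFromCache: boolean;   // only meaningful when route === "local"
  boundaryViolation: boolean;    // data left the stated privacy boundary
}

interface DeviceClassMetrics {
  sessions: number;
  capabilitySuccessRate: number;
  fallbackRate: number;
  cacheHitRate: number | null; // null when no session ran locally
  boundaryViolations: number;
}

function summarizeByDeviceClass(records: SessionRecord[]): Record<string, DeviceClassMetrics> {
  // Group records by device class.
  const groups: Record<string, SessionRecord[]> = {};
  for (const r of records) {
    (groups[r.deviceClass] ??= []).push(r);
  }
  const result: Record<string, DeviceClassMetrics> = {};
  for (const [deviceClass, rs] of Object.entries(groups)) {
    const count = (pred: (r: SessionRecord) => boolean) => rs.filter(pred).length;
    const local = count((r) => r.route === "local");
    result[deviceClass] = {
      sessions: rs.length,
      capabilitySuccessRate: count((r) => r.localVariantSelected) / rs.length,
      fallbackRate: count((r) => r.route === "hosted") / rs.length,
      cacheHitRate: local === 0 ? null : count((r) => r.route === "local" && r.artifactsFromCache) / local,
      boundaryViolations: count((r) => r.boundaryViolation),
    };
  }
  return result;
}
```

Privacy-boundary violations are reported as a count, not a rate, because any nonzero value needs investigation.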
Implementation checklist
- Provide a no-JavaScript or non-AI functional fallback where possible.
- Test on current major browsers and under mobile constraints.
- Set limits on artifact size and precision, and set storage budgets.
- Use workers and cancellation.
- Verify behavior at 200–400% zoom and with reduced motion enabled.
- Use hosted inference when local execution creates an unusable or misleading experience.
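The storage-budget item, together with the eviction and "preserve a working prior version" steps in the request flow, could be sketched as least-recently-used eviction that never selects pinned artifacts. The `CachedArtifact` metadata and the `planEviction` function are hypothetical. In practice, the budget might be derived from `navigator.storage.estimate()`.

```typescript
// Hypothetical metadata stored next to each cached artifact (e.g. in IndexedDB).
interface CachedArtifact {
  id: string;
  bytes: number;
  lastUsedMs: number;
  pinned: boolean; // active version, or the prior working version kept for rollback
}

interface EvictionPlan {
  evictIds: string[];
  fitsBudget: boolean; // false if pinned artifacts alone exceed the budget
}

// Least-recently-used eviction that never selects pinned artifacts.
function planEviction(entries: CachedArtifact[], budgetBytes: number): EvictionPlan {
  let total = entries.reduce((sum, e) => sum + e.bytes, 0);
  const candidates = entries
    .filter((e) => !e.pinned) // filter copies, so sort does not reorder the input
    .sort((a, b) => a.lastUsedMs - b.lastUsedMs);
  const evictIds: string[] = [];
  for (const e of candidates) {
    if (total <= budgetBytes) break;
    evictIds.push(e.id);
    total -= e.bytes;
  }
  return { evictIds, fitsBudget: total <= budgetBytes };
}
```

When `fitsBudget` is false, the pinned artifacts alone exceed the budget. That is a signal to choose a smaller variant or use hosted inference rather than evict the rollback copy. Eviction initiated by the browser under storage pressure is separate from this policy. `navigator.storage.persist()` can request persistent storage, which browsers do not evict automatically under storage pressure.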
