Browser AI Application

Execute AI in or near the browser with capability detection, model delivery, cache control, offline behavior, and privacy-preserving fallback.

Key takeaways

Primary risks: large downloads, unsupported devices, origin data exposure, cache persistence, thermal load, and silent hosted fallback.
Keep authoritative domain state outside model memory.
Measure task outcome, safe failure, and evidence—not output fluency alone.

Problem

Running AI in or near the browser requires capability detection, model delivery, cache control, offline behavior, and a privacy-preserving fallback.

Principal risks: large downloads, unsupported devices, origin data exposure, cache persistence, thermal load, and silent hosted fallback.

Why runtime layers are needed

A single model invocation cannot reliably own identity, authorization, durable state, external side effects, recovery, or evidence. The runtime therefore combines the compilation, inference, and serving path with the application controls this use case needs.

Reference architecture

Progressive web/application shell
WebGPU/WebNN/WASM capability detector
Signed model and tokenizer artifacts
Origin-scoped cache and storage budget
Worker-based inference to protect UI responsiveness
Hosted fallback endpoint with explicit route policy
Telemetry and privacy controls

Request flow

Detect browser, hardware API, memory, storage, and policy capability.
Select a compatible model variant and precision.
Download with integrity validation and progress/cancellation.
Initialize the runtime in a worker and warm only necessary resources.
Run local inference with bounded context and concurrency.
Fall back to hosted execution only when allowed and visible.
Cache or evict artifacts according to storage and retention policy.
Update atomically and preserve a working prior version.
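
The detection, selection, and fallback steps above can be sketched as a pure routing function. This is a minimal TypeScript sketch under stated assumptions, not a definitive implementation: the `Capabilities`, `ModelVariant`, and `Route` types, the `selectRoute` function, and the `ASSUMED_MEMORY_GIB` default are hypothetical names introduced for illustration. In a real page the snapshot would be filled by feature checks such as `"gpu" in navigator` (WebGPU), `"ml" in navigator` (WebNN), `typeof WebAssembly === "object"`, and `navigator.storage.estimate()`. A present property does not guarantee a usable device, so a detector would usually also confirm that an adapter or context can actually be created.

```typescript
type Backend = "webgpu" | "webnn" | "wasm";

// Hypothetical capability snapshot, filled by the page's detector.
interface Capabilities {
  backends: Backend[];            // APIs that passed feature detection
  deviceMemoryGiB?: number;       // undefined where the browser does not report it
  storageFreeBytes: number;       // quota minus usage from the storage estimate
  hostedFallbackAllowed: boolean; // policy and user consent, not a technical check
}

// Hypothetical description of one downloadable model variant.
interface ModelVariant {
  id: string;
  backend: Backend;
  precision: "fp16" | "int8" | "int4";
  artifactBytes: number;
  minMemoryGiB: number;
}

type Route =
  | { kind: "local"; variant: ModelVariant }
  | { kind: "hosted" }
  | { kind: "unavailable"; reason: string };

// Assumption: when memory is not reported, plan for a low-memory device.
const ASSUMED_MEMORY_GIB = 2;

// Variants are expected in preference order; the first one that fits wins.
function selectRoute(caps: Capabilities, variants: ModelVariant[]): Route {
  const memory = caps.deviceMemoryGiB ?? ASSUMED_MEMORY_GIB;
  const fit = variants.find(
    (v) =>
      caps.backends.includes(v.backend) &&
      v.artifactBytes <= caps.storageFreeBytes &&
      v.minMemoryGiB <= memory
  );
  if (fit) return { kind: "local", variant: fit };
  // Hosted execution is chosen only when policy allows it, never silently.
  if (caps.hostedFallbackAllowed) return { kind: "hosted" };
  return {
    kind: "unavailable",
    reason: "no compatible local variant and hosted fallback not allowed",
  };
}
```

Returning an explicit `unavailable` route, rather than defaulting to hosted execution, keeps the fallback visible: the caller has to decide what to show the user.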

Contracts

Capability contract records required APIs, memory, artifact size, precision, and fallback.
Route contract distinguishes local browser, remote browser-adjacent, and hosted model processing.
Storage contract defines artifact cache, user data, expiration, and deletion.

Use the runtime request, tool, policy and approval, evidence, and trace schemas as versioned reference boundaries.
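
As an illustration, the three contracts could be written down as versioned types. The sketch below is hypothetical throughout: the field names, the `schemaVersion` convention, and the helpers `leavesDevice` and `isExpired` are assumptions made for this example and do not come from any published schema. It also assumes that "browser-adjacent" processing happens off the device.

```typescript
// Hypothetical capability contract: what a model variant needs to run locally.
interface CapabilityContract {
  schemaVersion: 1;
  requiredApis: Array<"webgpu" | "webnn" | "wasm">;
  minMemoryGiB: number;
  artifactBytes: number;
  precision: "fp16" | "int8" | "int4";
  fallback: "hosted" | "non-ai" | "none";
}

// The route contract keeps the three processing locations distinct.
type ProcessingRoute = "local-browser" | "browser-adjacent" | "hosted";

// Hypothetical storage contract: what is cached, for how long, and when it goes.
interface StorageContract {
  schemaVersion: 1;
  artifactCacheName: string; // e.g. the name of a Cache Storage cache
  storesUserData: boolean;
  maxAgeMs: number;          // expiration for cached artifacts
  deleteOnSignOut: boolean;
}

// Assumption: only the local route keeps input on the device.
function leavesDevice(route: ProcessingRoute): boolean {
  return route !== "local-browser";
}

// An artifact is due for eviction once it is older than the contract allows.
function isExpired(contract: StorageContract, storedAtMs: number, nowMs: number): boolean {
  return nowMs - storedAtMs > contract.maxAgeMs;
}
```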

Failure modes

WebGPU/WebNN unavailable or disabled
Model download interrupted or integrity mismatch
Storage quota eviction
Worker crash or device loss
Tab suspension
Thermal throttling or UI jank
Fallback sends data outside stated boundary
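
One way to keep these failure modes from being handled ad hoc is to map each one to a single recovery action. The mapping below is an illustrative assumption, not a prescribed policy; the `Failure` and `Recovery` names and the `recoveryFor` function are hypothetical. The `never` check makes the compiler flag any failure mode that is added later without a handler.

```typescript
type Failure =
  | "api-unavailable"
  | "download-interrupted"
  | "integrity-mismatch"
  | "quota-eviction"
  | "worker-crash"
  | "device-lost"
  | "tab-suspended"
  | "thermal-throttle"
  | "boundary-violation";

type Recovery =
  | "use-fallback-route"      // subject to the route policy
  | "resume-download"
  | "discard-and-redownload"
  | "restart-worker"
  | "reinitialize-on-resume"
  | "reduce-load"
  | "block-and-report";

function recoveryFor(failure: Failure): Recovery {
  switch (failure) {
    case "api-unavailable":
      return "use-fallback-route";
    case "download-interrupted":
      return "resume-download";
    case "integrity-mismatch":
    case "quota-eviction":
      // Never run an unverified artifact; fetch a fresh copy instead.
      return "discard-and-redownload";
    case "worker-crash":
    case "device-lost":
      return "restart-worker";
    case "tab-suspended":
      return "reinitialize-on-resume";
    case "thermal-throttle":
      return "reduce-load";
    case "boundary-violation":
      // Data leaving the stated boundary is a policy failure, not a retry case.
      return "block-and-report";
    default: {
      const unhandled: never = failure;
      throw new Error(`unhandled failure: ${String(unhandled)}`);
    }
  }
}
```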

Security considerations

Apply a Content Security Policy (CSP) and origin isolation appropriate to the constraints of the host application (for example, WordPress).
Verify model artifacts and avoid third-party script access to sensitive state.
Keep inference in workers and validate messages.
Do not treat local execution as protection from malicious page code.
Make remote fallback and analytics explicit.
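
Message validation in the worker can be sketched as a parser that accepts `unknown` input and returns either a checked request or `null`. The `InferenceRequest` shape, the `parseInferenceRequest` function, and both limits are hypothetical values chosen for this example.

```typescript
// Hypothetical request shape sent from the page to the inference worker.
interface InferenceRequest {
  type: "infer";
  requestId: string;
  prompt: string;
  maxTokens: number;
}

// Assumed bounds; real limits depend on the model and the device.
const MAX_PROMPT_CHARS = 8000;
const MAX_TOKENS = 512;

// Returns a validated copy of the message, or null if anything is off.
function parseInferenceRequest(data: unknown): InferenceRequest | null {
  if (typeof data !== "object" || data === null) return null;
  const { type, requestId, prompt, maxTokens } = data as Record<string, unknown>;
  if (type !== "infer") return null;
  if (typeof requestId !== "string" || requestId.length === 0) return null;
  if (typeof prompt !== "string" || prompt.length > MAX_PROMPT_CHARS) return null;
  if (typeof maxTokens !== "number" || !Number.isInteger(maxTokens)) return null;
  if (maxTokens < 1 || maxTokens > MAX_TOKENS) return null;
  // Copy only the known fields so unexpected properties are dropped.
  return { type: "infer", requestId, prompt, maxTokens };
}
```

Inside the worker, the message handler would pass `event.data` through this function and ignore or reject anything that comes back as `null`. Validation of this kind limits malformed input; it does not protect against page code that is itself malicious.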

Observability

Correlate request, model route, context sources, tool operations, policy decisions, approvals, artifacts, failures, recovery, and domain outcome. Apply redaction and retention before exporting traces.
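
Redaction and retention before export can be sketched as an allowlist: only named fields are copied into the exported record, and events older than the retention window are dropped. The `TraceEvent` and `ExportedTraceEvent` shapes and the `redactForExport` function are hypothetical and cover only a few of the correlated fields listed above.

```typescript
// Hypothetical in-memory trace event; prompt and userId are sensitive.
interface TraceEvent {
  requestId: string;
  route: "local-browser" | "browser-adjacent" | "hosted";
  outcome: "success" | "safe-failure" | "error";
  timestampMs: number;
  prompt?: string;
  userId?: string;
}

// Exported shape: no raw prompt and no user identifier.
interface ExportedTraceEvent {
  requestId: string;
  route: TraceEvent["route"];
  outcome: TraceEvent["outcome"];
  timestampMs: number;
  promptChars?: number; // length only, as a coarse size signal
}

function redactForExport(
  events: TraceEvent[],
  nowMs: number,
  retentionMs: number
): ExportedTraceEvent[] {
  return events
    // Retention first: events outside the window are never exported.
    .filter((e) => nowMs - e.timestampMs <= retentionMs)
    // Allowlist copy: a field added to TraceEvent later is not exported by default.
    .map((e) => {
      const out: ExportedTraceEvent = {
        requestId: e.requestId,
        route: e.route,
        outcome: e.outcome,
        timestampMs: e.timestampMs,
      };
      if (e.prompt !== undefined) out.promptChars = e.prompt.length;
      return out;
    });
}
```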

Evaluation and metrics

Capability success by device class
Download and initialization time
Local inference latency and UI responsiveness
Cache hit/eviction
Offline success
Fallback rate
Energy/thermal impact where measurable
Privacy-boundary violations
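
A few of these metrics reduce to simple ratios over per-session records. The sketch below assumes a hypothetical `SessionRecord` shape and defines fallback rate as the share of all sessions routed to hosted execution, and cache hit rate as the share of local sessions served from cache; other definitions are equally defensible.

```typescript
// Hypothetical per-session record collected by telemetry.
interface SessionRecord {
  deviceClass: string;              // e.g. "desktop", "mobile"
  localCapable: boolean;            // capability detection succeeded
  route: "local" | "hosted" | "none";
  cacheHit: boolean;                // model artifact was served from cache
}

interface Summary {
  fallbackRate: number;
  cacheHitRate: number;
  capabilitySuccessByDevice: Map<string, number>;
}

function summarize(records: SessionRecord[]): Summary {
  let hosted = 0;
  let local = 0;
  let localHits = 0;
  const perClass = new Map<string, { capable: number; total: number }>();

  records.forEach((r) => {
    if (r.route === "hosted") hosted += 1;
    if (r.route === "local") {
      local += 1;
      if (r.cacheHit) localHits += 1;
    }
    const counts = perClass.get(r.deviceClass) ?? { capable: 0, total: 0 };
    counts.total += 1;
    if (r.localCapable) counts.capable += 1;
    perClass.set(r.deviceClass, counts);
  });

  // Report 0 instead of NaN when there is nothing to divide by.
  const ratio = (n: number, d: number): number => (d === 0 ? 0 : n / d);

  const capabilitySuccessByDevice = new Map<string, number>();
  perClass.forEach((counts, deviceClass) => {
    capabilitySuccessByDevice.set(deviceClass, ratio(counts.capable, counts.total));
  });

  return {
    fallbackRate: ratio(hosted, records.length),
    cacheHitRate: ratio(localHits, local),
    capabilitySuccessByDevice,
  };
}
```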

Implementation checklist

Provide a no-JavaScript or non-AI functional fallback where possible.
Test current major browsers and mobile constraints.
Set budgets for artifact size, precision (bit width), and storage.
Use workers and cancellation.
Verify behavior at 200–400% zoom and with reduced motion enabled.
Use hosted inference when local execution creates an unusable or misleading experience.
