ARuntime Reference

Runtime Observability

Runtime observability correlates infrastructure, compiler, inference, serving, tool, policy, evaluation, and product outcomes without assuming every organization can retain raw prompts.

Audience: Technical readers | Reading time: 3 minutes | Status: Production guidance | Last reviewed:

Runtime observability explains how infrastructure, model, serving, tool, policy, business, and evaluation behavior combine during a request. It must preserve correlation without turning sensitive model and tool payloads into unrestricted logs.

Key takeaways

  • Separate operational telemetry from durable review evidence.
  • Correlate layers with stable identifiers and explicit span kinds.
  • Measure queueing, cache, tools, approvals, and recovery, not only model latency.

Signal model

Infrastructure

CPU/GPU/NPU, memory, storage, network, process, worker, queue, and dependency health.

Compiler and model

Artifact/version, compile/warmup, route, prefill, decode, cache, tokens, and stop reason.

Serving

Admission, queueing, batching, routing, replica, rollout, overload, and deadline.

Tool and policy

Tool version, operation key, authorization, approval, dependency, side effect, and compensation.

Business outcome

Domain result, changed resource, customer-visible status, and accountable decision.

Evaluation

Quality, safety, task success, evidence completeness, cost, and failure classification.

Correlation

Use requestId and correlationId across application boundaries, traceId/spanId for execution structure, operation IDs for tools, policy-decision and approval IDs for authorization, and artifact/evidence IDs for durable review. Asynchronous messages should use trace links where a single parent span would be misleading.
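The identifier scheme above can be sketched as a small immutable context object. This is a minimal illustration, not a prescribed schema: the class `CorrelationContext`, its field names, and the helpers `start_request`, `child_span`, and `linked_async` are all hypothetical, and the ID lengths simply mirror the common 32/16 hex-character trace and span ID shapes.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple
import uuid


def new_trace_id() -> str:
    return uuid.uuid4().hex        # 32 hex characters


def new_span_id() -> str:
    return uuid.uuid4().hex[:16]   # 16 hex characters


@dataclass(frozen=True)
class CorrelationContext:
    """Hypothetical carrier for the identifiers listed above."""
    request_id: str
    correlation_id: str
    trace_id: str
    span_id: str
    operation_id: Optional[str] = None        # tool operation
    policy_decision_id: Optional[str] = None  # authorization decision
    approval_id: Optional[str] = None         # human or system approval
    evidence_id: Optional[str] = None         # durable review record
    links: Tuple[Tuple[str, str], ...] = ()   # (trace_id, span_id) pairs


def start_request(correlation_id: Optional[str] = None) -> CorrelationContext:
    """Create the root context at the application boundary."""
    return CorrelationContext(
        request_id=uuid.uuid4().hex,
        correlation_id=correlation_id or uuid.uuid4().hex,
        trace_id=new_trace_id(),
        span_id=new_span_id(),
    )


def child_span(ctx: CorrelationContext) -> CorrelationContext:
    """Synchronous child: same trace, new span."""
    return replace(ctx, span_id=new_span_id())


def linked_async(ctx: CorrelationContext) -> CorrelationContext:
    """Async hand-off: new trace that links back instead of claiming a parent."""
    return replace(
        ctx,
        trace_id=new_trace_id(),
        span_id=new_span_id(),
        links=ctx.links + ((ctx.trace_id, ctx.span_id),),
    )
```

In this sketch, requestId and correlationId survive the asynchronous hand-off unchanged, so cross-boundary lookups still work even though the execution structure is recorded as a link rather than a parent.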

Metrics

Representative metrics by layer

Layer           | Metrics
Compiler/engine | load, warmup, compile, prefill, decode, cache occupancy, allocation failure
Serving         | queue, admission rejection, batch, route, TTFT, TPOT, goodput, rollout health
Distributed     | collective, transfer, remote-cache hit, placement, worker failure
Agentic         | step count, tool latency, approval wait, recovery, budget, evidence completion
Product         | successful outcome, correction, abandonment, safe denial, incident
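As an illustration of the serving-layer metrics, the sketch below derives TTFT (time to first token) and TPOT (time per output token) from per-token timestamps. The goodput definition used here, requests per second that met both latency objectives, is one common convention and is an assumption; the function names and the input shape are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def ttft(request_start: float, token_times: Sequence[float]) -> float:
    """Time to first token: first token timestamp minus request start."""
    return token_times[0] - request_start


def tpot(token_times: Sequence[float]) -> float:
    """Mean time per output token, measured after the first token."""
    if len(token_times) < 2:
        return 0.0
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def goodput(
    requests: Iterable[Tuple[float, Sequence[float]]],
    ttft_slo_s: float,
    tpot_slo_s: float,
    window_s: float,
) -> float:
    """Requests per second that met both objectives (assumed definition)."""
    met = sum(
        1
        for start, times in requests
        if times
        and ttft(start, times) <= ttft_slo_s
        and tpot(times) <= tpot_slo_s
    )
    return met / window_s


# Timestamps are seconds on a shared monotonic clock.
sample = [
    (0.0, [0.5, 0.6, 0.7, 0.8]),  # fast first token, steady decode
    (0.0, [3.0, 3.1]),            # spent too long queued or in prefill
]
rate = goodput(sample, ttft_slo_s=1.0, tpot_slo_s=0.2, window_s=10.0)
```

Measuring TPOT from the first token onward keeps queueing and prefill time out of the decode figure, which is why the two metrics are reported separately.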

Logs and events

Use structured events with stable names, versions, UTC timestamps, severity, source layer, correlation, and sanitized attributes. Avoid free-form logging of prompts and tool payloads. Log changes to policy, route, model, tool, and configuration separately from request events.
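A minimal sketch of such an event, assuming an allow-list approach to sanitization. The field names, the `ALLOWED_ATTRIBUTES` set, and `make_event` are hypothetical; a real deployment would define its own schema and attribute policy.

```python
import json
from datetime import datetime, timezone

# Hypothetical allow-list: only these attribute keys ever reach the log.
ALLOWED_ATTRIBUTES = {
    "model_version", "route", "input_tokens",
    "output_tokens", "stop_reason", "tool_version",
}


def make_event(name, schema_version, severity, layer, correlation, attributes):
    """Build a structured event; anything not on the allow-list is dropped."""
    sanitized = {k: v for k, v in attributes.items() if k in ALLOWED_ATTRIBUTES}
    dropped = sorted(set(attributes) - ALLOWED_ATTRIBUTES)
    return {
        "name": name,                        # stable event name
        "schema_version": schema_version,    # version of this event shape
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "layer": layer,                      # source layer
        "correlation": correlation,          # IDs only, no payloads
        "attributes": sanitized,
        "dropped_attribute_keys": dropped,   # key names only, never values
    }


event = make_event(
    name="model.invocation.completed",
    schema_version="1",
    severity="INFO",
    layer="serving",
    correlation={"request_id": "req-123", "trace_id": "trace-abc"},
    attributes={"input_tokens": 120, "stop_reason": "length", "prompt": "secret"},
)
line = json.dumps(event, sort_keys=True)
```

Recording only the names of dropped keys lets operators notice a misbehaving caller without the sensitive value ever being written.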

Traces

Trace model invocation, tool calls, policy decisions, waits, and recovery as distinct spans. Model spans should include deployment/version, input/output token counts, phase durations, cache indicators, and stop reason when available. OpenTelemetry’s generative-AI conventions are evolving; pin the convention version used by an implementation. [OpenTelemetry]
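The span separation can be sketched with a small in-memory recorder. This deliberately does not use the OpenTelemetry SDK or its attribute names: `Recorder`, `Span`, `SPAN_KINDS`, and every attribute key below are hypothetical, and the pinned convention version is a placeholder string.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

# Hypothetical span kinds matching the list in the text.
SPAN_KINDS = {"model", "tool", "policy", "wait", "recovery"}


@dataclass
class Span:
    kind: str
    name: str
    attributes: dict = field(default_factory=dict)
    start: float = 0.0
    end: float = 0.0


class Recorder:
    def __init__(self, convention_version: str):
        # Pinned so readers of the data know which attribute schema applies.
        self.convention_version = convention_version
        self.spans = []

    @contextmanager
    def span(self, kind: str, name: str, **attributes):
        if kind not in SPAN_KINDS:
            raise ValueError(f"unknown span kind: {kind}")
        s = Span(kind, name, dict(attributes), start=time.monotonic())
        try:
            yield s
        finally:
            s.end = time.monotonic()
            self.spans.append(s)  # recorded even if the body raised


rec = Recorder(convention_version="example-pinned-version")
with rec.span("policy", "authorize_tool", decision="allow"):
    pass
with rec.span("model", "generate", deployment="example-deployment") as s:
    # Counts and stop reason are attached once known; no prompt text is stored.
    s.attributes.update(input_tokens=120, output_tokens=40, stop_reason="length")
with rec.span("tool", "lookup_order", tool_version="1.2.0"):
    pass
```

Keeping the kind as an explicit, validated field means a dashboard can separate approval waits from model latency without parsing span names.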

Evidence boundary

Observability may be sampled, short-lived, or operator-focused. Evidence is selected for durable review and may include artifact hashes, policy reasons, approvals, side-effect records, and failure history. Evidence should reference traces without requiring every trace payload to be retained.
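One way to picture the boundary is an evidence record that stores a hash of the artifact and a reference to the trace, not the trace payload itself. The `EvidenceRecord` shape and `build_evidence` helper are assumptions made for illustration only.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class EvidenceRecord:
    """Hypothetical durable record; it references a trace but holds no payload."""
    evidence_id: str
    trace_id: str                 # pointer only; the trace itself may expire
    artifact_sha256: str          # hash of the reviewed artifact
    policy_reason: str
    approval_id: Optional[str] = None
    side_effects: Tuple[str, ...] = ()
    failure_history: Tuple[str, ...] = ()


def build_evidence(
    evidence_id: str,
    trace_id: str,
    artifact: bytes,
    policy_reason: str,
    approval_id: Optional[str] = None,
    side_effects: Tuple[str, ...] = (),
    failure_history: Tuple[str, ...] = (),
) -> EvidenceRecord:
    """Hash the artifact and keep only the digest in the durable record."""
    return EvidenceRecord(
        evidence_id=evidence_id,
        trace_id=trace_id,
        artifact_sha256=hashlib.sha256(artifact).hexdigest(),
        policy_reason=policy_reason,
        approval_id=approval_id,
        side_effects=side_effects,
        failure_history=failure_history,
    )


record = build_evidence(
    evidence_id="ev-001",
    trace_id="trace-abc",
    artifact=b"refund approved for order 42",
    policy_reason="refund under threshold",
    approval_id="appr-17",
    side_effects=("refund.created",),
)
```

Because the record carries a digest, a reviewer can later verify a retained artifact against it even after the sampled trace has been discarded.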

Privacy and sampling

  • Keep raw prompt and completion capture off by default.
  • Separate restricted payload storage from broad metrics.
  • Apply tenant-aware access and deletion.
  • Retain denials, high-risk actions, errors, and evidence-required events even when routine traces are sampled.
  • Do not send search text, prompts, or contact content to analytics by default.
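The retention rule in the list above can be sketched as a sampling decision that never drops the protected event classes. The class labels in `ALWAYS_RETAIN` and the function `should_retain` are hypothetical; hashing the trace ID is one assumed way to make every layer reach the same decision for a given trace.

```python
import hashlib

# Hypothetical labels for events that must survive sampling.
ALWAYS_RETAIN = {"denial", "high_risk_action", "error", "evidence_required"}


def should_retain(trace_id: str, event_classes: set, sample_rate: float) -> bool:
    """Keep protected traces unconditionally; sample routine ones."""
    if event_classes & ALWAYS_RETAIN:
        return True
    # Deterministic bucket in [0, 1) derived from the trace ID, so all
    # services that see this trace agree without coordinating.
    digest = hashlib.sha256(trace_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < sample_rate


keep_denial = should_retain("trace-abc", {"denial"}, sample_rate=0.0)
keep_routine = should_retain("trace-abc", {"routine"}, sample_rate=0.0)
```

With a sample rate of zero, the denial is still kept and the routine trace is dropped, which matches the intent of sampling routine traffic only.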

Operational views

Provide service-objective, capacity, model-quality, tool-reliability, approval-backlog, security-event, recovery, cost, and evidence-gap views. Each dashboard should link from aggregate metrics to minimized request evidence, subject to authorization.
