Key takeaways
- The contract normalizes application intent before provider-specific execution.
- Identity, tenant, delegated authority, risk, deadline, and budget are required inputs—not inferred from prompts.
- Tools, memory, output, policy, and trace behavior must be explicit and versioned.
- Error categories should be stable across providers and distinguish retryable failures from failures with ambiguous side effects.
- Contracts evolve through compatibility rules, schema IDs, fixtures, and deprecation—not undocumented prompt changes.
Runtime boundary
A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.
Receives
Product-level intent and governance context.
Owns
Boundary schema, versioning, required fields, validation, stable error/usage semantics, and extension rules.
Emits
An accepted execution or a stable structured rejection; later, a response or error, plus events and evidence tied to the same request.
Does not own
Provider-native hidden fields, unrestricted arbitrary JSON, or authority embedded only in text.
Failure modes
Unknown version, invalid identity/tenant, missing permissions, schema drift, provider leakage, ambiguous error, and incompatible evolution.
Evidence and metrics
Schema validation, version adoption, deprecated fields, extension use, contract errors, response validity, and replay completeness.
Request envelope
The request carries contract version, IDs/UTC time, actor, tenant/session, typed task, risk, permissions, context/model/memory policy, output schema, trace mode, deadline, and budget.
Implementation
Validate before side effects or provider calls. Resolve references under service identity and tenant policy.
Operational implications
Use generated validators from a canonical schema and preserve unknown-field policy.
Measure
Validation failures, contract version, risk class, allowed tools, deadline, and budget.
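The "validate before side effects or provider calls" rule can be sketched as a pre-flight check. This is a minimal hand-written illustration, not a replacement for validators generated from the canonical schema. The required field names are copied from the RuntimeRequest schema in this document. The function names, the message strings, and the rule that budget limits must be positive are assumptions made for the example.

```python
from datetime import datetime, timezone

# Required top-level fields, copied from the RuntimeRequest schema in this document.
REQUIRED_FIELDS = (
    "contractVersion", "requestId", "occurredAtUtc", "actor", "tenant", "task",
    "risk", "permissions", "output", "trace", "deadlineUtc", "budget",
)
BUDGET_FIELDS = ("maxTokens", "maxCostUsd", "maxSteps")


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' is mapped to +00:00."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def validate_request(request: dict, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means work may start."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in request]
    if problems:
        return problems  # later checks assume the fields exist

    try:
        deadline = parse_utc(request["deadlineUtc"])
    except (AttributeError, ValueError):
        return ["deadlineUtc is not a valid timestamp"]
    if deadline.tzinfo is None:
        problems.append("deadlineUtc has no UTC offset")
    elif deadline <= now:
        problems.append("deadlineUtc is already in the past")

    budget = request["budget"] if isinstance(request["budget"], dict) else {}
    for name in BUDGET_FIELDS:
        value = budget.get(name)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        # Assumption for this sketch: every budget limit must be a positive number.
        if not is_number or value <= 0:
            problems.append(f"budget.{name} must be a positive number")
    return problems
```

The caller would invoke this before resolving references or contacting a provider, and map a non-empty result to a structured rejection in the validation error category.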
Response envelope
The response returns structured output, evidence, warnings, tool results, route, policy decisions, memory changes, trace, timing/usage/cost, and review state.
Implementation
Return stable references for protected content and separate warnings from failures.
Operational implications
Do not expose provider hidden reasoning or raw credentials.
Measure
Contract validity, evidence coverage, warnings, usage reconciliation, and review state.
Tool contract and invocation
A tool definition includes name/version, description, schemas, permission, side effect, idempotency, timeout, retry, approval, and audit fields.
Implementation
Pin the version selected during proposal/approval through execution.
Operational implications
Tool descriptions guide the model; deterministic metadata controls execution.
Measure
Schema validity, version mismatch, approval, idempotency, tool result, and side effects.
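Version pinning can be illustrated with a small guard that runs immediately before execution. This is a sketch under stated assumptions: `ToolApproval`, `ToolVersionMismatch`, and `resolve_pinned_tool` are hypothetical names, and the `name@version` reference format is taken from the trace event example in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolApproval:
    """What was proposed and approved (hypothetical record shape)."""
    approval_id: str
    tool_name: str
    tool_version: str


class ToolVersionMismatch(Exception):
    """Raised when the tool about to run differs from the approved one."""


def resolve_pinned_tool(approval: ToolApproval, current_name: str, current_version: str) -> str:
    """Return the pinned 'name@version' reference, or refuse to execute."""
    approved = (approval.tool_name, approval.tool_version)
    if approved != (current_name, current_version):
        # Do not silently upgrade: the approval covered a specific version.
        raise ToolVersionMismatch(
            f"approved {approval.tool_name}@{approval.tool_version}, "
            f"registry offers {current_name}@{current_version}"
        )
    return f"{current_name}@{current_version}"
```

Comparing the full version string, rather than only the major version, keeps the approval tied to the exact schemas and execution controls the approver saw.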
Trace event
Trace events carry trace/span hierarchy, UTC time, component/operation, attempt, input/output references, model/tool/provider IDs, counts, duration, policy, error, and redaction.
Implementation
Use a controlled event taxonomy and link durable workflow resumes.
Operational implications
Content mode must follow classification and retention policy.
Measure
Event completeness, dropped versus exported events, redaction, clock skew, and correlation.
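Correlation depends on trace and span IDs surviving every hop. The sketch below assumes the W3C Trace Context `traceparent` header is the propagation format. That is an assumption: this document lists Trace Context as a source but does not mandate it. The helper names are hypothetical. The example IDs used elsewhere in this document have the 32- and 16-character hexadecimal lengths that header requires.

```python
import re

# traceparent layout for version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: str):
    """Return the parsed fields, or None if the header is not a valid version-00 value."""
    match = _TRACEPARENT.fullmatch(header)
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero trace or parent IDs are invalid in Trace Context.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {
        "traceId": trace_id,
        "parentSpanId": span_id,
        "sampled": bool(int(flags, 16) & 1),  # lowest bit is the sampled flag
    }


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Build a traceparent value for an outgoing call made within span_id."""
    header = f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"
    if parse_traceparent(header) is None:
        raise ValueError("trace_id or span_id is not valid for traceparent")
    return header
```

A receiving component would copy `traceId` into its own events and record `parentSpanId` as the parent of the span it opens.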
Error envelope
Stable categories separate validation, authentication, authorization, policy, capacity, dependency, model, tool, timeout, cancellation, and internal errors.
Implementation
Include retryable, user-safe message, detail reference, trace ID, and review/compensation state.
Operational implications
A timeout after dispatch is not automatically retryable.
Measure
Error category, retry, ambiguous outcome, user message, and incident linkage.
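The rule that a timeout after dispatch is not automatically retryable can be written as a small decision function. This is a sketch, not a prescribed policy. `NextStep` and `next_step_after_error` are hypothetical names, and the side-effect value `"none"` is an assumed label for read-only tools (this document only shows `"reversible-write"`).

```python
from enum import Enum


class NextStep(Enum):
    RETRY = "retry"
    CHECK_OUTCOME = "check-outcome"  # read authoritative state before deciding
    FAIL = "fail"


def next_step_after_error(
    category: str,
    dispatched: bool,
    side_effect: str,
    retryable_categories: frozenset[str],
) -> NextStep:
    """Decide what to do after a failed attempt.

    category: stable error category, e.g. "timeout" or "dependency.unavailable".
    dispatched: True if the request may have reached the downstream system.
    side_effect: the tool's declared side effect; "none" is an assumed value.
    """
    may_have_changed_state = dispatched and side_effect != "none"
    if category == "timeout" and may_have_changed_state:
        # The write may or may not have landed, so a blind retry could duplicate it.
        return NextStep.CHECK_OUTCOME
    if category in retryable_categories:
        return NextStep.RETRY
    return NextStep.FAIL
```

A `CHECK_OUTCOME` result corresponds to an error envelope with `retryable` set to false and a review or compensation state attached.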
Benchmark disclosure
A machine-readable disclosure binds result to model, hardware, software, workload, method, metrics, cache/warmup, quality, and review date.
Implementation
Store the disclosure alongside the raw benchmark outputs and the source-control revision that produced them.
Operational implications
The disclosure keeps reported numbers attached to the conditions under which they were measured.
Measure
Disclosure completeness, artifact hashes, reproduction status, and review age.
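Disclosure completeness and artifact hashes can both be computed mechanically. The sketch below assumes the top-level section names used in the benchmark disclosure example in this document. The `artifactHashes` and `sourceRevision` keys and both function names are hypothetical additions for illustration.

```python
import hashlib

# Top-level sections taken from the benchmark disclosure example in this document.
REQUIRED_SECTIONS = (
    "reviewedAtUtc", "model", "hardware", "software",
    "workload", "method", "metrics", "quality",
)


def missing_sections(disclosure: dict) -> list[str]:
    """List required sections that are absent or empty."""
    return [name for name in REQUIRED_SECTIONS if not disclosure.get(name)]


def bind_artifacts(disclosure: dict, artifacts: dict[str, bytes], revision: str) -> dict:
    """Return a copy of the disclosure with SHA-256 hashes of the raw outputs.

    artifacts maps an artifact name to its raw bytes. The two added keys are
    hypothetical; the source does not define them.
    """
    bound = dict(disclosure)
    bound["sourceRevision"] = revision
    bound["artifactHashes"] = {
        name: hashlib.sha256(data).hexdigest()
        for name, data in sorted(artifacts.items())
    }
    return bound
```

Recomputing the hashes during reproduction shows whether the stored raw outputs are still the ones the disclosure describes.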
Decision record
An architecture decision records scope, exact candidates/versions, requirements, evidence, decision, trade-offs, rollback, and review triggers.
Implementation
Keep the record linked from project memory and revisit it when a review trigger occurs.
Operational implications
Avoid scorecards without narrative scope and uncertainty.
Measure
Review age, trigger occurrence, unresolved risk, and migration readiness.
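Review age and trigger occurrence can be checked together. This is a minimal sketch: `review_due` is a hypothetical helper, the maximum age is a caller-chosen value rather than something this document specifies, and triggers are matched by exact string.

```python
from datetime import datetime, timedelta, timezone


def review_due(
    decided_at: datetime,
    now: datetime,
    max_age: timedelta,
    declared_triggers: list[str],
    observed_events: set[str],
) -> tuple[bool, list[str]]:
    """Return (is a review due, which declared triggers have occurred)."""
    fired = [trigger for trigger in declared_triggers if trigger in observed_events]
    too_old = now - decided_at > max_age
    return too_old or bool(fired), fired
```

For the decision record example in this document, a hardware migration observed one month after the decision would make the review due even though the record is still recent.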
Evolution rules
Use semantic contract versions, additive changes, schema identifiers, capability negotiation, fixtures, and migration windows.
Implementation
Reject incompatible major versions; define default semantics only when safe; test old/new clients.
Operational implications
Never reinterpret an existing field silently.
Measure
Client/server version matrix, deprecated use, migration failures, and contract-test pass.
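Rejecting incompatible major versions can be sketched as a comparison of `major.minor` strings, the shape used by `contractVersion` in this document. The compatibility rule here is an assumption: the same major version is required, and the client's minor version must not exceed the server's, on the premise that minor releases are additive. Function names are hypothetical.

```python
def parse_version(text: str) -> tuple[int, int]:
    """Parse 'major.minor' into integers; raises ValueError on any other shape."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_compatible(client_version: str, server_version: str) -> bool:
    """Assumed rule: same major, and the server knows every minor the client uses."""
    client_major, client_minor = parse_version(client_version)
    server_major, server_minor = parse_version(server_version)
    if client_major != server_major:
        return False  # incompatible major versions are rejected outright
    return client_minor <= server_minor


def version_matrix(clients: list[str], servers: list[str]) -> dict[tuple[str, str], bool]:
    """Build the client/server compatibility matrix for contract tests."""
    return {(c, s): is_compatible(c, s) for c in clients for s in servers}
```

Each compatible pair in the matrix would then be exercised with recorded fixtures, so that a passing cell means tested behavior rather than only matching version numbers.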
Reference tables
| Artifact | Purpose |
|---|---|
| Runtime request | Normalize intent, identity, authority, constraints, budget |
| Runtime response | Return structured result, evidence, route, policy, memory, usage |
| Tool definition | Describe typed capability and execution controls |
| Tool invocation/result | Record one controlled side-effect attempt and outcome |
| Trace event | Correlate components, timing, versions, decisions, errors |
| Error envelope | Stable retry/review semantics |
| Benchmark disclosure | Bind performance claim to exact conditions |
| Decision record | Preserve architecture rationale and review triggers |
Implementation artifacts
These examples use synthetic values. Validate schemas and controls against the target system before deployment.
Runtime request JSON Schema
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "RuntimeRequest",
"type": "object",
"required": [
"contractVersion",
"requestId",
"occurredAtUtc",
"actor",
"tenant",
"task",
"risk",
"permissions",
"output",
"trace",
"deadlineUtc",
"budget"
],
"properties": {
"contractVersion": {
"type": "string",
"const": "2.0"
},
"requestId": {
"type": "string",
"format": "uuid"
},
"occurredAtUtc": {
"type": "string",
"format": "date-time"
},
"actor": {
"type": "object",
"required": [
"subject"
],
"properties": {
"subject": {
"type": "string"
},
"authenticationContext": {
"type": "string"
},
"delegationId": {
"type": [
"string",
"null"
]
}
}
},
"tenant": {
"type": "object",
"required": [
"id"
],
"properties": {
"id": {
"type": "string"
},
"region": {
"type": [
"string",
"null"
]
}
}
},
"session": {
"type": [
"object",
"null"
],
"properties": {
"id": {
"type": "string"
},
"threadId": {
"type": [
"string",
"null"
]
},
"stateVersion": {
"type": [
"integer",
"null"
]
}
}
},
"task": {
"type": "object",
"required": [
"type",
"input"
],
"properties": {
"type": {
"type": "string"
},
"input": {
"type": "object"
},
"idempotencyKey": {
"type": [
"string",
"null"
]
}
}
},
"risk": {
"type": "object",
"required": [
"level"
],
"properties": {
"level": {
"enum": [
"low",
"medium",
"high",
"critical"
]
},
"purpose": {
"type": "string"
},
"dataClasses": {
"type": "array",
"items": {
"type": "string"
}
}
}
},
"permissions": {
"type": "object",
"required": [
"scopes"
],
"properties": {
"scopes": {
"type": "array",
"items": {
"type": "string"
}
},
"allowedTools": {
"type": "array",
"items": {
"type": "string"
}
},
"approvalPolicy": {
"type": "string"
}
}
},
"contextPolicy": {
"type": "object"
},
"modelRoute": {
"type": "object"
},
"memoryPolicy": {
"type": "object"
},
"output": {
"type": "object",
"required": [
"schemaId"
],
"properties": {
"schemaId": {
"type": "string"
},
"stream": {
"type": "boolean"
}
}
},
"trace": {
"type": "object",
"properties": {
"enabled": {
"type": "boolean"
},
"contentMode": {
"enum": [
"references-only",
"redacted",
"approved-content"
]
}
}
},
"deadlineUtc": {
"type": "string",
"format": "date-time"
},
"budget": {
"type": "object",
"required": [
"maxTokens",
"maxCostUsd",
"maxSteps"
],
"properties": {
"maxTokens": {
"type": "integer"
},
"maxCostUsd": {
"type": "number"
},
"maxSteps": {
"type": "integer"
}
}
}
}
}
Runtime response example
{
"contractVersion": "2.0",
"requestId": "70d82541-c047-4bc3-9b58-8e46b65036d0",
"status": "completed",
"output": {
"schemaId": "aruntime.answer.v1",
"value": {
"summary": "The request completed under the approved route and tool policy."
}
},
"evidence": [
{
"sourceId": "policy-doc-2026-04",
"uri": "urn:aruntime:evidence:7f31",
"classification": "internal"
}
],
"warnings": [],
"toolResults": [
{
"invocationId": "tool-03",
"tool": "crm.case.update@2.1.0",
"status": "succeeded",
"resultRef": "urn:aruntime:tool-result:923"
}
],
"route": {
"model": "example-model-42",
"runtime": "example-runtime-7.4",
"provider": "private-cluster",
"fallbackUsed": false
},
"policyDecisions": [
{
"decisionId": "pd-918",
"checkpoint": "tool.execute",
"effect": "allow",
"policyVersion": "2026.06.3"
}
],
"memoryChanges": [
{
"scope": "session",
"operation": "upsert",
"recordRef": "urn:aruntime:memory:case-state:44",
"stateVersion": 12
}
],
"traceId": "9af7fb945f56466a9fef4d456317ce30",
"timingMs": {
"queue": 18,
"context": 52,
"model": 731,
"tools": 109,
"total": 944
},
"usage": {
"promptTokens": 1480,
"cachedTokens": 920,
"outputTokens": 186,
"estimatedCostUsd": 0.023
},
"humanReview": {
"state": "not-required"
}
}
Tool definition example
{
"name": "crm.case.update",
"version": "2.1.0",
"description": "Updates approved mutable fields on an existing tenant-scoped case.",
"inputSchema": {
"type": "object",
"required": [
"caseId",
"expectedVersion",
"patch"
],
"properties": {
"caseId": {
"type": "string"
},
"expectedVersion": {
"type": "integer"
},
"patch": {
"type": "object",
"additionalProperties": false,
"properties": {
"status": {
"enum": [
"open",
"pending",
"resolved"
]
},
"ownerId": {
"type": "string"
}
}
}
}
},
"outputSchema": {
"type": "object",
"required": [
"caseId",
"newVersion",
"changedFields"
],
"properties": {
"caseId": {
"type": "string"
},
"newVersion": {
"type": "integer"
},
"changedFields": {
"type": "array",
"items": {
"type": "string"
}
}
}
},
"requiredPermission": "case.write",
"sideEffect": "reversible-write",
"approval": "required-when-status-resolved",
"idempotency": "caller-supplied-key",
"timeoutMs": 5000,
"retryPolicy": {
"maxAttempts": 2,
"retryable": [
"dependency.unavailable"
],
"requiresOutcomeCheck": true
},
"auditFields": [
"actor.subject",
"tenant.id",
"caseId",
"expectedVersion",
"approvalId",
"idempotencyKey"
]
}
Trace event example
{
"traceId": "9af7fb945f56466a9fef4d456317ce30",
"spanId": "a30ffc6204a14df1",
"parentSpanId": "f2b1c5c3471e438a",
"eventType": "tool.result",
"timestampUtc": "2026-06-21T15:32:18.442Z",
"component": "tool-broker",
"operation": "crm.case.update",
"attempt": 1,
"inputRef": "urn:aruntime:tool-input:8d1",
"outputRef": "urn:aruntime:tool-result:923",
"model": null,
"provider": null,
"tool": "crm.case.update@2.1.0",
"counts": {
"bytesIn": 312,
"bytesOut": 148
},
"durationMs": 109,
"policyDecisionId": "pd-918",
"error": null,
"redaction": "references-only"
}
Error envelope example
{
"contractVersion": "2.0",
"requestId": "70d82541-c047-4bc3-9b58-8e46b65036d0",
"status": "failed",
"error": {
"category": "tool.ambiguous-outcome",
"code": "CASE_UPDATE_TIMEOUT_AFTER_DISPATCH",
"message": "The downstream operation timed out after dispatch; authoritative state must be checked before retry.",
"retryable": false,
"userSafe": true,
"detailsRef": "urn:aruntime:error-detail:90a"
},
"traceId": "9af7fb945f56466a9fef4d456317ce30",
"humanReview": {
"state": "required",
"reason": "Ambiguous external side effect"
}
}
Benchmark disclosure example
{
"reviewedAtUtc": "2026-06-21T00:00:00Z",
"model": {
"name": "ExampleLM-8B",
"revision": "8cbe0f2",
"tokenizer": "example-tokenizer-3",
"precision": "weights-int4-group128; activations-fp16; kv-fp16"
},
"hardware": {
"device": "Example GPU 48GB",
"count": 1,
"topology": "single-device",
"powerMode": "default"
},
"software": {
"os": "Linux 6.x",
"driver": "000.0",
"runtime": "ExampleRuntime 7.4",
"containerDigest": "sha256:\u2026"
},
"workload": {
"promptTokens": {
"p50": 512,
"p95": 4096
},
"outputTokens": {
"p50": 128,
"p95": 512
},
"arrival": "Poisson 2-24 requests/s sweep",
"concurrencyMax": 128,
"prefixCache": "cold and warm reported separately"
},
"method": {
"warmupRequests": 100,
"durationSeconds": 900,
"repetitions": 5,
"externalLoadGenerator": true
},
"metrics": [
"queue",
"TTFT",
"TPOT",
"E2E",
"Goodput",
"errors",
"memory",
"power",
"cost"
],
"quality": {
"dataset": "synthetic-public-fixture-v2",
"evaluator": "rule-plus-reviewed-sample",
"minimumPassRate": 0.98
}
}
Architecture decision record example
{
"decisionId": "ADR-AIRUNTIME-014",
"title": "Select the production LLM serving engine for long-context support workloads",
"status": "accepted",
"decidedAtUtc": "2026-06-21T00:00:00Z",
"scope": "Engine and model-server layer only; excludes gateway, workflow, tools, and policy.",
"requirements": [
"p95 TTFT \u2264 1.5s at 12 rps",
"p95 TPOT \u2264 45ms",
"32k context",
"private NVIDIA deployment",
"prefix reuse",
"OpenTelemetry export"
],
"candidates": [
"Candidate A 0.9",
"Candidate B 7.4",
"Candidate C 1.2"
],
"evidence": [
"benchmark-run-2026-06-15",
"failure-test-2026-06-16",
"security-review-2026-06-18"
],
"decision": "Candidate A 0.9",
"tradeoffs": [
"Higher operational tuning burden",
"No managed control plane"
],
"rollback": "Retain Candidate B image and artifact for one release window.",
"reviewTriggers": [
"Model architecture changes",
"Hardware migration",
"SLO or traffic distribution changes",
"Project support status changes"
]
}
Decision checklist
- Which fields are required before any model or tool work?
- How are protected inputs and outputs referenced?
- Which extensions are permitted without a major version?
- How are errors classified for retry and human review?
- How are tool versions and approvals pinned?
- How are UTC timestamps and trace context propagated?
- Which fixtures prove backward compatibility?
- Where are benchmark and decision records retained?
Common mistakes
- Using a provider chat-completion payload as the product contract.
- Putting permissions or tenant scope only in prompt text.
- Allowing arbitrary unversioned tool dictionaries.
- Returning exceptions or strings instead of stable errors.
- Marking timeouts retryable without side-effect knowledge.
- Changing field meaning in place.
- Logging full content when references are sufficient.
Sources and further reading
- JSON Schema specification
- OpenAPI Specification
- Trace Context
- OpenTelemetry semantic conventions
- Model Context Protocol specification
Last reviewed: 2026-06-21 UTC
