Answer questions from approved enterprise sources while preserving tenant, classification, provenance, citation, and retention boundaries.
Key takeaways
- Primary risk: Sensitive retrieval, unsupported claims, stale sources, and cross-tenant disclosure.
- Keep authoritative domain state outside model memory.
- Measure task outcome, safe failure, and evidence—not output fluency alone.
Problem
Answer questions from approved enterprise sources while preserving tenant, classification, provenance, citation, and retention boundaries.
Principal risk: Sensitive retrieval, unsupported claims, stale sources, and cross-tenant disclosure.
Why runtime layers are needed
A single model invocation cannot reliably own identity, authorization, durable state, external side effects, recovery, or evidence. The runtime composes the necessary compiler/inference/serving path with application controls appropriate to this use case.
Reference architecture
- Authenticated user and tenant boundary
- Classification-aware retrieval broker
- Approved document/index sources with provenance
- Model router with region and provider constraints
- Citation and claim validator
- Evidence store with protected source references
- Enterprise system of record outside model memory
Request flow
- Admit the request and resolve user, tenant, role, purpose, and deadline.
- Classify the question and determine allowed source collections.
- Retrieve with source ID, version, publication/update time, classification, and access decision.
- Assemble a bounded context that preserves citations and excludes unauthorized passages.
- Select an allowed model route and generate a draft answer.
- Validate citations, unsupported claims, and requested output format.
- Require human review before external, legal, financial, or policy-significant use.
- Persist minimized evidence and apply memory policy; do not automatically memorize retrieved content.
Contracts
- Runtime request carries user/tenant, purpose, data classification, allowed sources, route constraints, citation policy, budget, and retention.
- Retrieval tool contract returns stable source references, version, classification, excerpts, and access-decision metadata.
- Output contract requires claim-to-source mapping and an explicit insufficient-evidence state.
Use the runtime request, tool, policy and approval, evidence, and trace schemas as versioned reference boundaries.
Failure modes
- No authorized sources
- Stale or conflicting source versions
- Retrieval timeout or partial index outage
- Model cites a source that does not support the claim
- Cross-tenant cache or memory leak
- Protected source appears in broad trace
- External publication occurs before review
Security considerations
- Apply document-level authorization at retrieval time, not only index time.
- Partition caches and memory by tenant and classification.
- Treat retrieved instructions as untrusted data to reduce indirect prompt injection.
- Keep source payloads out of general telemetry.
- Use approval for externally distributed or high-impact answers.
Observability
Correlate request, model route, context sources, tool operations, policy decisions, approvals, artifacts, failures, recovery, and domain outcome. Apply redaction and retention before exporting traces.
Evaluation and metrics
- Supported-claim rate
- Citation precision and coverage
- Unauthorized-source rate
- Cross-tenant disclosure rate
- Freshness compliance
- Time to supported answer
- Escalation and correction rate
- Evidence completeness
Implementation checklist
- Define authoritative source collections and owners.
- Specify stale, conflicting, and unavailable-source behavior.
- Test prompt injection inside documents.
- Test permissions at document, section, and field level.
- Provide source access and correction links.
- Use deterministic search or templated reporting when generation adds no value.
