Key takeaways
- Prompt text cannot be the final security boundary; deterministic controls must govern tool and data access.
- Treat retrieved content, model output, tool output, and third-party protocol servers as untrusted inputs.
- Identity must distinguish human actor, tenant, runtime service, delegated authority, and tool credential.
- Model, adapter, tokenizer, runtime, container, tool, and policy artifacts form one supply chain.
- Memory and telemetry require classification, provenance, retention, access, deletion, and incident procedures.
- Human approval is effective only when bound to an exact action, target, context, and expiry.
Runtime boundary
A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.
Receives
Identity, delegated authority, classified data, model/tool artifacts, runtime request, policy, approval requirements, and threat context.
Owns
Trust boundaries, least privilege, validation, isolation, artifact integrity, egress, retention, policy enforcement, and incident controls.
Emits
Access decisions, constrained execution, redacted results, audit events, provenance, incidents, and remediation evidence.
Does not own
Legal advice, authority inferred from model confidence, or security delegated to prompt wording.
Failure modes
Prompt injection, confused deputy, unauthorized tool use, exfiltration, model tampering, memory poisoning, tenant leakage, secret exposure, and untraceable actions.
Evidence and metrics
Authorization deny, approval, data-classification blocks, egress, integrity failure, sandbox violation, memory writes, redaction, incident detection, and recovery.
Trust boundaries and identities
A runtime crosses user, product, model provider, data source, tool, memory, telemetry, and administrative boundaries.
Implementation
Model actor, tenant, service, delegated authority, target resource, and credential separately; authenticate at every boundary.
Operational implications
Do not trust tenant or permission claims copied from prompt text or client JSON.
Measure
Authentication failure, tenant mismatch, service identity, delegated-scope use, and credential rotation.
Prompt injection and untrusted content
Retrieved documents, web pages, emails, tool outputs, and messages can contain instructions intended to override policy.
Implementation
Label untrusted data, minimize context, separate instructions from content, restrict tools, validate actions, and use independent policy/approval.
Operational implications
Detection is defense in depth, not a replacement for least privilege.
Measure
Injection detections, context blocks, denied tools, escalations, and incident outcome.
Tool authorization and confused deputy
The model can propose an action but cannot establish the actor’s authority.
Implementation
Resolve normalized action/target, required permission, tenant scope, side effects, idempotency, approval, rate, and budget before execution.
Operational implications
Use credentials scoped to the permitted operation; never expose raw secrets to the model.
Measure
Tool validation/deny, approval, privileged action, credential scope, and side-effect verification.
Data classification and egress
Context, prompts, outputs, caches, logs, and external calls can expose sensitive data.
Implementation
Classify sources and fields, filter context, control destinations, redact, tokenize or reference protected values, and audit egress.
Operational implications
Model providers and telemetry backends are separate destinations with separate policies.
Measure
Data-class blocks, redaction, outbound bytes/domain, policy violations, and retention.
Model and dependency supply chain
Models can include custom code, unsafe formats, malicious weights, vulnerable libraries, and unexpected license obligations.
Implementation
Use approved registries, hashes/signatures, provenance, scanning, isolated conversion, SBOMs, runtime compatibility, and reproducible build evidence.
Operational implications
Avoid loading arbitrary remote-code models in privileged serving processes.
Measure
Integrity failures, provenance completeness, vulnerabilities, license review, and artifact promotion.
Sandboxing and isolation
Tool code, generated code, parsers, converters, and model plugins may require containment.
Implementation
Use process/container/VM isolation, non-root execution, read-only filesystems, minimal mounts, seccomp/capability restrictions, network egress policy, and resource quotas.
Operational implications
Isolation strength must match side effects and attacker control; containers are not the only boundary.
Measure
Sandbox violations, syscalls/network denies, quota events, escape indicators, and cleanup.
Multi-tenant inference
Tenants can share accelerators, memory pools, caches, queues, models, admin APIs, and telemetry.
Implementation
Enforce identity-aware quotas, cache keys, data separation, namespace/RBAC, admin access, encrypted transport, and protected traces.
Operational implications
Performance isolation and data isolation are separate requirements.
Measure
Cross-tenant alerts, quota denies, cache-sharing attempts, queue fairness, and access audit.
Memory governance
Durable memory can amplify false or malicious content across future sessions.
Implementation
Require schemas, provenance, confidence, owner, write permission, review, expiry, deletion, and conflict handling.
Operational implications
Treat durable memory writes as side effects; do not store hidden reasoning or unrestricted raw content.
Measure
Writes by source, approval, rejection, conflicts, expiry/deletion, and poisoning detections.
Output validation and structured constraints
Runtime outputs may feed APIs, databases, UI, or automation.
Implementation
Use JSON Schema/typed contracts, allowlists, bounded values, citations/evidence, encoding, and downstream validation.
Operational implications
Structured output reduces syntax errors but does not prove truth or permission.
Measure
Contract-valid output, semantic rejection, citation verification, and downstream errors.
Human approval and irreversible actions
High-impact changes should pause with a reviewable proposal.
Implementation
Bind approval to exact arguments, target, version, evidence, side-effect class, expiry, and one-time token; verify reviewer authority.
Operational implications
Revalidate if the action changes after approval.
Measure
Approval time/rate, expired/replayed tokens, modified proposals, and post-action verification.
Incident response and replay
Response requires immutable versions, trace context, protected evidence, state changes, and side-effect records.
Implementation
Define containment, credential revocation, cache/memory invalidation, replay, correction, notification, and lessons-learned workflows.
Operational implications
Do not depend on unavailable raw prompts when policy forbids storing them; use safe references and hashes.
Measure
Detection/containment/recovery time, affected runs, replay completeness, and corrective actions.
Reference tables
| Threat | Boundary | Primary controls | Evidence |
|---|---|---|---|
| Prompt injection | Context → model/tool | Untrusted-data separation, least privilege, action validation | Context provenance and denied actions |
| Confused deputy | Model → tool | Actor/tenant authorization and scoped credentials | Policy decision and tool audit |
| Data exfiltration | Runtime → provider/tool/telemetry | Classification, egress allowlist, redaction | Destination and redaction events |
| Memory poisoning | Model/tool → durable memory | Schema, provenance, review, expiry | Memory-change record |
| Supply-chain compromise | Artifact → runtime | Registry, hash/signature, SBOM, sandbox | Provenance and integrity check |
| Cross-tenant leakage | Shared serving/cache/telemetry | Tenant keys, quotas, isolation, RBAC | Access/cache audit |
| Duplicate side effect | Retry → external system | Idempotency and authoritative outcome check | Invocation/result/compensation |
| Denial of wallet | Request loop → capacity/providers | Budgets, step/tool/token limits, backpressure | Budget decisions and termination |
Decision checklist
- Where are the runtime trust boundaries and identities authenticated?
- Which data classes can reach each model, tool, cache, and telemetry destination?
- How is tool authority determined independently of the prompt?
- Which artifacts can execute code and how are they verified?
- What isolation and egress controls match each side effect?
- How are tenant caches, queues, and traces separated?
- Which memory writes require review or expiry?
- What actions require human approval?
- Can an incident reconstruct versions, decisions, state changes, and side effects?
Common mistakes
- Relying on a system prompt as the security boundary.
- Passing production credentials into model context.
- Using one service identity for every tenant and tool.
- Sharing prefix cache across tenants without explicit policy.
- Loading unverified model artifacts or custom code.
- Logging full sensitive prompts and tool results.
- Writing model output directly into long-term memory.
- Approving an agent broadly rather than an exact action.
- Retrying ambiguous side effects without checking state.
Sources and further reading
-
OWASP Top 10 for LLM Applications
(opens in a new tab)
-
NIST AI Risk Management Framework
(opens in a new tab)
-
MITRE ATLAS
(opens in a new tab)
-
Model Context Protocol security
(opens in a new tab)
-
Supply-chain Levels for Software Artifacts
(opens in a new tab)
-
NIST Privacy Framework
(opens in a new tab)
Last reviewed: 2026-06-21 UTC
