Security and Governance

Key takeaways

Prompt text cannot be the final security boundary; deterministic controls must govern tool and data access.
Treat retrieved content, model output, tool output, and third-party protocol servers as untrusted inputs.
Identity must distinguish human actor, tenant, runtime service, delegated authority, and tool credential.
Model, adapter, tokenizer, runtime, container, tool, and policy artifacts form one supply chain.
Memory and telemetry require classification, provenance, retention, access, deletion, and incident procedures.
Human approval is effective only when bound to an exact action, target, context, and expiry.

Runtime boundary

A useful architecture identifies what this layer receives, owns, emits, measures, and refuses to own. That boundary prevents overlapping products from being treated as interchangeable.

Receives

Identity, delegated authority, classified data, model/tool artifacts, runtime request, policy, approval requirements, and threat context.

Owns

Trust boundaries, least privilege, validation, isolation, artifact integrity, egress, retention, policy enforcement, and incident controls.

Emits

Access decisions, constrained execution, redacted results, audit events, provenance, incidents, and remediation evidence.

Does not own

Legal advice, authority inferred from model confidence, or security delegated to prompt wording.

Failure modes

Prompt injection, confused deputy, unauthorized tool use, exfiltration, model tampering, memory poisoning, tenant leakage, secret exposure, and untraceable actions.

Evidence and metrics

Authorization denials, approvals, data-classification blocks, egress events, integrity failures, sandbox violations, memory writes, redactions, incident detection, and recovery.

Trust boundaries and identities

A runtime crosses user, product, model provider, data source, tool, memory, telemetry, and administrative boundaries.

Implementation

Represent the actor, tenant, service, delegated authority, target resource, and credential as separate fields; authenticate at every boundary.
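As a minimal sketch of keeping those identity dimensions separate, the example below uses a hypothetical `RequestIdentity` record and `check_access` helper. The field names, scope strings, and identifiers are assumptions for illustration, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestIdentity:
    """Hypothetical record: each identity dimension is its own field."""
    actor_id: str                # human who initiated the request
    tenant_id: str               # tenant taken from the authenticated session
    service_id: str              # runtime service making the call
    delegated_scopes: frozenset  # authority the actor delegated to the runtime
    credential_id: str           # reference to a tool credential, never the secret


def check_access(identity: RequestIdentity, resource_tenant: str, required_scope: str) -> bool:
    """Allow only when the tenant matches and the delegated scope covers the operation."""
    if identity.tenant_id != resource_tenant:
        return False
    return required_scope in identity.delegated_scopes


identity = RequestIdentity(
    actor_id="user-17",
    tenant_id="tenant-a",
    service_id="agent-runtime",
    delegated_scopes=frozenset({"tickets:read"}),
    credential_id="cred-ref-42",
)
```

In this sketch the tenant comes from the authenticated session object, so a tenant name that appears in prompt text or client JSON never reaches `check_access`.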

Operational implications

Do not trust tenant or permission claims copied from prompt text or client JSON.

Measure

Authentication failures, tenant mismatches, service-identity usage, delegated-scope use, and credential rotation.

Prompt injection and untrusted content

Retrieved documents, web pages, emails, tool outputs, and messages can contain instructions intended to override policy.

Implementation

Label untrusted data, minimize context, separate instructions from content, restrict tools, validate actions, and use independent policy/approval.
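One way to label untrusted data and keep it apart from instructions is sketched below. `ContextItem`, `TRUSTED_SOURCES`, and the character cap are hypothetical names and values chosen for illustration; the point is that the runtime, not the content, decides what counts as an instruction.

```python
from dataclasses import dataclass

# Hypothetical allowlist of sources whose text may act as instructions.
TRUSTED_SOURCES = {"system_policy"}


@dataclass(frozen=True)
class ContextItem:
    text: str
    source: str    # e.g. "system_policy", "retrieved_doc", "tool_output"
    trusted: bool  # derived from the source by the runtime, never from the text


def make_item(text: str, source: str) -> ContextItem:
    return ContextItem(text=text, source=source, trusted=source in TRUSTED_SOURCES)


def build_context(items, max_untrusted_chars: int = 2000) -> dict:
    """Keep instructions and untrusted data in separate channels and cap the data."""
    instructions = [item.text for item in items if item.trusted]
    untrusted = [
        {"source": item.source, "text": item.text[:max_untrusted_chars]}
        for item in items
        if not item.trusted
    ]
    return {"instructions": instructions, "untrusted_data": untrusted}
```

Separation of this kind does not stop a model from following injected text. It only makes provenance explicit so that tool restrictions and independent policy checks can act on it.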

Operational implications

Detection is defense in depth, not a replacement for least privilege.

Measure

Injection detections, context blocks, denied tools, escalations, and incident outcome.

Tool authorization and confused deputy

The model can propose an action but cannot establish the actor’s authority.

Implementation

Resolve normalized action/target, required permission, tenant scope, side effects, idempotency, approval, rate, and budget before execution.
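The pre-execution checks above can be sketched as a deterministic function that runs outside the model. The `TOOL_POLICY` table, tool names, and reason codes are hypothetical; a real deployment would likely load them from a policy service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Actor:
    actor_id: str
    tenant_id: str
    permissions: frozenset


@dataclass(frozen=True)
class ProposedAction:
    tool: str           # normalized tool name proposed by the model
    target_tenant: str  # tenant that owns the target resource


# Hypothetical policy table: tool -> (required permission, approval needed).
TOOL_POLICY = {
    "tickets.search": ("tickets:read", False),
    "tickets.close": ("tickets:write", True),
}


def authorize(actor, action, approved=False, remaining_budget=1):
    """Return (allowed, reason); every check is independent of prompt text."""
    policy = TOOL_POLICY.get(action.tool)
    if policy is None:
        return (False, "unknown_tool")
    permission, needs_approval = policy
    if action.target_tenant != actor.tenant_id:
        return (False, "tenant_mismatch")
    if permission not in actor.permissions:
        return (False, "missing_permission")
    if needs_approval and not approved:
        return (False, "approval_required")
    if remaining_budget <= 0:
        return (False, "budget_exhausted")
    return (True, "allowed")
```

Returning a reason code alongside the decision gives the audit trail something concrete to record for each deny.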

Operational implications

Use credentials scoped to the permitted operation; never expose raw secrets to the model.

Measure

Tool validation failures and denials, approvals, privileged actions, credential scope, and side-effect verification.

Data classification and egress

Context, prompts, outputs, caches, logs, and external calls can expose sensitive data.

Implementation

Classify sources and fields, filter context, control destinations, redact, tokenize or reference protected values, and audit egress.
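A minimal sketch of destination control and redaction follows. The hostnames, class names, and the single email pattern are assumptions for illustration; real redaction usually needs far broader detection than one regular expression.

```python
import re
from urllib.parse import urlparse

# Hypothetical ranking of data classes, lowest sensitivity first.
CLASS_RANK = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical policy: the highest data class each destination host may receive.
EGRESS_POLICY = {
    "api.model-provider.example": "internal",
    "telemetry.example": "public",
}

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace email addresses with a placeholder before the text leaves the runtime."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)


def check_egress(url: str, data_class: str) -> bool:
    """Deny unknown hosts; allow a known host only up to its permitted data class."""
    host = urlparse(url).hostname
    allowed_class = EGRESS_POLICY.get(host)
    if allowed_class is None:
        return False
    return CLASS_RANK[data_class] <= CLASS_RANK[allowed_class]
```

Here the model provider and the telemetry backend carry different limits, so the same payload can be permitted for one destination and blocked for the other.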

Operational implications

Model providers and telemetry backends are separate destinations with separate policies.

Measure

Data-class blocks, redactions, outbound bytes per domain, policy violations, and retention.

Model and dependency supply chain

Models can include custom code, unsafe formats, malicious weights, vulnerable libraries, and unexpected license obligations.

Implementation

Use approved registries, hashes/signatures, provenance, scanning, isolated conversion, SBOMs, runtime compatibility, and reproducible build evidence.
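The hash check in that list can be sketched with the standard library alone. `verify_artifact` is a hypothetical helper, and the expected digest is assumed to come from an approved registry or signed manifest; signature and provenance verification are separate steps not shown here.

```python
import hashlib
import hmac


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large model artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: str, expected_sha256: str) -> bool:
    """Compare against the pinned digest before the artifact is loaded anywhere."""
    return hmac.compare_digest(sha256_file(path), expected_sha256.lower())
```

A matching hash shows only that the bytes are the ones that were pinned. Whether those bytes are safe to load is still a question for scanning and isolated conversion.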

Operational implications

Avoid loading arbitrary remote-code models in privileged serving processes.

Measure

Integrity failures, provenance completeness, vulnerabilities, license review, and artifact promotion.

Sandboxing and isolation

Tool code, generated code, parsers, converters, and model plugins may require containment.

Implementation

Use process/container/VM isolation, non-root execution, read-only filesystems, minimal mounts, seccomp/capability restrictions, network egress policy, and resource quotas.

Operational implications

Isolation strength must match side effects and attacker control; containers are not the only boundary.
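To illustrate matching isolation strength to risk, the sketch below checks a sandbox description before a workload is scheduled. The tier ordering, the mapping in `required_isolation`, and the spec keys are all assumptions made for this example, not a recommended standard.

```python
from enum import IntEnum


class Isolation(IntEnum):
    PROCESS = 1
    CONTAINER = 2
    MICRO_VM = 3


def required_isolation(attacker_controlled_code: bool, has_side_effects: bool) -> Isolation:
    """Hypothetical mapping: more attacker control and side effects need a stronger tier."""
    if attacker_controlled_code and has_side_effects:
        return Isolation.MICRO_VM
    if attacker_controlled_code or has_side_effects:
        return Isolation.CONTAINER
    return Isolation.PROCESS


# Hypothetical baseline flags every sandbox spec must declare.
REQUIRED_FLAGS = {
    "non_root": True,
    "read_only_rootfs": True,
    "network_egress": "deny_by_default",
}


def validate_sandbox(spec: dict, attacker_controlled_code: bool, has_side_effects: bool) -> list:
    """Return a list of problems; an empty list means the spec meets the sketch policy."""
    problems = []
    needed = required_isolation(attacker_controlled_code, has_side_effects)
    actual = Isolation[spec["isolation"]]
    if actual < needed:
        problems.append(f"isolation {actual.name} is weaker than required {needed.name}")
    for key, expected in REQUIRED_FLAGS.items():
        if spec.get(key) != expected:
            problems.append(f"{key} must be {expected!r}")
    return problems
```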

Measure

Sandbox violations, syscalls/network denies, quota events, escape indicators, and cleanup.

Multi-tenant inference

Tenants can share accelerators, memory pools, caches, queues, models, admin APIs, and telemetry.

Implementation

Enforce identity-aware quotas, tenant-scoped cache keys, data separation, namespace and RBAC controls, restricted admin access, encrypted transport, and protected traces.
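A tenant-scoped cache key might be derived as in the sketch below. The `cache_key` function and its three inputs are assumptions for illustration; the essential property is that the tenant identifier is always part of the key.

```python
import hashlib


def cache_key(tenant_id: str, model_version: str, prompt_prefix: str) -> str:
    """Derive a key that cannot collide across tenants or model versions."""
    digest = hashlib.sha256()
    for part in (tenant_id, model_version, prompt_prefix):
        encoded = part.encode("utf-8")
        # A length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

With this scheme, identical prompts from two tenants produce different keys, so sharing a cached prefix across tenants would require a deliberate and visible policy change.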

Operational implications

Performance isolation and data isolation are separate requirements.

Measure

Cross-tenant alerts, quota denies, cache-sharing attempts, queue fairness, and access audit.

Memory governance

Durable memory can amplify false or malicious content across future sessions.

Implementation

Require schemas, provenance, confidence, owner, write permission, review, expiry, deletion, and conflict handling.
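Several of those requirements can be enforced at write time. The sketch below assumes a hypothetical `MemoryWrite` record, an allowlist of sources permitted to write, and a 90-day maximum lifetime; all three are illustrative choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy values.
WRITE_ALLOWED_SOURCES = {"user_confirmed", "reviewed_tool_output"}
MAX_TTL = timedelta(days=90)


@dataclass(frozen=True)
class MemoryWrite:
    key: str
    value: str
    source: str          # provenance of the content
    owner: str           # accountable owner of the record
    confidence: float    # 0.0 to 1.0
    expires_at: datetime


def validate_write(write: MemoryWrite, now: datetime) -> list:
    """Return rejection reasons; an empty list means the write may proceed to review."""
    errors = []
    if write.source not in WRITE_ALLOWED_SOURCES:
        errors.append("source_not_permitted")
    if not write.owner:
        errors.append("missing_owner")
    if not 0.0 <= write.confidence <= 1.0:
        errors.append("confidence_out_of_range")
    if write.expires_at <= now:
        errors.append("already_expired")
    elif write.expires_at - now > MAX_TTL:
        errors.append("ttl_too_long")
    return errors
```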

Operational implications

Treat durable memory writes as side effects; do not store hidden reasoning or unrestricted raw content.

Measure

Writes by source, approval, rejection, conflicts, expiry/deletion, and poisoning detections.

Output validation and structured constraints

Runtime outputs may feed APIs, databases, UI, or automation.

Implementation

Use JSON Schema/typed contracts, allowlists, bounded values, citations/evidence, encoding, and downstream validation.
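As a dependency-free sketch of a typed contract with an allowlist and bounded values, the example below validates a hypothetical refund proposal. The field names, allowed actions, and amount limit are assumptions; a JSON Schema validator could express the same contract declaratively.

```python
import json

# Hypothetical contract values.
ALLOWED_ACTIONS = {"refund", "escalate", "no_action"}
MAX_AMOUNT_CENTS = 50_000


def parse_model_output(raw: str):
    """Return (payload, errors); the payload is None unless every check passes."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None, ["not_json"]
    if not isinstance(payload, dict):
        return None, ["not_object"]

    errors = []
    action = payload.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        errors.append("action_not_allowed")

    amount = payload.get("amount_cents")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_int = isinstance(amount, int) and not isinstance(amount, bool)
    if not is_int or not 0 <= amount <= MAX_AMOUNT_CENTS:
        errors.append("amount_out_of_bounds")

    if not payload.get("evidence_ids"):
        errors.append("missing_evidence")

    return (payload if not errors else None), errors
```

Passing this check shows only that the output is well-formed and within bounds. Whether the refund is justified or permitted still has to be decided by downstream validation and authorization.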

Operational implications

Structured output reduces syntax errors but does not prove truth or permission.

Measure

Contract-valid output, semantic rejection, citation verification, and downstream errors.

Human approval and irreversible actions

High-impact changes should pause with a reviewable proposal.

Implementation

Bind approval to exact arguments, target, version, evidence, side-effect class, expiry, and one-time token; verify reviewer authority.

Operational implications

Revalidate if the action changes after approval.
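Binding an approval to an exact action can be sketched with an HMAC over a canonical form of the proposal. The key, reviewer list, in-memory replay set, and integer timestamps below are assumptions made for a self-contained example; a production system would likely use a secret manager and durable storage.

```python
import hashlib
import hmac
import json

SECRET = b"demo-only-key"         # hypothetical; load a real key from a secret manager
AUTHORIZED_REVIEWERS = {"alice"}  # hypothetical reviewer allowlist
_used_tokens = set()              # in-memory one-time-use record, for illustration only


def _canonical(obj) -> bytes:
    """Serialize deterministically so equal proposals yield equal bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_approval(action: dict, reviewer: str, expires_at: int) -> str:
    message = _canonical({"action": action, "reviewer": reviewer, "expires_at": expires_at})
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def redeem_approval(token: str, action: dict, reviewer: str, expires_at: int, now: int) -> str:
    """Return "ok" only for an unexpired, unused token matching this exact action."""
    if reviewer not in AUTHORIZED_REVIEWERS:
        return "reviewer_not_authorized"
    if now >= expires_at:
        return "expired"
    expected = issue_approval(action, reviewer, expires_at)
    if not hmac.compare_digest(token, expected):
        return "mismatch"
    if token in _used_tokens:
        return "replayed"
    _used_tokens.add(token)
    return "ok"
```

Because the token covers the arguments, target, and version, any change to the proposal after approval produces a mismatch and forces a fresh review.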

Measure

Approval time/rate, expired/replayed tokens, modified proposals, and post-action verification.

Incident response and replay

Response requires immutable versions, trace context, protected evidence, state changes, and side-effect records.

Implementation

Define containment, credential revocation, cache/memory invalidation, replay, correction, notification, and lessons-learned workflows.

Operational implications

Do not depend on unavailable raw prompts when policy forbids storing them; use safe references and hashes.
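An audit event that records a hash and an optional reference instead of the raw prompt might look like the sketch below. The event fields and the `store_uri` parameter are hypothetical.

```python
import hashlib
import json
from typing import Optional


def prompt_reference(prompt: str, store_uri: Optional[str] = None) -> dict:
    """Describe a prompt without embedding its text in the event."""
    return {
        "sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "length": len(prompt),
        "ref": store_uri,  # pointer into a protected store, or None if nothing is retained
    }


def audit_event(run_id: str, model_version: str, policy_version: str,
                decision: str, prompt: str, store_uri: Optional[str] = None) -> str:
    """Emit one JSON line carrying versions and the decision, but no raw prompt."""
    event = {
        "run_id": run_id,
        "model_version": model_version,
        "policy_version": policy_version,
        "decision": decision,
        "prompt": prompt_reference(prompt, store_uri),
    }
    return json.dumps(event, sort_keys=True)
```

A plain hash of short or predictable text can be guessed by brute force, so a keyed hash (HMAC) may be the safer choice where prompts are low-entropy.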

Measure

Detection/containment/recovery time, affected runs, replay completeness, and corrective actions.

Reference tables

Runtime threat and control map
Threat	Boundary	Primary controls	Evidence
Prompt injection	Context → model/tool	Untrusted-data separation, least privilege, action validation	Context provenance and denied actions
Confused deputy	Model → tool	Actor/tenant authorization and scoped credentials	Policy decision and tool audit
Data exfiltration	Runtime → provider/tool/telemetry	Classification, egress allowlist, redaction	Destination and redaction events
Memory poisoning	Model/tool → durable memory	Schema, provenance, review, expiry	Memory-change record
Supply-chain compromise	Artifact → runtime	Registry, hash/signature, SBOM, sandbox	Provenance and integrity check
Cross-tenant leakage	Shared serving/cache/telemetry	Tenant keys, quotas, isolation, RBAC	Access/cache audit
Duplicate side effect	Retry → external system	Idempotency and authoritative outcome check	Invocation/result/compensation
Denial of wallet	Request loop → capacity/providers	Budgets, step/tool/token limits, backpressure	Budget decisions and termination
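
The step and token limits in the denial-of-wallet row can be sketched as a per-run budget that the runtime charges before each model or tool call. `RunBudget` and its limits are hypothetical names and values for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunBudget:
    max_steps: int
    max_tokens: int
    steps: int = 0
    tokens: int = 0

    def charge(self, tokens: int) -> bool:
        """Record one step; return False when the run must stop instead."""
        if self.steps + 1 > self.max_steps or self.tokens + tokens > self.max_tokens:
            return False
        self.steps += 1
        self.tokens += tokens
        return True
```

A rejected charge leaves the counters unchanged, so the termination decision and the spend recorded at that point stay consistent in the audit trail.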

Decision checklist

Where are the runtime trust boundaries and identities authenticated?
Which data classes can reach each model, tool, cache, and telemetry destination?
How is tool authority determined independently of the prompt?
Which artifacts can execute code and how are they verified?
What isolation and egress controls match each side effect?
How are tenant caches, queues, and traces separated?
Which memory writes require review or expiry?
What actions require human approval?
Can an incident reconstruct versions, decisions, state changes, and side effects?

Common mistakes

Relying on a system prompt as the security boundary.
Passing production credentials into model context.
Using one service identity for every tenant and tool.
Sharing prefix cache across tenants without explicit policy.
Loading unverified model artifacts or custom code.
Logging full sensitive prompts and tool results.
Writing model output directly into long-term memory.
Approving an agent broadly rather than an exact action.
Retrying ambiguous side effects without checking state.

Sources and further reading

OWASP Top 10 for LLM Applications
OWASP GenAI Security Project · Official documentation · accessed 2026-06-21 UTC

NIST AI Risk Management Framework
NIST · Government framework · accessed 2026-06-21 UTC

MITRE ATLAS
MITRE · Threat knowledge base · accessed 2026-06-21 UTC

Model Context Protocol security
MCP · Official documentation · accessed 2026-06-21 UTC

Supply-chain Levels for Software Artifacts
OpenSSF · Supply-chain specification · accessed 2026-06-21 UTC

NIST Privacy Framework
NIST · Government framework · accessed 2026-06-21 UTC

Maintenance record

Last reviewed: 2026-06-21 UTC
