Security and Governance

AI runtime security assumes that model output, retrieved content, tool results, and other agents can be incorrect or adversarial. Controls therefore sit around execution rather than relying on the model to protect itself.

Key takeaways

Treat all model-generated actions as proposals requiring typed validation and authorization.
Use identity, least privilege, isolation, egress control, idempotency, approval, and evidence as independent controls.
Separate established production controls from experimental defenses and model-based monitors.

[ar_threat_matrix]

Runtime threat model

Threats include direct and indirect prompt injection, overprivileged tools, credential leakage, data exfiltration, runaway loops, resource exhaustion, poisoned memory, silent side effects, compromised connectors, cross-agent trust failure, supply-chain compromise, and evidence tampering. Threat modeling identifies assets, trust boundaries, actor capabilities, data flows, and the safe failure mode for each component.

Prompt injection path

An indirect injection can enter through a webpage, email, document, database field, or tool response. It becomes dangerous when untrusted content shares a context with privileged instructions and the same agent can invoke consequential tools. The runtime reduces blast radius by labeling provenance, separating data from authority, limiting tools, validating calls, and requiring approval. Content filters alone cannot establish authorization. [ar_cite id="owasp-prompt" label="OWASP prompt injection guidance"]
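The provenance and tool-limiting steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Source` categories, the tool names, and the rule that any untrusted segment removes consequential tools are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    OPERATOR = "operator"    # system or developer instructions
    USER = "user"            # authenticated end user
    RETRIEVED = "retrieved"  # web pages, email, documents, tool results


@dataclass(frozen=True)
class Segment:
    source: Source
    text: str


# Assumed policy: only these sources may carry instructions.
AUTHORITATIVE = frozenset({Source.OPERATOR, Source.USER})

# Hypothetical tool names, split by side effect.
READ_ONLY_TOOLS = frozenset({"search_docs"})
CONSEQUENTIAL_TOOLS = frozenset({"send_email", "delete_record"})


def is_untrusted(segment: Segment) -> bool:
    """Content from non-authoritative sources is data, never authority."""
    return segment.source not in AUTHORITATIVE


def render_context(segments: list[Segment]) -> str:
    """Wrap each segment in a provenance label before it reaches the model."""
    blocks = []
    for seg in segments:
        label = "untrusted-data" if is_untrusted(seg) else "instruction"
        blocks.append(f"[{label} source={seg.source.value}]\n{seg.text}\n[/{label}]")
    return "\n".join(blocks)


def tools_for_context(segments: list[Segment]) -> frozenset[str]:
    """Drop consequential tools whenever untrusted content is in context."""
    if any(is_untrusted(seg) for seg in segments):
        return READ_ONLY_TOOLS
    return READ_ONLY_TOOLS | CONSEQUENTIAL_TOOLS
```

The labels are only a hint to the model. The enforced control in this sketch is the reduced tool set, which the runtime decides outside the model.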

Identity and dynamic least privilege

Resolve human, workload, service, and delegated identities before any tool runs. Bind every tool call to actor, tenant, task, resource, action, and expiry. Use vault references or token exchange so credentials are created just in time and never placed in model context. A denied action must not trigger automatic privilege expansion.
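One way to picture the binding described above is a short-lived grant that a tool call must match exactly. The sketch below assumes hypothetical `Grant` and `ToolCall` types and a made-up vault reference; real systems would typically express this through their identity provider or policy engine.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    actor: str
    tenant: str
    task: str
    resource: str
    action: str
    expires_at: datetime
    credential_ref: str  # a vault reference, never the secret itself


@dataclass(frozen=True)
class ToolCall:
    actor: str
    tenant: str
    task: str
    resource: str
    action: str


def is_authorized(call: ToolCall, grant: Grant, now: datetime) -> bool:
    """Allow the call only if every bound field matches and the grant is live."""
    if now >= grant.expires_at:
        return False
    requested = (call.actor, call.tenant, call.task, call.resource, call.action)
    granted = (grant.actor, grant.tenant, grant.task, grant.resource, grant.action)
    return requested == granted


# Illustrative values only.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
grant = Grant("alice", "tenant-a", "task-42", "ticket/981", "read",
              expires_at=now + timedelta(minutes=5),
              credential_ref="vault://hypothetical/path")
call = ToolCall("alice", "tenant-a", "task-42", "ticket/981", "read")
print(is_authorized(call, grant, now))  # True
```

A denial here simply returns `False`. Nothing in the sketch widens the grant in response, which mirrors the rule that denied actions do not expand privilege.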

Sandbox and egress

Code execution and untrusted transformations run in isolated processes, containers, microVMs, or other bounded environments appropriate to risk. Use read-only base images, ephemeral filesystems, resource quotas, syscall/process restrictions, and default-deny network egress. Isolation does not replace tool authorization or data minimization.
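Default-deny egress is normally enforced at the network layer (firewall rules, proxy, or sandbox configuration). The sketch below shows only the decision logic, as an application-level check a tool wrapper might apply before making a request. The allowlisted host is a placeholder and the HTTPS-only rule is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of (host, port) pairs; everything else is denied.
ALLOWED_EGRESS = frozenset({("api.example.com", 443)})


def egress_allowed(url: str) -> bool:
    """Return True only for HTTPS URLs whose exact host and port are allowlisted."""
    try:
        parts = urlsplit(url)
        port = parts.port or 443  # .port raises ValueError on a malformed port
    except ValueError:
        return False
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    return (host, port) in ALLOWED_EGRESS
```

Matching on the parsed hostname rather than a substring means a URL such as `https://api.example.com@evil.test/` is denied, because its actual host is `evil.test`.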

Tool and connector controls

Every tool and connector should carry the following controls:
Typed input and output schema
Permission and side-effect classification
Resource allowlist and data-classification rules
Timeout, rate, concurrency, and budget limits
Idempotency and compensation
Connector provenance, version, and health
Result validation and unexpected-side-effect detection
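Several of the controls listed above can live in one tool wrapper. The following sketch combines a flat typed schema, a resource allowlist, a side-effect class, and an idempotency key. `ToolSpec`, `invoke`, and the in-memory result cache are hypothetical names for illustration; a production runtime would persist idempotency keys and add timeouts, rate limits, and output validation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToolSpec:
    name: str
    params: dict[str, type]            # flat typed input schema (illustrative)
    side_effect: str                   # e.g. "read", "write", "irreversible"
    allowed_resources: frozenset[str]  # resource allowlist
    handler: Callable[..., Any]
    results_by_key: dict[str, Any] = field(default_factory=dict)


def invoke(tool: ToolSpec, resource: str, args: dict[str, Any],
           idempotency_key: str) -> Any:
    """Validate a proposed call, then run it at most once per idempotency key."""
    if resource not in tool.allowed_resources:
        raise PermissionError(f"{resource!r} is not allowlisted for {tool.name}")
    if set(args) != set(tool.params):
        raise ValueError(f"expected parameters {sorted(tool.params)}")
    for name, expected in tool.params.items():
        # Exact type match, so a bool is not accepted where an int is required.
        if type(args[name]) is not expected:
            raise TypeError(f"{name} must be {expected.__name__}")
    if idempotency_key in tool.results_by_key:
        return tool.results_by_key[idempotency_key]  # replay, no second side effect
    result = tool.handler(resource, **args)
    tool.results_by_key[idempotency_key] = result
    return result
```

Retrying a call with the same key returns the stored result instead of repeating the side effect, which is the property the idempotency control is meant to provide.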

Memory security

Memory writes are untrusted until validated. Record source, scope, confidence, consent, retention, and correction path. Prevent cross-tenant retrieval, privilege-bearing memories, secret retention, and automatic promotion of tool output. Memory deletion should remove derived indexes and references according to policy.
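A write gate along these lines might look like the sketch below. The field names follow the paragraph above, but the `MemoryCandidate` type, the source labels, and the secret-detection pattern are assumptions. A single regular expression is far from sufficient secret scanning and stands in for whatever detector a deployment actually uses.

```python
import re
from dataclasses import dataclass

# Crude illustrative pattern, not a real secret scanner.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|password|secret|token)\b\s*[:=]")


@dataclass(frozen=True)
class MemoryCandidate:
    tenant: str
    text: str
    source: str        # hypothetical labels: "user_statement", "tool_output"
    confidence: float  # 0.0 to 1.0
    consent: bool
    retention_days: int


def rejection_reasons(candidate: MemoryCandidate, reviewed: bool = False) -> list[str]:
    """Return every reason the write should be refused; empty means acceptable."""
    reasons = []
    if not candidate.consent:
        reasons.append("no consent recorded")
    if not 0.0 <= candidate.confidence <= 1.0:
        reasons.append("confidence out of range")
    if candidate.retention_days <= 0:
        reasons.append("no retention period")
    if SECRET_PATTERN.search(candidate.text):
        reasons.append("text looks like a secret")
    if candidate.source == "tool_output" and not reviewed:
        reasons.append("tool output is not promoted without review")
    return reasons


class MemoryStore:
    """In-memory store partitioned by tenant, so retrieval cannot cross tenants."""

    def __init__(self):
        self._by_tenant = {}

    def write(self, candidate: MemoryCandidate, reviewed: bool = False) -> bool:
        if rejection_reasons(candidate, reviewed):
            return False
        self._by_tenant.setdefault(candidate.tenant, []).append(candidate)
        return True

    def retrieve(self, tenant: str) -> list[MemoryCandidate]:
        return list(self._by_tenant.get(tenant, []))
```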

Human oversight

Require approval for actions that are irreversible, high-impact, financial, administrative, or ambiguous, and for external communications. Present concrete proposed changes and evidence rather than a vague “approve agent” prompt. Out-of-band approval can reduce the risk of a compromised conversational channel.
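One way to keep an approval tied to the concrete change a reviewer saw is to bind it to a digest of the proposed action. In this sketch the category names, the `Approval` type, and the rule that any category outside a small low-risk set needs approval are assumptions chosen for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional

# Assumed low-risk categories. Anything else, including an unknown or
# missing category, is treated as needing approval.
LOW_RISK = frozenset({"read", "reversible_write"})


def action_digest(action: dict) -> str:
    """Hash a canonical JSON form of the exact proposed action."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    approver: str
    digest: str  # digest of the action the approver actually reviewed


def may_execute(action: dict, approval: Optional[Approval]) -> bool:
    """Low-risk actions run directly; others need an approval for this exact action."""
    if action.get("category") in LOW_RISK:
        return True
    return approval is not None and approval.digest == action_digest(action)
```

If the agent changes any field after the reviewer approves, for example the payment amount, the digest no longer matches and the action is blocked until it is approved again.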

Evidence and incident response

Evidence supports detection and review but can itself contain sensitive data. Store minimized records, access decisions, artifact hashes, and protected references. Incident response needs kill switches, credential revocation, task cancellation, connector disablement, evidence preservation, notification, and correction of poisoned memory.
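Evidence tampering is one of the listed threats, and a hash-chained log is a common way to make tampering detectable. The sketch below stores minimized events that carry an artifact hash and a reference rather than the payload. The record layout and function names are assumptions, and the chain only helps if the latest hash is also anchored somewhere the writer cannot modify.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev_hash: str, event: dict) -> str:
    body = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_record(log: list, event: dict) -> dict:
    """Append an event whose hash covers both its content and the prior record."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    record = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(record)
    return record


def verify(log: list) -> bool:
    """Recompute the chain; any edited, removed, or reordered record breaks it."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


# Minimized events: decision, artifact hash, and a protected reference only.
log = []
artifact_hash = hashlib.sha256(b"example artifact bytes").hexdigest()
append_record(log, {"decision": "allow", "tool": "search_docs", "artifact_sha256": artifact_hash})
append_record(log, {"decision": "deny", "tool": "send_email", "ref": "evidence/123"})
print(verify(log))  # True
```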

Governance ownership

Example ownership matrix
Concern	Accountable owner
Legitimate purpose and user experience	Product/business owner
Runtime architecture and service objectives	Platform engineering
Identity, secrets, isolation, incident response	Security
Data classification, retention, and rights	Data/privacy governance
Tool side effects and compensation	Tool/domain owner
Model and evaluation suitability	ML/application owner
Approval authority	Named operational or business role

Production versus research

Least privilege, schema validation, sandboxing, egress control, human approval, rate limits, checkpoints, and audit logging are established production practices. Reliable prompt-injection detection, model-based runtime monitors, formal guarantees for stochastic agents, autonomous self-repair, and broad cross-agent trust remain areas of active research. Deploy research controls only as defense in depth, never as the sole safety boundary.
