AI runtime security assumes that model output, retrieved content, tool results, and other agents can be incorrect or adversarial. Controls therefore sit around execution rather than relying on the model to protect itself.
Key takeaways
- Treat all model-generated actions as proposals requiring typed validation and authorization.
- Use identity, least privilege, isolation, egress control, idempotency, approval, and evidence as independent controls.
- Separate established production controls from experimental defenses and model-based monitors.
[ar_threat_matrix]
Runtime threat model
Threats include direct and indirect prompt injection, overprivileged tools, credential leakage, data exfiltration, runaway loops, resource exhaustion, poisoned memory, silent side effects, compromised connectors, cross-agent trust failure, supply-chain compromise, and evidence tampering. Threat modeling identifies assets, trust boundaries, actor capabilities, data flow, and safe failure.
Prompt injection path
An indirect injection can enter through a webpage, email, document, database field, or tool response. It becomes dangerous when untrusted content shares a context with privileged instructions and the same agent can invoke consequential tools. The runtime reduces blast radius by labeling provenance, separating data from authority, limiting tools, validating calls, and requiring approval. Content filters alone cannot establish authorization. [ar_cite id=”owasp-prompt” label=”OWASP prompt injection guidance”]
Identity and dynamic least privilege
Resolve human, workload, service, and delegated identity. Bind every tool call to actor, tenant, task, resource, action, and expiry. Use vault references or token exchange so credentials are created just in time and never placed in model context. Denied actions do not trigger automatic privilege expansion.
Sandbox and egress
Code execution and untrusted transformations run in isolated processes, containers, microVMs, or other bounded environments appropriate to risk. Use read-only base images, ephemeral filesystems, resource quotas, syscall/process restrictions, and default-deny network egress. Isolation does not replace tool authorization or data minimization.
Tool and connector controls
- Typed input and output schema
- Permission and side-effect classification
- Resource allowlist and data-classification rules
- Timeout, rate, concurrency, and budget limits
- Idempotency and compensation
- Connector provenance, version, and health
- Result validation and unexpected-side-effect detection
Memory security
Memory writes are untrusted until validated. Record source, scope, confidence, consent, retention, and correction path. Prevent cross-tenant retrieval, privilege-bearing memories, secret retention, and automatic promotion of tool output. Memory deletion should remove derived indexes and references according to policy.
Human oversight
Use approval for irreversible, high-impact, financial, external communication, administrative, or ambiguous actions. Present concrete proposed changes and evidence rather than a vague “approve agent” prompt. Out-of-band approval can reduce the risk of a compromised conversational channel.
Evidence and incident response
Evidence supports detection and review but can itself contain sensitive data. Store minimized records, access decisions, artifact hashes, and protected references. Incident response needs kill switches, credential revocation, task cancellation, connector disablement, evidence preservation, notification, and correction of poisoned memory.
Governance ownership
| Concern | Accountable owner |
|---|---|
| Legitimate purpose and user experience | Product/business owner |
| Runtime architecture and service objectives | Platform engineering |
| Identity, secrets, isolation, incident response | Security |
| Data classification, retention, and rights | Data/privacy governance |
| Tool side effects and compensation | Tool/domain owner |
| Model and evaluation suitability | ML/application owner |
| Approval authority | Named operational or business role |
Production versus research
Least privilege, schema validation, sandboxing, egress control, human approval, rate limits, checkpoints, and audit logging are established production practices. Reliable prompt-injection detection, model-based runtime monitors, formal guarantees for stochastic agents, autonomous self-repair, and broad cross-agent trust remain active research. Deploy research controls only as defense in depth, not as a sole safety boundary.
