OWASP / Agentic Top 10
A risk framework derived from the OWASP Top 10 for Agentic Applications 2026 (ASI Top 10), published by the OWASP GenAI Security Project – Agentic Security Initiative. It identifies the ten highest-impact security risks specific to AI agent systems that plan, decide, and act autonomously across multiple steps and systems, together with associated controls to reduce or eliminate those risks.
Type: Industry
Domain: Cybersecurity, Agentic
Coverage: Accountability & Governance, Safety & Reputational Harm
Tags: GenAI
Content: 10 Risks, 41 Controls
Version: 2026
Framework Definition
Risks and controls associated with the framework
Assessment Layer
Concrete evaluations linked to controls that assess pass or fail
No evaluation mapping defined yet.
RISK
Agent Goal Hijack
Risk that adversaries manipulate an AI agent's objectives, task selection, or decision pathways — through prompt injection, deceptive tool outputs, malicious artefacts, forged agent-to-agent messages, or poisoned external data — causing the agent to redirect its autonomous, multi-step behaviour toward unintended or harmful outcomes, resulting in data exfiltration, financial loss, operational disruption, or reputational harm.
CONTROL
Treat All Natural-Language Inputs as Untrusted
Ensure that all natural-language inputs — including user-provided text, uploaded documents, retrieved content, emails, calendar entries, and peer-agent messages — are routed through input validation and prompt-injection safeguards before they can influence goal selection, planning, or tool calls.
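The routing described above can be sketched as a screening gate in front of the planner. This is a minimal illustration, not a complete defence: the pattern list, function name, and verdict structure are assumptions, and real deployments layer model-based classifiers on top of rule-based screens.

```python
import re

# Illustrative patterns only (assumed for this sketch): production systems
# combine rule-based screens with model-based injection classifiers and
# treat a match as "quarantine for review", not as proof of attack.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now", re.I),
    re.compile(r"system prompt", re.I),
]

def screen_untrusted_input(text: str) -> dict:
    """Screen any natural-language input (user text, retrieved documents,
    emails, peer-agent messages) before it can reach goal selection,
    planning, or tool calls."""
    hits = [p.pattern for p in INJECTION_PATTERNS if p.search(text)]
    return {"allowed": not hits, "matched": hits}
```

The same gate is applied uniformly to every text channel, since the control treats uploaded documents and peer-agent messages as no more trustworthy than direct user input.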
CONTROL
Define, Lock, and Version-Control Agent System Prompts
Ensure that agent system prompts are explicitly defined with locked goal priorities and permitted actions, placed under configuration management, and that any changes to goals or reward definitions require human approval.
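One way to enforce the "locked and version-controlled" requirement at startup is to pin the approved prompt's digest and refuse to run on any mismatch. A minimal sketch, assuming the approved copy and its digest live in configuration management with human-gated changes; the prompt text here is hypothetical.

```python
import hashlib

# Hypothetical example: the approved prompt and its pinned digest would
# normally come from configuration management, where any change to goals
# or permitted actions requires human approval before the pin is updated.
APPROVED_PROMPT = "You are a billing assistant. Goals: answer invoice queries only."
APPROVED_SHA256 = hashlib.sha256(APPROVED_PROMPT.encode()).hexdigest()

def load_system_prompt(candidate: str) -> str:
    """Refuse to start the agent if the prompt drifts from the pinned,
    version-controlled copy."""
    digest = hashlib.sha256(candidate.encode()).hexdigest()
    if digest != APPROVED_SHA256:
        raise ValueError("system prompt does not match approved, version-controlled copy")
    return candidate
```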
CONTROL
Validate User and Agent Intent Before Executing Goal-Changing Actions
Ensure that both user intent and agent intent are validated at runtime before executing goal-changing or high-impact actions, requiring confirmation via human approval, policy engine, or platform guardrails whenever the agent proposes actions that deviate from the original task or scope.
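A runtime shape for this control is a fail-closed authorisation check: in-scope actions proceed, while anything that deviates from the original task, including all high-impact actions, requires confirmation. The scope and action names below are illustrative assumptions.

```python
# Illustrative task scope and high-impact set; in practice these would be
# derived from the originally approved task and a policy engine's rules.
APPROVED_SCOPE = {"read_invoice", "summarise_invoice"}
HIGH_IMPACT = {"send_payment", "delete_record"}

def authorize_action(action: str, human_approved: bool = False) -> bool:
    """Allow in-scope, low-impact actions; any deviation from the original
    scope — and every high-impact action — fails closed unless a human or
    policy engine has confirmed it."""
    if action in APPROVED_SCOPE and action not in HIGH_IMPACT:
        return True
    return human_approved
```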
CONTROL
Establish Behavioral Baselines and Monitor for Goal Drift
Ensure that comprehensive logging and continuous monitoring of agent activity is maintained, establishing a behavioral baseline that includes goal state, tool-use patterns, and invariant properties, with alerting on unexpected goal changes, anomalous tool sequences, or deviations from the established baseline.
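The baseline comparison can be reduced to its simplest form: flag any tool use not seen in the established baseline. A real baseline would also cover goal state, call sequencing, and rates, as the control states; the tool names here are assumptions.

```python
# Minimal drift check against a baseline of expected tools. Real baselines
# additionally track goal state, invariant properties, and tool-sequence
# statistics, alerting on deviations rather than just novel names.
BASELINE_TOOLS = {"search_docs", "read_invoice", "summarise"}

def detect_drift(observed_calls: list[str]) -> list[str]:
    """Return the tool calls that deviate from the established baseline."""
    return [t for t in observed_calls if t not in BASELINE_TOOLS]
```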
RISK
Tool Misuse and Exploitation
Risk that AI agents misuse legitimate tools — due to prompt injection, misalignment, unsafe delegation, or ambiguous instructions — leading to data exfiltration, tool output manipulation, workflow hijacking, destructive actions, or financial loss, even when the agent operates within its authorized privilege boundaries.
CONTROL
Define and Enforce Per-Tool Least-Privilege Profiles
Ensure that each tool available to an LLM agent has a defined least-privilege profile specifying permitted scopes, maximum invocation rate, and egress allowlists, expressed as IAM or authorization policy configurations rather than ad-hoc conventions.
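A per-tool profile of the kind described can be expressed declaratively; the control calls for IAM or authorization policy rather than in-process checks, so the class below is only a sketch of the data shape and enforcement semantics, with illustrative scopes and hosts.

```python
from dataclasses import dataclass

@dataclass
class ToolProfile:
    """Declarative least-privilege profile for one tool. In production
    this would be compiled into IAM/authorization policy, not enforced
    ad hoc in application code."""
    scopes: set
    max_calls: int
    egress_allowlist: set
    calls: int = 0

    def check(self, scope: str, host: str) -> bool:
        # Deny out-of-scope actions and non-allowlisted egress outright.
        if scope not in self.scopes or host not in self.egress_allowlist:
            return False
        # Enforce the maximum invocation rate (simplified to a hard cap).
        if self.calls >= self.max_calls:
            return False
        self.calls += 1
        return True

# Hypothetical tool: read-only web fetch limited to one approved host.
web_fetch = ToolProfile(scopes={"read"}, max_calls=2,
                        egress_allowlist={"docs.example.com"})
```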
CONTROL
Require Action-Level Authentication and Human Approval for Destructive Operations
Ensure that explicit authentication is required for each tool invocation and that human confirmation is mandated for high-impact or destructive actions (such as delete, transfer, or publish), with a pre-execution plan or dry-run preview presented before approval.
CONTROL
Run Tool Execution in Sandboxes with Egress Controls
Ensure that tool and code execution occurs in isolated sandboxes with enforced outbound network allowlists, denying all non-approved network destinations to contain the impact of misuse or injection.
CONTROL
Enforce Semantic and Identity Validation of Tool Calls
Ensure that fully qualified tool names and version pins are enforced to prevent tool alias collisions and typo-squatting, and that the intended semantics of tool calls are validated rather than relying on syntax alone, with ambiguous resolutions failing closed.
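The identity half of this control amounts to resolving tool calls only against fully qualified, version-pinned names and failing closed on anything else. A minimal sketch with an assumed registry; the tool names and versions are illustrative.

```python
# Illustrative registry keyed by fully qualified, version-pinned names.
# Semantic validation (checking the *intent* of the call, per the control)
# would sit on top of this identity check and is not shown here.
TOOL_REGISTRY = {
    "acme.files.read@1.4.2": "read_handler",
    "acme.files.write@1.4.2": "write_handler",
}

def resolve_tool(requested: str) -> str:
    """Reject aliases, typo-squats, and unpinned names: exact-match only,
    failing closed on any ambiguous or unknown reference."""
    if requested not in TOOL_REGISTRY:
        raise LookupError(f"unknown or unpinned tool: {requested!r}")
    return TOOL_REGISTRY[requested]
```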
CONTROL
Maintain Immutable Logs of Tool Invocations and Monitor for Anomalous Chaining
Ensure that immutable logs of all tool invocations and parameter changes are maintained, with continuous monitoring for anomalous execution rates, unusual tool-chaining patterns, and policy violations.
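One standard way to make invocation logs tamper-evident is hash chaining: each entry's hash covers the previous entry's hash, so an in-place edit breaks verification from that point on. This is a sketch of the mechanism only; immutability in practice also requires shipping entries to append-only external storage. Record contents are illustrative.

```python
import hashlib
import json

class ChainedLog:
    """Hash-chained log of tool invocations: editing any stored record
    invalidates every subsequent hash, making tampering detectable."""

    def __init__(self):
        self.entries = []
        self._last = "0" * 64  # genesis value

    def append(self, record: dict) -> None:
        payload = json.dumps(record, sort_keys=True)
        self._last = hashlib.sha256((self._last + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": self._last})

    def verify(self) -> bool:
        prev = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            prev = hashlib.sha256((prev + payload).encode()).hexdigest()
            if prev != entry["hash"]:
                return False
        return True

log = ChainedLog()
log.append({"tool": "acme.files.read", "status": "ok"})
log.append({"tool": "acme.files.write", "status": "denied"})
```

Monitoring for anomalous chaining would then run over the verified entries, e.g. alerting on unusual tool sequences or execution rates.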
RISK
Identity and Privilege Abuse
Risk that the architectural mismatch between user-centric identity systems and agentic design is exploited — through unscoped privilege inheritance, memory-based credential retention, cross-agent trust exploitation, TOCTOU authorization drift, or synthetic identity injection — enabling escalation of access, hijacking of privileges, or execution of unauthorised actions across interconnected systems.
CONTROL
Issue Task-Scoped, Time-Bound Credentials per Agent
Ensure that each agent is issued short-lived, narrowly scoped credentials (e.g., mTLS certificates or scoped tokens) per task, with permission boundaries that limit blast radius and block delegation abuse, unscoped inheritance, and orphaned privileges.
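The short-lived, task-scoped credential pattern can be sketched with a signed token carrying scopes and an expiry. A real system would use an identity provider issuing mTLS certificates or OAuth scoped tokens; the HMAC key, claim layout, and scope names here are assumptions for illustration.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumed demo key: in production, signing is delegated to an identity
# provider and keys are rotated, never embedded in code.
SIGNING_KEY = b"demo-key-rotate-me"

def issue_credential(agent_id: str, scopes: list, ttl_s: int = 300) -> str:
    """Issue a short-lived token scoped to one task's permissions."""
    claims = {"agent": agent_id, "scopes": scopes, "exp": time.time() + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
    sig = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def verify_credential(token: str, required_scope: str) -> bool:
    """Reject forged, expired, or out-of-scope credentials."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SIGNING_KEY, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(base64.urlsafe_b64decode(body))
    return time.time() < claims["exp"] and required_scope in claims["scopes"]
```

Because tokens expire quickly and carry only the scopes of the current task, orphaned or inherited privileges die with the task rather than accumulating.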
CONTROL
Isolate Agent Identities and Wipe Context Between Sessions
Ensure that agents run in per-session sandboxes with separated permissions and memory, with state wiped between tasks and users to prevent memory-based credential escalation and cross-repository data exfiltration.
CONTROL
Re-Verify Permissions at Each Privileged Step via a Centralised Policy Engine
Ensure that each privileged step in an agent workflow is re-verified by a centralised policy engine that evaluates current permissions and external context at decision time, preventing TOCTOU authorization drift, cross-agent trust exploitation, and reflection-loop privilege elevation.
CONTROL
Detect and Alert on Delegated and Transitive Permission Changes
Ensure that monitoring is in place to detect when an agent gains new permissions indirectly through delegation chains, flagging cases where a low-privilege agent inherits or is handed higher-privilege scopes during multi-agent workflows, and alerting on abnormal cross-agent privilege elevation.
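The detection invariant here is simple to state: along a delegation chain, a delegate's effective scopes must be a subset of its delegator's, so privilege can only narrow. A sketch of that check over an assumed chain representation; agent IDs and scope names are illustrative.

```python
def delegation_violations(chain: list) -> list:
    """Flag scopes gained along a delegation chain.

    chain: [(agent_id, scopes), ...] ordered from root delegator to leaf.
    Returns [(agent_id, gained_scopes), ...] for every agent that holds
    scopes its delegator did not — the signature of transitive escalation.
    """
    alerts = []
    for (parent, pscopes), (child, cscopes) in zip(chain, chain[1:]):
        gained = set(cscopes) - set(pscopes)
        if gained:
            alerts.append((child, sorted(gained)))
    return alerts
```

Any non-empty result would feed the alerting path the control describes, flagging abnormal cross-agent privilege elevation for review.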
RISK
Agentic Supply Chain Vulnerabilities
Risk that agents, tools, and artefacts sourced from third parties — including models, plug-ins, datasets, other agents, MCP servers, agent registries, agentic communication interfaces (MCP, A2A), or update channels — are malicious, compromised, or tampered with, introducing unsafe code, hidden instructions, or deceptive behaviours into the agent's execution chain, which can cascade vulnerabilities across multi-agent systems at runtime.
CONTROL
Sign, Attest, and Maintain Inventory of All Agentic Components
Ensure that manifests, prompts, and tool definitions are signed and attested, that SBOMs and AIBOMs with periodic attestations are operationalised for all AI components, and that only curated registries are used with untrusted sources blocked.
CONTROL
Allowlist, Pin, and Scan Agentic Dependencies for Typo-Squatting
Ensure that all agentic dependencies (packages, tool adapters, prompt templates) are allowlisted, pinned by content hash and commit ID, and scanned for typo-squatting across registries (PyPI, npm, LangChain, LlamaIndex), with automatic rejection of unsigned or unverified components.
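The typo-squatting scan can be approximated with a similarity check against the allowlist: a name that is close to, but not exactly, an approved dependency is suspicious. A heuristic sketch using the standard library; the allowlist contents and cutoff are assumptions, and real pipelines would also verify hashes and signatures before install.

```python
import difflib

# Illustrative allowlist of pinned, approved dependencies.
ALLOWLIST = {"langchain", "llama-index", "requests"}

def typosquat_suspects(candidate: str, cutoff: float = 0.85) -> list:
    """Return allowlisted names the candidate closely resembles.

    An exact allowlist match is clean; a near-miss (e.g. one transposed
    or dropped letter) is a typo-squat suspect and should block install
    pending review. Unrelated names return an empty list and are handled
    by the allowlist check itself."""
    if candidate in ALLOWLIST:
        return []
    return difflib.get_close_matches(candidate, ALLOWLIST, n=3, cutoff=cutoff)
```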
CONTROL
Enforce Mutual Authentication and Message Signing for Inter-Agent Interfaces
Ensure that inter-agent communication interfaces (MCP, A2A) enforce mutual authentication and attestation via PKI and mTLS, with no open registration, and that all inter-agent messages are signed and verified.
CONTROL
Implement a Supply Chain Kill Switch for Emergency Revocation
Ensure that emergency revocation mechanisms are in place that can instantly disable specific tools, prompts, or agent connections across all deployments when a compromise is detected, preventing cascading damage across the agent ecosystem.
RISK
Unexpected Code Execution
Risk that agentic systems generating and executing code are exploited — through prompt injection, tool misuse, unsafe serialisation, or malicious package installation — to achieve remote code execution, local system misuse, container escape, or exploitation of internal systems, bypassing traditional security controls because the code is generated in real-time by the agent.
CONTROL
Prohibit eval() and Enforce Safe Interpreter Usage in Production Agents
Ensure that unsafe evaluation functions (eval(), exec()) are prohibited in production agent environments, safe interpreters with taint-tracking are used instead, and static scans are performed before any agent-generated code is executed.
CONTROL
Run Agent Code Execution in Hardened Sandboxes with Minimal Privilege
Ensure that agent-generated code is executed in sandboxed containers running without root privileges, with strict network access limits, filesystem access restricted to dedicated working directories, and known-vulnerable packages blocked.
CONTROL
Separate Code Generation from Execution with Validation Gates
Ensure that code generation and code execution are architecturally separated by validation gates, with human approval required for elevated or production-system execution runs, and that direct agent-to-production system access is prevented.
RISK
Memory and Context Poisoning
Risk that adversaries corrupt or seed an agent's stored context — including conversation history, memory tools, embeddings, and RAG stores — with malicious or misleading data, causing future reasoning, planning, or tool use to become biased, unsafe, or to facilitate exfiltration, with effects that persist across sessions, propagate between cooperating agents, and resist remediation once embedded.
CONTROL
Validate and Scan All Memory Writes for Malicious or Sensitive Content
Ensure that all new memory writes and model outputs are scanned using rule-based and AI-based methods for malicious instructions or sensitive content before being committed to agent memory, preventing injection of persistent adversarial payloads.
CONTROL
Segment Memory by User Session and Domain Context
Ensure that agent memory is logically segmented by user session and domain context to prevent knowledge leakage and sensitive data cross-contamination between users, tenants, or tasks, using per-tenant namespaces in shared vector and memory stores.
CONTROL
Expire Unverified Memory and Maintain Rollback Capability
Ensure that unverified memory entries are expired after a defined period to limit poison persistence, that snapshots and version control are maintained for rollback, and that high-impact memory entries require a provenance score or human-verified tag before surfacing.
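The expiry policy can be sketched as a store where unverified entries carry a TTL and are dropped on read once it lapses, while human-verified entries persist. Durations and the storage shape are illustrative; snapshots, rollback, and provenance scoring from the control are not shown.

```python
import time

class ExpiringMemory:
    """Memory store where unverified entries expire after ttl_s seconds,
    limiting how long a poisoned write can persist; verified entries
    (human-reviewed or provenance-scored) are retained."""

    def __init__(self, ttl_s: float = 3600):
        self.ttl_s = ttl_s
        self.entries = []  # (written_at, verified, text)

    def write(self, text: str, verified: bool = False) -> None:
        self.entries.append((time.time(), verified, text))

    def read(self) -> list:
        now = time.time()
        # Evict expired, unverified entries before surfacing anything.
        self.entries = [e for e in self.entries
                        if e[1] or now - e[0] < self.ttl_s]
        return [text for _, _, text in self.entries]
```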
CONTROL
Prevent Re-Ingestion of Agent-Generated Output into Trusted Memory
Ensure that agent-generated outputs are not automatically re-ingested into trusted memory stores without validation, preventing self-reinforcing contamination and bootstrap poisoning scenarios.
RISK
Insecure Inter-Agent Communication
Risk that inter-agent message exchanges lack proper authentication, integrity, confidentiality, or semantic validation — enabling interception, spoofing, replay, protocol downgrade, or semantic manipulation of agent messages across transport, routing, discovery, and semantic layers — leading to misinformation, privilege confusion, or coordinated manipulation across distributed agentic systems.
CONTROL
Enforce End-to-End Encryption and Mutual Authentication for Agent Channels
Ensure that all inter-agent communications use end-to-end encryption with per-agent credentials and mutual authentication, enforcing PKI certificate pinning, forward secrecy, and regular protocol reviews to prevent interception or spoofing.
CONTROL
Digitally Sign and Semantically Validate Inter-Agent Messages
Ensure that all inter-agent messages are digitally signed with both payload and context hashed, and that natural-language-aware sanitisation and intent-diffing are applied to detect goal manipulation, parameter tampering, and hidden instructions within messages.
CONTROL
Enforce Protocol Pinning and Reject Downgrade Attempts
Ensure that allowed protocol versions (MCP, A2A, gRPC) are defined and enforced, that downgrade attempts and unrecognised schemas are rejected, and that both peers are required to advertise matching capability and version fingerprints.
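Protocol pinning during a handshake reduces to a strict check: both peers must advertise an exact allowed version, and any mismatch or unrecognised value — including an attempt to "negotiate down" — is rejected. The version strings below are illustrative assumptions.

```python
# Illustrative pinned versions; real deployments would pin the exact MCP,
# A2A, or gRPC protocol revisions they have reviewed and approved.
ALLOWED_VERSIONS = {"mcp/2025-06-18", "a2a/1.0"}

def negotiate(local: str, remote: str) -> str:
    """Accept a connection only when both peers advertise the same pinned
    protocol version; reject downgrades and unrecognised schemas."""
    if local not in ALLOWED_VERSIONS or remote not in ALLOWED_VERSIONS:
        raise ConnectionError("unrecognised protocol version")
    if local != remote:
        raise ConnectionError("version mismatch; refusing downgrade")
    return local
```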
CONTROL
Authenticate Discovery and Coordination Messages via Attested Registries
Ensure that all discovery and coordination messages are authenticated using cryptographic identity, and that agent registries or marketplaces provide digital attestation of agent identity, provenance, and descriptor integrity with signed agent cards and continuous verification before accepting coordination traffic.
RISK
Cascading Failures
Risk that a single fault — such as a hallucination, malicious input, corrupted tool, or poisoned memory — propagates and amplifies across autonomous agents, tools, and workflows, bypassing stepwise human checks to cause system-wide harm including widespread service failures, cross-domain data compromise, operational disruption, and consequences that outpace human ability to intervene.
CONTROL
Implement Blast-Radius Guardrails Between Planner and Executor
Ensure that blast-radius guardrails — including quotas, progress caps, and circuit breakers — are implemented between planner and executor components to limit the propagation scope of any single agent fault.
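The circuit-breaker element of this control follows a well-known pattern: after a set number of executor faults, the breaker opens and all further planned actions are refused until a human resets it. A minimal sketch; the fault threshold and reset policy are assumptions.

```python
class CircuitBreaker:
    """Breaker placed between planner and executor: repeated faults open
    the circuit, halting execution and capping the blast radius of a
    single compromised or malfunctioning component."""

    def __init__(self, max_faults: int = 3):
        self.max_faults = max_faults
        self.faults = 0
        self.open = False

    def execute(self, action, *args):
        if self.open:
            raise RuntimeError("circuit open: execution halted pending review")
        try:
            return action(*args)
        except Exception:
            self.faults += 1
            if self.faults >= self.max_faults:
                self.open = True  # stop propagation; require human reset
            raise
```

Quotas and progress caps from the control would sit alongside this, bounding work done per task even when no fault is raised.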
CONTROL
Enforce Independent Policy Engines to Separate Planning from Execution
Ensure that planning and execution are separated via an external policy engine that independently validates planned actions, preventing corrupt or manipulated planning from directly triggering harmful downstream operations.
CONTROL
Implement Human Oversight Gates Before Propagating High-Risk Agent Outputs
Ensure that checkpoints requiring human review or governance agent validation are placed before high-risk agent outputs are propagated downstream to other agents, tools, or systems, with rate limiting applied to detect and throttle fast-spreading commands.
CONTROL
Record Tamper-Evident Logs with Non-Repudiation Across All Agent Actions
Ensure that all inter-agent messages, policy decisions, and execution outcomes are recorded in tamper-evident, time-stamped logs bound to cryptographic agent identities, maintaining lineage metadata for every propagated action to support forensic traceability and rollback validation during cascade events.
RISK
Human-Agent Trust Exploitation
Risk that adversaries or misaligned agent designs exploit the natural language fluency, emotional intelligence, and perceived expertise of AI agents to manipulate human users into disclosing sensitive information, approving harmful actions, or making unsafe decisions — leveraging automation bias, authority bias, and fabricated rationales to bypass human oversight — resulting in data breaches, financial losses, and reputational harm, with the agent's role invisible to forensic investigation.
CONTROL
Require Multi-Step Human Confirmation for Sensitive or Irreversible Actions
Ensure that multi-step approval or human-in-the-loop controls are required before agents access extra-sensitive data or perform irreversible actions, with immutable tamper-proof audit records of all user queries and agent actions maintained for forensic purposes.
CONTROL
Provide Plain-Language Risk Summaries and Enable Reporting of Suspicious Behaviour
Ensure that in user-interactive systems, plain-language risk summaries (not model-generated rationales) are displayed for high-impact actions, and that users have a clear mechanism to flag suspicious or manipulative agent behaviour, triggering automated review or temporary capability lockdown.
CONTROL
Implement Adaptive Trust Calibration and Human-Factors UI Safeguards
Ensure that agent autonomy and required human oversight are continuously adjusted based on contextual risk scoring, and that UI safeguards visually differentiate high-risk recommendations, with personnel trained to recognise manipulation patterns and agent limitations.
CONTROL
Detect Plan Divergence from Approved Workflow Baselines
Ensure that agent action sequences are continuously compared against approved workflow baselines, with alerts triggered when unusual detours, skipped validation steps, or novel tool combinations indicate possible deception, drift, or manipulation.
RISK
Rogue Agents
Risk that AI agents become malicious or compromised and deviate from their intended function or authorised scope — through goal drift, workflow hijacking, reward hacking, or self-replication — acting harmfully, deceptively, or parasitically within multi-agent or human-agent ecosystems, with individually legitimate-appearing actions whose emergent behaviour escapes detection by traditional rule-based controls.
CONTROL
Maintain Comprehensive Immutable Audit Logs of All Agent Actions
Ensure that comprehensive, immutable, and signed audit logs are maintained for all agent actions, tool calls, and inter-agent communication, enabling review for stealth infiltration, unapproved delegation, and coordinated collusion patterns.
CONTROL
Assign Trust Zones and Deploy Restricted Execution Environments
Ensure that agents are assigned to trust zones with strict inter-zone communication rules and deployed in restricted execution environments (e.g., container sandboxes) with API scopes based on least privilege, with suspicious agents quarantined in isolated environments for forensic review.
CONTROL
Deploy Behavioural Detection and Watchdog Agents
Ensure that behavioural detection is deployed — including watchdog agents that validate peer behaviour and outputs — focusing on detecting collusion patterns, coordinated false signals, and anomalous action executions such as excessive or unexpected tool invocations.
CONTROL
Implement Kill Switches and Credential Revocation for Rapid Agent Containment
Ensure that rapid containment mechanisms — including kill switches and credential revocation — are in place to instantly disable rogue agents, with a recovery and reintegration process requiring fresh attestation, dependency verification, and human approval before an agent is returned to production.
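At its core, the kill-switch mechanism is a revocation set consulted before every agent action: once an agent is revoked, all of its requests fail immediately wherever the set is checked. A sketch with an in-process set; a real deployment would use a shared, replicated revocation store so containment takes effect across all nodes at once.

```python
# In-process revocation set for illustration; production systems replicate
# this (e.g. via a shared revocation service) so a kill switch propagates
# to every deployment instantly.
REVOKED = set()

def kill_switch(agent_id: str) -> None:
    """Instantly disable an agent: all subsequent actions are blocked
    until it passes recovery (fresh attestation and human approval)."""
    REVOKED.add(agent_id)

def pre_action_check(agent_id: str) -> None:
    """Consulted before every tool call or message send."""
    if agent_id in REVOKED:
        raise PermissionError(f"agent {agent_id} revoked; action blocked")
```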
CONTROL
Enforce Per-Agent Cryptographic Identity Attestation and Behavioural Integrity Baselines
Ensure that each agent has a cryptographic identity with signed behavioural manifests declaring expected capabilities, tools, and goals, validated by orchestration services before each action, with a continuous behavioural verification layer that monitors for deviations from the declared manifest and ephemeral per-run credentials mediated by orchestrators.