Download Framework
OWASP / LLM Top 10
A risk framework derived from the OWASP Top 10 for Large Language Model (LLM) and Generative AI Applications (2025 edition). It identifies the ten most critical security risks associated with developing, deploying, and operating LLM-based systems, together with associated controls to reduce or eliminate those risks. Controls have been refined against observed vendor implementations to reflect what is practically enforceable.
Type:
Other
Domain:
Cybersecurity
Coverage:
Accountability & Governance
Safety & Reputational Harm
Privacy & Data
Performance & Reliability
Tags:
GenAI
Content:
10 Risks
40 Controls
Version: 2025.1
Framework Definition
Risks and controls associated with the framework
Assessment Layer
Concrete evaluations linked to controls to assess pass or fail
No evaluation mapping defined yet.
RISK
Prompt Injection
Risk that adversarial or inadvertent user prompts alter the LLM's behaviour or outputs in unintended ways — including bypassing safety guidelines, generating harmful content, enabling unauthorised access, or influencing critical decisions — resulting in security breaches, reputational damage, and operational harm.
CONTROL
Constrain Model Behaviour via System Prompt
Ensure that the system prompt provides specific instructions about the model's role, capabilities, and limitations, enforcing strict context adherence and instructing the model to ignore attempts to modify core instructions.
CONTROL
Implement Input Filtering and Prompt Inspection
Ensure that all user-supplied and external content is inspected before being passed to the model, applying semantic filters, string-checking rules, and policy-based classifiers to detect and block prompt injection attempts, jailbreak patterns, and policy violations.
CONTROL
Implement Output Filtering and Response Validation
Ensure that model-generated responses are validated and filtered before being returned to users or passed to downstream systems, checking for sensitive data disclosure, policy violations, toxic content, and hallucinations, independently of the model's own safety mechanisms.
CONTROL
Enforce Least Privilege Access for LLM Extensions
Ensure that the LLM application and its extensions are granted only the minimum permissions necessary for intended operations; extensible functionality should be handled in code rather than delegated to the model.
CONTROL
Require Human Approval for High-Risk Actions
Ensure that human-in-the-loop controls are implemented for privileged or high-impact operations to prevent unauthorised autonomous actions by the LLM.
CONTROL
Conduct Adversarial Testing and Attack Simulations
Ensure that regular penetration testing and adversarial simulations are performed against LLM systems, treating the model as an untrusted user to validate the effectiveness of trust boundaries and access controls.
CONTROL
Test Guardrails for Bypass Resistance
Ensure that deployed guardrails, safety filters, and content moderation systems are themselves subjected to regular adversarial bypass testing — including character injection, encoding obfuscation, multilingual attacks, and multi-turn manipulation — to verify that protections cannot be circumvented. Having guardrails in place is not sufficient if those guardrails can be bypassed.
RISK
Sensitive Information Disclosure
Risk that LLM applications inadvertently expose personal identifiable information (PII), financial records, health data, confidential business information, security credentials, or proprietary algorithms through model outputs, resulting in unauthorised data access, privacy violations, and intellectual property breaches.
CONTROL
Implement Data Sanitisation Before Training
Ensure that data sanitisation techniques, including scrubbing and masking of sensitive content, are applied before any user data enters the training model, preventing future disclosure through model outputs.
CONTROL
Enforce Strict Access Controls on Data Sources
Ensure that access to sensitive data is limited based on the principle of least privilege, and that model access to external data sources is restricted to prevent unintended data leakage at runtime.
CONTROL
Maintain Clear Data Usage and Retention Policies
Ensure that transparent policies on data retention, usage, and deletion are established and communicated to users, including the option to opt out of having their data used in model training.
CONTROL
Conceal and Protect System Configuration Details
Ensure that system prompts and internal configuration details are not exposed to end users, and that secure system configuration best practices are followed to prevent sensitive information leakage through error messages or settings.
RISK
LLM Supply Chain Compromise
Risk that vulnerabilities in third-party components, pre-trained models, datasets, fine-tuning adapters, MCP servers, tool plugins, or deployment platforms — including tampering, poisoning, or inadequate provenance — compromise the integrity, security, or legal compliance of LLM applications, resulting in biased or malicious outputs, system failures, and regulatory exposure.
CONTROL
Vet and Continuously Monitor Third-Party Suppliers
Ensure that all data sources, model providers, and software suppliers are rigorously vetted, including review of terms and conditions and privacy policies, and that their security posture is regularly re-assessed.
CONTROL
Maintain a Software and AI Bill of Materials (SBOM / AI-BOM)
Ensure that an up-to-date inventory of all components, models, datasets, and dependencies is maintained using a Software Bill of Materials (SBOM) or AI-BOM, enabling rapid detection of new vulnerabilities and tampered packages.
CONTROL
Verify Model Integrity and Provenance
Ensure that models sourced from external repositories are verified through third-party integrity checks, cryptographic signing, and file hashes, and that code signing is applied to externally supplied code.
CONTROL
Evaluate Third-Party Models Prior to Deployment
Ensure that comprehensive safety and security evaluation — including bias assessment, backdoor scanning, and adversarial probing — is performed on any third-party or open-access model before it is approved for deployment, and that results are documented as part of the model onboarding process.
CONTROL
Continuously Red Team Deployed AI Applications
Ensure that AI applications in production are subject to ongoing, scheduled adversarial red-teaming exercises to detect newly discovered vulnerabilities, model behaviour drift, and emerging attack techniques, with findings tracked and remediated through a formal process.
CONTROL
Secure and Monitor MCP Servers and Tool Plugin Integrations
Ensure that all Model Context Protocol (MCP) servers, tool plugins, and external agent integrations are inventoried, verified for integrity prior to use, scanned for known vulnerabilities, and monitored at runtime for suspicious behaviour including tool poisoning, tool shadowing, and unauthorised capability exposure. Only approved integrations should be permitted to connect to deployed AI agents.
RISK
Data and Model Poisoning
Risk that malicious or inadvertent manipulation of pre-training, fine-tuning, or embedding data introduces vulnerabilities, backdoors, or biases into LLM models, compromising model security, performance, and ethical behaviour, and leading to harmful outputs, degraded capabilities, or exploitation of downstream systems.
CONTROL
Track Data Lineage and Verify Data Legitimacy
Ensure that data origins and transformations are tracked using provenance tools (e.g., OWASP CycloneDX or ML-BOM), and that data legitimacy is verified at all stages of model development.
CONTROL
Implement Anomaly Detection and Sandboxing for Training Data
Ensure that strict sandboxing limits model exposure to unverified data sources, and that anomaly detection techniques are applied to filter out adversarial or poisoned data during training pipelines.
CONTROL
Implement Training Pipeline Integrity Controls
Ensure that model training and fine-tuning pipelines implement integrity controls across three dimensions: (i) data provenance verification confirming that all training data is from authorised and unmodified sources; (ii) anomaly detection on training metrics — including loss curves, gradient norms, and output distributions — to detect signs of poisoning; and (iii) version-controlled dataset management enabling detection and rollback of unauthorised modifications.
RISK
Improper Output Handling
Risk that LLM-generated outputs are passed to downstream components or systems without sufficient validation or sanitisation, enabling cross-site scripting (XSS), cross-site request forgery (CSRF), server-side request forgery (SSRF), SQL injection, privilege escalation, or remote code execution, resulting in security breaches and system compromise.
CONTROL
Apply Zero-Trust Validation on LLM Outputs Passed to Downstream Systems
Ensure that LLM outputs are treated as untrusted input when passed to backend functions, APIs, or other system components, applying proper validation and sanitisation — including context-appropriate encoding (e.g. HTML, SQL, shell) — following OWASP ASVS guidelines. LLM output should never be directly executed or forwarded to sensitive sinks without this treatment.
CONTROL
Implement Robust Logging and Monitoring of LLM Outputs
Ensure that robust logging and monitoring systems are deployed to detect unusual patterns in LLM outputs that may indicate exploitation attempts.
RISK
Excessive Agency
Risk that LLM-based systems are granted excessive functionality, permissions, or autonomy beyond what is necessary for intended operations, enabling damaging or unauthorised actions — including data exfiltration, system manipulation, and privilege escalation — in response to hallucinated, manipulated, or ambiguous model outputs.
CONTROL
Minimise LLM Extension Scope and Functionality
Ensure that LLM agents are granted access only to the extensions and functions strictly necessary for their intended operation, and that open-ended or unnecessary extensions are removed or not made available.
CONTROL
Apply Least Privilege to LLM Extension Permissions
Ensure that permissions granted to LLM extensions on downstream systems are limited to the minimum required for intended operations, enforced through appropriate access controls (e.g., database-level permissions, OAuth scopes).
CONTROL
Require Human Approval for High-Impact Autonomous Actions
Ensure that human-in-the-loop controls are in place requiring explicit human approval before the LLM or its extensions execute high-impact or irreversible actions.
CONTROL
Implement Authorisation in Downstream Systems
Ensure that authorisation checks for actions are enforced in downstream systems rather than delegated to the LLM, applying the complete mediation principle to all requests made via extensions.
CONTROL
Document and Bound the Blast Radius of AI Agents
Ensure that prior to deployment, the potential blast radius of each AI agent is documented and reviewed — enumerating all connected tools, identities, data sources, external APIs, and downstream systems the agent can reach. Design boundaries should limit the maximum scope of impact from a compromised or misbehaving agent, and the blast-radius assessment should be repeated whenever agent capabilities or integrations change.
RISK
System Prompt Leakage
Risk that system prompts or instructions used to configure LLM behaviour inadvertently expose sensitive information — such as API keys, database credentials, internal business rules, or role structures — enabling attackers to exploit application weaknesses, bypass controls, or escalate privileges.
CONTROL
Exclude Sensitive Data from System Prompts
Ensure that sensitive information such as API keys, authentication credentials, database names, user roles, and permission structures are not embedded in system prompts; instead externalise such data to systems not directly accessible by the model.
CONTROL
Enforce Security Controls Outside the LLM
Ensure that critical controls such as privilege separation, authorisation checks, and content filtering are enforced by external deterministic systems rather than delegated to the LLM via system prompts.
CONTROL
Implement Independent Guardrails to Inspect LLM Output
Ensure that an independent guardrail system inspects LLM outputs to verify compliance with expected behaviour, rather than relying solely on system prompt instructions to control model conduct.
RISK
Vector and Embedding Weaknesses
Risk that weaknesses in the generation, storage, or retrieval of vectors and embeddings in Retrieval-Augmented Generation (RAG) systems are exploited — through retrieval poisoning, cross-tenant context leakage, embedding inversion, or knowledge-conflict injection — to manipulate model outputs, expose sensitive source information, or produce harmful responses, resulting in privacy violations, compliance failures, and compromised model integrity.
CONTROL
Enforce Access Partitioning and Retrieval Poisoning Controls in Vector Stores
Ensure that vector and embedding stores enforce fine-grained, identity-aware access controls with strict logical partitioning between user classes and tenant groups to prevent cross-context data leakage. Additionally, implement retrieval-layer defences against poisoning attacks — including validation of ingested content, detection of hidden instructions or adversarial payloads within documents, and monitoring for anomalous retrieval patterns that may indicate an active injection or exfiltration attempt.
CONTROL
Validate Knowledge Base Sources Against Poisoning and Embedding Inversion
Ensure that all content ingested into the RAG knowledge base is validated against trusted and verified sources, screened for hidden adversarial instructions, and subject to regular integrity audits. Apply controls to mitigate embedding inversion risks — whereby attackers reconstruct sensitive source text from exposed embeddings — and implement knowledge-conflict detection to identify cases where retrieved content contradicts established ground truth.
CONTROL
Maintain Immutable Retrieval Activity Logs
Ensure that detailed, immutable logs of all retrieval activities from vector stores are maintained to enable timely detection of and response to suspicious or anomalous access patterns.
RISK
Misinformation and Hallucination
Risk that LLMs produce false, fabricated, or misleading outputs — including hallucinated facts, unsupported claims, misrepresented expertise, or insecure code suggestions — that appear credible, leading to harmful user decisions, operational failures, legal liability, and reputational damage.
CONTROL
Deploy Retrieval-Augmented Generation to Ground Model Outputs
Ensure that Retrieval-Augmented Generation (RAG) is used where appropriate to enhance the reliability of model outputs by grounding responses in verified external knowledge sources, reducing the incidence of hallucinations.
CONTROL
Implement Human Oversight and Cross-Verification Processes
Ensure that human oversight and fact-checking processes are in place for critical or sensitive LLM-generated content, and that users are encouraged and equipped to verify AI outputs against trusted sources.
CONTROL
Implement Automatic Validation Mechanisms for High-Stakes Outputs
Ensure that automated tools and processes are deployed to validate key LLM outputs, particularly in high-stakes environments such as healthcare, legal, and financial services.
RISK
Unbounded Consumption
Risk that LLM applications permit excessive or uncontrolled inference operations, enabling adversaries to cause denial-of-service (DoS), unsustainable financial losses ("Denial of Wallet"), degradation of service quality, or intellectual property theft through model extraction — exploiting the high computational demands of LLMs, particularly in cloud environments.
CONTROL
Apply Rate Limiting and User Quotas at the Infrastructure Layer
Ensure that rate limiting and per-user or per-application quotas are enforced to restrict the volume of inference operations within a given time period. Primary ownership of this control rests with infrastructure and platform teams (API gateways, cloud provider quotas, service meshes); AI application teams should verify that such controls are configured and actively monitored for their AI workloads.
CONTROL
Validate and Bound Input Size at the Infrastructure Layer
Ensure that input size limits are enforced to reject requests exceeding reasonable token or byte thresholds, preventing resource exhaustion from variable-length input flooding or context window overflow attacks. This control is typically implemented at the API gateway or LLM proxy layer and owned by infrastructure teams; AI application teams should confirm limits are in place and calibrated for their workload.
CONTROL
Monitor AI Resource Usage for Anomalous Consumption Patterns
Ensure that resource usage — including token consumption, request rates, and cost metrics — is continuously monitored with anomaly detection to identify and respond to unusual patterns that may indicate DoS attempts, Denial of Wallet attacks, or model extraction activity.
CONTROL
Restrict LLM Access to Network Resources and Internal Services
Ensure that the LLM application's access to network resources, internal services, and APIs is restricted through sandboxing and network segmentation to mitigate side-channel attacks and limit the scope of potential resource exploitation or model extraction.