owasp_llm_top10_2025

OWASP / LLM Top 10

A risk framework derived from the OWASP Top 10 for Large Language Model (LLM) and Generative AI Applications (2025 edition). It identifies the ten most critical security risks associated with developing, deploying, and operating LLM-based systems, together with associated controls to reduce or eliminate those risks. Controls have been refined against observed vendor implementations to reflect what is practically enforceable.

Type:

Other

Domain:

Cybersecurity

Coverage:

Accountability & Governance

Safety & Reputational Harm

Privacy & Data

Performance & Reliability

Tags:

GenAI

Content:

10 Risks

40 Controls

Version: 2025.1

Framework Definition

Risks and controls associated with the framework

Assessment Layer

Concrete evaluations linked to controls to assess pass or fail

No evaluation mapping defined yet.

RISK

Prompt Injection

Risk that adversarial or inadvertent user prompts alter the LLM's behaviour or outputs in unintended ways — including bypassing safety guidelines, generating harmful content, enabling unauthorised access, or influencing critical decisions — resulting in security breaches, reputational damage, and operational harm.

LLM01:2025

7 Controls

CONTROL

Constrain Model Behaviour via System Prompt

Ensure that the system prompt provides specific instructions about the model's role, capabilities, and limitations, enforcing strict context adherence and instructing the model to ignore attempts to modify core instructions.

C.1.1

CONTROL

Implement Input Filtering and Prompt Inspection

Ensure that all user-supplied and external content is inspected before being passed to the model, applying semantic filters, string-checking rules, and policy-based classifiers to detect and block prompt injection attempts, jailbreak patterns, and policy violations.

C.1.2a

CONTROL

Implement Output Filtering and Response Validation

Ensure that model-generated responses are validated and filtered before being returned to users or passed to downstream systems, checking for sensitive data disclosure, policy violations, toxic content, and hallucinations, independently of the model's own safety mechanisms.

C.1.2b

CONTROL

Enforce Least Privilege Access for LLM Extensions

Ensure that the LLM application and its extensions are granted only the minimum permissions necessary for intended operations; extensible functionality should be handled in code rather than delegated to the model.

C.1.3

CONTROL

Require Human Approval for High-Risk Actions

Ensure that human-in-the-loop controls are implemented for privileged or high-impact operations to prevent unauthorised autonomous actions by the LLM.

C.1.4

CONTROL

Conduct Adversarial Testing and Attack Simulations

Ensure that regular penetration testing and adversarial simulations are performed against LLM systems, treating the model as an untrusted user to validate the effectiveness of trust boundaries and access controls.

C.1.5

CONTROL

Test Guardrails for Bypass Resistance

Ensure that deployed guardrails, safety filters, and content moderation systems are themselves subjected to regular adversarial bypass testing — including character injection, encoding obfuscation, multilingual attacks, and multi-turn manipulation — to verify that protections cannot be circumvented. Having guardrails in place is not sufficient if those guardrails can be bypassed.

C.1.6

RISK

Sensitive Information Disclosure

Risk that LLM applications inadvertently expose personal identifiable information (PII), financial records, health data, confidential business information, security credentials, or proprietary algorithms through model outputs, resulting in unauthorised data access, privacy violations, and intellectual property breaches.

LLM02:2025

4 Controls

CONTROL

Implement Data Sanitisation Before Training

Ensure that data sanitisation techniques, including scrubbing and masking of sensitive content, are applied before any user data enters the training model, preventing future disclosure through model outputs.

C.2.1

CONTROL

Enforce Strict Access Controls on Data Sources

Ensure that access to sensitive data is limited based on the principle of least privilege, and that model access to external data sources is restricted to prevent unintended data leakage at runtime.

C.2.2

CONTROL

Maintain Clear Data Usage and Retention Policies

Ensure that transparent policies on data retention, usage, and deletion are established and communicated to users, including the option to opt out of having their data used in model training.

C.2.3

CONTROL

Conceal and Protect System Configuration Details

Ensure that system prompts and internal configuration details are not exposed to end users, and that secure system configuration best practices are followed to prevent sensitive information leakage through error messages or settings.

C.2.4

RISK

LLM Supply Chain Compromise

Risk that vulnerabilities in third-party components, pre-trained models, datasets, fine-tuning adapters, MCP servers, tool plugins, or deployment platforms — including tampering, poisoning, or inadequate provenance — compromise the integrity, security, or legal compliance of LLM applications, resulting in biased or malicious outputs, system failures, and regulatory exposure.

LLM03:2025

6 Controls

CONTROL

Vet and Continuously Monitor Third-Party Suppliers

Ensure that all data sources, model providers, and software suppliers are rigorously vetted, including review of terms and conditions and privacy policies, and that their security posture is regularly re-assessed.

C.3.1

CONTROL

Maintain a Software and AI Bill of Materials (SBOM / AI-BOM)

Ensure that an up-to-date inventory of all components, models, datasets, and dependencies is maintained using a Software Bill of Materials (SBOM) or AI-BOM, enabling rapid detection of new vulnerabilities and tampered packages.

C.3.2

CONTROL

Verify Model Integrity and Provenance

Ensure that models sourced from external repositories are verified through third-party integrity checks, cryptographic signing, and file hashes, and that code signing is applied to externally supplied code.

C.3.3

CONTROL

Evaluate Third-Party Models Prior to Deployment

Ensure that comprehensive safety and security evaluation — including bias assessment, backdoor scanning, and adversarial probing — is performed on any third-party or open-access model before it is approved for deployment, and that results are documented as part of the model onboarding process.

C.3.4a

CONTROL

Continuously Red Team Deployed AI Applications

Ensure that AI applications in production are subject to ongoing, scheduled adversarial red-teaming exercises to detect newly discovered vulnerabilities, model behaviour drift, and emerging attack techniques, with findings tracked and remediated through a formal process.

C.3.4b

CONTROL

Secure and Monitor MCP Servers and Tool Plugin Integrations

Ensure that all Model Context Protocol (MCP) servers, tool plugins, and external agent integrations are inventoried, verified for integrity prior to use, scanned for known vulnerabilities, and monitored at runtime for suspicious behaviour including tool poisoning, tool shadowing, and unauthorised capability exposure. Only approved integrations should be permitted to connect to deployed AI agents.

C.3.5

RISK

Data and Model Poisoning

Risk that malicious or inadvertent manipulation of pre-training, fine-tuning, or embedding data introduces vulnerabilities, backdoors, or biases into LLM models, compromising model security, performance, and ethical behaviour, and leading to harmful outputs, degraded capabilities, or exploitation of downstream systems.

LLM04:2025

3 Controls

CONTROL

Track Data Lineage and Verify Data Legitimacy

Ensure that data origins and transformations are tracked using provenance tools (e.g., OWASP CycloneDX or ML-BOM), and that data legitimacy is verified at all stages of model development.

C.4.1

CONTROL

Implement Anomaly Detection and Sandboxing for Training Data

Ensure that strict sandboxing limits model exposure to unverified data sources, and that anomaly detection techniques are applied to filter out adversarial or poisoned data during training pipelines.

C.4.2

CONTROL

Implement Training Pipeline Integrity Controls

Ensure that model training and fine-tuning pipelines implement integrity controls across three dimensions: (i) data provenance verification confirming that all training data is from authorised and unmodified sources; (ii) anomaly detection on training metrics — including loss curves, gradient norms, and output distributions — to detect signs of poisoning; and (iii) version-controlled dataset management enabling detection and rollback of unauthorised modifications.

C.4.3

RISK

Improper Output Handling

Risk that LLM-generated outputs are passed to downstream components or systems without sufficient validation or sanitisation, enabling cross-site scripting (XSS), cross-site request forgery (CSRF), server-side request forgery (SSRF), SQL injection, privilege escalation, or remote code execution, resulting in security breaches and system compromise.

LLM05:2025

2 Controls

CONTROL

Apply Zero-Trust Validation on LLM Outputs Passed to Downstream Systems

Ensure that LLM outputs are treated as untrusted input when passed to backend functions, APIs, or other system components, applying proper validation and sanitisation — including context-appropriate encoding (e.g. HTML, SQL, shell) — following OWASP ASVS guidelines. LLM output should never be directly executed or forwarded to sensitive sinks without this treatment.

C.5.1

CONTROL

Implement Robust Logging and Monitoring of LLM Outputs

Ensure that robust logging and monitoring systems are deployed to detect unusual patterns in LLM outputs that may indicate exploitation attempts.

C.5.2

RISK

Excessive Agency

Risk that LLM-based systems are granted excessive functionality, permissions, or autonomy beyond what is necessary for intended operations, enabling damaging or unauthorised actions — including data exfiltration, system manipulation, and privilege escalation — in response to hallucinated, manipulated, or ambiguous model outputs.

LLM06:2025

5 Controls

CONTROL

Minimise LLM Extension Scope and Functionality

Ensure that LLM agents are granted access only to the extensions and functions strictly necessary for their intended operation, and that open-ended or unnecessary extensions are removed or not made available.

C.6.1

CONTROL

Apply Least Privilege to LLM Extension Permissions

Ensure that permissions granted to LLM extensions on downstream systems are limited to the minimum required for intended operations, enforced through appropriate access controls (e.g., database-level permissions, OAuth scopes).

C.6.2

CONTROL

Require Human Approval for High-Impact Autonomous Actions

Ensure that human-in-the-loop controls are in place requiring explicit human approval before the LLM or its extensions execute high-impact or irreversible actions.

C.6.3

CONTROL

Implement Authorisation in Downstream Systems

Ensure that authorisation checks for actions are enforced in downstream systems rather than delegated to the LLM, applying the complete mediation principle to all requests made via extensions.

C.6.4

CONTROL

Document and Bound the Blast Radius of AI Agents

Ensure that prior to deployment, the potential blast radius of each AI agent is documented and reviewed — enumerating all connected tools, identities, data sources, external APIs, and downstream systems the agent can reach. Design boundaries should limit the maximum scope of impact from a compromised or misbehaving agent, and the blast-radius assessment should be repeated whenever agent capabilities or integrations change.

C.6.5

RISK

System Prompt Leakage

Risk that system prompts or instructions used to configure LLM behaviour inadvertently expose sensitive information — such as API keys, database credentials, internal business rules, or role structures — enabling attackers to exploit application weaknesses, bypass controls, or escalate privileges.

LLM07:2025

3 Controls

CONTROL

Exclude Sensitive Data from System Prompts

Ensure that sensitive information such as API keys, authentication credentials, database names, user roles, and permission structures are not embedded in system prompts; instead externalise such data to systems not directly accessible by the model.

C.7.1

CONTROL

Enforce Security Controls Outside the LLM

Ensure that critical controls such as privilege separation, authorisation checks, and content filtering are enforced by external deterministic systems rather than delegated to the LLM via system prompts.

C.7.2

CONTROL

Implement Independent Guardrails to Inspect LLM Output

Ensure that an independent guardrail system inspects LLM outputs to verify compliance with expected behaviour, rather than relying solely on system prompt instructions to control model conduct.

C.7.3

RISK

Vector and Embedding Weaknesses

Risk that weaknesses in the generation, storage, or retrieval of vectors and embeddings in Retrieval-Augmented Generation (RAG) systems are exploited — through retrieval poisoning, cross-tenant context leakage, embedding inversion, or knowledge-conflict injection — to manipulate model outputs, expose sensitive source information, or produce harmful responses, resulting in privacy violations, compliance failures, and compromised model integrity.

LLM08:2025

3 Controls

CONTROL

Enforce Access Partitioning and Retrieval Poisoning Controls in Vector Stores

Ensure that vector and embedding stores enforce fine-grained, identity-aware access controls with strict logical partitioning between user classes and tenant groups to prevent cross-context data leakage. Additionally, implement retrieval-layer defences against poisoning attacks — including validation of ingested content, detection of hidden instructions or adversarial payloads within documents, and monitoring for anomalous retrieval patterns that may indicate an active injection or exfiltration attempt.

C.8.1

CONTROL

Validate Knowledge Base Sources Against Poisoning and Embedding Inversion

Ensure that all content ingested into the RAG knowledge base is validated against trusted and verified sources, screened for hidden adversarial instructions, and subject to regular integrity audits. Apply controls to mitigate embedding inversion risks — whereby attackers reconstruct sensitive source text from exposed embeddings — and implement knowledge-conflict detection to identify cases where retrieved content contradicts established ground truth.

C.8.2

CONTROL

Maintain Immutable Retrieval Activity Logs

Ensure that detailed, immutable logs of all retrieval activities from vector stores are maintained to enable timely detection of and response to suspicious or anomalous access patterns.

C.8.3

RISK

Misinformation and Hallucination

Risk that LLMs produce false, fabricated, or misleading outputs — including hallucinated facts, unsupported claims, misrepresented expertise, or insecure code suggestions — that appear credible, leading to harmful user decisions, operational failures, legal liability, and reputational damage.

LLM09:2025

3 Controls

CONTROL

Deploy Retrieval-Augmented Generation to Ground Model Outputs

Ensure that Retrieval-Augmented Generation (RAG) is used where appropriate to enhance the reliability of model outputs by grounding responses in verified external knowledge sources, reducing the incidence of hallucinations.

C.9.1

CONTROL

Implement Human Oversight and Cross-Verification Processes

Ensure that human oversight and fact-checking processes are in place for critical or sensitive LLM-generated content, and that users are encouraged and equipped to verify AI outputs against trusted sources.

C.9.2

CONTROL

Implement Automatic Validation Mechanisms for High-Stakes Outputs

Ensure that automated tools and processes are deployed to validate key LLM outputs, particularly in high-stakes environments such as healthcare, legal, and financial services.

C.9.3

RISK

Unbounded Consumption

Risk that LLM applications permit excessive or uncontrolled inference operations, enabling adversaries to cause denial-of-service (DoS), unsustainable financial losses ("Denial of Wallet"), degradation of service quality, or intellectual property theft through model extraction — exploiting the high computational demands of LLMs, particularly in cloud environments.

LLM10:2025

4 Controls

CONTROL

Apply Rate Limiting and User Quotas at the Infrastructure Layer

Ensure that rate limiting and per-user or per-application quotas are enforced to restrict the volume of inference operations within a given time period. Primary ownership of this control rests with infrastructure and platform teams (API gateways, cloud provider quotas, service meshes); AI application teams should verify that such controls are configured and actively monitored for their AI workloads.

C.10.1

CONTROL

Validate and Bound Input Size at the Infrastructure Layer

Ensure that input size limits are enforced to reject requests exceeding reasonable token or byte thresholds, preventing resource exhaustion from variable-length input flooding or context window overflow attacks. This control is typically implemented at the API gateway or LLM proxy layer and owned by infrastructure teams; AI application teams should confirm limits are in place and calibrated for their workload.

C.10.2

CONTROL

Monitor AI Resource Usage for Anomalous Consumption Patterns

Ensure that resource usage — including token consumption, request rates, and cost metrics — is continuously monitored with anomaly detection to identify and respond to unusual patterns that may indicate DoS attempts, Denial of Wallet attacks, or model extraction activity.

C.10.3

CONTROL

Restrict LLM Access to Network Resources and Internal Services

Ensure that the LLM application's access to network resources, internal services, and APIs is restricted through sandboxing and network segmentation to mitigate side-channel attacks and limit the scope of potential resource exploitation or model extraction.

C.10.4