https://saif.google/
SAIF / Secure AI Framework
Google's Secure AI Framework (SAIF) — a conceptual framework for securing AI systems, covering risks and controls across the AI development and deployment lifecycle. SAIF 2.0 extends the base framework with agentic risks (Rogue Actions, expanded Sensitive Data Disclosure) and agent-specific controls (Agent Permissions, Agent User Control, Agent Observability).
Type:
Industry
Domain:
Cross-sector
Coverage:
Cybersecurity
Performance & Reliability
Safety & Reputational Harm
Privacy & Data Governance
Legal & Regulatory Compliance
Accountability & Governance
Agentic AI Safety
Content:
17 Risks
47 Controls
Version: 2.0
Framework Definition
Risks and controls associated with the framework
Assessment Layer
Concrete evaluations linked to controls to assess whether each control passes or fails
No evaluation mapping defined yet.
Performance & Reliability
Risks arising from failures in model accuracy, robustness, or availability that degrade the intended behaviour of AI systems.
RISK
Model Evasion
Risk that adversaries cause a model to produce incorrect inferences by slightly perturbing inputs, leading to reputational or legal harm and potentially triggering downstream security or privacy failures.
MEV
1 Control
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
MEV-C1
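As a hedged illustration of MEV-C1, the sketch below runs FGSM-style adversarial training on a toy logistic-regression classifier, using only numpy. The data, model, learning rate, and epsilon are invented for the example; a real system would use a proper training framework and attack suite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative, not from SAIF).
X = rng.normal(size=(200, 2)) + np.array([1.5, 0.0])
y = (X[:, 0] > 1.5).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.2

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for _ in range(100):
    # FGSM: perturb each input in the direction that increases the loss.
    p = predict(X, w, b)
    grad_x = (p - y)[:, None] * w[None, :]   # dLoss/dX for logistic loss
    X_adv = X + eps * np.sign(grad_x)
    # Train on the perturbed batch (adversarial training).
    p_adv = predict(X_adv, w, b)
    err = p_adv - y
    w -= lr * X_adv.T @ err / len(y)
    b -= lr * err.mean()

# Adversarial *testing*: accuracy on perturbed inputs should stay reasonably high.
grad_x = (predict(X, w, b) - y)[:, None] * w[None, :]
p = predict(X + eps * np.sign(grad_x), w, b)
print("robust accuracy:", ((p > 0.5) == y).mean())
```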
RISK
Denial of ML Service
Risk that adversaries reduce the availability of ML systems by issuing resource-exhausting queries, rendering the service unavailable to legitimate users.
DMS
1 Control
CONTROL
Application Access Management
Ensure that only authorised users and endpoints can access AI system resources, and that rate limiting and load balancing are in place to prevent service exhaustion.
DMS-C1
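A minimal sketch of the rate-limiting half of DMS-C1: a per-client token bucket checked before an inference request is admitted. The capacity and refill rate are assumed values for illustration; a real deployment would also authenticate callers and load-balance upstream.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Per-client token bucket: refuses requests once the budget is spent."""
    capacity: float = 10.0       # burst size (assumed value)
    refill_rate: float = 1.0     # tokens per second (assumed value)
    tokens: float = 10.0
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(api_key: str) -> bool:
    """Gate an inference request; unauthenticated or over-budget callers are rejected."""
    if not api_key:              # stands in for a real authentication check
        return False
    return buckets.setdefault(api_key, TokenBucket()).allow()

print([admit("client-1") for _ in range(12)])   # last two calls are refused
```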
Safety & Reputational Harm
Risks arising from AI model outputs that are harmful, insecure, or damaging to users, downstream systems, or organisational reputation.
RISK
Insecure Model Output
Risk that model output is not appropriately validated or sanitised before being passed to downstream systems or users, exposing the organisation to reputational, security, and user safety harms.
IMO
2 Controls
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
IMO-C1
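One hedged sketch of IMO-C1, assuming the downstream consumer is a web UI: model output is scanned for secret-shaped strings and HTML-escaped before rendering. The patterns are illustrative, not an exhaustive filter.

```python
import html
import re

# Illustrative secret-shaped patterns; a real deployment would use a
# maintained detector, not this short list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS-style access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),
]

def sanitize_output(text: str) -> str:
    """Redact secret-shaped substrings and escape markup before display."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Escaping (rather than stripping) prevents injected markup from executing.
    return html.escape(text)

print(sanitize_output("Key: AKIAABCDEFGHIJKLMNOP <script>alert(1)</script>"))
```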
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
IMO-C2
Cybersecurity
Risks arising from adversarial attacks, supply chain compromises, unauthorised access, and manipulation of AI model assets, infrastructure, and interfaces.
RISK
Data Poisoning
Risk that training or retraining data is maliciously altered — through deletion, modification, or injection of adversarial data — to degrade model performance, skew outputs, or install hidden backdoors.
DP
5 Controls
CONTROL
Training Data Sanitization
Ensure that poisoned or sensitive data is detected and removed or remediated in training and evaluation datasets.
DP-C1
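A hedged sketch of DP-C1: a filtering pass that drops training records containing e-mail-shaped PII and quarantines statistical outliers that could indicate poisoning. The threshold and pattern are assumptions for illustration.

```python
import re
import statistics

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitize(records: list[dict]) -> list[dict]:
    """Drop records with PII; quarantine length outliers as possible poisoning."""
    clean = [r for r in records if not EMAIL.search(r["text"])]
    lengths = [len(r["text"]) for r in clean]
    mu, sigma = statistics.mean(lengths), statistics.pstdev(lengths) or 1.0
    kept = []
    for r in clean:
        if abs(len(r["text"]) - mu) / sigma > 3.0:   # assumed outlier threshold
            r["quarantined"] = True                  # route to human review
        else:
            kept.append(r)
    return kept

data = [{"text": f"normal sentence {i}"} for i in range(20)] + [
    {"text": "contact me at alice@example.com"},     # removed: PII
    {"text": "x" * 10_000},                          # quarantined: length outlier
]
print(len(sanitize(data)))   # -> 20 clean records survive
```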
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
DP-C2
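One concrete, hedged reading of DP-C2, assuming a PyTorch stack (the framework choice is an assumption, not part of SAIF): prefer loaders that cannot execute arbitrary code from a tampered artefact.

```python
import torch

def load_weights(path: str) -> dict:
    """Load a checkpoint without executing pickled code.

    weights_only=True restricts deserialisation to tensors and primitive
    containers, so a tampered checkpoint cannot run arbitrary Python the
    way a plain pickle-based torch.load(path) can.
    """
    return torch.load(path, map_location="cpu", weights_only=True)

# An alternative with the same property is the safetensors format, which is
# not executable by design:
#   from safetensors.torch import load_file
#   state = load_file(path)
```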
CONTROL
Model and Data Integrity Management
Ensure that all data, models, and code used to produce AI models are verifiably integrity-protected during development and deployment.
DP-C3
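A minimal sketch of DP-C3 using only the Python standard library: a SHA-256 manifest written at build time and verified before any artefact (dataset, checkpoint, code bundle) is used. A production pipeline would also sign the manifest; signing is omitted here.

```python
import hashlib
import json
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(artifacts: list[Path], manifest: Path) -> None:
    """Record the expected digest of every artefact at build time."""
    manifest.write_text(json.dumps({str(p): sha256(p) for p in artifacts}))

def verify_manifest(manifest: Path) -> None:
    """Refuse to proceed if any artefact changed since the manifest was written."""
    for name, expected in json.loads(manifest.read_text()).items():
        if sha256(Path(name)) != expected:
            raise RuntimeError(f"integrity failure: {name} changed")
```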
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
DP-C4
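An illustrative sketch of DP-C4: a deny-by-default grant table in front of weight and dataset storage, so least privilege is the upper bound on access. The principals and resources are invented names.

```python
# Explicit grants: (principal, resource, action). Anything absent is denied.
GRANTS = {
    ("training-pipeline", "datasets/prod", "read"),
    ("training-pipeline", "models/candidate", "write"),
    ("serving", "models/released", "read"),
}

def authorise(principal: str, resource: str, action: str) -> bool:
    """Deny-by-default check; least privilege is the upper bound."""
    return (principal, resource, action) in GRANTS

assert authorise("serving", "models/released", "read")
assert not authorise("serving", "models/candidate", "read")   # never granted
```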
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
DP-C5
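A hedged sketch of DP-C5: a minimal inventory record that tracks each AI asset (dataset, model, code, tool) with provenance fields, so every artefact in production can be traced back to its inputs. The schema is an assumption, not a SAIF-mandated format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AssetRecord:
    asset_id: str                     # e.g. "model:sentiment/3" (invented id)
    kind: str                         # "dataset" | "model" | "code" | "tool"
    owner: str
    digest: str                       # content hash, linking to DP-C3
    upstream: tuple[str, ...] = ()    # asset_ids this artefact was built from
    registered: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

inventory: dict[str, AssetRecord] = {}

def register(record: AssetRecord) -> None:
    inventory[record.asset_id] = record

register(AssetRecord("dataset:reviews/1", "dataset", "data-team", "sha256:..."))
register(AssetRecord("model:sentiment/3", "model", "ml-team", "sha256:...",
                     upstream=("dataset:reviews/1",)))
```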
RISK
Model Source Tampering
Risk that an adversary tampers with a model's source code, dependencies, or weights through supply chain or insider attacks, introducing vulnerabilities or unexpected behaviours including persistent architectural backdoors.
MST
4 Controls
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MST-C1
CONTROL
Model and Data Integrity Management
Ensure that all data, models, and code used to produce AI models are verifiably integrity-protected during development and deployment.
MST-C2
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
MST-C3
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
MST-C4
RISK
Model Deployment Tampering
Risk that adversaries make unauthorised modifications to components used for deploying a model — via supply chain attacks or exploitation of known vulnerabilities in serving infrastructure — altering model behaviour in production.
MDT
1 Control
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MDT-C1
RISK
Model Exfiltration
Risk of unauthorised appropriation of an AI model — by exploiting storage or serving infrastructure — resulting in intellectual property theft, replication of functionality, and potential downstream security and privacy harm.
MXF
3 Controls
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
MXF-C1
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
MXF-C2
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MXF-C3
RISK
Model Reverse Engineering
Risk that adversaries clone or recreate a model by systematically analysing its inputs, outputs, and behaviours — typically via API abuse — enabling imitation products or facilitating targeted adversarial attacks on the original model.
MRE
1 Control
CONTROL
Application Access Management
Ensure that only authorised users and endpoints can access AI system resources, and that rate limiting controls prevent excessive API querying that could enable reverse engineering.
MRE-C1
RISK
Insecure Integrated Component
Risk that vulnerabilities in software interacting with AI models — such as plugins, libraries, or applications — are exploited by attackers to gain unauthorised access, introduce malicious code, or compromise system operations, posing broad threats to user trust, privacy, and security.
IIC
1 Control
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on permissions for integrated components and plugins, minimising the number of tools and actions permitted.
IIC-C1
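A minimal sketch of IIC-C1: tools and plugins are exposed to the model only through an explicit allowlist, so a compromised component or a hallucinated tool call cannot reach anything outside the grant. Tool names are illustrative.

```python
from typing import Callable

def get_weather(city: str) -> str:     # benign illustrative tool
    return f"(weather for {city})"

# Least privilege: the registry *is* the upper bound on what can be invoked.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    # deliberately NOT registered: shell, file write, network fetch, ...
}

def dispatch(tool_name: str, **kwargs) -> str:
    tool = TOOL_REGISTRY.get(tool_name)
    if tool is None:
        raise PermissionError(f"tool {tool_name!r} is not permitted")
    return tool(**kwargs)

print(dispatch("get_weather", city="Zurich"))
# dispatch("run_shell", cmd="rm -rf /")  -> PermissionError
```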
RISK
Prompt Injection
Risk that adversaries cause a model to execute commands injected inside a prompt — exploiting the boundary between instructions and input data — leading to unintended model behaviour, data leakage, jailbreaks, or downstream harm when combined with other risks.
PIJ
3 Controls
CONTROL
Input Validation and Sanitization
Ensure that adversarial queries to AI models are blocked or restricted through robust input filtering and validation.
PIJ-C1
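A hedged sketch of PIJ-C1. Heuristic filters like this catch only crude injections and are easily bypassed, which is why SAIF pairs this control with adversarial training (PIJ-C2) and output validation (PIJ-C3); the patterns below are illustrative.

```python
import re
import unicodedata

INJECTION_PATTERNS = [
    re.compile(r"(?i)ignore (all )?(previous|prior) instructions"),
    re.compile(r"(?i)you are now .{0,40}(developer|admin) mode"),
    re.compile(r"(?i)reveal (the|your) system prompt"),
]

def validate_input(user_text: str) -> str:
    """Normalise and screen untrusted input before it reaches the model."""
    text = unicodedata.normalize("NFKC", user_text)   # defeat homoglyph tricks
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise ValueError("input rejected: possible prompt injection")
    return text

print(validate_input("What's the weather in Zurich?"))
# validate_input("Ignore previous instructions and ...")  -> ValueError
```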
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
PIJ-C2
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
PIJ-C3
Privacy & Data Governance
Risks arising from the improper handling, retention, disclosure, or inference of personal and confidential data across the AI development and deployment lifecycle.
RISK
Excessive Data Handling
Risk that user data is collected, retained, processed, or shared beyond what is permitted by applicable policies, creating legal and regulatory exposure.
EDH
2 Controls
CONTROL
User Data Management
Ensure that all user data from AI applications — including prompts and logs — is stored, processed, and used in compliance with user consent and applicable policies.
EDH-C1
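An illustrative sketch of EDH-C1: prompts are stored with an explicit consent scope and purged once past that scope's retention window. Scope names and retention periods are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = {"support": timedelta(days=30), "analytics": timedelta(days=90)}

@dataclass
class StoredPrompt:
    user_id: str
    text: str
    consent_scope: str      # what the user agreed to ("support", "analytics")
    stored_at: datetime

store: list[StoredPrompt] = []

def save_prompt(user_id: str, text: str, consent_scope: str) -> None:
    if consent_scope not in RETENTION:
        raise ValueError("no valid consent for this use")   # refuse, don't default
    store.append(StoredPrompt(user_id, text, consent_scope,
                              datetime.now(timezone.utc)))

def purge_expired(now: datetime) -> None:
    """Enforce retention: delete anything older than its consented window."""
    store[:] = [p for p in store
                if now - p.stored_at < RETENTION[p.consent_scope]]
```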
CONTROL
User Transparency and Controls
Ensure that users are informed of relevant AI risks through disclosures, and that transparency and control mechanisms are provided for the use of their data in AI applications.
EDH-C2
RISK
Sensitive Data Disclosure
Risk that private or confidential data — including memorised training data, user interactions, system prompts, or data passing through integrated systems — is disclosed through querying of the model or agent. In agentic systems this risk is amplified, as agents may have privileged access to emails, files, and credentials, enabling large-scale exfiltration through tools and external channels.
SDD
7 Controls
CONTROL
Privacy Enhancing Technologies
Ensure that technologies that minimise, de-identify, or restrict the use of PII in training or evaluating models are applied.
SDD-C1
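One narrow, hedged example of SDD-C1: rule-based de-identification of obvious PII before text enters a training set. A real deployment would use a trained PII detector and possibly differential privacy during training; this regex pass is only illustrative.

```python
import re

PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
    (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "<IP>"),
]

def deidentify(text: str) -> str:
    """Replace PII-shaped spans with typed placeholders before training."""
    for pattern, placeholder in PII_RULES:
        text = pattern.sub(placeholder, text)
    return text

print(deidentify("Reach alice@example.com or +41 44 123 45 67 from 10.0.0.1"))
# -> "Reach <EMAIL> or <PHONE> from <IP>"
```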
CONTROL
User Data Management
Ensure that all user data from AI applications — including prompts and logs — is stored, processed, and used in compliance with user consent and applicable policies.
SDD-C2
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
SDD-C3
CONTROL
User Transparency and Controls
Ensure that users are informed of relevant AI risks through disclosures, and that transparency and control mechanisms are provided for the use of their data in AI applications.
SDD-C4
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on agent permissions, with contextual and dynamic access to tools and data, to minimise the volume of sensitive information an agent can access or disclose.
SDD-C5
CONTROL
Agent User Control
Ensure that user approval is required for any agent actions that alter user data or act on the user's behalf, preventing unsanctioned disclosure of sensitive information through agent tools.
SDD-C6
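A minimal sketch of SDD-C6: state-changing agent actions are routed through an explicit user-approval gate before execution, while read-only actions pass through. The action fields and the console prompt stand in for a real approval UI and are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    tool: str
    args: dict
    mutates_user_data: bool   # set by the tool's own declaration, not the model

def request_approval(action: AgentAction) -> bool:
    """Stand-in for a real approval UI; here we just ask on the console."""
    answer = input(f"Agent wants to run {action.tool}({action.args}). Allow? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: AgentAction, run_tool) -> str:
    if action.mutates_user_data and not request_approval(action):
        return "action declined by user"
    return run_tool(action)

# Sending mail mutates state on the user's behalf, so it requires approval:
send = AgentAction("send_email", {"to": "bob@example.com"}, mutates_user_data=True)
# execute(send, run_tool=lambda a: "sent")  # prompts the user first
```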
CONTROL
Agent Observability
Ensure that an agent's actions, tool use, and reasoning are logged and auditable, enabling detection of unauthorised data access or disclosure through agent channels.
SDD-C7
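A hedged sketch of SDD-C7 using the standard logging module: each agent step (tool call, arguments, status, reasoning summary) is emitted as one structured, auditable record. The field set is an assumption, not a SAIF-defined schema.

```python
import json
import logging
import time

logger = logging.getLogger("agent.audit")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_agent_step(session: str, step: int, tool: str,
                   args: dict, status: str, reasoning_summary: str) -> None:
    """One structured, append-only audit record per agent action."""
    logger.info(json.dumps({
        "ts": time.time(),
        "session": session,
        "step": step,
        "tool": tool,
        "args": args,                    # consider redacting sensitive args
        "status": status,
        "reasoning": reasoning_summary,  # a summary, not the full trace
    }))

log_agent_step("sess-42", 1, "search_files", {"query": "Q3 report"},
               "ok", "user asked for the Q3 report")
```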
RISK
Inferred Sensitive Data
Risk that models infer sensitive personal information — such as gender, political affiliation, or health status — not contained in training data, causing privacy violations and potential harm to data subjects even when derived from public inputs.
ISD
2 Controls
CONTROL
Training Data Management
Ensure that all data used to train and evaluate models is authorised for the intended purposes, and that data likely to lead to sensitive inferences is identified and managed.
ISD-C1
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
ISD-C2
Legal & Regulatory Compliance
Risks arising from the use of data that violates intellectual property rights, contractual obligations, or regulatory requirements, exposing the organisation to legal liability.
RISK
Unauthorized Training Data
Risk that a model is trained or fine-tuned on data that is not authorised for that use — including data lacking user consent, unlicensed copyrighted material, or legally restricted data — resulting in legal and ethical liability.
UTD
2 Controls
CONTROL
Training Data Management
Ensure that all data used to train and evaluate models is authorised for the intended purposes, covering consent, licensing, and regulatory requirements.
UTD-C1
CONTROL
Training Data Sanitization
Ensure that unauthorised or sensitive data is detected and removed or remediated in training and evaluation datasets.
UTD-C2
Accountability & Governance
Cross-cutting risks arising from the absence of systematic assurance practices and governance structures needed to identify, measure, and manage AI security and privacy risks across the full development lifecycle.
RISK
Inadequate AI Assurance
Risk that the absence of continuous adversarial testing, vulnerability management, threat detection, and incident response leaves AI systems exposed to undetected security and privacy failures across all risk areas.
ASR
4 Controls
CONTROL
Red Teaming
Ensure that security and privacy improvements are identified through self-driven adversarial attacks on AI infrastructure and products.
ASR-C1
CONTROL
Vulnerability Management
Ensure that production infrastructure and products are proactively and continually tested and monitored for security and privacy regressions.
ASR-C2
CONTROL
Threat Detection
Ensure that internal or external attacks on AI assets, infrastructure, and products are detected and alerted on in a timely manner.
ASR-C3
CONTROL
Incident Response Management
Ensure that a defined and exercised incident response process is in place to manage AI security and privacy incidents.
ASR-C4
RISK
Inadequate AI Governance
Risk that the absence of organisational policies, education, product governance, and risk management processes leaves AI security and privacy risks unmanaged and residual risk unmeasured across the organisation.
GVR
4 Controls
CONTROL
User Policies and Education
Ensure that easy-to-understand AI security and privacy policies and education are published for users.
GVR-C1
CONTROL
Internal Policies and Education
Ensure that comprehensive AI security and privacy policies and education are published for employees.
GVR-C2
CONTROL
Product Governance
Ensure that all AI models and products are validated against established security and privacy requirements before deployment.
GVR-C3
CONTROL
Risk Governance
Ensure that residual AI risk is inventoried, measured, and monitored across the organisation on an ongoing basis.
GVR-C4
Agentic AI Safety
Risks unique to model-based agents that can take autonomous real-world actions, where failures in reasoning, alignment, or orchestration security can cause unintended or malicious consequences at scale.
RISK
Rogue Actions
Risk that a model-based agent executes unintended actions — whether through accidental misalignment in task planning or malicious manipulation via prompt injection, poisoning, or evasion — causing harm to organisational reputation, user trust, security, and safety. Severity scales directly with the agent's level of autonomy and the breadth of permissions granted.
RA
4 Controls
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on agent permissions, with contextual and dynamic access to tools and actions, to constrain the blast radius of accidental or malicious rogue actions.
RA-C1
CONTROL
Agent User Control
Ensure that user approval is required for any agent actions that alter user data or act on the user's behalf, preventing unsanctioned real-world consequences from rogue agent behaviour.
RA-C2
CONTROL
Agent Observability
Ensure that an agent's actions, tool use, and reasoning are logged and auditable, enabling timely detection of and response to rogue behaviour.
RA-C3
CONTROL
Output Validation and Sanitization
Ensure that agent outputs are normalised and sanitised before rendering, to prevent malicious content from being executed within the user's application context.
RA-C4