https://saif.google/
SAIF / Secure AI Framework
Google's Secure AI Framework (SAIF) — a conceptual framework for securing AI systems, covering risks and controls across the AI development and deployment lifecycle. SAIF 2.0 extends the base framework with agentic risks (Rogue Actions, expanded Sensitive Data Disclosure) and agent-specific controls (Agent Permissions, Agent User Control, Agent Observability).
Type:
Industry
Domain:
Cross-sector
Coverage:
Cybersecurity
Performance & Reliability
Safety & Reputational Harm
Privacy & Data Governance
Legal & Regulatory Compliance
Accountability & Governance
Agentic AI Safety
Content:
17 Risks
47 Controls
Version: 2.0
Framework Definition
Risks and controls associated with the framework
Assessment Layer
Concrete evaluations linked to controls to assess whether each control passes or fails
No evaluation mapping defined yet.
Performance & Reliability
Risks arising from failures in model accuracy, robustness, or availability that degrade the intended behaviour of AI systems.
RISK
Model Evasion
Risk that adversaries cause a model to produce incorrect inferences by slightly perturbing inputs, leading to reputational or legal harm and potentially triggering downstream security or privacy failures.
MEV
1 Control
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
MEV-C1
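As a hedged illustration of MEV-C1, the sketch below runs FGSM-style adversarial training on a toy logistic-regression classifier, using only numpy. The data, model, learning rate, and epsilon are invented for the example; a real system would use a proper training framework and attack suite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data (illustrative, not from SAIF).
X = rng.normal(size=(200, 2)) + np.array([1.5, 0.0])
y = (X[:, 0] > 1.5).astype(float)

w, b, lr, eps = np.zeros(2), 0.0, 0.1, 0.2

def predict(X, w, b):
    return 1.0 / (1.0 + np.exp(-(X @ w + b)))

for _ in range(100):
    # FGSM: perturb each input in the direction that increases the loss.
    p = predict(X, w, b)
    grad_x = (p - y)[:, None] * w[None, :]   # dLoss/dX for logistic loss
    X_adv = X + eps * np.sign(grad_x)
    # Train on the perturbed batch (adversarial training).
    p_adv = predict(X_adv, w, b)
    err = p_adv - y
    w -= lr * X_adv.T @ err / len(y)
    b -= lr * err.mean()

# Adversarial *testing*: accuracy on perturbed inputs should stay reasonably high.
grad_x = (predict(X, w, b) - y)[:, None] * w[None, :]
p = predict(X + eps * np.sign(grad_x), w, b)
print("robust accuracy:", ((p > 0.5) == y).mean())
```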
RISK
Denial of ML Service
Risk that adversaries reduce the availability of ML systems by issuing resource-exhausting queries, rendering the service unavailable to legitimate users.
DMS
1 Control
CONTROL
Application Access Management
Ensure that only authorised users and endpoints can access AI system resources, and that rate limiting and load balancing are in place to prevent service exhaustion.
DMS-C1
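A minimal sketch of the rate-limiting half of DMS-C1: a per-client token bucket checked before an inference request is admitted. The capacity and refill rate are assumed values for illustration; a real deployment would also authenticate callers and load-balance upstream.

```python
import time
from dataclasses import dataclass, field

@dataclass
class TokenBucket:
    """Per-client token bucket: refuses requests once the budget is spent."""
    capacity: float = 10.0       # burst size (assumed value)
    refill_rate: float = 1.0     # tokens per second (assumed value)
    tokens: float = 10.0
    last: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def admit(api_key: str) -> bool:
    """Gate an inference request; unauthenticated or over-budget callers are rejected."""
    if not api_key:              # stands in for a real authentication check
        return False
    return buckets.setdefault(api_key, TokenBucket()).allow()

print([admit("client-1") for _ in range(12)])   # last two calls are refused
```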
Safety & Reputational Harm
Risks arising from AI model outputs that are harmful, insecure, or damaging to users, downstream systems, or organisational reputation.
RISK
Insecure Model Output
Risk that model output is not appropriately validated or sanitised before being passed to downstream systems or users, exposing the organisation to reputational, security, and user safety harms.
IMO
2 Controls
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
IMO-C1
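One hedged sketch of IMO-C1, assuming the downstream consumer is a web UI: model output is scanned for secret-shaped strings and HTML-escaped before rendering. The patterns are illustrative, not an exhaustive filter.

```python
import html
import re

# Illustrative secret-shaped patterns; a real deployment would use a
# maintained detector, not this short list.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                  # AWS-style access key id
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]{20,}"),
]

def sanitize_output(text: str) -> str:
    """Redact secret-shaped substrings and escape markup before display."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    # Escaping (rather than stripping) prevents injected markup from executing.
    return html.escape(text)

print(sanitize_output("Key: AKIAABCDEFGHIJKLMNOP <script>alert(1)</script>"))
```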
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
IMO-C2
Cybersecurity
Risks arising from adversarial attacks, supply chain compromises, unauthorised access, and manipulation of AI model assets, infrastructure, and interfaces.
RISK
Data Poisoning
Risk that training or retraining data is maliciously altered — through deletion, modification, or injection of adversarial data — to degrade model performance, skew outputs, or install hidden backdoors.
DP
5 Controls
CONTROL
Training Data Sanitization
Ensure that poisoned or sensitive data is detected and removed or remediated in training and evaluation datasets.
DP-C1
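A hedged sketch of DP-C1: a filtering pass that drops training records containing e-mail-shaped PII and quarantines statistical outliers that could indicate poisoning. The threshold and pattern are assumptions for illustration.

```python
import re
import statistics

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def sanitize(records: list[dict]) -> list[dict]:
    """Drop records with PII; quarantine length outliers as possible poisoning."""
    clean = [r for r in records if not EMAIL.search(r["text"])]
    lengths = [len(r["text"]) for r in clean]
    mu, sigma = statistics.mean(lengths), statistics.pstdev(lengths) or 1.0
    kept = []
    for r in clean:
        if abs(len(r["text"]) - mu) / sigma > 3.0:   # assumed outlier threshold
            r["quarantined"] = True                  # route to human review
        else:
            kept.append(r)
    return kept

data = [{"text": f"normal sentence {i}"} for i in range(20)] + [
    {"text": "contact me at alice@example.com"},     # removed: PII
    {"text": "x" * 10_000},                          # quarantined: length outlier
]
print(len(sanitize(data)))   # -> 20 clean records survive
```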
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
DP-C2
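One concrete, hedged reading of DP-C2, assuming a PyTorch stack (the framework choice is an assumption, not part of SAIF): prefer loaders that cannot execute arbitrary code from a tampered artefact.

```python
import torch

def load_weights(path: str) -> dict:
    """Load a checkpoint without executing pickled code.

    weights_only=True restricts deserialisation to tensors and primitive
    containers, so a tampered checkpoint cannot run arbitrary Python the
    way a plain pickle-based torch.load(path) can.
    """
    return torch.load(path, map_location="cpu", weights_only=True)

# An alternative with the same property is the safetensors format, which is
# not executable by design:
#   from safetensors.torch import load_file
#   state = load_file(path)
```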
CONTROL
Model and Data Integrity Management
Ensure that all data, models, and code used to produce AI models are verifiably integrity-protected during development and deployment.
DP-C3
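A minimal sketch of DP-C3 using only the Python standard library: a SHA-256 manifest written at build time and verified before any artefact (dataset, checkpoint, code bundle) is used. A production pipeline would also sign the manifest; signing is omitted here.

```python
import hashlib
import json
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(artifacts: list[Path], manifest: Path) -> None:
    """Record the expected digest of every artefact at build time."""
    manifest.write_text(json.dumps({str(p): sha256(p) for p in artifacts}))

def verify_manifest(manifest: Path) -> None:
    """Refuse to proceed if any artefact changed since the manifest was written."""
    for name, expected in json.loads(manifest.read_text()).items():
        if sha256(Path(name)) != expected:
            raise RuntimeError(f"integrity failure: {name} changed")
```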
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
DP-C4
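An illustrative sketch of DP-C4: a deny-by-default grant table in front of weight and dataset storage, so least privilege is the upper bound on access. The principals and resources are invented names.

```python
# Explicit grants: (principal, resource, action). Anything absent is denied.
GRANTS = {
    ("training-pipeline", "datasets/prod", "read"),
    ("training-pipeline", "models/candidate", "write"),
    ("serving", "models/released", "read"),
}

def authorise(principal: str, resource: str, action: str) -> bool:
    """Deny-by-default check; least privilege is the upper bound."""
    return (principal, resource, action) in GRANTS

assert authorise("serving", "models/released", "read")
assert not authorise("serving", "models/candidate", "read")   # never granted
```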
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
DP-C5
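A hedged sketch of DP-C5: a minimal inventory record that tracks each AI asset (dataset, model, code, tool) with provenance fields, so every artefact in production can be traced back to its inputs. The schema is an assumption, not a SAIF-mandated format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class AssetRecord:
    asset_id: str                     # e.g. "model:sentiment/3" (invented id)
    kind: str                         # "dataset" | "model" | "code" | "tool"
    owner: str
    digest: str                       # content hash, linking to DP-C3
    upstream: tuple[str, ...] = ()    # asset_ids this artefact was built from
    registered: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

inventory: dict[str, AssetRecord] = {}

def register(record: AssetRecord) -> None:
    inventory[record.asset_id] = record

register(AssetRecord("dataset:reviews/1", "dataset", "data-team", "sha256:..."))
register(AssetRecord("model:sentiment/3", "model", "ml-team", "sha256:...",
                     upstream=("dataset:reviews/1",)))
```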
RISK
Model Source Tampering
Risk that an adversary tampers with a model's source code, dependencies, or weights through supply chain or insider attacks, introducing vulnerabilities or unexpected behaviours including persistent architectural backdoors.
MST
4 Controls
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MST-C1
CONTROL
Model and Data Integrity Management
Ensure that all data, models, and code used to produce AI models are verifiably integrity-protected during development and deployment.
MST-C2
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
MST-C3
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
MST-C4
RISK
Model Deployment Tampering
Risk that adversaries make unauthorised modifications to components used for deploying a model — via supply chain attacks or exploitation of known vulnerabilities in serving infrastructure — altering model behaviour in production.
MDT
1 Control
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MDT-C1
RISK
Model Exfiltration
Risk of unauthorised appropriation of an AI model — by exploiting storage or serving infrastructure — resulting in intellectual property theft, replication of functionality, and potential downstream security and privacy harm.
MXF
3 Controls
CONTROL
Model and Data Inventory Management
Ensure that all data, code, models, and transformation tools used in AI applications are inventoried and tracked.
MXF-C1
CONTROL
Model and Data Access Controls
Ensure that internal access to models, weights, and datasets in storage and production is minimised through least-privilege access controls.
MXF-C2
CONTROL
Secure-by-Default ML Tooling
Ensure that secure-by-default frameworks, libraries, software systems, and hardware components are used for AI development and deployment to protect confidentiality and integrity of AI assets.
MXF-C3
RISK
Model Reverse Engineering
Risk that adversaries clone or recreate a model by systematically analysing its inputs, outputs, and behaviours — typically via API abuse — enabling imitation products or facilitating targeted adversarial attacks on the original model.
MRE
1 Control
CONTROL
Application Access Management
Ensure that only authorised users and endpoints can access AI system resources, and that rate limiting controls prevent excessive API querying that could enable reverse engineering.
MRE-C1
RISK
Insecure Integrated Component
Risk that vulnerabilities in software interacting with AI models — such as plugins, libraries, or applications — are exploited by attackers to gain unauthorised access, introduce malicious code, or compromise system operations, posing broad threats to user trust, privacy, and security.
IIC
1 Control
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on permissions for integrated components and plugins, minimising the number of tools and actions permitted.
IIC-C1
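A minimal sketch of IIC-C1: tools and plugins are exposed to the model only through an explicit allowlist, so a compromised component or a hallucinated tool call cannot reach anything outside the grant. Tool names are illustrative.

```python
from typing import Callable

def get_weather(city: str) -> str:     # benign illustrative tool
    return f"(weather for {city})"

# Least privilege: the registry *is* the upper bound on what can be invoked.
TOOL_REGISTRY: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    # deliberately NOT registered: shell, file write, network fetch, ...
}

def dispatch(tool_name: str, **kwargs) -> str:
    tool = TOOL_REGISTRY.get(tool_name)
    if tool is None:
        raise PermissionError(f"tool {tool_name!r} is not permitted")
    return tool(**kwargs)

print(dispatch("get_weather", city="Zurich"))
# dispatch("run_shell", cmd="rm -rf /")  -> PermissionError
```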
RISK
Prompt Injection
Risk that adversaries cause a model to execute commands injected inside a prompt — exploiting the boundary between instructions and input data — leading to unintended model behaviour, data leakage, jailbreaks, or downstream harm when combined with other risks.
PIJ
3 Controls
CONTROL
Input Validation and Sanitization
Ensure that adversarial queries to AI models are blocked or restricted through robust input filtering and validation.
PIJ-C1
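A hedged sketch of PIJ-C1. Heuristic filters like this catch only crude injections and are easily bypassed, which is why SAIF pairs this control with adversarial training (PIJ-C2) and output validation (PIJ-C3); the patterns below are illustrative.

```python
import re
import unicodedata

INJECTION_PATTERNS = [
    re.compile(r"(?i)ignore (all )?(previous|prior) instructions"),
    re.compile(r"(?i)you are now .{0,40}(developer|admin) mode"),
    re.compile(r"(?i)reveal (the|your) system prompt"),
]

def validate_input(user_text: str) -> str:
    """Normalise and screen untrusted input before it reaches the model."""
    text = unicodedata.normalize("NFKC", user_text)   # defeat homoglyph tricks
    text = "".join(ch for ch in text if ch.isprintable() or ch in "\n\t")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(text):
            raise ValueError("input rejected: possible prompt injection")
    return text

print(validate_input("What's the weather in Zurich?"))
# validate_input("Ignore previous instructions and ...")  -> ValueError
```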
CONTROL
Adversarial Training and Testing
Ensure that AI models are made robust to adversarial inputs through adversarial training and testing techniques in the context of their intended application.
PIJ-C2
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
PIJ-C3
Privacy & Data Governance
Risks arising from the improper handling, retention, disclosure, or inference of personal and confidential data across the AI development and deployment lifecycle.
RISK
Excessive Data Handling
Risk that user data is collected, retained, processed, or shared beyond what is permitted by applicable policies, creating legal and regulatory exposure.
EDH
2 Controls
CONTROL
User Data Management
Ensure that all user data from AI applications — including prompts and logs — is stored, processed, and used in compliance with user consent and applicable policies.
EDH-C1
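An illustrative sketch of EDH-C1: prompts are stored with an explicit consent scope and purged once past that scope's retention window. Scope names and retention periods are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = {"support": timedelta(days=30), "analytics": timedelta(days=90)}

@dataclass
class StoredPrompt:
    user_id: str
    text: str
    consent_scope: str      # what the user agreed to ("support", "analytics")
    stored_at: datetime

store: list[StoredPrompt] = []

def save_prompt(user_id: str, text: str, consent_scope: str) -> None:
    if consent_scope not in RETENTION:
        raise ValueError("no valid consent for this use")   # refuse, don't default
    store.append(StoredPrompt(user_id, text, consent_scope,
                              datetime.now(timezone.utc)))

def purge_expired(now: datetime) -> None:
    """Enforce retention: delete anything older than its consented window."""
    store[:] = [p for p in store
                if now - p.stored_at < RETENTION[p.consent_scope]]
```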
CONTROL
User Transparency and Controls
Ensure that users are informed of relevant AI risks through disclosures, and that transparency and control mechanisms are provided for the use of their data in AI applications.
EDH-C2
RISK
Sensitive Data Disclosure
Risk that private or confidential data — including memorised training data, user interactions, system prompts, or data passing through integrated systems — is disclosed through querying of the model or agent. In agentic systems this risk is amplified, as agents may have privileged access to emails, files, and credentials, enabling large-scale exfiltration through tools and external channels.
SDD
7 Controls
CONTROL
Privacy Enhancing Technologies
Ensure that technologies that minimise, de-identify, or restrict the use of PII in training or evaluating models are applied.
SDD-C1
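One narrow, hedged example of SDD-C1: rule-based de-identification of obvious PII before text enters a training set. A real deployment would use a trained PII detector and possibly differential privacy during training; this regex pass is only illustrative.

```python
import re

PII_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
    (re.compile(r"\b\d{1,3}(\.\d{1,3}){3}\b"), "<IP>"),
]

def deidentify(text: str) -> str:
    """Replace PII-shaped spans with typed placeholders before training."""
    for pattern, placeholder in PII_RULES:
        text = pattern.sub(placeholder, text)
    return text

print(deidentify("Reach alice@example.com or +41 44 123 45 67 from 10.0.0.1"))
# -> "Reach <EMAIL> or <PHONE> from <IP>"
```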
CONTROL
User Data Management
Ensure that all user data from AI applications — including prompts and logs — is stored, processed, and used in compliance with user consent and applicable policies.
SDD-C2
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
SDD-C3
CONTROL
User Transparency and Controls
Ensure that users are informed of relevant AI risks through disclosures, and that transparency and control mechanisms are provided for the use of their data in AI applications.
SDD-C4
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on agent permissions, with contextual and dynamic access to tools and data, to minimise the volume of sensitive information an agent can access or disclose.
SDD-C5
CONTROL
Agent User Control
Ensure that user approval is required for any agent actions that alter user data or act on the user's behalf, preventing unsanctioned disclosure of sensitive information through agent tools.
SDD-C6
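A minimal sketch of SDD-C6: state-changing agent actions are routed through an explicit user-approval gate before execution, while read-only actions pass through. The action fields and the console prompt stand in for a real approval UI and are assumptions.

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    tool: str
    args: dict
    mutates_user_data: bool   # set by the tool's own declaration, not the model

def request_approval(action: AgentAction) -> bool:
    """Stand-in for a real approval UI; here we just ask on the console."""
    answer = input(f"Agent wants to run {action.tool}({action.args}). Allow? [y/N] ")
    return answer.strip().lower() == "y"

def execute(action: AgentAction, run_tool) -> str:
    if action.mutates_user_data and not request_approval(action):
        return "action declined by user"
    return run_tool(action)

# Sending mail mutates state on the user's behalf, so it requires approval:
send = AgentAction("send_email", {"to": "bob@example.com"}, mutates_user_data=True)
# execute(send, run_tool=lambda a: "sent")  # prompts the user first
```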
CONTROL
Agent Observability
Ensure that an agent's actions, tool use, and reasoning are logged and auditable, enabling detection of unauthorised data access or disclosure through agent channels.
SDD-C7
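A hedged sketch of SDD-C7 using the standard logging module: each agent step (tool call, arguments, status, reasoning summary) is emitted as one structured, auditable record. The field set is an assumption, not a SAIF-defined schema.

```python
import json
import logging
import time

logger = logging.getLogger("agent.audit")
logging.basicConfig(level=logging.INFO, format="%(message)s")

def log_agent_step(session: str, step: int, tool: str,
                   args: dict, status: str, reasoning_summary: str) -> None:
    """One structured, append-only audit record per agent action."""
    logger.info(json.dumps({
        "ts": time.time(),
        "session": session,
        "step": step,
        "tool": tool,
        "args": args,                    # consider redacting sensitive args
        "status": status,
        "reasoning": reasoning_summary,  # a summary, not the full trace
    }))

log_agent_step("sess-42", 1, "search_files", {"query": "Q3 report"},
               "ok", "user asked for the Q3 report")
```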
RISK
Inferred Sensitive Data
Risk that models infer sensitive personal information — such as gender, political affiliation, or health status — not contained in training data, causing privacy violations and potential harm to data subjects even when derived from public inputs.
ISD
2 Controls
CONTROL
Training Data Management
Ensure that all data used to train and evaluate models is authorised for the intended purposes, and that data likely to lead to sensitive inferences is identified and managed.
ISD-C1
CONTROL
Output Validation and Sanitization
Ensure that model outputs are blocked, nullified, or sanitised before being passed to applications, extensions, or users.
ISD-C2
Legal & Regulatory Compliance
Risks arising from the use of data that violates intellectual property rights, contractual obligations, or regulatory requirements, exposing the organisation to legal liability.
RISK
Unauthorized Training Data
Risk that a model is trained or fine-tuned on data that is not authorised for that use — including data lacking user consent, unlicensed copyrighted material, or legally restricted data — resulting in legal and ethical liability.
UTD
2 Controls
CONTROL
Training Data Management
Ensure that all data used to train and evaluate models is authorised for the intended purposes, covering consent, licensing, and regulatory requirements.
UTD-C1
CONTROL
Training Data Sanitization
Ensure that unauthorised or sensitive data is detected and removed or remediated in training and evaluation datasets.
UTD-C2
Accountability & Governance
Cross-cutting risks arising from the absence of systematic assurance practices and governance structures needed to identify, measure, and manage AI security and privacy risks across the full development lifecycle.
RISK
Inadequate AI Assurance
Risk that the absence of continuous adversarial testing, vulnerability management, threat detection, and incident response leaves AI systems exposed to undetected security and privacy failures across all risk areas.
ASR
4 Controls
CONTROL
Red Teaming
Ensure that security and privacy improvements are identified through self-driven adversarial attacks on AI infrastructure and products.
ASR-C1
CONTROL
Vulnerability Management
Ensure that production infrastructure and products are proactively and continually tested and monitored for security and privacy regressions.
ASR-C2
CONTROL
Threat Detection
Ensure that internal or external attacks on AI assets, infrastructure, and products are detected and alerted on in a timely manner.
ASR-C3
CONTROL
Incident Response Management
Ensure that a defined and exercised incident response process is in place to manage AI security and privacy incidents.
ASR-C4
RISK
Inadequate AI Governance
Risk that the absence of organisational policies, education, product governance, and risk management processes leaves AI security and privacy risks unmanaged and residual risk unmeasured across the organisation.
GVR
4 Controls
CONTROL
User Policies and Education
Ensure that easy-to-understand AI security and privacy policies and education are published for users.
GVR-C1
CONTROL
Internal Policies and Education
Ensure that comprehensive AI security and privacy policies and education are published for employees.
GVR-C2
CONTROL
Product Governance
Ensure that all AI models and products are validated against established security and privacy requirements before deployment.
GVR-C3
CONTROL
Risk Governance
Ensure that residual AI risk is inventoried, measured, and monitored across the organisation on an ongoing basis.
GVR-C4
Agentic AI Safety
Risks unique to model-based agents that can take autonomous real-world actions, where failures in reasoning, alignment, or orchestration security can cause unintended or malicious consequences at scale.
RISK
Rogue Actions
Risk that a model-based agent executes unintended actions — whether through accidental misalignment in task planning or malicious manipulation via prompt injection, poisoning, or evasion — causing harm to organisational reputation, user trust, security, and safety. Severity scales directly with the agent's level of autonomy and the breadth of permissions granted.
RA
4 Controls
CONTROL
Agent Permissions
Ensure that the least-privilege principle is applied as the upper bound on agent permissions, with contextual and dynamic access to tools and actions, to constrain the blast radius of accidental or malicious rogue actions.
RA-C1
CONTROL
Agent User Control
Ensure that user approval is required for any agent actions that alter user data or act on the user's behalf, preventing unsanctioned real-world consequences from rogue agent behaviour.
RA-C2
CONTROL
Agent Observability
Ensure that an agent's actions, tool use, and reasoning are logged and auditable, enabling timely detection of and response to rogue behaviour.
RA-C3
CONTROL
Output Validation and Sanitization
Ensure that agent outputs are normalised and sanitised before rendering, to prevent malicious content from being executed within the user's application context.
RA-C4