Download Framework
FINMA / AI Governance and Risk Management icon

FINMA / AI Governance and Risk Management

Guidance on governance and risk management when using artificial intelligence (AI) developed by the Swiss Financial Market Supervisory Authority FINMA. The guidance draws attention to the risks associated with the use of AI and describes FINMA’s observations from ongoing supervision.
Type:

Regulation

Domain:

Finance

Coverage:

Accountability & Governance

Cybersecurity

Privacy & Data

Transparency

Performance & Reliability

Legal & Compliance

Region:

Switzerland

Content:

13 Risks

35 Controls

Version: 08/2024

Framework Definition

Risks and controls associated with the framework

Assessment Layer

Concrete evaluations linked to controls to assess pass or fail
FINMA Research Assistant Mapping

Governance and accountability

Ensure AI-related risks (model, IT/cyber, legal, reputational, third-party) are covered by governance, risk management and control systems in a proportional, technology-neutral way.
RISK

Inadequate AI governance and accountability

AI use is not holistically governed; focus is narrow (e.g. only data protection), responsibilities are unclear, and model, IT/cyber, legal and reputational risks are not consistently addressed.
R.1.1
3 Controls
CONTROL

Establish AI governance framework with central inventory, risk classification and role definitions

G.1.1.1
CONTROL

Avoid decentralised, unmanaged AI development

G.1.1.2
CONTROL

Ensure appropriately skilled staff and broad training for AI roles

G.1.1.3
RISK

Third-party / outsourcing risk

AI capabilities are sourced from third parties (models, cloud, tools) without sufficient transparency, due diligence, controls or clear allocation of responsibilities and liability.
R.1.2
2 Controls
CONTROL

Strengthen due diligence on external and cloud-based AI (data, methods, AI presence)

G.1.2.1
CONTROL

Use contractual clauses and controls to govern outsourced AI responsibilities and liability

G.1.2.2

Inventory and risk classification

Maintain a complete, technology-neutral view of AI uses, with risk-based classification by materiality and probability of risk materialisation.
RISK

Incomplete AI inventory and scope

AI is defined too narrowly, development is distributed and (especially with generative AI) easily accessible, leading to incomplete inventories and blind spots.
R.2.1
2 Controls
CONTROL

Use a broad AI definition aligned with international standards (incl. traditional applications with similar risks)

G.2.1.1
CONTROL

Build and maintain complete, centrally managed AI inventories

G.2.1.2
RISK

Inadequate risk-based classification of AI use

Applications are not systematically classified based on materiality, specific risks and probability; high-impact use cases (compliance, critical functions, strong client impact) may not receive stronger controls.
R.2.2
2 Controls
CONTROL

Define criteria for AI materiality, specific risks and probability of risk materialisation

G.2.2.1
CONTROL

Prioritise higher-risk AI uses (e.g. compliance, critical functions, strong client impact)

G.2.2.2

Data quality

Ensure training and input data are complete, correct, representative and of sufficient quality (incl. unstructured and third-party data).
RISK

Poor or inappropriate data quality

Data may be incorrect, inconsistent, incomplete, unrepresentative, outdated or biased, leading to poor quality outputs and unrecognised model risk.
R.3.1
3 Controls
CONTROL

Define internal rules and controls for data completeness, correctness, integrity, availability and access

G.3.1.1
EVALUATION

Structured Data Completeness

Measures the degree to which all required attributes and time periods are present in a structured dataset, across attribute completeness and time completeness dimensions.
learn more →
EVALUATION

Structured Data Accuracy

Measures the degree to which attributes in a structured (tabular) dataset correctly represent the true value of the intended concept, across syntactic, type, and semantic accuracy dimensions.
learn more →
CONTROL

Assess representativeness, timeliness and bias of training data

G.3.1.2
EVALUATION

Structured Data Representativeness

Measures how well a structured dataset reflects the distribution of the target population or deployment environment for an AI application.
learn more →
EVALUATION

Structured Data Bias

Measures the degree to which a structured dataset contains systematic skews that cause an AI model to produce unfair or discriminatory outputs across subgroups.
learn more →
CONTROL

Address challenges of unstructured data quality assessment

G.3.1.3
EVALUATION

Unstructured Data Accuracy

Measures the degree to which attributes in an unstructured dataset correctly represent the true value of the intended concept, across syntactic, type, and semantic accuracy dimensions.
learn more →
RISK

Opaque or unsuitable vendor data

For purchased solutions, institutions have limited influence or knowledge of underlying data, which may be unsuitable or deliberately manipulated.
R.3.2
2 Controls
CONTROL

Require transparency on vendor data sources and suitability

G.3.2.1
CONTROL

Manage risk of manipulation or poisoning of external data

G.3.2.2
EVALUATION

Data Poisoning

Detects poisoned samples in a dataset that could elicit backdoor behaviour when used for LLM training — including trigger-payload pairs, sleeper agent patterns, and adversarial trigger phrases.
learn more →

Testing and ongoing monitoring

Ensure AI applications are adequately tested before and during use, with clear performance indicators, thresholds and continuous monitoring.
RISK

Inadequate testing and validation of AI models

Weaknesses in performance indicators, test design and validation undermine assurance on accuracy, robustness, stability and bias.
R.4.1
5 Controls
CONTROL

Ensure data quality tests are scheduled

G.4.1.1
CONTROL

Ensure tests for AI model quality (accuracy, robustness, stability, bias) are scheduled

G.4.1.2
EVALUATION

RAG Recall

Measures the fraction of ground-truth answer claims that are correctly reproduced by the RAG model's response.
learn more →
EVALUATION

Text Robustness

Evaluates whether AI models produce semantically consistent responses when the same question is presented with meaning-preserving input perturbations such as typos, casing changes, or paraphrasing.
learn more →
EVALUATION

RAG Hallucination

Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL

Use diverse test types (backtesting, sensitivity, adversarial, benchmark models)

G.4.1.3
CONTROL

Ensure domain experts provide predefined expectations

G.4.1.4
CONTROL

Define performance indicators in advance

G.4.1.5
RISK

Insufficient ongoing monitoring and control

Lack of regular checks, thresholds and monitoring of data drift and user overrides leads to undetected model degradation.
R.4.2
3 Controls
CONTROL

Define thresholds and validation methods to ensure ongoing output quality

G.4.2.1
CONTROL

Monitor data drift and adapt models to changes in input data

G.4.2.2
EVALUATION

Data Drift

Measures the degree to which each sample in a new dataset has drifted from the reference (training) distribution, detecting covariate shift at both the dataset and per-sample level.
learn more →
CONTROL

Analyse user overrides to identify weaknesses

G.4.2.3
RISK

Lack of fallback and exception handling

No pre-planned mechanisms exist to handle undesirable model behaviour, failures or exceptions, especially in adaptive systems.
R.4.3
2 Controls
CONTROL

Define fallback mechanisms for AI applications and test them

G.4.3.1
CONTROL

Pre-define recognition and handling of exceptions

G.4.3.2

Documentation

Ensure AI applications and their lifecycle are documented centrally and consistently for effective oversight and risk management.
RISK

Inadequate documentation standards and traceability

Documentation is missing, fragmented or not recipient-oriented, hindering understanding of model purpose, behaviour, assumptions and controls.
R.5.1
5 Controls
CONTROL

Establish centralised and recipient-oriented documentation requirements

G.5.1.1
CONTROL

Document model lifecycle: purpose, data, model selection, performance, assumptions, limitations, tests, controls, fallback

G.5.1.2
CONTROL

Document data sources and data quality checks (integrity, correctness, appropriateness, relevance, bias, stability)

G.5.1.3
CONTROL

Ensure robustness, reliability and traceability

G.5.1.4
EVALUATION

Text Robustness

Evaluates whether AI models produce semantically consistent responses when the same question is presented with meaning-preserving input perturbations such as typos, casing changes, or paraphrasing.
learn more →
EVALUATION

RAG Faithfulness

Measures the fraction of a RAG model's response claims that are grounded in the retrieved references.
learn more →
EVALUATION

RAG Hallucination

Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL

Categorise applications into risk categories with justification and review

G.5.1.5

Explainability

Ensure AI results can be understood, explained and critically assessed, especially where decisions must be justified to stakeholders.
RISK

Lack of explainability

Users cannot understand or explain model outputs, limiting ability to critically assess decisions and satisfy stakeholder expectations.
R.6.1
2 Controls
CONTROL

Improve explainability where decisions must be justified

G.6.1.1
EVALUATION

RAG Faithfulness

Measures the fraction of a RAG model's response claims that are grounded in the retrieved references.
learn more →
EVALUATION

RAG Hallucination

Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL

Ensure understanding of model drivers and behaviour under different conditions

G.6.1.2
RISK

Lack of reproducibility

Model outputs cannot be reproduced or checked for repeatability, undermining trust, auditability and critical assessment.
R.6.2
1 Control
CONTROL

Use tests and monitoring to check repeatability of model results

G.6.2.1

Independent review

Ensure independent, qualified review of AI models and development processes to reduce model risk.
RISK

Weak independent review and challenge

There is no clear separation between model development and review; few institutions conduct an independent review of the full development process with qualified staff; feedback from reviews may not be incorporated.
R.7.1
3 Controls
CONTROL

Separate AI development from independent review functions

G.7.1.1
CONTROL

Perform independent review of the entire model development process by qualified personnel

G.7.1.2
CONTROL

Ensure the review provides an objective, unbiased opinion and is fed back into model development

G.7.1.3