Download Framework
FINMA / AI Governance and Risk Management
Guidance on governance and risk management when using artificial intelligence (AI) developed by the Swiss Financial Market Supervisory Authority FINMA. The guidance draws attention to the risks associated with the use of AI and describes FINMA’s observations from ongoing supervision.
Type:
Regulation
Domain:
Finance
Coverage:
Accountability & Governance
Cybersecurity
Privacy & Data
Transparency
Performance & Reliability
Legal & Compliance
Region:
Switzerland
Content:
13 Risks
35 Controls
Version: 08/2024
Framework Definition
Risks and controls associated with the framework
Assessment Layer
Concrete evaluations linked to controls to assess pass or fail
FINMA Research Assistant Mapping
Governance and accountability
Ensure AI-related risks (model, IT/cyber, legal, reputational, third-party) are covered by governance, risk management and control systems in a proportional, technology-neutral way.
RISK
Inadequate AI governance and accountability
AI use is not holistically governed; focus is narrow (e.g. only data protection), responsibilities are unclear, and model, IT/cyber, legal and reputational risks are not consistently addressed.
R.1.1
3 Controls
CONTROL
Establish AI governance framework with central inventory, risk classification and role definitions
G.1.1.1
CONTROL
Avoid decentralised, unmanaged AI development
G.1.1.2
CONTROL
Ensure appropriately skilled staff and broad training for AI roles
G.1.1.3
RISK
Third-party / outsourcing risk
AI capabilities are sourced from third parties (models, cloud, tools) without sufficient transparency, due diligence, controls or clear allocation of responsibilities and liability.
R.1.2
2 Controls
CONTROL
Strengthen due diligence on external and cloud-based AI (data, methods, AI presence)
G.1.2.1
CONTROL
Use contractual clauses and controls to govern outsourced AI responsibilities and liability
G.1.2.2
Inventory and risk classification
Maintain a complete, technology-neutral view of AI uses, with risk-based classification by materiality and probability of risk materialisation.
RISK
Incomplete AI inventory and scope
AI is defined too narrowly, development is distributed and (especially with generative AI) easily accessible, leading to incomplete inventories and blind spots.
R.2.1
2 Controls
CONTROL
Use a broad AI definition aligned with international standards (incl. traditional applications with similar risks)
G.2.1.1
CONTROL
Build and maintain complete, centrally managed AI inventories
G.2.1.2
RISK
Inadequate risk-based classification of AI use
Applications are not systematically classified based on materiality, specific risks and probability; high-impact use cases (compliance, critical functions, strong client impact) may not receive stronger controls.
R.2.2
2 Controls
CONTROL
Define criteria for AI materiality, specific risks and probability of risk materialisation
G.2.2.1
CONTROL
Prioritise higher-risk AI uses (e.g. compliance, critical functions, strong client impact)
G.2.2.2
Data quality
Ensure training and input data are complete, correct, representative and of sufficient quality (incl. unstructured and third-party data).
RISK
Poor or inappropriate data quality
Data may be incorrect, inconsistent, incomplete, unrepresentative, outdated or biased, leading to poor quality outputs and unrecognised model risk.
R.3.1
3 Controls
CONTROL
Define internal rules and controls for data completeness, correctness, integrity, availability and access
G.3.1.1
EVALUATION
Structured Data Completeness
Measures the degree to which all required attributes and time periods are present in a structured dataset, across attribute completeness and time completeness dimensions.
learn more →
EVALUATION
Structured Data Accuracy
Measures the degree to which attributes in a structured (tabular) dataset correctly represent the true value of the intended concept, across syntactic, type, and semantic accuracy dimensions.
learn more →
CONTROL
Assess representativeness, timeliness and bias of training data
G.3.1.2
EVALUATION
Structured Data Representativeness
Measures how well a structured dataset reflects the distribution of the target population or deployment environment for an AI application.
learn more →
EVALUATION
Structured Data Bias
Measures the degree to which a structured dataset contains systematic skews that cause an AI model to produce unfair or discriminatory outputs across subgroups.
learn more →
CONTROL
Address challenges of unstructured data quality assessment
G.3.1.3
EVALUATION
Unstructured Data Accuracy
Measures the degree to which attributes in an unstructured dataset correctly represent the true value of the intended concept, across syntactic, type, and semantic accuracy dimensions.
learn more →
RISK
Opaque or unsuitable vendor data
For purchased solutions, institutions have limited influence or knowledge of underlying data, which may be unsuitable or deliberately manipulated.
R.3.2
2 Controls
CONTROL
Require transparency on vendor data sources and suitability
G.3.2.1
CONTROL
Manage risk of manipulation or poisoning of external data
G.3.2.2
EVALUATION
Data Poisoning
Detects poisoned samples in a dataset that could elicit backdoor behaviour when used for LLM training — including trigger-payload pairs, sleeper agent patterns, and adversarial trigger phrases.
learn more →
Testing and ongoing monitoring
Ensure AI applications are adequately tested before and during use, with clear performance indicators, thresholds and continuous monitoring.
RISK
Inadequate testing and validation of AI models
Weaknesses in performance indicators, test design and validation undermine assurance on accuracy, robustness, stability and bias.
R.4.1
5 Controls
CONTROL
Ensure data quality tests are scheduled
G.4.1.1
CONTROL
Ensure tests for AI model quality (accuracy, robustness, stability, bias) are scheduled
G.4.1.2
EVALUATION
RAG Recall
Measures the fraction of ground-truth answer claims that are correctly reproduced by the RAG model's response.
learn more →
EVALUATION
Text Robustness
Evaluates whether AI models produce semantically consistent responses when the same question is presented with meaning-preserving input perturbations such as typos, casing changes, or paraphrasing.
learn more →
EVALUATION
RAG Hallucination
Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL
Use diverse test types (backtesting, sensitivity, adversarial, benchmark models)
G.4.1.3
CONTROL
Ensure domain experts provide predefined expectations
G.4.1.4
CONTROL
Define performance indicators in advance
G.4.1.5
RISK
Insufficient ongoing monitoring and control
Lack of regular checks, thresholds and monitoring of data drift and user overrides leads to undetected model degradation.
R.4.2
3 Controls
CONTROL
Define thresholds and validation methods to ensure ongoing output quality
G.4.2.1
CONTROL
Monitor data drift and adapt models to changes in input data
G.4.2.2
EVALUATION
Data Drift
Measures the degree to which each sample in a new dataset has drifted from the reference (training) distribution, detecting covariate shift at both the dataset and per-sample level.
learn more →
CONTROL
Analyse user overrides to identify weaknesses
G.4.2.3
RISK
Lack of fallback and exception handling
No pre-planned mechanisms exist to handle undesirable model behaviour, failures or exceptions, especially in adaptive systems.
R.4.3
2 Controls
CONTROL
Define fallback mechanisms for AI applications and test them
G.4.3.1
CONTROL
Pre-define recognition and handling of exceptions
G.4.3.2
Documentation
Ensure AI applications and their lifecycle are documented centrally and consistently for effective oversight and risk management.
RISK
Inadequate documentation standards and traceability
Documentation is missing, fragmented or not recipient-oriented, hindering understanding of model purpose, behaviour, assumptions and controls.
R.5.1
5 Controls
CONTROL
Establish centralised and recipient-oriented documentation requirements
G.5.1.1
CONTROL
Document model lifecycle: purpose, data, model selection, performance, assumptions, limitations, tests, controls, fallback
G.5.1.2
CONTROL
Document data sources and data quality checks (integrity, correctness, appropriateness, relevance, bias, stability)
G.5.1.3
CONTROL
Ensure robustness, reliability and traceability
G.5.1.4
EVALUATION
Text Robustness
Evaluates whether AI models produce semantically consistent responses when the same question is presented with meaning-preserving input perturbations such as typos, casing changes, or paraphrasing.
learn more →
EVALUATION
RAG Faithfulness
Measures the fraction of a RAG model's response claims that are grounded in the retrieved references.
learn more →
EVALUATION
RAG Hallucination
Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL
Categorise applications into risk categories with justification and review
G.5.1.5
Explainability
Ensure AI results can be understood, explained and critically assessed, especially where decisions must be justified to stakeholders.
RISK
Lack of explainability
Users cannot understand or explain model outputs, limiting ability to critically assess decisions and satisfy stakeholder expectations.
R.6.1
2 Controls
CONTROL
Improve explainability where decisions must be justified
G.6.1.1
EVALUATION
RAG Faithfulness
Measures the fraction of a RAG model's response claims that are grounded in the retrieved references.
learn more →
EVALUATION
RAG Hallucination
Measures the rate at which a RAG model generates claims that are neither supported by the retrieved references nor correct with respect to the ground-truth answer.
learn more →
CONTROL
Ensure understanding of model drivers and behaviour under different conditions
G.6.1.2
RISK
Lack of reproducibility
Model outputs cannot be reproduced or checked for repeatability, undermining trust, auditability and critical assessment.
R.6.2
1 Control
CONTROL
Use tests and monitoring to check repeatability of model results
G.6.2.1
Independent review
Ensure independent, qualified review of AI models and development processes to reduce model risk.
RISK
Weak independent review and challenge
There is no clear separation between model development and review; few institutions conduct an independent review of the full development process with qualified staff; feedback from reviews may not be incorporated.
R.7.1
3 Controls
CONTROL
Separate AI development from independent review functions
G.7.1.1
CONTROL
Perform independent review of the entire model development process by qualified personnel
G.7.1.2
CONTROL
Ensure the review provides an objective, unbiased opinion and is fed back into model development
G.7.1.3