
Training Data Sanitisation

Detects samples in a training dataset that contain unsanitised sensitive content - including personal identifiers, national identifiers, financial account data, authentication secrets, health data, online identifiers, sensitive personal attributes, and confidential business data - that should have been scrubbed or masked before use in model training.
Tags: Privacy, Sensitive Information

Overview

The Training Data Sanitisation evaluation scans a training dataset for samples that contain sensitive content which should have been scrubbed or masked before the data was used to train or fine-tune a language model. Each sample in the dataset is inspected individually and assigned a score indicating whether it is clean or contains unsanitised sensitive information.

When organisations train or fine-tune language models on real-world data - such as customer support logs, internal documents, user-generated content, or scraped web data - that data frequently contains sensitive information that was never intended to be part of a model's learned knowledge. This includes personal identifiers such as names and contact details, authentication secrets such as API keys and passwords, health records, and confidential business data, among other categories. Without a systematic sanitisation step before training, the model may memorise and later reproduce that information verbatim or near-verbatim in its outputs, creating serious privacy, security, and legal exposure.

Metrics

Data Sanitisation Score

The average per-sample data sanitisation score across the full dataset (range: 0.0 to 1.0, higher is better). A score of 1.0 means every sample in the dataset is free of detectable unsanitised sensitive content. A score below 1.0 indicates that a proportion of samples contain sensitive content that must be remediated before training can proceed safely.

Data Sanitisation Score scale (0.0 to 1.0):

0.0: The entire dataset contains unsanitised sensitive content.
0.5: Half the dataset contains unsanitised sensitive content.
0.95: 5% of samples contain unsanitised sensitive content; even this fraction poses a meaningful memorisation risk.
1.0: No unsanitised sensitive content detected.
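The aggregate metric is simply the mean of the per-sample binary scores. A minimal sketch, assuming scores are collected as a flat list (the function name is illustrative, not part of the platform's API):

```python
# Hypothetical sketch: the aggregate Data Sanitisation Score is the mean of
# per-sample binary scores (1.0 = clean, 0.0 = sensitive content detected).

def data_sanitisation_score(sample_scores: list[float]) -> float:
    """Average per-sample score across the dataset (0.0-1.0, higher is better)."""
    if not sample_scores:
        raise ValueError("dataset is empty")
    return sum(sample_scores) / len(sample_scores)

# 19 clean samples and 1 flagged sample -> 0.95, i.e. 5% of the dataset
# still contains unsanitised sensitive content.
scores = [1.0] * 19 + [0.0]
print(data_sanitisation_score(scores))  # 0.95
```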

Motivation

Language models do not merely process training data - they memorise it. Models can be induced to reproduce text from their training corpora verbatim, including names, contact details, credentials, and other sensitive content. Once sensitive data reaches a model's weights, it cannot be reliably removed - unlearning specific memorised content is technically difficult and unreliable. The consequences range from re-identification of individuals and leakage of valid credentials to regulatory violations and reputational damage when models are found to reproduce sensitive content in production.

Methodology

  1. Samples: Each sample in the dataset - typically a text document, prompt-completion pair, or conversational turn - is scored independently.
  2. Sensitive content detection: Each sample is assessed by the Sensitive Content Scorer, which examines the text for the presence of unsanitised sensitive information across all eight categories defined below.
  3. Output: A binary per-sample score (1.0 for clean, 0.0 for sensitive content detected) and an aggregate data sanitisation score across the full dataset.
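The per-sample loop described in the steps above can be sketched as follows. The `contains_sensitive_content` predicate stands in for the Sensitive Content Scorer and is a deliberately simplistic placeholder (a single illustrative pattern), not the platform's actual detection logic:

```python
# Minimal sketch of the per-sample scoring loop: each sample is scored
# independently, then the binary scores are averaged into the dataset score.
import re

def contains_sensitive_content(text: str) -> bool:
    # Placeholder check standing in for the real scorer: one illustrative
    # pattern (US Social Security Number) rather than all eight categories.
    return re.search(r"\b\d{3}-\d{2}-\d{4}\b", text) is not None

def score_dataset(samples: list[str]) -> dict:
    per_sample = [0.0 if contains_sensitive_content(s) else 1.0 for s in samples]
    return {
        "per_sample": per_sample,
        "data_sanitisation_score": sum(per_sample) / len(per_sample),
    }

report = score_dataset([
    "Does the Pro plan include API access?",
    "My SSN is 123-45-6789, please update my file.",
])
print(report["data_sanitisation_score"])  # 0.5
```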

The scorer assesses whether sensitive content is present in its raw, identifiable form. Content that has been correctly scrubbed (removed entirely) or masked (replaced with a placeholder such as [REDACTED], ***, or a synthetic substitute) is considered clean. Content that has been only partially masked, truncated, or obfuscated in a way that still allows the original value to be inferred or reconstructed is considered unsanitised.

Sensitive Content Categories

The scorer detects sensitive content across the following eight categories. A sample is flagged as unsanitised (score 0.0) if it contains sensitive content from any one or more of these categories.

  • Direct personal identifiers: A full name combined with any contact detail - email address, phone number, or physical address. A name alone is typically low-risk; the combination with a means of contact is what creates an identification and targeting risk. Examples: Alice Johnson, [email protected]; Bob Smith, 07700 900123.
  • National identifiers: Government-issued identifiers that uniquely identify an individual within a jurisdiction, always sensitive regardless of context. Examples: Social Security Numbers (123-45-6789); passport numbers; tax identification numbers; driver's licence numbers.
  • Financial account data: Identifiers that provide direct access to a financial account. Examples: payment card numbers (4111 1111 1111 1111); IBANs (GB29 NWBK 6016 1331 9268 19); bank account numbers and sort codes.
  • Authentication secrets: Credentials and secrets that grant access to systems or services. A leaked key or password may still be valid long after it was written, and is commonly found in training data derived from code repositories, wikis, or application logs. Examples: plaintext passwords; API keys (sk-abc123...); OAuth and JWT secrets; SSH and TLS private keys.
  • Health and medical data: An individual's physical or mental health, diagnoses, treatments, or prescriptions, when tied to an identifiable person. Frequently present in customer support logs, HR systems, and insurance correspondence. Examples: John Doe, diagnosed with Type 2 diabetes; Employee Alice - currently on antidepressants.
  • Online identifiers: Identifiers generated through online activity that can be linked back to an individual or device. Among the most commonly overlooked categories in web-scraped or analytics-derived training data. Examples: IP addresses (192.168.1.105); persistent cookie identifiers; advertising identifiers (IDFA, GAID); device identifiers.
  • Sensitive personal attributes: Personal characteristics whose exposure creates discrimination and harm risks. Commonly found in HR records, forum posts, and user profiles. Includes: racial or ethnic origin; political opinions; religious or philosophical beliefs; trade union membership; sexual orientation or gender identity; genetic data.
  • Confidential business data: Proprietary information that, if memorised by a model, could cause competitive harm or legal liability. This is a common risk in enterprise fine-tuning, where internal documents enter training pipelines without review. Examples: internal pricing; unreleased product details; M&A discussions; legal strategy; board minutes.
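A few of these categories can be approximated with lightweight patterns, as in the sketch below. These detectors are illustrative only, not the Sensitive Content Scorer's actual logic: production detection typically combines such patterns with NER models and checksum validation, and every pattern and name here is an assumption.

```python
# Hedged sketch: simple regex detectors for a subset of the categories above.
import re

DETECTORS = {
    "national_identifier": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN
    "financial_account": re.compile(r"\b(?:\d[ -]?){13,19}\b"),         # card-like digit run
    "authentication_secret": re.compile(r"\bsk-[A-Za-z0-9]{10,}\b"),    # API-key-shaped token
    "online_identifier": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),    # IPv4 address
}

def detect_categories(text: str) -> set[str]:
    """Return the names of every category whose pattern matches the text."""
    return {name for name, rx in DETECTORS.items() if rx.search(text)}

print(sorted(detect_categories("Client at 192.168.1.105 used key sk-abc123def456")))
# ['authentication_secret', 'online_identifier']
```

A sample would then be flagged (score 0.0) whenever `detect_categories` returns a non-empty set.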

Scoring

Sensitive Content Scorer

Score value: Explanation
1.0: The sample contains no detectable unsanitised sensitive content across any of the eight categories.
0.0: The sample contains unsanitised sensitive content from one or more categories.

Examples

Clean sample - no sensitive content

Sample
source: zendesk-export-2024-q3.jsonl
text: Customer: Does the Pro plan include API access, or is that only available on the Enterprise tier? Agent: API access is available on both the Pro and Enterprise plans. The main difference is rate limits - Pro gets 1,000 requests/day and Enterprise is unlimited.
Sensitive Content Scorer
1.0: The sample contains no personal identifiers, credentials, health data, or any other sensitive content.

Flagged sample - direct personal identifier

Sample
source: zendesk-export-2024-q3.jsonl
text: Customer: Hi, I'm Alice Novak, DOB 14/06/1983, and I need to update my billing address to 27 Maple Road, Leeds, LS1 4AP. Agent: Thanks Alice, I've updated the address on your account.
Sensitive Content Scorer
0.0: The sample contains a full name, date of birth, and home address - a combination of direct personal identifiers that should have been scrubbed or masked before training.

Flagged sample - authentication secret

Sample
source: confluence-export/engineering/deployments.txt
text: Production deployment checklist - last updated 2024-01-12 Database connection string: postgresql://admin:Xk9#[email protected]:5432/appdb Ensure the above is set in the environment before running migrations.
Sensitive Content Scorer
0.0: The sample contains a database connection string with embedded credentials. This authentication secret should be removed or replaced with a placeholder before training.

Flagged sample - health data in HR record

Sample
employee_id: EMP-3847
full_name: Marcus Reid
department: Engineering
absence_reason: Signed off by GP for anxiety and depression
return_date: 2024-04-01
Sensitive Content Scorer
0.0: The sample contains a mental health diagnosis tied to a named, identifiable individual. This should be removed or fully anonymised before training.

Clean sample - correctly masked PII

Sample
source: zendesk-export-2024-q3.jsonl
text: Customer: Hi, I'm [REDACTED], DOB [REDACTED], and I need to update my billing address to [REDACTED]. Agent: Thanks, I've updated the address on your account.
Sensitive Content Scorer
1.0: The personal identifiers have been correctly masked with [REDACTED] placeholders. The sample is safe for training.
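The contrast between the flagged and masked samples above can be sketched with a single pattern check. This assumes (as the Methodology section describes) that fully replaced placeholders leave nothing for a detector to match; the `DOB` pattern is illustrative:

```python
# Sketch: a fully masked sample contains no matchable sensitive values,
# so a pattern-based scorer marks it clean.
import re

DOB = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")  # illustrative date-of-birth pattern

raw = "Hi, I'm Alice Novak, DOB 14/06/1983."
masked = "Hi, I'm [REDACTED], DOB [REDACTED]."

print(DOB.search(raw) is not None)     # True  -> flagged, score 0.0
print(DOB.search(masked) is not None)  # False -> clean, score 1.0
```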

Run Evaluation in LatticeFlow AI Platform

Use the following CLI command to initialise and run the evaluation in the LatticeFlow AI Platform.
Requires the LatticeFlow AI Platform CLI.
lf init --atlas training_data_sanitisation


Don't have the LatticeFlow AI Platform?

Contact us to see this evaluation in action.