Structured Data Bias
Data Quality
Bias & Fairness
Overview
The Structured Data Bias evaluation measures the extent to which systematic skews in a structured dataset cause an AI model to produce outputs that are unfair or discriminatory across subgroups. Each sample is inspected for disparate outcomes across protected attributes, and the result is aggregated into a bias metric.
The evaluation covers:
- Subgroup partitioning: records are segmented by the values of protected or sensitive attributes (e.g. age, gender, geography, client segment).
- Outcome disparity: model predictions, scores, or decisions are compared across subgroups to detect systematic gaps.
- Bias scoring: the degree of disparity is quantified as a single score from 0.0 (equitable) to 1.0 (maximum systematic disadvantage for at least one group).
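The first two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation's actual implementation: the record layout, the `age_band` and `approved` field names, and the helper functions `approval_rates_by_group` and `max_rate_gap` are all hypothetical, and the largest gap in positive-outcome rate is only one of several possible disparity measures.

```python
from collections import defaultdict


def approval_rates_by_group(records, attribute, outcome="approved"):
    """Partition records by a protected attribute and return each
    subgroup's positive-outcome rate (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for rec in records:
        group = rec[attribute]
        if rec[outcome]:
            counts[group][0] += 1
        counts[group][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def max_rate_gap(rates):
    """Largest difference in outcome rate between any two subgroups."""
    return max(rates.values()) - min(rates.values())


# Illustrative toy data, not taken from any real dataset.
records = [
    {"age_band": "18-30", "approved": True},
    {"age_band": "18-30", "approved": True},
    {"age_band": "18-30", "approved": False},
    {"age_band": "18-30", "approved": True},
    {"age_band": "60+", "approved": True},
    {"age_band": "60+", "approved": False},
    {"age_band": "60+", "approved": False},
    {"age_band": "60+", "approved": False},
]

rates = approval_rates_by_group(records, "age_band")
gap = max_rate_gap(rates)
print(rates, gap)
```

Because the rate gap is already bounded between 0.0 and 1.0, it could serve as a raw disparity signal, but how the Bias Scorer actually maps disparities onto its 0.0 to 1.0 range is not specified here.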
Metrics
Bias
The aggregate measure of systematic disparity in model outputs across protected subgroups in the dataset (range: 0.0 to 1.0). A score of 0.0 means no detectable disparity; a score of 1.0 means maximum systematic disadvantage for at least one group.
Motivation
Historical structured data commonly encodes past discrimination in areas such as lending decisions, hiring outcomes, and insurance pricing, and an AI model will learn and reproduce that discrimination unless the bias is detected and mitigated. A dataset can appear statistically sound in aggregate while still producing harmful disparities for specific subgroups, and those disparities stay invisible unless the data is explicitly partitioned by protected attributes.
This evaluation surfaces those disparities before the data is used for training or inference, so that remediation (reweighting, resampling, or targeted data collection) can be applied with knowledge of which subgroups are most affected.
Methodology
- Samples: Each sample in the dataset is a subgroup outcome comparison for a given protected attribute.
- Scoring: Each sample is scored by the Bias Scorer, which measures the degree of disparity in model outputs across the identified subgroups. Because each sample corresponds to one protected attribute, this yields per-attribute bias scores, which are averaged into an aggregate bias metric.
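The aggregation step can be illustrated as follows. This sketch assumes an unweighted arithmetic mean over per-attribute scores that each already lie in the 0.0 to 1.0 range; the function name `aggregate_bias` and the example attribute scores are hypothetical, and the real evaluation may weight attributes differently.

```python
def aggregate_bias(per_attribute_scores):
    """Average per-attribute bias scores into one aggregate metric.

    Assumes an unweighted mean and scores in [0.0, 1.0]; both are
    illustrative assumptions rather than documented behavior.
    """
    if not per_attribute_scores:
        raise ValueError("at least one per-attribute score is required")
    for attribute, score in per_attribute_scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {attribute!r} is outside [0.0, 1.0]")
    return sum(per_attribute_scores.values()) / len(per_attribute_scores)


# Illustrative per-attribute scores, not real evaluation results.
scores = {"age": 0.5, "gender": 0.25, "geography": 0.0}
aggregate = aggregate_bias(scores)
print(aggregate)
```

One consequence of a plain mean is that a severe disparity on one attribute can be diluted by equitable results on others, so the per-attribute scores are worth reviewing alongside the aggregate.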
Scoring
Bias Scorer
Examples
Equitable credit scoring - no age-based disparity (passing)
Credit scores and approval rates are equitably distributed across all age groups. The maximum score gap of 1.6 points and approval rate gap of 3% are within acceptable tolerance, indicating no systematic age-based bias in the dataset.
Biased credit scoring - significant age-based disparity (failing)
Credit scores and approval rates drop systematically with age. The 24-point score gap and 35% approval rate gap between the youngest and oldest groups are substantial, indicating that the dataset has encoded historical age-based discrimination that the model is reproducing.
Moderate geographic bias - one region underserved (partial)
The southern region shows a notable score gap (7.4 points) and approval rate gap (12%) compared to other regions. This moderate disparity warrants investigation into whether geography is acting as a proxy for protected attributes and driving the difference.