Structured Data Bias
Data Quality
Bias & Fairness
Overview
The Structured Data Bias evaluation measures the extent to which systematic skews in a structured dataset cause an AI model to produce outputs that are unfair or discriminatory across subgroups. Each sample is inspected for disparate outcomes across protected attributes, and the result is aggregated into a bias metric.
The evaluation covers:
- Subgroup partitioning: records are segmented by the values of protected or sensitive attributes (e.g. age, gender, geography, client segment).
- Outcome disparity: model predictions, scores, or decisions are compared across subgroups to detect systematic gaps.
- Bias scoring: the degree of disparity is quantified as a single score from 0.0 (equitable) to 1.0 (maximum systematic disadvantage for at least one group).
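The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Bias Scorer's actual implementation: this page does not specify the disparity formula, so the sketch assumes the gap between the highest and lowest positive-outcome rates across subgroups (often called the demographic parity difference), which naturally falls between 0.0 and 1.0. The function names, the `age_band` attribute, and the `approved` outcome field are hypothetical.

```python
from collections import defaultdict


def subgroup_rates(records, attribute, outcome="approved"):
    """Partition records by `attribute` and return the positive-outcome rate per subgroup."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[attribute]
        totals[group] += 1
        if row[outcome]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_score(records, attribute, outcome="approved"):
    """Assumed scoring rule: largest gap in positive-outcome rate between any two subgroups."""
    rates = subgroup_rates(records, attribute, outcome)
    if len(rates) < 2:
        return 0.0  # a single subgroup has nothing to be compared against
    return max(rates.values()) - min(rates.values())


# Hypothetical credit decisions, four records per age band.
records = [
    {"age_band": "18-30", "approved": True},
    {"age_band": "18-30", "approved": False},
    {"age_band": "18-30", "approved": False},
    {"age_band": "18-30", "approved": False},
    {"age_band": "31-60", "approved": True},
    {"age_band": "31-60", "approved": True},
    {"age_band": "31-60", "approved": True},
    {"age_band": "31-60", "approved": False},
]

score = disparity_score(records, "age_band")
print(subgroup_rates(records, "age_band"))
print(score)
```

Under this assumed rule, a score of 0.0 means every subgroup receives the positive outcome at the same rate, and 1.0 means one subgroup always receives it while another never does.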
Metrics
Bias
The aggregate measure of systematic disparity in model outputs across protected subgroups in the dataset (range: 0.0 to 1.0). A score of 0.0 means no detectable disparity; a score of 1.0 means maximum systematic disadvantage for at least one group.
Motivation
Historical structured data commonly encodes past discrimination (for example in lending decisions, hiring outcomes, and insurance pricing), which an AI model will learn and reproduce unless the bias is detected and mitigated. A dataset can appear statistically sound in aggregate while still producing harmful disparities for specific subgroups, and those disparities remain invisible unless the data is explicitly partitioned by protected attributes.
This evaluation surfaces those disparities before the data is used for training or inference, so that remediation (reweighting, resampling, or targeted data collection) can be applied with knowledge of which subgroups are most affected.
Methodology
- Samples: Each sample in the dataset is a comparison of model outcomes across the subgroups of a given protected attribute.
- Scoring: Each sample is scored by the Bias Scorer, which measures the degree of disparity in model outputs across the identified subgroups. Per-attribute bias scores are averaged into an aggregate bias metric.
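The averaging step can be sketched as follows. This is an illustration under stated assumptions: the function name `aggregate_bias` and the per-attribute scores are hypothetical, and an unweighted mean is assumed because the text above says only that scores are "averaged".

```python
from statistics import mean


def aggregate_bias(per_attribute_scores):
    """Average per-attribute bias scores (each in [0.0, 1.0]) into one aggregate value."""
    if not per_attribute_scores:
        raise ValueError("at least one per-attribute score is required")
    for attribute, value in per_attribute_scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {attribute!r} is outside [0.0, 1.0]: {value}")
    return mean(list(per_attribute_scores.values()))


# Hypothetical per-attribute scores, as a scorer might produce them.
per_attribute = {"age": 0.5, "gender": 0.25, "geography": 0.0}

aggregate = aggregate_bias(per_attribute)
worst_attribute = max(per_attribute, key=per_attribute.get)
print(aggregate, worst_attribute)
```

Because a plain mean can hide one strongly biased attribute behind several equitable ones, it may be worth reporting the highest per-attribute score alongside the aggregate, as `worst_attribute` does here.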
Scoring
Bias Scorer
Examples
Equitable credit scoring - no age-based disparity (passing)
Biased credit scoring - significant age-based disparity (failing)
Moderate geographic bias - one region underserved (partial)