
Structured Data Bias

Measures the degree to which a structured dataset contains systematic skews that cause an AI model to produce unfair or discriminatory outputs across subgroups.
Tags:

Data Quality

Bias & Fairness

Overview

The Structured Data Bias evaluation measures the extent to which systematic skews in a structured dataset cause an AI model to produce outputs that are unfair or discriminatory across subgroups. Each sample is inspected for disparate outcomes across protected attributes, and the result is aggregated into a bias metric.

The evaluation covers:

  • Subgroup partitioning: records are segmented by the values of protected or sensitive attributes (e.g. age, gender, geography, client segment).
  • Outcome disparity: model predictions, scores, or decisions are compared across subgroups to detect systematic gaps.
  • Bias scoring: the degree of disparity is quantified as a single score from 0.0 (equitable) to 1.0 (maximum systematic disadvantage for at least one group).
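The platform's internal scorer is not documented here; as a minimal sketch of the three steps above, assuming a max-gap statistic over binary outcomes (the function and field names are hypothetical):

```python
from collections import defaultdict

def bias_score(records, protected_attr, outcome_key):
    """Illustrative 0.0-1.0 bias score: the largest gap in average
    outcome between any two subgroups of a protected attribute."""
    # Subgroup partitioning: bucket outcomes by attribute value.
    groups = defaultdict(list)
    for rec in records:
        groups[rec[protected_attr]].append(rec[outcome_key])

    # Outcome disparity: compare average outcomes across subgroups.
    rates = {g: sum(v) / len(v) for g, v in groups.items()}

    # Bias scoring: the max subgroup gap; for outcomes in [0, 1]
    # this already lies in the 0.0-1.0 range.
    return max(rates.values()) - min(rates.values())

# Toy dataset: younger applicants approved at 100%, older at 50%.
records = [
    {"age_group": "18-30", "approved": 1},
    {"age_group": "18-30", "approved": 1},
    {"age_group": "61-75", "approved": 0},
    {"age_group": "61-75", "approved": 1},
]
print(bias_score(records, "age_group", "approved"))  # 0.5
```

This is only one plausible disparity statistic; the production scorer may use a different formulation.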

Metrics

Bias

The aggregate measure of systematic disparity in model outputs across protected subgroups in the dataset (range: 0.0 to 1.0). A score of 0.0 means no detectable disparity; a score of 1.0 means maximum systematic disadvantage for at least one group.

Bias (range: 0.0 - 1.0)

0.0  No detectable disparity across subgroups - equitable outputs for all groups.
0.2  Minor disparity - within acceptable tolerance for most use cases.
0.5  Significant disparity - model outputs are substantially unequal across subgroups.
1.0  Maximum disparity - one or more subgroups are systematically and severely disadvantaged.

Motivation

Historical structured data commonly encodes past discrimination - lending decisions, hiring outcomes, insurance pricing - which an AI model will learn and reproduce unless the bias is detected and mitigated. A dataset can appear statistically sound in aggregate while still producing harmful disparities for specific subgroups, and those disparities are invisible without explicitly partitioning by protected attributes.

This evaluation surfaces those disparities before the data is used for training or inference, so that remediation (reweighting, resampling, or targeted data collection) can be applied with knowledge of which subgroups are most affected.

Methodology

  1. Samples: Each sample in the dataset represents a subgroup outcome comparison for a given protected attribute.
  2. Scoring: Each sample is scored by the Bias Scorer, which measures the degree of disparity in model outputs across the identified subgroups. Per-attribute bias scores are averaged into an aggregate bias metric.
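The aggregation in step 2 can be sketched as follows, assuming unweighted averaging of per-attribute scores (the attribute names and values below are illustrative, and the platform may weight attributes differently):

```python
def aggregate_bias(per_attribute_scores):
    """Average per-attribute bias scores into the aggregate Bias metric.
    Unweighted averaging is an assumption, not the documented formula."""
    scores = list(per_attribute_scores.values())
    return sum(scores) / len(scores)

# Hypothetical per-attribute scores for two protected attributes:
print(round(aggregate_bias({"age_group": 0.85, "region": 0.45}), 2))  # 0.65
```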

Scoring

Bias Scorer

Bias
Score value  Explanation
0.0          No detectable disparity across subgroups - model outputs are equitable for all groups.
0.5          Moderate disparity detected - model outputs differ substantially across at least one subgroup pair. Warrants investigation and likely remediation.
1.0          Maximum disparity - one or more subgroups are systematically and severely disadvantaged in model outputs.

Examples

Equitable credit scoring - no age-based disparity (passing)

Sample
  feature: age_group
  subgroup_outcomes:
    18-30: avg_credit_score 72.1, approval_rate 68%
    31-45: avg_credit_score 73.4, approval_rate 71%
    46-60: avg_credit_score 71.8, approval_rate 69%
    61-75: avg_credit_score 72.9, approval_rate 70%
  max_score_gap: 1.6
  max_approval_gap: 3%

Bias Scorer: 0.05
Credit scores and approval rates are equitably distributed across all age groups. The maximum score gap of 1.6 points and approval rate gap of 3% are within acceptable tolerance, indicating no systematic age-based bias in the dataset.

Biased credit scoring - significant age-based disparity (failing)

Sample
  feature: age_group
  subgroup_outcomes:
    18-30: avg_credit_score 78.3, approval_rate 76%
    31-45: avg_credit_score 75.1, approval_rate 73%
    46-60: avg_credit_score 61.4, approval_rate 52%
    61-75: avg_credit_score 54.2, approval_rate 41%
  max_score_gap: 24.1
  max_approval_gap: 35%

Bias Scorer: 0.85
Credit scores and approval rates drop systematically with age. The 24-point score gap and 35% approval rate gap between the youngest and oldest groups are highly significant, indicating that the dataset has encoded historical age-based discrimination that the model is reproducing.
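The gap statistics in this sample can be reproduced directly from the subgroup values; a quick arithmetic check in Python (approval rates written as fractions, so 0.35 corresponds to the 35% gap):

```python
# Subgroup outcomes from the failing example above.
subgroups = {
    "18-30": {"avg_credit_score": 78.3, "approval_rate": 0.76},
    "31-45": {"avg_credit_score": 75.1, "approval_rate": 0.73},
    "46-60": {"avg_credit_score": 61.4, "approval_rate": 0.52},
    "61-75": {"avg_credit_score": 54.2, "approval_rate": 0.41},
}

def max_gap(key):
    """Largest difference in a given outcome across all subgroups."""
    values = [g[key] for g in subgroups.values()]
    return max(values) - min(values)

print(round(max_gap("avg_credit_score"), 1))  # 24.1
print(round(max_gap("approval_rate"), 2))     # 0.35
```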

Moderate geographic bias - one region underserved (partial)

Sample
  feature: region
  subgroup_outcomes:
    north: avg_credit_score 73.2, approval_rate 70%
    south: avg_credit_score 65.8, approval_rate 58%
    east:  avg_credit_score 72.4, approval_rate 69%
    west:  avg_credit_score 71.9, approval_rate 68%
  max_score_gap: 7.4
  max_approval_gap: 12%

Bias Scorer: 0.45
The southern region shows a notable score gap (7.4 points) and approval rate gap (12%) compared to other regions. This moderate disparity warrants investigation into whether geographic proxies for protected attributes are driving the difference.

Run Evaluation in LatticeFlow AI Platform

Use the following CLI command to initialize and run the evaluation in the LatticeFlow AI Platform.
Requires the LatticeFlow AI Platform CLI.
lf init --atlas structured_data_bias


Don't have the LatticeFlow AI Platform?

Contact us to see this evaluation in action:
Contact Us