Structured Data Completeness
Data Quality
Overview
The Structured Data Completeness evaluation measures how complete a structured dataset is with respect to the attributes required by an AI system. Each record is inspected for missing or null values, and the results are aggregated into a completeness metric.
Metrics
Data Completeness
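The fraction of required attribute values that are populated (non-null and non-empty) across all records. The metric ranges from 0.0 (no required value is populated) to 1.0 (every required value is populated).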
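One way to write this metric down, as a sketch rather than the evaluation's official definition: the symbols N (number of records), K (number of required attributes per record), x_ij (the value of attribute j in record i), and s_i (the score of record i) are notation assumed here and do not appear in the source. The formula also assumes that every record has the same K required attributes, in which case the mean of the per-record scores equals the overall fraction of populated values.

```latex
s_i = \frac{1}{K} \sum_{j=1}^{K} \mathbf{1}\left[ x_{ij} \text{ is populated} \right],
\qquad
C = \frac{1}{N} \sum_{i=1}^{N} s_i
  = \frac{1}{NK} \sum_{i=1}^{N} \sum_{j=1}^{K} \mathbf{1}\left[ x_{ij} \text{ is populated} \right]
```

For illustration, with N = 4 records and K = 5 required attributes there are 20 required values; if 17 of them are populated, C = 17 / 20 = 0.85.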
Motivation
Missing values are common and expected in tabular data, and many imputation techniques exist to address them. This evaluation is not designed to enforce 100% completeness. Instead, it ensures that:
- Completeness requirements are explicitly documented: the acceptable threshold for each attribute is defined and version-controlled alongside the dataset.
- Requirements are not violated over time: if completeness falls below the documented threshold, the evaluation flags the regression before the data reaches production.
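The two guarantees above could be checked along the following lines. This is a minimal sketch, not the evaluation's actual implementation: the `THRESHOLDS` mapping, the attribute names, and the functions `attribute_completeness` and `find_violations` are all hypothetical, and the sketch treats only `None` and the empty string as missing.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical per-attribute thresholds. In practice these would live in a
# version-controlled file stored alongside the dataset.
THRESHOLDS: Dict[str, float] = {"age": 0.95, "income": 0.70, "country": 1.00}


def attribute_completeness(
    records: Sequence[Mapping[str, Any]], attribute: str
) -> float:
    """Fraction of records in which `attribute` is neither None nor ""."""
    if not records:
        raise ValueError("cannot measure completeness of an empty dataset")
    present = sum(1 for record in records if record.get(attribute) not in (None, ""))
    return present / len(records)


def find_violations(
    records: Sequence[Mapping[str, Any]], thresholds: Mapping[str, float]
) -> Dict[str, float]:
    """Return the observed completeness of every attribute below its threshold."""
    violations: Dict[str, float] = {}
    for attribute, minimum in thresholds.items():
        observed = attribute_completeness(records, attribute)
        if observed < minimum:
            violations[attribute] = observed
    return violations


# Illustrative data only: "age" is populated in 3 of 4 records (0.75 < 0.95).
records = [
    {"age": 30, "income": 100, "country": "US"},
    {"age": 41, "income": None, "country": "FR"},
    {"age": None, "income": 70, "country": "DE"},
    {"age": 25, "income": 50, "country": "JP"},
]
violations = find_violations(records, THRESHOLDS)
print(violations)
```

Returning the observed value for each violated attribute, rather than a single pass/fail flag, makes it easier to see how far completeness has drifted from the documented requirement.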
A complementary model-performance evaluation connects completeness to model outputs by checking that model quality does not degrade for samples with low completeness.
Methodology
- Samples: Each record in the dataset is scored independently based on the presence and population of its required attributes.
- Scoring: Each sample is scored by the Completeness Scorer, which checks whether each required attribute is populated (non-null and non-empty). The per-sample scores are then averaged across the dataset to produce the aggregate Data Completeness metric.
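The per-record scoring and averaging described above can be sketched as follows. The source does not specify the Completeness Scorer's interface, so the function names (`is_populated`, `score_record`, `data_completeness`), the `REQUIRED` attribute list, and the sample records are assumptions made for illustration. Treating NaN, whitespace-only strings, and empty collections as unpopulated is likewise an assumption about what "non-null, non-empty" covers.

```python
from typing import Any, Mapping, Sequence


def is_populated(value: Any) -> bool:
    """Assumed rule: None, NaN, blank strings, and empty collections are missing."""
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN is the only float != itself
        return False
    if isinstance(value, str):
        return value.strip() != ""
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) > 0
    return True  # any other value, including 0 and False, counts as populated


def score_record(record: Mapping[str, Any], required: Sequence[str]) -> float:
    """Fraction of required attributes that are populated in one record."""
    if not required:
        raise ValueError("at least one required attribute must be specified")
    populated = sum(is_populated(record.get(name)) for name in required)
    return populated / len(required)


def data_completeness(
    records: Sequence[Mapping[str, Any]], required: Sequence[str]
) -> float:
    """Average of the per-record scores across the dataset."""
    if not records:
        raise ValueError("cannot score an empty dataset")
    return sum(score_record(record, required) for record in records) / len(records)


# Hypothetical required attributes and records, for illustration only.
REQUIRED = ["age", "income", "country"]
complete = {"age": 34, "income": 52000.0, "country": "DE"}
incomplete = {"age": None, "income": float("nan"), "country": ""}
partial = {"age": 34, "income": None, "country": "DE"}

for name, record in [("complete", complete), ("incomplete", incomplete), ("partial", partial)]:
    print(name, score_record(record, REQUIRED))
print("aggregate", data_completeness([complete, incomplete, partial], REQUIRED))
```

In this sketch an attribute absent from the record is handled the same way as an explicit null, because `record.get(name)` returns `None` for missing keys.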
Scoring
Completeness Scorer
Examples
Complete record - all required attributes present (passing)
Incomplete record - required attributes missing (failing)
Partially complete record (partial)