structured_data_completeness

Structured Data Completeness

Measures the degree to which required attributes and time periods are present in a structured dataset, covering two dimensions: attribute completeness and time completeness.

Tags:

Data Quality

Overview

The Structured Data Completeness evaluation measures how complete a structured dataset is with respect to the attributes required by an AI system. Each record is inspected for missing or null values, and the results are aggregated into a completeness metric.

Metrics

Data Completeness

The fraction of required attributes that are present and non-null across all records (range: 0.0 to 1.0).
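As an illustration, the definition above can be written as a formula. The notation is introduced here and is not taken from the platform: N is the number of records, K is the number of required attributes (assumed to be the same for every record), and x_{ij} is the value of required attribute j in record i.

```latex
\[
\text{Data Completeness}
  = \frac{1}{N K} \sum_{i=1}^{N} \sum_{j=1}^{K}
    \mathbb{1}\!\left[\, x_{ij} \text{ is present and non-null} \,\right]
\]
```

Under the assumption that every record has the same K required attributes, this value equals the mean of the per-record scores described under Methodology.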

0.0: Dataset is entirely empty or all required attributes are null - unusable for AI processing.

0.5: Half of expected attribute values are missing - significant gaps that will degrade model outputs and likely violate documented completeness thresholds.

0.95: Minor incompleteness - 5% of attribute values are missing. Whether this is acceptable depends on the documented completeness threshold for each attribute.

1.0: Perfect completeness - all required attributes are fully populated across all records.

Motivation

Missing values are common and expected in tabular data, and many imputation techniques exist to address them. This evaluation is not designed to enforce 100% completeness. Instead, it ensures that:

Completeness requirements are explicitly documented - the acceptable threshold for each attribute is defined and version-controlled alongside the dataset.
Requirements are not violated over time - if completeness degrades below the documented threshold, the evaluation flags it before it reaches production.
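The two points above can be sketched as a per-attribute threshold check. This is a minimal illustration and not the platform's implementation: the THRESHOLDS values, the function names, and the choice to treat None and empty strings as missing are all assumptions made for this example.

```python
from typing import Any, Dict, Mapping, Sequence

# Hypothetical documented thresholds: minimum acceptable fraction of populated
# values per attribute. In practice these would be version-controlled
# alongside the dataset.
THRESHOLDS: Dict[str, float] = {
    "transaction_count": 0.99,
    "total_value_chf": 0.95,
    "risk_category": 0.90,
}


def is_populated(value: Any) -> bool:
    """Treat None and empty strings as missing (an assumption of this sketch)."""
    return value is not None and value != ""


def attribute_completeness(
    records: Sequence[Mapping[str, Any]], attribute: str
) -> float:
    """Fraction of records in which the attribute is populated (0.0 if no records)."""
    if not records:
        return 0.0
    return sum(is_populated(r.get(attribute)) for r in records) / len(records)


def find_violations(
    records: Sequence[Mapping[str, Any]], thresholds: Mapping[str, float]
) -> Dict[str, float]:
    """Return {attribute: observed completeness} for attributes below their threshold."""
    violations: Dict[str, float] = {}
    for attribute, minimum in thresholds.items():
        observed = attribute_completeness(records, attribute)
        if observed < minimum:
            violations[attribute] = observed
    return violations
```

A call such as find_violations(records, THRESHOLDS) returns an empty dictionary when every attribute meets its documented threshold, which makes it straightforward to fail a pipeline step on any non-empty result.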

A complementary model-performance evaluation connects completeness to model outputs by checking that model quality does not degrade for samples with low completeness.

Methodology

Samples: Each record in the dataset is scored independently based on the presence and population of its required attributes.
Scoring: Each sample is scored by the Completeness Scorer, which checks whether required attributes are populated (non-null, non-empty). Scores are averaged into an aggregate completeness metric.
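The two steps above can be sketched in a few lines of Python. This is an illustrative approximation of the described behavior, not the Completeness Scorer itself: the REQUIRED list, the function names, and the rule that whitespace-only strings count as empty are assumptions of this sketch.

```python
from typing import Any, Mapping, Sequence

# Hypothetical list of required attributes for this sketch.
REQUIRED = ("transaction_count", "total_value_chf", "account_status", "risk_category")


def is_populated(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    if value is None:
        return False
    if isinstance(value, str) and value.strip() == "":
        return False
    return True


def score_record(record: Mapping[str, Any], required: Sequence[str] = REQUIRED) -> float:
    """Fraction of required attributes that are populated in one record."""
    if not required:
        raise ValueError("at least one required attribute is needed")
    populated = sum(is_populated(record.get(name)) for name in required)
    return populated / len(required)


def data_completeness(
    records: Sequence[Mapping[str, Any]], required: Sequence[str] = REQUIRED
) -> float:
    """Mean of the per-record scores; an empty dataset scores 0.0."""
    if not records:
        return 0.0
    return sum(score_record(r, required) for r in records) / len(records)
```

Attributes that are absent from a record are handled the same way as attributes that are present but null, because record.get returns None in both cases.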

Scoring

Completeness Scorer

Completeness

Score value: Explanation

1.0: All required attributes are present and populated. No missing values.

0.5: Some required attributes are null or empty. Completeness is below the documented threshold.

0.0: All required attributes are missing. The record is unusable for AI processing.

Examples

Complete record - all required attributes present (passing)

Sample

customer_id: C-482
period: 2024-Q3
transaction_count: 14
total_value_chf: 28400
account_status: active
risk_category: low

Scorer

1.0

All required attributes are present and populated. The record is fully complete.

Incomplete record - required attributes missing (failing)

Sample

customer_id: C-917
period: 2024-Q3
transaction_count: null
total_value_chf: null
account_status: null
risk_category: low

Scorer

0.25

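3 of 4 required attributes (transaction_count, total_value_chf, account_status) are null; only risk_category is populated, so the record score is 1/4 = 0.25. The customer_id and period fields are not among the four required attributes counted here.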

Partially complete record (partial)

Sample

customer_id: C-204
period: 2024-Q3
transaction_count: 7
total_value_chf: null
account_status: active
risk_category: null

Scorer

0.5

2 of 4 required attributes are missing (total_value_chf, risk_category). Partial completeness may produce degraded model outputs depending on which fields the model relies on.
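As a worked illustration of the averaging step, suppose a dataset consisted solely of the three example records above (a hypothetical case chosen only for the arithmetic). The aggregate Data Completeness would then be the mean of their scores:

```latex
\[
\text{Data Completeness}
  = \frac{1.0 + 0.25 + 0.5}{3}
  = \frac{1.75}{3}
  \approx 0.583
\]
```

The same value follows from counting populated values directly: 4 + 1 + 2 = 7 of the 12 required attribute values are populated, and 7/12 is approximately 0.583.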

Run Evaluation in LatticeFlow AI Platform

Use the following CLI command to initialize and run the evaluation in LatticeFlow AI Platform.
