Structured Data Completeness

Measures how fully a structured dataset covers its required attributes and time periods, along two dimensions: attribute completeness and time completeness.
Tags: Data Quality

Overview

The Structured Data Completeness evaluation measures how complete a structured dataset is with respect to the attributes required by an AI system. Each record is inspected for missing or null values, and the results are aggregated into a completeness metric.

Metrics

Data Completeness

The fraction of required attributes that are present and non-null across all records (range: 0.0 to 1.0).
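
Written as a formula, the definition above can be sketched as follows. The notation is an assumption introduced here for illustration: N is the number of records, K is the number of required attributes (assumed to be the same for every record), and x_ij is the value of required attribute j in record i.

```latex
C \;=\; \frac{1}{N K} \sum_{i=1}^{N} \sum_{j=1}^{K} \mathbb{1}\!\left[x_{ij} \text{ is populated}\right]
  \;=\; \frac{1}{N} \sum_{i=1}^{N} s_i,
\qquad
s_i \;=\; \frac{1}{K} \sum_{j=1}^{K} \mathbb{1}\!\left[x_{ij} \text{ is populated}\right]
```

Under that assumption, the pooled fraction of populated values and the mean of the per-record scores s_i are the same number, so C stays in the range 0.0 to 1.0.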

Data Completeness (scale from 0.0 to 1.0, with reference points at 0.0, 0.5, 0.95, and 1.0)

0.0     Dataset is entirely empty or all required attributes are null - unusable for AI processing.
0.5     Half of expected attribute values are missing - significant gaps that will degrade model outputs and likely violate documented completeness thresholds.
0.95    Minor incompleteness - 5% of attribute values are missing. Whether this is acceptable depends on the documented completeness threshold for each attribute.
1.0     Perfect completeness - all required attributes are fully populated across all records.

Motivation

Missing values are common and expected in tabular data, and many imputation techniques exist to address them. This evaluation is not designed to enforce 100% completeness. Instead, it ensures that:

  • Completeness requirements are explicitly documented - the acceptable threshold for each attribute is defined and version-controlled alongside the dataset.
  • Requirements are not violated over time - if completeness degrades below the documented threshold, the evaluation flags it before it reaches production.
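
The two points above can be sketched as a per-attribute threshold check. This is a minimal illustration, not the platform's implementation: the `THRESHOLDS` values, the function names, and the sample batch are all made up for the example, and real thresholds would come from the documentation that is version-controlled with the dataset.

```python
from typing import Any, Dict, List, Mapping

# Hypothetical thresholds: the minimum fraction of records in which each
# attribute must be populated. These numbers are illustrative only.
THRESHOLDS: Dict[str, float] = {"transaction_count": 0.95, "risk_category": 0.70}


def attribute_completeness(records: List[Mapping[str, Any]], attribute: str) -> float:
    """Fraction of records in which `attribute` is neither None nor an empty string."""
    if not records:
        return 0.0
    populated = sum(1 for r in records if r.get(attribute) not in (None, ""))
    return populated / len(records)


def find_violations(
    records: List[Mapping[str, Any]], thresholds: Mapping[str, float]
) -> Dict[str, float]:
    """Return the observed completeness of every attribute below its threshold."""
    violations: Dict[str, float] = {}
    for attribute, minimum in thresholds.items():
        observed = attribute_completeness(records, attribute)
        if observed < minimum:
            violations[attribute] = observed
    return violations


# Made-up batch: each attribute is populated in 3 of 4 records (0.75).
batch = [
    {"transaction_count": 14, "risk_category": "low"},
    {"transaction_count": None, "risk_category": "low"},
    {"transaction_count": 7, "risk_category": None},
    {"transaction_count": 3, "risk_category": "high"},
]

print(find_violations(batch, THRESHOLDS))  # {'transaction_count': 0.75}
```

Running a check like this on every new version of the dataset is one way to catch a drop below a documented threshold before the data reaches production.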

A complementary model-performance evaluation connects completeness to model outputs, ensuring that model quality does not degrade for samples with low completeness.

Methodology

  1. Samples: Each record in the dataset is scored independently based on the presence and population of its required attributes.
  2. Scoring: Each sample is scored by the Completeness Scorer, which checks whether required attributes are populated (non-null, non-empty). Scores are averaged into an aggregate completeness metric.
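
The two steps above can be sketched in a few lines of Python. This is an illustrative approximation, not the platform's Completeness Scorer: the function names are hypothetical, and the `REQUIRED` tuple assumes that the four required attributes are the non-identifier fields of the records shown in the Examples section.

```python
from typing import Any, List, Mapping, Sequence

# Assumption: these are the four required attributes implied by the examples;
# customer_id and period are treated as identifiers, not scored.
REQUIRED = ("transaction_count", "total_value_chf", "account_status", "risk_category")


def is_populated(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    if value is None:
        return False
    if isinstance(value, str) and value.strip() == "":
        return False
    return True


def score_record(record: Mapping[str, Any], required: Sequence[str] = REQUIRED) -> float:
    """Fraction of required attributes that are populated in one record."""
    if not required:
        raise ValueError("at least one required attribute is needed")
    populated = sum(is_populated(record.get(name)) for name in required)
    return populated / len(required)


def aggregate(records: List[Mapping[str, Any]], required: Sequence[str] = REQUIRED) -> float:
    """Mean of the per-record scores; an empty dataset scores 0.0."""
    if not records:
        return 0.0
    return sum(score_record(r, required) for r in records) / len(records)


# The three records from the Examples section.
records = [
    {"customer_id": "C-482", "period": "2024-Q3", "transaction_count": 14,
     "total_value_chf": 28400, "account_status": "active", "risk_category": "low"},
    {"customer_id": "C-917", "period": "2024-Q3", "transaction_count": None,
     "total_value_chf": None, "account_status": None, "risk_category": "low"},
    {"customer_id": "C-204", "period": "2024-Q3", "transaction_count": 7,
     "total_value_chf": None, "account_status": "active", "risk_category": None},
]

print([score_record(r) for r in records])  # [1.0, 0.25, 0.5]
print(round(aggregate(records), 3))        # 0.583
```

Note that a legitimate zero (for example `transaction_count` of 0) counts as populated in this sketch; only null and empty values are treated as missing.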

Scoring

Completeness Scorer

Completeness

Score value    Explanation
1.0            All required attributes are present and populated. No missing values.
0.5            Some required attributes are null or empty (here, half of them). Intermediate scores reflect the fraction of required attributes that are populated, and completeness may fall below the documented threshold.
0.0            All required attributes are missing. The record is unusable for AI processing.

Examples

Complete record - all required attributes present (passing)

Sample
customer_id          C-482
period               2024-Q3
transaction_count    14
total_value_chf      28400
account_status       active
risk_category        low

Scorer
1.0    All required attributes are present and populated. The record is fully complete.

Incomplete record - required attributes missing (failing)

Sample
customer_id          C-917
period               2024-Q3
transaction_count    null
total_value_chf      null
account_status       null
risk_category        low

Scorer
0.25    3 of 4 required attributes (transaction_count, total_value_chf, account_status) are null, so only 1 of 4 (risk_category) is populated. The record score is 0.25.

Partially complete record (partial)

Sample
customer_id          C-204
period               2024-Q3
transaction_count    7
total_value_chf      null
account_status       active
risk_category        null

Scorer
0.5    2 of 4 required attributes are missing (total_value_chf, risk_category). Partial completeness may produce degraded model outputs depending on which fields the model relies on.

Run Evaluation in LatticeFlow AI Platform

Use the following CLI command to initialize and run the evaluation in LatticeFlow AI Platform.
Requires LatticeFlow AI Platform CLI
lf init --atlas structured_data_completeness
