Data Drift
Overview
The Data Drift evaluation measures the degree to which samples in a new dataset have drifted from a reference distribution established at training time.
Metrics
Dataset Drift
The fraction of production samples that are consistent with the reference distribution (range: 0.0 to 1.0, where 1.0 indicates no detected drift).
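Concretely, if $s_i$ is the per-sample score assigned by the Distribution Shift Scorer (see Methodology) and $\tau$ is a pass threshold (the exact pass criterion is an assumption here, not specified on this page), the metric over $N$ production samples can be written as:

$$\text{Dataset Drift} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{1}\left[ s_i \ge \tau \right]$$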
Motivation
Data drift occurs when the statistical distribution of model inputs changes between training and production. Even when the underlying labelling function is unchanged, a model trained on the original distribution can degrade substantially on drifted inputs, because its learned decision boundaries are calibrated to training-time patterns it no longer reliably encounters.
Drift accumulates silently: seasonal patterns, changes in user behaviour, upstream pipeline changes, or the natural evolution of a domain can all shift the input distribution without triggering any visible error. Without a systematic measure, degradation goes undetected until it manifests as measurable business impact.
This evaluation quantifies drift at the dataset level - what fraction of production samples have shifted outside the reference distribution - making it possible to decide whether to retrain, adjust thresholds, or route drifted inputs to a fallback system.
Methodology
- Samples: Each production sample is scored independently against the reference distribution.
- Scoring: Each sample is assessed by the Distribution Shift Scorer, which estimates the likelihood that the sample was drawn from the reference distribution using a density ratio estimator trained on reference and production samples. A score of 1.0 means the sample is well within the reference distribution; 0.0 means it falls in a region the reference distribution does not cover.
The dataset drift metric is then the fraction of production samples scored as in-distribution, as sketched below.
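One common way to realise a density ratio estimator is to train a probabilistic classifier to discriminate reference from production samples. The sketch below illustrates that approach; the function names, the choice of logistic regression, and the quantile-calibrated threshold are all illustrative assumptions, not the actual implementation of the Distribution Shift Scorer.

```python
# A minimal sketch of a classifier-based density ratio scorer, assuming
# tabular feature vectors as NumPy arrays. All names and modelling choices
# here are illustrative, not the product's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def fit_drift_scorer(reference, production):
    """Return (score_fn, threshold) fitted on reference vs. production data."""
    # Hold out part of the reference set to calibrate the pass threshold.
    ref_fit, ref_holdout = train_test_split(
        reference, test_size=0.2, random_state=0
    )

    # Train a classifier to discriminate reference (1) from production (0).
    # Its probability for the reference class rises where the reference
    # distribution is dense relative to production, and falls towards 0 in
    # regions the reference distribution does not cover.
    X = np.vstack([ref_fit, production])
    y = np.concatenate([np.ones(len(ref_fit)), np.zeros(len(production))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    def score(samples):
        return clf.predict_proba(samples)[:, 1]

    # Calibrate the pass threshold on held-out reference data: the 5th
    # percentile means roughly 95% of genuinely in-distribution samples
    # would be scored as consistent with the reference distribution.
    threshold = float(np.quantile(score(ref_holdout), 0.05))
    return score, threshold


def dataset_drift_metric(scores, threshold):
    """Fraction of production samples scored as in-distribution (0.0 to 1.0)."""
    return float(np.mean(scores >= threshold))
```

Calibrating the threshold on a reference holdout, rather than fixing it at 0.5, matters because under no drift the classifier's probabilities cluster around the reference class prior rather than 1.0.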
Scoring
Distribution Shift Scorer
Examples
In-distribution sample - no drift detected (passing)
Out-of-distribution sample - new topic introduced after training cutoff (failing)
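To make the passing and failing cases concrete, here is an illustrative run of the Methodology sketch on synthetic data. The two-cluster setup and all numbers are invented for demonstration only; the shifted cluster stands in for a new topic absent at training time.

```python
# Illustrative usage of fit_drift_scorer and dataset_drift_metric from the
# Methodology sketch above, on synthetic data.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=(1000, 4))

# Production mixes in-distribution samples with a drifted cluster.
in_dist = rng.normal(loc=0.0, scale=1.0, size=(800, 4))
drifted = rng.normal(loc=4.0, scale=1.0, size=(200, 4))
production = np.vstack([in_dist, drifted])

score, threshold = fit_drift_scorer(reference, production)
scores = score(production)
print(f"dataset drift: {dataset_drift_metric(scores, threshold):.2f}")
# In-distribution samples score in the same range as the reference holdout
# (passing), while the drifted cluster scores far below the threshold
# (failing), pulling the dataset drift metric below 1.0.
```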