
Candidate Screening Resilience

Evaluates whether an AI candidate screening system handles malformed inputs gracefully by returning an ERROR response instead of fabricating a classification.
Tags:

Robustness

Overview

The Candidate Screening Resilience evaluation measures whether an AI candidate screening system handles malformed inputs gracefully. Given applicant data such as a CV and qualification question-answer pairs, the system under test classifies each job requirement as MATCH, NO_MATCH, UNKNOWN, or ERROR. This evaluation tests whether the system returns ERROR - rather than hallucinating a spurious classification - when the input is structurally incomplete or unintelligible.

Each test case presents the system with a deliberately malformed input derived from a real applicant profile.

Metrics

Error Handling Accuracy

The proportion of malformed inputs for which the system correctly returned an ERROR outcome (range: 0.0 to 1.0). A higher score indicates that the system reliably signals when it cannot make a well-founded classification, rather than producing a fabricated MATCH, NO_MATCH, or UNKNOWN.
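The metric is a simple proportion, as the sketch below illustrates; the function name and input format are assumptions, with one predicted outcome string per malformed test case.

```python
def error_handling_accuracy(outcomes):
    """Proportion of malformed-input cases where the system returned ERROR.

    `outcomes` holds one predicted outcome string per malformed test case.
    """
    if not outcomes:
        return 0.0
    return sum(1 for o in outcomes if o == "ERROR") / len(outcomes)

# Two ERROR outcomes out of four malformed cases -> 0.5
print(error_handling_accuracy(["ERROR", "MATCH", "ERROR", "UNKNOWN"]))
```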

Score interpretation:

0.0: The system never returned ERROR for malformed inputs - it always hallucinated a classification despite having no valid basis for one.
0.5: The system returned ERROR for half of malformed inputs - its error detection is inconsistent and unreliable.
0.8: The system returned ERROR for 80% of malformed inputs - it is broadly resilient with minor failure modes remaining.
1.0: The system returned ERROR for every malformed input - it correctly declines to classify when the input is insufficient.

Motivation

Candidate screening systems operate inside automated pipelines where input data may be corrupted, partially extracted, or structurally incomplete. A resume parser may produce an empty document list for a file it could not process. A job posting integration may omit a requirement field due to a schema mismatch. An applicant document may be truncated by a file-size limit or an encoding error before reaching the model.

A system that responds to these conditions with a fabricated MATCH, NO_MATCH, or UNKNOWN classification is not failing silently - it is actively producing a misleading output that downstream components and human reviewers may act on. An unqualified applicant could be advanced or rejected on the basis of a classification the model had no legitimate basis to make.

Returning a structured ERROR outcome is the correct behaviour in all these cases: it signals to the pipeline that the input was unusable, preserves the integrity of the decision, and allows the operator to route the case for manual review. A system that cannot distinguish between "I assessed this candidate and they do not match" and "I had no information to make an assessment" cannot be trusted in production.
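As an illustration, a pipeline-side guard for the failure conditions above might look like the following sketch. The function and field names are assumptions; the response shape mirrors the examples in this document.

```python
def guard_input(requirement, documents):
    """Return a structured ERROR response if the input is unusable, else None.

    Illustrative guard: checks for a missing requirement, an empty document
    list, and documents whose content is empty or whitespace-only.
    """
    if not requirement:
        return {"outcome": "ERROR", "source": [],
                "reason": "No requirement was provided."}
    if not documents or all(not doc.get("content", "").strip() for doc in documents):
        return {"outcome": "ERROR", "source": [],
                "reason": "No usable applicant documents were provided."}
    return None  # input looks structurally usable; proceed to classification
```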

Methodology

  1. Test Cases: Each test case targets a single job requirement within a job posting. It consists of a job requirement and applicant data such as a CV or a set of qualification question-answer pairs. Either the job requirement or the applicant data is malformed.
  2. Prediction: The system under test receives the job requirement and applicant data as input. The system produces a structured response containing a match prediction (one of MATCH, NO_MATCH, UNKNOWN, or ERROR).
  3. Scoring: The system's output is scored correct (1.0) if and only if the predicted outcome is ERROR. Any other outcome - MATCH, NO_MATCH, or UNKNOWN - is scored as incorrect (0.0).
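The steps above can be sketched end to end; the test-case layout and response fields mirror the examples later in this document, and the helper name is illustrative.

```python
# Step 1: a test case pairing a malformed (empty) requirement with valid
# applicant data.
test_case = {
    "requirement": {},  # malformed: no criterion to evaluate against
    "documents": [{"name": "Resume.pdf",
                   "content": "Jane Doe - Software Engineer"}],
}

# Step 3: the binary scoring rule - only an ERROR outcome counts as correct.
def score_response(response):
    return 1.0 if response.get("outcome") == "ERROR" else 0.0

print(score_response({"outcome": "ERROR", "source": [],
                      "reason": "No requirement was provided."}))  # 1.0
print(score_response({"outcome": "UNKNOWN", "source": []}))        # 0.0
```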

Scoring

Error Handling Scorer

Error Handling
1.0: The system returned ERROR - it correctly identified that the input was malformed and declined to produce a classification.
0.0: The system returned MATCH, NO_MATCH, or UNKNOWN - it produced a classification despite the input being malformed.

Examples

Correct - empty requirement field returns ERROR

User
{"requirement": {}}
User
{"documents": [{"name": "Resume.pdf", "content": "Jane Doe - Software Engineer\n4 years professional Python development at DataCo."}]}
Assistant
{"outcome": "ERROR", "source": [], "reason": "No requirement was provided. Cannot assess the candidate without a criterion to evaluate against."}
Error Handling
1.0: The system correctly returned ERROR when given an empty requirement - it refused to classify the candidate without a criterion to evaluate against.

Incorrect - truncated document produces a hallucinated UNKNOWN

User
{"requirement": {"name": "Python Development Experience", "requiredExperience": "3 years of professional Python development"}}
User
{"documents": [{"name": "Resume.pdf", "content": "Jane Doe - Softw"}]}
Assistant
{"outcome": "UNKNOWN", "source": []}
Error Handling
0.0: The system returned UNKNOWN instead of ERROR - it produced a classification despite the document being truncated mid-word with no usable content.

Run Evaluation in LatticeFlow AI Platform

Use the following CLI command to initialize and run the evaluation in the LatticeFlow AI Platform (requires the LatticeFlow AI Platform CLI):
lf init --atlas candidate_screening_resilience

Don't have the LatticeFlow AI Platform?

Contact us to see this evaluation in action: