Robustness testing aims to evaluate the ability of a model to maintain consistent performance when faced with various perturbations or modifications in the input data.

How it works:

test_type original test_case expected_result actual_result pass
add_ocr_typo director rob marshall went out gunning to make a great one . diredor rob marshall went o^ut gunning t^o makc a grcat o^ne . POSITIVE NEGATIVE False
uppercase an amusing , breezily apolitical documentary about life on the campaign trail . AN AMUSING , BREEZILY APOLITICAL DOCUMENTARY ABOUT LIFE ON THE CAMPAIGN TRAIL . POSITIVE POSITIVE True
  • Perturbations, such as lowercase, uppercase, typos, etc., are introduced to the original text, resulting in a perturbed test_case.
  • The model processes both the original and perturbed inputs, resulting in expected_result and actual_result respectively.
  • During evaluation, the predicted labels in the expected and actual results are compared to assess the model’s performance