Robustness

Model robustness tests assess whether a model produces consistent output when its input data is perturbed.

How it works:

test_type	original	test_case	expected_result	actual_result	eval_score	pass
lowercase	Are you coming back tomorrow?	are you coming back tomorrow?	Kommen Sie morgen zurück?	kehren Sie morgen zurück?	0.004109	True
uppercase	Are you ever wrong?	ARE YOU EVER WRONG?	Haben Sie sich jemals geirrt?	IST SIE JEGLICHE WRONG?	0.462017	False

Perturbations such as lowercasing, uppercasing, or typos are applied to the original text to produce the perturbed test_case.
The model processes both the original and the perturbed input, producing expected_result and actual_result, respectively.
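
As a rough illustration of the perturbation step, the lowercase and uppercase test types from the table can be sketched in Python. The names `PERTURBATIONS` and `perturb` are hypothetical and are not part of the library's API; typo generation is left out for brevity.

```python
# Hypothetical sketch: map a test_type name to a simple text perturbation.
# These names are illustrative assumptions, not the library's own API.
PERTURBATIONS = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def perturb(original: str, test_type: str) -> str:
    """Return the perturbed test_case for the given original text."""
    try:
        return PERTURBATIONS[test_type](original)
    except KeyError:
        raise ValueError(f"unsupported test_type: {test_type!r}") from None


print(perturb("Are you coming back tomorrow?", "lowercase"))
print(perturb("Are you ever wrong?", "uppercase"))
```

Run on the two originals from the table, this reproduces the test_case column: `are you coming back tomorrow?` and `ARE YOU EVER WRONG?`.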

Evaluation Criteria

The evaluation begins by obtaining embeddings for both the original and perturbed inputs, as well as for the expected_result and actual_result.
We then calculate the cosine similarity between the embeddings of the original and perturbed inputs, saved as original_similarities.
Similarly, we compute the cosine similarity between the embeddings of the expected_result and actual_result, stored as translation_similarities.
Next, we compare the absolute difference between original_similarities and translation_similarities against a fixed threshold of 0.1.
If the absolute difference is less than 0.1, meaning the two outputs differ about as much as the two inputs do, the test passes.
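
The pass/fail rule above can be sketched as follows. This is a minimal illustration, not the library's implementation: the embedding model is not specified here, so the function takes precomputed embedding vectors, and the names `cosine_similarity` and `evaluate` are hypothetical. It also assumes that the eval_score column in the table corresponds to the absolute difference. The sketch uses cosine similarity; using cosine distance (1 minus similarity) for both pairs would give the same absolute difference.

```python
import math
from typing import Sequence, Tuple

THRESHOLD = 0.1  # fixed threshold stated in the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def evaluate(
    original_emb: Sequence[float],
    test_case_emb: Sequence[float],
    expected_emb: Sequence[float],
    actual_emb: Sequence[float],
    threshold: float = THRESHOLD,
) -> Tuple[float, bool]:
    """Return (eval_score, passed) for one test case (hypothetical helper)."""
    original_similarities = cosine_similarity(original_emb, test_case_emb)
    translation_similarities = cosine_similarity(expected_emb, actual_emb)
    eval_score = abs(original_similarities - translation_similarities)
    return eval_score, eval_score < threshold


# Toy 2-D embeddings, made up for illustration only.
same = evaluate([1.0, 0.0], [1.0, 0.0], [0.0, 2.0], [0.0, 5.0])
drift = evaluate([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(same, drift)
```

In the first toy case both pairs are perfectly aligned, so the difference is 0 and the test passes. In the second, the inputs are identical but the outputs are orthogonal, so the difference is 1 and the test fails.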
