Robustness
The main objective of model robustness tests is to assess a model's ability to produce consistent output when the data it predicts on is perturbed.
How it works:
test_type | original | test_case | expected_result | actual_result | eval_score | pass |
---|---|---|---|---|---|---|
lowercase | Are you coming back tomorrow? | are you coming back tomorrow? | Kommen Sie morgen zurück? | kehren Sie morgen zurück? | 0.004109 | True |
uppercase | Are you ever wrong? | ARE YOU EVER WRONG? | Haben Sie sich jemals geirrt? | IST SIE JEGLICHE WRONG? | 0.462017 | False |
- Perturbations such as lowercasing, uppercasing, or typos are applied to the original text, producing a perturbed test_case.
- The model processes both the original and perturbed inputs, producing the expected_result and actual_result respectively.
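The perturbation step can be sketched as a small helper. This is an illustrative sketch, not the library's actual implementation; the `perturb` function and its `test_type` argument are hypothetical names mirroring the table above.

```python
import random

def perturb(text: str, test_type: str, seed: int = 0) -> str:
    """Apply a simple perturbation to produce a test_case (illustrative sketch)."""
    if test_type == "lowercase":
        return text.lower()
    if test_type == "uppercase":
        return text.upper()
    if test_type == "typo":
        # Swap two adjacent characters at a random position to mimic a typo.
        rng = random.Random(seed)
        i = rng.randrange(len(text) - 1)
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    raise ValueError(f"unknown test_type: {test_type}")

print(perturb("Are you coming back tomorrow?", "lowercase"))
# are you coming back tomorrow?
```

Both the original text and the perturbed test_case are then sent through the model.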
Evaluation Criteria
- The evaluation begins by obtaining embeddings for both the original and perturbed inputs, as well as for the expected_result and actual_result.
- We then calculate the cosine similarity between the embeddings of the original and perturbed inputs, stored as `original_similarities`.
- Similarly, we compute the cosine similarity between the embeddings of the expected_result and actual_result, stored as `translation_similarities`.
- Next, we compare the absolute difference between `original_similarities` and `translation_similarities` against a fixed threshold of 0.1.
- If the absolute difference is less than 0.1, indicating that the perturbation changed the output no more than it changed the input, the test passes.
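The evaluation criteria above can be sketched with plain NumPy. This is a minimal sketch assuming the embeddings are already computed as vectors; the `evaluate` function name is hypothetical, and the real library may use a different embedding model and API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def evaluate(original_emb, perturbed_emb, expected_emb, actual_emb, threshold=0.1):
    """Hypothetical sketch of the pass/fail logic described above."""
    # Similarity between the original and perturbed inputs.
    original_similarity = cosine_similarity(original_emb, perturbed_emb)
    # Similarity between the expected and actual outputs.
    translation_similarity = cosine_similarity(expected_emb, actual_emb)
    # The test passes when the two similarities are close (difference < threshold).
    eval_score = abs(original_similarity - translation_similarity)
    return eval_score, eval_score < threshold

# Toy 2-d "embeddings" for illustration only.
inp = np.array([1.0, 0.0])
score, passed = evaluate(inp, inp, np.array([0.0, 1.0]), np.array([0.1, 0.9]))
print(score, passed)
```

With identical outputs the score is 0.0 and the test passes; a large divergence between expected and actual outputs pushes the score past the 0.1 threshold and fails the test, matching the eval_score column in the table above.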