Translation converts a sequence of text from one language into another. It is framed as a sequence-to-sequence problem, a general formulation in which a model generates an output sequence from an input sequence; summarization is another task of this kind. Translation systems primarily convert written text between languages, but they can also be extended to spoken language, for example when combined with speech-to-text or text-to-speech conversion. Translation helps bridge linguistic barriers and supports communication across cultural and linguistic contexts.
Supported Test Category | Supported Data |
---|---|
Robustness | Translation |
Task Specification
When specifying the task for Translation, use the following format:
task: str
task = "translation"
Robustness
The main objective of model robustness tests is to assess whether a model produces consistent output when its input data is perturbed.
How it works:
test_type | original | test_case | expected_result | actual_result | eval_score | pass |
---|---|---|---|---|---|---|
lowercase | Are you coming back tomorrow? | are you coming back tomorrow? | Kommen Sie morgen zurück? | kehren Sie morgen zurück? | 0.004109 | True |
uppercase | Are you ever wrong? | ARE YOU EVER WRONG? | Haben Sie sich jemals geirrt? | IST SIE JEGLICHE WRONG? | 0.462017 | False |
- Perturbations, such as lowercase, uppercase, typos, etc., are introduced to the original text, resulting in a perturbed test_case.
- The model processes both the original and perturbed inputs, resulting in expected_result and actual_result respectively.
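The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's implementation: `PERTURBATIONS`, `build_test_cases`, and `run_model` are hypothetical names, only the two perturbations shown in the table are included, and `translate` stands in for whatever translation model is under test.

```python
from typing import Callable, Dict, List

# Hypothetical registry of perturbations; the keys mirror the test_type
# column in the table above. Real robustness suites include more (e.g. typos).
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def build_test_cases(originals: List[str], test_types: List[str]) -> List[dict]:
    """Apply each requested perturbation to each original sentence."""
    rows = []
    for test_type in test_types:
        perturb = PERTURBATIONS[test_type]
        for original in originals:
            rows.append(
                {
                    "test_type": test_type,
                    "original": original,
                    "test_case": perturb(original),
                }
            )
    return rows


def run_model(rows: List[dict], translate: Callable[[str], str]) -> List[dict]:
    """Translate both inputs: original -> expected_result, test_case -> actual_result."""
    for row in rows:
        row["expected_result"] = translate(row["original"])
        row["actual_result"] = translate(row["test_case"])
    return rows
```

Passing the model as a plain callable keeps the sketch independent of any particular model hub; any function that maps a source string to a translated string can be plugged in.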
Evaluation Criteria
- The evaluation begins by obtaining embeddings for both the original and perturbed inputs, as well as for the expected_result and actual_result.
- We then calculate the cosine distances between the embeddings of the original and perturbed inputs, saved as original_similarities.
- Similarly, we compute the cosine distances between the embeddings of the expected_result and actual_result, stored as translation_similarities.
- Next, we compare the difference between original_similarities and translation_similarities against a fixed threshold of 0.1.
- If the absolute difference is less than 0.1, indicating close similarity, the test passes.
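The criteria above can be sketched as follows. This is an illustrative sketch rather than the library's code: the function names are hypothetical, the embeddings are assumed to be plain numeric vectors produced by some sentence-embedding model (not specified here), and the eval_score is assumed to be the absolute difference described in the last two bullets.

```python
import math
from typing import Sequence, Tuple

THRESHOLD = 0.1  # fixed threshold stated in the evaluation criteria


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def robustness_check(
    original_emb: Sequence[float],
    test_case_emb: Sequence[float],
    expected_emb: Sequence[float],
    actual_emb: Sequence[float],
    threshold: float = THRESHOLD,
) -> Tuple[float, bool]:
    """Return (eval_score, passed) for one test case."""
    # How far the perturbed input moved away from the original input.
    original_similarities = cosine_distance(original_emb, test_case_emb)
    # How far the perturbed translation moved away from the expected one.
    translation_similarities = cosine_distance(expected_emb, actual_emb)
    # Assumption: eval_score is the absolute difference between the two.
    eval_score = abs(original_similarities - translation_similarities)
    return eval_score, eval_score < threshold
```

The intuition behind comparing the two distances is that a small change to the input should cause a comparably small change in the translation. In the table above, the lowercase case scores 0.004109 and passes, while the uppercase case scores 0.462017 and fails.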