Translation is the task of converting a sequence of text from one language into another. It is framed as a sequence-to-sequence problem, a general approach for generating an output sequence from an input sequence that also covers tasks such as summarization. Translation systems mainly convert written text between languages, but the same framing extends to spoken language, for example speech-to-text or text-to-speech conversion. Translation helps bridge linguistic barriers and supports communication across cultural and linguistic contexts.

| Supported Test Category | Supported Data |
|-------------------------|----------------|
| Robustness              | Translation    |

To get more information about the supported data, click here.

Task Specification

When specifying the task for Translation, use the following format:

task: str

task = "translation"


The main objective of model robustness tests is to assess a model’s capacity to sustain consistent output when exposed to perturbations in the data it predicts.

How it works:

| test_type | original | test_case | expected_result | actual_result | eval_score | pass |
|-----------|----------|-----------|-----------------|---------------|------------|------|
| lowercase | Are you coming back tomorrow? | are you coming back tomorrow? | Kommen Sie morgen zurück? | kehren Sie morgen zurück? | 0.004109 | True |
| uppercase | Are you ever wrong? | ARE YOU EVER WRONG? | Haben Sie sich jemals geirrt? | IST SIE JEGLICHE WRONG? | 0.462017 | False |

  • Perturbations, such as lowercase, uppercase, typos, etc., are introduced to the original text, resulting in a perturbed test_case.
  • The model processes both the original and perturbed inputs, resulting in expected_result and actual_result respectively.
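
The two steps above can be sketched in a few lines of Python. This is an illustrative sketch only, not the library's implementation: `PERTURBATIONS`, `make_test_case`, and `run_pair` are hypothetical names, and `translate` stands in for whatever translation model is under test.

```python
from typing import Callable, Dict

# Hypothetical registry of perturbations; the keys mirror the test_type column.
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def make_test_case(original: str, test_type: str) -> str:
    """Apply the named perturbation to the original text."""
    return PERTURBATIONS[test_type](original)


def run_pair(translate: Callable[[str], str], original: str, test_type: str) -> dict:
    """Translate the original and the perturbed input with the same model.

    `translate` is an assumed callable (text in, translation out).
    """
    test_case = make_test_case(original, test_type)
    return {
        "test_type": test_type,
        "original": original,
        "test_case": test_case,
        "expected_result": translate(original),   # output for the clean input
        "actual_result": translate(test_case),    # output for the perturbed input
    }
```

Any callable that maps a string to a string can be passed as `translate`, which keeps the sketch independent of a specific model or hub.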

Evaluation Criteria

  • The evaluation begins by obtaining embeddings for both the original and perturbed inputs, as well as for the expected_result and actual_result.
  • We then calculate the cosine similarity between the embeddings of the original and perturbed inputs, saved as original_similarities.
  • Similarly, we compute the cosine similarity between the embeddings of the expected_result and actual_result, stored as translation_similarities.
  • Next, we compare the absolute difference between original_similarities and translation_similarities against a fixed threshold of 0.1.
  • If the absolute difference is less than 0.1, meaning the two translations differ from each other about as much as the two inputs do, the test passes; otherwise it fails.
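
The criteria above amount to a short computation, sketched here under stated assumptions. The embeddings are passed in as plain lists of floats because the embedding model is not specified here, and `cosine_similarity`, `evaluate`, and `eval_score` are illustrative names rather than the library's API. The sketch uses cosine similarity; since cosine distance is 1 minus cosine similarity, the absolute difference is identical under either convention.

```python
import math
from typing import Sequence, Tuple

THRESHOLD = 0.1  # fixed threshold stated in the evaluation criteria


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def evaluate(
    original_emb: Sequence[float],
    test_case_emb: Sequence[float],
    expected_emb: Sequence[float],
    actual_emb: Sequence[float],
    threshold: float = THRESHOLD,
) -> Tuple[float, bool]:
    """Return (eval_score, passed) for one robustness sample."""
    # How close the perturbed input is to the original input.
    original_similarities = cosine_similarity(original_emb, test_case_emb)
    # How close the two translations are to each other.
    translation_similarities = cosine_similarity(expected_emb, actual_emb)
    # The test passes when the translations drift no more than the inputs did.
    eval_score = abs(original_similarities - translation_similarities)
    return eval_score, eval_score < threshold
```

With toy vectors, `evaluate([1, 0], [1, 0], [0, 1], [0, 1])` scores 0.0 and passes, while `evaluate([1, 0], [1, 0], [1, 0], [0, 1])` scores 1.0 and fails.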