Translation

Translation is the task of converting a sequence of text from one language into another. It is commonly framed as a sequence-to-sequence problem, a general approach in which a model generates an output sequence from an input sequence and which also covers tasks such as summarization. Translation systems are mainly used to convert written text between languages, but the same approach can extend to spoken language, or to combinations of the two such as text-to-speech or speech-to-text. Translation plays a vital role in bridging language barriers and enabling communication across cultural and linguistic contexts.

Supported Test Category	Supported Data
Robustness	Translation


Task Specification

When specifying the task for Translation, use the following format:

task: str

task = "translation"

Robustness

The main objective of model robustness tests is to assess a model’s capacity to sustain consistent output when exposed to perturbations in the data it predicts.

How it works:

test_type	original	test_case	expected_result	actual_result	eval_score	pass
lowercase	Are you coming back tomorrow?	are you coming back tomorrow?	Kommen Sie morgen zurück?	kehren Sie morgen zurück?	0.004109	True
uppercase	Are you ever wrong?	ARE YOU EVER WRONG?	Haben Sie sich jemals geirrt?	IST SIE JEGLICHE WRONG?	0.462017	False

Perturbations such as lowercasing, uppercasing, or typos are applied to the original text to produce a perturbed test_case.
The model translates both the original and the perturbed input, producing expected_result and actual_result respectively.
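
The two steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's implementation: the PERTURBATIONS registry, build_test_cases, run_model, and the stand-in stub_translate function are all hypothetical names, and only the lowercase and uppercase perturbations from the table are shown.

```python
from typing import Callable, Dict, List

# Hypothetical registry of perturbations (assumption: names mirror test_type).
PERTURBATIONS: Dict[str, Callable[[str], str]] = {
    "lowercase": str.lower,
    "uppercase": str.upper,
}


def build_test_cases(originals: List[str]) -> List[dict]:
    """Pair every original sentence with each perturbed variant."""
    rows = []
    for text in originals:
        for test_type, perturb in PERTURBATIONS.items():
            rows.append(
                {"test_type": test_type, "original": text, "test_case": perturb(text)}
            )
    return rows


def run_model(rows: List[dict], translate: Callable[[str], str]) -> List[dict]:
    """Translate the original (expected_result) and the test_case (actual_result)."""
    for row in rows:
        row["expected_result"] = translate(row["original"])
        row["actual_result"] = translate(row["test_case"])
    return rows


def stub_translate(text: str) -> str:
    """Stand-in for a real translation model (assumption: it only tags the text)."""
    return f"<de>{text}"


rows = run_model(build_test_cases(["Are you coming back tomorrow?"]), stub_translate)
for row in rows:
    print(row["test_type"], "|", row["test_case"], "|", row["actual_result"])
```

In practice stub_translate would be replaced by a call to the model under test; the rest of the flow stays the same.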

Evaluation Criteria

The evaluation begins by obtaining embeddings for the original and perturbed inputs, as well as for the expected_result and actual_result.
We then calculate the cosine similarity between the embeddings of the original and perturbed inputs, saved as original_similarities.
Similarly, we compute the cosine similarity between the embeddings of the expected_result and actual_result, stored as translation_similarities.
Next, we take the absolute difference between original_similarities and translation_similarities and compare it against a fixed threshold of 0.1.
If the absolute difference is less than 0.1, the translation changed about as much as the input did, and the test passes; otherwise it fails.
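
The pass/fail rule can be sketched as follows. This is a minimal illustration, not the library's code: the evaluate function and the toy two-dimensional vectors are hypothetical, real embeddings would come from an embedding model, and reusing the name eval_score for the absolute difference is an assumption based on the example table. The sketch uses cosine similarity; the absolute difference is the same if cosine distance (1 minus similarity) is used instead, because the constant cancels.

```python
import math
from typing import Sequence, Tuple

THRESHOLD = 0.1  # fixed threshold stated in the text


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def evaluate(
    original_vec: Sequence[float],
    test_case_vec: Sequence[float],
    expected_vec: Sequence[float],
    actual_vec: Sequence[float],
) -> Tuple[float, bool]:
    """Return (eval_score, passed) for one test row."""
    original_similarities = cosine_similarity(original_vec, test_case_vec)
    translation_similarities = cosine_similarity(expected_vec, actual_vec)
    eval_score = abs(original_similarities - translation_similarities)
    return eval_score, eval_score < THRESHOLD


# Toy embeddings (assumption): identical inputs, orthogonal translations.
score, passed = evaluate([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
print(score, passed)
```

With these toy vectors the inputs are identical while the translations are orthogonal, so the difference is far above the threshold and the row fails.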