Robustness

 

Add Abbreviations

This test replaces familiar words or expressions in the text with their abbreviations. Many of these abbreviations are common on social media platforms, while others are in general use. It evaluates the NLP model’s ability to handle text that contains such abbreviations.

alias_name: add_abbreviation

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_abbreviation:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_abbreviation test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
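To make the effect of prob concrete, here is a minimal Python sketch of dictionary-based abbreviation. The ABBREVIATIONS table and the add_abbreviation function are hypothetical illustrations, not LangTest's implementation; the sketch assumes each dictionary word is replaced independently with probability prob.

```python
import random
import re
from typing import Optional

# Hypothetical, deliberately tiny abbreviation dictionary; the real test
# uses its own, much larger mapping.
ABBREVIATIONS = {
    "amazing": "amzn",
    "great": "gr8",
    "you": "u",
    "to": "2",
    "for": "4",
    "download": "d/l",
    "definitely": "dfntly",
}


def add_abbreviation(text: str, prob: float = 1.0, seed: Optional[int] = None) -> str:
    """Swap each known word for its abbreviation with probability `prob`."""
    rng = random.Random(seed)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        abbreviation = ABBREVIATIONS.get(word.lower())
        # Leave the word alone if it has no abbreviation or loses the prob draw.
        if abbreviation is None or rng.random() >= prob:
            return word
        # Keep a leading capital so sentence starts still look natural.
        return abbreviation.capitalize() if word[0].isupper() else abbreviation

    return re.sub(r"[A-Za-z]+", replace, text)


print(add_abbreviation("Amazing food! Great service!"))  # Amzn food! Gr8 service!
```

With prob=0.5, each eligible word is abbreviated with 50% probability, so roughly half of them change; passing a seed makes the draw repeatable.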

Examples

Original | Test Case
Amazing food! Great service! | Amzn food! Gr8 service!
Make sure you’ve gone online to download one of the vouchers - it’s definitely not worth paying full price for! | Make sure u’ve gone onl 2 d/l one of da vouchers - it’s dfntly not worth paying full price 4!

Add Context

This test checks if the NLP model can handle input text with added context, such as a greeting or closing.

alias_name: add_context

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_context:
    min_pass_rate: 0.65
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    parameters:
      ending_context: ['Bye', 'Reported']
      starting_context: ['Hi', 'Good morning', 'Hello']
      count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_context test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • starting_context (<List[str]>): Phrases to be added at the start of inputs.
  • ending_context (<List[str]>): Phrases to be added at the end of inputs.
  • prob (float): Controls the proportion of words to be changed.
  • count (int): Number of variations to construct for each sentence.
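The sketch below illustrates one plausible reading of the add_context parameters, assuming each of the count variations adds a single phrase at either the start or the end, as in the examples below. The function name and the formatting rules (comma separator, lowercased closing phrase) are assumptions, and prob is left out for brevity.

```python
import random
from typing import List, Optional


def add_context(
    text: str,
    starting_context: List[str],
    ending_context: List[str],
    count: int = 1,
    seed: Optional[int] = None,
) -> List[str]:
    """Build `count` variations, each with one context phrase added at the start or end."""
    rng = random.Random(seed)
    positions = (["start"] if starting_context else []) + (["end"] if ending_context else [])
    if not positions:
        return [text] * count

    variations = []
    for _ in range(count):
        if rng.choice(positions) == "start":
            # e.g. "Hello" + "I love playing football." -> "Hello, I love playing football."
            variations.append(f"{rng.choice(starting_context)}, {text}")
        else:
            # Drop the final punctuation, then append the lowercased closing phrase.
            body = text.rstrip(" .!?")
            variations.append(f"{body}, {rng.choice(ending_context).lower()}.")
    return variations
```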

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | The quick brown fox jumps over the lazy dog, bye.
I love playing football. | Hello, I love playing football.

Add Contraction

This test checks whether the NLP model can handle input text that uses contractions instead of expanded forms.

alias_name: add_contraction

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_contraction:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_contraction test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
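A minimal Python sketch of contraction substitution, assuming a lookup table from expanded forms to contractions. CONTRACTIONS and add_contraction are hypothetical names, and the table is a small excerpt chosen for illustration.

```python
import random
import re
from typing import Optional

# Hypothetical excerpt of an expanded-form -> contraction table (keys lowercase).
CONTRACTIONS = {
    "is not": "isn't",
    "are not": "aren't",
    "do not": "don't",
    "will not": "won't",
    "i will": "i'll",
    "i am": "i'm",
    "it is": "it's",
}

# One alternation over all expanded forms, matched on word boundaries, any case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b", re.IGNORECASE
)


def add_contraction(text: str, prob: float = 1.0, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)

    def replace(match: re.Match) -> str:
        phrase = match.group(0)
        if rng.random() >= prob:
            return phrase  # skipped by the prob draw
        contracted = CONTRACTIONS[phrase.lower()]
        # Preserve the capitalisation of the first letter ("I will" -> "I'll").
        if phrase[0].isupper():
            contracted = contracted[0].upper() + contracted[1:]
        return contracted

    return PATTERN.sub(replace, text)
```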

Examples

Original | Test Case
He is not a great chess player. | He isn’t a great chess player.
I will wash the car this afternoon. | I’ll wash the car this afternoon.

Add OCR Typo

This test checks whether the NLP model can handle input text with common OCR typos. An OCR typo dictionary is used to apply the most common OCR typos to the input data.

alias_name: add_ocr_typo

Config

add_ocr_typo:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    parameters:
        count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_ocr_typo test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
  • count (int): Number of variations to construct for each sentence.
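As a rough illustration of how count and prob interact, this Python sketch applies one character-level OCR confusion per selected word and builds count independent variations. The OCR_CONFUSIONS table and add_ocr_typo function are hypothetical, loosely modelled on the examples below; the actual OCR typo dictionary may map whole words rather than single characters.

```python
import random
from typing import List, Optional

# Hypothetical single-character OCR confusions (h->b, o->0, y->v, e->c, l->1);
# the real test uses its own OCR typo dictionary.
OCR_CONFUSIONS = {"h": "b", "o": "0", "y": "v", "e": "c", "l": "1"}


def add_ocr_typo(
    text: str, prob: float = 1.0, count: int = 1, seed: Optional[int] = None
) -> List[str]:
    rng = random.Random(seed)

    def corrupt(word: str) -> str:
        positions = [i for i, ch in enumerate(word) if ch in OCR_CONFUSIONS]
        # Skip words with no confusable character or that lose the prob draw.
        if not positions or rng.random() >= prob:
            return word
        i = rng.choice(positions)
        return word[:i] + OCR_CONFUSIONS[word[i]] + word[i + 1:]

    # Each of the `count` variations draws its own random corruptions.
    return [" ".join(corrupt(w) for w in text.split(" ")) for _ in range(count)]
```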

Examples

Original | Test Case
This organization’s art can win tough acts. | Tbis organization’s a^rt c^an w^in tougb acts.
Anyone can join our community garden. | Anyone c^an j0in o^ur communitv gardcn.

Add Punctuation

This test checks whether the NLP model can handle input text in which sentences end with an added punctuation mark. The added mark is chosen at random from the list ['!', '?', ',', '.', '-', ':', ';']. If a sentence already ends with punctuation, it is left unchanged.

alias_name: add_punctuation

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_punctuation:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_punctuation test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
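The rule described above fits in a few lines of Python. This is a hypothetical sketch that works on a single sentence and leaves out prob; treating any character in string.punctuation as existing punctuation is an assumption.

```python
import random
import string
from typing import Optional

# The candidate marks listed in the description of this test.
CANDIDATES = ["!", "?", ",", ".", "-", ":", ";"]


def add_punctuation(sentence: str, seed: Optional[int] = None) -> str:
    sentence = sentence.rstrip()
    # A sentence that already ends in punctuation is returned unchanged.
    if not sentence or sentence[-1] in string.punctuation:
        return sentence
    return sentence + random.Random(seed).choice(CANDIDATES)
```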

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog | The quick brown fox jumps over the lazy dog.
Good morning | Good morning!

Add Slangs

This test involves substituting certain words (specifically nouns, adjectives, and adverbs) in the original text with their corresponding slang terms. The purpose is to assess the NLP model’s ability to handle input text that includes slang language.

alias_name: add_slangs

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_slangs:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_slangs test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
It was totally excellent but useless bet. | It was totes grand but cruddy flutter.
Obviously, money are a great stimulus but people might go crazy about it. | Obvs, spondulicks are a nang stimulus but peeps might go rental about it.

Add Speech to Text Typo

This test evaluates the NLP model’s proficiency in handling input text that contains common typos resulting from Speech to Text conversion. A Speech to Text typo dictionary is utilized to apply the most frequent typos found in speech recognition output to the input data.

alias_name: add_speech_to_text_typo

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_speech_to_text_typo:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    parameters:
      count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_speech_to_text_typo test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
  • count (int): Number of variations to construct for each sentence.

Examples

Original | Test Case
Andrew finally returned the French book to Chris that I bought last week. | Andrew finally returned the French book to Chris that I bot lass week.
The more you learn, the more you grow. | Thee morr you learn, the mor you grow.

Add Typo

This test checks whether the NLP model can handle input text with typos. A typo frequency dictionary is used to apply the most common typos to the input data.

alias_name: add_typo

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

add_typo:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    parameters:
      count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the add_typo test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
  • count (int): Number of variations to construct for each sentence.
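The real test draws typos from a frequency dictionary. This hypothetical Python sketch swaps in a simpler keyboard-adjacency rule (replace one letter per selected word with a neighbouring key), so its output will differ from the examples below; for instance, “lazy” to “fazy” is not a neighbouring-key slip. KEYBOARD_NEIGHBOURS and add_typo are illustrative names.

```python
import random
from typing import List, Optional

# Hypothetical QWERTY neighbour map; no entry lists its own key.
KEYBOARD_NEIGHBOURS = {
    "q": "wa", "w": "qeas", "e": "wrsd", "r": "etdf", "t": "ryfg", "y": "tugh",
    "u": "yihj", "i": "uojk", "o": "ipkl", "p": "ol", "a": "qwsz", "s": "awedxz",
    "d": "serfcx", "f": "drtgvc", "g": "ftyhbv", "h": "gyujnb", "j": "huikmn",
    "k": "jiolm", "l": "kop", "z": "asx", "x": "zsdc", "c": "xdfv", "v": "cfgb",
    "b": "vghn", "n": "bhjm", "m": "njk",
}


def add_typo(
    text: str, prob: float = 1.0, count: int = 1, seed: Optional[int] = None
) -> List[str]:
    rng = random.Random(seed)

    def mistype(word: str) -> str:
        positions = [i for i, ch in enumerate(word) if ch.lower() in KEYBOARD_NEIGHBOURS]
        if not positions or rng.random() >= prob:
            return word
        i = rng.choice(positions)
        typo = rng.choice(KEYBOARD_NEIGHBOURS[word[i].lower()])
        # Keep the original letter's case ("Good" may become "Hood", not "hood").
        if word[i].isupper():
            typo = typo.upper()
        return word[:i] + typo + word[i + 1:]

    return [" ".join(mistype(w) for w in text.split(" ")) for _ in range(count)]
```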

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | The wuick brown fox jumps over the fazy dog.
Good morning | Good morninh

Adjective Antonym Swap

This test replaces adjectives in the input text with their antonyms.

alias_name: adjective_antonym_swap

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

adjective_antonym_swap:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the adjective_antonym_swap test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
Lisa is wearing a beautiful shirt today. This soup is edible. | Lisa is wearing a ugly shirt today. This soup is inedible.
They have a beautiful house. | They have a ugly house.

Adjective Synonym Swap

This test replaces adjectives in the input text with their synonyms.

alias_name: adjective_synonym_swap

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

adjective_synonym_swap:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the adjective_synonym_swap test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
Lisa is wearing a beautiful shirt today. This soup is not edible. | Lisa is wearing a pretty shirt today. This soup is consumable.
They have a beautiful house. | They have a alluring house.

American to British

This test checks whether the NLP model can handle input text written with British English spelling. A dictionary of American and British spelling variants is used to convert sentences to British spelling.

alias_name: american_to_british

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

american_to_british:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the american_to_british test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
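Both spelling tests can be sketched as the same dictionary lookup run in opposite directions. In this Python sketch, US_TO_UK is a hypothetical four-entry excerpt and convert_spelling is an illustrative name; inverting the table gives the british_to_american direction.

```python
import random
import re
from typing import Dict, Optional

# Hypothetical excerpt of an American -> British spelling dictionary.
US_TO_UK = {"analyzed": "analysed", "color": "colour", "center": "centre", "organize": "organise"}
# The British -> American direction can reuse the same table, inverted.
UK_TO_US = {uk: us for us, uk in US_TO_UK.items()}


def convert_spelling(
    text: str, mapping: Dict[str, str], prob: float = 1.0, seed: Optional[int] = None
) -> str:
    rng = random.Random(seed)

    def replace(match: re.Match) -> str:
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None or rng.random() >= prob:
            return word
        # Carry over a leading capital ("Color" -> "Colour").
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"[A-Za-z]+", replace, text)
```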

Examples

Original | Test Case
The technician analyzed your samples. | The technician analysed your samples.
What color is this? | What colour is this?

British to American

This test checks whether the NLP model can handle input text written with American English spelling. A dictionary of British and American spelling variants is used to convert sentences to American spelling.

alias_name: british_to_american

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

british_to_american:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the british_to_american test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
The technician analysed your samples. | The technician analyzed your samples.
What colour is this? | What color is this?

Dyslexia Word Swap

This test assesses the NLP model’s capability to handle input text with common word swap errors associated with dyslexia. A Dyslexia Word Swap dictionary is employed to apply the most common word swap errors found in dyslexic writing to the input data.

alias_name: dyslexia_word_swap

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

dyslexia_word_swap:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the dyslexia_word_swap test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
Please, you should be careful and must wear a mask. | Please, you should be careful and must where a mask.
Biden hails your relationship with Australia just days after new partnership drew ire from France. | Biden hails you’re relationship with Australia just days after new partnership drew ire from France.

Lowercase

This test checks if the NLP model can handle input text that is in all lowercase. Like the uppercase test, this is important to ensure that your NLP model can handle input text in any case.

alias_name: lowercase

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

lowercase:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the lowercase test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | the quick brown fox jumps over the lazy dog.
I AM VERY QUIET. | i am very quiet.

Multiple Perturbations

The multiple_perturbations test combines several tests into a single test by applying a sequence of perturbations to the given sentences, one after another, in the order in which they are listed.

Please note that this test is only supported for the text-classification, question-answering, and summarization tasks.

alias_name: multiple_perturbations

Config

multiple_perturbations:
    min_pass_rate: 0.60
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    perturbations1:
      - lowercase
      - add_ocr_typo
      - titlecase

The perturbation set perturbations1 follows the transformation order: lowercase → add_ocr_typo → titlecase.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the multiple_perturbations test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
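The ordering matters because each perturbation sees the output of the previous one. This Python sketch composes plain string functions left to right; ocr_stub is a hypothetical stand-in for add_ocr_typo. It also suggests why the example below reads “L1Ve” and “Can’T”: Python's str.title() starts a new word after any non-letter, including digits and apostrophes, and the titlecase step is assumed to behave the same way.

```python
from functools import reduce
from typing import Callable, List

Perturbation = Callable[[str], str]


def lowercase(text: str) -> str:
    return text.lower()


def titlecase(text: str) -> str:
    # str.title() starts a new "word" after any non-letter, hence "L1Ve" and "Can'T".
    return text.title()


def ocr_stub(text: str) -> str:
    # Hypothetical stand-in for add_ocr_typo: misreads every "i" as "1".
    return text.replace("i", "1")


def apply_in_order(text: str, perturbations: List[Perturbation]) -> str:
    """Feed each perturbation the output of the previous one, left to right."""
    return reduce(lambda current, perturb: perturb(current), perturbations, text)


print(apply_in_order("I live in London", [lowercase, ocr_stub, titlecase]))
# 1 L1Ve 1N London
```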

Examples

Original | Test Case
I live in London, United Kingdom since 2019 . | I L1Ve I^N London, United Kingdom Slnce 2019 .
I can’t move to the USA because they have an average of 1000 tornadoes a year, and I’m terrified of them. | I Can’T Movc T^O T^Ie Usa Hccause Thev Liave An Average Of 1000 Tornadoes A Ycar, A^Nd I’M Terrified Of Th^M.

Number To Word

This test converts numerical values in the text into their equivalent words. This is particularly relevant for applications that handle or process text containing numbers.

alias_name: number_to_word

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

number_to_word:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the number_to_word test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
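Converting a number such as 2019 into “two thousand and nineteen” needs a small spelling routine. The Python sketch below is a self-contained, hypothetical implementation limited to whole numbers below one million, using the British “and” seen in the examples; the actual test may rely on a library instead.

```python
import random
import re
from typing import Optional

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]


def _under_thousand(n: int) -> str:
    """Spell out 1 <= n <= 999 in British style ("one hundred and five")."""
    hundreds, rest = divmod(n, 100)
    parts = []
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest:
        if hundreds:
            parts.append("and")
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    return " ".join(parts)


def int_to_words(n: int) -> str:
    """Spell out 0 <= n < 1,000,000."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only handles 0..999999")
    if n == 0:
        return "zero"
    thousands, rest = divmod(n, 1000)
    parts = []
    if thousands:
        parts.append(f"{_under_thousand(thousands)} thousand")
    if rest:
        if thousands and rest < 100:
            parts.append("and")  # "two thousand and nineteen"
        parts.append(_under_thousand(rest))
    return " ".join(parts)


# Whole numbers only: skip digits that are part of a decimal such as "0.5".
NUMBER = re.compile(r"(?<![\d.])\d+(?!\.?\d)")


def number_to_word(text: str, prob: float = 1.0, seed: Optional[int] = None) -> str:
    rng = random.Random(seed)

    def replace(match: re.Match) -> str:
        value = int(match.group(0))
        if value >= 1_000_000 or rng.random() >= prob:
            return match.group(0)  # out of range, or skipped by the prob draw
        return int_to_words(value)

    return NUMBER.sub(replace, text)
```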

Examples

Original | Test Case
I live in London, United Kingdom since 2019 | I live in London, United Kingdom since two thousand and nineteen
I can’t move to the USA because they have an average of 100 tornadoes a year, and I’m terrified of them | I can’t move to the USA because they have an average of one hundred tornadoes a year, and I’m terrified of them

Randomize Age

This test checks whether the NLP model can handle age differences. It replaces age statements such as “x years old” with x ± random_amount. If the result is smaller than 0, the value is set to 1.

alias_name: randomize_age

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

randomize_age:
    min_pass_rate: 0.65
    prob: 1.0 # Defaults to 1.0, which means all statements will be transformed.
    parameters:
      random_amount: 5
      count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of statements changed during the randomize_age test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • random_amount (int): Maximum amount to be added to or subtracted from the existing age value.
  • prob (float): Controls the proportion of statements to be changed.
  • count (int): Number of variations to construct for each sentence.
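A minimal Python sketch of the age randomization, assuming ages are written as “<number> <unit> old” and that the offset is drawn uniformly from -random_amount to +random_amount. The AGE pattern and the randomize_age signature are hypothetical.

```python
import random
import re
from typing import List, Optional

# Hypothetical pattern for age statements such as "20 days old" or "89 years old".
AGE = re.compile(r"\b(\d+)(\s+(?:years?|months?|weeks?|days?)\s+old)\b")


def randomize_age(
    text: str,
    random_amount: int = 5,
    prob: float = 1.0,
    count: int = 1,
    seed: Optional[int] = None,
) -> List[str]:
    rng = random.Random(seed)

    def shift(match: re.Match) -> str:
        if rng.random() >= prob:
            return match.group(0)
        # Add or subtract a random offset of at most `random_amount`.
        age = int(match.group(1)) + rng.randint(-random_amount, random_amount)
        if age < 0:
            age = 1  # negative results are reset to 1, as described above
        return f"{age}{match.group(2)}"

    return [AGE.sub(shift, text) for _ in range(count)]
```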

Examples

Original | Test Case
The baby was 20 days old. | The baby was 23 days old.
My grandfather got sick when he was 89 years old. | My grandfather got sick when he was 80 years old.

Strip All Punctuation

This test checks whether the NLP model can handle sentences with no punctuation.

alias_name: strip_all_punctuation

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

strip_all_punctuation:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the strip_all_punctuation test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
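The examples below remove colons, commas and the final period but keep parentheses and the decimal point in “0.5”. This hypothetical Python sketch reproduces that behaviour by dropping tokens made only of the marks in STRIPPED and trimming those marks from the end of other tokens; the exact punctuation set used by the test is an assumption, and prob is omitted.

```python
# Hypothetical punctuation set; brackets are deliberately excluded so that
# "( 1 )" survives, as in the examples.
STRIPPED = ".,:;!?"


def strip_all_punctuation(text: str) -> str:
    kept = []
    for token in text.split():
        if all(ch in STRIPPED for ch in token):
            continue  # the token is nothing but punctuation, e.g. "," or ":"
        # Trim trailing marks ("day." -> "day"); inner ones such as "0.5" stay.
        kept.append(token.rstrip(STRIPPED))
    return " ".join(kept)
```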

Examples

Original | Test Case
Dutasteride 0.5 mg Capsule Sig : One ( 1 ) Capsule PO once a day. | Dutasteride 0.5 mg Capsule Sig One ( 1 ) Capsule PO once a day
In conclusion , RSDS is a relevant osteoarticular complication in patients receiving either anticalcineurinic drug ( CyA or tacrolimus ) , even under monotherapy or with a low steroid dose. | In conclusion RSDS is a relevant osteoarticular complication in patients receiving either anticalcineurinic drug ( CyA or tacrolimus ) even under monotherapy or with a low steroid dose

Strip Punctuation

This test checks whether the NLP model can handle sentences without punctuation at the end.

alias_name: strip_punctuation

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

strip_punctuation:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the strip_punctuation test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | The quick brown fox jumps over the lazy dog
Good morning! | Good morning

Swap Entities

This test replaces the labeled entities in the input with other entities (for example, one city name with another) to test the model’s robustness.

alias_name: swap_entities

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

swap_entities:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.
    parameters:
      count: 1 # Defaults to 1

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the swap_entities test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.
  • count (int): Number of variations to construct for each sentence.
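A hypothetical Python sketch of entity swapping, assuming the labeled entities arrive as (start, end, label) character spans and that replacements are drawn from a pool of entities with the same label. ENTITY_POOLS and swap_entities are illustrative names, and prob is omitted.

```python
import random
from typing import Dict, List, Optional, Tuple

# Hypothetical replacement pools keyed by entity label.
ENTITY_POOLS: Dict[str, List[str]] = {
    "LOC": ["Istanbul", "Tokyo", "Lagos"],
    "PER": ["Adam", "Maria", "Wei"],
}

Span = Tuple[int, int, str]  # (start, end, label) character offsets


def swap_entities(
    text: str, entities: List[Span], count: int = 1, seed: Optional[int] = None
) -> List[str]:
    rng = random.Random(seed)
    if not entities:
        return [text] * count
    variations = []
    for _ in range(count):
        start, end, label = rng.choice(entities)
        # Never "swap" an entity for itself.
        candidates = [e for e in ENTITY_POOLS.get(label, []) if e != text[start:end]]
        replacement = rng.choice(candidates) if candidates else text[start:end]
        variations.append(text[:start] + replacement + text[end:])
    return variations
```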

Examples

Original | Test Case
I love Paris. | I love Istanbul.
Jack is sick today. | Adam is sick today.

Titlecase

This test checks if the NLP model can handle input text that is in titlecase format, where the first letter of each word is capitalized.

alias_name: titlecase

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

titlecase:
    min_pass_rate: 0.7
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the titlecase test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | The Quick Brown Fox Jumps Over The Lazy Dog.
I LIKE TO SHOUT. | I Like To Shout.

Uppercase

This test checks if the NLP model can handle input text that is in all uppercase. Accidentally entering text in all caps is common, and you want to ensure that your NLP model can still process it correctly.

alias_name: uppercase

To test QA models, we use LangChain's QAEval, which relies on the model itself or another ML model to perform the evaluation; that evaluator can also make mistakes.

Config

uppercase:
    min_pass_rate: 0.8
    prob: 0.5 # Defaults to 1.0, which means all words will be transformed.

You can adjust the level of transformation in the sentence with the “prob” parameter, which controls the proportion of words changed during the uppercase test.

  • min_pass_rate (float): Minimum pass rate to pass the test.
  • prob (float): Controls the proportion of words to be changed.

Examples

Original | Test Case
The quick brown fox jumps over the lazy dog. | THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG.
I like to shout. | I LIKE TO SHOUT.