Robustness Notebook

Overview

In the Robustness notebook, we’re looking at how tough the ner.dl model is. Robustness means checking if the model can stay accurate, precise, and on point even when we shake things up in the data it’s working on. Take, for instance, NER – we want to see how the model handles documents with typos or sentences that are all in uppercase. The goal of the notebook is to figure out if these changes mess with the model’s predictions compared to its usual groove. We’re essentially stress-testing it to see how well it holds up in different situations.

Open in Collab

Category	Hub	Task	Open In Colab
Robustness	John Snow Labs	NER

Config Used

tests:     
  defaults:
    min_pass_rate: 0.65
  robustness:
    add_typo:
      min_pass_rate: 0.66
    uppercase:
      min_pass_rate: 0.62

Supported Tests

uppercase: capitalization of the test set is turned into uppercase
lowercase: capitalization of the test set is turned into lowercase
titlecase: capitalization of the test set is turned into title case
add_punctuation: special characters at end of each sentence are replaced by other special characters, if no special character at the end, one is added
strip_punctuation: special characters are removed from the sentences (except if found in numbers, such as ‘2.5’)
add_typo: typos are introduced in sentences
add_contraction: contractions are added where possible (e.g. ‘do not’ contracted into ‘don’t’)
add_context: tokens are added at the beginning and at the end of the sentences
swap_entities: named entities replaced with same entity type with same token count from terminology
swap_cohyponyms: Named entities replaced with co-hyponym from the WordNet database
american_to_british: American English will be changed to British English
british_to_american: British English will be changed to American English
number_to_word: Converts numeric values in sentences to their equivalent verbal representation.
add_ocr_typo: Ocr typos are introduced in sentences
add_speech_to_text_typo: Introduce common conversion errors from SSpeech to Text conversion.
add_abbreviation:Replaces words or expressions in texts with their abbreviations
multiple_perturbations : Transforms the given sentences by applying multiple perturbations in a specific sequence.
adjective_synonym_swap : Transforms the adjectives in the given sentences to their synonyms.
adjective_antonym_swap : Transforms the adjectives in the given sentences to their antonyms.
strip_all_punctuation: Strips all punctuation from the sentences.