Robustness Notebook

 

Overview

In the Robustness notebook, we’re looking at how tough the ner.dl model is. Robustness means checking if the model can stay accurate, precise, and on point even when we shake things up in the data it’s working on. Take, for instance, NER – we want to see how the model handles documents with typos or sentences that are all in uppercase. The goal of the notebook is to figure out if these changes mess with the model’s predictions compared to its usual groove. We’re essentially stress-testing it to see how well it holds up in different situations.

Open in Collab

Category Hub Task Open In Colab
Robustness John Snow Labs NER Open In Colab

Config Used

tests:     
  defaults:
    min_pass_rate: 0.65
  robustness:
    add_typo:
      min_pass_rate: 0.66
    uppercase:
      min_pass_rate: 0.62

Supported Tests

  • uppercase: capitalization of the test set is turned into uppercase

  • lowercase: capitalization of the test set is turned into lowercase

  • titlecase: capitalization of the test set is turned into title case

  • add_punctuation: special characters at end of each sentence are replaced by other special characters, if no special character at the end, one is added

  • strip_punctuation: special characters are removed from the sentences (except if found in numbers, such as ‘2.5’)

  • add_typo: typos are introduced in sentences

  • add_contraction: contractions are added where possible (e.g. ‘do not’ contracted into ‘don’t’)

  • add_context: tokens are added at the beginning and at the end of the sentences

  • swap_entities: named entities replaced with same entity type with same token count from terminology

  • swap_cohyponyms: Named entities replaced with co-hyponym from the WordNet database

  • american_to_british: American English will be changed to British English

  • british_to_american: British English will be changed to American English

  • number_to_word: Converts numeric values in sentences to their equivalent verbal representation.

  • add_ocr_typo: Ocr typos are introduced in sentences

  • add_speech_to_text_typo: Introduce common conversion errors from SSpeech to Text conversion.

  • add_abbreviation:Replaces words or expressions in texts with their abbreviations

  • multiple_perturbations : Transforms the given sentences by applying multiple perturbations in a specific sequence.

  • adjective_synonym_swap : Transforms the adjectives in the given sentences to their synonyms.

  • adjective_antonym_swap : Transforms the adjectives in the given sentences to their antonyms.

  • strip_all_punctuation: Strips all punctuation from the sentences.