One Liners

With just one line of code, you can generate and run over 50 different test types to assess the quality of John Snow Labs, Hugging Face, OpenAI and spaCy models. These tests fall into the robustness, accuracy, bias, representation and fairness categories for NER, Text Classification and Question Answering models, with support for many more models and test types under active development.
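
Each snippet below runs the Harness with its default test configuration. If you want to run only a subset of the 50+ tests, you can pass a configuration before generating. The following is a minimal sketch assuming the standard LangTest config schema; the test names and min_pass_rate thresholds are illustrative rather than prescriptive.

from langtest import Harness

h = Harness(task='ner', model={'model': 'dslim/bert-base-NER', 'hub': 'huggingface'})

# Restrict the suite to selected categories/tests before generating test cases
# (test names and min_pass_rate values here are only examples)
h.configure({
    'tests': {
        'defaults': {'min_pass_rate': 0.65},
        'robustness': {
            'uppercase': {'min_pass_rate': 0.70},
            'add_typo': {'min_pass_rate': 0.70},
        },
    }
})

h.generate().run().report()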

One Liner - NER

Try out the LangTest library on the following default model-dataset combinations for NER. To run tests on a model other than the ones shown in the snippets below, provide a dataset whose labels match your model's predictions (see the Test Harness docs); a sketch of that custom setup follows the default snippets.

!pip install langtest[johnsnowlabs]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='ner', model={'model': 'ner.dl', 'hub':'johnsnowlabs'})

# Generate, run and get a report on your test cases
h.generate().run().report()

!pip install langtest[transformers]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='ner', model={'model': 'dslim/bert-base-NER', 'hub':'huggingface'})

# Generate, run and get a report on your test cases
h.generate().run().report()

!pip install langtest[spacy]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='ner', model={'model': 'en_core_web_sm', 'hub':'spacy'})

# Generate, run and get a report on your test cases
h.generate().run().report()
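
If you are testing your own NER model rather than one of the defaults above, pass a dataset whose labels match the model's predictions. The sketch below is illustrative only: the model identifier and the CoNLL file path are placeholders, not real assets.

from langtest import Harness

# Custom NER model plus a matching labelled dataset (both paths are placeholders)
h = Harness(task='ner',
            model={'model': 'path/to/your_ner_model', 'hub': 'huggingface'},
            data={'data_source': 'path/to/your_data.conll'})

# Generate, run and get a report on your test cases
h.generate().run().report()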

One Liner - Text Classification

Try out the LangTest library on the following default model-dataset combinations for Text Classification. To run tests on a model other than the ones shown in the snippets below, provide a dataset whose labels match your model's predictions (see the Test Harness docs).

!pip install langtest[johnsnowlabs]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='text-classification', model={'model': 'en.sentiment.imdb.glove', 'hub':'johnsnowlabs'})

# Generate, run and get a report on your test cases
h.generate().run().report()

!pip install langtest[transformers]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='text-classification', model={'model': 'lvwerra/distilbert-imdb', 'hub':'huggingface'})

# Generate, run and get a report on your test cases
h.generate().run().report()

!pip install langtest[spacy]

from langtest import Harness

# Make sure to specify data='path_to_data' when using custom models
h = Harness(task='text-classification', model={'model': 'textcat_imdb', 'hub':'spacy'})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Question Answering

Try out the LangTest library on the following default model-dataset combinations for Question Answering. To get a list of valid dataset options, please navigate to the Data Input docs.

!pip install "langtest[openai]"

from langtest import Harness

# Set API keys
import os
os.environ['OPENAI_API_KEY'] = "<ADD OPEN-AI-KEY>"

# Create a Harness object
h = Harness(task="question-answering", 
            model={"model": "text-davinci-003","hub":"openai"}, 
            data={"data_source" :"BBQ", "split":"test-tiny"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Summarization

Try out the LangTest library on the following default model-dataset combinations for Summarization. To get a list of valid dataset options, please navigate to the Data Input docs.

!pip install "langtest[evaluate,openai,transformers]"

from langtest import Harness

# Set API keys
import os
os.environ['OPENAI_API_KEY'] = "<ADD OPEN-AI-KEY>"

# Create a Harness object
h = Harness(task="summarization", 
            model={"model": "text-davinci-003","hub":"openai"}, 
            data={"data_source" :"XSum", "split":"test-tiny"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Toxicity

Try out the LangTest library on the following default model-dataset combinations for Toxicity. To get a list of valid dataset options, please navigate to the Data Input docs.

!pip install "langtest[evaluate,openai,transformers]"

from langtest import Harness

# Set API keys
import os
os.environ['OPENAI_API_KEY'] = "<ADD OPEN-AI-KEY>"

# Create a Harness object
h = Harness(task={"task":"text-generation", "category":"toxicity"}, 
            model={"model": "text-davinci-002","hub":"openai"}, 
            data={"data_source" :'Toxicity', "split":"test"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Model Comparisons

To compare different models (from the same hub or different hubs) on the same task and test configuration, pass a list of dictionaries to the ‘model’ parameter of the Harness. Each dictionary should contain a model name paired with its respective hub.

!pip install "langtest[spacy,johnsnowlabs]" 
!wget https://raw.githubusercontent.com/JohnSnowLabs/langtest/main/langtest/data/conll/sample.conll

from langtest import Harness

# Define the list of models to compare
models = [{"model": "ner.dl", "hub": "johnsnowlabs"}, {"model": "en_core_web_sm", "hub": "spacy"}]

# Create a Harness object
h = Harness(task="ner", model=models, data={"data_source":'sample.conll'})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Translation

Try out the LangTest library on the following default model-dataset combinations for Translation.

!pip install langtest[transformers]

from langtest import Harness

# Create a Harness object

h = Harness(task="translation",
            model={"model":'t5-base', "hub": "huggingface"},
            data={"data_source": "Translation", "split":"test"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Clinical-Tests

Try out the LangTest library on the following default model-dataset combinations for Clinical-Tests.

!pip install "langtest[openai,transformers]"

import os
os.environ["OPENAI_API_KEY"] = "<ADD OPEN-AI-KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"text-generation", "category":"clinical-tests"}, 
            model={"model": "text-davinci-003", "hub": "openai"},
            data = {"data_source": "Clinical", "split":"Medical-files"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Security-Test

Try out the LangTest library on the following default model-dataset combinations for Security Test.

!pip install langtest[openai]

import os
os.environ["OPENAI_API_KEY"] = "<ADD OPEN-AI-KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"text-generation", "category":"security"}, 
            model={'model': "text-davinci-003", "hub": "openai"},
            data={'data_source':'Prompt-Injection-Attack'})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Disinformation-Test

Try out the LangTest library on the following default model-dataset combinations for Disinformation-Test.

!pip install "langtest[ai21,langchain,transformers]" 

import os
os.environ["AI21_API_KEY"] = "<YOUR_API_KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"text-generation", "category":"disinformation-test"},
            model={"model": "j2-jumbo-instruct", "hub": "ai21"},
            data={"data_source": "Narrative-Wedging"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Ideology

Try out the LangTest library on the following default model for Ideology Test.

!pip install langtest[openai]

import os
os.environ["OPENAI_API_KEY"] = "<ADD OPEN-AI-KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"ideology"}, 
            model={'model': "text-davinci-003", "hub": "openai"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Factuality Test

Try out the LangTest library on the following default model-dataset combinations for Factuality Test.

!pip install "langtest[openai,transformers]" 

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"factuality-test"},
            model={"model": "text-davinci-003", "hub": "openai"},
            data={"data_source": "Factual-Summary-Pairs", "split": "test"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Sensitivity Test

Try out the LangTest library on the following default model-dataset combinations for Sensitivity Test.

! pip install "langtest[openai,transformers]"

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"sensitivity-test"},
            model={"model": "text-davinci-003", "hub": "openai"},
            data={"data_source": "NQ-open", "split": "test-tiny"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Wino Bias

Try out the LangTest library on the following default model-dataset combinations for the wino-bias test.

!pip install langtest[transformers]

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"fill-mask", "category":"wino-bias"}, 
            model={"model": "bert-base-uncased", "hub": "huggingface"}, 
            data={"data_source": "Wino-test", "split": "test"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Wino Bias LLMs

Try out the LangTest library on the following default model-dataset combinations for the wino-bias test on LLMs.

!pip install langtest[openai]

from langtest import Harness

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"wino-bias"},
            model={"model": "text-davinci-003", "hub": "openai"},
            data={"data_source": "Wino-test", "split": "test"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Legal-Tests

Try out the LangTest library on the following default model-dataset combinations for legal tests.

!pip install "langtest[openai]" 

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"legal-tests"}, 
            model={"model": "text-davinci-002", "hub": "openai"}, 
            data={"data_source": "Legal-Support"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Sycophancy Test

Try out the LangTest library on the following default model-dataset combinations for the sycophancy test.

!pip install "langtest[openai]" 

import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"sycophancy-test"},
            model={"model": "text-davinci-003", "hub": "openai"}, 
            data={"data_source": "synthetic-math-data"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - Crows Pairs

Try out the LangTest library on the following default model-dataset combinations for the crows-pairs test.

!pip install langtest[transformers]

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"fill-mask", "category":"crows-pairs"}, 
            model={"model": "bert-base-uncased", "hub": "huggingface"},
            data={"data_source": "Crows-Pairs"})

# Generate, run and get a report on your test cases
h.generate().run().report()

One Liner - StereoSet

Try out the LangTest library on the following default model-dataset combinations for the StereoSet test.

!pip install langtest[transformers]

from langtest import Harness

# Create a Harness object
h = Harness(task={"task":"question-answering", "category":"stereoset"}, 
            model={"model": "bert-base-uncased", "hub": "huggingface"},
            data={"data_source": "StereoSet"})

# Generate, run and get a report on your test cases
h.generate().run().report()