With just one line of code, you can generate and run over 50 different test types to assess the quality of John Snow Labs, Hugging Face, OpenAI, and spaCy models. These tests fall into the robustness, accuracy, bias, representation, and fairness categories for NER, Text Classification, and Question Answering models, with support for many more models and test types under active development.
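For example, a default NER suite runs end to end like this (a minimal sketch mirroring the library's quickstart pattern):

```python
from langtest import Harness

# Create a test Harness for a Hugging Face NER model
harness = Harness(task="ner", model={"model": "dslim/bert-base-NER", "hub": "huggingface"})

# Generate test cases, run them, and view a report -- one chained line
harness.generate().run().report()
```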
NER
Try out the LangTest library on the following default model-dataset combinations for NER. To run tests on any model other than those shown in the code snippets here, make sure to provide a dataset that matches your model's label predictions (see the Test Harness docs).
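A minimal sketch pairing the default model with an explicit dataset; `conll03` is assumed to be a valid data_source name, so confirm it in the Data Input docs:

```python
from langtest import Harness

# Pair the model with a dataset whose labels match its predictions
harness = Harness(
    task="ner",
    model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
    data={"data_source": "conll03", "split": "test"},
)
harness.generate().run().report()
```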
Text Classification
Try out the LangTest library on the following default model-dataset combinations for Text Classification. To run tests on any model other than those shown in the code snippets here, make sure to provide a dataset that matches your model's label predictions (see the Test Harness docs).
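A sketch along the same lines; the IMDB classifier named below is illustrative rather than a prescribed default:

```python
from langtest import Harness

# Text classification harness; the default test config
# (robustness, accuracy, bias, ...) applies when none is passed
harness = Harness(
    task="text-classification",
    model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
)
harness.generate().run().report()
```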
Model Comparisons
To compare different models (from the same or different hubs) on the same task and test configuration, you can pass the names of the models you want to compare, each paired with its respective hub, to the ‘model’ parameter of the harness.
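A sketch of a two-model comparison, written as a list of `{model, hub}` entries; the model names and the BoolQ split are assumptions to adapt to your setup:

```python
from langtest import Harness

# Compare two QA models on the same data and test configuration
harness = Harness(
    task="question-answering",
    model=[
        {"model": "gpt-3.5-turbo", "hub": "openai"},
        {"model": "google/flan-t5-base", "hub": "huggingface"},
    ],
    data={"data_source": "BoolQ", "split": "test-tiny"},
)
harness.generate().run().report()  # results are broken down per model
```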
Question Answering
The Question Answering task contains various test categories. By default, it supports the robustness, accuracy, fairness, representation, and bias categories on benchmark datasets. However, access to a specific sub-task (category) within the Question Answering task is data-dependent.
Try out the LangTest library on the following default model-dataset combinations for Question Answering. To get a list of valid dataset options, please navigate to the Data Input docs.
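A minimal sketch, assuming `BoolQ` with the `test-tiny` split as the benchmark dataset:

```python
from langtest import Harness

# QA harness on a benchmark dataset
harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "BoolQ", "split": "test-tiny"},
)
harness.generate().run().report()
```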
Ideology
Try out the LangTest library on the following default model for the ideology test.
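A sketch, assuming the category is selected via a task dictionary and that the ideology test supplies its own built-in questions (hence no `data` argument):

```python
from langtest import Harness

# Ideology is a category under the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "ideology"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
)
harness.generate().run().report()
```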
Factuality
Try out the LangTest library on the following default model-dataset combinations for the factuality test.
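A sketch, taking `Factual-Summary-Pairs` as the dataset name (verify against the Data Input docs):

```python
from langtest import Harness

# Factuality category of the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "factuality"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Factual-Summary-Pairs"},
)
harness.generate().run().report()
```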
Legal
Try out the LangTest library on the following default model-dataset combinations for the legal test.
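A sketch, taking `Legal-Support` as the dataset name (an assumption; check the Data Input docs):

```python
from langtest import Harness

# Legal category of the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "legal"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Legal-Support"},
)
harness.generate().run().report()
```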
Sensitivity
Try out the LangTest library on the following default model-dataset combinations for the sensitivity test.
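A sketch, assuming `NQ-open` with the `test-tiny` split is a valid sensitivity dataset:

```python
from langtest import Harness

# Sensitivity category of the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "sensitivity"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "NQ-open", "split": "test-tiny"},
)
harness.generate().run().report()
```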
StereoSet
Try out the LangTest library on the following default model-dataset combinations for the StereoSet test.
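A sketch, assuming the dataset is registered as `StereoSet`:

```python
from langtest import Harness

# StereoSet category of the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "stereoset"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "StereoSet"},
)
harness.generate().run().report()
```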
Sycophancy
Try out the LangTest library on the following default model-dataset combinations for the sycophancy test.
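A sketch, assuming `synthetic-math-data` as the sycophancy dataset name:

```python
from langtest import Harness

# Sycophancy category of the question-answering task
harness = Harness(
    task={"task": "question-answering", "category": "sycophancy"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "synthetic-math-data"},
)
harness.generate().run().report()
```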
Wino Bias LLM
Try out the LangTest library on the following default model-dataset combinations for the wino-bias test.
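A sketch, assuming the LLM variant of the test uses a `Wino-test` dataset under the question-answering task:

```python
from langtest import Harness

# Wino-bias category of the question-answering task (LLM variant)
harness = Harness(
    task={"task": "question-answering", "category": "wino-bias"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Wino-test", "split": "test"},
)
harness.generate().run().report()
```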
Summarization
Try out the LangTest library on the following default model-dataset combinations for Summarization. To get a list of valid dataset options, please navigate to the Data Input docs.
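A minimal sketch, assuming `XSum` with the `test-tiny` split (see the Data Input docs for alternatives):

```python
from langtest import Harness

# Summarization harness on a benchmark dataset
harness = Harness(
    task="summarization",
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "XSum", "split": "test-tiny"},
)
harness.generate().run().report()
```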
Fill Mask
The Fill Mask task currently supports only the Stereotype test category. Accessing a specific test within the Stereotype category depends on the dataset. To get a list of valid dataset options, please navigate to the Data Input docs.
Wino Bias
Try out the LangTest library on the following default model-dataset combinations for the wino-bias test.
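A sketch, assuming `bert-base-uncased` as the masked model and `Wino-test` as the dataset name:

```python
from langtest import Harness

# Wino-bias category of the fill-mask task
harness = Harness(
    task={"task": "fill-mask", "category": "wino-bias"},
    model={"model": "bert-base-uncased", "hub": "huggingface"},
    data={"data_source": "Wino-test"},
)
harness.generate().run().report()
```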
Crows Pairs
Try out the LangTest library on the following default model-dataset combinations for the crows-pairs test.
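A sketch, assuming the dataset is registered as `Crows-Pairs`:

```python
from langtest import Harness

# Crows-pairs category of the fill-mask task
harness = Harness(
    task={"task": "fill-mask", "category": "crows-pairs"},
    model={"model": "bert-base-uncased", "hub": "huggingface"},
    data={"data_source": "Crows-Pairs"},
)
harness.generate().run().report()
```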
Text Generation
The Text Generation task contains various test categories. Accessing a specific sub-task (category) within the Text Generation task depends on the dataset. To get a list of valid dataset options, please navigate to the Data Input docs.
Clinical
Try out the LangTest library on the following default model-dataset combinations for the clinical test.
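A sketch, assuming `Clinical` as the dataset name; the available split names are version-dependent, so check the Data Input docs:

```python
from langtest import Harness

# Clinical category of the text-generation task
harness = Harness(
    task={"task": "text-generation", "category": "clinical"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Clinical"},
)
harness.generate().run().report()
```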
Disinformation
Try out the LangTest library on the following default model-dataset combinations for the disinformation test.
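A sketch, assuming `Narrative-Wedging` as the dataset name:

```python
from langtest import Harness

# Disinformation category of the text-generation task
harness = Harness(
    task={"task": "text-generation", "category": "disinformation"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Narrative-Wedging"},
)
harness.generate().run().report()
```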
Security
Try out the LangTest library on the following default model-dataset combinations for the security test.
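A sketch, assuming `Prompt-Injection-Attack` as the dataset name:

```python
from langtest import Harness

# Security category of the text-generation task
harness = Harness(
    task={"task": "text-generation", "category": "security"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Prompt-Injection-Attack"},
)
harness.generate().run().report()
```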
Toxicity
Try out the LangTest library on the following default model-dataset combinations for the toxicity test.
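A sketch, assuming `Toxicity` with the `test` split as the dataset:

```python
from langtest import Harness

# Toxicity category of the text-generation task
harness = Harness(
    task={"task": "text-generation", "category": "toxicity"},
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "Toxicity", "split": "test"},
)
harness.generate().run().report()
```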
Translation
Try out the LangTest library on the following default model-dataset combinations for translation. To get a list of valid dataset options, please navigate to the Data Input docs.
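A sketch, assuming a Hugging Face seq2seq model and `Translation` as the dataset name (both assumptions; see the Data Input docs):

```python
from langtest import Harness

# Translation harness; t5-base is an illustrative choice of model
harness = Harness(
    task="translation",
    model={"model": "t5-base", "hub": "huggingface"},
    data={"data_source": "Translation"},
)
harness.generate().run().report()
```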