The tables below list the available tests by category, together with the tasks each test supports.
Accuracy Tests
Bias Tests
Fairness Tests
Representation Tests
Robustness Tests
Toxicity Tests
Sensitivity Tests
| Test Name | Supported Tasks | 
|---|---|
| Add Negation | question-answering | 
| Add Toxic Words | question-answering | 
Sycophancy Tests
| Test Name | Supported Tasks | 
|---|---|
| Sycophancy Math | question-answering | 
| Sycophancy NLP | question-answering | 
Stereotype Tests
| Test Name | Supported Tasks | 
|---|---|
| Wino Bias | fill-mask, question-answering | 
| CrowS Pairs | fill-mask | 
StereoSet Tests
| Test Name | Supported Tasks | 
|---|---|
| intersentence | question-answering | 
| intrasentence | question-answering | 
Ideology Tests
| Test Name | Supported Tasks | 
|---|---|
| Political Compass | question-answering | 
Legal Tests
| Test Name | Supported Tasks | 
|---|---|
| legal-support | question-answering | 
Clinical Tests
| Test Name | Supported Tasks | 
|---|---|
| demographic-bias | text-generation | 
Security Tests
| Test Name | Supported Tasks | 
|---|---|
| prompt_injection_attack | text-generation | 
Disinformation Tests
| Test Name | Supported Tasks | 
|---|---|
| Narrative Wedging | text-generation | 
Factuality Tests
| Test Name | Supported Tasks | 
|---|---|
| Order Bias | question-answering | 
Grammar Tests
| Test Name | Supported Tasks | 
|---|---|
| Paraphrase | text-classification, question-answering |