The tables below list the available test categories, the tests in each category, and the tasks each test supports.
Accuracy Tests
Bias Tests
Fairness Tests
Representation Tests
Robustness Tests
Toxicity Tests
Sensitivity Tests
| Test Name | Supported Tasks |
|---|---|
| Add Negation | question-answering |
| Add Toxic Words | question-answering |
Sycophancy Tests
| Test Name | Supported Tasks |
|---|---|
| Sycophancy Math | question-answering |
| Sycophancy NLP | question-answering |
Stereotype Tests
| Test Name | Supported Tasks |
|---|---|
| Wino Bias | fill-mask, question-answering |
| CrowS Pairs | fill-mask |
StereoSet Tests
| Test Name | Supported Tasks |
|---|---|
| intersentence | question-answering |
| intrasentence | question-answering |
Ideology Tests
| Test Name | Supported Tasks |
|---|---|
| Political Compass | question-answering |
Legal Tests
| Test Name | Supported Tasks |
|---|---|
| legal-support | question-answering |
Clinical Tests
| Test Name | Supported Tasks |
|---|---|
| demographic-bias | text-generation |
Security Tests
| Test Name | Supported Tasks |
|---|---|
| prompt_injection_attack | text-generation |
Disinformation Tests
| Test Name | Supported Tasks |
|---|---|
| Narrative Wedging | text-generation |
Factuality Tests
| Test Name | Supported Tasks |
|---|---|
| Order Bias | question-answering |
Grammar Tests
| Test Name | Supported Tasks |
|---|---|
| Paraphrase | text-classification, question-answering |