LangTest supports the following benchmark datasets for commonsense reasoning.

| Dataset | Task | Category | Source | Colab |
|---|---|---|---|---|
| CommonsenseQA | question-answering | robustness, accuracy, fairness | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | |
| HellaSwag | question-answering | robustness, accuracy, fairness | HellaSwag: Can a Machine Really Finish Your Sentence? | |
| OpenBookQA | question-answering | robustness, accuracy, fairness | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
| PIQA | question-answering | robustness | PIQA: Reasoning about Physical Commonsense in Natural Language | |
| SIQA | question-answering | robustness, accuracy, fairness | SocialIQA: Commonsense Reasoning about Social Interactions | |
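
The Category column can be read as a lookup of which test categories each dataset supports. The following is a minimal sketch of that lookup in plain Python. The names `COMMONSENSE_DATASETS` and `datasets_supporting` are hypothetical helpers for illustration and are not part of the LangTest API; the data is copied from the table above.

```python
# Hypothetical lookup built from the table above; not part of LangTest itself.
COMMONSENSE_DATASETS = {
    "CommonsenseQA": {"robustness", "accuracy", "fairness"},
    "HellaSwag": {"robustness", "accuracy", "fairness"},
    "OpenBookQA": {"robustness", "accuracy", "fairness"},
    "PIQA": {"robustness"},
    "SIQA": {"robustness", "accuracy", "fairness"},
}


def datasets_supporting(category: str) -> list[str]:
    """Return the datasets whose Category column lists `category`."""
    return [
        name
        for name, categories in COMMONSENSE_DATASETS.items()
        if category in categories
    ]


if __name__ == "__main__":
    # PIQA is listed for robustness only, so it is absent from the fairness list.
    print(datasets_supporting("fairness"))
```

A lookup like this makes it easy to notice, before configuring a test run, that PIQA is listed only under robustness.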