LangTest supports the following benchmark datasets for evaluating commonsense reasoning.
Dataset | Task | Category | Source | Colab |
---|---|---|---|---|
CommonsenseQA | question-answering | robustness, accuracy, fairness | CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | |
HellaSwag | question-answering | robustness, accuracy, fairness | HellaSwag: Can a Machine Really Finish Your Sentence? | |
OpenBookQA | question-answering | robustness, accuracy, fairness | Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
PIQA | question-answering | robustness | PIQA: Reasoning about Physical Commonsense in Natural Language | |
SIQA | question-answering | robustness, accuracy, fairness | SocialIQA: Commonsense Reasoning about Social Interactions | |
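Not every dataset supports every test category: PIQA, for instance, is listed for robustness only. As a minimal sketch, the Category column can be encoded as a lookup to check which datasets fit a planned test. The names `SUPPORTED_CATEGORIES` and `datasets_supporting` are hypothetical helpers for illustration, not part of the LangTest API.

```python
# Hypothetical lookup: test categories per commonsense dataset,
# transcribed from the Category column of the table above.
SUPPORTED_CATEGORIES: dict[str, set[str]] = {
    "CommonsenseQA": {"robustness", "accuracy", "fairness"},
    "HellaSwag": {"robustness", "accuracy", "fairness"},
    "OpenBookQA": {"robustness", "accuracy", "fairness"},
    "PIQA": {"robustness"},
    "SIQA": {"robustness", "accuracy", "fairness"},
}


def datasets_supporting(category: str) -> list[str]:
    """Return the dataset names whose listed categories include `category`, sorted alphabetically."""
    return sorted(
        name for name, categories in SUPPORTED_CATEGORIES.items() if category in categories
    )


if __name__ == "__main__":
    # Datasets that can be used for accuracy tests (PIQA is excluded).
    print(datasets_supporting("accuracy"))
```

An unknown category simply yields an empty list, so the helper can be used to validate a test plan before running it.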