LangTest supports a variety of medical benchmark datasets, listed in the table below, allowing you to assess the performance of your models on medical queries; a brief usage sketch follows the table.
Dataset | Task | Category | Source | Colab |
---|---|---|---|---|
MedMCQA | question-answering | robustness, accuracy, fairness | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | |
MedQA | question-answering | robustness, accuracy, fairness | What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams | |
PubMedQA | question-answering | robustness, accuracy, fairness | PubMedQA: A Dataset for Biomedical Research Question Answering | |
LiveQA | question-answering | robustness | Overview of the Medical Question Answering Task at TREC 2017 LiveQA | |
MedicationQA | question-answering | robustness | Bridging the Gap Between Consumers’ Medication Questions and Trusted Answers | |
HealthSearchQA | question-answering | robustness | Large Language Models Encode Clinical Knowledge | |
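
As a rough illustration, the sketch below shows how one of these benchmarks might be evaluated with LangTest's `Harness`. The model name, hub, and `split` value are placeholders chosen for the example, not the only valid configuration; adjust them to your own setup.

```python
from langtest import Harness

# Minimal sketch: evaluating a question-answering model on the MedMCQA benchmark.
# The model name, hub, and split are illustrative placeholders.
harness = Harness(
    task="question-answering",
    model={"model": "gpt-3.5-turbo", "hub": "openai"},
    data={"data_source": "MedMCQA", "split": "test-tiny"},
)

harness.generate()  # generate test cases for the configured categories
harness.run()       # run the generated tests against the model
harness.report()    # summarize pass/fail results per test category
```

The same pattern applies to the other datasets in the table by changing `data_source` (e.g. `"PubMedQA"` or `"LiveQA"`), keeping in mind that only the categories listed above are available for each benchmark.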