LangTest supports a variety of medical-domain benchmark datasets, listed in the table below, allowing you to assess your models' performance on medical queries.
| Dataset | Task | Category | Source | Colab |
|---|---|---|---|---|
| MedMCQA | question-answering | robustness, accuracy, fairness | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | |
| MedQA | question-answering | robustness, accuracy, fairness | What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams | |
| PubMedQA | question-answering | robustness, accuracy, fairness | PubMedQA: A Dataset for Biomedical Research Question Answering | |
| LiveQA | question-answering | robustness | Overview of the Medical Question Answering Task at TREC 2017 LiveQA | |
| MedicationQA | question-answering | robustness | Bridging the Gap Between Consumers’ Medication Questions and Trusted Answers | |
| HealthSearchQA | question-answering | robustness | Large Language Models Encode Clinical Knowledge | |
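As a minimal sketch of how one of these benchmarks plugs into LangTest, the snippet below shows the configuration shape passed to the `Harness` class (`task`, `model`, and `data` arguments). The model hub, model name, and the `"test-tiny"` split are illustrative assumptions; substitute the hub and model you actually want to evaluate, and consult the LangTest documentation for the splits available in your version.

```python
# Illustrative configuration for evaluating a QA model on MedQA with LangTest.
# The hub/model names and the "test-tiny" split here are assumptions, not
# a definitive recipe; any hub supported by LangTest can be used instead.
task = "question-answering"
model = {"model": "gpt-3.5-turbo", "hub": "openai"}
data = {"data_source": "MedQA", "split": "test-tiny"}

# With langtest installed, this configuration would be used as:
#   from langtest import Harness
#   harness = Harness(task=task, model=model, data=data)
#   harness.generate()  # build the test cases (robustness, accuracy, ...)
#   harness.run()       # run them against the model
#   harness.report()    # summarize pass/fail rates per test category
print(f"Evaluating {model['model']} on {data['data_source']} ({task})")
```

The same pattern applies to the other datasets in the table: change `data_source` to `MedMCQA`, `PubMedQA`, `LiveQA`, `MedicationQA`, or `HealthSearchQA`, keeping in mind that only the first three support accuracy and fairness tests in addition to robustness.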