LangTest supports a variety of medical-domain benchmark datasets, listed in the table below, allowing you to assess your models' performance on medical queries.
| Dataset | Task | Category | Source | Colab |
|---|---|---|---|---|
| MedMCQA | question-answering | robustness, accuracy, fairness | MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering | |
| MedQA | question-answering | robustness, accuracy, fairness | What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams | |
| PubMedQA | question-answering | robustness, accuracy, fairness | PubMedQA: A Dataset for Biomedical Research Question Answering | |
| LiveQA | question-answering | robustness | Overview of the Medical Question Answering Task at TREC 2017 LiveQA | |
| MedicationQA | question-answering | robustness | Bridging the Gap Between Consumers’ Medication Questions and Trusted Answers | |
| HealthSearchQA | question-answering | robustness | Large Language Models Encode Clinical Knowledge | |
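As a minimal sketch of how one of these benchmarks plugs into LangTest, the snippet below shows the configuration shape passed to the `Harness` class (`task`, `model`, and `data` arguments). The model hub, model name, and the `"test-tiny"` split are illustrative assumptions; substitute the hub and model you actually want to evaluate, and consult the LangTest documentation for the splits available in your version.

```python
# Illustrative configuration for evaluating a QA model on MedQA with LangTest.
# The hub/model names and the "test-tiny" split here are assumptions, not
# a definitive recipe; any hub supported by LangTest can be used instead.
task = "question-answering"
model = {"model": "gpt-3.5-turbo", "hub": "openai"}
data = {"data_source": "MedQA", "split": "test-tiny"}

# With langtest installed, this configuration would be used as:
#   from langtest import Harness
#   harness = Harness(task=task, model=model, data=data)
#   harness.generate()  # build the test cases (robustness, accuracy, ...)
#   harness.run()       # run them against the model
#   harness.report()    # summarize pass/fail rates per test category
print(f"Evaluating {model['model']} on {data['data_source']} ({task})")
```

The same pattern applies to the other datasets in the table: change `data_source` to `MedMCQA`, `PubMedQA`, `LiveQA`, `MedicationQA`, or `HealthSearchQA`, keeping in mind that only the first three support accuracy and fairness tests in addition to robustness.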