LangTest supports a variety of benchmark datasets in the legal domain, listed in the table below, allowing you to assess the performance of your models on legal queries.
| Dataset | Task | Category | Source | Colab |
|---|---|---|---|---|
| Contracts | question-answering | robustness, accuracy, fairness | Answer yes/no questions about whether contractual clauses discuss particular issues. | |
| Consumer-Contracts | question-answering | robustness, accuracy, fairness | Answer yes/no questions about the rights and obligations created by clauses in terms-of-service agreements. | |
| Privacy-Policy | question-answering | robustness, accuracy, fairness | Given a question and a clause from a privacy policy, determine if the clause contains enough information to answer the question. | |
| FIQA | question-answering | robustness, accuracy, fairness | FIQA (Financial Opinion Mining and Question Answering) | |
| MultiLexSum | summarization | robustness, accuracy, fairness | Multi-LexSum: Real-World Summaries of Civil Rights Lawsuits at Multiple Granularities | |
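As a minimal sketch of how one of these benchmarks might be evaluated, the snippet below assembles the configuration LangTest's `Harness` expects for the `Contracts` dataset. The model name and hub (`gpt-3.5-turbo` on `openai`) are placeholders for illustration; substitute any model/hub combination LangTest supports.

```python
# Hedged sketch: configuration for evaluating a model on the "Contracts"
# benchmark with LangTest. Model and hub below are placeholder choices.
task = "question-answering"
model = {"model": "gpt-3.5-turbo", "hub": "openai"}  # placeholder model/hub
data = {"data_source": "Contracts"}                  # a dataset name from the table above

# With langtest installed (pip install langtest) and model credentials set,
# the evaluation would then be run along these lines:
#   from langtest import Harness
#   harness = Harness(task=task, model=model, data=data)
#   harness.generate().run().report()
```

The same pattern applies to the other datasets: swap `data_source` for, e.g., `"Privacy-Policy"`, and use `task="summarization"` for MultiLexSum.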