Source: Answer yes/no questions about whether contractual clauses discuss particular issues.

Contracts is a binary classification dataset in which the LLM must determine whether language from a contract contains a particular type of content.

You can see which subsets and splits are available below.

Split Details
test Test set from the Contracts dataset, containing 80 samples.


The evaluation starts by fetching original_context and original_question from the dataset. The model generates an expected_result from this input. To assess robustness, perturbations are applied to original_context and original_question, producing perturbed_context and perturbed_question. The model then processes these perturbed inputs and produces an actual_result. The expected_result and actual_result are compared using the llm_eval approach, in which an LLM evaluates the model’s response. Alternatively, users can apply metrics such as String Distance or Embedding Distance to evaluate the model’s performance on the Question-Answering task within the robustness category. For a more in-depth exploration of these approaches, refer to this notebook discussing these three methods.
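The flow described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the abbreviation map, the `toy_model` stand-in, the `robustness_check` helper, and the 0.2 threshold are all assumptions made for the example, and the string distance here is computed with the standard library's `difflib` rather than any specific metric used by the evaluation framework.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict

# Hypothetical abbreviation map, for illustration only; the actual
# add_abbreviation test applies its own substitution rules.
ABBREVIATIONS = {"the": "da", "are": "r", "to": "2"}


def add_abbreviation(text: str) -> str:
    """Replace whole words with informal abbreviations (illustrative)."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())


def string_distance(a: str, b: str) -> float:
    """Return 1 - similarity ratio, so 0.0 means the strings match."""
    return 1.0 - SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def robustness_check(
    model: Callable[[str, str], str],
    original_context: str,
    original_question: str,
    threshold: float = 0.2,  # assumed cutoff, not a documented default
) -> Dict[str, object]:
    """Run the model on original and perturbed inputs and compare answers."""
    expected_result = model(original_context, original_question)
    perturbed_context = add_abbreviation(original_context)
    perturbed_question = add_abbreviation(original_question)
    actual_result = model(perturbed_context, perturbed_question)
    distance = string_distance(expected_result, actual_result)
    return {
        "perturbed_context": perturbed_context,
        "perturbed_question": perturbed_question,
        "expected_result": expected_result,
        "actual_result": actual_result,
        "pass": distance <= threshold,
    }


def toy_model(context: str, question: str) -> str:
    """Stand-in for an LLM: answers 'True' if the clause mentions credentials."""
    return str("credentials" in context.lower())


if __name__ == "__main__":
    result = robustness_check(
        toy_model,
        "In the event that a user's credentials are compromised, the Company "
        "shall promptly notify the affected user.",
        "Does the clause discuss compromised user credentials?",
    )
    for key, value in result.items():
        print(f"{key}: {value}")
```

A string-distance comparison suits this dataset because the answers are short yes/no style strings; for longer free-text answers, the llm_eval or Embedding Distance approaches mentioned above are likely a better fit.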

category: robustness
test_type: add_abbreviation
original_context: In the event that a user’s credentials are compromised, the Company shall promptly notify the affected user and require them to reset their password. The Company shall also take reasonable steps to prevent unauthorized access to the user’s account and to prevent future compromises of user credentials.
original_question: Does the clause discuss compromised user credentials?
perturbed_context: In da event that a user’s credentials r compromised, tdaCompany shall promptly notify thdaffected user and require them 2 reset their password. Thedampany shall also take reasonable steps t2prevent unauthorized access to2he user’s account and to 2event future compromises of user credentials.
perturbed_question: Does da clause discuss compromised user credentials?
expected_result: True
actual_result: True
pass: True

Generated results for the gpt-3.5-turbo-instruct model from OpenAI.

