Contracts is a binary classification dataset in which the LLM must determine whether language from a contract contains a particular type of content.
You can see which subsets and splits are available below.
Test set from the Contracts dataset, containing 80 samples.
In the evaluation process, we start by fetching original_context and original_question from the dataset. The model then generates an expected_result based on this input. To assess model robustness, we introduce perturbations to the original_context and original_question, resulting in perturbed_context and perturbed_question. The model processes these perturbed inputs, producing an actual_result. The expected_result and actual_result are then compared using the llm_eval approach, in which an LLM evaluates the model response. Alternatively, users can employ metrics such as String Distance or Embedding Distance to evaluate the model’s performance on the Question-Answering task within the robustness category. For a more in-depth exploration of these approaches, refer to the notebook that discusses these three methods.
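The flow described above can be sketched in a few lines of Python. This is a minimal illustration and not the library's implementation: `toy_model` is a hypothetical stand-in for a real LLM call, `robustness_check` and `string_distance` are illustrative helpers, and the `threshold` of 0.2 is an assumed value. The String Distance comparison is approximated here with the standard library's `difflib` instead of llm_eval.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, Union


def string_distance(a: str, b: str) -> float:
    """Return a distance in [0, 1]; 0.0 means the normalized strings match."""
    a_norm, b_norm = a.strip().lower(), b.strip().lower()
    return 1.0 - SequenceMatcher(None, a_norm, b_norm).ratio()


def robustness_check(
    model: Callable[[str, str], str],
    original_context: str,
    original_question: str,
    perturbed_context: str,
    perturbed_question: str,
    threshold: float = 0.2,  # assumed cut-off, chosen only for illustration
) -> Dict[str, Union[str, float, bool]]:
    # Answer on the original input: this is the expected_result.
    expected_result = model(original_context, original_question)
    # Answer on the perturbed input: this is the actual_result.
    actual_result = model(perturbed_context, perturbed_question)
    # The test passes when the two answers stay close to each other.
    distance = string_distance(expected_result, actual_result)
    return {
        "expected_result": expected_result,
        "actual_result": actual_result,
        "distance": distance,
        "pass": distance <= threshold,
    }


def toy_model(context: str, question: str) -> str:
    """Hypothetical stand-in for an LLM; answers by simple keyword matching."""
    return "True" if "credentials" in context.lower() else "False"


result = robustness_check(
    toy_model,
    original_context="In the event that a user's credentials are compromised, "
    "the Company shall promptly notify the affected user",
    original_question="Does the clause discuss compromised user credentials?",
    perturbed_context="In da event that a user's credentials r compromised, "
    "tdaCompany shall promptly notify thdaffected user",
    perturbed_question="Does da clause discuss compromised user credentials?",
)
print(result)
```

A distance-based check like this is cheap and deterministic, but it only compares surface strings; llm_eval or Embedding Distance may be a better fit when semantically equivalent answers can be worded differently.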
| original_context | original_question | perturbed_context | perturbed_question |
|---|---|---|---|
| In the event that a user’s credentials are compromised, the Company shall promptly notify the affected user and require them to reset their password. The Company shall also take reasonable steps to prevent unauthorized access to the user’s account and to prevent future compromises of user credentials. | Does the clause discuss compromised user credentials? | In da event that a user’s credentials r compromised, tdaCompany shall promptly notify thdaffected user and require them 2 reset their password. Thedampany shall also take reasonable steps t2prevent unauthorized access to2he user’s account and to 2event future compromises of user credentials. | Does da clause discuss compromised user credentials? |
Generated Results for