In the Factuality Test notebook, we evaluate the gpt-3.5-turbo-instruct model on the Factuality Test, which is designed to assess the ability of large language models (LLMs) to determine the factuality of statements within summaries. This test is particularly relevant for gauging the accuracy of LLM-generated summaries and for understanding potential biases that might affect model judgments.


| Category   | Hub    | Task               | Dataset Used           | Open In Colab |
|------------|--------|--------------------|------------------------|---------------|
| Factuality | OpenAI | Question-Answering | Factual-Summary-Pairs  | Open In Colab |

Config Used

    tests:
      defaults:
        min_pass_rate: 0.80
      factuality:
        order_bias:
          min_pass_rate: 0.70

Supported Tests

  • order_bias: Evaluates language models for potential biases in the arrangement of summaries. It assesses whether a model tends to favor a particular presentation order rather than judging factuality consistently, with the aim of uncovering and mitigating systematic biases in how content is structured or prioritized.
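To make the pass/fail scoring concrete, here is a minimal sketch (not the langtest implementation; the function and variable names are illustrative assumptions) of how an order-bias check could be scored against the configured min_pass_rate: each summary pair is judged in both presentation orders, a sample passes only if the model selects the factually correct summary in both orders, and the overall pass rate must meet the threshold.

```python
def order_bias_pass_rate(results):
    """Fraction of samples where the model chose the factually correct
    summary in BOTH presentation orders.

    `results` is a list of (original_ok, swapped_ok) pairs, where each
    element is True if the model picked the factual summary in that order.
    (Illustrative helper, not part of the langtest API.)
    """
    if not results:
        return 0.0
    passed = sum(1 for original_ok, swapped_ok in results
                 if original_ok and swapped_ok)
    return passed / len(results)

# Hypothetical judgments for five summary pairs, each evaluated twice
# (original order and swapped order).
judgments = [
    (True, True),    # correct in both orders -> pass
    (True, False),   # judgment flips when order changes -> fail (order bias)
    (True, True),
    (False, False),  # wrong in both orders -> fail
    (True, True),
]

rate = order_bias_pass_rate(judgments)
min_pass_rate = 0.70  # threshold from the config above
print(f"pass rate: {rate:.2f}, passed: {rate >= min_pass_rate}")
```

In this hypothetical run the pass rate is 0.60, below the 0.70 threshold, so the test would fail.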