Factuality

Overview

In the Factuality Test notebook, we evaluate the gpt-3.5-turbo-instruct model on the factuality test. The factuality test is designed to evaluate the ability of large language models (LLMs) to determine the factuality of statements within summaries. It is particularly relevant for assessing the accuracy of LLM-generated summaries and for understanding potential biases that might affect the model's judgments.

Category	Hub	Task	Dataset Used	Open In Colab
Factuality	OpenAI	Question-Answering	Factual-Summary-Pairs

Config Used

tests:
  defaults:
    min_pass_rate: 0.80

  factuality:
    order_bias:
      min_pass_rate: 0.70
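
To make the threshold settings concrete, here is a minimal Python sketch of how the two `min_pass_rate` values could be resolved. It assumes that a test-level value overrides the one under `defaults`; the dict is a hand-written mirror of the config above, and `effective_min_pass_rate` and `passes` are hypothetical helpers written for illustration, not part of any library API.

```python
# Hand-written mirror of the YAML config above (not parsed from a file).
config = {
    "tests": {
        "defaults": {"min_pass_rate": 0.80},
        "factuality": {
            "order_bias": {"min_pass_rate": 0.70},
        },
    }
}


def effective_min_pass_rate(config, category, test_name):
    """Hypothetical helper: return the test-level threshold if one is set,
    otherwise fall back to the value under `defaults` (assumed semantics)."""
    tests = config["tests"]
    default = tests.get("defaults", {}).get("min_pass_rate")
    test_cfg = tests.get(category, {}).get(test_name) or {}
    return test_cfg.get("min_pass_rate", default)


def passes(pass_count, total, threshold):
    """Hypothetical helper: a test passes when the share of passing
    samples reaches the threshold."""
    return total > 0 and pass_count / total >= threshold


threshold = effective_min_pass_rate(config, "factuality", "order_bias")
print(threshold)
print(passes(pass_count=8, total=10, threshold=threshold))
```

Under these assumptions, `order_bias` is held to 0.70 rather than the 0.80 default, so a run in which 8 of 10 samples pass would meet the threshold.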

Supported Tests

order_bias: Evaluates language models for bias related to the order in which summaries are presented. It checks whether a model tends to favor a particular position or ordering, with the aim of uncovering systematic biases in how content is structured or prioritized so that they can be mitigated.
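
One way to picture an order-bias check is sketched below in Python. It assumes a setup in which a judge is shown an article and two summaries and answers "A" or "B" for the one it considers factual; the check then swaps the two summaries and verifies that the judge still picks the same summary. The judge signature, `is_order_consistent`, and both toy judges are hypothetical and stand in for a real model call; this is an illustration of the idea, not the notebook's implementation.

```python
def is_order_consistent(judge, article, summary_1, summary_2):
    """Return True if the judge picks the same summary in both orders.

    `judge(article, first, second)` is assumed to return "A" (first)
    or "B" (second).
    """
    original = judge(article, summary_1, summary_2)
    swapped = judge(article, summary_2, summary_1)
    # Map the positional answers back to the actual summaries.
    picked_original = summary_1 if original == "A" else summary_2
    picked_swapped = summary_2 if swapped == "A" else summary_1
    return picked_original == picked_swapped


def position_biased_judge(article, first, second):
    """Toy judge that always answers "A", regardless of content."""
    return "A"


def content_judge(article, first, second):
    """Toy judge that prefers the summary whose text appears in the article."""
    return "A" if first in article else "B"


article = "The meeting was held in Paris on Monday."
factual = "The meeting was held in Paris"
non_factual = "The meeting was held in Rome"

print(is_order_consistent(position_biased_judge, article, factual, non_factual))
print(is_order_consistent(content_judge, article, factual, non_factual))
```

A judge that answers by position changes its effective choice when the summaries are swapped and is flagged as inconsistent, while a judge that answers by content is not.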