Overview
In the Factuality Test notebook, we evaluate the gpt-3.5-turbo-instruct model on the Factuality Test. This test is designed to evaluate the ability of large language models (LLMs) to determine the factuality of statements within summaries. It is particularly relevant for assessing the accuracy of LLM-generated summaries and for understanding potential biases that might affect model judgments.
Open in Colab
| Category | Hub | Task | Dataset Used | Open In Colab |
|---|---|---|---|---|
| Factuality | OpenAI | Question-Answering | Factual-Summary-Pairs | |
Config Used
```yaml
tests:
  defaults:
    min_pass_rate: 0.80
  factuality:
    order_bias:
      min_pass_rate: 0.70
```
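The config sets a default minimum pass rate of 0.80 and lowers it to 0.70 for the order_bias test. A minimal sketch of how such thresholds might be applied when scoring results (the `evaluate` function and `results` structure are illustrative, not langtest's actual API):

```python
# Hypothetical reporting logic: per-test overrides fall back to the default
# minimum pass rate, mirroring the YAML config above.
DEFAULT_MIN_PASS_RATE = 0.80
MIN_PASS_RATES = {"order_bias": 0.70}  # per-test overrides

def evaluate(results):
    """results: mapping of test name -> list of booleans (pass/fail per sample)."""
    report = {}
    for test, outcomes in results.items():
        pass_rate = sum(outcomes) / len(outcomes)
        threshold = MIN_PASS_RATES.get(test, DEFAULT_MIN_PASS_RATE)
        report[test] = {
            "pass_rate": pass_rate,
            "minimum_pass_rate": threshold,
            "passed": pass_rate >= threshold,
        }
    return report

report = evaluate({"order_bias": [True, True, True, False]})
print(report["order_bias"]["passed"])  # 0.75 >= 0.70 -> True
```

With three of four samples passing, order_bias clears its 0.70 threshold but would fail under the 0.80 default.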
Supported Tests
order_bias
: Evaluates language models for potential biases in the arrangement of summaries. It assesses whether a model tends to favor summaries based on the order in which they are presented, aiming to uncover and mitigate systematic biases in how content is structured or prioritized.
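The order-bias check can be sketched as follows: ask the model which of two summaries is factual, then repeat the question with the order swapped. A consistent model picks the same summary regardless of position. The `judge` function below is a hypothetical stand-in for the LLM call, not langtest internals:

```python
# Illustrative order-bias check. `judge` is a hypothetical stand-in for an
# LLM call that returns which positional option ("first" or "second") the
# model deems factual; here it naively favors the first position, i.e. a
# maximally order-biased "model".
def judge(summary_first, summary_second):
    return "first"

def is_order_consistent(summary_a, summary_b):
    """Return True if the model picks the same summary in both orderings."""
    pick_ab = judge(summary_a, summary_b)  # A shown first
    pick_ba = judge(summary_b, summary_a)  # B shown first
    # Map positional answers back to the underlying summaries and compare.
    chosen_ab = summary_a if pick_ab == "first" else summary_b
    chosen_ba = summary_b if pick_ba == "first" else summary_a
    return chosen_ab == chosen_ba

print(is_order_consistent("factual summary", "distorted summary"))  # False
```

A sample counts as a pass when the choice is order-consistent; the fraction of consistent samples is then compared against min_pass_rate.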