In the Factuality Test notebook, we evaluate the text-davinci-003 model on the factuality test. The Factuality Test is designed to evaluate the ability of large language models (LLMs) to determine the factuality of statements within summaries. This test is particularly relevant for assessing the accuracy of LLM-generated summaries and for understanding potential biases that might affect the models' judgments.
| Category | Hub | Task | Dataset Used | Open In Colab |
    tests:
      defaults:
        min_pass_rate: 0.80
      factuality:
        order_bias:
          min_pass_rate: 0.70
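To make the two thresholds in this config concrete, here is a minimal Python sketch of how they might be resolved. It assumes that a `min_pass_rate` set on a specific test takes precedence over the one under `defaults`. The helper `effective_min_pass_rate` is hypothetical and is not part of the library; the config is mirrored as a plain dict so that no YAML parser is needed.

```python
# The config from above, mirrored as a plain Python dict.
config = {
    "tests": {
        "defaults": {"min_pass_rate": 0.80},
        "factuality": {"order_bias": {"min_pass_rate": 0.70}},
    }
}


def effective_min_pass_rate(config, category, test_name):
    """Hypothetical helper: return the per-test threshold if one is set,
    otherwise fall back to the value under `defaults` (assumed behavior)."""
    tests = config["tests"]
    default = tests.get("defaults", {}).get("min_pass_rate")
    test_cfg = tests.get(category, {}).get(test_name) or {}
    return test_cfg.get("min_pass_rate", default)


# order_bias sets its own threshold, so the default is not used here.
print(effective_min_pass_rate(config, "factuality", "order_bias"))
```

Under this reading, `order_bias` is held to a 0.70 pass rate, while any test without its own entry would be held to 0.80.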
order_bias: Evaluates language models for potential biases related to the order in which summaries are arranged. It assesses whether a model tends to favor a particular position when summaries are presented to it, aiming to uncover and mitigate any systematic biases in how content is structured or prioritized.
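One way to picture an order-bias check is sketched below in Python. This is an illustration under stated assumptions, not the library's implementation: the judge is any callable that receives an article and two summaries and answers `"A"` or `"B"`, each pair is judged twice with the summaries swapped, and a sample passes only if the same underlying summary is chosen both times. All names here (`Judge`, `is_order_consistent`, `order_bias_pass_rate`, and the two toy judges) are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical judge signature: (article, summary_a, summary_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def is_order_consistent(judge: Judge, article: str, s1: str, s2: str) -> bool:
    """Ask the judge twice with the summaries swapped; pass only if the
    same underlying summary is chosen in both orderings."""
    first = judge(article, s1, s2)   # s1 is shown in position A
    second = judge(article, s2, s1)  # s2 is shown in position A
    picked_first = s1 if first == "A" else s2
    picked_second = s2 if second == "A" else s1
    return picked_first == picked_second


def order_bias_pass_rate(judge: Judge, samples: Iterable[Tuple[str, str, str]]) -> float:
    """Fraction of (article, s1, s2) samples judged consistently."""
    samples = list(samples)
    if not samples:
        return 0.0
    passed = sum(is_order_consistent(judge, *sample) for sample in samples)
    return passed / len(samples)


# Toy judges, for illustration only (no model is called).
def always_first(article: str, a: str, b: str) -> str:
    return "A"  # purely positional choice: maximally order-biased


def prefers_shorter(article: str, a: str, b: str) -> str:
    return "A" if len(a) <= len(b) else "B"  # depends on content, not position


samples = [
    ("The cat sat on the mat.", "A cat sat on a mat.", "A dog sat on a mat and barked."),
    ("Rain fell all day in the city.", "It rained all day.", "The city was sunny for the whole day."),
]

rate = order_bias_pass_rate(prefers_shorter, samples)
print(rate >= 0.70)  # compare against the order_bias min_pass_rate from the config
```

A judge that always answers `"A"` never selects the same summary twice under this scheme, so its pass rate falls below any positive threshold, while a judge whose choice depends only on the content is unaffected by the swap.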