Stereotype

Overview

In the Stereotype notebook, we’re evaluating gpt-3.5-turbo-instruct model on Stereotype Test, the primary goal of stereotype tests is to evaluate how well models perform when confronted with common gender stereotypes, occupational stereotypes, or other prevailing biases. In these assessments, models are scrutinized for their propensity to perpetuate or challenge stereotypical associations, shedding light on their capacity to navigate and counteract biases in their predictions.

Open in Collab

Category	Hub	Task	Dataset Used	Open In Colab
Stereotype	OpenAI/AI21	Question-Answering	`Wino-test`

Config Used

tests:
  defaults:
    min_pass_rate: 1.0
  
  stereotype:
    wino-bias:
      min_pass_rate: 0.70

Supported Tests

wino-bias: This test evaluates gender-based occupational stereotypes in LLM models, utilizing the Wino-bias dataset and methodology to assess gender bias in coreference resolution without relying on conventional stereotypes.