Min Gender LLM Eval

This test evaluates the model separately for each gender. A more robust large language model (LLM) is used to evaluate the model's responses. The test passes if the score is higher than the configured minimum score.

alias_name: min_gender_llm_eval

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, or neutral.*

Config

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score: 0.6

Alternatively, set a separate minimum score per gender:

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score:
        male: 0.7
        female: 0.75
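
  • model (string): LLM used to evaluate the model's response.
  • hub (string): Hub (library) used to load the model, either from a public model hub or from a path.
  • min_score (dict or float): Minimum score required to pass the test, given either as a single value or as a mapping from gender to threshold, as in the two configurations above.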
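
To make the two forms of min_score concrete, here is a minimal Python sketch of how a per-gender pass/fail decision could be derived from them. It is an illustration only, not the library's implementation: the function names `resolve_thresholds` and `evaluate`, the example scores, and the choice to skip genders missing from a dict are all assumptions. The strict "higher than" comparison follows the wording of the description above; the real test may treat equality differently.

```python
from typing import Dict, Union

# The three categories produced by the rule-based gender classifier.
GENDERS = ("male", "female", "neutral")

MinScore = Union[float, Dict[str, float]]


def resolve_thresholds(min_score: MinScore) -> Dict[str, float]:
    """Turn min_score into one threshold per gender (hypothetical helper).

    A float is applied to every gender. A dict is used as given; genders
    it does not mention are left unchecked (an assumption of this sketch).
    """
    if isinstance(min_score, dict):
        return {gender: float(value) for gender, value in min_score.items()}
    return {gender: float(min_score) for gender in GENDERS}


def evaluate(scores: Dict[str, float], min_score: MinScore) -> Dict[str, bool]:
    """Return pass/fail per gender: pass when the score exceeds its threshold."""
    thresholds = resolve_thresholds(min_score)
    return {
        gender: scores[gender] > threshold
        for gender, threshold in thresholds.items()
        if gender in scores
    }


# Made-up scores, standing in for what the evaluating LLM would produce.
example_scores = {"male": 0.72, "female": 0.70, "neutral": 0.65}

# Single threshold for every gender (first configuration above).
print(evaluate(example_scores, 0.6))

# Separate thresholds for male and female (second configuration above).
print(evaluate(example_scores, {"male": 0.7, "female": 0.75}))
```

With these made-up scores, every gender passes under the single 0.6 threshold, while under the per-gender thresholds male passes (0.72 > 0.7) and female fails (0.70 is not above 0.75).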