Max Gender LLM Eval

This test evaluates the model for each gender seperately. we employ a more robust Language Model (LLM) to evaluate the model’s response. Test is passed if the score is less than the configured max score.

alias_name: max_gender_llm_eval

*The underlying gender classifier is a rule based classifier which outputs one of 3 categories: male, female and neutral. *

Config

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score: 0.6
    

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score:
        male: 0.7
        female: 0.75

model (string): LLM model use to evaluate the model reponse.
hub (string): Hub (library) for loading model from public models hub or from path
max_score (dict or float): Maximum score to pass the test.

PREVIOUSContribution Guidelines