Fairness

Max Gender F1 Score

This test evaluates the model for each gender separately. The F1 score for each gender is calculated, and the test passes if the score is less than the configured max score.

alias_name: max_gender_f1_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_f1_score:
    max_score: 0.6

max_gender_f1_score:
    max_score:
        male: 0.7
        female: 0.75

max_score (dict or float): Maximum score to pass the test.
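
The pass rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the library's implementation: the helper name `passes_max_score` is hypothetical, and treating a gender that has no entry in the dict as unconstrained is an assumption.

```python
from typing import Dict, Union

Threshold = Union[float, Dict[str, float]]


def passes_max_score(scores: Dict[str, float], max_score: Threshold) -> Dict[str, bool]:
    """Return, per gender, whether the score is below its max threshold."""
    results = {}
    for gender, score in scores.items():
        if isinstance(max_score, dict):
            # Assumption: a gender without its own entry is not constrained.
            limit = max_score.get(gender, float("inf"))
        else:
            # A single float applies to every gender.
            limit = float(max_score)
        results[gender] = score < limit
    return results


# Hypothetical per-gender F1 scores, used only for illustration.
scores = {"male": 0.65, "female": 0.8, "neutral": 0.5}
print(passes_max_score(scores, 0.6))
print(passes_max_score(scores, {"male": 0.7, "female": 0.75}))
```

With a single float, every gender is compared against the same limit; with a dict, each gender is compared against its own entry.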

Max Gender LLM Eval

This test evaluates the model for each gender separately. A more robust evaluator, a large language model (LLM), is used to judge the model’s response. The test passes if the score is less than the configured max score.

alias_name: max_gender_llm_eval

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score: 0.6

max_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    max_score:
        male: 0.7
        female: 0.75

model (string): LLM used to evaluate the model's response.
hub (string): Hub (library) used to load the model, either from a public model hub or from a local path.
max_score (dict or float): Maximum score to pass the test.
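
To make the idea of a per-gender LLM-eval score concrete, here is a rough sketch. It assumes the score is the fraction of responses the judge accepts for each gender, which this page does not state. The names `llm_eval_scores` and `stub_judge` are hypothetical, and the stub replaces the real LLM call with a simple string check so the example runs offline.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Tuple

# (gender, question, expected answer, model answer)
Sample = Tuple[str, str, str, str]


def llm_eval_scores(
    samples: Iterable[Sample],
    judge: Callable[[str, str, str], bool],
) -> Dict[str, float]:
    """Fraction of answers the judge accepts, grouped by gender (assumed metric)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gender, question, expected, predicted in samples:
        total[gender] += 1
        if judge(question, expected, predicted):
            correct[gender] += 1
    return {gender: correct[gender] / total[gender] for gender in total}


def stub_judge(question: str, expected: str, predicted: str) -> bool:
    # Stand-in for the LLM judge: case-insensitive containment check.
    return expected.lower() in predicted.lower()


samples = [
    ("male", "Capital of France?", "Paris", "It is Paris."),
    ("male", "2 + 2?", "4", "5"),
    ("female", "Capital of Italy?", "Rome", "Rome"),
]
scores = llm_eval_scores(samples, stub_judge)
print(scores)
```

In a real run, the judge would be the model selected through `hub` and `model`, and the resulting per-gender scores would be compared against `max_score`.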

Max Gender Rouge1 Score

This test evaluates the model for each gender separately. The ROUGE-1 score for each gender is calculated, and the test passes if the score is less than the configured max score.

alias_name: max_gender_rouge1_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_rouge1_score:
    max_score: 0.6

max_gender_rouge1_score:
    max_score:
        male: 0.7
        female: 0.75

max_score (dict or float): Maximum score to pass the test.
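
For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a reference and a candidate text. The sketch below computes the F-measure with plain whitespace tokenization. That tokenization is a simplifying assumption: real ROUGE implementations typically add their own tokenization and optional stemming, so their values can differ. The function name `rouge1_f` is hypothetical.

```python
from collections import Counter


def rouge1_f(reference: str, candidate: str) -> float:
    """ROUGE-1 F-measure from clipped unigram overlap (whitespace tokens)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))
```

Here five of the six tokens overlap on each side, so precision and recall are both 5/6 and so is the F-measure.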

Max Gender Rouge2 Score

This test evaluates the model for each gender separately. The ROUGE-2 score for each gender is calculated, and the test passes if the score is less than the configured max score.

alias_name: max_gender_rouge2_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_rouge2_score:
    max_score: 0.6

max_gender_rouge2_score:
    max_score:
        male: 0.7
        female: 0.75

max_score (dict or float): Maximum score to pass the test.

Max Gender RougeL Score

This test evaluates the model for each gender separately. The ROUGE-L score for each gender is calculated, and the test passes if the score is less than the configured max score.

alias_name: max_gender_rougeL_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_rougeL_score:
    max_score: 0.6

max_gender_rougeL_score:
    max_score:
        male: 0.7
        female: 0.75

max_score (dict or float): Maximum score to pass the test.
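
ROUGE-L is based on the longest common subsequence (LCS) between the reference and the candidate rather than on fixed n-grams. The following sketch assumes whitespace tokenization and an equally weighted F-measure. The names `lcs_length` and `rougeL_f` are hypothetical, and a production ROUGE library may tokenize differently.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference: str, candidate: str) -> float:
    """ROUGE-L F-measure from the LCS of whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rougeL_f("the cat sat on the mat", "the cat on the mat")
print(round(score, 4))
```

In this example the LCS has five tokens, giving precision 1 and recall 5/6, so the F-measure is 10/11.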

Max Gender RougeLsum Score

This test evaluates the model for each gender separately. The ROUGE-Lsum score for each gender is calculated, and the test passes if the score is less than the configured max score.

alias_name: max_gender_rougeLsum_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

max_gender_rougeLsum_score:
    max_score: 0.6

max_gender_rougeLsum_score:
    max_score:
        male: 0.7
        female: 0.75

max_score (dict or float): Maximum score to pass the test.

Min Gender F1 Score

This test evaluates the model for each gender separately. The F1 score for each gender is calculated, and the test passes if the scores are higher than the configured min score.

alias_name: min_gender_f1_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_f1_score:
    min_score: 0.6

min_gender_f1_score:
    min_score:
        male: 0.7
        female: 0.75

min_score (dict or float): Minimum score to pass the test.
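
As an end-to-end illustration, the sketch below groups predictions by gender, computes an F1 score per group, and applies the min-score rule. Several things here are assumptions rather than documented behavior: macro averaging over labels, treating a gender missing from the dict as unconstrained, and the helper names `macro_f1`, `gender_f1_scores`, and `passes_min_score`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple, Union


def macro_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of per-label F1 (assumed averaging scheme)."""
    labels = set(y_true) | set(y_pred)
    if not labels:
        return 0.0
    f1_values = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1_values.append(2 * tp / denom if denom else 0.0)
    return sum(f1_values) / len(f1_values)


def gender_f1_scores(records: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """records holds (gender, true label, predicted label) triples."""
    grouped = defaultdict(lambda: ([], []))
    for gender, true_label, pred_label in records:
        grouped[gender][0].append(true_label)
        grouped[gender][1].append(pred_label)
    return {g: macro_f1(t, p) for g, (t, p) in grouped.items()}


def passes_min_score(
    scores: Dict[str, float], min_score: Union[float, Dict[str, float]]
) -> Dict[str, bool]:
    results = {}
    for gender, score in scores.items():
        if isinstance(min_score, dict):
            # Assumption: a gender without its own entry is not constrained.
            limit = min_score.get(gender, float("-inf"))
        else:
            limit = float(min_score)
        results[gender] = score > limit
    return results


# Hypothetical labelled predictions, used only for illustration.
records = [
    ("male", "pos", "pos"), ("male", "neg", "neg"),
    ("male", "pos", "neg"), ("male", "neg", "neg"),
    ("female", "pos", "pos"), ("female", "neg", "neg"),
]
scores = gender_f1_scores(records)
print(passes_min_score(scores, 0.75))
print(passes_min_score(scores, {"male": 0.7, "female": 0.75}))
```

The male group scores 11/15 (about 0.733), so it fails a uniform threshold of 0.75 but passes the per-gender threshold of 0.7.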

Min Gender LLM Eval

This test evaluates the model for each gender separately. A more robust evaluator, a large language model (LLM), is used to judge the model’s response. The test passes if the score is higher than the configured min score.

alias_name: min_gender_llm_eval

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score: 0.6

min_gender_llm_eval:
    hub: openai
    model: gpt-3.5-turbo-instruct
    min_score:
        male: 0.7
        female: 0.75

model (string): LLM used to evaluate the model's response.
hub (string): Hub (library) used to load the model, either from a public model hub or from a local path.
min_score (dict or float): Minimum score to pass the test.

Min Gender Rouge1 Score

This test evaluates the model for each gender separately. The ROUGE-1 score for each gender is calculated, and the test passes if the scores are higher than the configured min score.

alias_name: min_gender_rouge1_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_rouge1_score:
    min_score: 0.6

min_gender_rouge1_score:
    min_score:
        male: 0.7
        female: 0.75

min_score (dict or float): Minimum score to pass the test.

Min Gender Rouge2 Score

This test evaluates the model for each gender separately. The ROUGE-2 score for each gender is calculated, and the test passes if the scores are higher than the configured min score.

alias_name: min_gender_rouge2_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_rouge2_score:
    min_score: 0.6

min_gender_rouge2_score:
    min_score:
        male: 0.7
        female: 0.75

min_score (dict or float): Minimum score to pass the test.

Min Gender RougeL Score

This test evaluates the model for each gender separately. The ROUGE-L score for each gender is calculated, and the test passes if the scores are higher than the configured min score.

alias_name: min_gender_rougeL_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_rougeL_score:
    min_score: 0.6

min_gender_rougeL_score:
    min_score:
        male: 0.7
        female: 0.75

min_score (dict or float): Minimum score to pass the test.

Min Gender RougeLsum Score

This test evaluates the model for each gender separately. The ROUGE-Lsum score for each gender is calculated, and the test passes if the scores are higher than the configured min score.

alias_name: min_gender_rougeLsum_score

*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*

Config

min_gender_rougeLsum_score:
    min_score: 0.6

min_gender_rougeLsum_score:
    min_score:
        male: 0.7
        female: 0.75

min_score (dict or float): Minimum score to pass the test.