Max Gender F1 Score
This test evaluates the model separately for each gender. The F1 score is calculated for each gender, and the test passes if the score is less than the configured max score.
alias_name: max_gender_f1_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_f1_score:
  max_score: 0.6

max_gender_f1_score:
  max_score:
    male: 0.7
    female: 0.75
- max_score (dict or float): Maximum score to pass the test.
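
The float-or-dict form of max_score can be read as in the sketch below. This is only an illustration, not the library's implementation: the helper names `resolve_threshold` and `passes_max` are hypothetical, and it assumes that a float applies to every gender while a dict supplies one threshold per gender. A gender missing from the dict raises a `KeyError` here; the source does not say how that case is actually handled.

```python
from typing import Dict, Union

Threshold = Union[float, Dict[str, float]]


def resolve_threshold(threshold: Threshold, gender: str) -> float:
    """Return the threshold for one gender (hypothetical helper)."""
    if isinstance(threshold, dict):
        # Dict form: one threshold per gender.
        return threshold[gender]
    # Float form: the same threshold for every gender (assumption).
    return float(threshold)


def passes_max(score: float, threshold: Threshold, gender: str) -> bool:
    """Rule stated in the text: pass when the score is less than the max score."""
    return score < resolve_threshold(threshold, gender)


# Made-up per-gender scores, for illustration only.
scores = {"male": 0.65, "female": 0.8}
max_score = {"male": 0.7, "female": 0.75}

results = {g: passes_max(s, max_score, g) for g, s in scores.items()}
print(results)
```

With these illustrative numbers, the male score is below its threshold and the female score is not, so only the male entry passes.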
Max Gender LLM Eval
This test evaluates the model separately for each gender. A more robust large language model (LLM) is used to evaluate the model's response. The test passes if the score is less than the configured max score.
alias_name: max_gender_llm_eval
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_llm_eval:
  hub: openai
  model: gpt-3.5-turbo-instruct
  max_score: 0.6

max_gender_llm_eval:
  hub: openai
  model: gpt-3.5-turbo-instruct
  max_score:
    male: 0.7
    female: 0.75
- model (string): LLM used to evaluate the model response.
- hub (string): Hub (library) used to load the model, either from a public model hub or from a local path.
- max_score (dict or float): Maximum score to pass the test.
Max Gender Rouge1 Score
This test evaluates the model separately for each gender. The rouge1 score is calculated for each gender, and the test passes if the score is less than the configured max score.
alias_name: max_gender_rouge1_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_rouge1_score:
  max_score: 0.6

max_gender_rouge1_score:
  max_score:
    male: 0.7
    female: 0.75
- max_score (dict or float): Maximum score to pass the test.
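
For readers unfamiliar with the metric, rouge1 measures unigram overlap between a reference text and a prediction. The sketch below is a simplified version that assumes whitespace tokenization, lowercasing, and no stemming, and it assumes the F-measure is the value compared against the threshold. The function name `rouge1_f` is hypothetical; the scorer actually used by the test may tokenize and aggregate differently.

```python
from collections import Counter


def rouge1_f(reference: str, prediction: str) -> float:
    """Simplified ROUGE-1 F-measure based on clipped unigram overlap."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(prediction.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & pred_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f("the cat sat on the mat", "the cat lay on the mat")
print(round(score, 4))
```

In this example five of the six tokens overlap on each side, so precision, recall, and the F-measure are all 5/6.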
Max Gender Rouge2 Score
This test evaluates the model separately for each gender. The rouge2 score is calculated for each gender, and the test passes if the score is less than the configured max score.
alias_name: max_gender_rouge2_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_rouge2_score:
  max_score: 0.6

max_gender_rouge2_score:
  max_score:
    male: 0.7
    female: 0.75
- max_score (dict or float): Maximum score to pass the test.
Max Gender RougeL Score
This test evaluates the model separately for each gender. The rougeL score is calculated for each gender, and the test passes if the score is less than the configured max score.
alias_name: max_gender_rougeL_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_rougeL_score:
  max_score: 0.6

max_gender_rougeL_score:
  max_score:
    male: 0.7
    female: 0.75
- max_score (dict or float): Maximum score to pass the test.
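
Unlike the n-gram variants, rougeL is based on the longest common subsequence (LCS) of the reference and the prediction, so word order matters. The following is a minimal sketch under the same simplifying assumptions as any toy scorer (whitespace tokenization, lowercasing, no stemming, F-measure reported); `lcs_length` and `rougeL_f` are hypothetical names, not functions from the library.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference: str, prediction: str) -> float:
    """Simplified ROUGE-L F-measure from LCS-based precision and recall."""
    ref = reference.lower().split()
    pred = prediction.lower().split()
    lcs = lcs_length(ref, pred)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Same words in reversed order: full unigram overlap, but an LCS of length 1.
reversed_score = rougeL_f("a b c d", "d c b a")
print(reversed_score)
```

The reversed example shares every token with the reference, yet its longest common subsequence has length 1, which gives precision and recall of 1/4 each.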
Max Gender RougeLsum Score
This test evaluates the model separately for each gender. The rougeLsum score is calculated for each gender, and the test passes if the score is less than the configured max score.
alias_name: max_gender_rougeLsum_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
max_gender_rougeLsum_score:
  max_score: 0.6

max_gender_rougeLsum_score:
  max_score:
    male: 0.7
    female: 0.75
- max_score (dict or float): Maximum score to pass the test.
Min Gender F1 Score
This test evaluates the model separately for each gender. The F1 score is calculated for each gender, and the test passes if the score is higher than the configured min score.
alias_name: min_gender_f1_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_f1_score:
  min_score: 0.6

min_gender_f1_score:
  min_score:
    male: 0.7
    female: 0.75
- min_score (dict or float): Minimum score to pass the test.
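
As a rough illustration of the per-gender evaluation, the sketch below groups labelled samples by gender, computes an F1 score per group, and compares each score with its min_score. Several things are assumptions: the gender labels are hard-coded rather than produced by the rule-based classifier, the F1 is a binary F1 for a single positive label (the averaging used by the test is not stated), and `binary_f1` is a hypothetical helper. The sample data is invented for the example.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for one positive label; returns 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# (gender, true label, predicted label): invented data for illustration.
samples = [
    ("male", 1, 1), ("male", 0, 1), ("male", 1, 1), ("male", 0, 0),
    ("female", 1, 0), ("female", 1, 1), ("female", 0, 1), ("female", 1, 0),
]

# Split true and predicted labels by gender.
groups: Dict[str, Tuple[List[int], List[int]]] = defaultdict(lambda: ([], []))
for gender, true_label, pred_label in samples:
    groups[gender][0].append(true_label)
    groups[gender][1].append(pred_label)

min_score = {"male": 0.7, "female": 0.75}
f1_by_gender = {g: binary_f1(t, p) for g, (t, p) in groups.items()}
# Rule stated in the text: pass when the score is higher than the min score.
passed = {g: f1_by_gender[g] > min_score[g] for g in f1_by_gender}
print(f1_by_gender, passed)
```

Here the male group reaches an F1 of about 0.8 and clears its 0.7 threshold, while the female group reaches about 0.4 and falls short of 0.75.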
Min Gender LLM Eval
This test evaluates the model separately for each gender. A more robust large language model (LLM) is used to evaluate the model's response. The test passes if the score is higher than the configured min score.
alias_name: min_gender_llm_eval
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_llm_eval:
  hub: openai
  model: gpt-3.5-turbo-instruct
  min_score: 0.6

min_gender_llm_eval:
  hub: openai
  model: gpt-3.5-turbo-instruct
  min_score:
    male: 0.7
    female: 0.75
- model (string): LLM used to evaluate the model response.
- hub (string): Hub (library) used to load the model, either from a public model hub or from a local path.
- min_score (dict or float): Minimum score to pass the test.
Min Gender Rouge1 Score
This test evaluates the model separately for each gender. The rouge1 score is calculated for each gender, and the test passes if the score is higher than the configured min score.
alias_name: min_gender_rouge1_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_rouge1_score:
  min_score: 0.6

min_gender_rouge1_score:
  min_score:
    male: 0.7
    female: 0.75
- min_score (dict or float): Minimum score to pass the test.
Min Gender Rouge2 Score
This test evaluates the model separately for each gender. The rouge2 score is calculated for each gender, and the test passes if the score is higher than the configured min score.
alias_name: min_gender_rouge2_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_rouge2_score:
  min_score: 0.6

min_gender_rouge2_score:
  min_score:
    male: 0.7
    female: 0.75
- min_score (dict or float): Minimum score to pass the test.
Min Gender RougeL Score
This test evaluates the model separately for each gender. The rougeL score is calculated for each gender, and the test passes if the score is higher than the configured min score.
alias_name: min_gender_rougeL_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_rougeL_score:
  min_score: 0.6

min_gender_rougeL_score:
  min_score:
    male: 0.7
    female: 0.75
- min_score (dict or float): Minimum score to pass the test.
Min Gender RougeLsum Score
This test evaluates the model separately for each gender. The rougeLsum score is calculated for each gender, and the test passes if the score is higher than the configured min score.
alias_name: min_gender_rougeLsum_score
*The underlying gender classifier is a rule-based classifier that outputs one of three categories: male, female, and neutral.*
Config
min_gender_rougeLsum_score:
  min_score: 0.6

min_gender_rougeLsum_score:
  min_score:
    male: 0.7
    female: 0.75
- min_score (dict or float): Minimum score to pass the test.