LLM Eval
A more robust large language model (LLM) is used to evaluate the model’s response. Test is passed if the score is higher than the configured min score.
alias_name: llm_eval
supported tasks: question_answering
Config
llm_eval:
  hub: openai
  min_score: 0.75
  model: gpt-3.5-turbo-instruct
- min_score (float): Minimum score to pass the test.
- model (string): LLM used to evaluate the model response.
- hub (string): Hub (library) for loading the model from a public models hub or from a path.
Min Bleu Score
This test uses “bleu_score” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_bleu_score
supported tasks: question_answering
Config
min_bleu_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
Min Exact Match Score
This test uses “exact_match” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_exact_match_score
supported tasks: question_answering
Config
min_exact_match_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
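As a rough illustration of the pass rule above, the sketch below computes an exact-match score by hand and compares it with min_score. The helper `exact_match_score` and the sample data are hypothetical; the actual test relies on the “exact_match” metric from the evaluate library, whose normalization options are not reproduced here.

```python
def exact_match_score(predictions, references):
    """Fraction of predictions that equal their reference string exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(predictions)


# Hypothetical model answers and gold answers.
predictions = ["Paris", "4", "blue whale", "1969"]
references = ["Paris", "4", "Blue whale", "1969"]

min_score = 0.8  # value taken from the config example
score = exact_match_score(predictions, references)
passed = score > min_score  # strictly higher than the threshold
print(score, passed)
```

In this sketch the third answer differs only in capitalization, yet it still counts as a miss, which is why exact match is a strict metric.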
Min F1 Score
This test checks the f1 score for each label. Test is passed if the f1 score is higher than the configured min score.
alias_name: min_f1_score
supported tasks: ner, text-classification
Config
min_f1_score:
  min_score: 0.8

min_f1_score:
  min_score:
    O: 0.75
    PER: 0.65
    LOC: 0.90
- min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.
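The following sketch shows one way the per-label check could work with either form of min_score. The functions `per_label_f1` and `check_min_score` and the sample labels are hypothetical, and skipping labels that have no entry in the dict is an assumption, not documented behavior.

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Return {label: F1} computed from true/false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def check_min_score(scores, min_score):
    """Compare each label's score with a float or per-label dict threshold."""
    results = {}
    for label, score in scores.items():
        if isinstance(min_score, dict):
            if label not in min_score:
                continue  # assumption: labels without a threshold are skipped
            threshold = min_score[label]
        else:
            threshold = min_score
        results[label] = score > threshold
    return results


y_true = ["O", "O", "PER", "PER", "LOC", "O"]
y_pred = ["O", "PER", "PER", "PER", "LOC", "O"]
scores = per_label_f1(y_true, y_pred)

print(check_min_score(scores, 0.8))
print(check_min_score(scores, {"O": 0.75, "PER": 0.65, "LOC": 0.90}))
```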
Min Macro-F1 Score
This test checks the macro-f1 score. Test is passed if the macro-f1 score is higher than the configured min score.
alias_name: min_macro_f1_score
supported tasks: ner, text-classification
Config
min_macro_f1_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
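A minimal sketch of the macro-F1 check, assuming single-label predictions: compute F1 for each label, then take the unweighted mean. The function name `macro_f1` and the sample labels are illustrative only and do not reflect the library's internals.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


y_true = ["pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]

score = macro_f1(y_true, y_pred)  # per-label F1: pos 0.75, neg 0.5
passed = score > 0.8              # min_score from the config example
print(score, passed)
```

Because every label counts equally, a weak minority class pulls the macro average down more than it would under micro or weighted averaging.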
Min Micro-F1 Score
This test checks the micro-f1 score. Test is passed if the micro-f1 score is higher than the configured min score.
alias_name: min_micro_f1_score
supported tasks: ner, text-classification
Config
min_micro_f1_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
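To make the micro-averaged variant concrete, here is a small sketch that pools true positives, false positives, and false negatives over all labels before computing a single F1. The name `micro_f1` and the data are hypothetical, and the sketch assumes exactly one label per example.

```python
def micro_f1(y_true, y_pred):
    """Pool TP/FP/FN across all labels, then compute one F1 value."""
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp += 1  # correct prediction for label t
        else:
            fp += 1  # false positive for the predicted label p
            fn += 1  # false negative for the true label t
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = ["pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg", "neg", "pos"]

score = micro_f1(y_true, y_pred)
passed = score > 0.8  # min_score from the config example
print(score, passed)
```

Under the single-label assumption each error adds one false positive and one false negative, so the pooled F1 equals plain accuracy (4 of 6 correct here).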
Min Precision Score
This test checks the precision score for each label. Test is passed if the precision score is higher than the configured min score.
alias_name: min_precision_score
supported tasks: ner, text-classification
Config
min_precision_score:
  min_score: 0.8

min_precision_score:
  min_score:
    O: 0.75
    PER: 0.65
    LOC: 0.90
- min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.
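The per-label precision check can be sketched as follows. `per_label_precision` and the sample tags are hypothetical; the thresholds are the ones from the dict-style config example.

```python
def per_label_precision(y_true, y_pred):
    """Precision per label: correct predictions / all predictions of that label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = sum(p == label for p in y_pred)
        correct = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        scores[label] = correct / predicted if predicted else 0.0
    return scores


y_true = ["O", "O", "PER", "PER", "LOC", "O"]
y_pred = ["O", "PER", "PER", "PER", "LOC", "O"]

thresholds = {"O": 0.75, "PER": 0.65, "LOC": 0.90}
precision = per_label_precision(y_true, y_pred)
results = {label: precision[label] > thresholds[label] for label in thresholds}
print(precision, results)
```

Here PER is predicted three times but is correct only twice, so its precision of about 0.67 clears the 0.65 threshold but would fail a flat threshold of 0.8.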
Min Recall Score
This test checks the recall score for each label. Test is passed if the recall score is higher than the configured min score.
alias_name: min_recall_score
supported tasks: ner, text-classification
Config
min_recall_score:
  min_score: 0.5

min_recall_score:
  min_score:
    O: 0.75
    PER: 0.65
    LOC: 0.90
- min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.
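A comparable sketch for recall, again with a hypothetical helper (`per_label_recall`) and made-up tags, shows how the same predictions can pass a flat threshold yet fail a per-label one.

```python
def per_label_recall(y_true, y_pred):
    """Recall per label: correct predictions / all true occurrences of that label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        actual = sum(t == label for t in y_true)
        correct = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        scores[label] = correct / actual if actual else 0.0
    return scores


y_true = ["O", "O", "PER", "PER", "LOC", "O"]
y_pred = ["O", "PER", "PER", "PER", "LOC", "O"]
recall = per_label_recall(y_true, y_pred)

# Flat threshold from the first config example.
flat_results = {label: value > 0.5 for label, value in recall.items()}

# Per-label thresholds from the second config example.
thresholds = {"O": 0.75, "PER": 0.65, "LOC": 0.90}
dict_results = {label: recall[label] > thresholds[label] for label in thresholds}
print(recall, flat_results, dict_results)
```

With these inputs, one of the three true O tokens is missed, so O reaches a recall of about 0.67: enough for the flat 0.5 threshold, but below the per-label 0.75.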
Min Rouge1 Score
This test uses the rouge1 result of “rouge_score” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_rouge1_score
supported tasks: question_answering
Config
min_rouge1_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
Min Rouge2 Score
This test uses the rouge2 result of “rouge_score” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_rouge2_score
supported tasks: question_answering
Config
min_rouge2_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
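For intuition about what the rouge1 and rouge2 results measure, the sketch below computes an n-gram overlap F-measure by hand. It is a simplification under stated assumptions: `rouge_n_f1` is a hypothetical helper, tokenization is plain lowercase whitespace splitting, and the library's own tokenizer, optional stemming, and aggregation over a dataset are not reproduced.

```python
from collections import Counter


def rouge_n_f1(prediction, reference, n=1):
    """F-measure of clipped n-gram overlap between prediction and reference."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    overlap = sum((pred & ref).values())  # Counter & keeps the minimum count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


prediction = "the cat sat on the mat"
reference = "the cat is on the mat"

rouge1 = rouge_n_f1(prediction, reference, n=1)  # 5 of 6 unigrams overlap
rouge2 = rouge_n_f1(prediction, reference, n=2)  # 3 of 5 bigrams overlap
print(rouge1 > 0.8, rouge2 > 0.8)
```

A single changed word breaks two bigrams but only one unigram, so the same answer can pass a rouge1 threshold and fail the rouge2 threshold at the same min_score.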
Min RougeL Score
This test uses the rougeL result of “rouge_score” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_rougeL_score
supported tasks: question_answering
Config
min_rougeL_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
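The rougeL result is based on the longest common subsequence (LCS) of tokens. The sketch below is a hand-rolled approximation, assuming lowercase whitespace tokenization; the helper names `lcs_length` and `rouge_l_f1` are hypothetical and the library's tokenizer and options are not reproduced.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """F-measure built from LCS-based precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


prediction = "the mat sat on the cat"
reference = "the cat is on the mat"

score = rouge_l_f1(prediction, reference)  # LCS is "the on the", length 3
passed = score > 0.8                       # min_score from the config example
print(score, passed)
```

Unlike unigram overlap, the LCS respects word order, so swapping "cat" and "mat" lowers the score even though almost all words are shared.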
Min RougeLsum Score
This test uses the rougeLsum result of “rouge_score” from the evaluate library. Test is passed if the score is higher than the configured min score.
alias_name: min_rougeLsum_score
supported tasks: question_answering
Config
min_rougeLsum_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
Min Weighted-F1 Score
This test checks the weighted-f1 score. Test is passed if the weighted-f1 score is higher than the configured min score.
alias_name: min_weighted_f1_score
supported tasks: ner, text-classification
Config
min_weighted_f1_score:
  min_score: 0.8
- min_score (float): Minimum score to pass the test.
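As an illustration of weighted averaging, this sketch weights each label's F1 by its support (the number of true examples with that label). `weighted_f1` and the sample data are hypothetical, and single-label predictions are assumed.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-label F1 scores, weighted by each label's support."""
    if not y_true:
        return 0.0
    support = Counter(y_true)
    pairs = list(zip(y_true, y_pred))
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += count * f1  # frequent labels contribute more
    return total / len(y_true)


y_true = ["pos"] * 6 + ["neg"] * 2
y_pred = ["pos"] * 5 + ["neg"] + ["neg", "pos"]

score = weighted_f1(y_true, y_pred)  # pos F1 = 10/12 (support 6), neg F1 = 0.5 (support 2)
passed = score > 0.8                 # min_score from the config example
print(score, passed)
```

On this data the unweighted mean of the two label scores would be about 0.67, while the weighted score is 0.75, because the majority label dominates the average.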