Accuracy

LLM Eval

This test uses a more capable large language model (LLM) to evaluate the model’s response. The test passes if the score is higher than the configured min score.

alias_name: llm_eval

supported tasks: question_answering

Config

llm_eval:
    hub: openai
    min_score: 0.75
    model: gpt-3.5-turbo-instruct

min_score (float): Minimum score to pass the test.
model (string): LLM used to evaluate the model response.
hub (string): Hub (library) used to load the evaluating model, either from a public model hub or from a local path.

Min Bleu Score

This test uses the “bleu_score” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_bleu_score

supported tasks: question_answering

Config

min_bleu_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.

Min Exact Match Score

This test uses the “exact_match” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_exact_match_score

supported tasks: question_answering

Config

min_exact_match_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.
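As a rough illustration of what this threshold means, the sketch below computes a plain exact-match ratio in pure Python. It does not call the evaluate library: `exact_match_score` and `passes` are hypothetical helpers, the library metric may apply normalization options that are not modeled here, and the strict `>` comparison simply follows the wording "higher than" above.

```python
def exact_match_score(predictions, references):
    """Fraction of predictions that are string-identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    matches = sum(p == r for p, r in zip(predictions, references))
    return matches / len(predictions)


def passes(score, min_score):
    # Assumption: strict comparison, following "higher than" in the text.
    return score > min_score


predictions = ["Paris", "4", "blue whale", "1969"]
references = ["Paris", "4", "Blue whale", "1969"]

# Three of the four answers match exactly; "blue whale" differs in case.
score = exact_match_score(predictions, references)
result = passes(score, min_score=0.8)
print(score, result)
```

With these inputs the score falls below the 0.8 threshold from the config, so the test would fail.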

Min F1 Score

This test checks the f1 score for each label. Test is passed if the f1 score is higher than the configured min score.

alias_name: min_f1_score

supported tasks: ner, text-classification

Config

min_f1_score:
      min_score: 0.8

min_f1_score:
      min_score:
        O: 0.75
        PER: 0.65
        LOC: 0.90

min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.
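The two forms of min_score can be sketched as follows. This is a minimal pure-Python illustration rather than the library's implementation: `per_label_f1` and `check_min_score` are hypothetical helpers, labels are compared position by position (token level), labels missing from a dict threshold are skipped by assumption, and the strict `>` follows the wording above.

```python
from collections import Counter


def per_label_f1(y_true, y_pred):
    """Return {label: f1} from aligned true and predicted label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label was wrong
            fn[t] += 1  # true label was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def check_min_score(scores, min_score):
    """min_score is a float (all labels) or a dict (per-label thresholds)."""
    results = {}
    for label, score in scores.items():
        if isinstance(min_score, dict):
            if label not in min_score:
                continue  # assumption: labels without a threshold are skipped
            threshold = min_score[label]
        else:
            threshold = min_score
        results[label] = score > threshold
    return results


y_true = ["O", "O", "PER", "PER", "LOC", "O", "LOC", "PER"]
y_pred = ["O", "O", "PER", "O", "LOC", "O", "LOC", "PER"]

scores = per_label_f1(y_true, y_pred)
per_label = check_min_score(scores, {"O": 0.75, "PER": 0.65, "LOC": 0.90})
single = check_min_score(scores, 0.8)
print(scores, per_label, single)
```

Here PER clears its own 0.65 threshold but not the single 0.8 threshold, which is the practical difference between the two configs.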

Min Macro-F1 Score

This test checks the macro-f1 score. Test is passed if the macro-f1 score is higher than the configured min score.

alias_name: min_macro_f1_score

supported tasks: ner, text-classification

Config

min_macro_f1_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.

Min Micro-F1 Score

This test checks the micro-f1 score. Test is passed if the micro-f1 score is higher than the configured min score.

alias_name: min_micro_f1_score

supported tasks: ner, text-classification

Config

min_micro_f1_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.
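Macro-F1 and micro-F1 can give different verdicts on the same predictions, because macro averages the per-label F1 scores while micro pools the counts across labels first. The sketch below shows this with hypothetical helpers (`label_counts`, `macro_f1`, `micro_f1`) in pure Python; it assumes position-by-position label comparison and is not the library's code.

```python
from collections import Counter


def label_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    labels = sorted(set(y_true) | set(y_pred))
    return labels, tp, fp, fn


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    # Unweighted mean of the per-label F1 scores.
    labels, tp, fp, fn = label_counts(y_true, y_pred)
    if not labels:
        return 0.0
    return sum(f1(tp[lab], fp[lab], fn[lab]) for lab in labels) / len(labels)


def micro_f1(y_true, y_pred):
    # F1 computed once from counts pooled over all labels.
    _, tp, fp, fn = label_counts(y_true, y_pred)
    return f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


y_true = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "pos", "pos", "pos", "pos", "neg"]

macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
print(macro > 0.8, micro > 0.8)
```

On this imbalanced sample the single error on the rare label pulls macro-F1 below 0.8, while micro-F1 stays above it.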

Min Precision Score

This test checks the precision score for each label. Test is passed if the precision score is higher than the configured min score.

alias_name: min_precision_score

supported tasks: ner, text-classification

Config

min_precision_score:
      min_score: 0.8

min_precision_score:
      min_score:
        O: 0.75
        PER: 0.65
        LOC: 0.90

min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.

Min Recall Score

This test checks the recall score for each label. Test is passed if the recall score is higher than the configured min score.

alias_name: min_recall_score

supported tasks: ner, text-classification

Config

min_recall_score:
      min_score: 0.5

min_recall_score:
      min_score:
        O: 0.75
        PER: 0.65
        LOC: 0.90

min_score (dict or float): Minimum score to pass the test. A float applies to every label; a dict sets a separate threshold per label.
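Precision and recall are checked per label in the same way, and the same predictions can pass one check and fail the other. The following pure-Python sketch uses a hypothetical helper, `per_label_precision_recall`, with position-by-position label comparison; the strict `>` follows the wording of this page and is an assumption about the actual implementation.

```python
from collections import Counter


def per_label_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} from aligned label sequences."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]  # times the label was predicted
        actual = tp[label] + fn[label]     # times the label was the truth
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        result[label] = (precision, recall)
    return result


y_true = ["O", "O", "PER", "PER", "LOC", "O", "LOC", "PER"]
y_pred = ["O", "O", "PER", "O", "LOC", "O", "LOC", "PER"]

pr = per_label_precision_recall(y_true, y_pred)
thresholds = {"O": 0.75, "PER": 0.65, "LOC": 0.90}
precision_ok = {lab: pr[lab][0] > thresholds[lab] for lab in thresholds}
recall_ok = {lab: pr[lab][1] > thresholds[lab] for lab in thresholds}
print(precision_ok, recall_ok)
```

The one PER token mislabeled as O lowers precision for O and recall for PER, so each metric flags a different label.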

Min Rouge1 Score

This test uses the rouge1 result of the “rouge_score” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_rouge1_score

supported tasks: question_answering

Config

min_rouge1_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.

Min Rouge2 Score

This test uses the rouge2 result of the “rouge_score” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_rouge2_score

supported tasks: question_answering

Config

min_rouge2_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.
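To give a feel for how rouge1 and rouge2 differ, the sketch below computes an n-gram overlap F-measure in pure Python. It is a simplified stand-in, not the evaluate library's metric: `rouge_n_f1` is a hypothetical helper, and tokenization is assumed to be lowercasing plus whitespace splitting, with no stemming.

```python
from collections import Counter


def rouge_n_f1(prediction, reference, n=1):
    """F-measure of clipped n-gram overlap between two strings."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    pred, ref = ngrams(prediction), ngrams(reference)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


prediction = "the cat sat on the mat"
reference = "the cat lay on the mat"

rouge1 = rouge_n_f1(prediction, reference, n=1)
rouge2 = rouge_n_f1(prediction, reference, n=2)
print(rouge1 > 0.8, rouge2 > 0.8)
```

A single substituted word costs one unigram but two bigrams, so the same answer clears a 0.8 threshold on unigrams and misses it on bigrams.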

Min RougeL Score

This test uses the rougeL result of the “rouge_score” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_rougeL_score

supported tasks: question_answering

Config

min_rougeL_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.
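rougeL is based on the longest common subsequence (LCS) between prediction and reference, so it is sensitive to word order. The sketch below is a minimal pure-Python version under stated assumptions: `lcs_length` and `rouge_l_f1` are hypothetical helpers, tokens come from lowercasing and whitespace splitting, and the library's own tokenization and options are not reproduced.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """F-measure built from LCS-based precision and recall."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


prediction = "the cat sat on the mat"
reference = "on the mat the cat sat"

score = rouge_l_f1(prediction, reference)
print(score, score > 0.8)
```

Both strings contain exactly the same words, yet the reordering limits the common subsequence to three tokens, so the score stays well below 0.8.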

Min RougeLsum Score

This test uses the rougeLsum result of the “rouge_score” metric from the evaluate library. The test passes if the score is higher than the configured min score.

alias_name: min_rougeLsum_score

supported tasks: question_answering

Config

min_rougeLsum_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.

Min Weighted-F1 Score

This test checks the weighted-f1 score. Test is passed if the weighted-f1 score is higher than the configured min score.

alias_name: min_weighted_f1_score

supported tasks: ner, text-classification

Config

min_weighted_f1_score:
      min_score: 0.8

min_score (float): Minimum score to pass the test.
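Weighted-F1 averages the per-label F1 scores with each label weighted by its number of true instances (its support). The sketch below illustrates this in pure Python; `weighted_f1` is a hypothetical helper that compares labels position by position and is not taken from the library.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-label F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1
            fn[t] += 1
    support = Counter(y_true)  # number of true instances per label
    total = len(y_true)
    if total == 0:
        return 0.0
    score = 0.0
    for label, count in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        label_f1 = 2 * tp[label] / denom if denom else 0.0
        score += label_f1 * count / total
    return score


y_true = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "pos", "pos", "pos", "pos", "neg"]

weighted = weighted_f1(y_true, y_pred)
print(weighted, weighted > 0.8)
```

Because the frequent label carries most of the weight, the weaker score on the rare label has limited effect and the result stays above 0.8 on this sample.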