langtest.metrics.prometheus_eval.PrometheusEval
- class PrometheusEval(model_name: str = 'prometheus-eval/prometheus-7b-v2.0', hub: str = 'huggingface', eval_type: str = 'absolute_grading', criteria_description: Dict[str, str] | None = None, model_kwargs: Dict[str, str] | None = None)
Bases: object
Class for evaluating model responses with the Prometheus judge model.
- __init__(model_name: str = 'prometheus-eval/prometheus-7b-v2.0', hub: str = 'huggingface', eval_type: str = 'absolute_grading', criteria_description: Dict[str, str] | None = None, model_kwargs: Dict[str, str] | None = None)
Initializes the PrometheusEval object.
- Parameters:
model_name – The name of the model used for evaluation.
hub – The hub the model is loaded from (default 'huggingface').
eval_type – The evaluation mode (default 'absolute_grading').
criteria_description – Optional mapping describing the grading criteria.
model_kwargs – Optional keyword arguments passed to the model.
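A minimal construction sketch based on the signature above; the criteria_description keys and rubric text are invented for illustration and are not prescribed by this page.

```python
from langtest.metrics.prometheus_eval import PrometheusEval

# Arguments mirror the defaults in the signature above, except
# criteria_description, whose content here is purely illustrative.
evaluator = PrometheusEval(
    model_name="prometheus-eval/prometheus-7b-v2.0",
    hub="huggingface",
    eval_type="absolute_grading",
    criteria_description={
        "correctness": "Is the response factually accurate and complete?"
    },
)
```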
Methods

- __init__([model_name, hub, eval_type, ...]) – Initializes the PrometheusEval object.
- evaluate(inputs, predictions[, ...]) – Evaluate question answering examples and predictions.
- evaluate_batch(entries) – Evaluate the model on a batch of queries.
- evaluate_response(llm_response) – Evaluate the model.
- reset_pipeline()

Attributes

- pipeline
- evaluate(inputs: List[Dict[str, str]], predictions: List[Dict[str, str]], question_key: str = 'query', answer_key: str = 'answer', prediction_key: str = 'result') → List[Tuple[str, int]]
Evaluate question answering examples and predictions.
- Parameters:
inputs – A list of dictionaries containing the questions and expected answers.
predictions – A list of dictionaries containing the model's results.
question_key – The key for the question in each input dictionary (default 'query').
answer_key – The key for the expected answer in each input dictionary (default 'answer').
prediction_key – The key for the model's result in each prediction dictionary (default 'result').
- Returns:
A list of tuples of feedback and score.
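A hedged usage sketch for evaluate(), assuming the default keys 'query', 'answer', and 'result' from the signature; the sample data is made up for illustration.

```python
from langtest.metrics.prometheus_eval import PrometheusEval

evaluator = PrometheusEval()  # defaults as documented above

# Inputs carry the question and reference answer; predictions carry the
# model output, keyed by the defaults ('query', 'answer', 'result').
inputs = [{"query": "What is the capital of France?", "answer": "Paris"}]
predictions = [{"result": "The capital of France is Paris."}]

results = evaluator.evaluate(inputs, predictions)
for feedback, score in results:
    print(score, feedback)
```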
- evaluate_batch(entries: List[Dict[str, str]]) → List[Tuple[str, int]]
Evaluate the model on a batch of queries.
- Parameters:
entries – A list of dictionaries, each containing a query, the model's result, and the expected answer.
- Returns:
A list of tuples of feedback and score.
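A hedged sketch of a batch evaluation; the per-entry key names ('query', 'result', 'answer') are an assumption based on the defaults used by evaluate() and are not confirmed by this page.

```python
from langtest.metrics.prometheus_eval import PrometheusEval

evaluator = PrometheusEval()

# Each entry bundles the query, the model's result, and the expected answer.
# The key names here are assumed, not documented on this page.
entries = [
    {
        "query": "What is the capital of France?",
        "result": "The capital of France is Paris.",
        "answer": "Paris",
    }
]

for feedback, score in evaluator.evaluate_batch(entries):
    print(score, feedback)
```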
- evaluate_response(llm_response: Dict[str, str]) → Tuple[str, int]
Evaluate a single model response.
- Parameters:
llm_response – A dictionary containing the query, the model's result, and the expected answer.
- Returns:
A tuple of feedback and score.
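A hedged sketch of scoring a single response; again, the dictionary key names are assumed from the defaults used elsewhere in this class.

```python
from langtest.metrics.prometheus_eval import PrometheusEval

evaluator = PrometheusEval()

# Key names are an assumption based on the defaults used by evaluate().
feedback, score = evaluator.evaluate_response(
    {
        "query": "What is the capital of France?",
        "result": "The capital of France is Paris.",
        "answer": "Paris",
    }
)
print(score, feedback)
```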