langtest.metrics.prometheus_eval.PrometheusEval#

class PrometheusEval(model_name: str = 'prometheus-eval/prometheus-7b-v2.0', hub: str = 'huggingface', eval_type: str = 'absolute_grading', criteria_description: Dict[str, str] | None = None, model_kwargs: Dict[str, str] | None = None)#

Bases: object

Class for evaluating model responses with the Prometheus model.

__init__(model_name: str = 'prometheus-eval/prometheus-7b-v2.0', hub: str = 'huggingface', eval_type: str = 'absolute_grading', criteria_description: Dict[str, str] | None = None, model_kwargs: Dict[str, str] | None = None)#

Initializes the PrometheusEval object.

Parameters:
  • model_name – The name of the model to use for evaluation.

  • hub – The hub from which the model is loaded (e.g. 'huggingface').

  • eval_type – The type of evaluation to perform (e.g. 'absolute_grading').

  • criteria_description – An optional mapping describing the grading criteria.

  • model_kwargs – Optional keyword arguments passed to the model.

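A minimal usage sketch (the argument values below simply restate the defaults from the signature above):

from langtest.metrics.prometheus_eval import PrometheusEval

# Build an evaluator that uses the Prometheus judge model with absolute grading.
evaluator = PrometheusEval(
    model_name="prometheus-eval/prometheus-7b-v2.0",
    hub="huggingface",
    eval_type="absolute_grading",
)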

Methods

__init__([model_name, hub, eval_type, ...])

Initializes the PrometheusEval object.

evaluate(inputs, predictions[, ...])

Evaluate question answering examples and predictions.

evaluate_batch(entries)

Evaluate the model on a batch of queries.

evaluate_response(llm_response)

Evaluate a single model response.

reset_pipeline()

Reset the evaluation pipeline.

Attributes

pipeline

evaluate(inputs: List[Dict[str, str]], predictions: List[Dict[str, str]], question_key: str = 'query', answer_key: str = 'answer', prediction_key: str = 'result') List[Tuple[str, int]]#

Evaluate question answering examples and predictions.

Parameters:
  • inputs – A list of dictionaries containing the questions and expected answers.

  • predictions – A list of dictionaries containing the model's predictions.

  • question_key – The key under which the question is stored (default 'query').

  • answer_key – The key under which the expected answer is stored (default 'answer').

  • prediction_key – The key under which the model's prediction is stored (default 'result').

Returns:

A list of tuples of feedback and score.

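A sketch of a call, assuming the expected answer lives in each input dictionary and the model output in each prediction dictionary under the default keys:

# Assumes PrometheusEval was imported as shown under __init__.
evaluator = PrometheusEval()

inputs = [{"query": "What is the capital of France?", "answer": "Paris"}]
predictions = [{"result": "The capital of France is Paris."}]

# One (feedback, score) tuple is returned per example.
results = evaluator.evaluate(inputs, predictions)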

evaluate_batch(entries: List[Dict[str, str]]) List[Tuple[str, int]]#

Evaluate the model on a batch of queries.

Parameters:
  • entries – A list of dictionaries, each containing a query, the model's result, and the expected answer.

Returns:

A list of tuples of feedback and score.
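
A sketch of a batched call, assuming each entry bundles the query, the model's result, and the expected answer in one dictionary (the key names are illustrative, based on the parameter descriptions above):

# Assumes PrometheusEval was imported as shown under __init__.
evaluator = PrometheusEval()

entries = [
    {
        "query": "What is the capital of France?",
        "result": "The capital of France is Paris.",
        "answer": "Paris",
    },
]

# Each entry yields a (feedback, score) tuple.
batch_results = evaluator.evaluate_batch(entries)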

evaluate_response(llm_response: Dict[str, str]) Tuple[str, int]#

Evaluate a single model response.

Parameters:
  • llm_response – A dictionary containing the query, the model's result, and the expected answer.

Returns:

A tuple of feedback and score.
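
A sketch of grading a single response, assuming llm_response carries the query, the model's result, and the expected answer:

# Assumes PrometheusEval was imported as shown under __init__.
evaluator = PrometheusEval()

llm_response = {
    "query": "What is the capital of France?",
    "result": "The capital of France is Paris.",
    "answer": "Paris",
}

feedback, score = evaluator.evaluate_response(llm_response)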