langtest.transform.accuracy.LLMEval#

class LLMEval#

Bases: BaseAccuracy

Evaluation class for Language Model performance on question-answering tasks using the Language Model Metric (LLM).

alias_name#

Alias names for the evaluation class; includes “llm_eval”.

Type:

List[str]

supported_tasks#

Supported tasks for evaluation; includes “question-answering”.

Type:

List[str]

transform(cls, test: str, y_true: List[Any], params: Dict) -> List[MinScoreSample]#

Transforms evaluation parameters and initializes the evaluation model.

run(cls, sample_list: List[MinScoreSample], *args, **kwargs) -> List[MinScoreSample]#

Runs the evaluation on a list of samples using the Language Model Metric (LLM).
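
In typical use, this metric is selected through the test configuration rather than instantiated directly. The following is a minimal sketch of such a configuration, assuming the standard langtest Harness workflow; the model name, hub, dataset, and threshold values are illustrative placeholders rather than library defaults, and the nesting of “llm_eval” under “accuracy” is an assumption based on its alias and base class.

    from langtest import Harness

    # Target model and dataset are placeholders for illustration.
    harness = Harness(
        task="question-answering",
        model={"model": "gpt-3.5-turbo", "hub": "openai"},
        data={"data_source": "BoolQ", "split": "test-tiny"},
    )

    # Enable the LLM-based accuracy metric via its alias "llm_eval".
    harness.configure({
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            "accuracy": {"llm_eval": {"min_score": 0.75}},
        }
    })

    harness.generate()
    harness.run()
    harness.report()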

__init__()#

Methods

__init__()

async_run(sample_list, y_true, y_pred, **kwargs)

Creates a task to run the accuracy measure.

run(sample_list, y_true, y_pred, **kwargs)

Runs the evaluation on a list of samples using the Language Model Metric (LLM).

transform(test, y_true, params)

Transforms evaluation parameters and initializes the evaluation model.

Attributes

alias_name

eval_model

supported_tasks

test_types

class TestConfig#

Bases: dict

clear() → None. Remove all items from D.#
copy() → a shallow copy of D#
fromkeys(iterable, value=None, /)#

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)#

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items#
keys() → a set-like object providing a view on D's keys#
pop(k[, d]) → v, remove specified key and return the corresponding value.#

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()#

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)#

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.#

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k].
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v.
In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values#
async classmethod async_run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Creates a task to run the accuracy measure.

Parameters:
  • sample_list (List[MinScoreSample]) – List of samples to be transformed.

  • y_true (List[Any]) – True values

  • y_pred (List[Any]) – Predicted values

async static run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Runs the evaluation on a list of samples using the Language Model Metric (LLM).

Parameters:
  • sample_list (List[MinScoreSample]) – List of MinScoreSample instances containing evaluation information.

  • y_true (List[Any]) – List of true values for the model’s predictions.

  • y_pred (List[Any]) – List of predicted values by the model.

  • X_test (Optional) – Additional keyword argument representing the test data.

  • progress_bar (Optional) – Additional keyword argument indicating whether to display a progress bar.

  • **kwargs – Additional keyword arguments.

Returns:

List containing updated MinScoreSample instances after evaluation.

Return type:

List[MinScoreSample]
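
Because run is an asynchronous static method, it is awaited inside an event loop, and it relies on the evaluation model initialized beforehand by transform. A minimal sketch, assuming the samples, reference answers, and predictions have already been prepared; the wrapper function and the asyncio.run call are illustrative, not part of the documented API.

    import asyncio

    from langtest.transform.accuracy import LLMEval

    async def evaluate(samples, y_true, y_pred):
        # samples: list of MinScoreSample instances; y_true / y_pred: reference
        # and predicted answers. progress_bar is one of the optional keyword
        # arguments documented above.
        return await LLMEval.run(samples, y_true, y_pred, progress_bar=True)

    # Illustrative usage: updated = asyncio.run(evaluate(samples, y_true, y_pred))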

classmethod transform(test: str, y_true: List[Any], params: Dict) → List[MinScoreSample]#

Transforms evaluation parameters and initializes the evaluation model.

Parameters:
  • test (str) – The alias name for the evaluation class.

  • y_true (List[Any]) – List of true labels (not used in this method).

  • params (Dict) – Additional parameters for evaluation, including ‘model’, ‘hub’, and ‘min_score’.

Returns:

List containing a MinScoreSample instance with evaluation information.

Return type:

List[MinScoreSample]

Raises:

AssertionError – If the ‘test’ parameter is not in the alias_name list.
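
A minimal sketch of calling transform directly, assuming an OpenAI-hosted evaluation model; the model name, hub, and min_score values are placeholders.

    from langtest.transform.accuracy import LLMEval

    # 'test' must match an entry in LLMEval.alias_name, i.e. "llm_eval".
    samples = LLMEval.transform(
        test="llm_eval",
        y_true=[],  # not used by this method, per the documentation above
        params={
            "model": "gpt-3.5-turbo",  # placeholder evaluation model
            "hub": "openai",           # placeholder hub
            "min_score": 0.75,         # minimum acceptable score
        },
    )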