langtest.transform.accuracy.LLMEval#
- class LLMEval#
Bases: BaseAccuracy
Evaluation class for Language Model performance on question-answering tasks using the Language Model Metric (LLM).
- alias_name#
Alias names for the evaluation class; should include “llm_eval”.
- Type:
List[str]
- supported_tasks#
Supported tasks for evaluation, including “question-answering”.
- Type:
List[str]
- transform(cls, test: str, y_true: List[Any], params: Dict) -> List[MinScoreSample]: Transforms evaluation parameters and initializes the evaluation model.
- run(cls, sample_list: List[MinScoreSample], *args, **kwargs) -> List[MinScoreSample]: Runs the evaluation on a list of samples using the Language Model Metric (LLM).
- __init__()#
Methods
__init__()
async_run(sample_list, y_true, y_pred, **kwargs) – Creates a task to run the accuracy measure.
run(sample_list, y_true, y_pred, **kwargs) – Runs the evaluation on a list of samples using the Language Model Metric (LLM).
transform(test, y_true, params) – Transforms evaluation parameters and initializes the evaluation model.
Attributes
eval_model
test_types
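The alias_name and supported_tasks attributes documented above determine when this metric is applied; a minimal sketch of inspecting them, using only the attributes listed in this reference:

from langtest.transform.accuracy import LLMEval

# Both attributes are plain lists of strings, per the documentation above.
assert "llm_eval" in LLMEval.alias_name            # alias that selects this test
assert "question-answering" in LLMEval.supported_tasks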
- class TestConfig#
Bases: dict
- clear() → None. Remove all items from D.#
- copy() → a shallow copy of D#
- fromkeys(iterable, value=None, /)#
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)#
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D's items#
- keys() → a set-like object providing a view on D's keys#
- pop(k[, d]) → v, remove specified key and return the corresponding value.#
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()#
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)#
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.#
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D's values#
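Because TestConfig subclasses dict, the standard mapping operations above apply unchanged; a minimal sketch, assuming the constructor is not overridden and the import path matches this page, with purely illustrative keys:

from langtest.transform.accuracy import TestConfig  # assumed import path

# TestConfig behaves like a plain dict; the keys below are illustrative only.
cfg = TestConfig({"min_score": 0.75})
cfg.setdefault("min_score", 0.5)        # existing key is kept, value stays 0.75
cfg.update({"hub": "openai"})           # merge additional settings
print(cfg.get("min_score"))             # 0.75
print(list(cfg.keys()))                 # ['min_score', 'hub']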
- async classmethod async_run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#
Creates a task to run the accuracy measure.
- Parameters:
sample_list (List[MinScoreSample]) – List of samples to be transformed.
y_true (List[Any]) – True values
y_pred (List[Any]) – Predicted values
- async static run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#
Runs the evaluation on a list of samples using the Language Model Metric (LLM).
- Parameters:
sample_list (List[MinScoreSample]) – List of MinScoreSample instances containing evaluation information.
y_true (List[Any]) – List of true values for the model’s predictions.
y_pred (List[Any]) – List of predicted values by the model.
X_test (Optional) – Additional keyword argument representing the test data.
progress_bar (Optional) – Additional keyword argument indicating whether to display a progress bar.
**kwargs – Additional keyword arguments.
- Returns:
List containing updated MinScoreSample instances after evaluation.
- Return type:
List[MinScoreSample]
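Since run is declared async, it must be awaited or driven by an event loop; a minimal sketch, assuming samples, y_true, and y_pred have already been prepared (for example, samples via transform and predictions from the model under test):

import asyncio

# Minimal sketch: `samples`, `y_true`, and `y_pred` are assumed to exist already.
# `progress_bar` is the optional keyword argument documented above.
results = asyncio.run(LLMEval.run(samples, y_true, y_pred, progress_bar=True))
print(len(results))  # updated MinScoreSample instances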
- classmethod transform(test: str, y_true: List[Any], params: Dict) → List[MinScoreSample]#
Transforms evaluation parameters and initializes the evaluation model.
- Parameters:
test (str) – The alias name for the evaluation class.
y_true (List[Any]) – List of true labels (not used in this method).
params (Dict) – Additional parameters for evaluation, including ‘model’, ‘hub’, and ‘min_score’.
- Returns:
List containing a MinScoreSample instance with evaluation information.
- Return type:
List[MinScoreSample]
- Raises:
AssertionError – If the ‘test’ parameter is not in the alias_name list.
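A minimal sketch of calling transform directly; only the parameter names (‘model’, ‘hub’, ‘min_score’) come from the documentation above, and the concrete values are assumptions:

from langtest.transform.accuracy import LLMEval

# Hypothetical evaluation settings; the model and hub values are placeholders.
params = {"model": "gpt-4o", "hub": "openai", "min_score": 0.75}

# 'test' must match an entry in LLMEval.alias_name ("llm_eval"),
# otherwise an AssertionError is raised; y_true is accepted but unused here.
samples = LLMEval.transform(test="llm_eval", y_true=[], params=params)
print(samples)  # a list containing a MinScoreSample with the evaluation info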