langtest.transform.accuracy.DegradationAnalysis#
- class DegradationAnalysis#
Bases: BaseAccuracy
Evaluation class for model performance degradation analysis.
- alias_name#
Alias names for the evaluation class; should include “degradation_analysis”.
- Type:
List[str]
- supported_tasks#
Supported tasks for evaluation.
- Type:
List[str]
Methods:
- __init__()#
Methods

- __init__()
- async_run(sample_list, y_true, y_pred, **kwargs): Creates a task to run the accuracy measure.
- preprocess(y_true, y_pred): Preprocesses the input data for the degradation analysis.
- qa_evaluation(samples, X_test): Evaluates the model performance on question-answering tasks.
- run(sample_list, y_true, y_pred, **kwargs): Computes the accuracy score for the given data.
- show_results()
- transform(test, y_true, params): Abstract method that implements the accuracy measure.

Attributes

- result_data
- test_types
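In normal use this class is not called directly; it is selected through the accuracy section of a langtest Harness configuration. The snippet below is a minimal sketch of that wiring, assuming the standard Harness workflow: the model name, hub, data file, and threshold value are illustrative placeholders, and the “degradation_analysis” key follows the alias_name documented above.

```python
from langtest import Harness

# Hypothetical setup: the model, hub, and data file are placeholders,
# not recommendations; substitute whatever your project actually evaluates.
harness = Harness(
    task="text-classification",
    model={"model": "my-finetuned-model", "hub": "huggingface"},
    data={"data_source": "test_data.csv"},
)

# Enable degradation analysis under the accuracy tests of the configuration.
harness.configure(
    {
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            "accuracy": {"degradation_analysis": {}},
        }
    }
)

harness.generate()  # build perturbed test cases
harness.run()       # run the model on original and perturbed inputs
harness.report()    # summarise results, including the degradation analysis
```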
- class TestConfig#
Bases: dict
- clear() → None. Remove all items from D.#
- copy() → a shallow copy of D#
- fromkeys(iterable, value=None, /)#
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)#
Return the value for key if key is in the dictionary, else default.
- items() a set-like object providing a view on D's items #
- keys() a set-like object providing a view on D's keys #
- pop(k[, d]) v, remove specified key and return the corresponding value. #
If the key is not found, return the default if given; otherwise, raise a KeyError.
- popitem()#
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)#
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.#
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D's values#
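Since TestConfig subclasses dict, it behaves like an ordinary mapping. The snippet below is a small sketch of that behaviour; the keys are made up for illustration, and it assumes TestConfig is importable from this module as documented on this page.

```python
from langtest.transform.accuracy import TestConfig  # assumed import path, per this page

# "min_score", "threshold", and "verbose" are made-up example keys, not documented options.
config = TestConfig(min_score=0.75)

config.setdefault("threshold", 0.5)   # inserted, because "threshold" is not present yet
config.update({"min_score": 0.8})     # E has .keys(), so D[k] = E[k] for each k in E
config.update(verbose=True)           # keyword form F: D[k] = F[k] for each k in F

print(dict(config))
# {'min_score': 0.8, 'threshold': 0.5, 'verbose': True}

config.pop("verbose", None)           # remove the key, falling back to None if absent
```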
- async classmethod async_run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#
Creates a task to run the accuracy measure.
- Parameters:
sample_list (List[MinScoreSample]) – List of samples to be transformed.
y_true (List[Any]) – True values
y_pred (List[Any]) – Predicted values
- static preprocess(y_true: list | Series, y_pred: list | Series)#
Preprocesses the input data for the degradation analysis.
- Parameters:
y_true (list | pd.Series) – The true labels.
y_pred (list | pd.Series) – The predicted labels.
- Returns:
The preprocessed true and predicted labels.
- Return type:
y_true, y_pred (Tuple[pd.Series, pd.Series])
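A minimal sketch of calling the static preprocess helper directly, assuming plain string labels; the exact normalisation it applies is internal to langtest, so only the documented return type is checked here.

```python
import pandas as pd
from langtest.transform.accuracy import DegradationAnalysis

# Placeholder labels, purely illustrative.
y_true = ["positive", "negative", "neutral"]
y_pred = ["positive", "negative", "positive"]

true_series, pred_series = DegradationAnalysis.preprocess(y_true, y_pred)

# Per the docstring, both return values are pandas Series.
assert isinstance(true_series, pd.Series)
assert isinstance(pred_series, pd.Series)
```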
- static qa_evaluation(samples: List[QASample], X_test: DataFrame)#
Evaluates the model performance on question-answering tasks.
- Parameters:
samples (List[QASample]) – The list of QASample instances.
X_test (pd.DataFrame) – The test data.
- Returns:
The accuracy scores for the original and perturbed samples.
- Return type:
Tuple[float, float]
- async static run(sample_list: List[DegradationSample], y_true: List[Any], y_pred: List[Any], **kwargs)#
Computes the accuracy score for the given data.
- Parameters:
sample_list (List[DegradationSample]) – List of samples to be transformed.
y_true (List[Any]) – True values
y_pred (List[Any]) – Predicted values
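run() (and its wrapper async_run(), which shares the same signature) is coroutine-based and is normally driven by the Harness event loop rather than called by hand. The sketch below only shows the call shape under that assumption; the empty sample list and the labels are placeholders, so the computed result carries no real scores.

```python
import asyncio
from langtest.transform.accuracy import DegradationAnalysis

async def main():
    sample_list = []          # placeholder: DegradationSample objects from an earlier langtest step
    y_true = ["a", "b", "c"]  # placeholder gold labels
    y_pred = ["a", "b", "x"]  # placeholder model predictions
    # Awaiting the coroutine computes the accuracy scores for the given data (see docstring above).
    return await DegradationAnalysis.run(sample_list, y_true, y_pred)

results = asyncio.run(main())
```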
- classmethod transform(test: str, y_true: List[Any], params: Dict)#
Abstract method that implements the accuracy measure.
- Parameters:
test (str) – Name of the test to be applied.
y_true (List[Any]) – True values
params (Dict) – parameters for tests configuration
- Returns:
The transformed data based on the implemented accuracy measure.
- Return type:
List[MinScoreSample]