langtest.transform.accuracy.DegradationAnalysis#

class DegradationAnalysis#

Bases: BaseAccuracy

Evaluation class for model performance degradation analysis.

alias_name#

Alias names for the evaluation class; should include “degradation_analysis”.

Type:

List[str]

supported_tasks#

Supported tasks for evaluation.

Type:

List[str]

Methods:

__init__()#

Methods

__init__()

async_run(sample_list, y_true, y_pred, **kwargs)

Creates a task to run the accuracy measure.

preprocess(y_true, y_pred)

Preprocesses the input data for the degradation analysis.

qa_evaluation(samples, X_test)

Evaluates the model performance on question-answering tasks.

run(sample_list, y_true, y_pred, **kwargs)

Computes the accuracy score for the given data.

show_results()

transform(test, y_true, params)

Abstract method that implements the accuracy measure.

Attributes

alias_name

result_data

supported_tasks

test_types
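
In typical use, DegradationAnalysis is not instantiated directly but is enabled through the langtest Harness configuration via the alias name listed above. The snippet below is a minimal sketch, not a verified recipe: the task, model, hub, and data values are placeholders, and the empty dict passed for degradation_analysis assumes the test takes no extra parameters.

   from langtest import Harness

   # Sketch only: the model, hub, and data_source values are placeholders.
   harness = Harness(
       task="text-classification",
       model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
       data={"data_source": "test.csv"},
       config={
           "tests": {
               "defaults": {"min_pass_rate": 1.0},
               "accuracy": {
                   "degradation_analysis": {},  # enabled via the alias name documented above
               },
           }
       },
   )

   harness.generate()   # build perturbed test cases
   harness.run()        # run the model on original and perturbed inputs
   harness.report()     # summarise accuracy degradation per test type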

class TestConfig#

Bases: dict

clear() → None. Remove all items from D.#
copy() → a shallow copy of D#
fromkeys(iterable, value=None, /)#

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)#

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items#
keys() → a set-like object providing a view on D's keys#
pop(k[, d]) → v, remove specified key and return the corresponding value.#

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()#

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)#

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.#

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k].
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v.
In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values#
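
Since TestConfig subclasses dict, it supports the standard mapping interface listed above. The short sketch below is purely illustrative; the keys are hypothetical and a plain dict is used to show the same behaviour.

   # Hypothetical keys; TestConfig behaves like a built-in dict.
   cfg = {"min_score": 0.75}
   cfg.setdefault("threshold", 0.1)   # inserted only because "threshold" is missing
   cfg.update({"min_score": 0.8})     # update() overwrites existing keys
   print(cfg.get("min_score"), cfg.get("missing", "n/a"))  # 0.8 n/a
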
async classmethod async_run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Creates a task to run the accuracy measure.

Parameters:
  • sample_list (List[MinScoreSample]) – List of samples to be transformed.

  • y_true (List[Any]) – True values

  • y_pred (List[Any]) – Predicted values
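
A hedged sketch of awaiting async_run (run is likewise a coroutine). The sample list and label lists are placeholders, and the handling of the returned object is an assumption based on the one-line summary above.

   import asyncio

   from langtest.transform.accuracy import DegradationAnalysis

   # Placeholder inputs; real usage passes sample objects produced by the harness.
   samples = []
   y_true = ["positive", "negative"]
   y_pred = ["positive", "positive"]

   async def main():
       task = await DegradationAnalysis.async_run(samples, y_true, y_pred)
       # async_run is documented as creating a task; await it (if it is one) for the result.
       return await task if asyncio.isfuture(task) else task

   result = asyncio.run(main())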

static preprocess(y_true: list | Series, y_pred: list | Series)#

Preprocesses the input data for the degradation analysis.

Parameters:
  • y_true (List) – The true labels.

  • y_pred (List) – The predicted labels.

Returns:

The preprocessed true and predicted labels.

Return type:

y_true, y_pred (Tuple[pd.Series, pd.Series])
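
A minimal sketch of calling preprocess. It assumes the method needs only the two label sequences and returns them as aligned pandas Series, as the return type above indicates; any additional normalisation it applies is not shown here.

   from langtest.transform.accuracy import DegradationAnalysis

   y_true = ["positive", "negative", "neutral"]
   y_pred = ["positive", "positive", "neutral"]

   # Returns the labels as pandas Series ready for the degradation analysis.
   y_true_s, y_pred_s = DegradationAnalysis.preprocess(y_true, y_pred)
   print(type(y_true_s).__name__, len(y_pred_s))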

static qa_evaluation(samples: List[QASample], X_test: DataFrame)#

Evaluates the model performance on question-answering tasks.

Parameters:
  • samples (List[QASample]) – The list of QASample instances.

  • X_test (pd.DataFrame) – The test data.

Returns:

The accuracy scores for the original and perturbed samples.

Return type:

Tuple[float, float]
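
The snippet below illustrates the idea behind qa_evaluation rather than the library's implementation: score the answers to the original questions and to their perturbed counterparts against the expected answers, then compare the two accuracies. The column names and the exact-match scoring are assumptions made only for this sketch.

   import pandas as pd

   # Toy data standing in for X_test; real QA scoring is typically more forgiving than exact match.
   X_test = pd.DataFrame({
       "expected_answer":  ["paris", "4", "blue"],
       "original_answer":  ["paris", "4", "red"],
       "perturbed_answer": ["paris", "5", "red"],
   })

   original_acc = (X_test["original_answer"] == X_test["expected_answer"]).mean()
   perturbed_acc = (X_test["perturbed_answer"] == X_test["expected_answer"]).mean()
   print(original_acc, perturbed_acc)  # ~0.67 vs ~0.33: accuracy degrades under perturbation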

async static run(sample_list: List[DegradationSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Computes the accuracy score for the given data.

Parameters:
  • sample_list (List[DegradationSample]) – List of samples to be transformed.

  • y_true (List[Any]) – True values

  • y_pred (List[Any]) – Predicted values

classmethod transform(test: str, y_true: List[Any], params: Dict)#

Abstract method that implements the accuracy measure.

Parameters:
  • test (str) – Name of the test to perform.

  • y_true (List[Any]) – True values

  • params (Dict) – Parameters for tests configuration

Returns:

The transformed data based on the implemented accuracy measure.

Return type:

List[MinScoreSample]
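
A hedged sketch of calling transform directly. In normal use the harness invokes it with the configured test parameters, so the test name below is grounded only in the alias documented above and the empty params dict is a placeholder.

   from langtest.transform.accuracy import DegradationAnalysis

   y_true = ["positive", "negative", "neutral"]

   # Placeholder configuration; real params come from the harness test config.
   samples = DegradationAnalysis.transform("degradation_analysis", y_true, params={})
   print(samples)  # list of sample objects describing the configured accuracy check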