langtest.transform.accuracy.DegradationAnalysis#

class DegradationAnalysis#

Bases: BaseAccuracy

Evaluation class for model performance degradation analysis.

alias_name#

Alias names for the evaluation class; should include “degradation_analysis”.

Type:

List[str]

supported_tasks#

Supported tasks for evaluation.

Type:

List[str]

Methods:

__init__()#

Methods

__init__()

async_run(sample_list, y_true, y_pred, **kwargs)

Creates a task to run the accuracy measure.

preprocess(y_true, y_pred)

Preprocesses the input data for the degradation analysis.

qa_evaluation(samples, X_test)

Evaluates the model performance on question-answering tasks.

run(sample_list, y_true, y_pred, **kwargs)

Computes the accuracy score for the given data.

show_results()

transform(test, y_true, params)

Abstract method that implements the accuracy measure.

Attributes

alias_name

result_data

supported_tasks

test_types
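
In typical use, DegradationAnalysis is not instantiated directly but is enabled through the langtest Harness configuration via the alias name listed above. The snippet below is a minimal sketch, not a verified recipe: the task, model, hub, and data values are placeholders, and the empty dict passed for degradation_analysis assumes the test takes no extra parameters.

   from langtest import Harness

   # Sketch only: the model, hub, and data_source values are placeholders.
   harness = Harness(
       task="text-classification",
       model={"model": "lvwerra/distilbert-imdb", "hub": "huggingface"},
       data={"data_source": "test.csv"},
       config={
           "tests": {
               "defaults": {"min_pass_rate": 1.0},
               "accuracy": {
                   "degradation_analysis": {},  # enabled via the alias name documented above
               },
           }
       },
   )

   harness.generate()   # build perturbed test cases
   harness.run()        # run the model on original and perturbed inputs
   harness.report()     # summarise accuracy degradation per test type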

class TestConfig#

Bases: dict

clear() → None. Remove all items from D.#
copy() → a shallow copy of D#
fromkeys(iterable, value=None, /)#

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)#

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items#
keys() → a set-like object providing a view on D's keys#
pop(k[, d]) → v, remove specified key and return the corresponding value.#

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()#

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)#

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

update([E, ]**F) → None. Update D from dict/iterable E and F.#

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k].
If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v.
In either case, this is followed by: for k in F: D[k] = F[k].

values() → an object providing a view on D's values#
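
Since TestConfig subclasses dict, it supports the standard mapping interface listed above. The short sketch below is purely illustrative; the keys are hypothetical and a plain dict is used to show the same behaviour.

   # Hypothetical keys; TestConfig behaves like a built-in dict.
   cfg = {"min_score": 0.75}
   cfg.setdefault("threshold", 0.1)   # inserted only because "threshold" is missing
   cfg.update({"min_score": 0.8})     # update() overwrites existing keys
   print(cfg.get("min_score"), cfg.get("missing", "n/a"))  # 0.8 n/a
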
async classmethod async_run(sample_list: List[MinScoreSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Creates a task to run the accuracy measure.

Parameters:
  • sample_list (List[MinScoreSample]) – List of samples to be transformed.

  • y_true (List[Any]) – True values

  • y_pred (List[Any]) – Predicted values
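
A hedged sketch of awaiting async_run (run is likewise a coroutine). The sample list and label lists are placeholders, and the handling of the returned object is an assumption based on the one-line summary above.

   import asyncio

   from langtest.transform.accuracy import DegradationAnalysis

   # Placeholder inputs; real usage passes sample objects produced by the harness.
   samples = []
   y_true = ["positive", "negative"]
   y_pred = ["positive", "positive"]

   async def main():
       task = await DegradationAnalysis.async_run(samples, y_true, y_pred)
       # async_run is documented as creating a task; await it (if it is one) for the result.
       return await task if asyncio.isfuture(task) else task

   result = asyncio.run(main())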

static preprocess(y_true: list | Series, y_pred: list | Series)#

Preprocesses the input data for the degradation analysis.

Parameters:
  • y_true (List) – The true labels.

  • y_pred (List) – The predicted labels.

Returns:

The preprocessed true and predicted labels.

Return type:

y_true, y_pred (Tuple[pd.Series, pd.Series])
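
A minimal sketch of calling preprocess. It assumes the method needs only the two label sequences and returns them as aligned pandas Series, as the return type above indicates; any additional normalisation it applies is not shown here.

   from langtest.transform.accuracy import DegradationAnalysis

   y_true = ["positive", "negative", "neutral"]
   y_pred = ["positive", "positive", "neutral"]

   # Returns the labels as pandas Series ready for the degradation analysis.
   y_true_s, y_pred_s = DegradationAnalysis.preprocess(y_true, y_pred)
   print(type(y_true_s).__name__, len(y_pred_s))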

static qa_evaluation(samples: List[QASample], X_test: DataFrame)#

Evaluates the model performance on question-answering tasks.

Parameters:
  • samples (List[QASample]) – The list of QASample instances.

  • X_test (pd.DataFrame) – The test data.

Returns:

The accuracy scores for the original and perturbed samples.

Return type:

Tuple[float, float]
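
The snippet below illustrates the idea behind qa_evaluation rather than the library's implementation: score the answers to the original questions and to their perturbed counterparts against the expected answers, then compare the two accuracies. The column names and the exact-match scoring are assumptions made only for this sketch.

   import pandas as pd

   # Toy data standing in for X_test; real QA scoring is typically more forgiving than exact match.
   X_test = pd.DataFrame({
       "expected_answer":  ["paris", "4", "blue"],
       "original_answer":  ["paris", "4", "red"],
       "perturbed_answer": ["paris", "5", "red"],
   })

   original_acc = (X_test["original_answer"] == X_test["expected_answer"]).mean()
   perturbed_acc = (X_test["perturbed_answer"] == X_test["expected_answer"]).mean()
   print(original_acc, perturbed_acc)  # ~0.67 vs ~0.33: accuracy degrades under perturbation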

async static run(sample_list: List[DegradationSample], y_true: List[Any], y_pred: List[Any], **kwargs)#

Computes the accuracy score for the given data.

Parameters:
  • sample_list (List[DegradationSample]) – List of samples to be transformed.

  • y_true (List[Any]) – True values

  • y_pred (List[Any]) – Predicted values

classmethod transform(test: str, y_true: List[Any], params: Dict)#

Abstract method that implements the accuracy measure.

Parameters:
  • test (str) – Name of the test to perform.

  • y_true (List[Any]) – True values

  • params (Dict) – Parameters for tests configuration

Returns:

The transformed data based on the implemented accuracy measure.

Return type:

List[MinScoreSample]
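
A hedged sketch of calling transform directly. In normal use the harness invokes it with the configured test parameters, so the test name below is grounded only in the alias documented above and the empty params dict is a placeholder.

   from langtest.transform.accuracy import DegradationAnalysis

   y_true = ["positive", "negative", "neutral"]

   # Placeholder configuration; real params come from the harness test config.
   samples = DegradationAnalysis.transform("degradation_analysis", y_true, params={})
   print(samples)  # list of sample objects describing the configured accuracy check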