langtest.langtest.Harness#

class Harness(task: str | dict, model: list | dict | None = None, data: list | dict | None = None, config: str | dict | None = None, benchmarking: dict | None = None)#

Bases: object

Harness is a testing class for NLP models.

The Harness class evaluates the performance of a given NLP model: the supplied test data is used to test the model, and a report is generated with the test results.
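
A minimal end-to-end usage sketch (the model name, hub, and the data dictionary key shown here are illustrative assumptions, not the only supported values):

>>> from langtest import Harness
>>> # Evaluate a spaCy NER pipeline against a CoNLL test file (paths are placeholders)
>>> harness = Harness(
...     task="ner",
...     model={"model": "en_core_web_sm", "hub": "spacy"},
...     data={"data_source": "path/to/test.conll"},
... )
>>> harness.generate()   # build the test cases
>>> harness.run()        # run the model on the generated test cases
>>> harness.report()     # summarize pass rates per test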

__init__(task: str | dict, model: list | dict | None = None, data: list | dict | None = None, config: str | dict | None = None, benchmarking: dict | None = None)#

Initialize the Harness object.

Parameters:
  • task (str | dict) – Task for which the model is to be evaluated.

  • model (list | dict, optional) – Specifies the model to be evaluated. If provided as a list, each element should be a dictionary with ‘model’ and ‘hub’ keys. If provided as a dictionary, it must contain ‘model’ and ‘hub’ keys when specifying a path.

  • data (list | dict, optional) – The data to be used for evaluation.

  • config (str | dict, optional) – Configuration for the tests to be performed.

Raises:

ValueError – Invalid arguments.

Methods

__init__(task[, model, data, config, ...])

Initialize the Harness object.

augment(training_data, save_data_path[, ...])

Augments the data provided in training_data and saves the result to save_data_path.

available_tests([test_type])

Returns a dictionary of available tests categorized by test type.

configure(config)

Configure the Harness with a given configuration.

edit_testcases(output_path, **kwargs)

Testcases are exported to a csv file to be edited.

generate([seed])

Generate the testcases to be used when evaluating the model.

generated_results()

Generates an overall report with every test case and label-wise metrics.

get_leaderboard([indices, columns, ...])

Get the rank of the model on the leaderboard.

import_edited_testcases(input_path, **kwargs)

Testcases are imported from a csv file.

load(save_dir, task[, model, ...])

Loads a previously saved Harness from a given configuration and dataset.

load_checkpoints(task, model, ...)

Load checkpoints and other necessary data to recreate a Harness object.

model_response([category])

Retrieves the model response for a specific category.

pass_custom_data(file_path[, test_name, ...])

Load custom data from a JSON file and store it in a class variable.

report([format, save_dir, mlflow_tracking])

Generate a report of the test results.

run([checkpoint, batch_size, ...])

Run the tests on the model using the generated test cases.

save(save_dir[, include_generated_results])

Save the configuration, generated testcases and the DataFactory to be reused later.

testcases()

Testcases after .generate() is called.

upload_file_to_hub(repo_type, file_path, token)

Uploads a file or a Dataset to the Hugging Face Model Hub.

upload_folder_to_hub(repo_type, folder_path, ...)

Uploads a folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub.

Attributes

DEFAULTS_CONFIG

DEFAULTS_DATASET

SUPPORTED_HUBS

SUPPORTED_HUBS_HF_DATASET_CLASSIFICATION

SUPPORTED_HUBS_HF_DATASET_LLM

SUPPORTED_HUBS_HF_DATASET_NER

SUPPORTED_TASKS

augment(training_data: dict, save_data_path: str, custom_proportions: List | Dict | None = None, export_mode: str = 'add', templates: str | List[str] | None = None, append_original: bool = False, generate_templates: bool = False, show_templates: bool = False) Harness#

Augments the data provided in training_data and saves the result to save_data_path.

Parameters:
  • training_data (dict) – A dictionary containing the input data for augmentation.

  • save_data_path (str) – Path to save the augmented data.

  • custom_proportions (Union[Dict, List], optional) – Custom proportions of the augmentations to apply, given either as a dictionary mapping test names to proportions or as a list of test names. Defaults to None.

  • export_mode (str, optional) – Determines how the samples are modified or exported: ‘inplace’ modifies the list of samples in place; ‘add’ adds new samples to the input data; ‘transformed’ exports only the transformed data, excluding untransformed samples. Defaults to ‘add’.

  • templates (Optional[Union[str, List[str]]]) – Template(s) to use when generating augmented samples, given as a single template string or a list of templates. Defaults to None.

  • append_original (bool, optional) – If set to True, appends the original data to the augmented data. Defaults to False.

  • generate_templates (bool, optional) – If set to True, generates sample templates from the given ones.

  • show_templates (bool, optional) – If set to True, displays the used templates.

Returns:

The instance of the class calling this method.

Return type:

Harness

Raises:

ValueError – If the pass_rate or minimum_pass_rate columns have an unexpected data type.

Note

This method uses an instance of AugmentRobustness to perform the augmentation.
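
A hedged call sketch (the data-source key, file paths, and the test names used for custom_proportions are placeholders):

>>> harness.augment(
...     training_data={"data_source": "train.conll"},            # data to perturb
...     save_data_path="augmented_train.conll",                  # where to write the result
...     custom_proportions={"uppercase": 0.3, "add_typo": 0.2},  # assumed test names
...     export_mode="add",                                       # keep originals, add new samples
... )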

static available_tests(test_type: str | None = None) Dict[str, List[str]]#

Returns a dictionary of available tests categorized by test type.

Parameters:

test_type (str, optional) – The specific test type to retrieve. Defaults to None.

Returns:

A dictionary containing the available tests for the specified test type; if no test type is given, all available tests are returned.

Return type:

dict

Raises:

ValueError – If an invalid test type is provided.
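
For example (the “robustness” test type is one illustrative value):

>>> from langtest import Harness
>>> Harness.available_tests()                        # all tests, grouped by test type
>>> Harness.available_tests(test_type="robustness")  # tests for a single type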

configure(config: str | dict) dict#

Configure the Harness with a given configuration.

Parameters:

config (str | dict) – Configuration file path or dictionary for the tests to be performed.

Returns:

Loaded configuration.

Return type:

dict
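
A configuration sketch; the test names and minimum pass rates below are illustrative assumptions:

>>> harness.configure({
...     "tests": {
...         "defaults": {"min_pass_rate": 0.65},
...         "robustness": {
...             "uppercase": {"min_pass_rate": 0.66},
...             "add_typo": {"min_pass_rate": 0.60},
...         },
...     }
... })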

edit_testcases(output_path: str, **kwargs)#

Testcases are exported to a csv file to be edited.

The edited file can be imported back into the harness.

Parameters:

output_path (str) – path to save the testcases to
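
For example (the file name is a placeholder):

>>> harness.edit_testcases("testcases_to_edit.csv")  # export test cases for manual review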

generate(seed: int | None = None) Harness#

Generate the testcases to be used when evaluating the model.

The generated testcases are stored in the _testcases attribute.
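
For example, fixing the seed to make the generated test cases reproducible:

>>> harness.generate(seed=42)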

generated_results() DataFrame | None#

Generates an overall report with every test case and label-wise metrics.

Returns:

Generated dataframe.

Return type:

pd.DataFrame
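
For example:

>>> results = harness.generated_results()  # per-test-case results as a DataFrame
>>> results.head()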

get_leaderboard(indices=[], columns=[], category=False, split_wise=False, test_wise=False, rank_by: str | list = 'Avg', *args, **kwargs)#

Get the rank of the model on the leaderboard.
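
A hedged call sketch based on the signature above (the interpretation of the flags is inferred from their names):

>>> harness.get_leaderboard(rank_by="Avg", category=True)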

import_edited_testcases(input_path: str, **kwargs)#

Testcases are imported from a csv file.

Parameters:

input_path (str) – location of the file to load
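
For example, loading the file exported by edit_testcases() after editing it:

>>> harness.import_edited_testcases("testcases_to_edit.csv")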

classmethod load(save_dir: str, task: str, model: list | dict | None = None, load_testcases: bool = False, load_model_response: bool = False) Harness#

Loads a previously saved Harness from a given configuration and dataset

Parameters:
  • save_dir (str) – path to the folder containing all the files needed to load a saved Harness

  • task (str) – task for which the model is to be evaluated.

  • model (Union[list, dict], optional) – Specifies the model to be evaluated. If provided as a list, each element should be a dictionary with ‘model’ and ‘hub’ keys. If provided as a dictionary, it must contain ‘model’ and ‘hub’ keys when specifying a path.

  • hub (str, optional) – model hub to load from the path. Required if path is passed as ‘model’.

Returns:

Harness loaded from a previous configuration, along with the new model to evaluate.

Return type:

Harness
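
A sketch of reloading a saved Harness (the directory, model name, and hub are placeholders):

>>> from langtest import Harness
>>> harness = Harness.load(
...     save_dir="saved_harness",
...     task="ner",
...     model={"model": "en_core_web_sm", "hub": "spacy"},
... )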

classmethod load_checkpoints(task, model, save_checkpoints_dir: str) Harness#

Load checkpoints and other necessary data to recreate a Harness object.

Parameters:
  • task – The task for which the model was tested.

  • model – The model or models used for testing.

  • save_checkpoints_dir (str) – Directory containing saved checkpoints and data.

Returns:

A Harness object reconstructed with loaded checkpoints and data.

Return type:

Harness

Raises:

OSError – Raised if necessary files (config.yaml, data.pkl) are missing in the checkpoint directory.
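
A sketch of resuming from checkpoints (the directory and model details are placeholders; continuing the run with run() afterwards is an assumption):

>>> from langtest import Harness
>>> harness = Harness.load_checkpoints(
...     task="ner",
...     model={"model": "en_core_web_sm", "hub": "spacy"},
...     save_checkpoints_dir="checkpoints",
... )
>>> harness.run(checkpoint=True, save_checkpoints_dir="checkpoints")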

model_response(category: str | None = None)#

Retrieves the model response for a specific category.

Parameters:

category (str) – The category for which the model response is requested. It should be one of the supported categories: “accuracy” or “fairness”.

Returns:

A DataFrame containing the model response data, with columns including ‘gender’, ‘original’, ‘original_question’, ‘original_context’, ‘options’, ‘expected_results’, and ‘actual_results’. If the model response is empty or None, returns an empty DataFrame.

Return type:

pd.DataFrame

Raises:

ValueError – If the category is None or not one of the supported categories.
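
For example:

>>> harness.model_response(category="accuracy")  # or category="fairness"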

pass_custom_data(file_path: str, test_name: str | None = None, task: str | None = None, append: bool = False) None#

Load custom data from a JSON file and store it in a class variable.

Parameters:
  • file_path (str) – Path to the JSON file.

  • test_name (str, optional) – Name parameter. Defaults to None.

  • task (str, optional) – Task type. Either “bias” or “representation”. Defaults to None.

  • append (bool, optional) – Whether to append the data or overwrite it. Defaults to False.
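
A hedged sketch (the file name and test name are hypothetical placeholders):

>>> harness.pass_custom_data(
...     file_path="custom_bias_terms.json",  # hypothetical JSON file
...     test_name="custom_bias",             # hypothetical test name
...     task="bias",
...     append=False,
... )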

report(format: str = 'dataframe', save_dir: str | None = None, mlflow_tracking: bool = False) DataFrame#

Generate a report of the test results.

Parameters:
  • format (str) – format in which to save the report

  • save_dir (str) – name of the directory to save the file

  • mlflow_tracking (bool, optional) – If True, tracks the report results with MLflow. Defaults to False.

Returns:

DataFrame containing the results of the tests.

Return type:

pd.DataFrame
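
For example (‘dataframe’ is the documented default; the ‘excel’ format string is an assumption):

>>> report_df = harness.report()                         # default: pandas DataFrame
>>> harness.report(format="excel", save_dir="reports/")  # save the report to disk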

run(checkpoint: bool = False, batch_size=500, save_checkpoints_dir: str = 'checkpoints') Harness#

Run the tests on the model using the generated test cases.

Parameters:
  • checkpoint (bool) – If True, enable checkpointing to save intermediate results.

  • batch_size (int) – Batch size for dividing test cases into batches.

  • save_checkpoints_dir (str) – Directory to save checkpoints and intermediate results.

Returns:

The updated Harness object with test results stored in the generated_results attribute.

Return type:

Harness

Raises:

RuntimeError – Raised if test cases are not provided (None).
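
For example, running with checkpointing enabled (the values are illustrative):

>>> harness.run(
...     checkpoint=True,                     # save intermediate results
...     batch_size=100,                      # test cases per batch
...     save_checkpoints_dir="checkpoints",
... )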

save(save_dir: str, include_generated_results: bool = False) None#

Save the configuration, generated testcases and the DataFactory to be reused later.

Parameters:

save_dir (str) – path to folder to save the different files

Returns:

None
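
For example:

>>> harness.save("saved_harness", include_generated_results=True)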

testcases() DataFrame#

Testcases after .generate() is called.

Returns:

testcases formatted into a pd.DataFrame

Return type:

pd.DataFrame
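
For example:

>>> harness.testcases().head()  # inspect the generated test cases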

upload_file_to_hub(repo_type: str, file_path: str, token: str, exist_ok: bool = False, split: str = 'train')#

Uploads a file or a Dataset to the Hugging Face Model Hub.

Parameters:
  • repo_name (str) – The name of the repository in the format ‘username/repository’.

  • repo_type (str) – The type of the repository, e.g. ‘dataset’ or ‘model’.

  • file_path (str) – Path to the file to be uploaded.

  • token (str) – Hugging Face Hub authentication token.

  • exist_ok (bool, optional) – If True, do not raise an error if repo already exists.

  • split (str, optional) – The split of the dataset. Defaults to ‘train’.

Raises:
  • ValueError – Raised if a valid token is not provided.

  • ModuleNotFoundError – Raised if required packages are not installed.

Returns:

None
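
A hedged sketch following the signature above; note that the Parameters list also documents a repo_name argument that does not appear in the signature, so the exact call may differ:

>>> harness.upload_file_to_hub(
...     repo_type="dataset",
...     file_path="augmented_train.conll",  # placeholder path
...     token="hf_...",                     # Hugging Face access token (placeholder)
... )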

upload_folder_to_hub(repo_type: str, folder_path: str, token: str, model_type: str = 'huggingface', exist_ok: bool = False)#

Uploads a folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub.

This function facilitates the process of uploading a local folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub. It requires proper authentication through a valid token.

Parameters:
  • repo_name (str) – The name of the repository on the Hub.

  • repo_type (str) – The type of the repository, either “model” or “dataset”.

  • folder_path (str) – The local path to the folder containing the model or dataset files to be uploaded.

  • token (str) – The authentication token for accessing the Hugging Face Hub services.

  • model_type (str, optional) – The type of the model, currently supports “huggingface” and “spacy”. Defaults to “huggingface”.

  • exist_ok (bool, optional) – If True, do not raise an error if repo already exists.

Raises:
  • ValueError – If a valid token is not provided for Hugging Face Hub authentication.

  • ModuleNotFoundError – If required package is not installed. This package needs to be installed based on model_type (“huggingface” or “spacy”).
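
A hedged sketch (as above, the Parameters list mentions a repo_name argument not shown in the signature; all values are placeholders):

>>> harness.upload_folder_to_hub(
...     repo_type="model",
...     folder_path="path/to/model_folder",
...     token="hf_...",
...     model_type="huggingface",
... )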