langtest.langtest.Harness
- class Harness(task: str | dict, model: list | dict | None = None, data: list | dict | None = None, config: str | dict | None = None, benchmarking: dict | None = None)
Bases:
object
Harness is a testing class for NLP models. It evaluates the performance of a given NLP model on the supplied test data and generates a report with the test results.
- __init__(task: str | dict, model: list | dict | None = None, data: list | dict | None = None, config: str | dict | None = None, benchmarking: dict | None = None)
Initialize the Harness object.
- Parameters:
task (str | dict) – Task for which the model is to be evaluated.
model (list | dict, optional) – Specifies the model to be evaluated. If provided as a list, each element should be a dictionary with ‘model’ and ‘hub’ keys. If provided as a dictionary, it must contain ‘model’ and ‘hub’ keys when specifying a path.
data (list | dict, optional) – The data to be used for evaluation.
config (str | dict, optional) – Configuration for the tests to be performed.
benchmarking (dict, optional) – Benchmarking configuration. Defaults to None.
- Raises:
ValueError – Raised if invalid arguments are provided.
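A minimal usage sketch (the model name and hub are illustrative; a data argument can also be passed to point at a test dataset):

    from langtest import Harness

    # Build a harness for a NER model hosted on the Hugging Face hub.
    harness = Harness(
        task="ner",
        model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
    )

    # Typical workflow: generate test cases, run them, and report results.
    harness.generate().run().report()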
Methods
- __init__(task[, model, data, config, ...]): Initialize the Harness object.
- augment(training_data, save_data_path[, ...]): Augment the input training data and save the result to the given path.
- available_tests([test_type]): Return a dictionary of available tests categorized by test type.
- configure(config): Configure the Harness with a given configuration.
- edit_testcases(output_path, **kwargs): Export test cases to a CSV file for editing.
- generate([seed]): Generate the test cases to be used when evaluating the model.
- generated_results(): Generate an overall report with every test case and label-wise metrics.
- get_leaderboard([indices, columns, ...]): Get the rank of the model on the leaderboard.
- import_edited_testcases(input_path, **kwargs): Import test cases from a CSV file.
- load(save_dir, task[, model, ...]): Load a previously saved Harness from a given configuration and dataset.
- load_checkpoints(task, model, ...): Load checkpoints and other necessary data to recreate a Harness object.
- model_response([category]): Retrieve the model response for a specific category.
- pass_custom_data(file_path[, test_name, ...]): Load custom data from a JSON file and store it in a class variable.
- report([format, save_dir, mlflow_tracking]): Generate a report of the test results.
- run([checkpoint, batch_size, ...]): Run the tests on the model using the generated test cases.
- save(save_dir[, include_generated_results]): Save the configuration, generated test cases, and the DataFactory to be reused later.
- testcases(): Return the test cases generated after .generate() is called.
- upload_file_to_hub(repo_name, repo_type, file_path, token): Upload a file or a dataset to the Hugging Face Hub.
- upload_folder_to_hub(repo_name, repo_type, folder_path, ...): Upload a folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub.
Attributes
DEFAULTS_CONFIG
DEFAULTS_DATASET
SUPPORTED_HUBS
SUPPORTED_HUBS_HF_DATASET_CLASSIFICATION
SUPPORTED_HUBS_HF_DATASET_LLM
SUPPORTED_HUBS_HF_DATASET_NER
SUPPORTED_TASKS
- augment(training_data: dict, save_data_path: str, custom_proportions: List | Dict | None = None, export_mode: str = 'add', templates: str | List[str] | None = None, append_original: bool = False, generate_templates: bool = False, show_templates: bool = False) → Harness
Augments the input training data and saves the result to save_data_path.
- Parameters:
training_data (dict) – A dictionary containing the input data for augmentation.
save_data_path (str) – Path to save the augmented data.
custom_proportions (Union[Dict, List], optional) – Custom proportions of the perturbations to apply during augmentation. Defaults to None.
export_mode (str, optional) – Determines how the samples are modified or exported. ‘inplace’: modifies the list of samples in place; ‘add’: adds new samples to the input data; ‘transformed’: exports only the transformed data, excluding untransformed samples. Defaults to ‘add’.
templates (Optional[Union[str, List[str]]]) – Template string(s) from which augmented samples are generated. Defaults to None.
append_original (bool, optional) – If set to True, appends the original data to the augmented data. Defaults to False.
generate_templates (bool, optional) – If set to True, generates additional sample templates from the given ones.
show_templates (bool, optional) – If set to True, displays the templates used.
- Returns:
The instance of the class calling this method.
- Return type:
Harness
- Raises:
ValueError – If the pass_rate or minimum_pass_rate columns have an unexpected data type.
Note
This method uses an instance of AugmentRobustness to perform the augmentation.
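For example, a sketch of a typical augmentation call, given a Harness instance harness; the file paths and the ‘data_source’ key are illustrative:

    # Augment the training data and append the new samples to the output.
    harness.augment(
        training_data={"data_source": "train.conll"},
        save_data_path="augmented_train.conll",
        export_mode="add",
    )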
- static available_tests(test_type: str | None = None) → Dict[str, List[str]]
Returns a dictionary of available tests categorized by test type.
- Parameters:
test_type (str, optional) – The specific test type to retrieve. Defaults to None.
- Returns:
A dictionary of available tests for the specified test type; if no test type is given, all available tests are returned.
- Return type:
dict
- Raises:
ValueError – If an invalid test type is provided.
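Because this is a static method, it can be called without an instance, for example:

    from langtest import Harness

    all_tests = Harness.available_tests()                     # every category
    robustness_tests = Harness.available_tests("robustness")  # one category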
- configure(config: str | dict) → dict
Configure the Harness with a given configuration.
- Parameters:
config (str | dict) – Configuration file path or dictionary for the tests to be performed.
- Returns:
Loaded configuration.
- Return type:
dict
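A sketch of a configuration dictionary in the nested shape langtest uses; the test names and pass rates are illustrative:

    harness.configure({
        "tests": {
            "defaults": {"min_pass_rate": 0.65},
            "robustness": {
                "uppercase": {"min_pass_rate": 0.66},
                "add_typo": {"min_pass_rate": 0.70},
            },
        }
    })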
- edit_testcases(output_path: str, **kwargs)
Export test cases to a CSV file for editing.
The edited file can be imported back into the harness with import_edited_testcases().
- Parameters:
output_path (str) – Path to save the test cases to.
- generate(seed: int | None = None) → Harness
Generate the testcases to be used when evaluating the model.
The generated test cases are stored in the _testcases attribute.
- generated_results() → DataFrame | None
Generates an overall report with every test case and label-wise metrics.
- Returns:
Generated dataframe.
- Return type:
pd.DataFrame
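For example, given a Harness instance harness:

    harness.generate(seed=42)                  # build the test cases
    harness.run()                              # evaluate the model on them
    results_df = harness.generated_results()   # per-sample results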
- get_leaderboard(indices=[], columns=[], category=False, split_wise=False, test_wise=False, rank_by: str | list = 'Avg', *args, **kwargs)
Get the rank of the model on the leaderboard.
- import_edited_testcases(input_path: str, **kwargs)
Import test cases from a CSV file.
- Parameters:
input_path (str) – Location of the file to load.
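Together with edit_testcases(), this enables a manual review round trip (the file name is a placeholder):

    harness.edit_testcases("testcases.csv")           # export for review
    # ... edit the CSV by hand ...
    harness.import_edited_testcases("testcases.csv")  # load it back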
- classmethod load(save_dir: str, task: str, model: list | dict | None = None, load_testcases: bool = False, load_model_response: bool = False) → Harness
Loads a previously saved Harness from a given configuration and dataset.
- Parameters:
save_dir (str) – Path to the folder containing all the files needed to load a saved Harness.
task (str) – Task for which the model is to be evaluated.
model (Union[list, dict], optional) – Specifies the model to be evaluated. If provided as a list, each element should be a dictionary with ‘model’ and ‘hub’ keys. If provided as a dictionary, it must contain ‘model’ and ‘hub’ keys when specifying a path.
load_testcases (bool, optional) – If True, load previously saved test cases from save_dir. Defaults to False.
load_model_response (bool, optional) – If True, load previously saved model responses from save_dir. Defaults to False.
- Returns:
Harness loaded from a previous configuration along with the new model to evaluate.
- Return type:
Harness
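A sketch of reloading a saved Harness; the directory and model name are placeholders:

    from langtest import Harness

    loaded = Harness.load(
        save_dir="saved_harness",
        task="ner",
        model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
        load_testcases=True,
    )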
- classmethod load_checkpoints(task, model, save_checkpoints_dir: str) → Harness
Load checkpoints and other necessary data to recreate a Harness object.
- Parameters:
task – The task for which the model was tested.
model – The model or models used for testing.
save_checkpoints_dir (str) – Directory containing saved checkpoints and data.
- Returns:
A Harness object reconstructed with loaded checkpoints and data.
- Return type:
Harness
- Raises:
OSError – Raised if necessary files (config.yaml, data.pkl) are missing in the checkpoint directory.
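A sketch of resuming from checkpoints, assuming the directory was produced by run(checkpoint=True); the model name is a placeholder:

    from langtest import Harness

    harness = Harness.load_checkpoints(
        task="ner",
        model={"model": "dslim/bert-base-NER", "hub": "huggingface"},
        save_checkpoints_dir="checkpoints",
    )
    harness.run()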
- model_response(category: str | None = None)
Retrieves the model response for a specific category.
- Parameters:
category (str) – The category for which the model response is requested. It should be one of the supported categories: “accuracy” or “fairness”.
- Returns:
A DataFrame containing the model response data, with columns including ‘gender’, ‘original’, ‘original_question’, ‘original_context’, ‘options’, ‘expected_results’, and ‘actual_results’. If the model response is empty or None, returns an empty DataFrame.
- Return type:
pd.DataFrame
- Raises:
ValueError – If the category is None or not one of the supported categories.
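For example, given a Harness instance harness that has already been run:

    accuracy_df = harness.model_response(category="accuracy")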
- pass_custom_data(file_path: str, test_name: str | None = None, task: str | None = None, append: bool = False) → None
Load custom data from a JSON file and store it in a class variable.
- Parameters:
file_path (str) – Path to the JSON file.
test_name (str, optional) – Name of the test the custom data applies to. Defaults to None.
task (str, optional) – Task type. Either “bias” or “representation”. Defaults to None.
append (bool, optional) – Whether to append the data or overwrite it. Defaults to False.
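A sketch of loading custom bias data; the file name and test name are hypothetical:

    harness.pass_custom_data(
        file_path="custom_bias_terms.json",       # hypothetical JSON file
        test_name="replace_to_female_pronouns",   # hypothetical test name
        task="bias",
        append=False,
    )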
- report(format: str = 'dataframe', save_dir: str | None = None, mlflow_tracking: bool = False) → DataFrame
Generate a report of the test results.
- Parameters:
format (str) – Format in which to save the report.
save_dir (str) – Name of the directory to save the file in.
mlflow_tracking (bool) – If True, track the report with MLflow. Defaults to False.
- Returns:
DataFrame containing the results of the tests.
- Return type:
pd.DataFrame
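For example, given a Harness instance harness that has already been run:

    report_df = harness.report(format="dataframe")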
- run(checkpoint: bool = False, batch_size=500, save_checkpoints_dir: str = 'checkpoints') → Harness
Run the tests on the model using the generated test cases.
- Parameters:
checkpoint (bool) – If True, enable checkpointing to save intermediate results.
batch_size (int) – Batch size for dividing test cases into batches.
save_checkpoints_dir (str) – Directory to save checkpoints and intermediate results.
- Returns:
The updated Harness object with test results stored in the generated_results attribute.
- Return type:
Harness
- Raises:
RuntimeError – Raised if test cases are not provided (None).
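A sketch of a checkpointed run; the directory name is a placeholder:

    harness.run(
        checkpoint=True,                 # save intermediate results
        batch_size=100,                  # test cases per batch
        save_checkpoints_dir="checkpoints",
    )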
- save(save_dir: str, include_generated_results: bool = False) → None
Save the configuration, generated test cases, and the DataFactory to be reused later.
- Parameters:
save_dir (str) – Path to the folder in which to save the different files.
include_generated_results (bool, optional) – If True, also save the generated results. Defaults to False.
- Returns:
None
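For example:

    # Persist config, test cases, and results for later reloading.
    harness.save("saved_harness", include_generated_results=True)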
- testcases() → DataFrame
Return the test cases generated after .generate() is called.
- Returns:
Test cases formatted as a pd.DataFrame.
- Return type:
pd.DataFrame
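For example:

    cases_df = harness.testcases()  # requires a prior .generate() call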
- upload_file_to_hub(repo_name: str, repo_type: str, file_path: str, token: str, exist_ok: bool = False, split: str = 'train')
Uploads a file or a dataset to the Hugging Face Hub.
- Parameters:
repo_name (str) – The name of the repository in the format ‘username/repository’.
repo_type (str) – The type of the repository, e.g. ‘dataset’ or ‘model’.
file_path (str) – Path to the file to be uploaded.
token (str) – Hugging Face Hub authentication token.
exist_ok (bool, optional) – If True, do not raise an error if repo already exists.
split (str, optional) – The split of the dataset. Defaults to ‘train’.
- Raises:
ValueError – Raised if a valid token is not provided.
ModuleNotFoundError – Raised if required packages are not installed.
- Returns:
None
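A sketch of uploading exported test cases as a dataset; the repository name, file, and token are placeholders:

    harness.upload_file_to_hub(
        repo_name="username/langtest-testcases",
        repo_type="dataset",
        file_path="testcases.csv",
        token="hf_...",  # Hugging Face access token
        split="train",
    )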
- upload_folder_to_hub(repo_name: str, repo_type: str, folder_path: str, token: str, model_type: str = 'huggingface', exist_ok: bool = False)
Uploads a folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub.
This function facilitates the process of uploading a local folder containing a model or dataset to the Hugging Face Model Hub or Dataset Hub. It requires proper authentication through a valid token.
- Parameters:
repo_name (str) – The name of the repository on the Hub.
repo_type (str) – The type of the repository, either “model” or “dataset”.
folder_path (str) – The local path to the folder containing the model or dataset files to be uploaded.
token (str) – The authentication token for accessing the Hugging Face Hub services.
model_type (str, optional) – The type of the model, currently supports “huggingface” and “spacy”. Defaults to “huggingface”.
exist_ok (bool, optional) – If True, do not raise an error if repo already exists.
- Raises:
ValueError – If a valid token is not provided for Hugging Face Hub authentication.
ModuleNotFoundError – If a required package is not installed; which package is needed depends on model_type (‘huggingface’ or ‘spacy’).
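A sketch of uploading a locally saved Hugging Face model folder; the repository name, folder, and token are placeholders:

    harness.upload_folder_to_hub(
        repo_name="username/my-finetuned-model",
        repo_type="model",
        folder_path="saved_model",
        token="hf_...",  # Hugging Face access token
        model_type="huggingface",
    )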